You’d be forgiven for thinking that an astronomer’s job consists of not much more than going outside at night, looking through a telescope, and thinking just how lovely the stars are. For many professional astronomers that’s how their interest began, but like any profession, there’s a lot more to it than you might think. Modern astronomy is a complex and varied affair: there are countless sub-fields, and with advances in technology, astronomers now have access to practically endless data. As you can imagine, computers play a big part in astronomy, and indeed, most of an astronomer’s time is spent in front of a computer rather than at a telescope.

I was playing with data from the Sloan Digital Sky Survey recently, and I wanted to write a little piece to show how computers can be used to extract and analyse astronomical data easily and efficiently, and how that data is then used to learn about the Universe.

The Sloan Digital Sky Survey (SDSS) is a massive project that uses a dedicated 2.5-metre telescope at Apache Point Observatory in New Mexico to gather and process data, which is then stored in a database for people to access and use. Its main aim is to collect images and spectra of galaxies, from which properties such as colour, brightness and distance can be measured. SDSS has been running since 2000 and now has data covering 35% of the sky. This vast database includes half a billion photometric observations (colours, images, etc.) and a million spectroscopic observations (chemical compositions, etc.).

In the olden days it would have taken astronomers years or decades to go through data even a fraction of the size of the current SDSS database, but thanks to modern computers, a single user can potentially scan the entire database in a matter of minutes.

If you work in IT you have probably heard of, or even used, SQL to search databases. If not: SQL (Structured Query Language) lets a person or a program pull the required information out of a database quickly and efficiently. For example, say a nationwide company wanted the names of all its customers in Kildare who are over the age of 25. They could go through the list one by one, but with SQL the command might be:

SELECT Name FROM CustomerList WHERE County = 'Kildare' AND Age > 25

Within moments the company will have a full list of names. SDSS uses the same approach, but instead of things like ‘Name’ and ‘County’ it might have ‘ObjectType’ and ‘Redshift’.
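To see the customer query in action without a real company database, here is a minimal sketch using Python’s built-in sqlite3 module and a tiny in-memory CustomerList table. The names, counties and ages are invented purely for illustration; only the SELECT statement comes from the example above.

```python
import sqlite3

# In-memory database with a made-up CustomerList table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerList (Name TEXT, County TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO CustomerList VALUES (?, ?, ?)",
    [
        ("Aoife", "Kildare", 31),
        ("Brian", "Kildare", 22),   # too young: filtered out by Age > 25
        ("Ciara", "Dublin", 40),    # wrong county: filtered out
        ("Declan", "Kildare", 58),
    ],
)

# The query from the text; ORDER BY only makes the output order predictable.
rows = conn.execute(
    "SELECT Name FROM CustomerList WHERE County = 'Kildare' AND Age > 25 ORDER BY Name"
).fetchall()
names = [name for (name,) in rows]
print(names)  # ['Aoife', 'Declan']
```

The database does the filtering, so the program never has to loop over customers itself; the same idea scales from four rows to the hundreds of millions in SDSS.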

A quasar

I used SDSS to get a rough idea of how quasars are distributed through the Universe. A quasar (or QSO: quasi-stellar object) is an extremely bright, energetic object at the centre of a galaxy, believed to be powered by a supermassive black hole swallowing the matter around it. Quasars often blast ‘beams’ of matter and radiation out into intergalactic space. The thing is, there don’t seem to be many quasars in present-day galaxies; most of them seem to have existed sometime in the past.

This is where redshift comes in. Because the Universe is expanding, light from a distant galaxy is stretched to longer, redder wavelengths on its way to us; the amount of stretching is the galaxy’s redshift, and it can be used to calculate the galaxy’s distance. The further away a galaxy is, the higher its redshift. Redshift also tells us how far back in time we are looking. Think of this: light takes about eight minutes to reach Earth from the Sun, 4.3 years from the nearest star beyond the Solar System, and 2.5 million years from the Andromeda Galaxy. So we see the Andromeda Galaxy as it was 2.5 million years ago. For more distant galaxies it could be ten million years ago, half a billion years ago, or even 10 billion years ago.
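The light-travel times above are simple arithmetic: distance divided by the speed of light. A minimal sketch follows; the Sun-Earth distance is the standard astronomical unit and the 2.5 million light-year Andromeda distance is the figure from the text.

```python
# Light-travel time: how long light takes to cover a given distance.
SPEED_OF_LIGHT_M_S = 299_792_458           # exact, by definition of the metre
AU_M = 149_597_870_700                     # astronomical unit (mean Sun-Earth distance), metres
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # Julian year, used to define the light-year
LIGHT_YEAR_M = SPEED_OF_LIGHT_M_S * SECONDS_PER_YEAR


def travel_time_seconds(distance_m: float) -> float:
    """Seconds for light to travel distance_m metres."""
    return distance_m / SPEED_OF_LIGHT_M_S


sun_minutes = travel_time_seconds(AU_M) / 60
# For a distance in light-years, the travel time in years is the same number by definition.
andromeda_years = travel_time_seconds(2.5e6 * LIGHT_YEAR_M) / SECONDS_PER_YEAR

print(f"Sun to Earth: {sun_minutes:.1f} minutes")          # 8.3 minutes
print(f"Andromeda to Earth: {andromeda_years:.3g} years")  # 2.5e+06 years
```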

In essence, the higher the redshift of a galaxy, the further away from us it is, and the younger it looks, because we are seeing it as it was further in the past.
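In practice, redshift is measured by comparing where a known spectral line appears in an object’s spectrum with where it sits in the laboratory: z = (observed wavelength − rest wavelength) / rest wavelength. The sketch below uses the standard rest wavelengths of two hydrogen lines; the “observed” wavelengths are made-up examples, and the Hubble constant of 70 km/s/Mpc is an assumed round value. The distance step uses Hubble’s law, d ≈ cz / H0, which is only a reasonable approximation for small redshifts.

```python
LYMAN_ALPHA_REST_NM = 121.567  # rest wavelength of hydrogen Lyman-alpha (vacuum)
H_ALPHA_REST_NM = 656.28       # rest wavelength of hydrogen H-alpha (in air)
C_KM_S = 299_792.458           # speed of light in km/s
H0 = 70.0                      # assumed Hubble constant, km/s/Mpc


def redshift(observed_nm: float, rest_nm: float) -> float:
    """z = (observed - rest) / rest."""
    return (observed_nm - rest_nm) / rest_nm


def hubble_distance_mpc(z: float, h0: float = H0) -> float:
    """Low-redshift approximation d ~ c*z / H0, in megaparsecs; only valid for z << 1."""
    return C_KM_S * z / h0


# Hypothetical observations: a distant quasar and a nearby galaxy.
z_qso = redshift(486.27, LYMAN_ALPHA_REST_NM)
z_gal = redshift(662.84, H_ALPHA_REST_NM)

print(f"quasar z = {z_qso:.2f}")  # quasar z = 3.00
print(f"galaxy z = {z_gal:.3f}, roughly {hubble_distance_mpc(z_gal):.0f} Mpc away")
```

The quasar’s line has been stretched to four times its laboratory wavelength, which is what z = 3 means; the linear distance formula would be badly wrong at that redshift, which is why it is only applied to the nearby galaxy here.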

When astronomers go looking for quasars, they find very few nearby (that is, active in recent times), but the further out they look, the more they find, at distances corresponding to much younger galaxies.

So with that in mind, I took a look at the data to find out a little more about quasars.

I decided to break up the search into redshift ranges (0 to 0.5, 0.5 to 1, 1 to 1.5 and so on) to find out how many quasars lie in each distance range.

The SQL queries I used were all variations of:

SELECT COUNT(*) FROM SpecPhoto WHERE specclass=2 AND z BETWEEN 1 AND 1.5 AND zconf > .35

This command simply counts the number of quasars with a redshift between 1 and 1.5, where the redshift measurement has a confidence level of over 35%. ‘specclass=2’ is how one looks for quasars using SDSS in this instance.
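Rather than typing each variation by hand, the queries can be generated in a loop. Here is a minimal Python sketch, assuming bins of width 0.5 up to a redshift of 5 (the upper limit is my assumption) and reusing the table and column names from the query above. One deliberate change: SQL’s BETWEEN includes both endpoints, so an object at exactly z = 1.5 would be counted in both the 1 to 1.5 and 1.5 to 2 bins. The sketch uses half-open ranges instead, so each object lands in exactly one bin.

```python
def bin_edges(start: float, stop: float, width: float) -> list[tuple[float, float]]:
    """Return (low, high) pairs: (0, 0.5), (0.5, 1.0), ... up to stop."""
    n = round((stop - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(n)]


def count_query(specclass: int, low: float, high: float, min_zconf: float = 0.35) -> str:
    # Half-open range (low <= z < high) instead of BETWEEN, so an object sitting
    # exactly on a bin edge is counted once rather than in both neighbouring bins.
    return (
        "SELECT COUNT(*) FROM SpecPhoto "
        f"WHERE specclass = {specclass} "
        f"AND z >= {low:g} AND z < {high:g} "
        f"AND zconf > {min_zconf:g}"
    )


for low, high in bin_edges(0, 5, 0.5):
    print(count_query(2, low, high))
```

Each printed line can then be pasted into the database’s query form; swapping the specclass value gives the matching set of queries for another object type.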

I then repeated the queries using a specclass corresponding to galaxies. I did this because quasars live at the centres of galaxies, and I wondered whether the two populations were related.
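An alternative to one query per bin and per object type is a single grouped query that bins the redshifts itself. The sketch below runs against a tiny in-memory SQLite table with made-up rows. The quasar code 2 comes from the query above, while GALAXY_CLASS is a placeholder, not the real SDSS value. I haven’t checked this exact query against the SDSS SkyServer, whose SQL dialect may differ.

```python
import sqlite3

QSO_CLASS = 2       # quasar code, as used in the query above
GALAXY_CLASS = 99   # placeholder only; look up the real galaxy code in the SDSS schema
BIN_WIDTH = 0.5

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SpecPhoto (specclass INTEGER, z REAL, zconf REAL)")
# Made-up rows: (class code, redshift, redshift confidence).
conn.executemany(
    "INSERT INTO SpecPhoto VALUES (?, ?, ?)",
    [
        (QSO_CLASS, 0.3, 0.90), (QSO_CLASS, 1.2, 0.80), (QSO_CLASS, 1.7, 0.95),
        (QSO_CLASS, 1.8, 0.99), (QSO_CLASS, 2.6, 0.20),  # last one fails the zconf cut
        (GALAXY_CLASS, 0.05, 0.99), (GALAXY_CLASS, 0.1, 0.97),
        (GALAXY_CLASS, 0.2, 0.90), (GALAXY_CLASS, 0.6, 0.80),
    ],
)

# CAST(... AS INTEGER) truncates, which for z >= 0 gives the bin index
# (0 for 0-0.5, 1 for 0.5-1, ...). The expression is repeated in GROUP BY
# because some SQL dialects do not accept a column alias there.
query = f"""
    SELECT specclass, CAST(z / {BIN_WIDTH} AS INTEGER) AS z_bin, COUNT(*) AS n
    FROM SpecPhoto
    WHERE zconf > 0.35 AND z >= 0
    GROUP BY specclass, CAST(z / {BIN_WIDTH} AS INTEGER)
    ORDER BY specclass, z_bin
"""

counts = {}  # (class code, lower bin edge) -> number of objects
for specclass, bin_index, n in conn.execute(query):
    low = bin_index * BIN_WIDTH
    counts[(specclass, low)] = n
    label = "quasars" if specclass == QSO_CLASS else "galaxies"
    print(f"{label:8s} z {low:.1f}-{low + BIN_WIDTH:.1f}: {n}")
```

Empty bins simply don’t appear in the output, so a plotting step would need to fill them in with zero.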

Here’s what I got:

The orange line is the number of quasars as a function of distance, and the yellow line is the number of galaxies. The left side of the graph corresponds to ‘here and now’; that is, nearby objects that we see more or less as they are today. As you move to the right, the redshift (or ‘z’) increases, so the counts are of objects that are more distant and that we see as they were further in the past.

So what does this graph tell us? Well, first of all, you’ll notice that the galaxy line goes off the chart. The query returned millions of nearby galaxies but only around 15,000 quasars. This shows that there are very few quasars in the local Universe compared to galaxies.

However, as we move back in time, we see fewer and fewer galaxies (partly because distant galaxies are too faint for the survey to measure, whereas quasars are bright enough to be seen much further away), but a big spike in the number of quasars at a redshift of around 1.5 to 2. This corresponds to a time when the Universe was roughly 3 to 4.5 billion years old (it’s currently almost 14 billion years old). Quasars started to appear in large numbers when the Universe was about 2 billion years old, at a redshift of 3, when galaxies were very, very young. The graph also shows that the number of quasars gradually drops as we get closer to modern times (redshift 0), meaning that the black holes powering the older quasars seem to be ‘switching off’.
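Turning a redshift into an age for the Universe depends on a cosmological model. As a rough check on the numbers above, here is a minimal sketch assuming a flat Universe containing only matter and dark energy, with round-number parameters (Hubble constant 70 km/s/Mpc, matter density 0.3, dark-energy density 0.7). For that model the age at redshift z has a closed-form expression, so no numerical integration is needed; real analyses use measured parameters and include radiation, which shifts the results slightly.

```python
import math

H0 = 70.0        # Hubble constant in km/s/Mpc (assumed round value)
OMEGA_M = 0.3    # matter density parameter (assumed)
OMEGA_L = 0.7    # dark-energy density parameter (assumed; flat Universe, radiation ignored)

KM_PER_MPC = 3.0857e19
SECONDS_PER_GYR = 3.15576e16


def hubble_time_gyr(h0: float = H0) -> float:
    """1/H0 expressed in billions of years."""
    return KM_PER_MPC / h0 / SECONDS_PER_GYR


def age_at_redshift_gyr(z: float) -> float:
    """Age of a flat matter + dark-energy Universe when light seen at redshift z was emitted."""
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1 + z) ** -1.5
    return 2 / (3 * math.sqrt(OMEGA_L)) * hubble_time_gyr() * math.asinh(x)


def lookback_time_gyr(z: float) -> float:
    """How long ago that light set off: today's age minus the age at z."""
    return age_at_redshift_gyr(0) - age_at_redshift_gyr(z)


for z in (0, 1, 1.5, 2, 3):
    print(f"z = {z}: age {age_at_redshift_gyr(z):.1f} Gyr, "
          f"light emitted {lookback_time_gyr(z):.1f} Gyr ago")
```

With these assumed parameters the present-day age comes out at about 13.5 billion years, a redshift of 1.5 to 2 corresponds to an age of roughly 3.2 to 4.2 billion years, and z = 3 to about 2.1 billion years. Note the difference between the two columns: at z = 1 the Universe was under 6 billion years old, even though that light left its source almost 8 billion years ago.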

As we can see, at high redshift the survey finds many more quasars than ‘normal’ galaxies: it’s therefore not unreasonable to think that quasar-hosting galaxies become those normal galaxies once their black holes lose power. Indeed, other observations suggest that most spiral galaxies (like our Milky Way) have supermassive black holes at their cores, which could once have powered a quasar.

The surge in galaxies occurred at around a redshift of 1, when the Universe was about 6 billion years old (that is, roughly 8 billion years ago). But if quasars turn into galaxies, then how come there are so many more galaxies than quasars? Surely if there are, say, a million galaxies now, there should be roughly a million quasars at higher redshifts?

That’s a fair question, but one we can explain when we think about it. Remember that galaxies close to a redshift of 0 are ‘modern’, middle-aged galaxies: we see them pretty much as they are now, as they looked maybe a hundred million years ago. So if those nearby galaxies went through their quasar phase several billion years ago, we don’t expect to see any of them as quasars. On top of that, the quasar phase is thought to be relatively brief, so at any given moment only a small fraction of galaxies would have been shining as quasars.

I wrote this post to give you an idea of how a modern astronomer works with computers, using database technology to extract scientific data from massive datasets. I also hope I’ve shown how astronomers then interpret the results to learn about the Universe.

If you fancy playing around with the Sloan Digital Sky Survey yourself, SDSS has an excellent tutorial section that lets you experiment with the database, learn how it works, and find cool and interesting objects. You can check it out by visiting the SDSS SkyServer Basic Projects page.