
The PI's Perspective


The Guest Perspective: Data for the Next Generations

November 7, 2007

By Joe Peterson

New Horizons is about to enter hibernation for its long trip to Pluto. It will be deep in slumber, but not forgotten, and we’ve taken a crucial step to ensure that its precious data will never be forgotten either. All planetary missions undergo a process called "data archiving," which protects the information against the ravages of time.

These archives have proven their value -- for example, scientists are still using data archives from the Voyager missions of the 1970s. The concept of archiving is simple, but to do it right, there is much to be considered.


The New Horizons Science Operations Center at Southwest Research Institute, Boulder, Colorado.

Preserving data for the future is a challenge for everyone. Many of us have old floppy disks containing documents we'd like to be able to use at some point, but what will happen when we try to load those documents in 2010, especially if some of the files were written with programs from 1995? Most of today’s computers don't have a floppy drive, and even if they did, we cannot be sure that the old disks have not degraded or that modern software can still read the files they contain.

The people in charge of NASA’s Planetary Data System (PDS), where archives of data from missions like New Horizons are kept, worry about exactly this kind of thing every day. The periods over which scientists will want to study spacecraft data are very long. Therefore, the PDS wants to guarantee that data gathered today will last at least 50 or 100 years. No one knows what computers will be like that far into the future, but at PDS, it is routine to think ahead -- way ahead -- and they strive to make sure nothing will prevent long-term use of the planetary data entrusted to them.
 
New Horizons gathered some absolutely breathtaking images and other data during its Jupiter encounter earlier this year, but we cannot know exactly how this information will be valuable to a scientist 100 years from now. Archiving it properly is a real challenge, one that the New Horizons Science Operations Center has recently undertaken, resulting in our first PDS archive, a kind of "time capsule" that is designed to last into the distant future. All told, our archives include about 13,000 data files – including images and other data – adding up to about 54 gigabytes.

Useful Records

Creating a truly useful archive depends on several things. First of all, the physical media have to remain readable over many years. It would be a tragedy to take the archive off the shelf in 2050 and find that the data has all "flaked" off of the disks. Careful choices need to be made. Floppy disks, for many reasons, would be a poor choice: their capacity is low and they are too fragile, easily destroyed by magnets, fingerprints, dust, temperature extremes, and the like. The current "standard" media are CD-ROMs and DVD data discs. If you really want to be safe, CDs are a good bet, since they have stood the test of time very well so far. But when there is a lot of data, DVDs are a valid choice. The PDS can say "no" to any medium it thinks might be risky, as the consequences of lost data are dire.

 

The New Horizons data collection includes hundreds of images from the spacecraft's flight through the Jupiter system, including this Long Range Reconnaissance Imager (LORRI) photo of the moon Io peeking out from behind the giant planet. More photos are available in the Science Operations Center's LORRI gallery.


Next, the format of the data is important. In general, transparent, non-proprietary formats are best. For example, PDS does not consider the file formats of word processing programs (such as Microsoft Word) to be "safe." The specifications of these formats are not openly published, so if the company that sells and supports the software were to cease to exist in 100 years, or even if the version of the program used to write the file became archaic, it would be difficult to open such files. So if you include a document in Word, you had better also include it in plain text; it is assumed that the scientists of the future will at least be able to make sense of regular ASCII bytes. As for visual elements such as figures and illustrations, the PDS allows these to be included as individual images. The PDS even prohibits using conventions like very long filenames or those with mixed upper and lower-case letters, even though today's computers can handle these. Part of ensuring compatibility is adhering to standards that have existed for quite some time. As you can see, creating such a "future proof" archive is not easy, and it takes a lot of work.
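For readers who like to see such rules in concrete form, here is a minimal Python sketch of the kind of check described above: flagging filenames that are very long or mix upper and lower case, and testing whether a document consists only of plain ASCII bytes. The function names and the length limits are illustrative assumptions, not the PDS's actual rules or tools.

```python
import re

# Hypothetical limits chosen only for illustration; NOT the actual PDS rules.
MAX_BASE_LEN = 27
MAX_EXT_LEN = 3

# A name passes if it is all lower case or all upper case (letters, digits,
# underscores) with exactly one dot separating the base from the extension.
SINGLE_CASE_NAME = re.compile(r"^[a-z0-9_]+\.[a-z0-9]+$|^[A-Z0-9_]+\.[A-Z0-9]+$")


def filename_problems(name):
    """Return a list of reasons a filename might be risky for an archive."""
    problems = []
    if not SINGLE_CASE_NAME.match(name):
        problems.append("mixed case or unusual characters")
    base, _, ext = name.rpartition(".")
    if len(base) > MAX_BASE_LEN or len(ext) > MAX_EXT_LEN:
        problems.append("name too long")
    return problems


def is_plain_ascii(data):
    """Return True if every byte in `data` is a 7-bit ASCII value."""
    return all(byte < 0x80 for byte in data)


if __name__ == "__main__":
    for name in ("lor_0034.fit", "Jupiter_Image.FIT"):
        print(name, filename_problems(name))
    print(is_plain_ascii(b"Io and Jupiter"))
```

A real archive validator would check far more than this (labels, documentation, data formats), but the sketch shows why such rules are easy to test mechanically once they are written down.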
 
Finally, an archive must have good documentation. We have to assume that future scientists will have no prior knowledge about the archive's contents, so to understand what a spacecraft sent back to Earth, it is vital to know how the spacecraft and its instruments worked, how the data was calibrated, and even what the data means. It is more than likely that the people who worked with the data when it was gathered (and therefore knew it very well) will be long gone, so the archive must stand on its own. This is one reason why the PDS insists that the archives be reviewed carefully by people who are independent of the mission. We at the New Horizons Science Operations Center have just gone through this review process, and I can tell you that it is absolutely rigorous, ensuring the data is usable by the widest audience possible.
 
Checking on Changes

Thanks to our PDS archive – coming soon to the PDS Web site – the New Horizons data will be available far into the future. Over the years, humans will surely want to look at what New Horizons has seen, prompted by either an expanded understanding of how our solar system works or simply the desire to see how things have changed in our little region of the universe.

Joe Peterson is manager of the New Horizons Science Operations Center at Southwest Research Institute, Boulder, Colo.