Quite a lot, according to experts. For one, what we believe to be permanent is not. Digital storage systems can become unreadable after just three to five years. Librarians and archivists race to copy things into newer formats. But entropy is always there, waiting in the wings. “Our professions and our people often try to extend normal lifespans as much as possible through a variety of techniques, but it’s still holding back the tide,” says Joseph Janes, an associate professor at the University of Washington Information School.
To complicate matters further, archivists now have to deal with an unprecedented flood of information. In the past, materials were scarce and storage space was limited. “Now we have the opposite problem,” says Janes. “Everything is constantly recorded.”
In principle, this could correct a historical injustice. For centuries, countless people did not have the right culture, gender, or socioeconomic class for their knowledge or work to be discovered, valued, or preserved. But the sheer scale of the digital world now presents us with a unique challenge. According to an estimate last year by market research firm IDC, the amount of data that companies, governments, and individuals will generate in the next few years will be twice the total of all digital data generated since the dawn of the computer age.
Entire schools within some universities are working to find better approaches to storing the data under their roof. The Data and Service Center for Humanities at the University of Basel, for example, has developed a software platform called Knora not only to archive the diverse data from humanities work, but also to ensure that people can read and use them in the future. And yet the process is lengthy.
“We can’t salvage everything … but that’s no reason not to do what we can.”
“You make educated guesses and hope for the best, but there are records that get lost because nobody knew they would be useful,” says Andrea Ogier, associate dean and director of data services at Virginia Tech’s campus libraries.
There are never enough people or money to do all the work necessary, and formats are constantly changing and multiplying. “How do we best allocate resources to preserve things? Because budgets are only so big,” says Janes. “In some cases, that means things are saved or stored but just sit there, uncataloged and unprocessed, and therefore all but impossible to find or access.” In some cases, archivists end up rejecting new collections.
The formats used to store data are themselves volatile. NASA stored away about 170 tapes of lunar dust data collected during the Apollo era. When researchers set out to use the tapes in the mid-2000s, they could not find anyone with the 1960s IBM 729 Mark V machine needed to read them. With help, the team eventually tracked one down, in rough condition, in storage at the Australian Computer Museum. Volunteers helped overhaul the machine.
Software also has a shelf life. Ogier recalls trying to examine an old Quattro Pro spreadsheet file only to find that there was no readily available software that could read it.