
Digital obsolescence is a threat to preserving the records of the 21st century. Information systems of the future will need to be preservation-aware by design to ensure long-term access to, and the integrity of, valuable economic, cultural and intellectual assets.

By the third millennium B.C., clay tablets were in use in Mesopotamia for writing that was meant to last, as opposed to writing on more perishable materials such as papyrus. And indeed such tablets lasted even beyond the expectations of their authors. Collections of tablets, impressed with a stylus in soft clay and then baked in the sun, have survived to this day in various languages, albeit in different states of conservation. They cover a multiplicity of topics, such as business records, poetry, prayers, hymns, history, divination and science.

Among the most ancient texts recorded on tablets are the famous Epic of Gilgamesh, whose oldest Babylonian fragments date back to the 18th century B.C., and Sumerian philosophical disputations such as the Debate between Bird and Fish (2100 B.C.), in which the fish tells the bird: “You are shameless: you fill the courtyard with your droppings”, while the bird replies: “But I am the beautiful and clever Bird! Fine artistry went into my adornment…”

These tablets are striking examples of tangible cultural heritage: physical artefacts inherited from past generations that convey artistic, cultural, religious, documentary or aesthetic meaning, often produced within a long-gone society. Tangible heritage is heritage that can be stored and physically touched. Accessing the meaning behind such items requires the expertise to recover, translate, compare, interpret and contextualize the text encoded in the clay tablets, or in any other ancient artefact such as the famous Rosetta Stone.

The collections of ancient texts that remain to be deciphered are huge. The task is so complex and time-consuming that crowdsourcing projects are now emerging to support experts in this daunting endeavour. For instance, through citizen science projects such as Ancient Lives, non-expert volunteers are invited to help catalogue and transcribe ancient papyri via the Web. You are welcome to give it a try, but be warned: it is not easy!

The tangible traces of ancient civilizations that have reached us indicate which assets were considered to be of the highest economic or symbolic value at the time, worth the effort of being inscribed on more demanding media. Their remarkable preservation, even if not originally intended, is a natural side effect of the consideration they received when they were created.

But what about preserving the intellectual assets created today? For the most part, these exist in digital form: documents, emails, images, videos, sound recordings, computer graphics, websites, sensor data, scientific measurements, medical or legal records. In this digital world, there is no direct equivalent of the tangible objects of the past, such as stone tablets or books. Physically stored data cannot by itself be considered a tangible item in the sense of traditional preservation, since digital content cannot be accessed, and to a large extent does not exist, without the mediation of a complex computer environment that includes, beyond physical storage proper, various combinations of hardware and software. Moreover, computer environments change rapidly and quickly become obsolete, so any digital content that depends on a specific environment at a given point in time is at risk of soon becoming inaccessible, and hence lost. This process is known as digital obsolescence.

Digital obsolescence is a greater threat to the preservation of digital content than the hazards facing traditional paper documents, such as acid, mould and looting, combined. It happens quickly, is pervasive and is hard to control. Obsolescence threatens every link in the rendering chain: bits in storage that degrade or for which readers are no longer available; data formats whose documentation is outdated or whose rendering software has disappeared; and software that runs only on dead or rare devices and retired operating systems.

Digital preservation is therefore not just about preserving well-identified tangible objects, as in the good old days, when the physical integrity of books, newspapers, manuscripts, pictures, etc. could be maintained with reasonable effort and care, and at a manageable pace. One could always store boxes of documents on long shelves and assess, organise and protect the acquired collections in due course. This is no longer possible with digital content. There is little hope of securing durable access without taking specific action before digital obsolescence sets in. Soon after digital content has been produced, one must make irrevocable decisions about whether it should be sent to the future, and in which form. Otherwise it will be lost forever.

As digital content grows exponentially and digital obsolescence accelerates, preservation will become a major concern for organisations that hold large amounts of data and need to preserve the critical knowledge it contains. It is anticipated that digital preservation will at some point be seamlessly integrated into the information lifecycle: information systems of the future will be preservation-aware by design. To make this happen, the digital objects of the future will no longer be treated simply as bit streams associated with the hardware and software available at their time of creation. Each will become part of a rich information ecosystem that describes everything essential about the object: its purpose, its intended behaviour, the context within which it was created, the user experience and more. To make such descriptions sustainable, they will be infrastructure-independent, with a strong focus on capturing the objects' evolution over time and their authenticity.

Eventually these descriptions will travel into the future, where yet-unknown information systems will need to make sense of them, irrespective of the hardware and software in place when the content was originally created and used. Future generations will then reconstruct not the original digital objects but new ones, which convey the essential properties of the originals albeit rendered in a significantly different mode.

This is a major shift in preservation: one no longer preserves tangible physical objects per se, but views or abstract representations of those objects that can be reconstructed in an unpredictable technological future. This shift is a major challenge for the long-term preservation of modern cultural, intellectual and economic assets, and its consequences are not yet widely recognized. Eventually, digital preservation will become a natural and transparent side effect of the proper governance of information and data.

Digital Preservation at Xerox Research Centre Europe
The Xerox Research Centre Europe has been active in the field of digital preservation for over a decade. In the European projects VIKEF, SHAMAN and, more recently, PERICLES, its research has addressed various aspects of data and process modelling: metadata creation for collections of digitized books and documents, XML-based document processes, co-design of printed circuit boards, and ecosystems of linked IT resources. Beyond digital preservation proper, this research addresses broad challenges relevant to Xerox's services business, such as the paper-to-digital transition, process modelling and data governance.

About the author:
Jean-Pierre Chanod is senior scientist and area manager for the Enterprise Architecture group at the Xerox Research Centre Europe. His main research interests include natural language understanding, document processing, data and process modelling.