Everyone, from individual citizens to heads of government, relies on the integrity of digital information. It is essential that this information is preserved for future generations, just as traditional records have been preserved for us on paper and parchment.
Our approach to digital preservation
Digital preservation encompasses the long-term storage of, and continued access to, digital records. The digital records preserved by The National Archives fall into three broad categories:
- ‘Born-digital’ – records produced in digital form such as spreadsheets, emails, websites, databases and video
- ‘Digitised records’ – scanned copies of paper or parchment records that are accessioned in place of the paper record
- ‘Digital surrogates’ – the paper or parchment original is retained as the record, and a digital image is used as an access copy
The records are ingested into our digital preservation system, the Digital Records Infrastructure (DRI). Records currently held in DRI include First World War diaries, Second World War Home Guard service records, UK Supreme Court video recordings and records from the Al Sweady and Leveson Inquiries.
The National Archives takes a parsimonious approach to digital preservation, focusing on preserving the original files. We do this by ensuring the files are virus-free, by carrying out fixity checks to confirm their integrity over time, and by “knowing what we’ve got” (identifying all the file formats in our collections). We identify file formats using the tools described in the section below.
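A fixity check amounts to recording a checksum for each file when it is received and recomputing it later to confirm that the bytes have not changed. The sketch below is a minimal illustration under stated assumptions: it uses SHA-256, and the function names `sha256_of`, `record_fixity` and `check_fixity` are hypothetical. It does not describe how DRI itself stores or schedules its checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """Capture a baseline checksum for each file (hypothetically, at ingest)."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def check_fixity(baseline):
    """Re-hash each file; return the paths that are missing or whose checksum has changed."""
    failures = []
    for path_str, expected in baseline.items():
        path = Path(path_str)
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(path_str)
    return failures
```

Changing even a single byte of a file produces a different digest, so any alteration or corruption after the baseline was recorded shows up in the list returned by `check_fixity`.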
We are increasingly turning our attention to the scale and speed of transfer, and to the delivery of digital records to end users, which we believe are the main current challenges of digital preservation at The National Archives.
Our approach to digital preservation is summarised in the paper below:
Parsimonious preservation paper (PDF, 0.04Mb)
Digital preservation tools developed by The National Archives
PRONOM is an online database containing details of more than 1,300 different digital file formats. Together with DROID, our file format identification tool (which uses the signatures held in the database), PRONOM enables digital archivists, records managers and anyone else using the tool to find out what files they have and in which formats. DROID also reports other useful information, such as the last-modified date and the size of each file scanned.
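DROID’s own report format is not reproduced here. The sketch below only illustrates the kind of per-file information mentioned above (relative path, size and last-modified date) using Python’s standard library; the function name `inventory` and its field names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def inventory(root):
    """List every file under root with its relative path, size and last-modified time (UTC)."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            info = path.stat()
            rows.append({
                "path": str(path.relative_to(root)),
                "size_bytes": info.st_size,
                # Modification time as an ISO 8601 timestamp in UTC.
                "last_modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return rows
```

Calling `inventory("/path/to/collection")` returns one dictionary per file, which could then be written out as a CSV report.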
The DROID tool can scan a computer, a hard drive or a collection of digital files and identify each file’s format either from its file extension (for example, .doc for Word files) or by matching the file’s internal signature against entries in the PRONOM database. Internal signatures are based on recurring byte patterns in the headers and footers of a file format. They are therefore a far more reliable means of identification than extensions, which can easily be changed or deleted, or shared by different formats or versions.
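The difference between the two identification methods can be sketched as follows. This is a minimal illustration, not DROID’s algorithm: the signature table is a hypothetical, heavily simplified stand-in for PRONOM (real signatures also use offsets, wildcards and alternatives), and the names `SIGNATURES`, `EXTENSIONS` and `identify` are made up for this example. Internal signatures are tried first, and the extension is used only as a fallback.

```python
import os
from pathlib import Path

# Hypothetical, simplified signature table: (format name, header bytes, optional footer bytes).
# Real PRONOM signatures also support offsets, wildcards and alternatives.
SIGNATURES = [
    ("Portable Network Graphics", b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
    ("Portable Document Format", b"%PDF-", b"%%EOF"),
    ("Graphics Interchange Format 89a", b"GIF89a", None),
    ("ZIP archive", b"PK\x03\x04", None),
]

# Fallback mapping, used only when no internal signature matches.
EXTENSIONS = {
    ".png": "Portable Network Graphics",
    ".pdf": "Portable Document Format",
    ".gif": "Graphics Interchange Format 89a",
    ".zip": "ZIP archive",
}

HEAD_WINDOW = 64    # bytes read from the start of the file
TAIL_WINDOW = 1024  # bytes read from the end, searched for footer patterns


def read_head_and_tail(path):
    """Read only the first and last few bytes, so large files are not loaded whole."""
    with open(path, "rb") as handle:
        head = handle.read(HEAD_WINDOW)
        size = handle.seek(0, os.SEEK_END)
        handle.seek(max(0, size - TAIL_WINDOW))
        tail = handle.read()
    return head, tail


def identify(path):
    """Return (format name or None, method used)."""
    head, tail = read_head_and_tail(path)
    for name, header, footer in SIGNATURES:
        if head.startswith(header) and (footer is None or footer in tail):
            return name, "signature"
    by_extension = EXTENSIONS.get(Path(path).suffix.lower())
    if by_extension is not None:
        return by_extension, "extension"
    return None, "unidentified"
```

A file named `report.doc` whose bytes begin with `%PDF-` and end with `%%EOF` is reported as a PDF by signature, despite its extension, which illustrates why extensions alone are unreliable.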
Currently around 70 percent of entries in the PRONOM database have internal signatures. The more signatures the database contains, the more accurately DROID can identify files and the more useful it is to the digital preservation community. For this reason we endeavour to keep PRONOM up to date, drawing on our own incoming digital collections and on file format research from the wider digital preservation community.
The CSV validator is a CSV validation and reporting tool. The National Archives receives metadata files alongside new digital records; currently these are supplied in CSV format. The CSV validator takes two inputs, a CSV Schema file and a CSV file. It first verifies that the CSV Schema itself is syntactically correct and then checks that each rule in the schema is upheld in the CSV file.
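The two-phase idea (check the schema, then check the data against it) can be sketched in a few lines. This is a hypothetical illustration only: the schema here is a Python list rather than the CSV Schema language the real validator reads, it supports just two made-up rule kinds (`notEmpty` and a regular expression), and the column names are invented for the example.

```python
import csv
import io
import re

# Hypothetical mini-schema: ordered (column name, rule) pairs.
# Supported rules: "notEmpty" or ("regex", pattern). Not the real CSV Schema language.
SCHEMA = [
    ("identifier", ("regex", r"[A-Z]+/\d+")),
    ("file_name", "notEmpty"),
    ("checksum", ("regex", r"[0-9a-f]{64}")),
]


def check_schema(schema):
    """Phase 1: make sure the schema itself is well formed; return a list of problems."""
    problems = []
    names = [name for name, _ in schema]
    if len(names) != len(set(names)):
        problems.append("duplicate column names in schema")
    for name, rule in schema:
        if rule == "notEmpty":
            continue
        if isinstance(rule, tuple) and len(rule) == 2 and rule[0] == "regex":
            try:
                re.compile(rule[1])
            except re.error as exc:
                problems.append(f"{name}: bad regex ({exc})")
        else:
            problems.append(f"{name}: unknown rule {rule!r}")
    return problems


def validate(schema, csv_text):
    """Phase 2: check every rule against every row; return human-readable errors."""
    problems = check_schema(schema)
    if problems:
        return ["schema: " + p for p in problems]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    expected = [name for name, _ in schema]
    if header != expected:
        return [f"header: expected {expected}, got {header}"]
    errors = []
    for row in reader:
        line_no = reader.line_num  # physical line on which this row ends
        if len(row) != len(schema):
            errors.append(f"line {line_no}: expected {len(schema)} fields, got {len(row)}")
            continue
        for (name, rule), value in zip(schema, row):
            if rule == "notEmpty" and value == "":
                errors.append(f"line {line_no}, {name}: must not be empty")
            elif isinstance(rule, tuple) and re.fullmatch(rule[1], value) is None:
                errors.append(f"line {line_no}, {name}: {value!r} does not match {rule[1]}")
    return errors
```

Validating the schema first means that a malformed rule is reported once, as a schema error, rather than surfacing as a confusing failure on every row of the CSV file.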