Research and collaboration

The digital preservation team is currently undertaking research in the following areas:

Safeguarding the nation’s digital memory

The nation’s digital heritage is rich, complex and fragile. It is under threat from rapidly evolving technology, outdated policies and a skills gap across the archives sector. To preserve this heritage for future generations, we must understand and navigate a vast and ever-shifting risk landscape. This is a challenge which no single archive is currently equipped to address.

We propose a collaborative approach to managing digital preservation risk, bringing established statistical risk management methods into the digital heritage sphere. Project participants will create a structured evidence base, pooling our collective experience to map and explain an interconnected network of risk events, actions and impact on heritage. This will build a holistic understanding of risk, enabling archivists to prioritise threats and choose the most effective actions to combat them.

This approach will help us capture, analyse and share our experience and reasoning to improve transparency and accountability. It can flex to accommodate the diverse contexts and priorities of our organisations, communities and collections. The project will improve access to robust methods and tools, and our events will raise awareness of the approach, helping archives make better decisions for the future of our digital heritage.

We are working with five partners across the archives sector in England and with academic partners from the Applied Statistics & Risk Unit (AS&RU) at the University of Warwick. The Digital Preservation Coalition will provide independent evaluation and support for training events. See the project page for more detailed information and background.

This project is supported by a grant of £93,500 from the National Lottery Heritage Fund, and AS&RU’s postdoctoral research assistant is supported by an EPSRC Impact Acceleration Account award.

AI for Digital Selection

The AI for Digital Selection project aims to learn more about existing AI tools that could be used to carry out the appraisal and selection of the ‘digital heap’ of documents, emails, datasets and other types of information held across government.

We are carrying out a review of the available tools and will identify between three and five for in-depth testing with a set of our own corporate records. We will find out how well the tools distinguish records that should be selected for permanent preservation from those that should not.

By the end of the project we will understand the outputs of the tools, how they work and how they have been trained, so we can assess their value. We will learn what metadata should be captured about the AI tools and processes themselves, to help the public understand and use public records selected via these methods. In addition, we will be able to help government departments use AI for selection, including identifying where these techniques can be incorporated into the process or workflow of selecting digital records for transfer to The National Archives.

PRONOM – file format research

PRONOM enables digital archivists, records managers and anyone using the tool to find out what files they have and in which formats. The digital preservation department carries out file format research in order to make sure that PRONOM covers the most important formats for The National Archives and the digital preservation community. The first priority is being able to identify all file formats transferred to us by UK Government departments. The second driving force of our research comes from the digital preservation community. We actively encourage organisations or individuals dealing with digital records to help provide information on the formats they encounter.

Contributors include The British Library, National Library of New Zealand, The Church of Jesus Christ of Latter-day Saints and many more.

See the wide spread of regular contributors from around the world who have played a vital part in expanding the coverage of the signatures included in PRONOM.
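As a rough illustration of what signature-based identification involves, the sketch below matches the opening bytes of a file against a small table of well-known byte sequences. The table layout and the identify function are hypothetical and are not PRONOM’s actual data model; real PRONOM signatures are richer than fixed prefixes at a single offset.

```python
from typing import Optional

# Illustrative signature table: (format name, byte offset, expected bytes).
# The byte sequences are widely documented; the table structure itself is an
# assumption made for this sketch and is not PRONOM's schema.
SIGNATURES = [
    ("PNG image", 0, b"\x89PNG\r\n\x1a\n"),
    ("PDF document", 0, b"%PDF-"),
    ("ZIP archive", 0, b"PK\x03\x04"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None
```

A file whose leading bytes match none of the entries is reported as unidentified, which is the kind of gap that contributed format information helps to close.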

Plain text file format identification

The aim of this project is to support DROID, The National Archives’ file format identification software. At present, DROID recognises binary file formats only, but we aim to find a way of identifying the type of text files from their contents. The prototype was built using an AI-based methodology and successfully recognised five file types (.py, .java, .txt, .tsv and .csv) using prediction-based classification.

Two blog posts provide further insight: How to correctly identify the file type of a text file from its contents, and Motivation to undertake file format identification research for plain text files.
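To give a feel for the problem, the sketch below guesses among the same five types from a file’s contents. It is not the project’s prototype: it swaps the AI-based classifier for a few hand-written rules, and the function name, regular expressions and 30% threshold are all assumptions chosen for illustration.

```python
import re

# Crude line-level cues; both patterns are assumptions made for this sketch.
JAVA_CUE = re.compile(
    r"\b(public|private|protected)\s+(static\s+)?(class|void|int|String)\b|;\s*$"
)
PY_CUE = re.compile(r"\s*(def \w+\(|import \w+|from \S+ import |class \w+.*:\s*$)")


def guess_text_type(text: str) -> str:
    """Guess one of 'tsv', 'csv', 'java', 'py' or 'txt' from file contents."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "txt"

    # Delimited data: several lines, all with the same non-zero delimiter count.
    if len(lines) > 1:
        for delim, label in (("\t", "tsv"), (",", "csv")):
            counts = {ln.count(delim) for ln in lines}
            if len(counts) == 1 and min(counts) > 0:
                return label

    # Source code: count the lines that carry a language cue.
    java_hits = sum(1 for ln in lines if JAVA_CUE.search(ln))
    py_hits = sum(1 for ln in lines if PY_CUE.match(ln))

    # Too few cues (hypothetical 30% threshold): treat the file as plain text.
    if max(java_hits, py_hits) < 0.3 * len(lines):
        return "txt"
    return "java" if java_hits >= py_hits else "py"
```

Rules like these break down quickly on real collections, for example a CSV file with quoted commas or a letter that happens to contain semicolons, which is part of the motivation for a trained, prediction-based classifier.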

Earlier research projects

For research projects carried out before 2020, see the UK Government web archive.