Research and collaboration

The digital preservation team is currently undertaking research in the following areas:

Safeguarding the nation’s digital memory

The nation’s digital heritage is rich, complex and fragile. It is under threat from rapidly evolving technology, outdated policies and a skills gap across the archives sector. To preserve this heritage for future generations, we must understand and navigate a vast and ever-shifting risk landscape. This is a challenge which no single archive is currently equipped to address.

We propose a collaborative approach to managing digital preservation risk, bringing established statistical risk management methods into the digital heritage sphere. Project participants will create a structured evidence base, pooling our collective experience to map and explain an interconnected network of risk events, actions and impact on heritage. This will build a holistic understanding of risk, enabling archivists to prioritise threats and choose the most effective actions to combat them.

This approach will help us capture, analyse and share our experience and reasoning to improve transparency and accountability. It can flex to accommodate the diverse contexts and priorities of our organisations, communities and collections. The project will improve access to robust methods and tools and our events will raise awareness of the approach, to help archives make better decisions for the future of our digital heritage.

We are working with five partners across the archives sector in England and with academic partners from the Applied Statistics & Risk Unit (AS&RU) at the University of Warwick. The Digital Preservation Coalition will provide independent evaluation and support for training events. See the project page for more detailed information and background.

This project is supported by a grant of £93,500 from the National Lottery Heritage Fund, and AS&RU’s postdoctoral research assistant is supported by an EPSRC Impact Acceleration Account award.

AI for Digital Selection

Digital transformation in government has brought an increase in the scale and variety of public records along with a reduced emphasis on organising and structuring data. Traditional processes designed for paper records cannot handle the volume, diversity, complexity and distributed nature of departmental digital records.

The project explored the potential of Artificial Intelligence (AI) tools to assist with this challenge. Five AI vendors applied their tools to classify a dataset supplied by The National Archives. The tools and platforms evaluated were Adlib Elevate, Amazon Web Services, Microsoft Azure, InSight by Iron Mountain, and Records365 by RecordPoint.

See the project page for more detailed information and background.

PRONOM – file format research

PRONOM enables digital archivists, records managers and anyone using the tool to find out what files they have and in which formats. The digital preservation department carries out file format research in order to make sure that PRONOM covers the most important formats for The National Archives and the digital preservation community. The first priority is being able to identify all file formats transferred to us by UK Government departments. The second driving force of our research comes from the digital preservation community. We actively encourage organisations or individuals dealing with digital records to help provide information on the formats they encounter.

Contributors include The British Library, National Library of New Zealand, The Church of Jesus Christ of Latter-day Saints and many more.

See the wide spread of regular contributors from around the world who have played a vital part in expanding the coverage of the signatures included in PRONOM.

Plain text file format identification

The aim of this project is to support DROID, The National Archives’ file format identification software. At present, DROID recognises binary file formats only, but we aim to find a way of identifying the type of a text file from its contents. An AI-based methodology was used to build the prototype, which successfully recognised five file types (.py, .java, .txt, .tsv and .csv) using prediction-based classification.
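To illustrate the general idea of predicting a text file's type from its contents, here is a minimal sketch in Python. It is not the project's prototype or model: the four hand-picked features, the nearest-centroid classifier, the tiny training samples and the names features, train, predict, TRAINING and MODEL are all assumptions made for this example.

```python
from collections import defaultdict
from math import dist


def features(text):
    """Turn file contents into four per-line fractions (illustrative features only)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return [0.0, 0.0, 0.0, 0.0]
    n = len(lines)
    return [
        sum("\t" in ln for ln in lines) / n,  # lines containing a tab
        sum("," in ln for ln in lines) / n,   # lines containing a comma
        sum(ln.rstrip().endswith((";", "{", "}")) for ln in lines) / n,  # brace or semicolon endings
        sum("(" in ln for ln in lines) / n,   # lines containing a call or definition
    ]


def train(samples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for label, text in samples:
        grouped[label].append(features(text))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest to the text's feature vector."""
    vec = features(text)
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


# Hypothetical training data: one tiny sample per file type.
TRAINING = [
    (".py", "import os\n\ndef main():\n    print(os.getcwd())\n\nmain()\n"),
    (".java", "import java.util.List;\n\npublic class Demo {\n    int x = 1;\n}\n"),
    (".txt", "Dear reader,\nthis is a plain note.\nNothing more to say.\n"),
    (".tsv", "id\tname\n1\tAda\n2\tAlan\n"),
    (".csv", "id,name\n1,Ada\n2,Alan\n"),
]
MODEL = train(TRAINING)

if __name__ == "__main__":
    print(predict(MODEL, "def greet(name):\n    print(name)\n\ngreet('x')\n"))
```

A classifier this small is easily misled, for example by Python code that contains many commas. A realistic model would need far more training data and richer features.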

Two blog posts – How to correctly identify the file type of a text file from its contents and Motivation to undertake file format identification research for plain text files – provide further insight.

Earlier research projects

For research projects carried out before 2020, see the UK Government Web Archive.