The Modern Records Centre at the University of Warwick has adopted an iterative approach to its digital preservation architecture, with a focus on trialling new tools and workflows on a small budget.
The Modern Records Centre at the University of Warwick has been engaged in digital preservation activities since 2013, and we have made steady if slow progress since then. Some workflows, especially around digitisation, were established, but we made little headway with born-digital materials on physical media (e.g. USB drives, portable hard drives and obsolescent media such as floppy disks).
We started by identifying what we might need to process these materials. We don’t have a specific budget for digital preservation software and equipment, but we can make small purchases, so modest commercial software solutions are a possibility. However, when starting out we wanted to identify software that bridged the gaps in the workflow, so we had to experiment to build up our workflow first. One tool we were already using was DROID, the open source file format identification tool developed and supported by The National Archives. It is ‘designed to meet the fundamental requirement of any digital repository to be able to identify the precise format of all stored digital objects, and to link that identification to a central registry of technical information about that format and its dependencies’. Happily, DROID also performs a number of other functions, such as creating a file manifest (a list of the files) and generating checksums, so all in all it’s a great piece of software to start with.
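DROID produces its manifest and checksums itself, through its own interface. The hypothetical sketch below shows what such a manifest contains. It is not DROID, and the function names and CSV columns (path, size, sha256) are our own. It walks a folder and records each file's relative path, size and SHA-256 checksum:

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list[dict]:
    """Walk `root` and return one row per file: relative path, size, checksum.

    Hypothetical helper for illustration only; it does not call DROID.
    """
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "size": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


def write_manifest(rows: list[dict], out) -> None:
    """Write manifest rows as CSV to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=["path", "size", "sha256"])
    writer.writeheader()
    writer.writerows(rows)
```

You could re-run the same function later and compare the two outputs. That is one simple way to spot files that have changed or gone missing.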
Decision-making process for the tools used in your framework
We began by auditing our collections to see what media we had (as far as was possible) and identified a small number of 3.5 inch disks containing material likely to be of significant research interest. We prioritised these because the hardware needed to access them is becoming increasingly fragile (we have no means of accessing the even smaller number of 5.25 inch disks, for example). We therefore began by imaging and extracting the content from approximately 20 floppy disks from the collection of Eric Hobsbawm, the noted Marxist historian.
When looking for tools we draw on sources such as COPTR (the Community Owned digital Preservation Tool Registry), the BloggERS blog and the countless other blogs and articles we come across. A large part of digital preservation work is continuing professional development: keeping up to date with the systems, tools and approaches that are out there.
The tool we started with was BitCurator, a suite of open source software tools for digital forensics work developed jointly by the University of North Carolina at Chapel Hill and the Maryland Institute for Technology in the Humanities. It was very attractive because it has excellent documentation and training materials, and because it was created by, with and for archivists and designed to integrate with library and archive systems. The tools support triaging materials in the pre-imaging stage, disk imaging, analysis and reporting, sensitivity review and the export of technical metadata. BitCurator runs on Linux, an operating system which is itself open source (as opposed to the proprietary Microsoft Windows or Apple macOS). We decided to install it in a virtual machine on a Windows computer. We are also considering getting a workstation running Linux. This would improve BitCurator's performance and let us experiment with other open source tools we know are out there.
BitCurator is a powerful suite of tools, but for the 1990s digital material we were working with it felt a bit like overkill. Some of its reporting features, for example sensitivity review, were not suited or relevant to that kind of material. We looked around for alternatives and decided to try FTK Imager, a proprietary but free tool for forensically imaging disks. Its outputs use open standards and it is easy to get started with. For this scenario we found FTK Imager the most efficient tool, but BitCurator clearly offers a great deal. I am looking forward to trying it on more recent digital deposits, which involve much greater quantities of records and where sensitivity review is more relevant.
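A raw (dd-style) image is a byte-for-byte copy of the disk, so its size alone can flag a truncated read. A standard PC-formatted 3.5 inch high-density disk has 80 tracks on each of 2 sides, with 18 sectors of 512 bytes per track. That gives 80 × 2 × 18 × 512 = 1,474,560 bytes. The sketch below is a hypothetical check of our own, not a feature of FTK Imager. It hashes an image with MD5 and SHA-1, two algorithms commonly recorded by imaging tools, and guesses the format from the image's size. It assumes an uncompressed raw image of a PC (FAT) formatted disk. Other disk formats, and container formats such as E01, are not covered.

```python
import hashlib
from pathlib import Path

# Sizes in bytes of standard PC (FAT) floppy formats, computed as
# tracks x sides x sectors-per-track x 512-byte sectors.
# Assumption: only these common PC formats are recognised.
PC_FLOPPY_SIZES = {
    40 * 2 * 9 * 512: '5.25" 360 KB (DD)',
    80 * 2 * 9 * 512: '3.5" 720 KB (DD)',
    80 * 2 * 15 * 512: '5.25" 1.2 MB (HD)',
    80 * 2 * 18 * 512: '3.5" 1.44 MB (HD)',
    80 * 2 * 36 * 512: '3.5" 2.88 MB (ED)',
}


def check_raw_image(path: Path) -> dict:
    """Hash a raw disk image and guess its floppy format from its size.

    Hypothetical helper for illustration; not part of any imaging tool.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    size = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            md5.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    return {
        "size": size,
        "format_guess": PC_FLOPPY_SIZES.get(size, "unrecognised size"),
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }
```

An "unrecognised size" result does not prove that an image is bad. It is just a prompt to look more closely before extracting files.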
Once the contents of the disk have been extracted using the chosen software, the files need to be appraised. BitCurator offers some support with this because it includes LibreOffice, an open source office suite, which was really useful for us when working with Hobsbawm’s papers. As already mentioned, DROID is often used purely for file format identification, but it also performs a number of other functions. Used together with another open source tool developed by The National Archives, the CSV Validator, it can identify duplicate files, an important part of the appraisal process.
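The CSV Validator rules used for this are not described here. As a simpler, swapped-in alternative, the sketch below uses plain Python grouping rather than the CSV Validator. It finds duplicates by grouping the rows of a DROID CSV export on their checksum. The column names HASH and FILE_PATH are assumptions about the export's header and may differ between DROID versions, so they are parameters:

```python
import csv
from collections import defaultdict
from typing import TextIO


def find_duplicates(export: TextIO, hash_col: str = "HASH",
                    path_col: str = "FILE_PATH") -> dict[str, list[str]]:
    """Group file paths by checksum; return only checksums shared by 2+ files.

    hash_col and path_col are assumed DROID export column names; check them
    against the header of your own export.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(export):
        checksum = (row.get(hash_col) or "").strip()
        if checksum:  # folders and unhashed rows have no checksum; skip them
            groups[checksum].append(row[path_col])
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Identical checksums mean identical content, so the archivist can then decide which copy (if any) to keep.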
Once the collection is ready for transfer to the preservation system, we use another open source tool called Bagger, which was developed by the Library of Congress to support the BagIt specification. BagIt packages files together with a manifest of their checksums, so you can easily check whether any files have been corrupted or deleted during transfer.
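Under the BagIt specification (RFC 8493), a bag holds the payload in a data/ directory, alongside a bagit.txt declaration and a manifest giving a checksum for every payload file. The sketch below illustrates that idea in Python. It is not Bagger or the Library of Congress code, and the function names are hypothetical. It writes only a SHA-256 manifest and skips optional parts of the specification such as bag-info.txt and tag manifests:

```python
import hashlib
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source: Path, bag: Path) -> None:
    """Copy `source` into bag/data and write bagit.txt and manifest-sha256.txt.

    Simplification: does not percent-encode filenames containing line breaks
    or '%', which the specification requires.
    """
    shutil.copytree(source, bag / "data")  # also creates `bag` itself
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    lines = []
    for f in sorted(p for p in (bag / "data").rglob("*") if p.is_file()):
        lines.append(f"{_sha256(f)}  {f.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def verify_bag(bag: Path) -> list[str]:
    """Return a list of problems; an empty list means the payload is intact."""
    problems = []
    listed = {}
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel = line.split(maxsplit=1)  # paths may contain spaces
        listed[rel] = checksum
    for rel, checksum in listed.items():
        f = bag / rel
        if not f.is_file():
            problems.append(f"missing: {rel}")
        elif _sha256(f) != checksum:
            problems.append(f"checksum mismatch: {rel}")
    present = {p.relative_to(bag).as_posix()
               for p in (bag / "data").rglob("*") if p.is_file()}
    for rel in sorted(present - set(listed)):
        problems.append(f"not in manifest: {rel}")
    return problems
```

Run verify_bag on the receiving side. If it returns an empty list, nothing was corrupted or lost in transit.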
Rationale for a predominantly open-source approach
We support the use of open source solutions where possible because the software and its outputs are openly documented. In the future it will then be possible to recreate and fully understand the actions we are taking on our data now. For lots of reasons it is not always possible to use open source solutions. Where we can't, we still aim to document everything as fully as possible, and we look for solutions that let the data be extracted in a way which, as far as possible, supports re-use and interoperability. Digital preservation is not an activity which takes place in isolation: it is a community effort, and open source solutions fit well into this environment.
Manual processes you would like to automate but haven’t been able to yet
Our processes and workflows are very manual and based on what is available to us on a small budget. This also reflects the fact that we don’t have a systems developer to work with or dedicated IT support, so we cannot easily integrate with other systems. It also means that even simple software installation can take a long time, as it is rarely a department-wide priority. Finding more dedicated IT support is something we are going to investigate. Overall, we intend to take an iterative approach to developing our architecture: looking out for new tools, trying new things, and integrating and improving them over time.
Contact the archive
Rachel MacGregor (Digital Preservation Officer): Rachel.MacGregor@warwick.ac.uk