When Hull was named UK City of Culture 2017, the University of Hull used this as the catalyst to develop a full digital archive solution.
Description of your digital preservation architecture
Our architecture draws together several systems:
- Box is a proprietary cloud-based storage system to which the University already has institutional access. We use it as an initial storage space for digital records and related metadata. Sharing a folder of records starts the process of transfer and ingest to Archivematica.
- Hull Synchroniser is a specially written open source application which manages the integration between the other parts of the system.
- Archivematica is an open source application designed for the long-term preservation of digital content. It provides a suite of micro-services such as file identification, package extraction, transcription and normalisation for preservation and access.
- Hyrax is a repository front-end (i.e. the bit that you can see and interact with) based on the Samvera repository framework. It allows us to manage our digital objects with a fine degree of control. Hyrax and Samvera are open source. The university has long been a key part of the Samvera (previously known as Hydra) Community.
- We use Calm for cataloguing. Many UK-based archivists will be familiar with Calm – it is a proprietary system provided by Axiell.
- To provide online access to our catalogue we use Blacklight, an open source discovery platform. We export catalogues from Calm as EAD, which is then parsed by Blacklight. We take this approach because we work in partnership with the Hull City Archives and Local Studies service. The City Archives have their own instance of Calm and the Local Studies Library uses cataloguing software called SIRSI. These can all be exported to XML and delivered through our single instance of Blacklight, giving online visitors a single point of access to the History Centre’s collections. Spotlight is a Blacklight plug-in which we will be exploring to create online exhibitions.
- To provide access to digital content, we are embedding the Universal Viewer, another open source application. If a record on Blacklight has a digital object attached to it, the Universal Viewer appears on the page and allows you to look at it.
This is all hosted on Microsoft Azure cloud storage, and has also been tested for deployment on Amazon Web Services using Docker packages.
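As a rough illustration of the catalogue export step, the sketch below flattens a small EAD document into simple records of the kind a discovery layer could index. It is a minimal Python sketch and not the code used in the Hull system: the sample XML, the `source` label and the function names are hypothetical, and it assumes un-namespaced EAD with unnumbered `<c>` components (real exports may use a namespace or numbered `<c01>`, `<c02>` elements).

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD fragment; real catalogue exports are richer.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>EX</unitid>
      <unittitle>Example collection</unittitle>
      <unitdate>1950-1960</unitdate>
    </did>
    <dsc>
      <c level="item">
        <did>
          <unitid>EX/1</unitid>
          <unittitle>Example letter</unittitle>
        </did>
      </c>
    </dsc>
  </archdesc>
</ead>"""


def did_to_record(element, source):
    """Flatten the <did> child of an <archdesc> or <c> element into a dict."""
    did = element.find("did")

    def text(tag):
        node = did.find(tag) if did is not None else None
        return node.text.strip() if node is not None and node.text else None

    return {
        "source": source,  # which partner catalogue the record came from
        "level": element.get("level"),
        "reference": text("unitid"),
        "title": text("unittitle"),
        "date": text("unitdate"),
    }


def ead_to_records(xml_text, source):
    """Return one record for the collection plus one per nested component."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("archdesc")
    records = [did_to_record(archdesc, source)]
    # iter("c") walks every nested <c> component in document order.
    records.extend(did_to_record(c, source) for c in archdesc.iter("c"))
    return records


records = ead_to_records(SAMPLE_EAD, source="university-archives")
for record in records:
    print(record)
```

Tagging each record with its source is one way to keep exports from several cataloguing systems distinguishable once they sit in a single index.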
Decision-making process for the tools used in your framework
The University of Hull was part of the Jisc-sponsored Filling the Digital Preservation Gap collaborative project with the University of York, which ran from 2015 to 2016. The project focused on the preservation of research data, and part of it involved creating a prototype for long-term preservation which drew together Box, Archivematica and Samvera.
Knowing that the prototype worked, when the City of Culture came to Hull it seemed natural to use it as a starting point for developing a full digital archive solution. We already used Calm for cataloguing and Blacklight for online discovery, so we hoped that incorporating them into the system architecture would provide some welcome continuity with the way that we already processed analogue records.
We put out a tender to do the software development and CoSector were the successful bidders. They suggested the specific use of Hyrax alongside the other systems we had requested be part of our solution.
Rationale for a predominantly open-source approach
The University of Hull has a long history of contributing to the community surrounding Samvera, which followed on from prior involvement in other open source communities and adoption of their solutions (e.g., the use of Sakai as an institutional VLE for some years). It is perhaps easier to get organisational approval for the use of open source technologies within universities than in some other sectors. The ethos around open source technology makes sense within the academic environment because it is about questioning whether what is on offer is the best it can be, building in flexibility, and developing it within a wider community of interested parties.
That is not to say that open source is necessarily “better” than a commercial solution for everyone: the right choice is whichever best fits your organisation. If you choose an open source option you may have to accept that implementation will take longer, that technical support is more difficult to find, or that the resource you have to put in from within your organisation may be greater than if you had gone with a commercial solution.
Benefits and drawbacks of this approach
Benefits:
- Increased chance for customisation
- A voice in the development of systems
- Community support and interest
- “Free” – although there are associated costs for support and development that could bring costs in line with commercial offerings
Drawbacks:
- Unlikely to automatically have commercial helpdesk-style support long-term (though we do get this from CoSector currently)
- Depending on your organisation it may be more difficult to gain organisational support and understanding (though as said above, this is not something we have struggled with at the University of Hull)
- Unlikely anyone within your organisation will have experience with specific open source digital preservation tools. There will likely have to be a period of training if you aim to provide any internal support or development.
- May take longer to implement
Pressure points when designing or implementing the architecture
In the digital preservation world you often hear talk of the “three-legged stool”. This refers to the three aspects underpinning successful digital preservation: technology, resources and organisation. Unless you’re extremely fortunate, it would be unusual to enter into a project to implement a digital archive system (or in our case, set of systems) without reaching some pressure points relating to at least one of the stool legs.
- Technology: our plan to integrate different systems, some open source and some proprietary, was ambitious. Our developers had to be familiar with all of the systems to the extent that they could coordinate that integration. That’s a big ask! Documentation and helpdesk-style support for the systems (proprietary and open source) isn’t of a consistent standard, so some parts of the technical implementation have been easier and better supported than others. It’s important to note that a proprietary system will not necessarily have better documentation than an open source one.
- Resources: we have been through several chunks of funding. Currently we are funded by the Higher Education Innovation Fund. We also set out on the project expecting to be able to hire a dedicated in-house software developer. After two failed rounds of recruitment it was decided to put the work out to tender instead. This did slow down the project initially but ultimately ended up being the absolute best thing for the project as we have been able to rely on the expertise of CoSector and explore paths we mightn’t have working fully in-house.
- Organisation: as the project has progressed over two years, the university landscape has shifted around us. It has been really important to ensure that our digital archives work remains high on the agenda.
Manual processes you would like to automate but haven’t been able to yet
Whilst Archivematica has some appraisal functionality built in, we have decided to do appraisal outside of the digital archive structure detailed above. We may revisit this decision in the future, but for now we have found that Archivematica doesn’t quite meet our needs in this area. We can appraise for value using Windows Explorer, and use DROID to identify duplicate files. Appraisal will always require archivist intervention, as identifying value and relevance is a subjective, human process. What would be ideal in the future is a way to browse quickly through files to get an idea of their contents, and then to action and record appraisal decisions within Archivematica.
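Duplicate identification of this kind generally rests on comparing checksums: files with the same digest have, for practical purposes, identical contents, whatever they are called. The Python sketch below illustrates that idea only. It is not DROID and does not reproduce its output; the function names and the sample files are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never read into memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group every file under root by checksum; keep groups with 2+ files."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


# Demonstration on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "minutes.txt").write_text("Agenda and minutes")
    (root / "sub").mkdir()
    (root / "sub" / "copy.txt").write_text("Agenda and minutes")
    (root / "report.txt").write_text("Annual report")
    duplicates = find_duplicates(root)
    duplicate_names = [sorted(p.name for p in paths) for paths in duplicates.values()]

print(duplicate_names)
```

A report like this only shows which files are byte-for-byte identical; deciding which copy to keep remains an appraisal decision for the archivist.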
It is really important for us to ensure that personal data is protected. To that end we have been testing BitCurator to see how effective it is at identifying standard personally identifiable information, with a view to incorporating it into our digital archives workflow.
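To give a sense of what pattern-based identification of personal data involves, the following minimal Python sketch scans text for two kinds of identifier. It is an illustration under stated assumptions rather than a description of how BitCurator works: the two regular expressions (email addresses and strings shaped like UK National Insurance numbers), the function name and the sample text are all hypothetical, and real tools cover many more patterns and file formats.

```python
import re

# Illustrative patterns only; they will miss some cases and over-match others.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ni_number": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
}


def scan_text(text):
    """Return (line number, pattern label, matched string) for each hit."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((line_number, label, match.group()))
    return hits


# Hypothetical sample text; QQ is a prefix commonly used for dummy NI numbers.
SAMPLE = """Minutes of the meeting held in March.
Contact: jane.doe@example.org
NI number on file: QQ 12 34 56 C
"""

hits = scan_text(SAMPLE)
for hit in hits:
    print(hit)
```

Hits from a scan like this are candidates for review, not decisions: an archivist still has to judge whether each match is genuinely personal data and how the record should be handled.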
The University of Hull has published a white paper on its digital preservation work: “Digital Archiving Services: A University White Paper” (January 2020).