2. Ingest

This workflow describes how you prepare the content so it is ready for preserving at the next stage. If there are any terms that you are unfamiliar with on this page, please refer to the glossary for the most common terms used in digital preservation.

Step 2.1 Understand what you have

This is an essential step

  • Use software, such as DROID, to identify what you have and create a list of the content. The list should include file names, file paths, sizes, file formats, last modified dates, etc.
  • Identifying the file formats accurately is particularly important.
  • Save the list in an open format (e.g. CSV or XML) and store in the ‘metadata’ folder you created in step 1.2.
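
As a rough illustration of the kind of list described above, the following Python sketch walks a folder and writes a CSV. It is not a substitute for DROID, Fido or Siegfried: it records only the file extension, which is a weak hint and not a format identification. The function names and column names are assumptions made for this example.

```python
import csv
import os
from datetime import datetime, timezone
from pathlib import Path

# Column names are illustrative only, not a DROID export layout.
FIELDS = ["file_name", "file_path", "size_bytes", "extension", "last_modified"]


def build_inventory(root):
    """Walk `root` and return one dict of basic metadata per file."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            rows.append({
                "file_name": name,
                "file_path": str(path),
                "size_bytes": stat.st_size,
                # The extension is only a hint; it does not identify the format.
                "extension": path.suffix.lower(),
                "last_modified": modified.isoformat(),
            })
    return rows


def write_inventory(rows, csv_path):
    """Save the inventory as a CSV file (an open format)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_inventory(build_inventory("content"), "metadata/file_list.csv") would list everything under a hypothetical "content" folder and save the result in the 'metadata' folder.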

Software

  • DROID (identifies file format and other information).
  • Fido (identifies file format only).
  • Siegfried (file format identification tool).
  • MediaInfo (useful for identifying audiovisual files).
  • Karen’s Directory Printer (useful for creating lists of files but does not identify file formats with the same degree of certainty as DROID or Fido).

Further guidance

Step 2.2 Validate content

  • Validation software checks whether the content conforms to its file format specification. In some cases it can also fix issues.
  • It is not always seen as an essential step but can help flag issues. For example, if the content does not conform to its specification, it may be more difficult to read or manage in the future.
  • It can also be useful for checking the quality of digitised content.

Validation software

  • JHOVE (validates certain file formats and also carries out identification).
  • Jpylyzer (validates JP2 images).
  • veraPDF (validates PDF/A).
  • MediaConch (validates audiovisual files).

Further guidance

Step 2.3 Analyse and investigate

  • You may wish to analyse the metadata you captured during steps 2.1-2.2 and flag any issues for investigation.
  • This includes looking out for corrupt files, compressed files, encrypted files and password-protected files. You will probably need to go back to the depositor to resolve these.
  • This analysis can also flag unidentified formats, which could require further research.
  • Some archives also convert file formats to a preferred file format for preservation (see step 3.5).
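
The checks above could be sketched in Python as follows. This is a minimal example and not how Freud works. It assumes a CSV list with hypothetical columns named "file_path", "size_bytes" and "format_id"; a real DROID export uses its own column names, so these would need adjusting. It flags only what a list can reveal (unidentified formats, zero-byte files and common compressed extensions); encrypted or password-protected files usually need to be opened to be detected.

```python
import csv
from pathlib import PurePath

# Extensions treated as compressed or archive files (an illustrative set).
COMPRESSED_EXTENSIONS = {".zip", ".gz", ".7z", ".rar", ".tar", ".bz2"}


def flag_issues(row):
    """Return a list of issues for one row of the file list."""
    issues = []
    # An empty format identifier means the tool could not identify the file.
    if not (row.get("format_id") or "").strip():
        issues.append("unidentified format")
    # A zero-byte file may be corrupt or an empty placeholder.
    if (row.get("size_bytes") or "").strip() == "0":
        issues.append("zero-byte file")
    suffix = PurePath(row.get("file_path") or "").suffix.lower()
    if suffix in COMPRESSED_EXTENSIONS:
        issues.append("compressed or archive file")
    return issues


def review_list(csv_path):
    """Read the CSV list and return (file_path, issues) for flagged rows."""
    flagged = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            issues = flag_issues(row)
            if issues:
                flagged.append((row.get("file_path", ""), issues))
    return flagged
```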

Software

  • Freud (used by The National Archives to analyse a DROID export and pick up common issues to mark for investigation).
  • HxD Hex Editor (displays the bytes of a file and helps with file format research).

Further guidance

Step 2.4 Describe

This is an essential step

  • As a minimum, create a high-level description of the content.
  • You may decide to do more detailed cataloguing in accordance with your organisation’s cataloguing standards (either now or at a later date).
  • You can add the descriptions to the list you created in step 2.1 or create them in a separate CSV or XML file.
  • If you use a collection management system, you may wish to record the descriptions there (e.g. the accession record or catalogue).

Software

  • Quick View Plus (allows you to view over 300 file formats; $99 per year)
  • VLC (for playing audio and video files)

Further guidance

Step 2.5 Appraise

  • You may have already carried out appraisal at step 1.2. At this stage, you may wish to carry out further appraisal.
  • As a minimum, you could consider identifying and removing duplicates by comparing the checksums of the content. There is software that can help you do this (see below).
  • However, you may decide to keep duplicates if they have useful contextual information (e.g. file name).
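
As a minimal sketch of comparing checksums, the Python below groups files by their SHA-256 checksum and reports any checksum shared by more than one file. The choice of SHA-256 and the function names are assumptions for this example; dedicated de-duplication software may work differently. The sketch only reports duplicates and deletes nothing, so you can still decide to keep copies that carry useful context.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each checksum shared by two or more files under `root` to their paths."""
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_checksum[sha256_of(path)].append(path)
    # Keep only checksums that occur more than once.
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```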

De-duplication software

Further guidance

Step 2.6 Apply access restrictions

This is an essential step

  • Some of the content may contain personal, sensitive or confidential information.
  • If the content is subject to the Freedom of Information Act, you will need to use the Act’s exemptions to inform any restrictions.
  • The depositor should help you identify such information during transfer at step 1.2. Cataloguing at step 2.4 can also help with this.
  • There is software that can help you identify personal information. Some of it is commercial and expensive, but a list of free software can be found below.
  • Record access restrictions and any risks somewhere (e.g. in the list you created in step 2.1 and/or in any collection management system).
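
To give a sense of what automated identification of personal information involves, the sketch below scans text for email addresses with a simple pattern. It is a deliberately crude example: it is not how Bulk Extractor or ePADD work, it will miss most kinds of personal data, and the pattern and function names are assumptions made for illustration. Any hits still need human review before a restriction is applied.

```python
import re

# A simple, permissive pattern for email addresses (illustrative only).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_addresses(text):
    """Return the distinct email addresses found in `text`, in order of appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def scan_text_file(path):
    """Scan a plain-text file, ignoring bytes that cannot be decoded as UTF-8."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return find_email_addresses(handle.read())
```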

Software

  • Bulk Extractor
  • BitCurator (digital forensics tools for digital preservation including Bulk Extractor)
  • ePADD (can help identify sensitive information in email archives)

Further guidance

For the next stage of the digital preservation workflow, head over to the Preserve page.