Jack Ord – Cataloguing Officer (Military Records Project), November 2024
In this blog, Jack Ord explains the processes and challenges of cataloguing The National Archives’ biggest ever acquisition of military service personnel records, talks about some of the material found in the files, and highlights the latest project milestones.
Jack is the Cataloguing Officer (Military Records Project) for WO 423 in the Cataloguing, Taxonomy and Data department. He supports the process of adding records to Discovery through data quality control, data conversion and solving data problems.
For my contribution to Catalogue Week 2024, I am delighted to give you an overview of cataloguing the largest series of military personnel records The National Archives has taken on to date.
In February 2023, we commenced the mammoth task of processing just over 1.5 million records of the men and women who served in the British Army between 1939 and 1963. For most records in the series, military service ends in the 1940s, as these are primarily Second World War records, but the series also includes National Service records. We are capturing the names, service numbers and dates of birth of all the individuals. This will improve access to the records for historians, academics and anyone researching their family’s military history.
WO 423 is the series created for soldier ranks, which consist of:
- Warrant Officer Class 1
- Warrant Officer Class 2
- Staff Sergeant
- Sergeant
- Corporal
- Lance Corporal
- Private
Background
The work being undertaken to catalogue WO 423 is just one part of an even larger project. In 2020–21, The National Archives agreed to the transfer of around 10 million Army, Royal Navy and Royal Air Force personnel records from the Ministry of Defence, when they had reached the end of their retention period. The magnitude of the endeavour posed a variety of challenges, not least the amount of storage space needed to house such a huge acquisition. An entirely new repository was fitted out with state-of-the-art roller racking, almost exclusively for WO 423.
It has also brought about a substantial increase in Freedom of Information requests. The 115-year closure from date of birth rule means that demand for access to closed records has soared over recent years. To begin with, a very small team within The National Archives’ Freedom of Information department handled these requests. However, the workload became unrealistic for just a few members of staff and so a specialised MoD Access Service team was created.
A separate digitisation project is also taking place alongside the cataloguing and access service provision. A vast section of The National Archives, previously occupied by the Record Copying Department, was repurposed and the work is being carried out by Ancestry. They have a contract to scan, transcribe and publish the first three million records in all series.
What do the records contain?
These files typically consist of each serviceman or servicewoman’s registration and attestation papers, their record of service and discharge forms. They can also contain medical documentation, pay books and a wide variety of correspondence. Together, these tell us their basic details, when and where they enlisted, where they were posted and the start and end dates of their military service. The medical papers often include dental treatment cards and occasionally x-rays.
Every so often, we find files containing little photos of the individuals and in these cases, it is nice to put faces to the names. On much rarer occasions, we come across records containing even more unusual and colourful things. It was a pleasant surprise when I opened one file and found a ‘Ranks and Badges of Rank – Belgian Army’ poster on the inner cover.
The cataloguing process
Considering the volume of records in WO 423, it is a blessing that we do not have to index all of them from scratch. In fact, more than 99% of the metadata comes to us in Microsoft Excel master spreadsheets compiled by the MoD. The master spreadsheets are generally 30,000 to 100,000 rows long and we break them down into batches of a thousand, making the data easier to work on. At the first stage, we check each batch of a thousand records for the accuracy of the surnames, service numbers and dates of birth. Earlier in the project, this required a lot of filtering, close attention to spot any data inaccuracies and many hours of staring at spreadsheets.
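To illustrate the batching step, here is a minimal Python sketch of splitting a long list of rows into batches of a thousand. It is not the project's actual tooling: the function name `split_into_batches` is hypothetical, and it assumes the rows of a master spreadsheet have already been read into a list.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(rows: Sequence[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` rows, in original order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        # The final batch is shorter when the row count is not a multiple of batch_size.
        yield list(rows[start:start + batch_size])


# Stand-in data: 2,500 row numbers in place of real spreadsheet rows.
batches = list(split_into_batches(list(range(2500))))
print([len(batch) for batch in batches])
```

With 2,500 rows this gives two full batches and one of 500, so a master spreadsheet of 30,000 rows would produce 30 batches.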
However, since autumn 2023 we have been assisted by Talend Open Studio, a data integration tool that scans the raw data and highlights anything detected as potentially incorrect. The results of the Talend scans appear in the end columns, which means we can simply filter the spreadsheets to show only the rows with messages. We then use our common sense and judgement to decide if anything needs amending or further investigation. Whenever there are gaps in the key fields, such as incomplete surnames, we can usually look them up on the MoD database. In some instances of data that may be incorrect, especially when it comes to dates of birth, a physical document check is called for.
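The idea of flagging rows and then filtering to those with messages can be sketched in Python as follows. This is an illustration only and does not reproduce the Talend checks, which are not described here. The field names, the DD/MM/YYYY date format and the date-of-birth bounds are all assumptions made for the example.

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

# Illustrative bounds only; the project's real plausibility rules are not described.
EARLIEST_DOB = date(1870, 1, 1)
LATEST_DOB = date(1948, 12, 31)


def parse_dob(text: str) -> Optional[date]:
    """Parse a DD/MM/YYYY string (assumed format); return None if it does not parse."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None


def check_row(row: Dict[str, str]) -> List[str]:
    """Return a message for each key field that looks incomplete or implausible."""
    messages: List[str] = []
    if not row.get("surname", "").strip():
        messages.append("surname missing")
    if not row.get("service_number", "").strip():
        messages.append("service number missing")
    dob = parse_dob(row.get("date_of_birth", ""))
    if dob is None:
        messages.append("date of birth missing or unreadable")
    elif not EARLIEST_DOB <= dob <= LATEST_DOB:
        messages.append("date of birth outside expected range")
    return messages


def rows_with_messages(batch: List[Dict[str, str]]) -> List[Tuple[Dict[str, str], List[str]]]:
    """Keep only the rows that produced at least one message, paired with those messages."""
    flagged = []
    for row in batch:
        messages = check_row(row)
        if messages:
            flagged.append((row, messages))
    return flagged
```

As in the real workflow, a flagged row is only a prompt for a person to look more closely: the sketch does not attempt to correct anything itself.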
The next step is to prepare the data using the cataloguing template, which ensures the correct order of columns for the loading procedure. We run each batch of a thousand records through Talend, which processes the data at this stage to generate ‘load spreadsheets’. After that, the processed data is uploaded to PROCAT (Public Record Office Catalogue) via SQL Server Reporting Services and we have an ‘edit set’ for each batch of a thousand records. Rather satisfyingly, in my opinion anyway, the scope and contents are displayed on a background of a wonderful shade of pink and our task is to spot-check for any errors that may have slipped the net. Lastly, we send the edit sets to pre-release and the catalogue descriptions make their final cyber-journey to their destination of Discovery.
For open records, the surname, initials, service number and full date of birth are visible to anyone searching the catalogue. Closed records, as well as those with redacted descriptions until 115 years from birth, only have the surname, initials and year of birth.
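The difference between full and redacted descriptions can be expressed as a small rule. The Python sketch below is a simplification under stated assumptions: it treats the 115-year point as the only thing that decides what is shown, whereas in practice a record's access status is decided case by case. The function names, field names and the handling of 29 February birthdays are hypothetical.

```python
from datetime import date
from typing import Dict, Union

CLOSURE_YEARS = 115


def closure_end(dob: date) -> date:
    """Return the date 115 years after the date of birth."""
    try:
        return dob.replace(year=dob.year + CLOSURE_YEARS)
    except ValueError:
        # A 29 February birthday has no anniversary 115 years on (that year is
        # never a leap year), so this sketch assumes 1 March is used instead.
        return date(dob.year + CLOSURE_YEARS, 3, 1)


def visible_fields(
    surname: str, initials: str, service_number: str, dob: date, today: date
) -> Dict[str, Union[str, int]]:
    """Return the fields a catalogue searcher would see under this simplified rule."""
    fields: Dict[str, Union[str, int]] = {"surname": surname, "initials": initials}
    if today >= closure_end(dob):
        # Full description: service number and complete date of birth.
        fields["service_number"] = service_number
        fields["date_of_birth"] = dob.isoformat()
    else:
        # Redacted description: year of birth only.
        fields["year_of_birth"] = dob.year
    return fields
```

For example, on 1 November 2024 a soldier born on 14 March 1905 would show the full description, while one born on 1 June 1920 would show only surname, initials and year of birth.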
Where we are now
As we approach the end of 2024, we have cleared the halfway point in cataloguing WO 423; we have released up to piece number 827769 at the time of writing. We recently passed the one million milestone for records with catalogue descriptions added to Discovery, in all series, since the start of the MoD Records Transfer Project. I calculated that if all those million records were arranged end-to-end in a line on the floor, they would be 30 km long. To put it another way, they would stretch from The National Archives all the way to Epping Forest.
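As a quick check on the scale of that figure, the short Python sketch below works out how much of the line each record would account for on average. It uses only the two numbers quoted above; the function name is hypothetical and the result is an average, not a measurement of any real file.

```python
def average_length_cm(total_length_km: float, record_count: int) -> float:
    """Average length of line per record, in centimetres (1 km = 100,000 cm)."""
    return total_length_km * 100_000 / record_count


print(average_length_cm(30, 1_000_000))
```

On those figures, each record accounts for about 3 cm of the line.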
Glossary
- Acquisition – Any object or group of objects legally transferred from one party to another
- Discovery – The National Archives’ publicly-searchable catalogue
- Metadata – The data providing information about one or more aspects of data, making tracking and working with specific data easier
- MoD – Ministry of Defence
- SQL – Structured Query Language, a domain-specific language used to manage data, particularly in a relational database management system
- WO – War Office