Research into Web & Social media archiving

FOI request reference: F0060376
Publication date: January 2020




Request & response 

1. What is the scope of this archiving – i.e. typically how many websites / how many pages of each website is archived?
We now operate two web archive services: the UK Government Web Archive and the EU Exit Web Archive.

The UK Government Web Archive (UKGWA) operates a domain scope model, meaning that each time we archive a UK government website we aim to capture it in its entirety. We archive approximately 800 websites on schedules ranging from monthly to yearly, with the average website being crawled twice a year. We also make regular captures of several hundred social media accounts. Additional information can be found in the public domain, here:

The EU Exit Web Archive is largely an archive of a single, complex website. It is captured through a data-driven approach and is limited as described on that page. Content is archived by seeding specific URLs to a crawler, rather than by conventional crawling. Public access and search services are also provided for the archived content.

2. What is the cost of this archiving per annum?
Between 1 January 2019 and 31 December 2019, the cost was £480,411 for the UK Government Web Archive and £283,630 for building, developing and hosting the EU Exit Web Archive.

3. What is the primary driver for the National Archive to do this?
For the UKGWA, under the Public Records Act, The National Archives is responsible for those records of central government departments that have been selected for permanent preservation as public records. The websites of central government departments have been selected for permanent preservation as public records. Please see the ‘Operational selection policy for the UK central Government Web Estate’ for more information:

For the EU Exit Web Archive, the duty and authority are provided under Schedule 5 of the European Union (Withdrawal) Act 2018:

The Keeper and Chief Executive of The National Archives is the Queen’s Printer.