In addition to archiving central government websites, we undertook to archive the websites of primary local authorities and the National Health Service (NHS) through two separate large-scale crawls.

The initial crawl was performed between September and November 2011. A second, similar crawl followed between August and September 2012. In all, approximately 100 million links were captured across nearly 3,000 websites.

Aims of the project

The purpose of the project was to:

  • ensure that transparency datasets released on these websites as part of the government’s Transparency and Open Data initiative, and linked to from data.gov.uk, are archived and remain permanently accessible
  • continue to lead the archives sector by combating the potential loss of this information from the historical record and providing perpetual access to it
  • support our local authority web archiving pilot project, which aims to raise awareness and develop the necessary skills so that participants can decide on a web archiving model that meets their needs

How we carried it out

While we carried out the crawls for many of the same reasons that we maintain our main web archive, the methodology differed in the following ways:

  • we performed very little quality assurance on the crawls, as the project was designed to be low-cost and largely automated
  • we opted for breadth of capture rather than depth of capture to gather as much data as possible
  • we did not engage in active dialogue with website owners

View the websites

You can access the results of the crawls by viewing the NHS A-Z list of archived websites and the local authority A-Z list of archived websites.