Web archiving and web continuity guidance
How we archive your websites
The National Archives captures websites using remote harvesting, performed under contract by the Internet Memory Foundation (formerly known as the European Archive). The crawler (web archiving software) identifies itself as being from the Internet Memory Foundation or European Archive. Please ensure that your sites are set up to allow access to our crawler. Sites are crawled (archived) at a rate of one request every 0.7 seconds, so our crawler should not cause you any difficulties when archiving sites. If you experience any problems with our crawler, please contact us immediately.
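As a rough illustration of what this request rate means in practice, the sketch below converts the stated interval of 0.7 seconds per request into a requests-per-minute figure and a minimum crawl duration. The function names and the 10,000-request example are illustrative assumptions, and a real crawl is likely to take longer because of server response times.

```python
# Stated in the guidance: one request every 0.7 seconds.
REQUEST_INTERVAL_SECONDS = 0.7


def requests_per_minute(interval: float = REQUEST_INTERVAL_SECONDS) -> float:
    """Approximate number of requests the crawler makes in one minute."""
    return 60 / interval


def minimum_crawl_hours(request_count: int,
                        interval: float = REQUEST_INTERVAL_SECONDS) -> float:
    """Lower bound on crawl duration in hours, ignoring server response time."""
    return request_count * interval / 3600


if __name__ == "__main__":
    print(f"About {requests_per_minute():.1f} requests per minute")
    # 10,000 is a hypothetical count of pages and files on a site.
    print(f"At least {minimum_crawl_hours(10_000):.2f} hours for 10,000 requests")
```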
We archive sites according to a regular schedule. Full details about our archiving schedules are available in the Government Website Database. Website owners should contact us to ensure that content is archived before removing it from the live web.
UK Government website review programme
Government is reviewing the number of websites it has, to provide a more user-friendly experience for the public. This activity is being managed by the Government Digital Service (GDS).
Please don't close your site ahead of schedule, as this will prevent us from being able to archive it. Generally, we need at least eight weeks' notice for changes to the closure schedule because the web archiving process takes a full eight weeks to complete. Please do not make major changes to your site during the eight weeks before closure or convergence as the changes may not be archived.
UK Government Website Database
The UK Government Website Database was developed by The National Archives to manage the web archiving schedule and maintain a record of the collection decisions made as a result of such website review programmes. It is accessible to webmasters and records officers of central government departments and bodies. To obtain access to this, please email us.
Web archiving and continuity guidance
The following guidance will help to ensure your website is archived fully:
- Cabinet Office Web Standard TG105: Web archiving guidance, providing a step-by-step guide, and best practice advice
- Cabinet Office Web Standard TG122: sitemap guidance, including the use of sitemap generation software. Using XML sitemaps can assist search engine optimisation and the comprehensive capture of website content in the archive
- Cabinet Office Web Standard TG125: guidance about managing URLs/links persistence and using redirection technology.
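To illustrate the XML sitemaps mentioned in the sitemap guidance above, here is a minimal sketch that writes a sitemap in the public sitemaps.org format. It is not taken from TG122; the function name `build_sitemap` and the example URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal XML sitemap listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' when serialising.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages on a hypothetical site.
    print(build_sitemap([
        "http://www.example.gov.uk/",
        "http://www.example.gov.uk/publications/",
    ]))
```

Listing every page and file in a sitemap gives a crawler a path to content that is not reachable by following links alone.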
The National Archives' solution to preventing broken web links is both simple and innovative. Our web continuity service comprehensively archives government websites and works with the owners of those sites so that people who click a link that is no longer active on a live government site are automatically redirected to the page captured in our web archive.
Web redirection software components are available which you can install to make sure that links persist over time. The components run on Apache and Microsoft IIS (Internet Information Services) web servers. They have been independently tested by a validation facility approved by UKAS (the United Kingdom Accreditation Service) under ISO/IEC 17025:2005.
The following guidance explains where to find the software components required to ensure that web continuity works and how to install them:
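The redirection behaviour described above can be sketched in a few lines. This is only a model of the decision, not the accredited Apache or IIS components: the function name `resolve`, the use of a 302 status, and the choice of the '/*/' snapshot index as the redirect target are all assumptions made for illustration.

```python
ARCHIVE_ROOT = "http://webarchive.nationalarchives.gov.uk"


def resolve(requested_url: str, live_urls: set[str]) -> tuple[int, str]:
    """Return (status, location) for a request to a government site.

    If the page still exists on the live site it is served as normal.
    Otherwise the visitor is redirected to the web archive instead of
    receiving a 'page not found' error.
    """
    if requested_url in live_urls:
        return 200, requested_url
    # Assumption: redirect to the index of all snapshots for this URL.
    return 302, f"{ARCHIVE_ROOT}/*/{requested_url}"


if __name__ == "__main__":
    # Hypothetical set of pages that still exist on the live site.
    live = {"http://www.example.gov.uk/"}
    print(resolve("http://www.example.gov.uk/", live))
    print(resolve("http://www.example.gov.uk/old-report.html", live))
```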
- Government Web Archive: Redirection Technical Guidance for Government Departments (PDF, 0.42MB)
- Apache Accreditation Documentation (PDF, 0.10MB)
- Ionics ISAPI Accreditation Documentation (PDF, 0.10MB)
Technical limitations
Some limitations of web crawling technology are outlined on our Information on web archiving page. Outside of these known limitations it should be possible for us to archive sites which comply with the guidance mentioned above. However, unexpected technical difficulties can arise, so we recommend that website owners satisfy themselves that significant content has been successfully added to the UK Government Web Archive before removing it from the live web.
While we are able to add files of any size, we are not, at present, able to serve files greater than 20MB in size from the UK Government Web Archive, with the exception of PDF files. If your site contains any files greater than 20MB which you wish to remain accessible through the web archive we recommend that you split them into smaller files before the site is crawled. They can then be crawled as several smaller files which can be made accessible individually.
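One way to find affected files before a crawl is to scan a local copy of the site. The sketch below assumes the site's files are available in a local directory and treats 20MB as 20 × 1024 × 1024 bytes, which the guidance does not define precisely; the function name is hypothetical.

```python
from pathlib import Path

# Assumption: 1MB is taken as 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 20 * 1024 * 1024


def files_too_large_to_serve(site_root: str,
                             limit: int = SIZE_LIMIT_BYTES) -> list[Path]:
    """List non-PDF files under site_root that are larger than the limit."""
    oversized = []
    for path in Path(site_root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() == ".pdf":
            continue  # PDF files are exempt from the size limit
        if path.stat().st_size > limit:
            oversized.append(path)
    return sorted(oversized)


if __name__ == "__main__":
    # "." stands in for the directory holding a local copy of the site.
    for path in files_too_large_to_serve("."):
        size_mb = path.stat().st_size / (1024 * 1024)
        print(f"{path} ({size_mb:.1f}MB): consider splitting before the crawl")
```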
Contact details on archived websites
Website owners are reminded that users may attempt to use contact email addresses, mailto links and telephone numbers found in archived versions of sites. It is best practice to ensure that email addresses and telephone numbers used on websites are kept live for this reason. Site owners may find it easier to use generic (i.e. central or team) email addresses and telephone numbers.
Linking to archived sites hosted in the UK Government Web Archive
A majority of the sites in our collection are hosted by the Internet Memory Foundation. You can provide links to an index of all available snapshots of a website, or to specific, dated snapshots of a website in the collection.
An example of how you should create an href for linking to an index of all available snapshots of a website:
This predictable URL will link to an index page showing all available snapshots of the Ministry of Defence website (http://www.mod.uk/). The URL after the '/*/' can be altered to retrieve the indexes for other websites.
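The index URL pattern can be sketched as a small helper. The sketch assumes that the index lives on the same host as the dated snapshots (http://webarchive.nationalarchives.gov.uk) and that the original site URL follows '/*/' unchanged; the function names and link text are illustrative.

```python
from html import escape

# Assumption: the snapshot index is served from the main archive host.
ARCHIVE_ROOT = "http://webarchive.nationalarchives.gov.uk"


def snapshot_index_url(site_url: str) -> str:
    """Build the URL of the index of all snapshots of site_url."""
    return f"{ARCHIVE_ROOT}/*/{site_url}"


def snapshot_index_link(site_url: str, link_text: str) -> str:
    """Build an HTML anchor element pointing at the snapshot index."""
    href = escape(snapshot_index_url(site_url), quote=True)
    return f'<a href="{href}">{escape(link_text)}</a>'


if __name__ == "__main__":
    print(snapshot_index_url("http://www.mod.uk/"))
    print(snapshot_index_link("http://www.mod.uk/",
                              "Archived Ministry of Defence website"))
```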
Please note that as the web archive is indexed at page and file level, additional snapshots can occur where the web crawler follows links outside of a given domain. The result is a partial crawl which is limited in depth and may have missing content, but is still a valuable part of the archive. Therefore, for information on which crawls in the index are complete crawls, please check the crawl schedule in the UK Government Website Database or email us.
To link to a specific archived instance of a website, follow the links from the index page and copy the URL of the snapshot you want. The first eight digits of the code in the URL give the date on which the crawl of the site began, in YYYYMMDD format. For example, the following link is to the crawl of http://www.number10.gov.uk/ dated 5 October 2009: http://webarchive.nationalarchives.gov.uk/20091005102710/http://www.number10.gov.uk/
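The date can also be read from a snapshot URL programmatically. This is a minimal sketch: the regular expression is an assumption based only on the example URL above, and the function name is hypothetical.

```python
import re
from datetime import date, datetime

# Assumption: snapshot URLs look like <archive host>/<digits>/<original URL>,
# where the digits start with the crawl date as YYYYMMDD.
SNAPSHOT_PATTERN = re.compile(
    r"^https?://webarchive\.nationalarchives\.gov\.uk/(\d{8})\d*/(.+)$"
)


def parse_snapshot_url(url: str) -> tuple[date, str]:
    """Return (crawl start date, original URL) for a dated snapshot URL."""
    match = SNAPSHOT_PATTERN.match(url)
    if match is None:
        raise ValueError(f"Not a dated snapshot URL: {url}")
    yyyymmdd, original_url = match.groups()
    crawl_date = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    return crawl_date, original_url


if __name__ == "__main__":
    crawl_date, original = parse_snapshot_url(
        "http://webarchive.nationalarchives.gov.uk/"
        "20091005102710/http://www.number10.gov.uk/"
    )
    print(crawl_date.isoformat(), original)
```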
Please contact us if you wish to link to a specific archived instance of a website hosted in one of our smaller collections.
When linking to the UK Government Web Archive, please state that the site has been archived by The National Archives and is available through the UK Government Web Archive.
Departmental security policies may require that firewalls are configured to block access to archived websites including those hosted by the Internet Memory Foundation. This may prevent staff from accessing the collection.
As the UK Government Web Archive is hosted at http://webarchive.nationalarchives.gov.uk, access to the collection through a firewall will require, as a minimum, that URL to be opened up.
Before the archiving of websites was contracted to the Internet Memory Foundation in 2005, some sites were archived using the tools of other organisations involved in web archiving. As a result, the UK Web Archive and Internet Archive host some sites in our collection which were archived before 2005. Access to these collections through a firewall will require, as a minimum, the following URLs to be opened up:
- http://www.webarchive.org.uk/ for sites archived by the UK Web Archive and
- https://archive.org/ and https://web.archive.org/ for sites archived by the Internet Archive
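The URLs above can be reduced to a list of hostnames for checking against a firewall allow list. This is a sketch only: firewall rule formats vary by product, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# URLs named in this guidance as needing to be reachable.
REQUIRED_URLS = [
    "http://webarchive.nationalarchives.gov.uk",
    "http://www.webarchive.org.uk/",
    "https://archive.org/",
    "https://web.archive.org/",
]


def required_hosts() -> list[str]:
    """Return the hostnames that a firewall would need to allow."""
    return sorted({urlsplit(url).hostname for url in REQUIRED_URLS})


def is_archive_host(url: str) -> bool:
    """Check whether a URL points at one of the archive hosts."""
    host = (urlsplit(url).hostname or "").lower()
    return host in required_hosts()


if __name__ == "__main__":
    for host in required_hosts():
        print(host)
```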