Creating and maintaining the official public inquiry website
The official public inquiry website will be a useful tool for sharing information and publishing reports or evidence. This resource is a primary record of any inquiry, and will be captured by The National Archives into the UK Government Web Archive. Therefore the management of an inquiry website should be part of a wider approach to information and records management. If you would like guidance before the inquiry website is created, please email us at firstname.lastname@example.org.
Here are some things that can be done to help make websites easier to archive. You can also read our detailed technical guidance on How to make your website archive compliant.
Website crawling and technical requirements
- Contact The National Archives as early in the process as possible, so that we are aware of the website and can work with the inquiry team to make preparations for its archiving. Once this contact is established, The National Archives can provide updates on progress and arrange the final crawl of the website once the inquiry is dissolved. You can contact the web archiving team at email@example.com.
- The initial crawl of the inquiry website will start early in the process, when there may be little content hosted on the website. This is both for posterity and to assess the suitability of the design of the website for web archiving.
- The site will be captured at least once each year while the inquiry is in progress.
- Keep all content under one root URL (for example https://www.mydomain.gov.uk/). The web crawl only covers content within this root, so content outside it will not automatically be archived. Keeping to one root also makes it easy to confirm that the entire site has been captured, which supports transparency of process and data during and after the inquiry.
- Publishing content in cloud or web document services (such as SharePoint or Google Drive) could cause problems in the web archiving process, as that content may not automatically be captured. If you plan to use any of these services, please contact the Web Archiving team.
- Serve everything on your website over either HTTP or HTTPS, and use the chosen protocol consistently throughout the website.
- Only content linked to from a page within the scope of the crawl will be archived, as the crawler relies on discovering links in the coding of the page.
- Use meaningful URLs. These are good practice for a number of reasons, including usability, security, and search engine optimisation.
- Hyperlinks inside files attached to the website will not work in the archived version. This includes links in PDF, Word, Excel, ODF and other file types. Any resources only linked to via a hyperlink in a non-HTML document will not be captured. Please ensure links to the resources are also provided on an HTML page.
- We can’t archive content that relies on HTTP POST requests, since no query string is generated. Using POST parameters is fine for certain situations such as search queries, but you must make sure that the content is also accessible via a query string URL that is visible to the crawler, otherwise it will not be captured.
- It is not usually possible to crawl databases. Any data held in databases should be published on the website using basic, static links.
- Provide an XML sitemap, which lists and links to all of the content on your website. This is useful for users, makes your website more findable by search engines and supports archiving. Please let the Web Archiving team know the location of your XML sitemap once it is live.
- Information needs to be ‘machine reachable’, which means that it can be reached by a web crawler. Information that needs a tick box, pick list, drop-down menu or a search box to access it is not machine reachable and so cannot be captured by a web crawler. If this functionality must be a feature of the live website, provide plain HTML alternatives.
- The National Archives can only archive publicly accessible content. Any content that is behind log-ins or in other inaccessible areas should either be published on the website, if appropriate, or transferred to The National Archives by other means. If unsure, please contact the Government Help Point at GovernmentHelpPoint@nationalarchives.gov.uk.
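The points above about a single root URL, a consistent protocol and an XML sitemap can be illustrated with a short script. This is a minimal sketch, not a tool provided by The National Archives: the function names `in_scope` and `build_sitemap` and the example URLs are assumptions made for illustration. It builds a sitemap using the standard sitemap protocol namespace and sets aside any URL that falls outside the root, since such content would not automatically be crawled.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def in_scope(url, root):
    """Return True if url has the same scheme and host as root and sits under its path."""
    u, r = urlsplit(url), urlsplit(root)
    same_origin = (u.scheme, u.netloc.lower()) == (r.scheme, r.netloc.lower())
    # Treat an empty path as "/" so "https://host" and "https://host/" match.
    return same_origin and (u.path or "/").startswith(r.path or "/")


def build_sitemap(urls, root):
    """Return (sitemap_xml, skipped), where skipped lists URLs outside the root."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    skipped = []
    for url in urls:
        if in_scope(url, root):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        else:
            skipped.append(url)
    return ET.tostring(urlset, encoding="unicode"), skipped


# Hypothetical example data: one page is served over HTTP and one is hosted elsewhere.
ROOT = "https://www.mydomain.gov.uk/"
PAGES = [
    "https://www.mydomain.gov.uk/evidence/day1.html",
    "http://www.mydomain.gov.uk/evidence/day2.html",
    "https://drive.google.com/file/example",
]

sitemap_xml, skipped = build_sitemap(PAGES, ROOT)
print(sitemap_xml)
print("Outside the root URL or protocol:", skipped)
```

Any URL reported as skipped would need to be republished under the root URL, or raised with the web archiving team, before it could be expected to appear in the archive.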
Video and audio
- We are able to archive videos hosted on YouTube directly from the YouTube channel. The videos will display as part of our collection of archived YouTube channels. It is not technically possible to embed archived YouTube videos in archived web pages.
- Media content can also be archived if it is presented on the website via progressive download, through HTTP or HTTPS and with absolute URLs.
- Audio-visual material should be linked to using absolute URLs (https://www.mydomain.gov.uk/video/video1.mp4) rather than relative URLs (…video/video1.mp4) in the coding of the page.
- Consider providing full transcripts of all audio-visual material.
- Where the inquiry website includes third party audio-visual material, inquiry staff will need to arrange for copyright to be assigned to The National Archives or, at least, obtain permission from the copyright owners for the content to be reproduced in the UK Government Web Archive, both during the inquiry and in perpetuity after it ends.
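The recommendation to use absolute URLs for audio-visual material can be checked before a crawl. The following is a minimal sketch, assuming media is referenced through the `src` attribute of `video`, `audio` or `source` elements; the class name `MediaSrcFinder` and the function `relative_media_urls` are hypothetical names chosen for this example. It lists every media reference that does not begin with an HTTP or HTTPS scheme.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class MediaSrcFinder(HTMLParser):
    """Collect src values on media elements that are not absolute HTTP(S) URLs."""

    MEDIA_TAGS = {"video", "audio", "source"}

    def __init__(self):
        super().__init__()
        self.relative = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.MEDIA_TAGS:
            return
        src = dict(attrs).get("src")
        # A URL without an http/https scheme is treated as relative.
        if src and urlsplit(src).scheme not in ("http", "https"):
            self.relative.append(src)


def relative_media_urls(html):
    """Return the media src values in html that should be rewritten as absolute URLs."""
    finder = MediaSrcFinder()
    finder.feed(html)
    return finder.relative


# Hypothetical page fragment: the first reference is relative, the second is absolute.
PAGE = """
<video src="video/video1.mp4"></video>
<audio src="https://www.mydomain.gov.uk/audio/hearing1.mp3"></audio>
"""

print(relative_media_urls(PAGE))
```

Pages that embed media in other ways, for example through a JavaScript player, would need a different check.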
Quality assurance and maintenance
- If any content on the live website is broken at the time the site is archived, it will not be captured as part of the web archiving process. The National Archives recommends the inquiry team checks the live website thoroughly and fixes any broken links before the final archiving process is launched.
- To ensure successful archiving, allow time for crawling, quality assurance, fixing any issues and publishing the crawl in our public index. This takes approximately two months. Please ensure that the website remains live and unchanged for this period, so that The National Archives can take a final and complete snapshot.
- The National Archives strongly recommends that those who are involved in the release of inquiry records, or who are otherwise familiar with the design of the website, are available during the quality assurance stage to ensure the web capture is comprehensive.
- No content can be inserted into the archived website after the live website has been taken offline. This includes content that was not on the website at the time of the crawl and content that was not captured because the above guidelines were not met.
- The underlying code of an archived website cannot be altered in the web archive. That means that website managers should confirm that their website is ready to be archived and that the content will remain perpetually unchanged.
- Content can only be removed from archived websites in exceptional circumstances, when it meets one or more of the criteria set out in the Takedown and reclosure policy.
- Retain the domain after the final snapshot of the website has been made. This is essential as it prevents ‘cybersquatting’ and can give users continuity of access to the inquiry’s online records, if a redirect is set up into the web archive.
Social media
- The National Archives recommends all videos related to an inquiry are hosted on a single YouTube channel. Please let the web archiving team know the location of the channel.
- Twitter, Flickr and Instagram feeds can also be archived. For Twitter, only Tweets made by the feed itself will be captured, not retweets or responses to Tweets. Please let the web archiving team know the location of your channels.
- It is possible to capture content hosted on some blogging platforms such as WordPress and Tumblr. If you plan to host content on these or similar platforms please contact the web archiving team for advice.
- It is not possible to capture content hosted on other social media channels such as Facebook or LinkedIn. If you host content on these or other social media channels, it will need to be preserved via your own website or in other electronic systems.
Copyright
- For your website to be available in perpetuity through the UK Government Web Archive, it is essential that all content is either Crown copyright or covered by appropriate licences from third party copyright holders that allow The National Archives to copy and make available all content on the website.
- Make sure that your website has a clear copyright statement, so that future users can see who owns the copyright and on what terms the content may be reused under the Open Government Licence. This applies to all content on your website.
- Make sure that any media or copy whose copyright belongs to a third party is clearly marked as such.
Find out more:
How to archive a website with us
How to archive a social media channel with us