Creating an official inquiry website

Creating and maintaining the official public inquiry website

The official public inquiry website will be a useful tool for sharing information and publishing reports or evidence. This resource is a primary record of any inquiry, and will be captured by The National Archives into the UK Government Web Archive. Therefore, the management of an inquiry website should be part of a wider approach to information and records management. If you would like guidance before the inquiry website is created, please email us at: webarchive@nationalarchives.gsi.gov.uk.

Here are some things that can be done to help make websites easier to archive:

Website crawling and technical requirements

  • contact The National Archives as early in the process as possible, so that we are aware of the website and can work with the inquiry team to make preparations for its archiving. Once this contact is established, The National Archives can provide updates on progress and arrange the final crawl of the website once the inquiry is dissolved. You can contact the web archiving team at: webarchive@nationalarchives.gsi.gov.uk
  • the initial crawl of the inquiry website will start early in the process, when there may be little content hosted on the website. This is both for posterity and to assess the suitability of the design of the website for web archiving
  • the site will be captured at least once each year while the inquiry is in progress
  • keep all content under one root URL (for example http://www.mydomain.gov.uk/). The web crawl is scoped to content within this root, so content outside it will not automatically be archived. Keeping everything under one root makes it easy to confirm that the entire site has been captured, which supports transparency of process and data during and after the inquiry
  • publishing content in the cloud or in web document services could cause problems in the web archiving process, as this content may not be captured automatically. If you plan to use any of these services, please contact your Information Management Consultant or the Web Archiving team
  • present everything on your website through either the HTTP or the HTTPS protocol, and make sure that the chosen protocol is used consistently throughout the website
  • only content linked to from a page within the scope of the crawl will be archived, as the crawler relies on discovering links in the coding of the page
  • use meaningful URLs. These are good practice for a number of reasons, including usability, security, and search engine optimisation
  • due to the technical architecture of the web archive, we are able to archive, but are currently unable to provide access to, any file of 20MB and over. This affects all file types except for PDF files, which can be accessed if they are up to 200MB in size. The National Archives recommends splitting any large files into ‘parts’ (as with http://webarchive.nationalarchives.gov.uk/20120216072438/http:/7julyinquests.independent.gov.uk/)
  • hyperlinks inside files attached to the website will not work in the archived version. This includes links in PDF, Word, Excel, ODF and other file types. Any resource that is linked to only from a non-HTML document will not be captured. Please ensure links to these resources are also provided on an HTML page
  • keep navigation as basic as possible, by providing static links, link lists and basic page anchors, rather than JavaScript and dynamically generated URLs. If using scripting (such as JavaScript) on your website, provide plain HTML alternatives – this supports accessibility for users and supports archiving
  • it is not usually possible to crawl databases. Any data held in databases should be published on the website using basic, static links
  • provide an XML sitemap, which lists and links to all of the content on your website. This is useful for users, makes your website more findable by search engines and supports archiving. Please let the Web Archiving team know the location of your XML sitemap once it is live
  • information needs to be ‘machine reachable’, which means that it can be reached by a web crawler. Information that needs a tick box, pick list, drop-down menu or a search box to access it is not machine reachable and so cannot be captured by a web crawler. If this functionality must be a feature of the live website, provide plain HTML alternatives
  • The National Archives can only archive publicly accessible content. Any content that is behind log-ins or in other inaccessible areas should either be published on the website, if appropriate, or transferred to The National Archives by other means. If unsure, please speak to your Information Management Consultant
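
As an illustration of the root URL and XML sitemap points above, the following Python sketch checks whether a URL sits under the site's root and builds a minimal sitemap from the URLs that do. The function names (in_scope, build_sitemap) and the example URLs are hypothetical, and the scope test (same host, path under the root) is an assumption for illustration rather than a description of how The National Archives' crawler decides scope.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def in_scope(url, root):
    """Return True if url is an HTTP(S) URL on the root's host and under its path.

    Assumption: scope is judged by host and path prefix only; a real
    crawler's scoping rules may differ.
    """
    u, r = urlsplit(url), urlsplit(root)
    if u.scheme not in ("http", "https") or u.hostname != r.hostname:
        return False
    root_path = r.path if r.path.endswith("/") else r.path + "/"
    url_path = u.path or "/"
    return url_path + "/" == root_path or url_path.startswith(root_path)


def build_sitemap(urls, root):
    """Return a minimal XML sitemap (as a string) listing the in-scope URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        if in_scope(url, root):
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    root = "http://www.mydomain.gov.uk/"
    pages = [
        "http://www.mydomain.gov.uk/evidence/day1.html",
        "http://docs.example.com/report.pdf",  # outside the root: left out
    ]
    print(build_sitemap(pages, root))
```

Any URL that this check reports as out of scope is a candidate for moving under the root URL, or for raising with the Web Archiving team before the crawl.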

Video and audio

  • we are able to archive videos hosted on YouTube directly from the YouTube channel. The videos will display as part of our collection of archived YouTube channels (http://nationalarchives.gov.uk/webarchive/videos.htm). It is not technically possible to embed archived YouTube videos in archived web pages
  • media content can also be archived if it is presented on the website via progressive download, through HTTP or HTTPS and with absolute URLs
  • audio-visual material should be linked to using absolute URLs (http://www.mydomain.gov.uk/video/video1.mp4) rather than relative URLs (…video/video1.mp4) in the coding of the page
  • consider providing full transcripts of all audio-visual material
  • where the inquiry website includes third party audio-visual material, inquiry staff will need to arrange for assignment of copyright to The National Archives or, at the least, permission from the copyright owners to reproduce the content from the UK Government Web Archive, both during the inquiry and in perpetuity after it ends
  • capturing of Flash elements in pages presents significant challenges, due to their complexity, and we cannot guarantee these will be archived
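
To illustrate the point about absolute URLs, here is a minimal Python sketch that scans a page's HTML and lists media references that are not absolute HTTP or HTTPS URLs. The class and function names, the set of tags and the list of file extensions are all assumptions chosen for the example; adjust them to the formats your website actually uses.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed for illustration: tags and file extensions treated as media.
MEDIA_TAGS = {"video", "audio", "source", "embed"}
MEDIA_EXTENSIONS = (".mp4", ".mp3", ".webm", ".ogg", ".wav", ".m4a")


class MediaLinkFinder(HTMLParser):
    """Collect URLs that appear to point at audio-visual material."""

    def __init__(self):
        super().__init__()
        self.media_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if tag in MEDIA_TAGS and name == "src":
                self.media_urls.append(value)
            elif tag == "a" and name == "href":
                # Compare the path only, so query strings do not hide the extension.
                if urlsplit(value).path.lower().endswith(MEDIA_EXTENSIONS):
                    self.media_urls.append(value)


def relative_media_urls(html):
    """Return media URLs in html that are not absolute http(s) URLs."""
    finder = MediaLinkFinder()
    finder.feed(html)
    return [u for u in finder.media_urls
            if urlsplit(u).scheme not in ("http", "https")]


if __name__ == "__main__":
    page = (
        '<video src="video/video1.mp4"></video>'
        '<a href="http://www.mydomain.gov.uk/video/video2.mp4">Day 2</a>'
    )
    for url in relative_media_urls(page):
        print("Relative media URL:", url)
```

Each URL reported by this check could be rewritten in the page as a full address under the site's root URL.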

Quality assurance and maintenance

  • if any content on the live website is broken at the time the site is archived, it will not be captured as part of the web archiving process. The National Archives recommends the inquiry team checks the live website thoroughly and fixes any broken links before the final archiving process is launched
  • to ensure successful archiving, it is necessary to allow time for crawling, quality assurance, fixing any issues and publishing the crawl in our public index. This takes approximately eight weeks. Please ensure that the website will remain live and unchanged for this period, so that The National Archives can take a final and complete snapshot
  • The National Archives strongly recommends that those who are involved in the release of inquiry records, or who are otherwise familiar with the design of the website, are available during the quality assurance stage to ensure the web capture is comprehensive
  • no content can be inserted into the archived website after the live website has been taken offline. This applies to any content that was not available on the website at the time of the crawl, or that could not be captured because the above guidelines were not met
  • the underlying code of an archived website cannot be altered in the web archive. That means that website managers should confirm that their website is ready to be archived and that the content will remain perpetually unchanged
  • content can only be removed from archived websites in exceptional circumstances, when it meets one or more of the criteria set out in the Takedown Policy
  • retain the domain after the final snapshot of the website has been made. This is essential as it prevents ‘cybersquatting’ and can give users continuity of access to the inquiry’s online records, if a redirect is set up into the web archive. The National Archives recommends that the domain should be retained and any redirects remain in place in perpetuity after the closure of the live website. For more information see section 3.1.6 of the Web Archiving guidance
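
The recommendation to check the live website for broken links before the final crawl could be supported by a small script. The Python sketch below is one possible approach under stated assumptions: it collects the links on a single page, resolves them against the page's URL and reports any that return an error. The names (LinkCollector, http_status, broken_links) are hypothetical, and the use of HEAD requests is an assumption; some servers do not answer HEAD correctly, in which case a GET request would be needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def http_status(url, timeout=10.0):
    """Return the HTTP status for url, or None if the server is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


def broken_links(page_url, html, get_status=http_status):
    """Return (url, status) pairs for links on the page that look broken."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        target = urldefrag(urljoin(page_url, href)).url
        if not target.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript: and similar links
        status = get_status(target)
        if status is None or status >= 400:
            broken.append((target, status))
    return broken
```

The get_status parameter lets the check be tried against a fixed table of responses before it is pointed at the live website, for example `broken_links(page_url, html, get_status=known_statuses.get)` where known_statuses is a dictionary of URL to status code.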

Social media

  • videos hosted on YouTube can be archived. They will be captured directly from the YouTube channel and displayed as part of The National Archives' collection of archived YouTube content. It is not technically possible to embed archived YouTube videos in archived web pages. The National Archives recommends that all videos related to an inquiry are hosted on a single channel. Please let the web archiving team know the location of the channel
  • Twitter feeds can also be archived. Only Tweets made by the feed will be captured, not retweets or responses to Tweets. Please see Operational Selection Policy 27 for details. Please let the web archiving team know the location of the feed
  • it is possible to capture content hosted on some blogging platforms such as WordPress and Tumblr. If you plan to host content on these or similar platforms please contact the web archiving team for advice
  • it is not possible to capture content hosted on other social media channels such as Flickr, Facebook and AudioBoom. If you host content on these or other social media channels, it will need to be preserved via your own website or in other electronic systems

Copyright

  • for your website to be available in perpetuity through the UK Government Web Archive, it is essential either that all content is Crown copyright or that appropriate licences are in place with third party copyright holders to allow The National Archives to copy and make available all content on the website
  • make sure that your website has a clear copyright statement, as this will make it clear to future users who owns the copyright and on what terms content may be reused under the Open Government Licence. This applies to all content on your website
  • make sure that any media or copy in which a third party owns the copyright is clearly marked as such
  • your Information Management Consultant will ask you to complete a Web Archiving Copyright Licence form to confirm that appropriate licences are in place. Complete this and return it promptly to avoid future queries

Find out more:

Web Archiving guidance

General information on web archiving