Because the archiving process stores the sites in a different format, not all content on an archived site will work or display properly.
Every web archive is a snapshot, or representation, of what was online and accessible to the crawler at the time of the crawl; it is not a full working copy of the website. This is because the underlying systems (or ‘backend’) of a website cannot be archived using the remote harvesting method. The web archive is not a ‘backup’ of a website from which the original website can be restored at a later date.
These are the main things that can’t always be fully preserved in a working state:
- Links from archived websites to other non-government websites. For example, if you are viewing the archived version of the NHS website, the links to Facebook, YouTube, or the BBC won’t work.
- Links inside documents (.pdf, .doc, .docx, .xls, .xlsx and .csv files) do not currently resolve within the web archive. If a user clicks a link in an archived document, they will be taken to that location on the live web, not to the archived copy.
- Content that can only be reached by a user logging in, for example intranets or secured areas.
- Certain navigational features, for example drop-down menus and search.
- Document libraries, image galleries and similar areas containing large collections of content items, which can sometimes be difficult to capture correctly.
- Flash animations and games, and streaming media.
- Embedded maps, such as Google Maps or OpenStreetMap.
- Embedded social media, for example videos from Vimeo or YouTube, or Twitter feeds.
- Any social media platforms other than YouTube, Twitter, Flickr and Instagram.
- POST and Ajax functionality (most often used for uploading documents or submitting web forms).
- E-commerce sites and functions.
Our core archiving requirements give a more comprehensive list of these limitations, along with our recommended alternatives and solutions if your website uses any of these features.