Tips for searching for an archived website

Finding captures of a URL

  1. Enter the page's URL into the search box.
  2. If the page is in the archive, you will be taken to a Timeline page displaying a calendar of all available captures.
  3. Select a capture date to view the archived webpage as it appeared at that time.

Refining the Timeline page

  • Show redirects – Displays instances where a webpage redirected to another URL, helping track content movement.
  • Show one instance per day – Since web crawlers may capture multiple versions of a page daily, this option provides a clearer overview by limiting results to one per day.
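
The "one instance per day" option can be pictured as de-duplication by calendar date. The following is only a minimal sketch of that idea, not the archive's own code. It assumes capture timestamps are 14-digit strings of the form YYYYMMDDhhmmss (a common web-archive convention that this guide does not confirm) and that the earliest capture of each day is the one kept. `one_per_day` is a hypothetical helper name.

```python
def one_per_day(timestamps):
    """Keep the earliest capture for each calendar day.

    Assumes 14-digit YYYYMMDDhhmmss strings (an assumption, not
    confirmed by the archive's documentation).
    """
    kept = {}
    for ts in sorted(timestamps):
        day = ts[:8]              # YYYYMMDD prefix identifies the day
        kept.setdefault(day, ts)  # the first (earliest) capture wins
    return list(kept.values())


captures = ["20100324180000", "20100324101500", "20100325090000"]
print(one_per_day(captures))
```

Two captures on 24 March 2010 collapse into one entry, while the capture on 25 March is kept as it is.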

Use the main search box to search across the entire archive. You can refine your queries using:

  • Search phrases – Use quotes for exact matches, e.g., “Budget 2010”.
  • Keyword searches – Searching for Budget 2010 (without quotes) returns pages containing both terms, even if they are not together. Multiple terms can be entered using spaces, commas or semicolons as separators.
  • Combining terms – Use a mix of phrases and individual words, e.g., “Budget 2010” Costings, to find results where both appear.
  • Automatic Boolean logic – Spaces between words act as an AND operator, so every word must be present. Results in which the terms appear closer together rank higher.
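
The query rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the archive's actual search engine: `parse_query` and `matches` are hypothetical names, matching is assumed to be case-insensitive, keywords are assumed to match whole words, and the proximity-based ranking is not modelled.

```python
import re

# A quoted phrase (straight or curly quotes) or a bare term.
# Spaces, commas and semicolons all act as separators.
TOKEN = re.compile(r'["“]([^"”]+)["”]|([^\s,;"“”]+)')


def parse_query(query):
    """Split a query into exact phrases and individual keywords."""
    phrases, keywords = [], []
    for phrase, word in TOKEN.findall(query):
        if phrase:
            phrases.append(phrase.strip())
        else:
            keywords.append(word)
    return phrases, keywords


def matches(text, query):
    """Return True if every phrase and keyword occurs in text (implicit AND)."""
    phrases, keywords = parse_query(query)
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    return (all(p.lower() in lowered for p in phrases)
            and all(k.lower() in words for k in keywords))


print(parse_query("“Budget 2010” Costings"))
print(matches("Budget for 2010", "Budget 2010"))      # keywords only
print(matches("Budget for 2010", '"Budget 2010"'))    # exact phrase
```

The last two calls show the difference between a keyword search and a phrase search: the same text satisfies the unquoted query but not the quoted one.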

Enhancing your search experience

Adjusting search terms

  • Search terms appear as bubbles below the search bar.
  • Click the [x] on any bubble to remove a term and refine your search.

Figure: Location of the bubbles containing the search terms on a search results page.

Website-specific searches

When refining your search by website, www.example.com and example.com refer to the same site, but the two forms behave differently in search:

  • Searching with www. returns results only for that specific subdomain.
  • Searching without www. triggers a wildcard search, including all subdomains (e.g., example.com, blog.example.com, news.example.com), ensuring broader results.
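
The difference between the two forms can be expressed as a small host-matching rule. The sketch below is an assumption about how such a rule might look, not the archive's implementation. `site_matches` is a hypothetical function, and it assumes that www.example.com itself counts as one of the subdomains covered by the wildcard form.

```python
def site_matches(host, site_filter):
    """Return True if host falls within the given site filter."""
    host = host.lower().rstrip(".")
    site_filter = site_filter.lower().rstrip(".")
    if site_filter.startswith("www."):
        # With www.: only that specific subdomain matches.
        return host == site_filter
    # Without www.: the bare domain and every subdomain match.
    return host == site_filter or host.endswith("." + site_filter)


for host in ["example.com", "blog.example.com", "www.example.com"]:
    print(host, site_matches(host, "example.com"),
          site_matches(host, "www.example.com"))
```

The leading dot in the suffix check keeps unrelated hosts such as notexample.com out of the wildcard results.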

Filtering search results

Refine your search results by applying filters. The key filtering options include:

  • Filter by keyword – Exclude specific words or phrases from results. These appear as bubbles prefixed with “Excluding:”, acting as a NOT Boolean command.
  • Filter by website – Limit your search to specific websites or exclude sites from results. These appear as bubbles prefixed with “Site:” or “Excluding Site:”.
  • Filter by year archived – Search based on when content was archived rather than when it was published. Selected years appear as bubbles prefixed with “Year:”.
  • Social Media Archive exception – For social media content, filtering by year refers to the year the content was published, not the year it was archived.
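
As a rough illustration of how the filters map to bubbles, the sketch below builds the bubble labels from a set of filters. `SearchFilters` is a hypothetical container invented for this example, and the order in which the bubbles are listed is an assumption. Only the prefixes come from the descriptions above.

```python
from dataclasses import dataclass, field


@dataclass
class SearchFilters:
    """Hypothetical container for the filters described above."""
    exclude_keywords: list = field(default_factory=list)
    include_sites: list = field(default_factory=list)
    exclude_sites: list = field(default_factory=list)
    years: list = field(default_factory=list)

    def bubbles(self):
        """Return bubble labels using the prefixes the guide describes."""
        labels = [f"Excluding: {k}" for k in self.exclude_keywords]
        labels += [f"Site: {s}" for s in self.include_sites]
        labels += [f"Excluding Site: {s}" for s in self.exclude_sites]
        labels += [f"Year: {y}" for y in self.years]
        return labels


filters = SearchFilters(exclude_keywords=["draft"],
                        include_sites=["example.com"],
                        years=[2010])
print(filters.bubbles())
```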

Exporting search results

You can export search results for further analysis. Each export is limited to the first 10,000 results of a search.

Available export formats

  • CSV (Comma-Separated Values) – For spreadsheet applications.
  • JSON (JavaScript Object Notation) – For structured data processing.
  • NDJSON (Newline-Delimited JSON) – For handling large datasets.
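
Once downloaded, all three formats can be read into the same in-memory shape. The sketch below is one possible way to do that with the Python standard library. `parse_export` is a hypothetical helper, and it assumes that a JSON export holds a single array of records, which this guide does not confirm.

```python
import csv
import io
import json


def parse_export(text, fmt):
    """Parse exported text into a list of dicts.

    fmt is "csv", "json" or "ndjson".
    """
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "ndjson":
        # One JSON object per line; blank lines are skipped.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "json":
        # Assumes a single JSON array of records (not confirmed by the guide).
        return json.loads(text)
    raise ValueError(f"Unsupported export format: {fmt}")


sample = '{"host": "example.com"}\n{"host": "blog.example.com"}\n'
print(parse_export(sample, "ndjson"))
```

NDJSON can also be processed one line at a time, which is why it suits large exports: the whole file never has to be held in memory at once.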

Exported data fields

The exported metadata fields differ by archive:

UK Government Web Archive (UKGWA)

  • urlKey – Normalised URL identifier.
  • timestamp – Date and time of capture.
  • url – Full URL of the archived page.
  • host – Host domain of the website.
  • mime – MIME type of the archived content.
  • digest – Unique hash identifier for the content.
  • length – File size of the archived content.
  • offset – Storage location offset in the archive.
  • textTitle – Extracted page title.
  • language – Detected language of the content.
  • mimeHuman – Human-readable MIME type descriptor.
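
As an example of working with these fields, the sketch below counts how many distinct versions of each page appear in an export. It assumes that records are dicts keyed by the field names above and that two captures with the same digest hold identical content. `distinct_versions` is a hypothetical helper, and the sample values are made up for illustration.

```python
from collections import defaultdict


def distinct_versions(records):
    """Map each urlKey to the number of distinct content digests seen."""
    digests = defaultdict(set)
    for record in records:
        digests[record["urlKey"]].add(record["digest"])
    return {key: len(values) for key, values in digests.items()}


# Made-up sample records using only the urlKey and digest fields.
records = [
    {"urlKey": "page-a", "digest": "AAA"},
    {"urlKey": "page-a", "digest": "AAA"},  # unchanged re-capture
    {"urlKey": "page-a", "digest": "BBB"},  # content changed
    {"urlKey": "page-b", "digest": "CCC"},
]
print(distinct_versions(records))
```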

UK Government Social Media Archive

  • _source – Root data.
  • content_title – Title of the social media post.
  • account_ident – Public identifier of the account.
  • account_id – Unique identifier for the account.
  • archived_at – Timestamp when the post was archived.
  • content_text – Textual content of the social media post.
  • platform_id – Unique post identifier on the platform.
  • created_at – Original timestamp of the post.
  • platform – Social media platform (e.g., YouTube, Facebook).
  • local_display_name – Account display name at the time of archiving.
  • local_account_ident – Local identifier of the account at the time of archiving.
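
The fields above also show why filtering by year behaves differently for the two archives: web captures carry a single capture timestamp, while social media posts carry both created_at and archived_at. The sketch below picks the year that the filter would use in each case. It assumes that every timestamp string begins with a four-digit year, which this guide does not confirm. `year_of` and the archive labels "web" and "social" are hypothetical.

```python
def year_of(record, archive):
    """Return the year the "Year:" filter would use for a record.

    archive is "web" or "social" (hypothetical labels).
    Assumes timestamp strings start with a four-digit year.
    """
    if archive == "social":
        # Social media: year published, taken from created_at.
        return int(record["created_at"][:4])
    # Web archive: year archived, taken from the capture timestamp.
    return int(record["timestamp"][:4])


post = {"created_at": "2015-06-01T12:00:00",
        "archived_at": "2016-01-10T08:00:00"}
print(year_of(post, "social"))
```

Here the post is filed under its publication year even though it was archived the following year.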