Finding captures of a URL
- Enter the page's URL into the search box.
- If the page is in the archive, you will be taken to a Timeline page displaying a calendar of all available captures.
- Select a capture date to view the archived webpage as it appeared at that time.
Refining the Timeline page
- Show redirects – Displays instances where a webpage redirected to another URL, helping track content movement.
- Show one instance per day – Since web crawlers may capture multiple versions of a page daily, this option provides a clearer overview by limiting results to one per day.
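The effect of "Show one instance per day" can be sketched in a few lines of Python. This is only an illustration: it assumes capture timestamps are 14-digit strings in YYYYMMDDhhmmss form, and it keeps the earliest capture of each day. Neither the timestamp format nor the choice of which capture is shown is confirmed by the archive, and the function name is hypothetical.

```python
from typing import Iterable


def one_capture_per_day(timestamps: Iterable[str]) -> list[str]:
    """Keep the earliest capture for each calendar day.

    Assumption: every timestamp is a 14-digit YYYYMMDDhhmmss string,
    so the first 8 characters identify the day.
    """
    earliest: dict[str, str] = {}
    for ts in sorted(timestamps):
        day = ts[:8]
        # setdefault only stores the first (earliest) capture seen for a day
        earliest.setdefault(day, ts)
    return list(earliest.values())


captures = ["20100324183000", "20100324101500", "20100325090000"]
print(one_capture_per_day(captures))
```

With the sample list above, the two captures from 24 March 2010 collapse into one entry and the capture from 25 March is kept as it is.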
Performing a full-text search
Use the main search box to search across the entire archive. You can refine your queries using:
- Search phrases – Use quotes for exact matches, e.g., “Budget 2010”.
- Keyword searches – Searching for Budget 2010 (without quotes) returns pages containing both terms, even if they are not together. Multiple terms can be entered using spaces, commas or semicolons as separators.
- Combining terms – Use a mix of phrases and individual words, e.g., “Budget 2010” Costings, to find results where both appear.
- Automatic Boolean logic – Spaces between words act as an AND operator, so every word must be present in a result. Results in which the terms appear closest together rank higher.
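The query rules above can be illustrated with a minimal Python sketch. It is not the archive's actual search engine: parse_query and matches are hypothetical helpers, matching is a simple case-insensitive substring test, and proximity ranking is not modelled.

```python
import re


def parse_query(query: str) -> list[str]:
    """Split a query into quoted phrases and individual keywords.

    Spaces, commas and semicolons separate keywords; text inside
    quotes is kept together as one phrase.
    """
    # Treat curly quotes the same as straight quotes
    normalised = query.replace("“", '"').replace("”", '"')
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|([^\s,;"]+)', normalised):
        terms.append(phrase or word)
    return terms


def matches(text: str, query: str) -> bool:
    """AND logic: every term must occur somewhere in the text."""
    haystack = text.lower()
    return all(term.lower() in haystack for term in parse_query(query))


print(parse_query('"Budget 2010" Costings'))
print(matches("Costings for the 2010 Budget", "Budget 2010"))
print(matches("Costings for the 2010 Budget", '"Budget 2010"'))
```

The last two calls show the difference between a keyword search and a phrase search: the unquoted query matches because both words are present, while the quoted query does not because the words are not adjacent in that order.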
Enhancing your search experience
Adjusting search terms
- Search terms appear as bubbles below the search bar.
- Click the [x] on any bubble to remove a term and refine your search.

Figure: Location of bubbles containing search terms
Website-specific searches
When refining your search by website, the archive treats www.example.com and example.com as the same website, but the two forms behave differently in a search:
- Searching with www. returns results only for that specific subdomain.
- Searching without www. triggers a wildcard search, including all subdomains (e.g., example.com, blog.example.com, news.example.com), ensuring broader results.
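The difference between the two forms can be expressed as a small host-matching rule. The Python sketch below is one possible reading of the behaviour described above, not the archive's implementation; host_matches is a hypothetical name.

```python
def host_matches(site_filter: str, host: str) -> bool:
    """Return True if a captured host satisfies a website filter.

    A filter starting with "www." matches only that exact host.
    A filter without "www." matches the bare domain and any subdomain.
    """
    site_filter = site_filter.lower().strip(".")
    host = host.lower().strip(".")
    if site_filter.startswith("www."):
        return host == site_filter
    # Wildcard case: the domain itself or anything ending in ".domain"
    return host == site_filter or host.endswith("." + site_filter)


for candidate in ["example.com", "www.example.com", "blog.example.com"]:
    print(candidate,
          host_matches("www.example.com", candidate),
          host_matches("example.com", candidate))
```

The suffix check includes the leading dot so that an unrelated domain such as notexample.com is not mistaken for a subdomain of example.com.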
Filtering search results
Refine your search results by applying filters. The key filtering options include:
- Filter by keyword – Exclude specific words or phrases from results. These appear as bubbles prefixed with “Excluding:”, acting as a NOT Boolean command.
- Filter by website – Limit your search to specific websites or exclude sites from results. These appear as bubbles prefixed with “Site:” or “Excluding Site:”.
- Filter by year archived – Search based on when content was archived rather than when it was published. Selected years appear as bubbles prefixed with “Year:”.
- Social Media Archive exception: For social media content, filtering by year refers to the year published, not archived.
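As an illustration of how these filters combine, the following Python sketch models a set of filters, the bubble labels they would produce, and a check that a single result passes all of them. The Filters class and its field names are hypothetical, and site matching is simplified to an exact host comparison. The caller passes the year archived for web pages, or the year published for social media content.

```python
from dataclasses import dataclass, field


@dataclass
class Filters:
    excluded_terms: list[str] = field(default_factory=list)
    sites: list[str] = field(default_factory=list)
    excluded_sites: list[str] = field(default_factory=list)
    years: list[int] = field(default_factory=list)

    def bubbles(self) -> list[str]:
        """Labels in the style shown below the search bar."""
        return (
            [f"Excluding: {t}" for t in self.excluded_terms]
            + [f"Site: {s}" for s in self.sites]
            + [f"Excluding Site: {s}" for s in self.excluded_sites]
            + [f"Year: {y}" for y in self.years]
        )

    def accepts(self, text: str, host: str, year: int) -> bool:
        """True if a result passes every active filter."""
        lowered = text.lower()
        # Excluded terms act as a NOT operator
        if any(t.lower() in lowered for t in self.excluded_terms):
            return False
        if self.sites and host not in self.sites:
            return False
        if host in self.excluded_sites:
            return False
        if self.years and year not in self.years:
            return False
        return True


filters = Filters(excluded_terms=["draft"], sites=["www.example.com"], years=[2010])
print(filters.bubbles())
print(filters.accepts("Budget 2010 costings", "www.example.com", 2010))
```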
Exporting search results
You can export search results for further analysis. Each export is limited to the first 10,000 results of a search.
Available export formats
- CSV (Comma-Separated Values) – For spreadsheet applications.
- JSON (JavaScript Object Notation) – For structured data processing.
- NDJSON (Newline-Delimited JSON) – For handling large datasets.
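Each of the three formats can be read with Python's standard library. The sketch below is a minimal loader, assuming a JSON export is an array of objects and an NDJSON export holds one JSON object per line; the exact layout of the archive's export files is not confirmed here, and load_export is a hypothetical helper.

```python
import csv
import io
import json


def load_export(text: str, fmt: str) -> list[dict]:
    """Parse exported text into a list of records (one dict per result)."""
    if fmt == "csv":
        # The first CSV row is taken as the column headers
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "ndjson":
        # One JSON document per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


sample = '{"host": "example.com"}\n{"host": "blog.example.com"}\n'
for record in load_export(sample, "ndjson"):
    print(record["host"])
```

NDJSON suits large exports because each line can be parsed on its own, so a file can be processed line by line without loading it all into memory. Note that CSV values are always read as strings, whereas JSON and NDJSON preserve numbers.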
Exported data fields
The exported metadata fields differ between the two archives:
UK Government Web Archive (UKGWA)
- urlKey – Normalised URL identifier.
- timestamp – Date and time of capture.
- url – Full URL of the archived page.
- host – Host domain of the website.
- mime – MIME type of the archived content.
- digest – Unique hash identifier for the content.
- length – File size of the archived content.
- offset – Storage location offset in the archive.
- textTitle – Extracted page title.
- language – Detected language of the content.
- mimeHuman – Human-readable MIME type descriptor.
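When working with a CSV export, every value arrives as a string, so it can help to convert a row into typed values. The Python sketch below covers a subset of the fields listed above. It assumes timestamp is a 14-digit YYYYMMDDhhmmss string and length is a byte count; both are assumptions, and WebCapture and from_row are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WebCapture:
    url: str
    host: str
    captured_at: datetime
    mime: str
    length: int


def from_row(row: dict[str, str]) -> WebCapture:
    """Convert one exported row (all strings) into typed values."""
    return WebCapture(
        url=row["url"],
        host=row["host"],
        # Assumed format: YYYYMMDDhhmmss
        captured_at=datetime.strptime(row["timestamp"], "%Y%m%d%H%M%S"),
        mime=row["mime"],
        length=int(row["length"]),
    )


row = {
    "url": "http://www.example.com/budget",
    "host": "www.example.com",
    "timestamp": "20100324101500",
    "mime": "text/html",
    "length": "5120",
}
capture = from_row(row)
print(capture.captured_at.isoformat(), capture.length)
```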
UK Government Social Media Archive
- _source – Root data.
- content_title – Title of the social media post.
- account_ident – Public identifier of the account.
- account_id – Unique identifier for the account.
- archived_at – Timestamp when the post was archived.
- content_text – Textual content of the social media post.
- platform_id – Unique post identifier on the platform.
- created_at – Original timestamp of the post.
- platform – Social media platform (e.g., YouTube, Facebook).
- local_display_name – Account display name at the time of archiving.
- local_account_ident – Local identifier of the account at the time of archiving.
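Because year filtering for social media content uses the publication date, the relevant year comes from created_at rather than archived_at. The Python sketch below shows this under two assumptions that are not confirmed by the archive: the post fields may be nested under _source, and timestamps are ISO 8601 strings. The function name publication_year is hypothetical.

```python
from datetime import datetime


def publication_year(post: dict) -> int:
    """Return the year a social media post was published.

    Assumptions: fields may sit under "_source", and created_at is an
    ISO 8601 string such as 2019-12-31T23:50:00Z.
    """
    fields = post.get("_source", post)
    # Older Python versions do not accept a trailing "Z" in fromisoformat
    created = fields["created_at"].replace("Z", "+00:00")
    return datetime.fromisoformat(created).year


post = {
    "_source": {
        "platform": "YouTube",
        "created_at": "2019-12-31T23:50:00Z",
        "archived_at": "2020-01-02T08:00:00Z",
    }
}
print(publication_year(post))
```

In this example the post was published in 2019 but archived in 2020, so a "Year: 2019" filter would be the one that applies to it.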