Legislation and data storage

FOI request reference:
Publication date: March 2022

I request, for legislation.gov.uk, BAILII, and any other official UK websites (including intranet sites) that do bulk programmatic display or processing of legislative or judicial information:

1. a full copy of all records documenting the API(s), database(s), codebase(s), XML/JSON/etc markup formats, etc., including both documentation meant for humans and documentation meant for processing with code, in native electronic format as stored (e.g. HTML, PDF, Markdown, XSLT, XSD, JSON, etc.);

2. a full copy of the backing database(s), and all backing records not in the database, in native electronic format as stored (e.g. TAR.GZ of SQL dump, JSON & XML files, etc.);

3. a full copy of the backing codebase(s), including all changes (commits, changelog, etc.) for any components that use a source control management system like Git, Subversion, Mercurial, etc., and codebases for any related tools (e.g. CLI utilities, ETL scripts, XML to HTML converters, tools for automated/assisted/manual markup of records as received from Parliament or the Judiciary, etc), in native electronic format as stored (e.g. TAR.GZ of the full Git repository);

4. direct third party access to the backing API(s), with read and query access but not write access, in the same manner as is used by other tools or front-end websites (e.g. REST, token based authentication, etc.);

5. direct third party access to the codebase(s), in the same manner as is used by developers (e.g. GitHub), with full read access (excluding cryptographic secrets like authentication tokens, database salt, passwords, SSL or SSH private keys, etc.) and full access to send & read pull requests, create & read issues, receive notifications, etc.; and

6. direct third party replication and query access to the backing database(s) and non-database backing record storage, in a manner suitable for ongoing full replication (e.g. MySQL slave database, RSS feed, URL callback hook, Docker swarm, pub/sub API, date or ID scoped query API or SQL access, etc.).

Outcome

Some information provided.

Response

1. a full copy of all records documenting the API(s), database(s), codebase(s), XML/JSON/etc markup formats, etc., including both documentation meant for humans and documentation meant for processing with code, in native electronic format as stored (e.g. HTML, PDF, Markdown, XSLT, XSD, JSON, etc.);

Following a search of our digital filing system we have located the following documentation which concerns The National Archives’ APIs, databases and codebases.

· “Legislation Website Information Guide.docx” – This is an introductory guide to the organisation of information on the legislation.gov.uk website and how to access the data through the Legislation API.
· “LEGISLATION API Presentation & Notes.zip” – As the Legislation API is a RESTful API, understanding the URI scheme for the website is key to data access. This is a presentation with notes that describes the logical model and principles of the URI scheme design and the API.
· “Website – Legislation API Service Guide.docx” – This is the full API service guide. Please note that it is being updated to reflect our addition of EU data to the website and to improve layout and organisation; however, most of the information it contains should be accurate.
· “Legislation Data Description 20-11-2019.docx” – This contains another description of the API as well as detail of the formats held on the website and the completeness of the datasets on the website. It also contains information on our prototype bulk downloads service.

Please find attached a copy of these documents. This documentation was intended for humans. The National Archives does not hold documentation meant for processing with code in native electronic format as stored.

2. a full copy of the backing database(s), and all backing records not in the database, in native electronic format as stored (e.g. TAR.GZ of SQL dump, JSON & XML files, etc.);

It is our policy to make all of our data publicly available via our public API as described in the API service guide. The “Legislation data description” documentation above contains information about accessing bulk downloads of legislation data in the following formats only: website default XHTML, Crown Legislation Mark-up Language (CLML) XML, Akoma Ntoso (AKN) XML, an HTML 5 serialisation of the AKN XML, PDF and plaintext.

This data is periodically updated and is accurate as of January 2022.
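As an illustration of how the RESTful URI scheme relates to the formats listed above, the following is a minimal sketch rather than an official client. It assumes that a representation is selected by appending a suffix such as data.xml to a document URI; the suffix table, the function name legislation_uri and the parameter names are assumptions made for this example and should be checked against the Legislation API Service Guide.

```python
from typing import Optional

BASE_URL = "https://www.legislation.gov.uk"

# Assumed mapping from a format label to the URI suffix that selects it.
# These suffixes are illustrative; confirm them in the API service guide.
FORMAT_SUFFIXES = {
    "clml": "data.xml",  # Crown Legislation Mark-up Language XML
    "akn": "data.akn",   # Akoma Ntoso XML
    "pdf": "data.pdf",   # PDF rendition
}


def legislation_uri(doc_type: str, year: int, number: int,
                    fmt: Optional[str] = None) -> str:
    """Build a document URI, optionally for a specific representation.

    doc_type is a legislation type code such as "ukpga"
    (UK Public General Act); year and number identify the item.
    """
    uri = f"{BASE_URL}/{doc_type}/{year}/{number}"
    if fmt is None:
        return uri
    if fmt not in FORMAT_SUFFIXES:
        raise ValueError(f"unknown format label: {fmt!r}")
    return f"{uri}/{FORMAT_SUFFIXES[fmt]}"


if __name__ == "__main__":
    # Freedom of Information Act 2000 (2000 c. 36) as CLML XML.
    print(legislation_uri("ukpga", 2000, 36, "clml"))
```

The sketch only constructs URIs and makes no network requests, so it can be adapted to whichever HTTP client is preferred.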

3. a full copy of the backing codebase(s), including all changes (commits, changelog, etc.) for any components that use a source control management system like Git, Subversion, Mercurial, etc., and codebases for any related tools (e.g. CLI utilities, ETL scripts, XML to HTML converters, tools for automated/assisted/manual markup of records as received from Parliament or the Judiciary, etc), in native electronic format as stored (e.g. TAR.GZ of the full Git repository);

Information about accessing a partial copy of the backing codebases, including all changes for components that utilise a source control management system and codebases for related tools, can be found within the ‘Accessing Legislation Data’ section of the aforementioned Legislation Data Access, Formats & Completeness document. This includes information about a GitHub repository, https://github.com/legislation/legislation, which is maintained by The National Archives and contractors. This repository contains a copy of the frontend code that underpins the www.legislation.gov.uk website. All code in this repository is free to reuse under the Open Government Licence v3.0 (https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3). However, please note that the code currently available is not the latest version; work is underway to prepare an update to bring the repository in line with recent changes to the website.

Part of these codebases is exempt under section 22 of the FOI Act, which concerns information intended for future publication. The legislation.gov.uk codebase is due to be published through a Code in the Open project that is due to commence later this year and will see this information released into a publicly accessible Bitbucket repository.

Part of these codebases contains the personal data of National Archives staff and external contractors, as well as information which, if released, would prejudice our IT security. These parts are consequently exempt under sections 40(2) and 31(1)(a) of the Act.

4. direct third party access to the backing API(s), with read and query access but not write access, in the same manner as is used by other tools or front-end websites (e.g. REST, token based authentication, etc.);

The National Archives is unable to provide such access to you or to the public in general as to do so would entail releasing information which is exempt under sections 31(1)(a) and 40(2) of the Act.

All of the data that is available through the public RESTful API is described in the aforementioned Legislation API Service Guide document.

5. direct third party access to the codebase(s), in the same manner as is used by developers (e.g. GitHub), with full read access (excluding cryptographic secrets like authentication tokens, database salt, passwords, SSL or SSH private keys, etc.) and full access to send & read pull requests, create & read issues, receive notifications, etc.;

The National Archives is unable to provide such access to you or to the public in general as to do so would entail releasing information which is exempt under sections 31(1)(a) and 40(2) of the Act.

6. direct third party replication and query access to the backing database(s) and non-database backing record storage, in a manner suitable for ongoing full replication (e.g. MySQL slave database, RSS feed, URL callback hook, Docker swarm, pub/sub API, date or ID scoped query API or SQL access, etc.).

EXPLANATORY ANNEX

Exemptions applied.

Section 21: Information readily available to the applicant by other means
Section 21 of the Freedom of Information Act 2000 (FOIA) does not oblige a public authority to provide information if it is already reasonably accessible by other means.

In this case the exemption applies because some of the information requested under question 2 is already available at https://leggovuk.s3-website-eu-west-1.amazonaws.com/

Further guidance on the application of this exemption can be found at:

https://ico.org.uk/media/for-organisations/documents/1203/information-reasonably-accessible-to-the-applicant-by-other-means-sec21.pdf

Section 22: Information intended for future publication
Section 22 of the Freedom of Information Act 2000 (FOIA) exempts from release information intended for future publication if (a) the information is held by the public authority with a view to its publication, by the authority or any other person, at some future date (whether determined or not), (b) the information was already held with a view to such publication at the time when the request for information was made, and (c) it is reasonable in all the circumstances that the information should be withheld from disclosure until the date referred to in paragraph (a).

Section 22 is a qualified exemption and we are required to conduct a public interest test when applying any qualified exemption. This means that after it has been decided that the exemption is engaged, the public interest in releasing the information must be considered. If the public interest in disclosing the information outweighs the public interest in withholding it then the exemption does not apply and the information must be released. In the FOIA there is a presumption that information should be released unless there are compelling reasons to withhold it.

We have considered whether it would be in the public interest for us to provide you with the information ahead of publication, despite the exemption being applicable. Please find below the reasoning for and against disclosure.

Arguments in favour of disclosure:
Disclosure of the requested information would demonstrate The National Archives’ commitment to being a transparent and accountable organisation and would increase public awareness of the work of the archives sector.

Releasing information at the current time would allow for contemporary discussion on access to The National Archives’ backing database and would consequently enable and enrich public debate.

Arguments against disclosure:
There are public interest arguments against disclosure of this information at the present time. These arguments include that it is in the public interest to adhere to the existing publication process for access to The National Archives’ backing database, which includes time for the information to be gathered and properly verified before being placed in the public domain.

It is also in the public interest to ensure that the information is available to all members of the public at the same time, and premature publication could undermine the principle of making the information available to all at the same time through the official publication process.

On this occasion, we have concluded that the balance of the public interest test falls in favour of withholding this information.

Further guidance on the application of this exemption can be found at:
https://ico.org.uk/media/for-organisations/documents/1172/information-intended-for-future-publication-and-research-information-sections-22-and-22a-foi.pdf

Section 31(1)(a): Law Enforcement: Prevention or detection of crime
We are unable to provide you with information in respect of questions 3 to 6 because this information is exempt from disclosure under section 31(1)(a) of the FOI Act. Section 31(1)(a) exempts information if its disclosure is likely to prejudice the prevention or detection of crime.

Section 31 is a qualified exemption and we are required to conduct a public interest test when applying any qualified exemption. This means that after it has been decided that the exemption is engaged, the public interest in releasing the information must be considered. If the public interest in disclosing the information outweighs the public interest in withholding it then the exemption does not apply and the information must be released. In the FOI Act there is a presumption that information should be released unless there are compelling reasons to withhold it.

The public interest test has now been concluded and the balance of the public interest has been found to fall in favour of withholding information covered by the section 31(1)(a) exemption. Considerations in favour of the release of the information included the principle that there is a public interest in transparency and accountability in disclosing information about government procedure and its use of technology. However, release of this information would make The National Archives more vulnerable to crime. The crime in question here would be a malicious attack on The National Archives’ computer systems. As such, release of this information would prejudice the prevention or detection of crime by making The National Archives’ computer systems more vulnerable to hacking. There is an overwhelming public interest in keeping government computer systems secure, which would be served by non-disclosure. This would outweigh any benefits of release. It has therefore been decided that the balance of the public interest lies clearly in favour of withholding the material on this occasion.

Section 40(2): Personal Information where the applicant is not the data subject
Section 40 exempts personal information about a ‘third party’ (someone other than the requester), if revealing it would breach the terms of Data Protection Legislation. Data Protection Legislation prevents personal information from release if it would be unfair or at odds with the reason why it was collected, or where the subject had officially served notice that releasing it would cause them damage or distress. Personal information must be processed lawfully, fairly and in a transparent manner as set out by Art. 5 of the General Data Protection Regulation (GDPR).

In this case the exemption applies because the requested material contains information which would identify junior members of staff and external contractors.

Publishing the names and contact details of such individuals is considered an unfair use of personal data. These individuals would have no expectation that information about their positions would be made available in the public domain; to do so would be unfair and contravene the first data protection principle of the Data Protection Act. As such, the names, positions and contact details of these individuals are withheld under section 40 (2) of the FOI Act.

Further guidance about the publication of employees’ data can be found here:
https://ico.org.uk/media/for-organisations/documents/1187/section_40_requests_for_personal_data_about_employees.pdf