Telegraph Building (catalogue reference INF 9/482)

Prosopography in practice across Big Data

Thanks to funding from the Arts and Humanities Research Council (AHRC), The National Archives and its partners are carrying out research to create a generic, extensible approach to tracing the lives of real people through time and across the documentary evidence that survives them.

Project partners

  • The National Archives
  • The Institute of Historical Research
  • University of Brighton
  • University of Leiden

Project summary

This innovative, multi-disciplinary project will deliver practical analytical tools to support large-scale exploration of big historical datasets. The project aims to bring together international research experience in the digital humanities, natural language processing, information science, data mining and linked data, with large, complex and diverse 'big data' spanning over 500 years of British history.

The project's technical outputs will be a methodology and supporting toolkit that identify individuals within and across historical datasets, allowing people to be traced through the records and enabling their stories to emerge from the data. The tools will handle the 'fuzzy' nature of historical data, including aliases, incomplete information, spelling variations and the errors that are inevitably encountered in official records. The toolkit will be open and configurable, offering the flexibility to formulate and ask interesting questions of the data, exploring it in ways that were not imagined when the records were created. The open approach will create opportunities for further enhancement or re-use, and offers the potential to deliver the outputs as a service, extensible to new datasets as these become available. This brings closer the vision of finding and linking individuals in new combinations of datasets drawn from the widest range of historical sources.
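To make the idea of 'fuzzy' matching concrete, the following is a minimal sketch of one way to compare two name strings that may differ in spelling, capitalisation or word order. It is an illustration only, not the project's toolkit: the function names `normalise_name` and `name_similarity` are hypothetical, and the character-based similarity from Python's standard `difflib` module is assumed here purely as a stand-in for whatever measure a real toolkit would use.

```python
import re
from difflib import SequenceMatcher


def normalise_name(name: str) -> str:
    """Lower-case a name, drop punctuation and sort its words.

    Sorting the words means 'Smith, John' and 'John Smith' normalise
    to the same string. (Hypothetical helper, for illustration only.)
    """
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(words))


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two names.

    Uses difflib's ratio of matching characters; this is an assumed
    stand-in measure, not the method used by the project.
    """
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


if __name__ == "__main__":
    pairs = [
        ("Smith, John", "John Smith"),   # same name, different layout
        ("Jon Smyth", "John Smith"),     # spelling variation
        ("John Smith", "Mary Jones"),    # different person
    ]
    for left, right in pairs:
        print(f"{left!r} vs {right!r}: {name_similarity(left, right):.2f}")
```

A score of this kind says only how alike two strings are. Aliases, where the two names share few or no letters, would need other evidence from the records.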

Project objectives

There are four primary objectives, each of which directly addresses a key research question in the field of Digital History. The project will also deliver one or more tangible, sustainable assets for use by the research community:

  • developing a methodology to identify and trace individuals across large and diverse historical datasets
  • creating a toolkit, suitable for embedding into a software product or service, by encoding the methodology as a series of computer-based patterns, rules and processes (algorithms)
  • enhancing the research value of the identified networks by assigning robust confidence measures that describe the quality of each link identified by the algorithm
  • assessing how effectively the algorithms can be extended to additional datasets or to new combinations of datasets
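
As an illustration of what a confidence measure for a link might look like, the sketch below scores a candidate link between two records by combining evidence from several fields, skipping any field that is missing from either record. Everything in it is an assumption made for the example: the `PersonRecord` fields, the weights, the two-year tolerance on birth years and the `link_confidence` function are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class PersonRecord:
    """A simplified record; the fields are assumed for illustration."""
    name: str
    birth_year: Optional[int] = None
    place: Optional[str] = None


# Hypothetical weights: how much each field contributes to the score.
WEIGHTS = {"name": 0.6, "birth_year": 0.25, "place": 0.15}


def _text_score(a: str, b: str) -> float:
    """Character-based similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def _year_score(a: int, b: int, tolerance: int = 2) -> float:
    """1.0 for the same year, falling to 0.0 beyond the assumed tolerance."""
    return max(0.0, 1.0 - abs(a - b) / (tolerance + 1))


def link_confidence(r1: PersonRecord, r2: PersonRecord) -> float:
    """Weighted average of the field scores that can be computed.

    Fields missing from either record are left out and the remaining
    weights are rescaled, so incomplete records can still be compared.
    """
    scores = {"name": _text_score(r1.name, r2.name)}
    if r1.birth_year is not None and r2.birth_year is not None:
        scores["birth_year"] = _year_score(r1.birth_year, r2.birth_year)
    if r1.place and r2.place:
        scores["place"] = _text_score(r1.place, r2.place)
    total_weight = sum(WEIGHTS[field] for field in scores)
    return sum(WEIGHTS[field] * s for field, s in scores.items()) / total_weight


if __name__ == "__main__":
    a = PersonRecord("John Smith", 1820, "Leeds")
    b = PersonRecord("Jon Smyth", 1821)          # place not recorded
    c = PersonRecord("Mary Jones", 1790, "York")
    print(f"a-b: {link_confidence(a, b):.2f}")
    print(f"a-c: {link_confidence(a, c):.2f}")
```

Rescaling the weights when a field is missing is a deliberate simplification: it means a link supported by a name alone can score as highly as one supported by three fields. A more careful measure might also report how much evidence the score rests on.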

Benefits to researchers

The project will benefit researchers across the whole spectrum of digital history:

  • assisting historians seeking evidence of life-events through a collective study of individual biographies
  • helping genealogists find and trace the paths of their ancestors across the landscape of the official record
  • helping researchers by signposting routes between historical collections, enabling links between datasets at a deep level and creating opportunities for discovery
  • helping cultural organisations identify effective approaches to the creation and curation of new digital datasets, to optimise their potential for linking and re-use
  • providing evidence to support policy making, helping balance the demands of Data Protection and information assurance with those of open data and Freedom of Information
  • providing a methodology to underpin the creation of new tools and resources, supporting the digital economy