Episode three: Ethics, communities and collection

In this episode of the Annual Digital Lecture audio series, our specialists come together to talk about the opportunities and challenges that arise when people access digitised records online.

We explore the different types of encounters between digitised records and online users. They can be chance encounters on the internet and social media, they can be driven by individual research or interest into a specific topic, or they can involve some form of structured online collaborative work, like in the case of editorial work.

Lily: Hello and welcome to the Annual Digital Lecture podcast. My name is Lily and I’m the academic engagement officer here at The National Archives in Kew. For the past six years, The National Archives has hosted the annual Digital Lecture, an event where we welcome leading speakers to talk about digital research and practice, in addition to highlighting some of the innovative digital work happening here at The National Archives. This past year’s lecture, hosted on the 28th of November 2023, was delivered by the creative studio Identity 2.0, who explored self-care and memory making in a digital age.

Building on these discussions from November, this podcast explores how care is part of the innovative digital work happening at The National Archives. In each episode, we’ll have a conversation with colleagues across a range of different departments and teams who will lend their expertise to talk about how the digital and care intersect in their everyday work. From the care for our records through preservation and processes, to the care for people who work with and use our records. Join us to reflect and to look to the future.

In our previous episode, we delved into the world of digitisation and datafication. We explored some of the considerations and challenges of archiving born-digital material, as well as the work undertaken to make a physical record available digitally. In this final episode, we’d like to think about the opportunities and challenges that arise when online users access digitised records. Today we will explore the different types of encounters between digitised records and online users.

They can be chance encounters on the Internet or through social media. They can be driven by individual research and interest into a specific topic. Or they can involve some form of structured online collaborative work, like in the case of editorial work. So today, I’m really happy to be joined by Katherine Howells, our Principal Records Specialist in Visual Collections; Angela Chung, our Research Fellow in Photography; Paul Carter, our Principal Records Specialist in Collaborative Projects; and Bernard Ogden, our Research Software Engineer.

So Katherine, I would love to start by speaking to you first. You work with The National Archives’ visual collections, which, among many other records, include photographic collections, which we’ll be focusing on today. Can you first tell us which collections are available online and how they can be accessed?

Katherine: Yeah, so The National Archives has a really big photographic collection, millions of individual items. And they include photographs of events, people and places and they’re all kind of part of government activity, so they’re produced or collected by government departments for different purposes. So that can be publicity, propaganda, it could be surveys of particular locations, it can be military purposes, surveillance, legal records and even photographs registered for copyright protection. So it’s a really broad collection, a really mixed collection and the vast majority of photographs in the collection are not digitised but some are.

And those that are, are digitised in quite a range of different ways and are accessible online in different ways as well. So some are digitised and available for commercial use via our image library. Many others are available online via our own webpages, social media and that kind of thing, so editorial content that’s created. And then there are sets of records that are available via Flickr and also Wikimedia Commons. I think it’s important to bear in mind that every different platform that contains digitised images impacts the way people might access them, changes the levels of accessibility.

And when it comes to the last couple of examples, Flickr and Wikimedia Commons, we have a particularly interesting example which is our Colonial Office collection of photographs which we’ll go into a little bit later.

Lily: How does the experience of accessing records online differ from that of seeing them in their physical form and in a physical context?

Katherine: So of course, there’s a huge difference between accessing an image online or in person. And this really is around the many different contexts that affect the way that you access those images. So the obvious ones, when we think about accessing images and photographs in person, are these institutional, social and experiential contexts. So to view in person, one would need to travel to The National Archives, go through the process of finding and ordering and collecting photographs and viewing them in a very particular institutional context. And so there are all these rituals and time spent in doing that. Viewing them online is much more instant, with much greater accessibility to people all over the world. So that’s the main difference, I guess.

But beyond that, there are particular contextual elements, additional contextual information you get by viewing images in person as opposed to viewing online. And they are around the archival and the historical context. So when you access an image in person, you can often learn more about its arrangement, its connection with other material in the archive through the catalogue. There’s a lot of additional information you’ll find there, information about the creators and the context of the creation of the image, any notes and annotations you find with the actual physical images themselves. And then thinking more about physical context, photographs when you see them online, they might be separated from their physical context, i.e., how do they actually appear in front of you? Are they in an album? Are they in a folder? Are they pasted to a page with other information there? Do they have captions? All those physical elements.

So when an image is digitised online, this contextual information is easily lost. We have to work quite hard to maintain that contextual information or maybe more importantly, we have to determine which bits of contextual information are important to maintain. You can’t maintain everything. So basically, yeah, there are big impacts, dangers and opportunities to removing or changing the context of an image from in person to online. But it’s a balance, massively improved access, flexibility but potential loss of important context.

Lily: Angela, I know that you’ve been researching a very specific set of records. Can you tell us a little bit about your project and speak to what, from your perspective, are the challenges and limits of accessing records that you’ve worked with online, as opposed to in person at The National Archives?

Angela: So I’m particularly interested in photography of Hong Kong which is found within the Colonial Office Library collection, also known as CO1069. Now, this collection was founded by Lord Granville who in 1869 sent out a circular asking governors to send in photographs of the colonies. And over the next hundred years or so, the collection grew with the addition of photographs contributed by officials connected with the colonial administration. We have photos of landscapes, portraits, infrastructure projects and many other types of photographs. But it should be noted that the collecting was not systematic and it’s very much reflected in the eclectic subject matter.

The photographs were initially kept in the Colonial Office Library and then later absorbed into the Foreign Office and Commonwealth Office Library in the 1960s where they were mainly used by civil servants as reference material. And in the mid-noughties, the collection then moved to The National Archives where they were eventually made available for public research. And the collection contains many thousands of photographs and around 918 pieces. And a piece would constitute an album, folder or a box within which one photo or many photos might be contained. The photography is from all over the world. It’s an incredibly rich collection. And I’m interested in thinking about the relationship between photographs and the albums that they’re often found in and thinking about how this material relationship shaped or was shaped by their archival context.

Much of colonial photography, especially in the 19th century, is found in some sort of album, ranging from luxury albums with fancy tooling and leather covers to paper folders which are simply stapled together. So how can these albums and the photos that are found within help us to think beyond the image itself, which is arguably one of the pitfalls of digitising content and putting it online? And how can a consideration of this materiality be used to think about the broader context in which photographic meaning is made, to think about the photograph as an object that has an institutional life which bears traces of its passage in time in the archive?

So for example, the Colonial Office was sent two sets of photographs of the Duke of Edinburgh who was one of the sons of Queen Victoria. And he was the first royal to visit Hong Kong in 1869. One set was retained by the Colonial Office Library and was sent out for binding. And the other set was sent to Queen Victoria and was also sent out for binding. When we compare the materiality of the albums, they reveal a great deal about the differences between photographs intended for public and later wider research use and private photography.

With the Queen Victoria album, this was kept in her own personal collection and it was bound in Moroccan leathers with really beautiful gold tooling, elaborate end papers and it’s hardly been handled. Whereas with The National Archives copy, the same photographs are actually placed in the back of another album and mixed in with a whole other load of photographs. The album was also rebound at least once and likely because the original binding was in very poor condition. And this is very common with Victorian bindings. They basically were made of very bad materials for storing photographs. They were made with wood pulp which is very acidic and very damaging to photographs, for example. Or made with papers that also created chemical reactions with photographs.

This rebinding actually points to the industrialised manufacture of albums which really took off in the 19th century which made materials cheaper. But they were constantly trying to replicate the look of fancy albums that you might find in Queen Victoria’s own personal collection. And the fact that these photographs are mixed in with other photographs also points to the fact that as many government departments were underfunded, there was a lack of resource and a lack of time. And the photographs here functioned in a very utilitarian manner.

The rebinding also demonstrates the growing professionalisation of archival science and also conservation science. And the rebinding also shows how more thought and more resource was directed to preserving the documents and ensuring that they can be used for future research and open to the wider public. So each album essentially embodies its function, the wear and tear or lack thereof is evident when you examine the materiality of the album bindings. And just to bring it back to the context of this podcast, which is to think about the issue of widening access to colonial photography, for example and the role of digitising these kinds of photographs. I think we’ve got to ensure that these material stories, as I’ve just mentioned with the bindings, are not lost when we digitise and upload a picture. And that we don’t think of the image part of the photograph being all that a photograph can communicate.

Lily: So Katherine, from your perspective, what are the main challenges and takeaways from sharing visual records openly online?

Katherine: Yeah, so thinking specifically about the Colonial Office Library Photographic Collection because it’s a really interesting little case study when it comes to the things Angela’s talked about around materiality but also digitisation and online sharing. Because this collection was digitised and made available on Flickr from around 2011. They were digitised in basically part of a project that ended up being called A World Through a Lens. So it was a project designed to make these images available to people all over the world and particularly encourage them to comment on the images and provide more information.

So a sort of crowdsourcing aim as well. And this was because the actual information, the historical information that was provided originally with these photographs was fairly limited. So knowledge about what the images depicted, the people in the images and the places was quite limited. So it was thought that it was worth opening these up, which would benefit people who were able to access them for the first time and who could provide the archive with information about them. So there are lots of positives to doing this and you can see why it was done. So the wider circulation of these images is obviously a really important thing. So people being able to access these images who otherwise may never have come to view the originals themselves. These images depict places all over the world and have meaning for people who are based all over the world.

So it’s really beneficial. Once they were on Flickr, many of them also ended up going on Wikimedia Commons, which massively increased the numbers of views as well. So that was another real benefit. We found that they’ve been reused on websites of different languages, so that’s another benefit that we can see. We can see in the comments that the crowdsourcing aim, in terms of gathering information about the images, was fairly successful. And we have examples of people providing corrections to the captions about the locations that are depicted in photographs, the people that are there.

And also what was quite interesting is we can learn a little bit about how people engage with these photographs that is hard to know otherwise. So you can find people reminiscing and socialising around the images in the comments on Flickr. So people saying, “I used to live in this place. I remember this location,” and discussing it between them. So that’s all very positive. But of course, as Angela’s talked about, the material context is so easily lost when you make these images available online in this way. So they have been digitised in various ways. So sometimes, it is the whole album page with multiple images on there that is digitised. That’s a little bit of physical context preserved there. But generally speaking, the idea of the album, the structure of the album, the particular arrangement of photographs within the album is lost when these are on Flickr.

And even more so, once images are online, they can easily be copied and moved away further and further away from that basic information, that context. They can be repurposed for completely different topics. And there’s danger of misunderstanding. I think this is not an inherently digital issue. All interventions to improve access, so cataloguing, removing photographs and rehousing them, all these kinds of things that have been done for years before digitisation all have a danger of removing original context.

But obviously with digital, this is heightened. But I think it’s a case of trying to balance the pros and cons of this. Obviously, access is very important. Digitising and making available online is a really powerful way to improve access. But we need to find ways of doing it that preserve really important contextual information.

Lily: And why is it important to maintain that context?

Katherine: Yeah, so this contextual information really does matter to create a full understanding of the image. And Angela’s talked about the importance of understanding how an album came to be created and why certain images might appear in that album. And the whole history behind a physical object of an album, the whole archival historical story behind it. All those things are vital to understand the actual photographs. Removing the photographs from that context loses a huge amount of our understanding. So it can cause information to be obscured. It can allow very important things to be forgotten. It can cause people to be misled about the meaning of photographs.

And I think it matters even more potentially with this collection, where it’s got a very clear colonial context, it’s part of a colonial project of producing and collecting photographs. And that information, that fact, is not obvious necessarily from just looking at an individual photograph removed from contextual information. So there’s also, when we think of photographs, there are power dynamics within the production of a photograph that are really important to understand to know the history of that photograph. And the risk of not giving the full context of information means that we may be reproducing the political inequalities that were inherent in the creation of that photograph in the first place.

So that’s another very important thing. When we think about digital as well, images can form part of machine learning data sets, for example. This is the case with data, with catalogue data as well. So getting metadata right and making sure that we’re not reproducing inequalities and unfair power dynamics in images and in the information that goes with images is very important. And may be more and more important as technology moves on as well. We need to also consider the importance, the impacts of digitising some images and not other images. We can’t digitise everything all at once, so decisions have to be made but all those decisions have an impact and can skew people’s vision of kind of what the past looked like even, basically.

So the more we digitise, the less clear it is that there are images that are not yet digitised, so the less obvious it is that people need to come to the archive to see those images. So it’s a tricky thing but I think it’s important, not to be too negative about this because the benefits are very clear. But it’s just important to keep all these things in mind when developing projects and maintaining a balance of these kinds of pros and cons.

Lily: So far, Angela and Katherine, you’ve both spoken about these either accidental encounters with records or more specifically photographs online. Or those encounters that we have when we use digitised records in our research, for example. And how yourselves, as both records specialists and researchers, think about these challenges and how we may approach or mitigate these in the future. So Paul, I now wanted to speak with you. You were recently The National Archives lead on a project called In Their Own Write. And as part of this project, you worked with a group of both on site and online volunteer editors. Can you first of all introduce us to the project?

Paul: Thanks Lily. Yeah, we worked on a collaborative project with Nottingham Trent University with a team of people led from there by Steve King who’s Professor of Social and Economic History there and myself who led a team here. And what we were interested in doing was to explore and determine the levels of agency of the Victorian poor in England and Wales. And we were specifically looking at England and Wales. That’s the collection that we hold. The law and collections for Scotland and Ireland are different and their archives are held elsewhere.

So we were looking at that specific system of poor relief and we were looking to determine, what did the poor think about welfare at that period? What did they think about if they were unemployed? For the want of a better phrase, what benefit system is in place? What if they were sick? What about education for the poor and poor children? And we wanted to find that out by looking at their records. It was a system referred to as the new poor law and it ostensibly sought to give them no agency. So that’s what we were looking to uncover and we were looking to try and uncover that by looking at records that they had left behind. Now, in a very natural way of speaking, the poor don’t leave an archive in the way that governments do or perhaps larger landholders might do.

But they did leave an archive and that archive was interleaved with the official collection of poor law union correspondence, largely not catalogued or not catalogued to that level of granularity that you could do searches and find them. So we did a lot of work on levering out from a very large collection of material, these sources that gave the poor a voice in the 19th century. And that voice to some degree had been lost within this very large collection of poor law union correspondence.

Lily: And so how was the project, and more specifically the work with the volunteers, organised? How did they encounter these records?

Paul: Well, we always knew we were going to be working with volunteers. I’ve worked with volunteers a lot at The National Archives. When we wrote the grant, the project was funded from the AHRC. And as we wrote the grant, we wrote into that, that we would have a volunteer group. And this was phrased as the Pauper Letters Research Group. And to some degree, some of the people that I’d worked with on cataloguing projects in the past, on records of the poor and poor law had said, “Well, as you’re moving over from cataloguing to research work, Paul, on these sort of records, we’d like to come, we’d like to come with you on that.” That’s where the conversation started.

So we had a group working on site and we had a group working off site. And the group on site worked with the research team, looking through the material. As I said, they weren’t catalogued. So it is a case of saying, we’re going to look through several thousands of volumes of this material and look for and find and record these records that gave the pauper voice. It might be a letter from an indoor pauper at a workhouse. It might be a letter from somebody who’s not a pauper, not in receipt of relief but who is poor and is actually arguing that, “I should be in receipt of relief, and we’ve been turned down.” And we also had a lot of letters from advocates, those people who the poor approached to write a letter on their behalf.

And we collected this material on a very systematic basis. We put workshops together, training sessions together, to allow us to collect this data in a very standardised way. We then had a group of volunteer editors off site. So we would simply digitise the letters that had been spreadsheeted, nothing very sophisticated, handheld camera, click, got it. And we put that up then on an image sharing site and the volunteers off site would then transcribe that material. So we had those two big chunks of work and certainly the volunteers weren’t an add-on to this. They were critical parts of what we were doing. So that’s how we organised the work.

The transcripts would then be sent back and I could hook all of that into one single, huge, massive document that we could run certain software packages across, without having to read through it ourselves and try and do word counts. The software would go through and count phrases or words, or where certain words are next to each other, to determine what were the thoughts of the poor. And then we could bring into that exemplar materials, case studies to really illustrate what it is and why it is the poor were saying the kind of things that they were saying. They were effectively writing in to the central authority complaining about how the local officials were treating them when they were unemployed, when they were sick or when their children were sick, when their children were ill.
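As a rough illustration of the kind of counting Paul describes (words, phrases, and words that sit next to each other), here is a minimal Python sketch. It is not the software the project used, which the conversation does not name. The function names and the two sample sentences are invented for illustration and are not taken from the letters.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_words(transcripts):
    """Count how often each word appears across all transcripts."""
    counts = Counter()
    for text in transcripts:
        counts.update(tokenize(text))
    return counts


def count_adjacent_pairs(transcripts):
    """Count pairs of words that appear directly next to each other."""
    pairs = Counter()
    for text in transcripts:
        tokens = tokenize(text)
        pairs.update(zip(tokens, tokens[1:]))
    return pairs


def count_phrase(transcripts, phrase):
    """Count occurrences of a multi-word phrase across all transcripts."""
    target = tokenize(phrase)
    if not target:
        return 0
    size = len(target)
    total = 0
    for text in transcripts:
        tokens = tokenize(text)
        for start in range(len(tokens) - size + 1):
            if tokens[start:start + size] == target:
                total += 1
    return total


# Invented sample sentences (hypothetical, not real pauper letters).
SAMPLE_TRANSCRIPTS = [
    "My children are sick and the relieving officer refused relief.",
    "I am out of work and the relieving officer will not hear me.",
]

if __name__ == "__main__":
    print(count_words(SAMPLE_TRANSCRIPTS).most_common(3))
    print(count_adjacent_pairs(SAMPLE_TRANSCRIPTS).most_common(3))
    print(count_phrase(SAMPLE_TRANSCRIPTS, "relieving officer"))
```

Counts like these only point to where the recurring concerns are; the case studies Paul mentions are still needed to explain why the writers said what they said.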

Lily: So in the previous episode, we spoke about the often invisible labour that’s involved in digital work and digital spaces. How did you make sure that here the volunteers’ work was acknowledged and that their labour was reciprocated?

Paul: It was really important for us because we knew that the work that they provided, the work that they undertook and the data that they would be producing would allow us a greater depth in terms of whatever analysis that we do than without them. We made sure of that in several ways. We didn’t want it to be a one way street where we benefit from their effort and nothing else goes back. We set up a group formally under the project, as I said earlier, the Pauper Letters Research Group. And we put money into the budget for support. So in the pre-COVID days where we had a conference, then we would pay for our colleagues, as we would always refer to our volunteer editors, to attend conferences that we were putting on.

We would go out into the localities and we would meet with them in their localities and we’d do workshops. We ran several skills workshops, some of that would be around transcription skills, some of that would be around discovery skills. We did a workshop on, if you want to publish something in a local history volume or a family history magazine or even a peer reviewed journal. So a number of our volunteer editors have published based on the work that they’ve done on the project.

We also organised online seminars. We used to have monthly seminars where we’d pick a subject and somebody from the research team would kick it off. Everybody had something to say. There were really good discussions. But we also used those seminars to look at, are you having any problems with the way that we’re working? Are you able to download the material at all? And that sort of thing. It built a community, a proper one where we knew each other and we spoke to each other. Because they’d looked at so much material, they started to become quite expert in all of this. So we put on a conference for them, only they would provide the papers for it. I remember we said, “What do you think of this? This will be great, won’t it?” And everyone went, “Hang on a minute. It’s online.”

And this was during COVID so a lot of people were just getting to grips with using Teams or Zoom. And a number of people said, “Well, we’ve not spoken before. We’ve not given a talk before. So that feels quite an ask.” And eventually we got nine people signed up to give papers and they all gave very different papers. And they were all absolutely brilliant at this because they knew the material. And the conference itself was a very safe area because the only people there were the 30 to 40 of us who were part of the project. And immediately that we’d done the project, the next day people were saying, “Can we do another one? Because we’ve found that we’ve got that much and we want to express it.”

And so I think giving those kinds of opportunities, either through writing workshops or through little mini conferences like this, was a really good way of saying, “Thank you and we’re listening to you now, as opposed to maybe you listening to us.” The other thing that we did when the project book was published, we listed everybody and acknowledged the work that everybody had done. And they’re all to receive a free copy of the book and that should be winging its way towards me. And what I will do then is again, I will go out to the localities. It’s just looking for those opportunities of how do we make sure that we can include them in a lot of the fun things. And I think they’d say, “Actually doing a lot of the work itself is fun things.” I’d sometimes get emails about 2:00 in the morning from somebody saying, “I’ve just found this and I felt I had to share it with somebody.”

Because you don’t see it until the next day. But it’s that kind of thing about working with records. A lot of people see the excitement of working with particularly material that’s never been seen before. And a lot of this hasn’t been looked at since it was inserted into the volumes 170, 160 years ago and closed up. It’s looking for those avenues, if you like, and opportunities.

Lily: Bernard, you were involved in a project called Engaging Crowds which, similarly to In Their Own Write, involved the editorial work of volunteers but in a slightly different context and a slightly different scenario. Could you tell us a little bit about the project? And could you also tell us how the community that participated in it is different to In Their Own Write and how the way they engage with records may be different as well?

Bernard: So Engaging Crowds was a project to investigate the engagement of volunteers in crowdsourced projects. And it was a collaboration between us and the Royal Botanic Garden Edinburgh, Royal Museums Greenwich and the Zooniverse. And it was funded by the AHRC through Towards a National Collection. So as part of the investigation, the partners developed three crowdsourcing projects for the Zooniverse platform. The team at TNA built one called Scarlets and Blues and that asked volunteers to transcribe meeting minutes of Royal Hospital Chelsea’s special board from around the time of the First World War.

Zooniverse is a volunteer research platform with a really huge community. I think they said they’ve had more than a million contributors. So it is quite a different kind of community. It’s external to TNA. It’s not something we’ve built ourselves. It’s people who are used to working with the Zooniverse platform. So it’s a more anonymous crowd. It’s more of a distant relationship. The volunteers who work on our project might or might not have lots of experience looking at these kinds of records.

Lily: So if the community is largely anonymous and outside of The National Archives, how do you go about working with them?

Bernard: The first thing is that the Zooniverse way is to get lots of copies of everything. So for us, every page was transcribed by several volunteers. We had five transcriptions of each page and then you do some algorithmic things and you put them together to get a consensus transcription. So you never rely on one anonymous volunteer for anything. The volunteers aren’t completely anonymous, they can be if they want to be, but many of them choose to sign in and they’ve got a screen name so we know that this screen name gave us that transcription. And also, there’s always a forum on a Zooniverse project, so the volunteers can talk to the project team, to us, and they can talk to each other in that way. And a few of our volunteers did that a lot, so I felt like we got to know at least those few fairly well and some of them asked lots and lots of questions.

So we could see that people were taking a lot of care in the work. And then there are other ways to provide guidance as well. Every Zooniverse project has a little intro, a little tutorial that runs the first time you open it. So you can get your main points across in that. There’s some online help so you can put specific guidance for a particular task as people are doing it. And then there’s a manual and you can put a lot into that, as much as you like. And we put tons and tons of detailed information into it. We also recorded videos to demonstrate what to do because of those complex detailed instructions we had in the manual, we felt we needed to glue it all together for people in that way.

And I think the most important thing with this kind of project is to try to design it in such a way that the interface is really intuitive. So the better you do at that, the more obvious it’s going to be what to do and the less chance there’s going to be that people make mistakes or do things in a way that you didn’t intend. And so trying to keep everybody to the same methodology. But if your interface is intuitive, it should also make the project more pleasant. So from a care point of view, it’s also important as well as being good for your goals by getting more and better transcriptions.
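The idea Bernard outlines earlier in this answer, collecting several independent transcriptions of the same page and combining them into a consensus, can be sketched very simply. The Python below is a deliberately simplified stand-in: it takes a majority vote over whole lines after tidying whitespace. The conversation does not describe the "algorithmic things" actually used, so the function names, the three-vote threshold and the sample text are all assumptions made for illustration.

```python
from collections import Counter


def normalise(text):
    """Collapse runs of whitespace so trivial spacing differences do not split votes."""
    return " ".join(text.split())


def consensus(transcriptions, min_votes=3):
    """Return (text, votes) for the most common version of a line.

    If no version reaches min_votes, return (None, votes) so the line
    can be flagged for a person to review instead of being accepted.
    """
    cleaned = [normalise(t) for t in transcriptions if t.strip()]
    if not cleaned:
        return None, 0
    text, votes = Counter(cleaned).most_common(1)[0]
    if votes < min_votes:
        return None, votes
    return text, votes


# Five invented versions of one line (hypothetical, not from the minutes).
SAMPLE_VERSIONS = [
    "Board met at 11 am",
    "Board  met at 11 am ",
    "Board met at 11 am",
    "Board met at 11 a.m.",
    "Bored met at 11 am",
]

if __name__ == "__main__":
    print(consensus(SAMPLE_VERSIONS))
```

With the sample above, three of the five versions agree once spacing is tidied, so that version is accepted. A realistic pipeline would more likely align the transcriptions word by word, since exact whole-line matches become rare on longer or messier text.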

Lily: How well do you think that worked in Scarlets and Blues?

Bernard: Pretty well, in many ways. We got the data we were looking for. There’s definitely stuff I would do differently. So the detailed instructions, that was all about trying to get as much data as we could with as much reliability and reusability as we could. The more confident we can be about it, the more we can do with it. So it was all about trying to get maximum value from the time the volunteers were giving us. But it also means you’ve got a really complicated sequence of things to do and it’s quite tricky to follow. So it might not be as attractive to as wide a range of volunteers. We kind of got away with it because our project was quite small.

So we only had about 2000 pages to transcribe, so we could get away with being a bit of a niche project that was maybe just attractive to people who like the subject but were also willing to follow our quite complex, fiddly process to do our transcriptions. So we had about 400 volunteers. They transcribed 2000 pages, so 10,000 transcriptions in about two months.

Lily: And how did you give back to the volunteers?

Bernard: So I think at the most basic level, say, “Thank you,” and make sure to acknowledge their contributions where you use them. Then I think it’s important to make the most of the contributions. So for Scarlets and Blues, we’ve used information from the transcriptions to enhance the catalogue, which is great. But there’s lots more stuff in that data that I hope at some point we’re going to pull out and do more things with. And then again, for Scarlets and Blues, which is the project I know best out of Engaging Crowds, I think the forums were a key part of giving back. We worked quite hard at being responsive to volunteer posts on the forums and the volunteers appreciated that.

And it was actually a kind of a two way process. So we did help with their questions but they had suggestions too and could help out with stuff. It could be really simple, practical stuff, like, “How do you type a pound sign on a US keyboard?” Which was a problem that came up as soon as we launched. Or it could be quite abstruse, like, “Here’s a really complicated table with lots and lots of stuff in it, how do you want me to transcribe that?” And also kind of like what Paul was saying, they would find interesting things and the forum gave them a way to share that. And they would do some background research around stuff they’d found and they’d share that too.

So the forum actually made for a little short lived community. The trouble with this kind of community is that not many volunteers are active on the forums. So giving back to the other volunteers is harder. I think for them maybe the main thing we can do is just aim to design a project that is as rewarding as possible. It should be interesting, it should have some real research value and it should give you the chance to engage with those records and do a bit of historical research yourself.

And then lastly, Engaging Crowds itself, so the umbrella project that Scarlets and Blues was a part of, that was looking at volunteer motivation and crowdsourced projects. So Royal Museums Greenwich did host a workshop within that called Voices of the Volunteers, which gave the volunteers themselves the chance to come to talk to us about their experience and their motivations. And I hope that the other work we did around that project contributes to a better understanding in the future, and that this knowledge will help us to design better projects that are in some sense better to volunteer on.

So it’s a bit of an indirect way of giving back but I hope there’ll be some long term impact there.

Lily: So before I let you go, I have one final question for you all. I’d love to hear about any current research or areas, inquiries that you’re most excited about or that you personally would like to focus on in the near future. Paul, would you like to kick us off?

Paul: OK then. So my initial thought on that was, it’s really about the sharing of data or information and images. And it’s all of those things together, I think. I’ve been involved in this sort of thing quite a while. You can see how things have speeded up over the last 10, 15 years, that ability to work with digital material. And that might be because you’re working with other archives, where there are sister collections. So I work a lot on the poor law. We’ve got the central poor law commission material but the local poor union records for 650-odd unions, they’re in county archives, borough archives, they’re spread across the whole of England and Wales.

You can, in theory, work a lot quicker than you were able to do in the not so distant past and do things with a greater richness than you could in the not too distant past. So the question that I tend to ask myself about that is how do we do that kind of work but maintain things like data quality? How do you do that? Because we’ve mentioned a lot of the dangers of the digital archival working where you lose context, but also how do we make sure that people call things the same things?

So that from a user experience, if you are doing searches, online searches, you’re actually finding all of that data that’s being created. And how do those people who work in archives, that give advice to researchers, how can we give the advice, if actually this data has been created but it’s very inconsistent in terms of what we’ve called things or how we’ve catalogued or tagged or whatever phrases we’re using in terms of the creation of that data?

Because one of the dangers of that speed and richness is that we go out and do loads of stuff but actually then it’s not particularly useful. That’s my kind of answer to that kind of question.

Katherine: Yeah, I think I’d follow on from that by saying, particularly when it comes to visual materials, I think there’s a whole range of issues that come with that in terms of crowdsourcing and working with volunteers and adding value to images, trying to describe images. Because obviously, it’s very difficult to give guidance on that as opposed to transcribing text. How do you describe an image? And particularly when it comes to the kinds of images we’ve been talking about with the colonial context, which are very complex, contested, difficult histories that certainly not everyone is equipped to really describe and talk about in detail. And those who are, may not always agree on those sorts of issues as well. So there’s just a lot to consider there.

That’s just an area that I think we need to work on. What is the best way? And I think this is where a broad brush sort of approach to digitising and very wide crowdsourcing may not be appropriate necessarily for certain image collections. Where really what we need is a narrower engagement with particular groups, smaller groups of people with particular connections to the place or the context of the images. And I think as well, balance of benefit to the archive and to the volunteer. What are they gaining? What are they giving? And vice versa. It’s very important to work out a way of benefiting the volunteer.

With that comes ensuring sustainability of a project, so that you have a good sense of the information you’re going to get and whether it’s going to be of good enough quality to be useful. Because fundamentally the biggest benefit to a lot of volunteers is obviously producing information that is beneficial, that will go on and be used for years to come. And if the project isn’t designed perfectly for those particular records, there’s a danger that their work will not be valuable in the future.

Angela: Something that I’ve been thinking about a lot over my last year of this fellowship is the development of computer vision or AI in connection with the digitisation of photography collections. And I think there’s lots of research going on at the moment to develop AI to help generate metadata for digitised collections. And I know that there’s some interest from heritage institutions, for example, who are interested in this technology to help them widen access to these collections. I think access is very important, especially when we talk about colonial collections. But we also have to be very careful because essentially we’re creating data sets.

And in the case of colonial photography, the data sets are fundamentally shaped by colonial desires and visions. So we just have to be careful in thinking about the ethical aspects of using these kinds of amazing technologies to help widen access to these collections, which I also think is really important. So that’s something I’ve been thinking about a lot and it’s very complex and I think we’re all working it out as a community. Watch this space.

Lily: Yes, watch this space. Bernard, over to you.

Bernard: I think for me, a lot of my thinking around it has been, how do you provide the instructions that give you confidence that the data that you’re getting is reliable and that it’s all being collected in the same way? And one thing I’m curious about there is more of a focus on building a community, maybe inspired a bit by open source software and how some of those projects organise. And then I’m also interested in looking at it as a kind of outreach. So it’s another way of making our collection available to the public. We’ve got exhibitions and we write books about them and people do documentaries and all that. And this is another way to engage, a much more active way to engage with our collection.

So I’m curious about how we can use these kinds of projects as a form of outreach, maybe even start looking into things like co-production. So allowing communities to actually help to shape the direction of projects from an early stage as well, which I think is one way that you get some insight from outside of our bubble. And maybe that starts to help with problems like the bias that you get with AI data, and with thinking about those kinds of things.

Lily: Thank you for listening to our Annual Digital Lecture podcast and thank you to our experts for taking the time to talk to us today. To learn more about the Annual Digital Lecture and watch recordings of our previous lectures, click the link in the text on the episode page or visit nationalarchives.gov.uk and search for Annual Digital Lecture. If you’re interested in learning more about our research, as well as our work as an independent research organisation, visit our website nationalarchives.gov.uk and search for research and academic collaborations. Follow us on X at UK National Archives Research to stay up to date with our research projects, upcoming events and other opportunities and don’t forget to read our blogs at blog.nationalarchives.gov.uk. This audio recording from The National Archives is Crown copyright. It is available for reuse under the terms of the Open Government Licence.

[End of recorded material 00:43:28]