How It Works
A RAG system for cultural archives has three layers, and getting each one right matters.
The knowledge base is the foundation. This isn't just digitised materials with metadata tags. It's a structured representation of meaning: what connects to what, which oral history relates to which photograph, which community member appears across which documents. The richer the semantic relationships, the better the system answers real questions.
In practice, this means chunking source material thoughtfully. A two-hour oral history interview can't be treated as a single document. The system needs it broken into meaningful segments (by topic, by time period, by speaker) so it can retrieve the right excerpt for a given question. Too large and the response loses focus. Too small and it loses context. For most cultural archives, chunks of 300 to 500 words with overlapping context windows work well as a starting point.
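The overlapping-window approach above can be sketched in a few lines. The chunk size and overlap values are the starting points from the text, not tuned parameters:

```python
def chunk_words(words, chunk_size=400, overlap=50):
    """Split a word list into chunks of chunk_size words,
    each sharing `overlap` words with the previous chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Stand-in for a transcript; a real pipeline would also split
# on topic or speaker boundaries, not just word counts.
transcript_words = [f"word{i}" for i in range(1000)]
chunks = chunk_words(transcript_words)
```

In practice you would refine the boundaries so chunks never cut across a speaker change or topic shift; the overlap exists so that context straddling a boundary survives in at least one chunk.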
The retrieval layer uses vector embeddings to match questions to relevant chunks. When someone asks "what was life like for textile workers in Tottenham in the 1970s?", the system converts that question into a mathematical representation and finds the chunks closest in meaning. This is where multilingual collections get interesting. Modern embedding models can match a question in English to source material in Gujarati or Turkish, because the embeddings represent meaning, not words.
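A minimal sketch of the matching step, using cosine similarity over pre-computed vectors. The three-dimensional vectors here are toy stand-ins; a real system would get them from a multilingual embedding model, and the chunk IDs are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=3):
    """index: list of (chunk_id, vector) pairs, embedded ahead of time."""
    scored = sorted(index, key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in scored[:top_k]]

# Toy vectors standing in for real embeddings
index = [
    ("oh-042-seg-07", [0.9, 0.1, 0.0]),   # textile work segment
    ("oh-015-seg-02", [0.1, 0.9, 0.0]),   # housing segment
    ("photo-1131",    [0.8, 0.2, 0.1]),   # loom photograph caption
]
query = [0.85, 0.15, 0.05]  # embedded question about textile workers
results = retrieve(query, index, top_k=2)  # ['oh-042-seg-07', 'photo-1131']
```

Because the query and the chunks live in the same embedding space, the same comparison works whether the source text is in English, Gujarati, or Turkish, provided the embedding model is multilingual.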
The generation layer takes the retrieved chunks and synthesises a response. This is where citation pipelines matter. Every claim in the generated response must trace back to a specific source: a named speaker, a dated recording, a timestamped moment. The system should surface its sources alongside the answer, not bury them in footnotes. For cultural heritage AI, provenance isn't optional; it's the entire basis of trust.
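One common way to enforce this is to number each retrieved chunk, pass the provenance into the prompt, and instruct the model to cite by number. This is a sketch under assumed field names, not a specific product's API; the chunk dictionaries mirror the metadata described above:

```python
def build_cited_prompt(question, chunks):
    """Assemble a prompt where every source carries its provenance,
    so generated claims can cite a numbered, attributable source."""
    blocks = []
    for i, c in enumerate(chunks, 1):
        label = f"[{i}] {c['speaker']}, recorded {c['date']}, at {c['timestamp']}"
        blocks.append(f"{label}\n{c['text']}")
    context = "\n\n".join(blocks)
    return (
        "Answer using ONLY the sources below. "
        "Mark every claim with its [n] source number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_cited_prompt(
    "What was life like for textile workers in the 1970s?",
    [{"speaker": "Interviewee A", "date": "1998-03-14",
      "timestamp": "00:41:20",
      "text": "We worked the looms from seven in the morning..."}],
)
```

The numbered labels travel with the model's answer, so the interface can render each citation as a link back to the named speaker and the timestamped moment in the recording rather than as an unverifiable footnote.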