How It Works
A RAG system for cultural archives has three layers, and getting each one right matters.
The knowledge base is the foundation. This isn't just digitised materials with metadata tags. It's a structured representation of meaning: what connects to what, which oral history relates to which photograph, which community member appears across which documents. The richer the semantic relationships, the better the system answers real questions.
In practice, this means chunking source material thoughtfully. A two-hour oral history interview can't be treated as a single document. The system needs it broken into meaningful segments (by topic, by time period, by speaker) so it can retrieve the right excerpt for a given question. Too large and the response loses focus. Too small and it loses context. For most cultural archives, chunks of 300 to 500 words with overlapping context windows work well as a starting point.
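The overlapping-window approach above can be sketched in a few lines. The chunk size and overlap values are the starting points from the text, not tuned parameters:

```python
def chunk_words(words, chunk_size=400, overlap=50):
    """Split a word list into chunks of chunk_size words,
    each sharing `overlap` words with the previous chunk."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Stand-in for a transcript; a real pipeline would also split
# on topic or speaker boundaries, not just word counts.
transcript_words = [f"word{i}" for i in range(1000)]
chunks = chunk_words(transcript_words)
```

In practice you would refine the boundaries so chunks never cut across a speaker change or topic shift; the overlap exists so that context straddling a boundary survives in at least one chunk.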
The retrieval layer uses vector embeddings to match questions to relevant chunks. When someone asks "what was life like for textile workers in Tottenham in the 1970s?", the system converts that question into a mathematical representation and finds the chunks closest in meaning. This is where multilingual collections get interesting. Modern embedding models can match a question in English to source material in Gujarati or Turkish, because the embeddings represent meaning, not words.
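A minimal sketch of the matching step, using cosine similarity over pre-computed vectors. The three-dimensional vectors here are toy stand-ins; a real system would get them from a multilingual embedding model, and the chunk IDs are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=3):
    """index: list of (chunk_id, vector) pairs, embedded ahead of time."""
    scored = sorted(index, key=lambda pair: cosine(query_vec, pair[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in scored[:top_k]]

# Toy vectors standing in for real embeddings
index = [
    ("oh-042-seg-07", [0.9, 0.1, 0.0]),   # textile work segment
    ("oh-015-seg-02", [0.1, 0.9, 0.0]),   # housing segment
    ("photo-1131",    [0.8, 0.2, 0.1]),   # loom photograph caption
]
query = [0.85, 0.15, 0.05]  # embedded question about textile workers
results = retrieve(query, index, top_k=2)  # ['oh-042-seg-07', 'photo-1131']
```

Because the query and the chunks live in the same embedding space, the same comparison works whether the source text is in English, Gujarati, or Turkish, provided the embedding model is multilingual.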
The generation layer takes the retrieved chunks and synthesises a response. This is where citation pipelines matter. Every claim in the generated response must trace back to a specific source: a named speaker, a dated recording, a timestamped moment. The system should surface its sources alongside the answer, not bury them in footnotes. For cultural heritage AI, provenance isn't optional; it's the entire basis of trust.
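One common way to enforce this is to number each retrieved chunk, pass the provenance into the prompt, and instruct the model to cite by number. This is a sketch under assumed field names, not a specific product's API; the chunk dictionaries mirror the metadata described above:

```python
def build_cited_prompt(question, chunks):
    """Assemble a prompt where every source carries its provenance,
    so generated claims can cite a numbered, attributable source."""
    blocks = []
    for i, c in enumerate(chunks, 1):
        label = f"[{i}] {c['speaker']}, recorded {c['date']}, at {c['timestamp']}"
        blocks.append(f"{label}\n{c['text']}")
    context = "\n\n".join(blocks)
    return (
        "Answer using ONLY the sources below. "
        "Mark every claim with its [n] source number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_cited_prompt(
    "What was life like for textile workers in the 1970s?",
    [{"speaker": "Interviewee A", "date": "1998-03-14",
      "timestamp": "00:41:20",
      "text": "We worked the looms from seven in the morning..."}],
)
```

The numbered labels travel with the model's answer, so the interface can render each citation as a link back to the named speaker and the timestamped moment in the recording rather than as an unverifiable footnote.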