PASTFORWARD.AI

Applications

Five places where AI can genuinely help

These five application areas were chosen because they answer real needs in the daily work of a heritage institute: finding images, querying documents, enriching metadata, crossing language barriers and reading difficult texts. Each one becomes a proof of concept built on KIK-IRPA's own collections.

Application 02

Document Q&A with retrieval-augmented generation

Decades of conservation reports, research papers and intervention files contain answers that are practically impossible to find unless you already know which document to open.

What we are testing. Retrieval-augmented generation, or RAG. A plain-language question is turned into a search across a document database, the most relevant passages are retrieved, and a large language model formulates an answer grounded in those passages. We pay particular attention to chunking and indexing strategies for heterogeneous historical documents, and to summarisation quality.
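
To make the retrieval step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's pipeline: the chunk sizes, the word-count similarity (a stand-in for a real embedding model) and the two example documents are all invented, and no language model is actually called. The sketch only assembles the prompt such a model would receive.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows (size and overlap are in words)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents):
    """documents: dict of source id -> full text. Returns (source, chunk, vector) tuples."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text):
            index.append((source, chunk, Counter(tokenize(chunk))))
    return index


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the text a language model would receive; no model is called here."""
    context = "\n".join(f"[{source}] {chunk}" for source, chunk in passages)
    return (
        "Answer using only the passages below and cite the source ids.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Invented example documents, not real KIK-IRPA files.
documents = {
    "report-A": "The panel painting was cleaned and the varnish was removed during treatment.",
    "report-B": "The stone sculpture was consolidated after frost damage to the base.",
}
index = build_index(documents)
question = "Which treatment did the panel painting undergo?"
passages = retrieve(index, question, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

Keeping the source id attached to every chunk is what lets the final answer point back to the document it came from. The overlap between chunks is one simple way to avoid cutting a sentence off from its context.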

Why it matters. Researchers could ask "which treatments has this panel painting undergone since the 1950s?" and get an answer with sources, instead of spending a week in the archives.

Application 03

Metadata enrichment through image recognition

Many digitised images have only minimal descriptions. Whatever is not described is, for most practical purposes, invisible.

What we are testing. Computer vision models that detect and recognise what is actually in an image: objects, scenes, materials, regions of interest, colour palettes and more. The suggested descriptions and tags can then enrich the catalogue records and make collections searchable in far more depth.

Why it matters. Better metadata improves findability for everyone, from researchers to the general public browsing BALaT. One principle is fixed: a human validates the suggestions before anything goes into production.
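
The validation principle can be sketched as a small data structure in which model suggestions and published tags are kept strictly apart. Everything here is hypothetical: the class names, the confidence threshold and the example tags are assumptions made for illustration, and the vision model itself is represented only by its output.

```python
from dataclasses import dataclass, field


@dataclass
class TagSuggestion:
    tag: str
    confidence: float          # score reported by the (hypothetical) vision model
    status: str = "pending"    # pending -> approved or rejected, set by a person


@dataclass
class CatalogueRecord:
    record_id: str
    description: str
    tags: list = field(default_factory=list)          # published, validated tags
    suggestions: list = field(default_factory=list)   # model output awaiting review


def add_suggestions(record, model_output, min_confidence=0.5):
    """Queue model suggestions for review; low-confidence ones are not queued at all."""
    for tag, confidence in model_output:
        if confidence >= min_confidence and tag not in record.tags:
            record.suggestions.append(TagSuggestion(tag, confidence))


def review(record, decisions):
    """Apply a reviewer's decisions (tag -> True/False); only approved tags are published."""
    for suggestion in record.suggestions:
        if suggestion.status != "pending" or suggestion.tag not in decisions:
            continue
        if decisions[suggestion.tag]:
            suggestion.status = "approved"
            if suggestion.tag not in record.tags:
                record.tags.append(suggestion.tag)
        else:
            suggestion.status = "rejected"


# Invented record and model output, for illustration only.
record = CatalogueRecord("demo-001", "Panel painting")
add_suggestions(record, [("landscape", 0.91), ("oil paint", 0.74), ("dog", 0.32)])
review(record, {"landscape": True, "oil paint": False})
print(record.tags)
```

Nothing reaches `record.tags` except through `review`, so a suggestion that nobody has looked at stays out of the published record.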

Application 04

Contextual translation

Belgian heritage records live in Dutch and French, and increasingly need to be available in English too. Generic machine translation stumbles over art-historical terminology and often renders the same term differently from one record to the next.

What we are testing. Translation that uses the complete catalogue record as context, rather than translating field by field in isolation. We also investigate how controlled heritage vocabularies can supply the preferred terms, so that translations stay consistent with the thesauri the sector already relies on.
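
As a rough sketch of what "record as context" could look like in practice, the Python below builds a translation request for one field that carries the whole record plus the preferred terms found in it. The mini-vocabulary, the field names and the prompt wording are invented for this example. A real thesaurus would be loaded from the institution's own source, and the request would then be passed to whichever translation model is under test.

```python
import re

# Invented mini-vocabulary (Dutch -> preferred English term), for illustration only.
PREFERRED_TERMS = {
    "paneelschildering": "panel painting",
    "eikenhout": "oak",
    "olieverf": "oil paint",
}


def find_terms(record, vocabulary):
    """Return the vocabulary entries that occur anywhere in the record's fields.

    This sketch only matches single-word terms.
    """
    text = " ".join(record.values()).lower()
    words = set(re.findall(r"\w+", text))
    return {src: tgt for src, tgt in vocabulary.items() if src in words}


def build_translation_request(record, field_name, vocabulary, target_language="English"):
    """Assemble a prompt for one field that carries the whole record as context."""
    glossary = find_terms(record, vocabulary)
    lines = [f"Translate the field '{field_name}' into {target_language}."]
    lines.append("Full record, for context only:")
    lines.extend(f"  {name}: {value}" for name, value in record.items())
    if glossary:
        lines.append("Use these preferred terms:")
        lines.extend(f"  {src} -> {tgt}" for src, tgt in sorted(glossary.items()))
    lines.append(f"Text to translate: {record[field_name]}")
    return "\n".join(lines)


# Invented catalogue record with hypothetical field names.
record = {
    "titel": "Madonna met kind",
    "objecttype": "paneelschildering",
    "materiaal": "olieverf op eikenhout",
}
request = build_translation_request(record, "materiaal", PREFERRED_TERMS)
print(request)
```

Because the glossary is looked up across the whole record, the object type is visible while the material field is translated, which is the kind of context a field-by-field approach would lack.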

Why it matters. Multilingual records make collections accessible across Belgium's language communities and to international researchers, without an army of translators.

Application 05

Advanced OCR

Standard OCR does reasonably well on clean modern print. It does much worse on the material heritage institutes actually have: aged documents, complex layouts, tables, diagrams and embedded images.

What we are testing. Two complementary routes. The first uses specialised models that read a document as a whole and can even describe the diagrams and images it contains. The second is more traditional: AI post-processing corrects the mistakes of standard OCR engines to lift the quality of the final text.
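
The second route can be illustrated with a deliberately simple stand-in. The sketch below does not use an AI model. It corrects OCR output against a small lexicon with fuzzy string matching from Python's standard `difflib` module, which is enough to show the shape of a post-processing step. The lexicon, the similarity cutoff and the sample sentence are assumptions made for the example.

```python
import difflib
import re

# Invented lexicon; in practice it could come from a controlled vocabulary
# or from text that has already been checked.
LEXICON = ["conservation", "varnish", "panel", "painting", "restoration", "report"]


def correct_token(token, lexicon, cutoff=0.8):
    """Replace a token with its closest lexicon entry if it is similar enough."""
    lowered = token.lower()
    if lowered in lexicon:
        return token
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    if not matches:
        return token  # nothing close enough: leave the OCR output as it is
    best = matches[0]
    return best.capitalize() if token[0].isupper() else best


def correct_text(text, lexicon, cutoff=0.8):
    """Correct tokens of five or more characters; leave everything else untouched."""
    def fix(match):
        return correct_token(match.group(0), lexicon, cutoff)
    return re.sub(r"\w{5,}", fix, text)


# Invented OCR output with typical digit-for-letter confusions.
raw = "Conservat1on rep0rt: the varn1sh on the paint1ng was removed."
corrected = correct_text(raw, LEXICON)
print(corrected)
```

A lexicon-based corrector like this can also "fix" words that were right, for instance a valid word that merely resembles a lexicon entry. That risk is one reason to keep the cutoff conservative and to compare the corrected text against the original scan.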

Why it matters. Reliable text extraction is the gateway to everything else: full-text search, translation, and feeding documents into the RAG pipeline above.

fig. 1: Advanced OCR, contextual translation and retrieval-augmented generation, as sketched in the project proposal. The diagrams show how AI processes documents, translates records with context, and answers questions from a document database.
fig. 2: Metadata enhancement and reverse image search. The diagrams show AI recognising elements in a painting and matching similar images through a vector database.
Proofs of concept, on purpose. These applications are built to test feasibility on real data, not as finished products. Some of our datasets are confidential, so the models run locally on the institute's own hardware. What we learn, including what does not work, is documented and shared.