Architecture
Overview
The CHANGES Metadata Manager sits between a knowledge graph and a set of folders representing digitized cultural heritage objects. It reads RDF triples from the graph, slices them by object and processing stage, writes the result to disk, and optionally packages everything for deposit on Zenodo.
The input knowledge graph (data/kg.ttl) is not produced by this tool. It comes from morph-kgc-changes-metadata, which maps CSV spreadsheets into RDF following the CHAD-AP ontology.
Data flow
    CSV spreadsheets
            |
            v
    morph-kgc-changes-metadata --> data/kg.ttl
                                       |
                                       v
                     folder_metadata_builder (this tool)
                           |                  |
                           v                  v
                       meta.ttl           prov.trig
                      (per stage)        (per stage)
                           |                  |
                           v                  v
                          zenodo_upload.prepare
                                    |
                                    v
                            ZIP + YAML config
                           (per entity+stage)
                                    |
                                    v
                          zenodo_upload.upload --> Zenodo

Modules
folder_metadata_builder
The main orchestrator. It:
- Loads the knowledge graph once into memory
- Walks the folder structure on disk
- For each Sala/Folder/Stage combination, extracts the object NR from the folder name, selects the relevant processing steps, and filters the graph accordingly
- Writes meta.ttl (the filtered triples) and runs SHACL validation
- Calls generate_provenance_snapshots() to produce prov.trig
The folder-to-ID mapping is largely automatic (the NR is parsed from the folder name), but about 200 folders have non-standard names and are handled through a hardcoded lookup table.
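A minimal sketch of this two-tier resolution is shown here. The naming convention, the regular expression, and the table entries are hypothetical; the source does not state the real folder name format or the contents of the lookup table.

```python
import re
from typing import Optional

# Hypothetical naming convention: "S<sala>-<NR>-<label>", e.g. "S1-12-Globe".
STANDARD_NAME = re.compile(r"^S\d+-(?P<nr>\d+[a-z]?)-")

# Hypothetical excerpt of the hardcoded table for non-standard folder names.
NON_STANDARD = {
    "Globe_old_scan": "7",
    "Herbarium (vol. 2)": "12b",
}


def extract_nr(folder_name: str) -> Optional[str]:
    """Return the object NR for a folder, or None if it cannot be resolved."""
    # The lookup table is checked first, so irregular names never reach the regex.
    if folder_name in NON_STANDARD:
        return NON_STANDARD[folder_name]
    match = STANDARD_NAME.match(folder_name)
    if match:
        return match.group("nr")
    return None


resolved = [extract_nr(n) for n in ("S1-12-Globe", "Globe_old_scan", "misc")]
```

Returning `None` for unresolved names lets the caller decide whether to skip the folder or report it.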
generate_provenance
Takes a directory containing RDF files, loads all of them into a single graph, and creates a provenance snapshot for each subject. Every snapshot is a named graph containing:
- prov:specializationOf pointing to the original entity
- prov:generatedAtTime with the current timestamp
- prov:wasAttributedTo linking to the responsible agent
- prov:hadPrimarySource linking to the data source
- dcterms:description with a creation note
The provenance model follows the OpenCitations Data Model, which uses PROV-O named graphs to track entity history.
zenodo_upload
Handles two tasks:
- prepare: walks the same folder structure, creates one ZIP per entity+stage (including data files only when a license exists), and writes a YAML config file for each. Creators are resolved from data/creators_lookup.yaml with roles assigned by step type.
- upload: reads the YAML configs and calls piccione’s InvenioRDM module to create records on Zenodo.
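The license rule in the prepare step can be sketched with the standard library alone. The `build_stage_zip` helper, the `has_license` flag, and the split between metadata files and data files are assumptions for illustration; how the tool actually detects a license is not described here.

```python
import tempfile
import zipfile
from pathlib import Path

# Assumption: these two files are the metadata; everything else counts as data.
METADATA_FILES = {"meta.ttl", "prov.trig"}


def build_stage_zip(stage_dir: Path, zip_path: Path, has_license: bool) -> list:
    """Zip a stage folder; data files are included only when has_license is True.

    zip_path should live outside stage_dir so the archive does not pick itself up.
    """
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(stage_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.name not in METADATA_FILES and not has_license:
                continue  # unlicensed data stays out of the deposit
            name = path.relative_to(stage_dir).as_posix()
            archive.write(path, arcname=name)
            archived.append(name)
    return archived


# Usage with a throwaway folder standing in for one Sala/Folder/Stage directory.
with tempfile.TemporaryDirectory() as tmp:
    stage = Path(tmp) / "stage"
    stage.mkdir()
    for filename in ("meta.ttl", "prov.trig", "scan.tif"):
        (stage / filename).write_text("placeholder")
    without_license = build_stage_zip(stage, Path(tmp) / "a.zip", has_license=False)
    with_license = build_stage_zip(stage, Path(tmp) / "b.zip", has_license=True)
```

With this rule, every record still gets its metadata and provenance even when the underlying data cannot be redistributed.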
Key data files
| File | Purpose |
|---|---|
| data/kg.ttl | Knowledge graph with all RDF triples |
| data/shapes-chadap.ttl | SHACL shapes for metadata validation |
| data/creators_lookup.yaml | Maps creator names from RDF to structured InvenioRDM fields (family name, given name, affiliation, ORCID) |
| zenodo_config.yaml | Base Zenodo record configuration |
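To make the role of data/creators_lookup.yaml concrete, here is a hedged sketch of how a creator name might be turned into an InvenioRDM-style creator entry. The key names in `CREATORS`, the `ROLE_BY_STEP` mapping, the step types, and the placeholder person are hypothetical; the real file layout and role assignment are not documented in this section.

```python
# Hypothetical in-memory form of data/creators_lookup.yaml after parsing.
CREATORS = {
    "Jane Doe": {
        "family_name": "Doe",
        "given_name": "Jane",
        "affiliation": "Example University",
        "orcid": "0000-0000-0000-0000",  # placeholder, not a real ORCID
    },
}

# Hypothetical mapping from processing step type to a creator role id.
ROLE_BY_STEP = {"acquisition": "datacollector", "processing": "datacurator"}


def resolve_creator(name: str, step_type: str) -> dict:
    """Build a creator entry shaped like the InvenioRDM `creators` field."""
    entry = CREATORS[name]  # a KeyError flags a name missing from the lookup
    creator = {
        "person_or_org": {
            "type": "personal",
            "family_name": entry["family_name"],
            "given_name": entry["given_name"],
        },
        "role": {"id": ROLE_BY_STEP.get(step_type, "other")},
    }
    if entry.get("affiliation"):
        creator["affiliations"] = [{"name": entry["affiliation"]}]
    if entry.get("orcid"):
        creator["person_or_org"]["identifiers"] = [
            {"scheme": "orcid", "identifier": entry["orcid"]}
        ]
    return creator


creator = resolve_creator("Jane Doe", "acquisition")
```

Keeping the lookup outside the knowledge graph means a name only has to be spelled consistently in the RDF; the structured fields are maintained in one place.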
SHACL validation
Every meta.ttl file is validated against the CHAD-AP shapes using pyshacl. The validation checks that the metadata conforms to the expected CIDOC-CRM patterns defined by the ontology. Validation failures are reported with details (focus node, path, message) but do not stop the pipeline, since some objects may legitimately have incomplete metadata.