
Folder metadata builder

The folder metadata builder is the main entry point. It scans a folder hierarchy, extracts RDF triples from the knowledge graph for each object and stage, validates the output against SHACL shapes, and generates provenance snapshots.

Usage:

uv run python -m changes_metadata_manager.folder_metadata_builder <root_directory> [options]

| Argument | Required | Description |
|---|---|---|
| root_directory | Yes | Root directory containing the Sala*/Folder/Stage/ hierarchy |

| Option | Default | Description |
|---|---|---|
| --no-validate | False | Skip SHACL validation of the generated metadata. By default, each meta.ttl is validated against the shapes in data/shapes-chadap.ttl. |
| --merge-provenance | None | Output path for a merged provenance file. When set, all individual prov.trig files are combined into a single TriG file at the given path. |
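For readers who drive the builder from their own scripts, the interface in the tables above can be mirrored with argparse. This is a hypothetical sketch, not the module's actual parser: the `build_parser` name, the help strings, and the use of `pathlib.Path` as the argument type are assumptions. It does show where the documented defaults come from: `--no-validate` is a store-true flag (so it is `False` unless given), and `--merge-provenance` defaults to `None`, meaning no merged file is written.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI of folder_metadata_builder."""
    parser = argparse.ArgumentParser(
        prog="folder_metadata_builder",
        description="Generate meta.ttl and prov.trig for every Sala*/Folder/Stage/ folder.",
    )
    # Required positional argument: the root of the Sala*/Folder/Stage/ tree.
    parser.add_argument("root_directory", type=Path)
    # store_true: the attribute is False unless the flag is given.
    parser.add_argument("--no-validate", action="store_true",
                        help="skip SHACL validation against data/shapes-chadap.ttl")
    # Optional output path; None means no merged provenance file is written.
    parser.add_argument("--merge-provenance", type=Path, default=None, metavar="PATH",
                        help="combine all prov.trig files into one TriG file at PATH")
    return parser
```

Parsing `["/data/aldrovandi", "--no-validate"]` with this sketch yields `no_validate=True` and `merge_provenance=None`.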

The builder walks through every folder matching the Sala*/Folder/Stage/ pattern and, for each one:

  1. Extracts the object identifier from the folder name. Folder names follow patterns like S1-01-CNR_CartaNautica, where 01, the number after the sala prefix S1, is the object number (NR). A mapping table (FOLDER_TO_ID) handles non-standard names.

  2. Filters the knowledge graph. The input graph (data/kg.ttl) contains triples for all objects and all processing steps. The builder selects only the triples that belong to the current object and the steps associated with the current stage. Stages are cumulative: dcho includes steps 00, 01, and 02, so its metadata contains everything from raw and rawp as well.

  3. Writes meta.ttl. The filtered triples are serialized as Turtle and written to the stage folder.

  4. Validates against SHACL shapes. Unless --no-validate is passed, the output is checked against data/shapes-chadap.ttl using pyshacl. Validation errors are reported but do not stop the process.

  5. Generates prov.trig. For each subject in the metadata, a provenance snapshot is created as a named graph. The snapshot records who created the entity, when, and from what source. See Architecture for details on the provenance model.
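Step 1 can be sketched as a mapping lookup followed by a regular expression. This is a minimal sketch, assuming standard names always look like `S<sala>-<NR>-<rest>` and that `FOLDER_TO_ID` is consulted before the pattern. The entry shown in the mapping is hypothetical, since the real table's contents are not listed here, and the builder may normalize the NR differently (for example, as an integer rather than the string "01").

```python
import re
from typing import Optional

# Hypothetical entry for illustration; the real FOLDER_TO_ID table in the
# builder maps the project's actual non-standard folder names.
FOLDER_TO_ID: dict[str, str] = {
    "S2-ExampleNonStandardName": "07",
}

# Assumed standard pattern: "S<sala>-<NR>-<rest>", e.g. "S1-01-CNR_CartaNautica".
STANDARD_NAME = re.compile(r"^S\d+-(\d+)-")


def extract_object_nr(folder_name: str) -> Optional[str]:
    """Return the object NR encoded in a folder name, or None if none is found."""
    # Non-standard names are resolved through the explicit mapping first.
    if folder_name in FOLDER_TO_ID:
        return FOLDER_TO_ID[folder_name]
    # Otherwise take the number between the sala prefix and the rest of the name.
    match = STANDARD_NAME.match(folder_name)
    return match.group(1) if match else None
```

With this sketch, `extract_object_nr("S1-01-CNR_CartaNautica")` returns `"01"`, and a name such as `materials` returns `None`.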

Each stage includes triples from one or more processing steps:

| Stage | Steps included | What it contains |
|---|---|---|
| raw | 00 | Original acquisition data |
| rawp | 00, 01 | Raw + initial processing |
| dcho | 00, 01, 02 | Everything up to the refined model |
| dchoo | 00, 01, 02, 03, 04, 05, 06 | Full pipeline, including optimization and metadata authoring |
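The stage table translates naturally into a lookup from stage to steps, which is what the filtering in step 2 needs to decide whether a triple belongs in a given meta.ttl. The sketch below assumes a hypothetical flattened view of data/kg.ttl in which each triple is already tagged with its object NR and step; how the real builder derives that membership from the graph is not described here. The `Record`, `Triple`, and `select_triples` names are illustrative.

```python
from typing import Iterable, NamedTuple

# Steps per stage, transcribed from the stage table.
STAGE_STEPS: dict[str, tuple[str, ...]] = {
    "raw": ("00",),
    "rawp": ("00", "01"),
    "dcho": ("00", "01", "02"),
    "dchoo": ("00", "01", "02", "03", "04", "05", "06"),
}

Triple = tuple[str, str, str]


class Record(NamedTuple):
    """Hypothetical flattened view of one kg.ttl triple with its membership tags."""
    object_nr: str
    step: str
    triple: Triple


def select_triples(records: Iterable[Record], object_nr: str, stage: str) -> list[Triple]:
    """Keep the triples of one object whose step belongs to the given stage."""
    steps = STAGE_STEPS[stage]  # raises KeyError for an unknown stage name
    return [r.triple for r in records if r.object_nr == object_nr and r.step in steps]
```

Because each stage's step list is a superset of the previous one, a dcho selection automatically contains everything a raw or rawp selection would, which is the cumulative behavior described in step 2.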

Some folders are excluded from processing because they do not follow the standard structure:

  • S1-CNR_SoffittoSala1
  • S5-B basso-DICAM_FanoneBalenaAlto
  • materials
  • sala 4
  • _files
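Combining the Sala*/Folder/Stage/ pattern with the exclusion list gives a traversal like the one below. It is a minimal sketch under two assumptions: that stage folders are named exactly raw, rawp, dcho, and dchoo, and that an excluded name is skipped wherever it appears at the sala or object level. The `iter_stage_folders` name is hypothetical, and the real builder may store its exclusions differently.

```python
from pathlib import Path
from typing import Iterator

# Folder names skipped by the builder, as listed above.
EXCLUDED = frozenset({
    "S1-CNR_SoffittoSala1",
    "S5-B basso-DICAM_FanoneBalenaAlto",
    "materials",
    "sala 4",
    "_files",
})

# Assumption: stage folders are named after the stages in the stage table.
STAGES = ("raw", "rawp", "dcho", "dchoo")


def iter_stage_folders(root: Path) -> Iterator[tuple[Path, str]]:
    """Yield (stage_dir, stage_name) for every Sala*/Folder/Stage/ folder under root."""
    for sala in sorted(root.glob("Sala*")):
        if not sala.is_dir() or sala.name in EXCLUDED:
            continue
        for folder in sorted(sala.iterdir()):
            # Skip loose files and any folder on the exclusion list.
            if not folder.is_dir() or folder.name in EXCLUDED:
                continue
            for stage in STAGES:
                stage_dir = folder / stage
                if stage_dir.is_dir():
                    yield stage_dir, stage
```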

Generate metadata for a local folder tree:

uv run python -m changes_metadata_manager.folder_metadata_builder /data/aldrovandi

Skip validation:

uv run python -m changes_metadata_manager.folder_metadata_builder /data/aldrovandi \
--no-validate

Generate everything and also produce a single merged provenance file:

uv run python -m changes_metadata_manager.folder_metadata_builder /data/aldrovandi \
--merge-provenance /data/aldrovandi/provenance_all.trig