Workflow

The following subsections describe the development phases of the Knowledge site. First, the XML/TEI P5 markup model is described. Then, the conversion of the XML/TEI P5 documents into HTML documents is illustrated. Finally, the process of converting selected information contained in the HTML documents into RDF statements is defined.

Markup model

The letters have been marked up in XML/TEI P5 format. The markup elements (people, codices, technical lexicon and citations) represent the access indexes to the letters. The following information has been encoded:

  1. the diplomatic transcription: all normalization is handled within the markup, so that the original text is preserved;
  2. the primary source representation (e.g. phenomena such as abbreviations, additions, deletions, corrections, conjectures, gaps, damages, hands, illuminations and rubrications);
  3. the structure, formulae (e.g. salutatio, datatio, etc.) and content of the letters, including paratextual information (e.g. sender and addressee), technical terms, attested names of entities (e.g. people, places, etc.) and philological notes. In particular:
    • the technical lexicon (either created or extracted from some other entities), which is organized in different categories (writing, binding, illumination, format or support);
    • the names, which have been normalized according to the Virtual International Authority File (VIAF) and the recommendations of other vocabularies, in order to ensure interchange between authority descriptions. Attested forms have been used whenever the normalized form could not be found in existing repertoires;
    • the citations, which have been listed in order of appearance and with their respective references;
  4. the description of codices and other supporting resources, accompanied by normalized forms of the author’s name and the title, as well as the shelfmark, the codicological description, the digital images and the people who possessed them;
  5. the edition metadata, such as the identifier, paratextual information, repository information, interpretative information and commentaries about people, places, codices, events and technical lexicon.
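
As a rough illustration of how the markup keeps the original reading and its normalization side by side, the following Python sketch parses a small TEI fragment. The elements used (choice, abbr, expan, persName, term) are standard TEI P5 elements, but the fragment itself, its wording, the @ref value and the @type value are assumptions made for this example and are not taken from the edition.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
T = "{%s}" % TEI_NS

# Hypothetical TEI fragment: the text, the @ref and the @type values are invented.
TEI_FRAGMENT = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    'Sent by <persName ref="#person-a">Name</persName> with a '
    '<choice><abbr>ms.</abbr><expan>manuscript</expan></choice>'
    ' on <term type="support">parchment</term>.'
    '</p>'
)

def reading(element, prefer):
    """Flatten an element to plain text, keeping only the `prefer`
    child ("abbr" or "expan") of every <choice> element."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == T + "choice":
            chosen = child.find(T + prefer)
            if chosen is not None:
                parts.append(reading(chosen, prefer))
        else:
            parts.append(reading(child, prefer))
        # Text that follows the child element belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(TEI_FRAGMENT)
diplomatic = reading(root, "abbr")    # original (abbreviated) reading
normalized = reading(root, "expan")   # expanded reading
people = [p.get("ref") for p in root.iter(T + "persName")]
terms = [(t.get("type"), t.text) for t in root.iter(T + "term")]

print(diplomatic)
print(normalized)
print(people, terms)
```

Because both readings live in the same document, the diplomatic text and the access indexes (people, technical terms) can be derived from a single source without altering the transcription.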

HTML conversion

A series of HTML documents has been created from the letters marked up in XML/TEI P5. The HTML documents follow responsive design principles that ensure their readability on multiple platforms and devices.

Each HTML document represents a single letter and consists of a div element, identified by an @id attribute of the form “letter-[number]”, which contains:

  1. a series of metadata related to information about sending, provenance, edition and visualization of the letter;
  2. the actual letter, treated as an article element with a @class attribute “Letter” and identified by an @id attribute of the form “tomasi-letter-[number]”. The letter contains a number of section elements, each representing a section of the letter. Each section is specified by the @class attribute (e.g. “Salutatio”, “MainText”, “Datatio”) and is identified by an @id attribute of the form “tomasi-letter-[number]-[class]”. The text of each section is divided into paragraphs, each marked by a p element, which in turn contains a varying number of span elements. Each span element encloses and characterizes one of the most relevant elements found in the text (e.g. names, references to codices, technical terms, etc.). A span element has a @class attribute that determines its nature and a @href attribute that links it to the entity it represents. A span element may also have a @title attribute that makes it possible to distinguish between related instances with the same @class attribute. For example, both the sender and the addressee of a letter have “Role” as their class, but the title of the former is “Sender”, while the title of the latter is “Addressee”;
  3. a series of footnotes, each treated as a stand-alone section element with a @class attribute “Footnote” and a @href attribute linking it to the note. As in the letter itself, the text of each section is divided into paragraphs, each marked by a p element, which in turn contains a varying number of span elements. References to other letters are expressed through the a element, by setting the link destination (the other letter) in the @href attribute.
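
The nesting described above can be sketched with a small, hypothetical letter fragment. The letter number, the names and the @href values below are assumptions made for illustration only. The Python snippet parses the fragment with the standard library, which requires the markup to be well-formed XML, and lists each span together with the section it belongs to.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment following the conventions described above.
# The letter number, the names and the href values are invented.
LETTER_HTML = """
<div id="letter-1">
  <article class="Letter" id="tomasi-letter-1">
    <section class="Salutatio" id="tomasi-letter-1-Salutatio">
      <p>
        <span class="Role" title="Sender" href="#person-a">Sender Name</span> to
        <span class="Role" title="Addressee" href="#person-b">Addressee Name</span>
      </p>
    </section>
  </article>
</div>
"""

def list_spans(html_text):
    """Return one (section class, span class, title, href, text) tuple per span."""
    root = ET.fromstring(html_text)
    rows = []
    for section in root.iter("section"):
        for span in section.iter("span"):
            rows.append((
                section.get("class"),
                span.get("class"),
                span.get("title"),
                span.get("href"),
                (span.text or "").strip(),
            ))
    return rows

spans = list_spans(LETTER_HTML)
for row in spans:
    print(row)
```

The two spans share the class “Role” and are told apart only by their @title value, which is the distinction the @title attribute is meant to provide.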

RDF conversion

The Resource Description Framework (RDF) is a data model according to which data is organized in a graph structure. This graph is based on a series of statements about the domain of knowledge. Each statement is structured as a triple, in the form “subject-predicate-object”. In the graph, “subject” and “object” are nodes connected with each other by a directed arc labelled as “predicate”. The subject denotes a resource, the object denotes another resource or an attribute value of the subject, and the predicate expresses a semantic relationship between the two. Subjects and predicates, as well as objects that denote resources, are each identified by a Uniform Resource Identifier (URI), a mechanism that allows resources to be identified unambiguously and persistently; attribute values are expressed as literals.
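
The triple structure can be illustrated with a minimal Python sketch that models a graph as a list of (subject, predicate, object) tuples. This is not an RDF library, only a plain data structure. The rdf:type and FOAF URIs are real; the example.org URIs, the hasSender property and the literal value are hypothetical.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])
Literal = namedtuple("Literal", ["value"])  # attribute value, not a resource

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/"  # hypothetical base URI

person = BASE + "person/a"
letter = BASE + "letter/1"

graph = [
    # object is another resource (a class)
    Triple(person, RDF_TYPE, FOAF + "Person"),
    # object is an attribute value of the subject (a literal)
    Triple(person, FOAF + "name", Literal("Sender Name")),
    # object is another resource; hasSender is a hypothetical property
    Triple(letter, BASE + "vocab/hasSender", person),
]

def outgoing(triples, subject):
    """Arcs leaving a node: statements whose subject is `subject`."""
    return [t for t in triples if t.subject == subject]

def incoming(triples, resource):
    """Arcs arriving at a node: statements whose object is `resource`."""
    return [t for t in triples if t.object == resource]

print(outgoing(graph, person))
print(incoming(graph, person))
```

Following outgoing and incoming arcs in this way is what allows a graph to be explored from any node, for instance from a person to the letters in which that person appears.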

After being harmonized so as to be cohesive, the HTML documents have been converted into RDF/XML documents. RDF/XML is an RDF serialization that expresses an RDF graph as an XML document.

Most of the values held by the @class attributes of span elements have been converted into ontological classes aligned with existing ontologies (e.g. the concept of “Person” has been aligned with the class foaf:Person, and so on). This has been done by using eXtensible Stylesheet Language Transformations (XSLT), a declarative programming language for converting a document in an XML-based format into another document in a different XML-based format. To run an XSLT transformation, two files are needed: the document to convert and an XSLT stylesheet that provides the rules for the transformation. The XSLT stylesheet treats the document to convert as a set of nodes organized in a tree structure, and is made up of a series of templates containing the transformation rules that convert said tree structure into another tree structure.
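
The idea of turning @class values into typed RDF resources can be sketched as follows. Note that this sketch uses Python rather than XSLT, so it illustrates the mapping and not the stylesheet actually used. Only the alignment of “Person” with foaf:Person comes from the text above; the base URI, the input fragment and the “Unmapped” class are assumptions.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF_NS)

# Only the Person -> foaf:Person alignment is taken from the text.
CLASS_TO_TYPE = {"Person": FOAF_NS + "Person"}

BASE_URI = "http://example.org/"  # hypothetical base URI

# Hypothetical input fragment; "Unmapped" stands for a class with no alignment.
HTML_FRAGMENT = (
    '<p><span class="Person" href="#person-a">Name</span> mentions '
    '<span class="Unmapped" href="#x">something</span>.</p>'
)

def spans_to_rdfxml(html_text):
    """Walk the source tree and build an RDF/XML tree with one typed
    rdf:Description per span whose class has a known alignment."""
    source = ET.fromstring(html_text)
    rdf_root = ET.Element("{%s}RDF" % RDF_NS)
    for span in source.iter("span"):
        type_uri = CLASS_TO_TYPE.get(span.get("class"))
        if type_uri is None:
            continue  # no alignment known for this class: skip it
        desc = ET.SubElement(rdf_root, "{%s}Description" % RDF_NS)
        desc.set("{%s}about" % RDF_NS, BASE_URI + span.get("href").lstrip("#"))
        type_el = ET.SubElement(desc, "{%s}type" % RDF_NS)
        type_el.set("{%s}resource" % RDF_NS, type_uri)
    return rdf_root

rdf_root = spans_to_rdfxml(HTML_FRAGMENT)
print(ET.tostring(rdf_root, encoding="unicode"))
```

As in an XSLT template, each rule matches a kind of node in the source tree (here, span elements with a mapped class) and emits the corresponding nodes in the output tree.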

The output RDF/XML data has been collected into a single RDF dataset. The HTML files are still used for visualizing the single letters, while the RDF graph is used to explore the data related to them (e.g. indices, paratextual information, etc.).

Reused models

A series of existing ontologies has been reused for describing the manuscript letters and the entities that carry their contextual information. In particular:

Most of these models are part of the Semantic Publishing and Referencing ontologies (SPAR), a suite of orthogonal, non-overlapping and complementary OWL 2 DL ontology modules for the creation of comprehensive machine-readable RDF metadata covering every aspect of semantic publishing and referencing. The SPAR Ontologies follow the FAIR principles for data publication and reuse existing standards developed for describing bibliographic resources, such as FRBR.

The Functional Requirements for Bibliographic Records standard (FRBR) is a well-known and robust model proposed by the International Federation of Library Associations and Institutions (IFLA) for representing bibliographic resources and metadata. It is a highly adaptable model that is not bound to a specific implementation and can be applied to both physical and digital resources. It describes each resource from four interlinked conceptual points of view, defined by the following categories: Work, Expression, Manifestation and Item.

These concepts are organized in a structure in which each entity is in a specific relationship with the adjacent one, forming a continuous chain from Work to Item and vice versa. This framework offers a holistic perspective on the resource at multiple levels of conceptualization. It breaks down the semantic and conceptual ambiguities of human-made objects into distinct but related and layered concepts, and it makes the description of an artifact and of its relations with other entities more expressive, precise and dynamic.

An example of how FRBR has been used in the RDF graph of the Knowledge site is illustrated in the figure below.

The notion of the digital edition of Vespasiano da Bisticci’s Letters published by Francesca Tomasi has been broken down into multiple concepts distributed across the four FRBR levels of conceptualization. At the Work level, there is the abstract idea of the edition. At the Expression level, there is the content of the edition. At the Manifestation level, there is the edition in its specified form. At the Item level, there is the specific digital edition as it exists on the Web.
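
The layering just described can be sketched as a simple chain of entities in Python. The relationship labels are the ones defined by FRBR (a Work is realized through an Expression, an Expression is embodied in a Manifestation, a Manifestation is exemplified by an Item); the class and variable names are hypothetical, and the sketch does not reproduce the URIs or properties used in the actual RDF graph.

```python
from dataclasses import dataclass
from typing import Optional

# FRBR relationships linking each level to the next, more concrete one.
RELATION_TO_NEXT = {
    "Work": "is realized through",
    "Expression": "is embodied in",
    "Manifestation": "is exemplified by",
}

@dataclass
class FrbrEntity:
    level: str
    description: str
    next_entity: Optional["FrbrEntity"] = None

# Descriptions paraphrase the four levels of the edition given above.
item = FrbrEntity("Item", "the specific digital edition on the Web")
manifestation = FrbrEntity("Manifestation", "the edition in its specified form", item)
expression = FrbrEntity("Expression", "the content of the edition", manifestation)
work = FrbrEntity("Work", "the edition as an abstract idea", expression)

def chain(entity):
    """Follow the links from an entity down to the Item level and
    return (level, relation, next level) for every step."""
    steps = []
    while entity.next_entity is not None:
        steps.append((entity.level, RELATION_TO_NEXT[entity.level],
                      entity.next_entity.level))
        entity = entity.next_entity
    return steps

steps = chain(work)
for level, relation, next_level in steps:
    print(level, relation, next_level)
```

Each step of the chain corresponds to one arc between two nodes of the RDF graph, so the same resource can be queried at whichever level of abstraction is needed.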