Long-term project funded by the German Research Foundation

      

Corpus Masoreticum, Heidelberg Center for Jewish Studies

Digital Edition: Our DH Infrastructure

At the Corpus Masoreticum project, sample editions of several figurative Masorah occurrences in MS Vat. ebr. 14 served as a test case showing that Scalable Vector Graphics (SVG) fits the encoding requirements well. SVG is, fundamentally, an XML vocabulary for drawing object primitives such as paths, polygons, or circles on a virtual canvas in an X-Y coordinate system. It is recommended by the World Wide Web Consortium (W3C) as the standard format for two-dimensional vector graphics. Within this specification, the <textPath> element attaches a string of text to a drawn path so that the text follows the outline of the path; the reading direction of the attached text is given by the drawing direction of the path, and this applies to right-to-left scripts in the same way as to Western left-to-right scripts. The following figures show a low-level example of the basic principles of SVG text paths:

<svg width="110%" height="100%" viewBox="0 0 1000 300"
	xmlns="http://www.w3.org/2000/svg"
	xmlns:xlink="http://www.w3.org/1999/xlink">

<!-- The path the text will follow: three cubic Bézier segments drawn from left to right -->
<path id="SamplePath"
	 fill="none" stroke="red"
	d="M 100 200
	C 200 100 300 0 400 100
	C 500 200 600 300 700 200
	C 800 100 900 100 900 100" />

<!-- The text is attached to the path by referencing the path's id -->
<text font-family="Verdana" font-size="94">
	<textPath xlink:href="#SamplePath">
	Corpus Masoreticum
	</textPath>
</text>
</svg>
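
Once many text paths have to be produced, the same markup could also be generated programmatically. The following Python sketch rebuilds the example above using only the standard library. The function name `build_textpath_svg`, its parameters, and the constant `SAMPLE_PATH` are illustrative assumptions and not part of the project's code base.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Serialise SVG as the default namespace and XLink with the usual "xlink" prefix.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

# Path data copied from the example above.
SAMPLE_PATH = (
    "M 100 200 "
    "C 200 100 300 0 400 100 "
    "C 500 200 600 300 700 200 "
    "C 800 100 900 100 900 100"
)


def build_textpath_svg(path_id: str, path_data: str, text: str,
                       font_size: int = 94) -> str:
    """Return an SVG document (as a string) in which `text` follows a path.

    Hypothetical helper for illustration only.
    """
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {"width": "100%", "height": "100%", "viewBox": "0 0 1000 300"},
    )
    # The visible guide path; the text is attached to it via its id.
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}path",
        {"id": path_id, "fill": "none", "stroke": "red", "d": path_data},
    )
    text_el = ET.SubElement(
        svg,
        f"{{{SVG_NS}}}text",
        {"font-family": "Verdana", "font-size": str(font_size)},
    )
    text_path = ET.SubElement(
        text_el,
        f"{{{SVG_NS}}}textPath",
        {f"{{{XLINK_NS}}}href": f"#{path_id}"},
    )
    text_path.text = text
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(build_textpath_svg("SamplePath", SAMPLE_PATH, "Corpus Masoreticum"))
```

Keeping the path data and the text as separate arguments means that one transcription string could be attached to different outlines without touching the markup by hand.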

The next figure shows a clipped example of a masora figurata depicted in Codex London, British Library Or. 2091, folio 203r, the “Knight’s Head”:

The SVG approach proposed here provides two major benefits. Firstly, SVG figures can easily be embedded in the TEI creation and export process, because SVG is itself an XML dialect and remains compatible with the framework’s schema rules. Secondly, unlike static bitmap imagery, SVG vector graphics allow for interactive data visualizations of the editing results using only standard web technologies such as CSS styles and animations. This provides multiple visual aids that help users of the digital edition read the edited scripts, especially the micrographic renderings of masora figurata content.
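
To illustrate the first benefit, the following Python sketch wraps an SVG drawing in a TEI `<figure>` element. It assumes that the project's TEI schema is customised to admit elements from the SVG namespace inside `<figure>`; the helper name `embed_svg_in_tei_figure` and the CSS class `masora-figurata` are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
SVG_NS = "http://www.w3.org/2000/svg"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# TEI becomes the default namespace; SVG elements get an explicit "svg" prefix.
ET.register_namespace("", TEI_NS)
ET.register_namespace("svg", SVG_NS)


def embed_svg_in_tei_figure(svg_source: str, figure_id: str,
                            description: str) -> ET.Element:
    """Wrap an SVG document in a TEI <figure> with a <figDesc>.

    Hypothetical helper; assumes a TEI customisation that allows SVG content.
    """
    svg_root = ET.fromstring(svg_source)
    if svg_root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("expected an <svg> root element in the SVG namespace")

    # A class attribute gives CSS rules and animations a hook (second benefit).
    svg_root.set("class", "masora-figurata")

    figure = ET.Element(f"{{{TEI_NS}}}figure", {f"{{{XML_NS}}}id": figure_id})
    fig_desc = ET.SubElement(figure, f"{{{TEI_NS}}}figDesc")
    fig_desc.text = description
    figure.append(svg_root)
    return figure


if __name__ == "__main__":
    sample_svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1000 300">'
        '<path id="SamplePath" fill="none" d="M 100 200 C 200 100 300 0 400 100"/>'
        "</svg>"
    )
    figure = embed_svg_in_tei_figure(
        sample_svg, "fig-sample", "Sample text path (placeholder description)"
    )
    print(ET.tostring(figure, encoding="unicode"))
```

Because both vocabularies are plain XML, the embedded drawing stays addressable by standard tools such as XPath on the TEI side and CSS selectors on the web side.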

Once editorial concepts, digital resources, data models, and technological components have been evaluated and settled, the overall architecture of the entire manuscript edition as a DH application has to be designed and implemented. As is recommended for modern and scalable software environments, the architecture should be kept as modular as possible, keeping in mind that several different exchange protocols, application programming interfaces (APIs), document formats, and data storage concepts will be involved.

The primary data models for manuscript transcriptions, annotations, and figurative text will be designed as data graphs, using a Neo4j graph database server as the main analytics and storage back end. However, this might not be considered best practice from an open-access and sustainability perspective: the technology should be classified as “proprietary” software, which is not institutionally guaranteed to remain functional or to be maintained long-term by an open-source community, as institutional research data archives usually require. Furthermore, proprietary storage and query protocols such as Neo4j/Cypher do not comply with current DH archival standards. As a consequence, a separate export layer has to be implemented that dynamically aggregates the knowledge-graph data and breaks it down into standard-compliant, reusable TEI documents. These documents contain documentary transcripts, textual transcripts, and figurative text transcripts as embedded SVG markup. Supplementary RDF exports provide users with knowledge-graph representations of the edition’s metadata.
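
The core idea of such an export layer can be sketched in a few lines of Python. The example below does not talk to Neo4j at all: it assumes that a query has already returned token nodes and their `NEXT` relationships, and it models them as plain dictionaries. The graph shape, the `Token` class, the function names, and the choice of TEI `<ab>` and `<w>` elements are assumptions made for illustration, not the project's actual data model.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


@dataclass(frozen=True)
class Token:
    """One transcribed word, as it might be stored in a graph node (hypothetical)."""
    node_id: str
    text: str


def linearise(tokens: dict[str, Token], next_edges: dict[str, str],
              start: str) -> list[Token]:
    """Follow NEXT edges from `start` and return the tokens in reading order."""
    ordered: list[Token] = []
    seen: set[str] = set()
    current: str | None = start
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at node {current!r}")
        seen.add(current)
        ordered.append(tokens[current])
        current = next_edges.get(current)  # None ends the chain
    return ordered


def export_tei_block(ordered: list[Token], block_id: str) -> ET.Element:
    """Serialise an ordered token list as a TEI <ab> containing <w> elements."""
    ab = ET.Element(f"{{{TEI_NS}}}ab", {"n": block_id})
    for token in ordered:
        w = ET.SubElement(ab, f"{{{TEI_NS}}}w", {"n": token.node_id})
        w.text = token.text
    return ab


if __name__ == "__main__":
    # Placeholder data standing in for a graph query result.
    tokens = {
        "t1": Token("t1", "bereshit"),
        "t2": Token("t2", "bara"),
        "t3": Token("t3", "elohim"),
    }
    next_edges = {"t1": "t2", "t2": "t3"}
    block = export_tei_block(linearise(tokens, next_edges, "t1"), "sample-1")
    print(ET.tostring(block, encoding="unicode"))
```

Separating the traversal (`linearise`) from the serialisation (`export_tei_block`) keeps the graph-specific part small, so the storage back end could be exchanged without rewriting the TEI output.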

Since the underlying digital manuscript facsimiles will be ingested via IIIF-enabled services, it is a reasonable add-on to also deliver supplementary IIIF Presentation API manifests with basic annotations (transcriptions, translations, scholarly notes) as a service. Server technologies that implement these specifications will be required accordingly. The outline of a digital edition framework for Corpus Masoreticum could be summarised as follows:

IIIF-compliant API for image resource ingest;

An IIIF-compliant backup server for hosting intermediary image resources and creating enhanced resource manifests (Bodleian Digital Manuscripts Toolkit: https://t1p.de/8ggr; IIP Image Server: https://t1p.de/1zlf);

Neo4j graph database (https://t1p.de/zl27) as storage and analytics back end;

XML document server for storage and export of TEI XML documents and RDF resources (eXist-db [https://t1p.de/sts8] with TEI Publisher as an add-on [https://t1p.de/tbqt]);

Analytics server for text mining and explorative statistics (RapidMiner: https://t1p.de/mu0p);

Document servers for holding additional resources (Fedora Commons: https://t1p.de/svxo; Nextcloud: https://t1p.de/jo2y);

Service APIs for basic communication and data exchange with user clients/web browser applications and external services.
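
As an illustration of the supplementary manifests with basic annotations mentioned above, the following Python sketch assembles a one-canvas manifest in the shape defined by the IIIF Presentation API 3.0, with the facsimile delivered through a IIIF Image API 3.0 service. All URLs point to `example.org` and are placeholders; the function name `build_manifest`, its parameters, and the sample note are assumptions for illustration and do not describe the project's actual services.

```python
import json

PRESENTATION_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def build_manifest(base_url, label, image_service, width, height, notes):
    """Build a one-canvas IIIF Presentation 3.0 manifest as a plain dict.

    `notes` is a list of ((x, y, w, h), text) pairs: a pixel region on the
    canvas and the scholarly note attached to it. Hypothetical helper.
    """
    canvas_id = f"{base_url}/canvas/1"

    # The facsimile image, painted onto the whole canvas.
    painting = {
        "id": f"{base_url}/annotation/painting/1",
        "type": "Annotation",
        "motivation": "painting",
        "body": {
            "id": f"{image_service}/full/max/0/default.jpg",
            "type": "Image",
            "format": "image/jpeg",
            "width": width,
            "height": height,
            "service": [
                {"id": image_service, "type": "ImageService3", "profile": "level1"}
            ],
        },
        "target": canvas_id,
    }

    # Basic annotations, each targeting a rectangular region of the canvas.
    comments = [
        {
            "id": f"{base_url}/annotation/note/{index}",
            "type": "Annotation",
            "motivation": "commenting",
            "body": {
                "type": "TextualBody",
                "value": text,
                "format": "text/plain",
                "language": "en",
            },
            "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
        }
        for index, ((x, y, w, h), text) in enumerate(notes, start=1)
    ]

    canvas = {
        "id": canvas_id,
        "type": "Canvas",
        "width": width,
        "height": height,
        "items": [
            {"id": f"{base_url}/page/painting/1", "type": "AnnotationPage",
             "items": [painting]}
        ],
        "annotations": [
            {"id": f"{base_url}/page/notes/1", "type": "AnnotationPage",
             "items": comments}
        ],
    }

    return {
        "@context": PRESENTATION_CONTEXT,
        "id": f"{base_url}/manifest.json",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [canvas],
    }


if __name__ == "__main__":
    manifest = build_manifest(
        base_url="https://example.org/iiif/sample-folio",
        label="Sample folio (placeholder)",
        image_service="https://example.org/images/sample-folio",
        width=3000,
        height=4000,
        notes=[((400, 600, 900, 700), "Placeholder note on a figurative element")],
    )
    print(json.dumps(manifest, indent=2))
```

Transcriptions and translations could be delivered the same way, as further annotations on the same canvas, so that any IIIF-aware viewer can display them next to the facsimile.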

Figure: Architecture of the digital edition framework. Components shown: Client/UI (Angular); REST API (PHP); XML server (eXist-db); RDBMS (MySQL); graph database (Neo4j); image server (IIP/IIIF); file server/cloud (PHP/Nextcloud); cloud server (Heicloud); analytics server (RapidMiner); external services/APIs.