INCEpTION - Cited Loci

Cited Loci

Annotate references to classical authors and their works.

Source: This example was kindly contributed by Matteo Romanello, Digital Humanities Laboratory, EPFL, Switzerland

The Cited Loci project was developed to index canonical references found in existing publications, whether born-digital or digitized. Technically, the task is framed as a named entity recognition and linking problem. The project focuses in particular on references to classical authors and their works.

In this project, INCEpTION was used to manually correct references that had been automatically extracted and disambiguated, in particular in two contexts:

  • Epische Bauformen project: the index of cited passages for the three-volume printed publication is produced by extracting references automatically; student assistants then correct them manually in INCEpTION
  • Center for Hellenic Studies (CHS) at Harvard: summer interns have been creating a gold-standard dataset of book chapters (from CHS’ open-access series) annotated with canonical references and named entities

The data involved in these contexts was the following:

  • book chapters and journal articles in plain-text format
  • legacy data in CoNLL and brat standoff formats, converted into UIMA XMI
  • the HuCit knowledge base of classical authors and works, accessible through a SPARQL API
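Converting the legacy CoNLL data listed above essentially means turning token-level IOB tags into character-offset spans, which is what standoff representations such as brat and UIMA XMI store. The following is a minimal sketch, not the converter actually used in the project. It assumes a hypothetical tab-separated layout with the token in the first column and the IOB tag in the last, with blank lines between sentences. The entity labels (`AUTHOR`, `WORK`, `SCOPE`) and the helper names (`Span`, `read_sentences`, `iob_to_spans`) are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Span:
    begin: int  # offset of the first character (inclusive)
    end: int    # offset after the last character (exclusive)
    label: str  # entity type taken from the B-/I- tag
    text: str   # covered text, kept for easy inspection


def read_sentences(conll: str) -> list[list[tuple[str, str]]]:
    """Split CoNLL-style lines into sentences of (token, tag) pairs.

    Hypothetical layout: tab-separated columns, token first, IOB tag last;
    blank lines separate sentences.
    """
    sentences, buf = [], []
    for line in conll.splitlines():
        if not line.strip():
            if buf:
                sentences.append(buf)
                buf = []
            continue
        cols = line.split("\t")
        buf.append((cols[0], cols[-1]))
    if buf:
        sentences.append(buf)
    return sentences


def iob_to_spans(conll: str) -> tuple[str, list[Span]]:
    """Rebuild a plain text and collect one standoff span per IOB entity.

    Tokens are joined with single spaces and sentences with newlines, so the
    offsets refer to this reconstructed text, not to any original layout.
    """
    text = ""
    raw: list[tuple[int, int, str]] = []
    for s_idx, sentence in enumerate(read_sentences(conll)):
        if s_idx > 0:
            text += "\n"
        open_entity = None  # (begin, end, label) of the entity being built
        for t_idx, (token, tag) in enumerate(sentence):
            if t_idx > 0:
                text += " "
            begin = len(text)
            text += token
            end = len(text)
            # An I- tag continuing an open entity of the same type extends it.
            if tag.startswith("I-") and open_entity and open_entity[2] == tag[2:]:
                open_entity = (open_entity[0], end, open_entity[2])
                continue
            if open_entity:
                raw.append(open_entity)
                open_entity = None
            if tag != "O":
                # B- (or a stray I- without a matching predecessor) opens a new entity.
                open_entity = (begin, end, tag[2:])
        if open_entity:  # entities never cross sentence boundaries
            raw.append(open_entity)
    return text, [Span(b, e, label, text[b:e]) for b, e, label in raw]


if __name__ == "__main__":
    sample = (
        "see\tO\nHomer\tB-AUTHOR\nIliad\tB-WORK\n1.1\tB-SCOPE\n"
        "\n"
        "and\tO\nApollonius\tB-AUTHOR\nRhodius\tI-AUTHOR"
    )
    text, spans = iob_to_spans(sample)
    for span in spans:
        print(span.begin, span.end, span.label, repr(span.text))
```

Because the tokens are re-joined with single spaces, the offsets match the original document only if its whitespace happens to be reconstructed the same way. A production converter would align the tokens against the source text instead of rebuilding it.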

The overall workflow of the annotation project consisted of the following steps:

  • the output of the automatic processing by the Cited Loci software was imported into INCEpTION
  • student assistants and interns manually verified all annotations

To complete these tasks, we made particular use of the following features of the INCEpTION platform:

  • the ability to define custom layers for specialized entities and relations
  • the knowledge base integration: entities are linked to the corresponding author/work instance in the KB
  • the active learning support to speed up the annotation process
  • the monitoring features to manage the annotation process