A Corpus with Latin Poetry have you: The Lost OLAC Discourse Type


The Open Language Archives Community (OLAC) and its DC-aligned metadata application profile are widely used by language archives (Bird and Simons 2001; 2003; 2021; Simons and Bird 2003a; 2003b). Genre representation in digital libraries is not a well-defined practice (Dragon 2020). Groundbreaking work in linguistic genre identification led by Johnson and Aristar-Dry (2012) resulted in the OLAC Discourse Types Vocabulary (OLAC-DTV) for cross-disciplinary language resource preservation and discovery. OLAC-DTV contrasts with both the concept of genre in literature studies (e.g., epic, tragedy, comedy) and the genre-and-form vocabularies often used within bibliographic records (e.g., MARC Genre Terms, Library of Congress Genre/Form Terms). OLAC-DTV is especially useful in applications that require describing corpora or textual units within corpora.

OLAC-DTV has undergone several revisions (2002-11-21, 2002-12-17, 2003-01-27, 2006-04-06, 2012-02-04 → 2002-11-21); the latest was a reversion to the original proposal because the accompanying XML/XSD had not been maintained in step with the approved text. We maintain two points: first, the management processes should have brought the XML/XSD file into alignment with the approved textual representation; second, after its intentional inclusion (Aristar-Dry and Sriram 2002), the term for ‘poetry’ was removed from OLAC-DTV between the 2002-12-17 and 2003-01-27 versions.

Using a digital library (archive) of Latin texts arranged for language teaching (Paterson et al. 2023), we show that poetry is important as a discourse genre and is relevant to language teaching as well as corpus-based analysis. There is wide consensus that Latin poetry and prose have distinct syntactic and other linguistic attributes (Pinkster 2021; Chaudhuri et al. 2019; Ferri 2011; Sciarrino 2011; Gale 2004) and should be treated accordingly when making corpus-based claims about the language (Egbert, Biber, and Gray 2022; Biber 1993b; 1993a). We therefore argue that the term ‘poetry’ should be reinstated in OLAC-DTV.

3 Sep, 2024 10:40
Berlin, Germany

Video Presentation

Bibliography to the abstract

Aristar-Dry & Sriram (2002). Linguistic Data Types & Discourse Types & Linguistic Fields. Retrieved from http://www.language-archives.org/events/olac02/proceedings.pdf
Biber (1993). Representativeness in Corpus Design. Literary and Linguistic Computing, 8(4). 243–257. https://doi.org/10.1093/llc/8.4.243
Biber (1993). Using Register-Diversified Corpora for General Language Studies. Computational Linguistics, 19(2). 219–241. Retrieved from https://www.aclweb.org/anthology/J93-2001
Bird & Simons (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37(4). 375–388. https://doi.org/10.1023/A:1025720518994
Bird & Simons (2001). The OLAC metadata set and controlled vocabularies. In DeClerck, T., Krauwer, S. & Rosner, M. (Eds.), Proceedings of ACL/EACL Workshop on Sharing Tools and Resources for Research and Education. (pp. 7–18). EACL-ACL; elsnet. Retrieved from https://www.aclweb.org/anthology/W01-1506
Bird & Simons (2021). Towards an Agenda for Open Language Archiving. Proceedings of the International Workshop on Digital Language Archives: LangArc 2021. 25–28. https://doi.org/10.12794/langarc1851171
Chaudhuri, Dasgupta, Dexter & Iyer (2019). A small set of stylometric features differentiates Latin prose and verse. Digital Scholarship in the Humanities, 34(4). 716–729. https://doi.org/10.1093/llc/fqy070
Ferri (2011). The Language of Latin Epic and Lyric Poetry. In Clackson, J. (Ed.), A Companion to the Latin Language. (1, pp. 344–366). Wiley. https://doi.org/10.1002/9781444343397.ch20
Dragon (2020). Form and Genre Access to Academic Library Digital Collections. Journal of Library Metadata, 20(1). 29–49. https://doi.org/10.1080/19386389.2020.1723203
Egbert, Biber & Gray (2022). Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness. Cambridge University Press. https://doi.org/10.1017/9781316584880
Gale, M. (2004). Latin epic and didactic poetry: genre, tradition and individuality. The Classical Press of Wales. https://doi.org/10.2307/j.ctv1n357vk
Johnson & Aristar-Dry (2012). OLAC Discourse Type Vocabulary. Open Language Archive Community. Retrieved from http://www.language-archives.org/REC/discourse.html
(n.d.). Retrieved from //www.loc.gov/aba/publications/FreeLCGFT/
Relator Code and Term List – Term Sequence: MARC 21 Source Codes. Library of Congress. Retrieved from https://www.loc.gov/marc/relators/relaterm.html
Paterson III, Mulligan, Lacy & Guardiola (2023). Bridging Corpora: Creating Learner Pathways Across Texts. NOVA CLUNL, Portugal. Retrieved from https://aclanthology.org/2023.ldk-1.63
Pinkster (2021). The Oxford Latin Syntax: Volume II: The Complex Sentence and Discourse. Oxford University Press.
Sciarrino (2011). Cato the Censor and the beginnings of Latin prose: from poetic translation to elite transcription. Ohio State University Press.
Simons & Bird (2003). Building an Open Language Archives Community on the OAI foundation. Library Hi Tech, 21(2). 210–218. https://doi.org/10.1108/07378830310479848
Simons, Bird & Spanne (2008). OLAC Metadata Usage Guidelines (2008-07-11). Open Language Archive Community. Retrieved from http://www.language-archives.org/NOTE/usage-20080711.html
Simons & Bird (2003). The Open Language Archives Community: An Infrastructure for Distributed Archiving of Language Resources. Literary and Linguistic Computing, 18(2). 117–128. https://doi.org/10.1093/llc/18.2.117
Hugh Paterson III
Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.

Bret Mulligan
Professor of Classics

Latin Professor.