A Corpus with Latin Poetry have you: The Lost OLAC Discourse Type


The Open Language Archiving Community (OLAC) and its DC-aligned metadata application profile are widely used by language archives (Bird and Simons 2001; 2003; 2021; Simons and Bird 2003a; 2003b). Genre representation in digital libraries is not a well-defined practice (Dragon 2020). Groundbreaking work in linguistic genre identification led by Johnson and Aristar-Dry (2012) resulted in the OLAC Discourse Types Vocabulary (OLAC-DTV) for cross-disciplinary language resource preservation and discovery. OLAC-DTV contrasts both with the concept of genre in literary studies (e.g., epic, tragedy, comedy) and with the genre-and-form vocabularies often used in bibliographic records (e.g., MARC Genre Terms, Library of Congress Genre/Form Terms). OLAC-DTV is especially useful in applications that require describing corpora or the textual units within corpora.

OLAC-DTV has undergone several revisions (2002-11-21, 2002-12-17, 2003-01-27, 2006-04-06, 2012-02-04 → 2002-11-21), the latest being a reversion to the original proposal because the accompanying XML/XSD had not been maintained in step with the approved text. We maintain that (1) the management process should instead have brought the XML/XSD file into alignment with the approved textual representation, and (2) the term for ‘poetry’, although intentionally included (Aristar-Dry and Sriram 2002), was removed from OLAC-DTV between the 2002-12-17 and 2003-01-27 versions.

Using a digital library (archive) of Latin texts arranged for language teaching (Paterson et al. 2023), we show that poetry is important as a discourse genre and is relevant both to language teaching and to corpus-based analysis. There is wide consensus that Latin poetry and prose have distinct syntactic and other linguistic attributes (Pinkster 2021; Chaudhuri et al. 2019; Ferri 2011; Sciarrino 2011; Gale 2004) and that they should be treated accordingly when making corpus-based claims about the language (Egbert, Biber, and Gray 2022; Biber 1993b; 1993a). We therefore argue that the term ‘poetry’ should be reinstated in OLAC-DTV.

3 Sep, 2024 10:40
Berlin, Germany



