Abstract
The Open Language Archives Community (OLAC) and its DC-aligned metadata application profile are widely used by language archives (Bird and Simons 2001; 2003; 2021; Simons and Bird 2003a; 2003b). Genre representation in digital libraries is not a well-defined practice (Dragon 2020). Groundbreaking work in linguistic genre identification led by Johnson and Aristar-Dry (2012) resulted in the OLAC Discourse Types Vocabulary (OLAC-DTV) for cross-disciplinary language resource preservation and discovery. OLAC-DTV contrasts with both the concept of genre in literary studies (e.g., epic, tragedy, comedy) and the genre-and-form vocabularies often used within bibliographic records (e.g., MARC Genre Terms, Library of Congress Genre/Form Terms). OLAC-DTV is especially useful in applications requiring description of corpora or of textual units within corpora.
OLAC-DTV has undergone several revisions (2002-11-21, 2002-12-17, 2003-01-27, 2006-04-06, 2012-02-04 → 2002-11-21), the latest being a reversion to the original proposal because the accompanying XML/XSD was not maintained in step with the approved text. We maintain that (1) the management processes should have brought the XML/XSD file into alignment with the approved textual representation; and (2) after its intentional inclusion (Aristar-Dry and Sriram 2002), the term for ‘poetry’ was removed from OLAC-DTV between the 2002-12-17 and 2003-01-27 versions.
Using a digital library (archive) of Latin texts arranged for language teaching (Paterson et al. 2023), we show that poetry is important as a discourse genre and is relevant in language teaching as well as in corpus-based analysis. Wide consensus exists that Latin poetry and prose have distinct syntactic and other linguistic attributes (Pinkster 2021; Chaudhuri et al. 2019; Ferri 2011; Sciarrino 2011; Gale 2004) and should be treated appropriately when making corpus-based claims about the language (Egbert, Biber, and Gray 2022; Biber 1993b; 1993a). Therefore, we argue that the term ‘poetry’ should be reinstated in OLAC-DTV.
Video Presentation
Bibliography to the abstract
- Bird, S. & Simons, G. (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37(4). 375–388. https://doi.org/10.1023/A:1025720518994
- Bird, S. & Simons, G. (2001). The OLAC metadata set and controlled vocabularies. In DeClerck, T., Krauwer, S. & Rosner, M. (Eds.), Proceedings of ACL/EACL Workshop on Sharing Tools and Resources for Research and Education (pp. 7–18). EACL-ACL; elsnet. Retrieved from https://www.aclweb.org/anthology/W01-1506
- Bird, S. & Simons, G. (2021). Towards an Agenda for Open Language Archiving. Proceedings of the International Workshop on Digital Language Archives: LangArc 2021. 25–28. https://doi.org/10.12794/langarc1851171
- Chaudhuri, P., Dasgupta, T., Dexter, J. & Iyer, K. (2019). A small set of stylometric features differentiates Latin prose and verse. Digital Scholarship in the Humanities, 34(4). 716–729. https://doi.org/10.1093/llc/fqy070
- Egbert, J., Biber, D. & Gray, B. (2022). Designing and Evaluating Language Corpora: A Practical Framework for Corpus Representativeness. Cambridge University Press. https://doi.org/10.1017/9781316584880
- Paterson III, H., Mulligan, B., Lacy, A. & Guardiola, P. (2023). Bridging Corpora: Creating Learner Pathways Across Texts. NOVA CLUNL, Portugal. Retrieved from https://aclanthology.org/2023.ldk-1.63
- Pinkster, H. (2021). The Oxford Latin Syntax: Volume II: The Complex Sentence and Discourse. Oxford University Press.
- Sciarrino, E. (2011). Cato the Censor and the beginnings of Latin prose: from poetic translation to elite transcription. Ohio State University Press.
- Simons, G. & Bird, S. (2003). The Open Language Archives Community: An Infrastructure for Distributed Archiving of Language Resources. Literary and Linguistic Computing, 18(2). 117–128. https://doi.org/10.1093/llc/18.2.117
Tags:
OLAC
Discourse
Discourse Type Vocabulary
Poetry
Latin
Categories:
Peer Reviewed
Conference
Mentioned Languages:
Latin
Content Mediums:
MovingImage
Text
Event
Collaborative Scholar
I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.
Professor of Classics
Latin Professor.