OLAC — Bottom-up and Top-down

OLAC — Bottom-up and Top-down

26 Feb, 2022 6 min read Jot, Research Note

Project

Several remarks in the literature characterize OLAC and its metadata schema. Many of these remarks are by linguists rather than persons from the tradition of information science. Therefore, it is hard to know how to evaluate these comments. For example, as a system the OLAC aggregator serves linguists as a user-group. So, should these remarks be classified as user-feedback, or should these comments be understood in the context of professional critique from information science professionals.

Additionally, there is the matter of perspective bias. By perspective bias I mean the difference between looking at OLAC’s metadata independently from the breadth of Dublin Core, versus looking at the breadth of Dublin Core and looking at how OLAC provides additional augmentation to the power of Dublin Core. In a way this is the difference between a Bottom-Up perspective, and a Top-Down perspective. Contrasting examples of this include (Citation: Austin, 2013) Austin, P. (2013). Language documentation and meta-documentation. In Jones, M. & Ogilvie, S. (Eds.), Keeping Languages Alive: Documentation, Pedagogy and Revitalization. (pp. 3–15). Cambridge University Press. Retrieved from http://ebooks.cambridge.org/ref/id/CBO9781139245890 and (Citation: Bird et al., 2003) Bird, S. & Simons, G. (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37(4). 375–388. https://doi.org/10.1023/A:1025720518994 .

In this case several articles in the literature make it clear that there is a systematic foundation of technologies which form the capabilities of OLAC (Citation: Bird et al., 2003) Bird, S. & Simons, G. (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37(4). 375–388. https://doi.org/10.1023/A:1025720518994 and (Citation: Simons et al., 2003) Simons, G. & Bird, S. (2003). Building an Open Language Archives Community on the OAI foundation. Library Hi Tech, 21(2). 210–218. https://doi.org/10.1108/07378830310479848 . However, these layers in the technology protocol are generally ignored by critics who see only the specific elements of the additions to Dublin Core.

Q: What is the difference between a microscope and a telescope? A: The side of the lens that the analyst sits on.

I have collected several critiques of OLAC and present them below. I then discuss the perspective bias and why I think these critiques are understandable, but not well grounded.

In 2013 Peter Austin wrote:

Since its establishment in the late 1990s language documentation has been dominated by declarative deductive approaches to recommendations for creating metadata which have been primarily influenced by library concepts (e.g. Dublin Core). Key metadata notions have been interoperability, standardization, discovery, and access (OLAC, 4 E-MELD, Good 2002, Farrar & Lewis 2007). However, the wider goals of language documentation (including the wider social goals relating to speaker community involvement) mean this is not powerful enough and we need, …

In his work, Austin incompletely presents the metadata available via OLAC through Dublin Core.

The sentiment that OLAC’s descriptive power is less than what language resource users desire is also mentioned by several authors working at CLARIN, TLA, or MPI including (Citation: Van Uytvanck et al., 2012) Van Uytvanck, D., Stehouwer, H. & Lampen, L. (2012). Semantic metadata mapping in practice: The Virtual Language Observatory. European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/437_Paper.pdf where they say:

Traditionally the format used for metadata is OLAC, an extension of the ubiquitous Dublin Core schema, targeted towards the linguistic community. Although useful, OLAC lacks a deep semantic definition and relies mostly on best practice guidelines.

Van Uytvanck et al. are correct in that several of the Dublin Core fields such as description are left open for interpretation by the data provider. This is normal for Dublin Core use-cases because the communication protocol is used for bringing together diverse information sources from multiple independent institutions where practices may diverge significantly. However, they miss that several of the modifier terms within the OLAC vocabularies are defined as being equivalent to linked data items such as Library of Congress Subject Heading terms. By looking for deep semantics from OLAC’s fields it ignores the purpose of its Dublin Core foundation — high level compatibility. Deep congruent semantics in the records found in OLAC will only come as OLAC data providers agree to a common practice of institutional curation and description of their holdings.

At the same conference (Citation: Broeder et al., 2012) Broeder, D., Windhouwer, M., Uytvanck, D., Trippel, T. & Goosen, T. (2012). CMDI: a Component Metadata Infrastructure. European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/index.html say the following:

Since 2000 we have seen the rise of new LR [Language Resource] metadata schema as IMDI, IMDI [2003], IMDI [2009] and OLAC but application and uptake of these schema has been limited. Although OLAC is now used more or less as a standard for information exchange between LR archives, it is still delivering low specificity.

Again low specificity seems to be leveraged at OLAC’s vocabularies without taking into account that specify can be enhanced through a variety of mechanisms including the use of Library of Congress Subject Heading terms and other Dublin Core linked vocabularies.

Then (Citation: McCrae et al., 2015) McCrae, J., Labropoulou, P., Gracia, J., Villegas, M., Rodríguez-Doncel, V. & Cimiano, P. (2015). One Ontology to Bind Them All: The META-SHARE OWL Ontology for the Interoperability of Linguistic Datasets on the Web. Springer International Publishing. https://doi.org/10.1007/978-3-319-25639-9_42 say:

This lack of consensus resides also in the description of LRs, even for non-linguistic concepts. In fact, there are as many metadata schema for their descriptions as catalogs and repositories for their presentation (e.g. those used by ELRA and the LDC) and communities describing them (e.g. TEI [14] or CES [13]). The most widely used schema for the exchange of LRs is the one suggested by the Open Language Archives Community [1, OLAC], which builds on the Dublin Core metadata and has been criticized as being too reductionistic. Differences between the schema lie in the range of features used and their labels and datatypes.

While McCrae et al. summarize the literature, they also do not realize that the general perspective in the literature is top-down; first looking at the OLAC vocabularies and then (mostly missing) other kinds of modifying capabilities that Dublin Core and OAI provide to the OLAC application profile.

It is my opinion that important research can be conducted to look for relevant concepts to linguistics and language documentation in the various linked vocabularies natively occurring in Dublin Core.

Bibliography

Simons & Bird (2003): Simons, G. & Bird, S. (2003). Building an Open Language Archives Community on the OAI foundation. Library Hi Tech, 21(2). 210–218. https://doi.org/10.1108/07378830310479848
Van Uytvanck, Stehouwer & Lampen (2012): Van Uytvanck, D., Stehouwer, H. & Lampen, L. (2012). Semantic metadata mapping in practice: The Virtual Language Observatory. European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/437_Paper.pdf
Broeder, Windhouwer, Uytvanck, Trippel & Goosen (2012): Broeder, D., Windhouwer, M., Uytvanck, D., Trippel, T. & Goosen, T. (2012). CMDI: a Component Metadata Infrastructure. European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/index.html
McCrae, Labropoulou, Gracia, Villegas, Rodríguez-Doncel & Cimiano (2015): McCrae, J., Labropoulou, P., Gracia, J., Villegas, M., Rodríguez-Doncel, V. & Cimiano, P. (2015). One Ontology to Bind Them All: The META-SHARE OWL Ontology for the Interoperability of Linguistic Datasets on the Web. Springer International Publishing. https://doi.org/10.1007/978-3-319-25639-9_42
Bird & Simons (2003): Bird, S. & Simons, G. (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37(4). 375–388. https://doi.org/10.1023/A:1025720518994
Austin (2013): Austin, P. (2013). Language documentation and meta-documentation. In Jones, M. & Ogilvie, S. (Eds.), Keeping Languages Alive: Documentation, Pedagogy and Revitalization. (pp. 3–15). Cambridge University Press. Retrieved from http://ebooks.cambridge.org/ref/id/CBO9781139245890

Tags: Library Science Metadata Approaches to Science
Categories: Jot Research Note

Hugh Paterson III

Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.

OLAC — Bottom-up and Top-down

Bibliography

Hugh Paterson III

Collaborative Scholar

Related