OLAC Social Network

In their 2021 paper, Towards an Agenda for Open Language Archiving (University of North Texas, https://doi.org/10.12794/langarc1851171), Bird and Simons discuss the health of the network of OLAC data providers. They state:

of the 62 registered archives, 27 have not been updated in the past five years, and an overlapping 19 archives are failing to harvest.

Having been a long-time OLAC observer, I can attest that OLAC data providers both come and go, and that some refresh their feeds. For example, the SIL Language & Culture Archive did not have an active OLAC feed between 2013 and 2021. This was fixed in 2022.

SIL Language and Culture Archive history

Another way to look at the issues which beset the network of OLAC data providers is to look at the network of technologies responsible for the data feed. This is challenging, because exact roles are not specified in the OLAC documentation. The roles do appear in the OAI header for each data provider, and an aggregation of them can be found in two places on the OLAC website: here (along with the detail pages) and here.
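As a rough illustration of how such roles could be collected, here is a minimal Python sketch that reads participant names and roles out of an OAI-PMH Identify response. It assumes the roles are carried in `participant` elements of an `olac-archive` description and that the namespace URIs below are the ones in use. The sample XML, archive name, people, and role values are hypothetical placeholders, not data from any real provider.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: OAI-PMH 2.0 and the OLAC 1.1 olac-archive description.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oa": "http://www.language-archives.org/OLAC/1.1/olac-archive",
}

# Hypothetical, trimmed Identify response; every name and URL is a placeholder.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Language Archive</repositoryName>
    <baseURL>https://archive.example.org/oai</baseURL>
    <description>
      <olac-archive xmlns="http://www.language-archives.org/OLAC/1.1/olac-archive">
        <participant name="Person A" role="Curator" email="a@example.org"/>
        <participant name="Person B" role="Developer" email="b@example.org"/>
      </olac-archive>
    </description>
  </Identify>
</OAI-PMH>
"""


def extract_participants(identify_xml: str) -> list[tuple[str, str, str]]:
    """Return (archive name, participant name, role) triples from one response."""
    root = ET.fromstring(identify_xml)
    archive = root.findtext(
        "oai:Identify/oai:repositoryName", default="", namespaces=NS
    )
    triples = []
    # Walk every participant element, wherever the description places it.
    for participant in root.iterfind(".//oa:participant", NS):
        triples.append(
            (archive, participant.get("name", ""), participant.get("role", ""))
        )
    return triples


if __name__ == "__main__":
    for archive, name, role in extract_participants(SAMPLE_IDENTIFY):
        print(f"{archive}: {name} ({role})")
```

Running the same function over the Identify response of every registered provider would give a flat list of archive, person, and role triples, which is the raw material for a network diagram.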

When we look at the set of data providers and see how many do not maintain their OLAC feeds, we could interpret this as the death of nodes, or of parts of the network. This might be better understood with a network diagram, so I made one.


While conducting this analysis, it became apparent that several clades of providers had developed. By clade I mean a group of data providers which evolved together, sharing both an exposure to OLAC metadata and a technologist. Those clades are indicated in the network diagram. The indicated clades should be treated as tentative, but I think there is reason to believe that a great deal of sociological influence allows these clades to form. For example, I know that at one time Gary Simons assisted AILLA with their OLAC feed, so perhaps that archive belongs in a common clade with the entities in the SIL clade. If that is the case, then maybe the SIL clade ought to be renamed the Gary Simons clade. I also suspect there might be a CLARIN clade, but I do not know for sure, nor do I know how these relationships are or were structured. I also suspect that there is or was an Australian clade in which Nick Thieberger would have been influential. If so, the Hawaii clade may be suspect: rather than two clades (Hawaii and Australia) there may be only one, comprising the entities assisted or guided by Professor Thieberger.
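One way to make the notion of a clade concrete is to treat providers and technologists as nodes in a graph and to group providers that are linked, directly or indirectly, through a shared person. The Python sketch below does this with a simple connected-components search. It is only a rough proxy for the sociological relationships described above, it is not how the diagram was produced, and the provider and technologist names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical (provider, technologist) pairs; not real OLAC data.
PAIRS = [
    ("Archive A", "Tech 1"),
    ("Archive B", "Tech 1"),
    ("Archive C", "Tech 2"),
    ("Archive D", "Tech 2"),
    ("Archive D", "Tech 3"),
    ("Archive E", "Tech 4"),
]


def find_clades(pairs):
    """Group providers that are connected through shared technologists."""
    # Build an undirected bipartite graph. Nodes are tagged with their kind
    # so that a provider and a person with the same name cannot collide.
    graph = defaultdict(set)
    for provider, person in pairs:
        p_node, t_node = ("provider", provider), ("person", person)
        graph[p_node].add(t_node)
        graph[t_node].add(p_node)

    seen = set()
    clades = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Report only the providers in each component.
        clades.append(sorted(name for kind, name in component if kind == "provider"))
    return sorted(clades)


if __name__ == "__main__":
    for members in find_clades(PAIRS):
        print(members)
```

In this toy data, Archive C and Archive D land in one group because they share Tech 2, even though Archive D also worked with Tech 3. The same logic would merge the Hawaii and Australia groups if a single person turned out to link them.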

The image is incomplete with regard to the total number of data providers. The following data providers were left out solely because of space constraints in the diagram:

  • Language resources at the Text Laboratory
  • IULA UPF OAI Archive
  • ODIN - The Online Database of Interlinear Text
  • Cornell Language Acquisition Lab
  • Living Archive of Aboriginal Languages
  • Lund University Humanities Lab corpusserver
  • Magoria Books’ Carib and Romani Archive
  • Multimodal Learning and teaching Corpora Exchange
  • Perseus digital library
  • The LINGUIST List Language Resources

It is my hope that a better understanding of where in the network death occurs, and why, will contribute to a healthier social practice and a healthier network.

Hugh Paterson III
Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.