Diversity and Typology Across OLAC Providers

Are all data providers to OLAC archives, as some in the literature seem to imply, or are these implications an artifact of some other framing mechanism? In linguistics, framing is the establishment of a context under which a narrative is experienced. Lakoff and Fillmore are both researchers who have worked on the concepts around framing.

The Open Language Archives Community (OLAC) metadata profile is built on Dublin Core and OAI. Essentially, OLAC OAI feeds (to the OLAC aggregator) are OAI-PMH feeds that have been modified (or “enhanced”, using the XML attribute xsi:type=" "). The OAI protocol was originally developed so that repository software could communicate with other repositories, in an effort to make a network of repositories. The OAI protocol works well as designed. The OLAC implementation of OAI inherits some of the OAI-provided terms and concepts. When OLAC was first created, the profile was assumed to be describing archived language resources, and archives were assumed to be repositories, at least in the OAI sense. Many authors, including those of recent publications, talk about “archives” providing records to the OLAC aggregator. These claims and references are interesting, confusing, and inaccurate, though if the intended sense of “archive” is really an OAI repository, then I see the connection. However, that is not what most people think of when they hear “language archive” or “archive”.

The framing of terms from the OAI context is laid out in the introduction to the OAI-PMH standard. The following definitions are given.

  • Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and
  • Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services.
  • A repository is a network accessible server that can process the 6 OAI-PMH requests in the manner described in this document. A repository is managed by a data provider to expose metadata to harvesters. To allow various repository configurations, the OAI-PMH distinguishes between three distinct entities related to the metadata made accessible by the OAI-PMH.

The distinction here is that the corporate entity implementing data sharing technology is the Data Provider, while the server technology implementing the access is called the repository.
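To make the "repository" side of that distinction concrete, here is a minimal sketch of how a harvester might form requests to a repository. The six verb names come from the OAI-PMH standard; the base URL https://example.org/oai and the function name build_request_url are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for a repository's base URL.

    `base_url` is a placeholder for whatever endpoint a data provider
    exposes; nothing is fetched here, the URL is only assembled.
    """
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"Not an OAI-PMH verb: {verb!r}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical endpoint, for illustration only.
identify_url = build_request_url("https://example.org/oai", "Identify")
records_url = build_request_url(
    "https://example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
```

In this picture the repository is whatever answers these URLs, while the Data Provider is the organization or person responsible for running it.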

However, within the OLAC documentation and the academic literature presenting OLAC, further conceptualization of terms occurs.

The use of the term “archive” to mean all data providers to OLAC is interesting from a linguistic point of view. By calling Data Providers (in the OAI sense) “archives”, authors use the language of inclusion, which maps well to the concepts of “open” and “community” found in the OLAC name. However, the language of inclusion here does not acknowledge the diversity of the kinds of data providers. While inclusion and an underrepresentation of diversity are the impacts, there may be other factors as well. If we look at the OLAC documentation, both the examples and the XML elements call things “archives”. Consider the OLAC Protocol, which is an extension of the basic OAI-PMH protocol. Specifically, in section 3, OLAC data providers are required to return a specific element when the OAI-PMH verb Identify is used. The name of the XML element is olac-archive. Its type attribute has two possible values: personal or institutional. The example given is as follows:

<olac-archive
      xmlns="http://www.language-archives.org/OLAC/1.0/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.language-archives.org/OLAC/1.0/
                 http://www.language-archives.org/OLAC/1.0/olac-archive.xsd"
      type="institutional">
   <archiveURL>http://www.ethnologue.com</archiveURL>
   <curator>Raymond G. Gordon, Jr.</curator>
   <curatorTitle>Ethnologue Editor</curatorTitle>
   <curatorEmail>mailto:editor_ethnologue@sil.org</curatorEmail>
   <institution>SIL International</institution>
   <institutionURL>http://www.sil.org</institutionURL>
   <shortLocation>Dallas, USA</shortLocation>
   <location>7500 W. Camp Wisdom Rd., Dallas, TX 75236, U.S.A.</location>
   <synopsis>The Ethnologue data provider gives a metadata record for every
   language entry in the Web edition of the Ethnologue.  The latter provides
   basic information about each of the 7,000+ modern language of the world
   (both living and recently extinct).</synopsis>
   <access>Every resource described by the Ethnologue data provider is a
   public Web page that may be accessed without restriction. Reuse of 
   material on the site is subject to the Terms of Use that are
   posted.</access>
</olac-archive>

As we see in the above example, the olac-archive or OAI repository is the Ethnologue. It is typed as an institutional OLAC archive. However, is the Ethnologue actually a language archive? What language resource can you go to the archive and get? It seems to me to be more like a typological database about sociolinguistic topics. While it might be too late now to do anything about it, the OLAC schema could have used olac-provider as a more neutral name for the XML element.
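As a rough illustration of how little the element itself says about the kind of provider, the following sketch reads the type attribute and two child elements from a shortened copy of the example above. It uses Python's standard xml.etree.ElementTree module; the function name describe_provider and the trimmed SAMPLE string are my own assumptions, not part of the OLAC documentation.

```python
import xml.etree.ElementTree as ET

# Namespace used by the olac-archive element in the example above.
OLAC_NS = "http://www.language-archives.org/OLAC/1.0/"

# A shortened copy of the Ethnologue example, trimmed for illustration.
SAMPLE = """<olac-archive
      xmlns="http://www.language-archives.org/OLAC/1.0/"
      type="institutional">
   <archiveURL>http://www.ethnologue.com</archiveURL>
   <institution>SIL International</institution>
</olac-archive>"""


def describe_provider(xml_text):
    """Return the provider type and a few descriptive fields as a dict."""
    root = ET.fromstring(xml_text)
    ns = {"olac": OLAC_NS}

    def text_of(tag):
        # Child elements live in the OLAC namespace; return None if absent.
        node = root.find(f"olac:{tag}", ns)
        return node.text.strip() if node is not None and node.text else None

    return {
        "type": root.get("type"),  # "personal" or "institutional"
        "archiveURL": text_of("archiveURL"),
        "institution": text_of("institution"),
    }


summary = describe_provider(SAMPLE)
```

The only typing information recovered is "institutional"; nothing in the element says whether the provider is an archive, a database, or a bibliography.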

The use of the term “archive” to mean all data providers to OLAC is confusing because the language of academic discourse around language archives and OLAC data providers is not discrete. That is, the language does not provide distinct categories where categories would be helpful. The OLAC protocol distinguishes between institutional and personal data providers, but OLAC doesn’t make any further distinction. Research which uses OLAC records as a dataset should take the diversity of data providers into account when presenting observations and claims, especially claims about the description of artifacts or about percentages of Dublin Core element use.
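A small sketch may help show why this matters for percentages of element use. The records below are invented toy data, and the provider type labels attached to them are my own assumption (OLAC records do not carry such a label); the point is only that a pooled percentage can hide large differences between kinds of providers.

```python
from collections import defaultdict

# Hypothetical toy records: (provider type, Dublin Core elements present).
# Neither the records nor the type labels come from real OLAC data.
RECORDS = [
    ("archive", {"title", "subject", "format", "rights"}),
    ("archive", {"title", "subject", "format"}),
    ("bibliography", {"title", "creator"}),
    ("bibliography", {"title", "creator", "subject"}),
]


def element_usage(records, element):
    """Share of records (0.0 to 1.0) in which `element` appears."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for _, elements in records if element in elements)
    return hits / len(records)


def usage_by_provider_type(records, element):
    """Compute the same share separately for each provider type."""
    groups = defaultdict(list)
    for provider_type, elements in records:
        groups[provider_type].append((provider_type, elements))
    return {ptype: element_usage(group, element) for ptype, group in groups.items()}


pooled = element_usage(RECORDS, "format")
by_type = usage_by_provider_type(RECORDS, "format")
```

With this toy data, half of all records use the format element when pooled, yet every "archive" record uses it and no "bibliography" record does.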

Finally, the claim that all data providers are “archives” is inaccurate, or careless at best. In my assessment I find that there is a mix of provider types which does not neatly fit into the personal / institutional dichotomy. In part that is because the personal / institutional dichotomy describes the OAI Data Provider, not the OAI repository. As I evaluate the diversity of OAI repositories, I find the dimension of personal / institutional less useful than one which includes the terms: Archive, Repository, Museum, Gallery, Library, Special Collection, Personal Portfolio, Lab or Department Portfolio, Project Portfolio, Typological Database, and Bibliography. Roughly, these fall into the broad categories of access institutions, exhibits, and reference resources. The following diagram puts these terms on a gradient, from a data provider which is highly likely to hold an advertised resource and be able to provide access to it, to a provider which is likely not able to provide access to an advertised resource. It is this second dimension which is likely to impact the nature of OLAC records more than the first dimension of personal / institutional. The second dimension should be considered when using OLAC records as an object of study. If OLAC were to revise its protocol, these other terms should be considered to provide a better context for evaluating the quality and completeness of the records.

[Figure: Types of OLAC Data Providers. Credit: Hugh Paterson III]

Hugh Paterson III
Collaborative Scholar

My research interests include typological patterns in articulatory phonetics; User Experience design in language tools; and graph theory applied to language and linguistic resource discovery.