An Analysis of Crubadan OLAC Records

I have the following questions after looking at this record:

[Screenshot: Crubadan record for Karbi (Hills Karbi) in the OLAC Interface. Credit: OLAC Interface]

  1. Is the license actually applicable? That is, who is the copyright holder, and does the content actually meet the threshold for copyright? The creator is not always the copyright holder, nor should the creator/author be assumed to be the copyright holder.
  2. Why is the source text not indicated in the source field?
  3. Where is the source software used to generate the resource?
  4. The format is application/zip and the DCMIType is Dataset, so which application can consume the data? Or which formats are used within the .zip file?
  5. If an abstract is a summary, can or should that field be used to declare what the rows and columns, or the tables and relationships, in the dataset are?
  6. How can you have rights without a rights holder? No copyright holder is declared. The Creative Commons license is only valid if there is a valid copyright claim.
  7. The specific resource should have a relationship to a Crubadan collection record, but no such relationship is declared. This means that the automatically generated OLAC citation is also malformed.
  8. The part record, all records actually, should have the DCTerms citation element included. These records don’t.
  9. Why are there no Library of Congress Subject Heading (LCSH) subjects? This is a really hard question to answer. What exactly is the subject of a corpus of letter frequencies? Computational linguistics? Language identification? This is where subjecthood in language resources seems to break down a bit, unless we understand subject to cover not just about-ness but also of-ness and for-ness (utility).
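
Several of the questions above come down to elements that are simply absent from the record. As a rough illustration, here is a minimal Python sketch that lists which of those elements a record lacks. The checklist is my own reading of the questions, not an OLAC requirement, and the sample record is an abridged, hand-made stand-in for the Karbi record with the standard Dublin Core namespace URIs filled in.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for Dublin Core elements and DCMI terms.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Elements the questions above ask about. This checklist is an
# assumption for illustration, not a rule taken from the OLAC profile.
EXPECTED = [
    "dc:source",
    "dcterms:rightsHolder",
    "dcterms:isPartOf",
    "dcterms:bibliographicCitation",
    "dcterms:abstract",
]

# Abridged, hand-made stand-in for the Karbi record (not the full record).
SAMPLE = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Crúbadán language data for Karbi</dc:title>
  <dc:creator>Kevin Scannell</dc:creator>
  <dc:rights>Creative Commons Attribution 4.0 International License</dc:rights>
</olac:olac>"""


def missing_elements(record_xml, expected=EXPECTED):
    """Return the prefixed names from `expected` that do not occur in the record."""
    root = ET.fromstring(record_xml)
    return [name for name in expected if root.find(f".//{name}", NS) is None]


if __name__ == "__main__":
    for name in missing_elements(SAMPLE):
        print("missing:", name)
```

A check like this only shows that an element is absent. It cannot say whether a declared rights holder or license is actually valid, which is the harder part of questions 1 and 6.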

The following observations about the user interface and the underlying record ought to be incorporated into future versions of the OLAC interface:

  1. OLAC presents a citation but doesn’t pull that from the record. Where does it get that information? What is the generative process?
  2. Why doesn’t the OLAC interface report that the links have rotted and now return 404?
  3. The included XML file appears to be unqualified Dublin Core with the addition of the OLAC metadata. However, the OLAC profile calls for the usage of Qualified Dublin Core (QDC). An investigation should be made to see whether this is the result of the OLAC infrastructure or of the data as it was received by OLAC. (Comparing this record with the record discussed in PARADISEC OLAC Records and Rights suggests that OLAC does provide QDC when it has been given QDC.)
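
One rough way to compare records on the third point is to count how many elements come from the `dc` namespace and how many from the `dcterms` namespace. The Python sketch below does that. It is a heuristic of my own, not an OLAC validation rule: it only detects element refinements such as `dcterms:created`, and it ignores `xsi:type` encoding schemes, which are a separate kind of qualification. Both sample records are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Two invented records: one with only core dc elements, one with a refinement.
UNQUALIFIED = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:date>2018-03-28</dc:date>
</olac:olac>"""

QUALIFIED = """<olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:title>Example title</dc:title>
  <dcterms:created>2018-03-28</dcterms:created>
</olac:olac>"""


def namespace_counts(record_xml):
    """Count elements per namespace URI, including the root element."""
    counts = Counter()
    for elem in ET.fromstring(record_xml).iter():
        # ElementTree stores qualified tags as "{namespace-uri}localname".
        if elem.tag.startswith("{"):
            uri = elem.tag[1:].split("}", 1)[0]
            counts[uri] += 1
    return counts


def uses_element_refinements(record_xml):
    """True if at least one element is drawn from the dcterms namespace."""
    return namespace_counts(record_xml)[DCTERMS] > 0


if __name__ == "__main__":
    for label, xml_text in (("unqualified", UNQUALIFIED), ("qualified", QUALIFIED)):
        print(label, uses_element_refinements(xml_text))
```

Running the same check on a record as harvested by OLAC and on the same record as served by the contributing archive would be one way to see where the qualification is lost, if it is lost at all.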

The XML record replicated below is from OLAC. It is retrievable here, while the record is viewable in the OLAC interface here.

<OAI-PMH xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<request verb="GetRecord" identifier="" metadataPrefix="olac"></request>
<olac:olac xmlns:dc="" xmlns:dcterms="" xmlns:olac="" xsi:schemaLocation=" ">
<dc:title>Crúbadán language data for Karbi</dc:title>
<dc:contributor xsi:type="olac:role" olac:code="developer">Kevin Scannell</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="researcher">Kevin Scannell</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="data_inputter">Edward Jahn</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="data_inputter">Dustin Joosten</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="data_inputter">Nick Lewchenko</dc:contributor>
<dc:contributor xsi:type="olac:role" olac:code="sponsor">National Science Foundation</dc:contributor>
<dc:creator>Kevin Scannell</dc:creator>
<dc:date xsi:type="dcterms:W3CDTF">2018-03-28</dc:date>
<dc:description>A dataset containing word and character n-gram frequencies and lists of URLs for Karbi</dc:description>
<dc:format xsi:type="dcterms:IMT">application/zip</dc:format>
<dc:identifier xsi:type="dcterms:URI"></dc:identifier>
<dc:rights>Creative Commons Attribution 4.0 International License</dc:rights>
<dc:subject xsi:type="olac:language" olac:code="mjw"/>
<dc:subject xsi:type="olac:linguistic-field" olac:code="computational_linguistics"/>
<dc:subject xsi:type="olac:linguistic-field" olac:code="lexicography"/>
<dc:subject xsi:type="olac:linguistic-field" olac:code="text_and_corpus_linguistics"/>
<dc:subject xsi:type="olac:linguistic-field" olac:code="writing_systems"/>
<dc:type xsi:type="dcterms:DCMIType">Dataset</dc:type>
<dc:type xsi:type="olac:linguistic-type" olac:code="lexicon"/>
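
The `request` element in the record above shows the three arguments of an OAI-PMH GetRecord request: `verb`, `identifier`, and `metadataPrefix`. As a small sketch of how such a request URL is assembled, the Python below builds one from those arguments. The base URL and identifier are hypothetical placeholders, since the real values are not reproduced in the record as shown here.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix="olac"):
    """Build an OAI-PMH GetRecord request URL from its three arguments."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Hypothetical placeholders, not the real OLAC endpoint or record identifier.
    print(get_record_url("https://example.org/oai", "oai:example.org:1234"))
```

The identifier is percent-encoded by `urlencode`, so the colons in an OAI identifier do not need any special handling by the caller.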
Hugh Paterson III
Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.