OLAC Term Consolidation

Today I was reminded of a situation in OLAC where DCMIType terms, which ought to be members of a controlled vocabulary, are showing up with variations. This happens because data providers supply different variants or apply different text transformations to the metadata they send to OLAC. The variation presents challenges to users because the metadata does not group correctly in faceted searches. The problem can be solved with server-side data consistency checks and normalization transformations, as demonstrated by Lynch, Gibson & Han (2020).
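A server-side consistency check of the kind mentioned above could be as small as the following Python sketch. The twelve term names come from the DCMI Type Vocabulary; the function name `find_nonconforming` is hypothetical, and nothing here reflects how OLAC actually processes incoming records.

```python
# The twelve terms of the DCMI Type Vocabulary, spelled exactly as published.
DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def find_nonconforming(values):
    """Return the values that are not exact members of the vocabulary.

    Hypothetical helper: an aggregator could run this on each incoming
    record and either reject it or hand the flagged values to a normalizer.
    """
    return [value for value in values if value not in DCMI_TYPES]


if __name__ == "__main__":
    sample = ["Sound", "sound", "Moving Image", "Text"]
    print(find_nonconforming(sample))
```

The check is deliberately strict (exact, case-sensitive match), so it reports a problem without deciding how to repair it.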

The following table, taken from OLAC aggregated metadata, shows the degree of variation in four terms: Image, MovingImage, Sound, and Text.

| DCMIType | Count | Problem |
|---|---|---|
| Collection | 843 | |
| Dataset | 5919 | |
| Event | 54 | |
| Image | 1605 | |
| image | 63 | Lacks upper-case initial letter |
| InteractiveResource | 13 | |
| Moving Image | 1 | Should be a single word |
| MovingImage | 63055 | |
| movingimage | 4 | Lacks upper-case initial letter in each lexical component |
| PhysicalObject | 16 | |
| Software | 526 | |
| Sound | 134923 | |
| sound | 4 | Lacks upper-case initial letter |
| StillImage | 7692 | |
| Text | 198870 | |
| text | 65 | Lacks upper-case initial letter |

Screenshot of DCMIType metadata. Credit: OLAC metadata screenshot on 24 September 2022 by Hugh Paterson III.
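To illustrate what a normalization transformation would do to the variants in the table above, here is a minimal Python sketch. It assumes the only deviations are letter case and stray whitespace, which is all the table shows. The names `normalize_dcmi_type` and `consolidate` are hypothetical, and the counts are copied from the table.

```python
from collections import Counter

# DCMI Type Vocabulary terms in their canonical spelling.
DCMI_TYPES = (
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
)

# Lookup from a case-folded, whitespace-free key to the canonical term.
_CANONICAL = {term.casefold(): term for term in DCMI_TYPES}


def normalize_dcmi_type(value):
    """Return the canonical term for a variant, or None if unrecognized."""
    key = "".join(value.split()).casefold()  # drop whitespace, ignore case
    return _CANONICAL.get(key)


def consolidate(counts):
    """Merge the counts of variants under their canonical term.

    Unrecognized values keep their own entry so nothing is silently lost.
    """
    merged = Counter()
    for value, count in counts.items():
        merged[normalize_dcmi_type(value) or value] += count
    return dict(merged)


# Counts for the four affected terms, copied from the table.
OBSERVED = {
    "Image": 1605, "image": 63,
    "Moving Image": 1, "MovingImage": 63055, "movingimage": 4,
    "Sound": 134923, "sound": 4,
    "Text": 198870, "text": 65,
}

if __name__ == "__main__":
    for term, total in consolidate(OBSERVED).items():
        print(term, total)
```

With this mapping the nine observed spellings collapse into four facet values, for example 63,060 records under MovingImage instead of three separate facets. Returning None for unknown values leaves the decision about truly unexpected terms to a human reviewer rather than guessing.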

Bibliography

Lynch, Gibson & Han (2020). Analyzing and Normalizing Type Metadata for a Large Aggregated Digital Library. The Code4Lib Journal, 47. Retrieved from https://journal.code4lib.org/articles/14995
Hugh Paterson III