OLAC Term Consolidation
Today I was reminded of a situation in OLAC where DCMIType terms, which ought to be members of a controlled vocabulary, are showing up with variations. This happens because data providers supply different spellings of the same term or perform different text transformations on the metadata they send to OLAC. The variation presents challenges to users because records of the same type do not group together in faceted searches. The problem can be solved with server-side data consistency checks and normalization transformations, as demonstrated by Lynch, Gibson & Han (2020).
The following table, taken from OLAC aggregated metadata, shows the degree of variation in four terms: Image, MovingImage, Sound, and Text.
DCMIType | Count | Problem |
---|---|---|
Collection | (843) | |
Dataset | (5919) | |
Event | (54) | |
Image | (1605) | |
image | (63) | Lacks upper-case initial letter |
InteractiveResource | (13) | |
Moving Image | (1) | Should be a single word |
MovingImage | (63055) | |
movingimage | (4) | Lacks upper-case initial letter in each lexical component |
PhysicalObject | (16) | |
Software | (526) | |
Sound | (134923) | |
sound | (4) | Lacks upper-case initial letter |
StillImage | (7692) | |
Text | (198870) | |
text | (65) | Lacks upper-case initial letter |
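A minimal sketch of the kind of normalization described above, written in Python. It is not OLAC's actual code, nor the method of Lynch, Gibson & Han (2020); the function names `normalize_dcmi_type` and `consolidate` are hypothetical. It assumes that removing whitespace and ignoring case is enough to match a variant to its canonical term, which covers the variants in the table but not misspellings or terms from other vocabularies. The counts are copied from the table.

```python
from collections import Counter

# The twelve terms of the DCMI Type Vocabulary.
DCMI_TYPES = (
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
)

# Lookup keyed on a lower-case, whitespace-free form of each term.
_CANONICAL = {term.lower(): term for term in DCMI_TYPES}


def normalize_dcmi_type(value):
    """Return the canonical DCMIType for value, or None if unrecognized."""
    key = "".join(value.split()).lower()
    return _CANONICAL.get(key)


def consolidate(counts):
    """Merge counts of variant spellings under their canonical term.

    Values that cannot be matched are returned separately so they can be
    reviewed (a simple consistency check) instead of being dropped.
    """
    merged = Counter()
    unrecognized = Counter()
    for value, count in counts.items():
        term = normalize_dcmi_type(value)
        if term is None:
            unrecognized[value] += count
        else:
            merged[term] += count
    return merged, unrecognized


# Counts for the four affected terms, taken from the table above.
observed = {
    "Image": 1605, "image": 63,
    "Moving Image": 1, "MovingImage": 63055, "movingimage": 4,
    "Sound": 134923, "sound": 4,
    "Text": 198870, "text": 65,
}
merged, unrecognized = consolidate(observed)
for term, count in sorted(merged.items()):
    print(term, count)
```

With this approach the nine observed values collapse into four facets. Anything that does not match a DCMIType term is reported rather than guessed at, so a person can decide how to handle it.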
Bibliography
- Lynch, J., Gibson, J. & Han, M. (2020). Analyzing and Normalizing Type Metadata for a Large Aggregated Digital Library. The Code4Lib Journal, 47. Retrieved from https://journal.code4lib.org/articles/14995