Where Have All the Collections Gone?: Analysis of OLAC Data Contributors' use of DCMIType 'Collection'


Language materials, as commonly conceptualized by academics, are resources that exhibit or provide evidence of a naturally spoken language. Language documentation, the modern area of academic practice concerned with such materials, has its roots in anthropological linguistics but maintains a strong commitment to archiving source materials. The purpose of archiving is to benefit the many stakeholders involved in language development activities. Language archives, which host language resources, have by and large adopted Dublin Core as a metadata standard, along with the additional metadata terms of the Open Language Archives Community (OLAC) application profile described in Bird and Simons (2001, 2003). This study is a first look at how the DCMIType ‘Collection’ is used across aggregated records from language archives. It finds that arrangement and description practices at language resource preservation institutions participating in OLAC do not currently follow archival best practices as described in frameworks such as Describing Archives: A Content Standard, including principles such as respect des fonds. This has multiple consequences, including for web-based navigation and discoverability.

In Proceedings of the 15th Annual Society of American Archivists Research Forum
May 2022

Citable as

Paterson III, Hugh J. (2022) “Analysis of OLAC Data Contributors’ use of DCMIType ‘Collection’” In the proceedings of the 15th Annual Society of American Archivists Research Forum. 21 July, 2021.

Paper Bibliography

Hale, Krauss, Watahomigie, Yamamoto, Craig, Jeanne & England (1992). Endangered Languages. Language, 68(1). 1–42. https://doi.org/10.2307/416368
Bird & Simons (2001). The OLAC metadata set and controlled vocabularies. In DeClerck, T., Krauwer, S. & Rosner, M. (Eds.), Proceedings of ACL/EACL Workshop on Sharing Tools and Resources for Research and Education. (pp. 7–18). EACL-ACL; elsnet. Retrieved from https://www.aclweb.org/anthology/W01-1506
Nordmoe (2018). SIL International Language and Culture Archives. Retrieved from https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/nordmoe.pdf
Woodbury (2014). Archives and audiences: Toward making endangered language documentations people can read, use, understand, and admire. In Nathan, D. & Austin, P. (Eds.), Special Issue on Language Documentation and Archiving. (pp. 19–36). SOAS. Retrieved from http://www.elpublishing.org/PID/135
Holton (2012). Language archives: They’re not just for linguists any more. In Seifart, F., Haig, G., Himmelmann, N., Jung, D., Margetts, A. & Trilsbeek, P. (Eds.), Potentials of Language Documentation: Methods, Analyses, and Utilization. (pp. 111–117). University of Hawai'i Press. Retrieved from http://scholarspace.manoa.hawaii.edu/handle/10125/4523
Žumer, Zeng & Salaba (2010). FRBR: A Generalized Approach to Dublin Core Application Profiles. Dublin Core Metadata Initiative. Retrieved from https://dcpapers.dublincore.org/pubs/article/view/1024
Wijesundara & Sugimoto (2018). Metadata model for organizing digital archives of tangible and intangible cultural heritage, and linking cultural heritage information in digital space. Libres, 28(2). 58–80.
Tillett (2004). What is FRBR? — A Conceptual Model for the Bibliographic Universe. Library of Congress Cataloging Distribution Service. Retrieved from https://www.loc.gov/cds/downloads/FRBR.PDF
Kurtz (2010). Dublin Core, DSpace, and a Brief Analysis of Three University Repositories. Information Technology and Libraries, 29(1). 40–46. https://doi.org/10.6017/ital.v29i1.3157
Brown (1998). Can Culture Be Copyrighted? Current Anthropology, 39(2). 193–222. https://doi.org/10.1086/204721
Seyfeddinipur, Ameka, Bolton, Blumtritt, Carpenter, Cruz, Drude, Epps, Ferreira, Galucio, Hellwig, Hinte, Holton, Jung, Buddeberg, Krifka, Kung, Monroig, Neba, Nordhoff, Pakendorf, Prince, Rau, Rice, Riessler, Szoelloesi Brenig, Thieberger, Trilsbeek, Voort & Woodbury (2019). Public access to research data in language documentation: Challenges and possible strategies. Language Documentation & Conservation, 13. 545–563. Retrieved from http://scholarspace.manoa.hawaii.edu/handle/10125/24901
Patrick (2008). The Speech Community. In Chambers, J., Trudgill, P. & Schilling-Estes, N. (Eds.), The Handbook of Language Variation and Change. (pp. 573–597). Blackwell Publishing Ltd.
Describing archives: a content standard (2nd). Society of American Archivists. Retrieved from http://files.archivists.org/pubs/DACS2E-2013_v0315.pdf
Palmer, Zavalina & Fenlon (2010). Beyond size and search: Building contextual mass in digital aggregations for scholarly use. Proceedings of the American Society for Information Science and Technology, 47(1). 1–10. https://doi.org/10.1002/meet.14504701213
Wickett, Isaac, Doerr, Fenlon, Meghini & Palmer (2014). Representing Cultural Collections in Digital Aggregation and Exchange Environments. D-Lib Magazine, 20(5/6). https://doi.org/10.1045/may2014-wickett
Riva, P., Le Bœuf, P. & Žumer, M. (2017). IFLA Library Reference Model: A Conceptual Model for Bibliographic Information (December 2017). International Federation of Library Associations and Institutions (IFLA). Retrieved from https://www.ifla.org/publications/node/11412
Nathan (2013). Access and Accessibility at ELAR, a Social Networking Archive for Endangered Languages Documentation. In Turin, M., Wheeler, C. & Wilkinson, E. (Eds.), Oral Literature in the Digital Age. (pp. 21–41). Open Book Publishers.
Munro & Nathan (2005). Introducing the ELAR information system architecture. Retrieved from http://www.robertmunro.com/research/munro05elar.pdf
Barwick (2003). Planning for PARADISEC: The Pacific And Regional Archive for Digital Sources in Endangered Cultures. Retrieved from https://web.archive.org/web/20080815073730/http://www.acn.net.au/conference3/barwick/barwick.pdf
Thieberger & Jacobson (2010). Sharing data in small and endangered languages: Cataloging and metadata, formats, and encodings. In Grenoble, L. & Furbee, N. (Eds.), Language Documentation: Practice and Values. (pp. 147–158). John Benjamins Publishing Company. Retrieved from https://benjamins.com/catalog/z.158.15thi
Burke & Zavalina (2020). Descriptive richness of free‐text metadata: A comparative analysis of three language archives. Proceedings of the Association for Information Science and Technology, 57(1). https://doi.org/10.1002/pra2.429
Park (2009). Metadata Quality in Digital Repositories: A Survey of the Current State of the Art. Cataloging & Classification Quarterly, 47(3-4). 213–228. https://doi.org/10.1080/01639370902737240
Park & Tosaka (2010). Metadata Creation Practices in Digital Repositories and Collections: Schemata, Selection Criteria, and Interoperability. Information Technology and Libraries, 29(3). 104–116. https://doi.org/10.6017/ital.v29i3.3136
Heery & Patel (2000). Application Profiles: Mixing and Matching Metadata Schemas. Ariadne, 25. Retrieved from http://www.ariadne.ac.uk/issue/25/app-profiles/
Sugimoto, Kiryakos, Wijesundara, Monika, Mihara & Nagamori (2018). Metadata Models for Organizing Digital Archives on the Web: Metadata-Centric Projects at Tsukuba and Lessons Learned. In DC-2018--The Porto, Portugal Proceedings. (pp. 95–105). Dublin Core Metadata Initiative. Retrieved from https://dcpapers.dublincore.org/pubs/article/view/3968/
Zavalina, Palmer, Jackson & Han (2009). Evaluating Descriptive Richness in Collection-Level Metadata. Journal of Library Metadata, 8(4). 263–292. https://doi.org/10.1080/19386380802627109
Tillett (2001). Bibliographic Relationships. In Bean, C. & Green, R. (Eds.), Relationships in the Organization of Knowledge. (pp. 19–35). Springer Netherlands. Retrieved from http://link.springer.com/10.1007/978-94-015-9696-1_2
Christen, Merrill & Wynne (2017). A Community of Relations: Mukurtu Hubs and Spokes. D-Lib Magazine, 23(5/6). https://doi.org/10.1045/may2017-christen
Burke & Zavalina (2020). Identifying Challenges for Information Organization in Language Archives: Preliminary Findings. In Sundqvist, A., Berget, G., Nolin, J. & Skjerdingstad, K. (Eds.), Sustainable digital communities: Proceedings of the 15th international conference, iConference 2020, Borås, Sweden, March 23–26, 2020. (pp. 622–629). Springer International Publishing. Retrieved from http://link.springer.com/10.1007/978-3-030-43687-2_52
Hughes (2004). Metadata Quality Evaluation: Experience from the Open Language Archives Community. In Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E. & Lim, E. (Eds.), Digital Libraries: International Collaboration and Cross-Fertilization. (pp. 320–329). Springer Berlin Heidelberg. Retrieved from http://link.springer.com/10.1007/978-3-540-30544-6_34
Sullivant (2020). Archival description for language documentation collections. Language Documentation & Conservation, 14. 520–578. Retrieved from http://hdl.handle.net/10125/24949
Zavalina (2011). Contextual Metadata in Digital Aggregations: Application of Collection-Level Subject Metadata and Its Role in User Interactions and Information Retrieval. Journal of Library Metadata, 11(3-4). 104–128. https://doi.org/10.1080/19386389.2011.629957
Bird & Simons (2003). Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources. Computers and the Humanities, 37(4). 375–388. https://doi.org/10.1023/A:1025720518994
Hillmann & Phipps (2007). Application Profiles: Exposing and Enforcing Metadata Quality. Dublin Core Metadata Initiative & National Library Board Singapore. Retrieved from https://dcpapers.dublincore.org/pubs/article/view/866
Hirt, Simons & Spanne (2009). Building a MARC-to-OLAC crosswalk: repurposing library catalog data for the language resources community. ACM Press. https://doi.org/10.1145/1555400.1555479
Park (2006). Semantic Interoperability and Metadata Quality: An Analysis of Metadata Item Records of Digital Image Collections. Knowledge Organization, 31(1). 20–34.
Simons & Bird (2003). Building an Open Language Archives Community on the OAI foundation. Library Hi Tech, 21(2). 210–218. https://doi.org/10.1108/07378830310479848
Dublin Core Metadata Guide—Indiana Memory Project. Indiana University Purdue University Indianapolis. Retrieved from https://www.in.gov/library/files/IndianaMemoryMetadata2020.pdf
Simons (2016). From Linguistic Data Type to Language Resource Type: Laying the groundwork for a metadata application profile. Retrieved from https://scholars.sil.org/sites/scholars/files/gary_f_simons/presentation/simons-language_resource_type_vocabulary.pdf
Wasson, Holton & Roth (2016). Bringing User-Centered Design to the Field of Language Archives. Language Documentation & Conservation, 10. 641–681. Retrieved from http://hdl.handle.net/10125/24721
Park & Childress (2009). Dublin Core metadata semantics: an analysis of the perspectives of information professionals. Journal of Information Science, 35(6). 727–739. https://doi.org/10.1177/0165551509337871
Paterson III (2021). OLAC Nightly Data Dump (XML) from 18 July 2021. Retrieved from https://zenodo.org/record/5112131
Paterson III (2021). Where Have All the Collections Gone? Retrieved from https://www2.archivists.org/am2021/research-forum-2021/agenda#posters
Paterson III (2021). Language Archive Records: Interoperability of Referencing Practices and Metadata Models (M.A. Thesis) University of North Dakota, Grand Forks, North Dakota. Retrieved from https://commons.und.edu/theses/3937/
Stvilia, Gasser, Twidale, Shreeves & Cole (2004). Metadata Quality for Federated Collections. In Proceedings of the Ninth International Conference on Information Quality (ICIQ-04). (pp. 111–125). Retrieved from https://www.ideals.illinois.edu/handle/2142/721
Burke & Zavalina (2019). Exploration of information organization in language archives. Proceedings of the Association for Information Science and Technology, 56(1). 364–367. https://doi.org/10.1002/pra2.30
Bow, Christie & Devlin (2014). Developing a Living Archive of Aboriginal Languages. Language Documentation & Conservation, 8. 345–360. Retrieved from https://espace.cdu.edu.au/view/cdu:42475
Bird & Simons (2021). Towards an Agenda for Open Language Archiving. University of North Texas. https://doi.org/10.12794/langarc1851171
Miller (2010). The One-To-One Principle: Challenges in Current Practice. Dublin Core Metadata Initiative. Retrieved from https://dcpapers.dublincore.org/pubs/article/view/1043.html
Hutt & Riley (2005). Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials. ACM Press. https://doi.org/10.1145/1065385.1065447
Yi, Lake, Kim, Haakman, Jewell, Babinski & Bowern (2022). Accessibility, Discoverability, and Functionality: An Audit of and Recommendations for Digital Language Archives. Journal of Open Humanities Data, 8(10). 1–19. https://doi.org/10.5334/johd.59
Thieberger & Harris (2022). When Your Data is My Grandparents Singing. Digitisation and Access for Cultural Records, the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC). Data Science Journal, 21(9). 1–7. https://doi.org/10.5334/dsj-2022-009
Roy, L., Bhasin, A. & Arriaga, S. (2011). Tribal Libraries, Archives, and Museums: Preserving Our Language, Memory, and Lifeways. Scarecrow Press.
Elliott (2001). The Manchu-Language Archives of the Qing Dynasty and the Origins of the Palace Memorial System. Late Imperial China, 22(1). 1–70. https://doi.org/10.1353/late.2001.0002
Bartlett (1992). Respect des Fonds: The Origins of the Modern Archival Principle of Provenance. Primary Sources & Original Works, 1(1-2). 107–115. https://doi.org/10.1300/J269V01N01_07
Haworth (2001). Archival Description: Content and Context in Search of Structure. Journal of Internet Cataloging, 4(3-4). 7–26. https://doi.org/10.1300/J141v04n03_02
Urban (2010). Principle violations: Revisiting the Dublin Core 1:1 Principle. American Society for Information Science.
Ferreira, Lukschy, Watyam, Ungsitipoonporn & Seyfeddinipur (2021). A Website Is a Website Is a Website: Why Trusted Repositories Are Needed More Than Ever. University of North Texas. https://doi.org/10.12794/langarc1851176
Clayphan, Charles & Wynne (2017). Europeana Data Model – Mapping Guidelines (2.4). Europeana Foundation. Retrieved from https://pro.europeana.eu/files/Europeana_Professional/Share_your_data/Technical_requirements/EDM_Documentation/EDM_Mapping_Guidelines_v2.4_102017.pdf
Bardi, Kupietzky, Isaac, Matei, Weber, Arnold, Martinez Conde, Fingerhut, Clayphan, Bailly, Rühle, Charles & Agenjo (2014). Recommendations for the representation of hierarchical objects in Europeana. Europeana Foundation. Retrieved from https://pro.europeana.eu/project/hierarchical-objects
Introduction to the DPLA Metadata Model (5). Digital Public Library of America. Retrieved from https://pro.dp.la/hubs/metadata-application-profile
Ohio Digital Network Metadata Application Profile (1.6). Ohio DPLA Project. Retrieved from http://ohiodigitalnetwork.org/wp-content/uploads/metadata-application-profile-v1-6.pdf
Wasson, Medina, Chong, LeMay, Nalin & Saintonge (2018). Designing for Diverse User Groups: Case Study of a Language Archive. Journal of Business Anthropology, 7(2). 235–267. https://doi.org/10.22439/jba.v7i2.5605
Albarillo & Thieberger (2009). Kaipuleohone, the University of Hawaiʻi’s Ethnographic Archive. Language Documentation & Conservation, 3(1). 154–181. Retrieved from http://hdl.handle.net/10125/4422
Wiberg (2014). Mukurtu: Information Retrieval System Engineered for Indigenous Individuals and Communities. Retrieved from http://ifla-test.eprints-hosting.org/id/eprint/922
Burke, Zavalina, Chelliah & Phillips (2022). User needs in language archives: Findings from interviews with language archive managers, depositors, and end-users. Language Documentation & Conservation, 16. 1–24. Retrieved from http://hdl.handle.net/10125/74669
Kipp (2007). Swimming in Words. Cultural Survival Quarterly, 31(2). 36–43. Retrieved from https://www.culturalsurvival.org/publications/cultural-survival-quarterly/swimming-words
Weber (2021). The Curation of Language Data as a Distinct Academic Activity: A Call to Action for Researchers, Educators, Funders, and Policymakers. Journal of Open Humanities Data, 7(28). 1–10. https://doi.org/10.5334/johd.51
Hugh Paterson III
Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.