Bridging Corpora: Creating Learner Pathways Across Texts


We discuss the development and evolution of The Bridge, a linked data application which supports language learning via texts. It supports both instructors and students as they navigate the learning process and the acquisition of new lexical items.

Scholarship on language acquisition in the humanities and beyond confirms the importance of reading at the appropriate level for language learners. Yet teachers and readers of historical languages, such as Greek and Latin, have little more than anecdotes to help us understand the readability of texts and identify comprehensible assignments. Is one poem, story, author, or genre more accessible to particular students than another? What readings might best prepare students to approach a target text? In the absence of such data, editors, teachers, and professors can only depend on established practices when assaying textual readability. Without accessible tools to support such assessments, pedagogical effectiveness is hindered, innovation curtailed, and proficiency diminished. It is within this context that Bridge supports a new approach to measuring readability (Gruber-Miller & Mulligan 2022). To attain full comprehension of a text, readers must typically know 95 to 98% of the words in it (Hu Hsueh-chao & Nation 2000). Yet many novice readers routinely know only 25% of the words in commonly taught texts. More thoughtful and purposeful text selection can facilitate positive attitudes in learners. Choosing texts with a greater overlap of vocabulary serves both to reinforce previously learned lexical items and to expose readers to new terms.
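The coverage figures above can be made concrete with a small calculation. The following is a minimal sketch, not Bridge's own code: it assumes a passage has already been reduced to a list of lemmas, and the function name `lexical_coverage`, the toy passage, and the set of known lemmas are all hypothetical illustrations.

```python
def lexical_coverage(text_lemmas, known_lemmas):
    """Return the share of running words whose lemma the reader already knows."""
    if not text_lemmas:
        return 0.0
    known = set(known_lemmas)
    hits = sum(1 for lemma in text_lemmas if lemma in known)
    return hits / len(text_lemmas)


# Hypothetical, already-lemmatized toy passage (illustrative only, not Bridge data).
passage = ["arma", "vir", "que", "cano", "Troia", "qui", "primus", "ab", "ora"]
# Hypothetical set of lemmas a learner has met so far.
known = {"vir", "que", "qui", "primus", "ab"}

coverage = lexical_coverage(passage, known)
print(f"coverage: {coverage:.0%}")
# Compare against the lower bound of the 95 to 98% range cited above.
print("meets 95% threshold:", coverage >= 0.95)
```

Counting running words rather than distinct lemmas is a design choice here: a frequent unknown word then lowers coverage more than a rare one does, which is closer to what a reader experiences on the page.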

Bridge is written in Python. It uses Python-based Natural Language Processing to lemmatize texts and then link lemmas across them. The user interface allows users to query and receive reports regarding lexeme similarity across several selected texts. In this way, instructors who ground their curriculum in texts can map out the new vocabulary from text to text as they craft lesson plans. Likewise, learners can look for new-to-them words on the basis of the texts they have already been exposed to. Learner pathways can thus be “charted” based on texts learners have already encountered. Our success in classrooms teaching Latin and Greek leads us to believe that the application can be used in more languages than just English, Latin, and Greek.
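The idea of finding new-to-them words and charting a pathway can be illustrated with plain set operations. This is a hedged sketch under stated assumptions, not a description of how Bridge is implemented: lemmatization is assumed to have happened already, and the function names, text titles, and lemma lists below are hypothetical.

```python
def known_lemmas(read_texts):
    """Collect every lemma found in the texts a learner has already read."""
    known = set()
    for lemmas in read_texts.values():
        known.update(lemmas)
    return known


def new_lemmas(target, known):
    """List distinct lemmas in the target text that the learner has not yet seen,
    in order of first appearance."""
    seen = set(known)
    fresh = []
    for lemma in target:
        if lemma not in seen:
            seen.add(lemma)
            fresh.append(lemma)
    return fresh


def rank_next_texts(candidates, known):
    """Order candidate texts from fewest to most new lemmas introduced."""
    return sorted(candidates, key=lambda title: len(new_lemmas(candidates[title], known)))


# Hypothetical, already-lemmatized texts (illustrative only, not Bridge data).
read = {
    "Text A": ["puella", "rosa", "amo", "et"],
    "Text B": ["puer", "amo", "video"],
}
candidates = {
    "Text C": ["puella", "puer", "video", "et", "canis"],
    "Text D": ["rex", "bellum", "gero", "et", "miles"],
}

known = known_lemmas(read)
for title in rank_next_texts(candidates, known):
    print(title, "introduces:", new_lemmas(candidates[title], known))
```

Ranking by the count of new lemmas is only one possible ordering; an instructor might instead prefer the text that shares the most vocabulary with a later target reading.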

Proceedings of Linking Lexicographic and Language Learning Resources (4LR 2023), a workshop at LDK 2023
September 2023

Citable as

Paterson III, Hugh J., Bret Mulligan, Anna Lacy & Patricia Guardiola. 2023. Bridging Corpora: Creating Learner Pathways Across Texts. In Proceedings of Linking Lexicographic and Language Learning Resources (4LR 2023), a workshop at LDK 2023. pp. 598–603.

Bibliography to the abstract

Gruber-Miller & Mulligan (2022). Latin Vocabulary Knowledge and the Readability of Latin Texts: A Preliminary Study. New England Classical Journal, 49(1). 80–101.
Hu Hsueh-chao & Nation (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1). 403–430.
Hugh Paterson III
Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.

Bret Mulligan
Professor of Classics

Latin Professor.