Language, Script, Orthography, and Text-input: Rating Text-input Difficulty Across Languages


Language revitalization is a process of bringing a language back into domains of communication where it was previously used. For many languages, however, typing on a computer is a new activity. This places text input in a research area that evolving models of language revitalization and language documentation have not yet ventured into. The status quo in ethnographic and linguistic scholarship is to assume that the activity of text input (typing) is normative across languages. I situate my research in a component analysis of Language, Script, Orthography, and Text-input technology. I present the results of a computational experiment that uses parallel texts in different languages to compare the relative effort required to type a communicative message. Theoretically, parallel texts convey the same amount of information; therefore, they present an idealized way of comparing communicative throughput across languages. Orthographic patterns are language dependent and produce hotspots of action on keyboards that typists may consider “difficult”. For example, “zq” is a rare English combination, but typing it would require a difficult fingering on popular English keyboard layouts. Several anecdotal remarks in the language documentation literature address socio-technical-language situations in the South Pacific, Asia, and Central America (Boerger 2007, [ntu]; Cooper 2005:160, [kls]; Guérin 2008, [mkv]; Jany 2010, [pxm]). These authors point out that the perception of an orthography as “hard to type” motivates some ethnolinguistic-minority communities to seek orthography reform. While perceptions of the need for orthography reform may be community-based, a critical question is overlooked when orthography reform is the presumed solution. That is, to what extent is the keyboard layout a problem in orthography usage and therefore responsible for impressions like “hard-to-use”?
To my knowledge, no published method allows the work of typing the same text in different languages to be compared. That is, would a given text be easier to type in English [eng], French [fra], or Eastern Dan [dnj]? This question is especially relevant for language development work in multilingual contexts, where technology users can choose between two languages in a digital context. To address this question, I turn to parallel corpora and keyboard optimization algorithms. However, instead of optimizing keyboards, I use the algorithms to assign each keyboard-corpus pairing a score and then compare the scores. I use a parallel corpus of English, French, and Eastern Dan (Ivory Coast). Two of the languages are non-tonal; the third is tonal. The parallel corpus includes an orthography with no diacritics (English), one with some diacritics (French), and one with many diacritics (Eastern Dan). My results suggest that current text-input methods that rely on deadkey combinations significantly increase text-input difficulty for languages whose usage requires characters outside the ASCII range. This suggests reconsidering previously reported research (Feit et al. 2016), which found no major differences in the typing patterns employed by users of different languages (English and Finnish).
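To make the idea of scoring a keyboard-corpus pairing concrete, here is a minimal sketch in Python. It is not the scoring algorithm used in the experiment. It is a simplified stand-in that counts keystrokes under stated assumptions: every combining diacritic is typed with one deadkey press, an uppercase letter costs two keystrokes (Shift plus the key), and every other character costs one. The function name `keystroke_cost`, the `deadkey_cost` weight, and the sample word pair are all hypothetical and chosen only for illustration.

```python
import unicodedata


def keystroke_cost(text, deadkey_cost=2.0):
    """Estimate the effort of typing `text` on a deadkey-based layout.

    Assumption (not from the source): each combining mark needs one
    deadkey press weighted at `deadkey_cost`; the base letter is
    counted separately.
    """
    cost = 0.0
    # NFD splits a precomposed letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            cost += deadkey_cost  # assumed deadkey press
        elif ch.isupper():
            cost += 2.0  # Shift + key
        else:
            cost += 1.0  # plain key press
    return cost


# Hypothetical one-word "parallel text": the same meaning in two languages.
parallel = {"eng": "forest", "fra": "for\u00eat"}
scores = {lang: keystroke_cost(text) for lang, text in parallel.items()}
print(scores)
```

Because both strings carry the same message, the two scores can be compared directly. The model overstates the cost for layouts that give a precomposed letter its own key, and it ignores finger travel and hand alternation, which a keyboard optimization algorithm would normally weigh.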

15 Nov, 2022 15:00
University of Oregon, Eugene, Oregon


Boerger (2007). Natqgu Literacy: Capturing Three Domains for Written Language Use. Language Documentation & Conservation, 1(2). 126–153.
Cooper (2005). Issues in the Development of a Writing System for the Kalasha Language (Ph.D.). Macquarie University.
Feit, Weir & Oulasvirta (2016). How We Type: Movement Strategies and Performance in Everyday Typing. ACM.
Guérin (2008). Writing an endangered language. Language Documentation & Conservation, 2(1). 47–67.
Jany (2010). Orthography Design for Chuxnabán Mixe. Language Documentation & Conservation, 4(1). 231–253.
Hugh Paterson III
Collaborative Scholar

I specialize in bespoke research at the intersection of Linguistics, Law, Languages, and Technology; specifically utility and life-cycle management for information products in these spaces.