Asian man speaking to a floating web camera which is automatically translating into English.

“Cultural diversity is deeply intertwined with linguistic diversity. Every language carries within it a unique set of ideas, ways of thinking, and forms of expression. When a language fades, so does the cultural knowledge it embodies.”

David Gosset in China Daily

The Sapir–Whorf hypothesis holds that language is not just a tool for expressing thoughts but also shapes how we perceive time, space, color, and agency, subtly directing attention and interpretation.1 Language encodes a particular way of seeing the world. When a language dies, it “is not just the loss of words, but the sad erosion of one particular vision of the world,” as David Gosset writes.2 Yet today, we are witnessing an unprecedented acceleration of linguistic extinction. You could say that artificial intelligence is completing a project that colonialism began: the systematic erasure of languages that do not belong to the wealthy, English-speaking West.

This may sound melodramatic, but it is sadly true. It is already documented in linguistic research and in the observable behavior of AI systems.3 These systems have the power to reshape communication and knowledge production worldwide, which is what makes the danger so clear.

My article explores how artificial intelligence accelerates linguistic homogenization, erodes cultural identity, and deepens the digital divide. It discusses how this reflects and amplifies historical power imbalances. It examines why preserving linguistic diversity is not an academic concern but a cultural imperative.

The Scale of Erasure: What We’re Actually Losing

In his 2000 book Language Death, David Crystal notes that around 6,000 to 7,000 languages exist worldwide. Roughly half of the languages spoken today are projected to disappear by the end of this century.4 This is primarily due to factors such as globalization, political oppression, and cultural assimilation.

This is not simply a linguistic problem. Each language that vanishes takes with it irreplaceable cultural knowledge and historical memory. What distinguishes linguistic loss from other forms of cultural decline is its permanence. When a language becomes extinct, the knowledge embedded within it (ecological wisdom, indigenous healing practices, philosophical concepts, and spiritual frameworks) often cannot be fully translated or recovered.5

“The extinction of each language results in the irrecoverable loss of unique cultural, historical, and ecological knowledge. Each language is a unique expression of the human experience of the world. Thus, the knowledge of any single language may be the key to answering fundamental questions of the future.”

UNESCO, 2003

The Colonial Foundation

Colonialism was, fundamentally, a project of linguistic domination. European powers understood that to control a people, you had to control their language. In Kenya, as across much of the colonized world, children were punished for speaking their native tongues. In schools designed to produce obedient workers, success was measured by fluency in the colonizer’s language. Local languages that carried centuries of accumulated knowledge were stigmatized as the speech of the unsophisticated. To succeed, you were encouraged to abandon your language.6

Ngũgĩ wa Thiong’o and other writers dealing with decolonization have captured the violence of this process: “to starve or kill a language is to starve and kill a people’s memory bank.”6 This was not accidental. Colonial education systems were designed, explicitly, to sever colonized peoples from their own cultural roots while making them economically dependent on the colonial power’s language and knowledge systems.

What is striking is how little has changed. Elite schools across the formerly colonized world, celebrated for their “global standards” and “international excellence,” continue to teach their students that true knowledge exists in English, that African history is a footnote to European history, and that cultural sophistication is measured by proximity to Western European norms.6

As an Irish person, I am also acutely aware of the current state of the Irish language (Gaeilge). English colonialism in Ireland made English the language of power. To this day, the paths to education, employment, and success run through English. The shift to English may feel like a choice, but economic pressures and the need for survival drive it. This has created a cultural fracture. Despite government policies aimed at promoting Irish, there has been a documented decline in the number of Irish speakers.

The Digital Divide: How AI Reproduces English Hegemony

Today, the classroom punishment has been replaced by algorithmic exclusion.

The scale of English dominance in digital spaces is difficult to overstate. Currently, nearly half of all websites are in English, despite English being the native language of only a small percentage of the global population. Spanish, the second most represented language online, accounts for about 6%, only a fraction of the English share.7 The gap reflects and reinforces the economic and cultural power embedded in the English-speaking internet. This is a symptom of the digital divide: developing countries, which hold the majority of the world’s population, have much less access to digital infrastructure. As in other systems of inequality, the world’s poor have a much smaller slice of the digital cake.

Large language models, the AI systems powering mainstream chatbots and educational tools, are primarily trained on English-saturated data. Although marketed as “multilingual,” these systems are, in the words of researchers at Johns Hopkins University, “faux polyglots” that create “information cocoons.”8

A Johns Hopkins study examined how such models respond to questions about geopolitical conflicts when prompts are given in different languages. When users asked about an Indian political figure in English, the model drew on English-language sources; when the query was in Hindi, it drew on Hindi sources. But when the same query was posed in a low-resource language with sparse training data, the model defaulted to English-language sources, often reflecting Western or American perspectives, regardless of the region in question.8

The study describes a hypothetical scenario in which three users ask about the India–China border dispute: a Hindi speaker receives an answer shaped primarily by Indian sources, a Chinese speaker receives a Chinese-leaning response, and an Arabic speaker, whose language lacks relevant training data, receives an answer grounded in American English sources. The Arabic speaker is thus steered toward an English-language worldview, and may leave with a very different understanding of the conflict.8

This is not a technical glitch that better engineering will solve. It is a structural problem: English-dominant training data means English worldviews. As more humans come to rely on large language models for information and decision-making, they are progressively channeled toward English-language perspectives, regardless of their native language or geographic context.8

The Linguistic Imperialism of AI: When Technology Mirrors Power

Scholars of postcolonial linguistics use the term linguistic imperialism to describe the dominance of English at the expense of other languages.9 AI intensifies this linguistic imperialism through several mechanisms:

Data imbalance: AI developers train models on vastly more English content than any other single language. This is partly a reflection of the internet’s English bias, but it is also a choice: major AI companies prioritize English because that is where profits lie. Developing robust models for endangered languages has no immediate commercial return.

Structural invisibility: when a language lacks sufficient training data, it does not simply disappear from the model; it is silently folded into the dominant language. A Swahili query about local politics may return information filtered through English-language intermediaries. Users of low-resource languages do not see that translation process; they experience it as neutral knowledge.

Accelerating language shift: as AI tools become embedded in education and professional life, speakers of minority languages face mounting pressure to use English. Why write in an endangered language if the AI writing assistant works better in English? Why teach children a heritage language if educational tools operate almost exclusively in English?

Cultural erosion: loss of a language erodes entire knowledge systems. Indigenous ecological knowledge, spiritual practices, and social structures are often impossible to translate into English without distortion. When these systems can be expressed only through an English-language interface, they are fundamentally altered, stripped of nuance, philosophy, and intrinsic power.

The problem, in other words, is not merely technological. It is ideological. AI systems trained on English-dominant data are not neutral utilities; they are artifacts of power that reproduce and amplify the dominance of English-speaking, predominantly Western perspectives.

Why This Matters: The Stakes of Linguistic Diversity

The threat to linguistic diversity is a concern not only for linguists but also for the broader public. It has profound implications for human society:

Epistemological loss: different languages encode different ways of knowing. Many Indigenous languages embed ecological relationships and seasonal patterns directly into grammar and vocabulary. When these languages disappear, the knowledge they carry about sustainable land use, seasonal migration, and environmental stewardship often vanishes with them.

Cognitive diversity: the Sapir–Whorf hypothesis suggests that linguistic diversity is cognitive diversity. Languages that grammatically emphasize collective responsibility shape how speakers perceive community and obligation. Languages with complex systems for indicating evidentiality (how one knows something to be true) shape how speakers evaluate knowledge claims. When languages disappear, humanity loses these alternative cognitive frameworks.

Identity and belonging: for speakers of minority languages, language loss is experienced as cultural erasure. Language connects people to ancestors, community, and a particular way of being in the world. When children cannot communicate with grandparents in their heritage language, as increasingly happens when families shift toward economically dominant languages, entire cultural narratives and knowledge systems are lost.

Justice: the acceleration of language loss through AI is, ultimately, a justice issue. The languages most threatened by English hegemony and AI bias are predominantly spoken by colonized, racialized, and economically marginalized communities. The same groups whose voices were silenced through colonial schooling are now being systematically excluded from the digital future.

Technology as a Tool to Support Diversity

While current AI systems reinforce English hegemony, we should not forget that technology can also be used to support linguistic diversity. The technology itself is not at fault; the fault lies in how it has been designed and implemented.

Projects such as Masakhane, an open-source machine translation initiative for African languages, demonstrate a radically different model: one that is community-led, transparent, and explicitly oriented toward linguistic justice. Similarly, initiatives like Sunbird at Makerere University in Uganda collaborate with local communities to create datasets and tools for African languages that are often overlooked by commercial platforms.10

UNESCO’s Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace urges member states to ensure that all cultures can express themselves and have access to cyberspace in all languages, including indigenous ones, and to support capacity-building for the production of local and indigenous content on the Internet. It also calls for the development and adaptation of operating systems, search engines, and web browsers with extensive multilingual capabilities, along with online dictionaries, terminologies, and automated translation services. These efforts project a vision for technology that is not inherently anti-diversity.11

Whether AI accelerates language death or supports language life depends on who controls it, whose languages are included in training data, and which values guide its design.

A Different Future: What It Would Require

Reversing the trajectory of linguistic homogenization through AI is possible but demands systemic change:

Mandated diversity in AI training data: regulators and funding bodies should require AI developers to train models on diverse linguistic and cultural data, including low-resource languages, and to document language coverage transparently.

Community-led technology development: minority-language communities must have agency and resources to build their own datasets and tools. Funding structures should prioritize projects that embed community governance into AI development.

Legal protection for linguistic rights: governments should enact and enforce laws guaranteeing the right to use and transmit minority languages in education, public life, and digital spaces.

Educational transformation: schools need to move from assimilationist models to multilingual pedagogies that treat students’ languages as assets. Translanguaging, mother-tongue instruction, and bilingual programs should be standard practice where linguistic diversity exists.

Decolonizing knowledge systems: institutions, from universities to AI labs, must confront the assumption that “global standards” equal Western norms. True excellence in education and technology will mean centering Indigenous knowledge systems and non-Western epistemologies on their own terms.

Language as Freedom

Technology must be reimagined as an instrument of preservation rather than erasure. It must be designed and governed with linguistic justice at its core. The pace of development in recent years has been so fast that this critical ethical issue has received far too little attention.

Language is never merely a way to communicate. It is a way to belong, to think, to remember. The fight for linguistic diversity in the age of AI is, ultimately, a fight for human freedom, the freedom to transmit one’s heritage, to learn in one’s own language, and to shape the future in one’s own voice rather than the voice imposed by algorithms trained on the dominance of the English-speaking world.


References

  1. Whorf, Benjamin Lee. Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. Edited by John B. Carroll. Cambridge, MA: MIT Press, 1956.
  2. China Daily. “How AI Threatens Linguistic Diversity.” China Daily, January 19, 2025. https://www.chinadaily.com.cn/a/202501/20/WS678d8c39a310a2ab06ea7e9c.html
  3. D’Angelo, Francesca. “The Hegemony of English Language in the Digital Era: Safeguarding Linguistic Diversity as Intangible Cultural Heritage.” inTRAlinea, 2025. https://www.intralinea.org/current/article/the_hegemony_of_english_language_in_the_digital_era
  4. Crystal, David. Language Death. Cambridge: Cambridge University Press, 2000.
  5. UNESCO Ad Hoc Expert Group on Endangered Languages. “Language Vitality and Endangerment.” Paris: UNESCO, 2003. https://ich.unesco.org/doc/src/00120-EN.pdf
  6. Kagwe, Al Kags. “AI: Are We Witnessing the Final Frontier of Cultural Erosion?” Alkags.me, November 25, 2024. https://alkags.me/ai-and-language/
  7. W3Techs. “Usage of Content Languages for Websites.” W3Techs Web Technology Surveys, accessed January 2026. https://w3techs.com/technologies/overview/content_language
  8. Johns Hopkins University, Hub. “Multilingual Artificial Intelligence Often Reinforces Bias.” JHU Hub, September 1, 2025. https://hub.jhu.edu/2025/09/02/multilingual-artificial-intelligence-often-reinforces-bias/
  9. Phillipson, Robert. Linguistic Imperialism. Oxford: Oxford University Press, 1992.
  10. Masakhane Research Foundation. “Masakhane: A Grassroots NLP Community for African Languages.” Project documentation, 2020–2025. https://www.masakhane.io/
  11. UNESCO. “Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace.” Adopted October 15, 2003, 32nd session of the General Conference, Paris. https://www.unesco.org/en/legal-affairs/recommendation-concerning-promotion-and-use-multilingualism-and-universal-access-cyberspace

Image: AI Generated