AI-generated machine translations have flooded Wikipedia’s smaller language editions with error-riddled content, creating a dangerous feedback loop as AI models train on these flawed pages. The problem is particularly acute for vulnerable languages with few native speakers: in some editions, up to 60% of articles are now uncorrected machine translations that could accelerate language extinction rather than preserve these cultural treasures.
The scale of the problem: Machine-translated content has overwhelmed Wikipedia editions in hundreds of lesser-known languages, with devastating accuracy issues.
- Volunteers working on four African languages estimate that between 40% and 60% of articles in their Wikipedia editions are uncorrected machine translations.
- More than two-thirds of longer pages in the Inuktitut Wikipedia contain portions created through machine translation.
- The Greenlandic Wikipedia became so corrupted that its manager deleted almost everything and is now requesting the edition be shut down entirely.
Why AI struggles with vulnerable languages: Machine translation systems perform poorly on languages with limited online text and unique linguistic structures.
- Google’s research found that translation systems for lower-resourced languages were generally of lower quality, often mistranslating basic nouns including animal names and colors.
- Greenlandic and most Native American languages use agglutinative structures where single words can express entire sentences, making them poorly suited to most machine translation systems.
- AI translators produce absurd errors like claiming Canada has only 41 inhabitants or suggesting the Fulfulde word for “harvest” means “fever.”
In plain English: Agglutinative languages work like linguistic building blocks—speakers attach prefixes and suffixes to root words to create complex meanings. For example, a single Greenlandic word might express what English needs an entire sentence to convey, like “the one who is repeatedly going to hunt seals.” This structure confuses AI systems designed for languages like English that rely more on word order and separate words.
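To make the building-block idea concrete, here is a minimal Python sketch. The root, suffixes, and glosses are invented for illustration and should not be read as real Greenlandic; the point is only that one agglutinated word can carry what English spreads across a whole clause.

```python
# Illustrative only: these morphemes and glosses are invented for
# demonstration and are not real Greenlandic.
root = ("puisi", "seal")            # hypothetical root
suffixes = [
    ("niar", "going to hunt"),      # hypothetical derivational suffix
    ("tar", "repeatedly"),          # hypothetical habitual suffix
    ("toq", "the one who"),         # hypothetical participial ending
]

# Agglutination: chain every suffix onto the root to form ONE word.
word = root[0] + "".join(form for form, _ in suffixes)
gloss = " + ".join([root[1]] + [g for _, g in suffixes])

print(word)   # a single (invented) word: puisiniartartoq-style
print(gloss)  # seal + going to hunt + repeatedly + the one who
# English needs a full clause for the same meaning:
# "the one who is repeatedly going to hunt seals"
```

A translation system built around space-separated words sees that entire clause as a single unfamiliar token, which is part of why such languages fare so poorly.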
The feedback loop effect: Wikipedia serves as a primary training source for AI language models, meaning errors get amplified across the entire AI ecosystem.
- Wikipedia was estimated to make up more than half the training data for AI models translating some African languages including Malagasy, Yoruba, and Shona.
- For 27 under-resourced languages, Wikipedia was the sole easily accessible source of online linguistic data available for AI training.
- This creates a “garbage in, garbage out” cycle where poorly translated Wikipedia pages poison the data wells that future AI models draw from, a dynamic the toy simulation below illustrates.
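As a rough illustration of how that cycle compounds, here is a toy Python simulation. Every number in it (the starting error rate, Wikipedia’s share of training data, the amplification factor) is an invented assumption, not a measurement from the article.

```python
# Toy model of the "garbage in, garbage out" cycle. All numbers are
# invented assumptions for illustration, not measurements.
error_rate = 0.10     # assumed share of erroneous translations, generation 0
wiki_share = 0.60     # assumed share of training data drawn from Wikipedia
amplification = 1.5   # assumed factor by which training on errors breeds more

for generation in range(5):
    print(f"generation {generation}: ~{error_rate:.0%} of output erroneous")
    # Errors grow in proportion to how much of the next model's training
    # data comes from the previous model's uncorrected output.
    error_rate = min(1.0, error_rate * (1 + wiki_share * (amplification - 1)))
```

Even with modest assumed rates, the error share roughly triples in a few model generations, which is why advocates describe the loop as self-reinforcing.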
Real-world consequences: The proliferation of AI-generated errors is already harming language learning and preservation efforts.
- Error-strewn AI-generated books for learning languages like Inuktitut, Cree, and Manx are now appearing for sale on Amazon.
- Abdulkadir Abdulkadir, an agricultural planner, warns that machine-translated farming information in Fulfulde could “easily harm” farmers who rely on accurate seasonal guidance.
- Noah Ha’alilio Solomon, a Hawaiian language professor at the University of Hawai’i, reports that 35% of words on some Hawaiian Wikipedia pages are incomprehensible.
What language advocates are saying: Community leaders describe the situation as culturally devastating and potentially accelerating language extinction.
- “It is painful, because it reminds us of all the times that our culture and language has been appropriated,” says Solomon about poor Hawaiian content on Wikipedia.
- Abdulkadir predicts a bleak future for Fulfulde: “It is going to be terrible, honestly. Totally, completely no future.”
- Kenneth Wehr, who managed Greenlandic Wikipedia, concluded: “There is nobody in Greenland who is interested in this, or who wants to contribute. There is completely no point in it.”
The exception that proves the rule: Inari Saami demonstrates how careful community management can make Wikipedia work for endangered languages.
- The language, spoken in Finland, went from four child speakers to several hundred speakers over four decades.
- The community created 6,400 Wikipedia articles, each copy-edited by fluent speakers, with quality prioritized over quantity.
- Wikipedia has been integrated into Inari Saami school curricula and helps introduce new vocabulary for modern concepts.
Platform responsibility questions: The Wikimedia Foundation, which operates Wikipedia, maintains that individual language communities bear responsibility for content quality.
- “Ultimately, the responsibility really lies with the community to see that there is no vandalism or unwanted activity, whether through machine translation or other means,” explains senior director Runa Bhattacharjee.
- But many vulnerable language editions lack active communities to monitor and correct problematic content.
- The foundation’s approach is to maintain platforms “in case someone comes along to revive” dormant editions.
The race against time: Linguists suggest that creating high-quality content quickly might be the only way to break the negative feedback loop.
- According to UNESCO, a language becomes extinct every two weeks.
- “ChatGPT only needs a lot of words,” notes Fabrizio Brecciaroli from the Inari Saami Language Association. “If we keep putting good material in, then sooner or later, we will get something out.”
- However, the damage may already be embedded in major AI systems—neither Google Translate nor ChatGPT can correctly count to 10 in Greenlandic.