African researchers have released what’s believed to be the largest known dataset of African languages for AI development, capturing 9,000 hours of speech across 18 languages from Kenya, Nigeria, and South Africa. This $2.2 million Gates Foundation-funded initiative addresses a critical gap in AI accessibility, as most current AI tools like ChatGPT are trained primarily on English and other European languages, leaving millions of Africans excluded from the AI revolution.
Why this matters: With Africa home to over a quarter of the world’s languages—more than 2,000 in total—the lack of African language representation in AI creates barriers to essential services and economic opportunities for hundreds of millions of people.
The challenge: Most African languages are primarily spoken rather than written, creating a data scarcity problem for AI training.
- AI systems require vast quantities of text data to function effectively, but African languages lack the extensive online written content available for English, Chinese, and European languages.
- “We think in our own languages, dream in them and interpret the world through them. If technology doesn’t reflect that, a whole group risks being left behind,” explains University of Pretoria’s Prof Vukosi Marivate.
What the project accomplished: The Africa Next Voices initiative brought together linguists and computer scientists to create AI-ready datasets capturing everyday scenarios in farming, health, and education.
- Languages recorded include Kikuyu and Dholuo in Kenya, Hausa and Yoruba in Nigeria, and isiZulu and Tshivenda in South Africa—some spoken by millions of people.
- The team gathered voices from different regions, ages, and backgrounds to ensure inclusivity, according to computational linguist Lilian Wanzare.
- The data will be open access, allowing developers to build tools that translate, transcribe, and respond in African languages.
Real-world applications: Indigenous language AI tools are already solving practical challenges across the continent.
- Farmer Kelebogile Mosime uses the AI-Farmer app, which recognizes several South African languages including Sesotho, isiZulu, and Afrikaans, to troubleshoot farming problems on her 21-hectare vegetable operation.
- “Daily, I see the benefits of being able to use my home language Setswana on the app when I run into problems on the farm. I ask anything and get a useful answer,” Mosime explains.
The business case: South African company Lelapa AI is building AI tools in African languages for banks and telecoms, highlighting the economic barriers created by language exclusion.
- “English is the language of opportunity. For many South Africans who don’t speak it, it’s not just inconvenient—it can mean missing out on essential services like healthcare, banking or even government support,” says CEO Pelonomi Moiloa.
Cultural preservation concerns: Beyond practical applications, researchers warn that excluding indigenous languages from AI development risks losing cultural knowledge and worldviews.
- “Language is access to imagination,” Prof Marivate notes. “It’s not just words: it’s history, culture, knowledge. If indigenous languages aren’t included, we lose more than data; we lose ways of seeing and understanding the world.”