A Nigerian Startup’s Newest Model Is Crushing OpenAI And Google

Across much of Africa, voice assistants still stumble over everyday speech. A simple reassurance like “No worry, e go better” can morph into “No war eagle butter,” and names such as “Chukwuebuka” turn into tongue-twisting nonsense. The dream of hands-free tech that works as smoothly as it does elsewhere remains elusive for millions.

Nigerian AI startup Intron says it’s changing that. This week, the company unveiled Sahara v2, its latest automatic speech recognition (ASR) model built for how Africans actually talk. Rather than merely closing the gap with Silicon Valley, Intron argues it’s leaping ahead—at least on the languages, accents, and environments that dominate the continent.

Built in the wild, not the lab

The scale is notable: Sahara v2 was trained on more than 50,000 hours of audio from 40,000 speakers spanning 30 countries. It now supports 57 languages, adding 24 new ones that include Hausa, Swahili, Yoruba, and Zulu.

Crucially, the data isn’t studio-clean. Intron gathered speech in real-world settings—Nigerian clinics, Kenyan call centers, South African courtrooms—where overlapping voices, street noise, and imperfect mics are the norm. That messiness is the point: models that thrive there tend to survive anywhere.

On the company’s benchmarks, the gains are eye-catching. Intron claims Sahara v2 beats leading models like GPT-4 and Gemini by 68.6% when transcribing African names, organizations, and locations. In noisy environments, it reports a 36.5% improvement in “hallucination robustness,” meaning it’s less likely to invent words or phrases when audio quality drops.

Cracking code-switching

The standout feature may be the debut of what Intron calls the world’s first bilingual Swahili–English ASR model, built with Kenyan healthcare provider Penda Health. It’s tuned for code-switching—the fluid shifts between languages that define everyday conversation in many African cities. Global systems often choke on mid-sentence language pivots; Intron is betting that mastering them is a durable edge.

“We built for the hardest environment first,” said Tobi Olatunji, Intron’s CEO and a former physician, pointing to the company’s origins in overstretched Nigerian hospitals.

A rush of rivals—and the risk of commoditization

The timing is both opportune and precarious. Interest in African language AI has surged. Toronto-based Cohere recently introduced “Tiny Aya,” a set of multilingual models supporting over 70 languages, designed to run on-device where connectivity falters. Microsoft Research rolled out Paza, including a benchmark for low-resource African languages. Google released WAXAL, an open speech dataset covering 21 Sub-Saharan languages.

This wave validates Intron’s thesis—but also narrows its moat. Open datasets and benchmarks lower the barrier for new entrants and can squeeze pricing power for incumbents. If everyone can train similar models, the battleground shifts from raw accuracy to distribution, integration, and trust.

From demos to deployment

Intron’s answer is to go deep on infrastructure and use cases. Sahara v2 is being rolled out to speed transcription in Ogun State courts in Nigeria and to cut documentation errors at C-Care hospitals in Uganda. Financial firms like ARM Investments cite accuracy on complex jargon and Nigerian currency formats—areas where general-purpose models often stumble.

Data sovereignty is another wedge. Through a partnership with Nvidia, Sahara v2 can run fully offline, enabling governments and sensitive sectors to deploy within their own firewalls—a compelling proposition in markets wary of data leakage and latency.

“We’ve seen significant improvement in transcription and summaries,” said Ayo Oluleye, Head of Data at ARM Investments.

Audere’s CPO Sarah Morris added that Intron’s APIs achieved “99%+ success rates” on Southern African accents during testing.

Why voice wins—and what it will take

Voice is widely expected to be the next major interface for the internet in regions where typing in local languages is slow, literacy varies, or keyboards are poorly localized. If AI can’t reliably understand users, those users remain shut out of the digital economy.

On its home turf, Intron appears to have an edge, at least by its own measures and those of some customers. But the question is shifting from whether Africa’s voice infrastructure will materialize to how it will be built and by whom. Can a sub-20-person startup outrun the data centers of Big Tech and the momentum of open-source academia? Or can it carve a sustainable niche by mastering the continent’s linguistic realities and embedding itself in the “plumbing” of hospitals, courts, and call centers?

For now, Sahara v2 is a statement: accuracy rooted in African soundscapes, code-switching as a first-class feature, and deployment models that respect sovereignty. Whether that’s enough to keep the giants at bay is the next test.

Alex Rivera
