UCT's MzansiLM is the first AI language model trained on all 11 of South Africa's official written languages, but the researchers say the gap will take sustained collective effort to close.
When a South African customer calls a business and hears a robotic American voice on the other end, the drop-off rate climbs significantly. The caller has not stopped wanting help; they simply hang up. That data point comes from a Cape Town company called Untapped AI, which owns a call centre with over a decade of operational records. The pattern is consistent: foreign-sounding AI voices lose South African callers faster than human agents do.
That finding was published this week — and it is more important than most tech headlines you will see this month. Because it confirms something that anyone who speaks isiZulu, Sepedi, Tshivenda, or even just South African English already knows from experience. The AI tools being sold to businesses right now were not built for us. They were built in America, trained on American voices, and deployed here anyway.
I want to explain what that actually means — and why, if you are reading this with a South African language as your first language, this is one of the more interesting opportunities sitting in front of you right now.
The problem, explained without the tech jargon
South Africa has 12 official languages. Eleven of them are spoken and written; the twelfth, South African Sign Language, was added in 2023. Researchers at the University of Cape Town who built a model called MzansiLM, the first publicly available AI language model trained on all 11 of South Africa's official written languages, say nine of those 11 fall into what the AI research world calls "low-resource languages." That means there is not enough data in those languages for AI systems to learn from properly.
Think about how an AI learns to understand language. It is trained on thousands of hours of recorded conversations and enormous amounts of text: transcripts, books, articles. Most of that data exists in English, Mandarin, Spanish, and French. When someone speaks isiNdebele or Sepedi into an AI voice system, the system is working with a fraction of the training it had for English. The result is misheard words, mispronounced names, wrong responses, and customers who hang up.
The UCT team behind MzansiLM found that even within their own model, isiNdebele and Sepedi remain severely underrepresented. If the researchers building a model specifically for South Africa are still struggling with those languages, you can imagine what an American AI tool is doing with them in a call centre right now.
Untapped AI's founder Lloyd Matthew put it directly: when a caller switches to an African language, their current system asks the caller to continue in English or Afrikaans. That is the honest state of things in 2026. The gap is real, it is large, and it is not closing quickly.
🇿🇦 SA Spotlight
UCT's Department of Computer Science published research this month introducing MzansiLM — the first AI language model trained from scratch on all 11 of South Africa's official written languages. The team also released MzansiText, the dataset that makes the model possible. Both are freely available for researchers and developers to build on. UCT's Jan Buys said the goal is not just a product but proof that South Africa needs to build its own AI capacity rather than wait for American companies to get around to supporting local languages. The research will be presented at the Language Resources and Evaluation Conference in Mallorca, Spain.
Why this matters beyond the call centre
The businesses buying imported AI tools right now are making a mistake that is going to cost them customers. That is their problem. But the reason this matters to you — the person reading this on a phone in Limpopo or Soweto or KwaZulu-Natal — is different.
The reason AI systems cannot understand South African languages is not a mystery. It is a data problem. There is not enough recorded speech, not enough written text, not enough labelled examples in isiZulu, Tshivenda, Xitsonga, Sesotho, and the others. The companies building these systems know this. And they are paying people to fix it.
That is the part most articles about this story skip entirely.
The opportunity sitting inside this gap
Voice recording tasks
A platform called Luel AI is currently paying South Africans in US dollars to record natural conversations in isiZulu and isiXhosa. The task is simple — have a real conversation in your language using their platform, submit it, get paid. Both people in the conversation get paid. You need a phone, a quiet space, and someone to talk to in the language. That is it. Payment goes via Payoneer.
AI data annotation
Platforms like Toloka, Outlier, and Remotasks pay for small tasks that train AI systems — rating responses, labelling images, evaluating search results, transcribing audio. Toloka accepts South African users and pays via Payoneer. Outlier specifically accepts South African applicants and tends to pay higher rates for language-evaluation tasks. These are not full-time income sources but they are real supplemental earnings accessible on a smartphone.
Voice acting for AI training
Companies building African language AI models are actively looking for native speakers to record scripted and conversational content. A Cape Town company called Beatpulse recently listed a paid Xhosa voice actor role specifically for AI training data. The skill required is native fluency — no studio experience needed. If you speak isiXhosa, isiZulu, Tshivenda, or Sepedi natively, that fluency is an asset that global AI companies do not have easy access to and are willing to pay for.
Local AI content and consulting
As businesses realise imported AI tools are failing their South African customers, demand is growing for people who understand both AI tools and local context. This is an early-stage opportunity. But anyone who starts learning now how AI language models work, what they get wrong in a South African context, and how to advise businesses on it will be positioned ahead of where the market is going.
💬 Real Talk
I want to be honest about the income side of this. Voice recording and annotation tasks on platforms like Luel AI and Toloka are not going to replace a salary. The rates are low per task, availability fluctuates, and Payoneer setup adds friction for first-time users. Where this gets more interesting is if you treat it as an entry point into the AI data economy, not as the destination. The people earning meaningfully from this space in 2026 are combining multiple platforms, building a track record for quality, and moving toward higher-paying annotation and evaluation work on platforms like Outlier. That takes months, not days. Go in with honest expectations.
The connection between this story and building a digital income is real. If you want to understand what AI data annotation actually involves before signing up anywhere, the article on the best remote AI data annotator jobs in South Africa covers the platforms and what they actually pay. And if you are thinking about how to build income on a smartphone without experience, the piece on trying to make a first R100 online in South Africa is an honest starting point.
Questions people are asking about this
Which South African languages are most underrepresented in AI systems right now?
According to UCT researchers, nine of South Africa's 11 official languages are classified as low-resource in AI terms. isiNdebele and Sepedi are the most severely underrepresented — even within MzansiLM, the model built specifically for SA languages. isiZulu has slightly more support from larger commercial models, but remains far behind English and Afrikaans.
Can I get paid to help train AI in my language from my phone?
Yes. Luel AI is currently paying for isiZulu and isiXhosa conversational recordings. Toloka and Outlier accept South African users for annotation and evaluation tasks. Beatpulse has listed voice acting roles for isiXhosa speakers specifically for AI training data. All of these are accessible on a smartphone. Payment typically goes via Payoneer, so set that up before you apply to any platform.
What is MzansiLM and does it mean AI will work better in SA languages soon?
MzansiLM is a language model built by UCT researchers and trained on all 11 of South Africa's official written languages. It is not a consumer product; it is a research baseline that developers can build on. UCT's Jan Buys says it is evidence that South Africa needs its own AI capacity, not a guarantee that the problem is solved. Commercial improvement is still some way off: Untapped AI estimates it will take at least 12 to 18 months for Microsoft and Amazon to localise their tools properly for South Africa.
Is this relevant if I only speak South African English, not an African language?
Yes. Luel AI accepts South African English accent recordings specifically because global AI systems also struggle with SA English cadence and expression. Toloka and Outlier accept all English-speaking applicants. And the broader opportunity — understanding how AI tools fail in local contexts — is relevant regardless of which language you speak.
Tshivenda is one of the languages that falls into the low-resource category. That is the language of the region I grew up in. When I read that nine of South Africa's 11 official languages are essentially invisible to the AI systems being sold to businesses right now, it is not abstract to me. It means the person in Venda trying to use an AI customer service line, or trying to use a voice tool on their phone, is getting a worse experience than someone in London or New York — not because of anything they did wrong, but because nobody collected enough of their language to train the system properly. That gap will close. The question is whether the people who speak those languages are involved in closing it — or just waiting on the outside while someone else does.
