Do Hindi users pay more for AI? New data says Claude is expensive for non-English speakers


Are you talking to AI in Hindi or another non-English language? If yes, your AI chatbot could actually be costing you more. Companies like Anthropic, OpenAI and Google often present their latest AI models as tools that work equally well for everyone, regardless of where they live or what language they speak. But as it turns out, new data shared by researchers suggests that users who interact with AI in languages such as Hindi, Arabic and Chinese may effectively pay more than English speakers for conveying the same amount of information.

The reason lies in how AI models process language. The same prompt in Hindi can generate significantly more tokens (the units AI systems use to read and understand text) than its English equivalent. Put simply, saying the same thing in Hindi costs more tokens than saying it in English. And because AI usage is typically priced and capped by token count, that makes AI more expensive for non-English speakers.

Researchers, developers and AI users are increasingly referring to this phenomenon as a “language tax” or “linguistic tax”: a hidden cost created by the way AI models process different languages.

What exactly is going on?

A few weeks ago, OpenAI researcher Aran Komatsuzaki shared an experiment comparing how OpenAI and Anthropic’s tokenizers handle text in different languages. Using AI researcher Rich Sutton’s influential essay ‘The Bitter Lesson’ as a benchmark, Komatsuzaki translated the text into multiple languages and measured how many tokens were generated by different AI systems.

The results revealed a significant gap between English and several non-English languages. According to the analysis, Hindi text required 1.37 times more tokens than English on OpenAI’s tokenizer. On Anthropic’s Claude tokenizer, however, the figure rose to 3.24 times. Arabic required 2.86 times more tokens on Claude, while Chinese required 1.71 times more.

In simpler terms, if an English-speaking user spends a given number of tokens to communicate an idea, a Hindi-speaking user may need more than three times as many tokens on Claude to express the same information.
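To see what those multipliers could mean for a bill, here is a rough Python sketch. The multipliers are the ones reported above for the 'The Bitter Lesson' benchmark, read as "tokens needed relative to English". The prompt size and the price per million tokens are hypothetical placeholders chosen only for illustration, not any company's actual pricing.

```python
# Token multipliers relative to English, as reported in the article
# for one benchmark text. They may differ for other kinds of text.
REPORTED_MULTIPLIERS = {
    ("OpenAI", "Hindi"): 1.37,
    ("Claude", "Hindi"): 3.24,
    ("Claude", "Arabic"): 2.86,
    ("Claude", "Chinese"): 1.71,
}

# Hypothetical values, assumed for illustration only.
ENGLISH_TOKENS = 1_000           # size of the English version of a prompt
PRICE_PER_MILLION_TOKENS = 3.00  # made-up price, in any currency


def estimated_cost(english_tokens, multiplier, price_per_million_tokens):
    """Estimate the cost of a prompt whose English version needs
    `english_tokens` tokens, when the same content in another language
    needs `multiplier` times as many tokens."""
    tokens = english_tokens * multiplier
    return tokens * price_per_million_tokens / 1_000_000


baseline = estimated_cost(ENGLISH_TOKENS, 1.0, PRICE_PER_MILLION_TOKENS)
for (tokenizer, language), multiplier in REPORTED_MULTIPLIERS.items():
    cost = estimated_cost(ENGLISH_TOKENS, multiplier, PRICE_PER_MILLION_TOKENS)
    print(f"{language} on {tokenizer}: {cost:.5f} vs {baseline:.5f} for English")
```

Whatever price is plugged in, the ratio between the Hindi and English cost stays equal to the multiplier, which is why the gap matters more as usage grows.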

Komatsuzaki noted that these figures are based on a specific benchmark rather than every possible type of text. Still, the findings have sparked a wider conversation about how AI systems treat non-English languages.

But why do some languages cost more than others?

The answer lies in how AI models break down and process text. Before an AI model can understand a prompt, it converts the text into smaller units called tokens. This process is handled by a component known as a tokenizer.

According to researchers, this ‘language tax’ arises because AI models were mostly trained on English data, so the systems handle English much more efficiently. Text in other languages, such as Hindi, Arabic and Chinese, gets broken down into many more pieces (tokens) because of their different scripts and structures, which makes it costlier to process.
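One part of the script effect can be seen without any AI model at all. The sketch below assumes a byte-level tokenizer, a common design that starts from the UTF-8 bytes of the text and merges frequent byte sequences into larger tokens. It does not reproduce OpenAI's or Anthropic's tokenizers and it does not count real tokens. It only shows the starting point: how many bytes each character takes before any merging happens. The sample words are illustrative choices, each meaning "language".

```python
def utf8_profile(text):
    """Return (characters, UTF-8 bytes, bytes per character) for a string."""
    chars = len(text)                       # number of Unicode code points
    raw_bytes = len(text.encode("utf-8"))   # number of bytes in UTF-8
    return chars, raw_bytes, raw_bytes / chars


# Illustrative sample words, not taken from the benchmark.
samples = {
    "English": "language",
    "Hindi": "भाषा",
    "Arabic": "لغة",
    "Chinese": "语言",
}

for name, word in samples.items():
    chars, raw_bytes, ratio = utf8_profile(word)
    print(f"{name}: {chars} characters, {raw_bytes} bytes, {ratio:.1f} bytes per character")
```

Basic English letters take one byte each, Arabic letters two, and Devanagari and Chinese characters three. A tokenizer that has learned many merges for English but few for another script will leave that script's text closer to this raw byte count, which is one plausible contributor to the gap described above.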

And Komatsuzaki’s findings are not the only research pointing in this direction. Several studies have examined tokenizer efficiency across Indian and other non-English languages.

Note, however, that higher token counts do not necessarily mean an AI model is worse at understanding Hindi or other languages. Token efficiency and language quality are two separate issues.

How can the AI language tax be fixed?

There is no easy fix yet. Researchers say AI companies can improve tokenizers, train them on more multilingual data and design systems that handle non-English languages more efficiently. But until that happens, millions of users who interact with AI in their native languages may continue to face higher costs than English speakers for communicating the same amount of information.

– Ends

Published By: Divya Bhati

Published On: Jun 23, 2026 12:40 IST


