That ‘cheap’ open-source AI model actually burns through your compute budget




A comprehensive new study has found that open-source artificial intelligence models consume significantly more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how companies evaluate AI deployment strategies.

The research, conducted by AI firm Nous Research, found that open-weight models use between 1.5 and 4 times more tokens (the basic units of AI computation) than closed models such as those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times more tokens.

“Open-weight models use 1.5–4× more tokens than closed ones (up to 10× for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs,” the researchers wrote in their report, published Wednesday.

The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token, the study suggests this advantage can be “easily offset if they require more tokens to reason about a given problem.”
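The economics here come down to simple arithmetic: cost per query is token usage times per-token price, so a 4× cheaper token rate is exactly cancelled by 4× higher token usage. A back-of-the-envelope sketch (the prices and token counts below are hypothetical illustrations, not figures from the Nous Research report):

```python
# Illustrative cost-per-query comparison: a lower per-token price can be
# offset by higher token usage. Prices and token counts are hypothetical.

def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Total output cost for one query, in dollars."""
    return tokens_used * price_per_million_tokens / 1_000_000

# Closed model: pricier per token, but answers concisely.
closed = cost_per_query(tokens_used=500, price_per_million_tokens=8.00)

# Open-weight model: 4x cheaper per token, but uses 4x more tokens
# (the upper end of the 1.5-4x range the study reports).
open_weight = cost_per_query(tokens_used=2_000, price_per_million_tokens=2.00)

print(f"closed: ${closed:.4f}, open-weight: ${open_weight:.4f}")
# Both queries end up costing the same: the per-token discount vanishes.
```

At the 10× token multiplier the study observed on simple knowledge questions, the “cheaper” model in this sketch would actually cost 2.5 times more per query.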




The real cost of AI: why ‘cheaper’ models can break your budget

The study examined 19 AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency,” or how many computational units models use relative to the complexity of their solutions, a metric that has received little systematic study despite its significant cost implications.

“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “While hosting open-weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem.”

Open-source AI models use up to 12 times more computing resources than the most efficient closed models for basic knowledge questions. (Credit: Nous Research)

The inefficiency is particularly pronounced for large reasoning models (LRMs), which use extended “chains of thought” to solve complex problems. These models, designed to think through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.

For basic knowledge questions such as “What is the capital of Australia?”, the study found that reasoning models “spend hundreds of tokens pondering simple knowledge questions” that could be answered in a single word.

Which AI models actually deliver for your money

The research revealed major differences between model providers. OpenAI’s models, particularly its o4-mini and newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found that OpenAI models “stand out for extreme token efficiency in math problems,” using up to three times fewer tokens than other commercial models.

Among open-source options, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “the most token-efficient open-weight model across all domains,” while newer models such as Mistral’s Magistral showed “exceptionally high token usage” as outliers.

The efficiency gap varied considerably by task type. While open models used roughly twice as many tokens for mathematical and logic problems, the difference ballooned for simple knowledge questions, where extended reasoning should be unnecessary.

The newest models from OpenAI achieve the lowest costs for simple questions, while some open-source alternatives can cost considerably more despite lower per-token prices. (Credit: Nous Research)

What enterprise leaders need to know about AI compute costs

The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements of real-world tasks.

“The better token efficiency of closed-weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.

The study also found that closed-source model providers appear to be actively optimizing for efficiency. “Closed-weight models are iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models have “increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance.”

Computational overhead varies drastically between AI providers, with some models using more than 1,000 tokens of internal reasoning for simple tasks. (Credit: Nous Research)

How researchers cracked the code on AI efficiency measurement

The research team faced unique challenges in measuring efficiency across model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.

To address this, the researchers used completion tokens, the total billing units charged for each query, as a proxy for reasoning effort. They found that “most recent closed-source models will not share their raw reasoning traces” and instead “use smaller language models to transcribe the chain of thought into summaries or compressed representations.”
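The proxy approach above can be sketched in a few lines: since raw reasoning traces are often hidden, the billed completion-token count stands in for reasoning effort, and averaging it per model yields a comparable efficiency ratio. The records below are made-up illustrations, not data from the report:

```python
# Sketch of the completion-tokens-as-proxy idea: compare average billed
# completion tokens per model to estimate relative reasoning effort.
# All numbers below are hypothetical, not measurements from the study.

from statistics import mean

# Each record: (model, task category, completion tokens billed for the query).
runs = [
    ("closed-model", "knowledge", 120),
    ("closed-model", "knowledge", 80),
    ("open-model", "knowledge", 900),
    ("open-model", "knowledge", 1_100),
]

def avg_completion_tokens(runs: list, model: str) -> float:
    """Average billed completion tokens for one model, our effort proxy."""
    return mean(t for m, _, t in runs if m == model)

closed_avg = avg_completion_tokens(runs, "closed-model")
open_avg = avg_completion_tokens(runs, "open-model")

# A ratio above 1 means the open model spends more computation per query;
# this is the kind of 1.5-10x multiplier the study reports.
ratio = open_avg / closed_avg
print(f"open/closed token ratio: {ratio:.1f}x")
```

One caveat the researchers note implicitly: because some providers compress or summarize reasoning before billing, this proxy measures what you pay for, which may differ from the raw computation performed.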

The study’s methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering variables in competition problems from the American Invitational Mathematics Examination (AIME).

Different AI models show different relationships between computation and output, with some providers compressing reasoning traces while others provide full details. (Credit: Nous Research)

The future of AI efficiency: what comes next

The researchers suggest that token efficiency should become a primary optimization target alongside accuracy for future model development. “A more compact CoT will also allow more efficient context use and may counter context degradation during challenging reasoning tasks,” they wrote.

The release of OpenAI’s open-source gpt-oss models, which demonstrate strong efficiency with freely accessible reasoning traces, could serve as a reference point for optimizing other open-source models.

The full research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the findings. As the AI industry races toward more powerful reasoning capabilities, this study suggests the real competition may not be about who can build the smartest AI, but who can build the most efficient one.

In a world where every token counts, the most wasteful models may eventually be priced out of the market, regardless of how well they can think.
