Chinese AI startup Zhipu AI, also known as z.ai, is back this week with a dazzling new frontier large language model: GLM-5.
It is the latest in z.ai’s ongoing and continually impressive GLM series, carries an open source MIT license – perfect for enterprise deployment – and, among its notable achievements, posts a record-low hallucination rate on the independent Artificial Analysis Intelligence Index v4.0.
With a score of -1 on the AA-Omniscience Index – a massive 35-point improvement over its predecessor – GLM-5 now leads the entire AI industry, including US competitors such as Google, OpenAI and Anthropic, in knowledge reliability: it knows when to abstain rather than make up information.
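Artificial Analysis has described Omniscience-style scoring as rewarding correct answers, penalizing hallucinated ones, and scoring abstentions neutrally. A minimal sketch of that style of metric (the exact AA weighting is an assumption here) shows why a model that abstains when unsure outscores one that guesses:

```python
def omniscience_score(correct: int, incorrect: int, abstained: int) -> float:
    """Net score as a percentage of all questions: +1 per correct answer,
    -1 per hallucinated (incorrect) answer, 0 for abstentions. The exact
    AA-Omniscience weighting may differ; this is illustrative only."""
    total = correct + incorrect + abstained
    return 100.0 * (correct - incorrect) / total

# A model that guesses everything vs. one that abstains when unsure:
print(omniscience_score(40, 41, 19))  # slightly more wrong than right -> -1.0
print(omniscience_score(40, 10, 50))  # same knowledge, honest abstention -> 30.0
```

Under this kind of scoring, a negative-but-near-zero score like GLM-5’s -1 means hallucinations barely outnumber correct answers across the whole question set, where most frontier models score far deeper into negative territory.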
In addition to its reasoning abilities, GLM-5 is built for high-quality knowledge work. It features native “Agent Mode” capabilities that allow it to convert raw prompts or source material directly into professional office documents, including ready-made .docx, .pdf, and .xlsx files.
Whether generating detailed financial reports, high school sponsorship proposals or complex spreadsheets, GLM-5 delivers results in real-world formats that integrate directly into business workflows.
It is also disruptively priced at approximately $0.80 per million input tokens and $2.56 per million output tokens, approximately 6x cheaper than proprietary competitors such as Claude Opus 4.6, making state-of-the-art agentic engineering more cost-effective than ever before. Here’s what else enterprise decision makers need to know about the model and its training.
Technology: Scaling for Agentic Efficiency
At the heart of GLM-5 is a huge leap in raw parameters. The model scales from the 355B parameters of GLM-4.5 to a whopping 744B parameters, with 40B active per token in the Mixture-of-Experts (MoE) architecture. This growth is supported by an increase in pre-training data to 28.5T tokens.
To address training inefficiencies at this scale, z.ai developed “slime,” a new infrastructure for asynchronous reinforcement learning (RL).
Traditional RL often suffers from long-tail bottlenecks, where the slowest trajectories in a batch hold up all the rest; slime breaks this lockstep by allowing trajectories to be generated independently, enabling the fine-grained iteration necessary for complex agent behavior.
By integrating system-level optimizations such as Active Partial Rollouts (APRIL), slime addresses the generation bottlenecks that typically consume more than 90% of RL training time, significantly accelerating the iteration cycle for complex agentic tasks.
The framework centers on a tripartite modular design: a high-performance training module powered by Megatron-LM, a rollout module that uses SGLang and custom routers for high-throughput data generation, and a centralized data buffer that manages initialization and rollout storage between the two.
By enabling adaptive, verifiable environments and multi-turn compilation feedback loops, slime provides the robust, high-throughput foundation needed to transition AI from simple chat interactions to rigorous, long-horizon systems engineering.
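The decoupling described above can be sketched with a simple producer–consumer pattern: rollout workers push each trajectory into a shared buffer the moment it finishes, and the trainer consumes whatever is ready instead of waiting on the slowest member of a batch. This is a hypothetical illustration of the general asynchronous-RL pattern, not slime’s actual code; the worker and trainer functions are stand-ins.

```python
import queue
import random
import threading
import time

# Shared trajectory buffer, analogous to slime's centralized data buffer.
buffer: queue.Queue = queue.Queue()

def rollout_worker(worker_id: int, n_trajectories: int) -> None:
    """Generate trajectories of varying length and push each one
    immediately, rather than waiting for the whole batch."""
    for i in range(n_trajectories):
        time.sleep(random.uniform(0.001, 0.01))  # simulated generation time
        buffer.put({"worker": worker_id, "traj": i})

def trainer(total: int) -> int:
    """Consume trajectories in completion order -- no batch barrier."""
    steps = 0
    for _ in range(total):
        buffer.get()
        steps += 1
    return steps

workers = [threading.Thread(target=rollout_worker, args=(w, 5)) for w in range(4)]
for t in workers:
    t.start()
consumed = trainer(20)
for t in workers:
    t.join()
print(consumed)  # 20 trajectories trained on, without lockstep batching
```

The point of the pattern is that a single slow trajectory delays only itself; every other rollout keeps feeding the trainer, which is what makes fine-grained iteration on long-horizon agentic tasks affordable.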
To keep deployment manageable, GLM-5 integrates DeepSeek Sparse Attention (DSA), maintaining 200K context capacity and dramatically reducing costs.
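The core idea behind sparse attention schemes like DSA is that each query attends to only a small, cheaply selected subset of the cached keys, so per-token cost scales with the subset size rather than the full context length. The sketch below uses plain dot-product scoring for the selection step; DSA’s real selection mechanism is more sophisticated, so treat this as an illustration of the principle only.

```python
import numpy as np

def sparse_attention(q, K, V, k=64):
    """Attend to only the top-k keys for a single query vector.
    Cost of the softmax/value mixing scales with k, not len(K)."""
    scores = K @ q                          # cheap relevance scores, shape (L,)
    topk = np.argsort(scores)[-k:]          # indices of the k best keys
    s = scores[topk] / np.sqrt(q.shape[0])  # scaled scores on the subset only
    w = np.exp(s - s.max())
    w /= w.sum()                            # softmax over k entries, not L
    return w @ V[topk]                      # weighted sum of k value vectors

rng = np.random.default_rng(0)
L, d = 2000, 32                             # scaled-down "context" for the demo
q = rng.normal(size=d)
K, V = rng.normal(size=(L, d)), rng.normal(size=(L, d))
out = sparse_attention(q, K, V, k=64)
print(out.shape)  # (32,) -- one output vector, built from 64 of 2000 keys
```

At a 200K-token context, attending to a fixed-size subset instead of every cached token is what turns quadratic-looking inference cost into something closer to linear in practice.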
Knowledge work from start to finish
z.ai presents GLM-5 as an ‘office tool’ for the AGI era. While previous models produced fragments, GLM-5 is built to deliver finished documents.
It can autonomously convert prompts into formatted .docx, .pdf, and .xlsx files, ranging from financial reports to sponsorship proposals.
In practice, this means the model can break down high-level goals into executable subtasks and perform “Agentic Engineering,” where humans define quality gates while the AI handles execution.
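The “humans define quality gates, AI handles execution” loop can be made concrete with a small sketch. The planner, executor, and gate below are hypothetical stubs standing in for GLM-5 API calls and a human-authored acceptance policy; none of these function names come from z.ai’s actual API.

```python
def plan(goal: str) -> list[str]:
    """Stand-in for the model's decomposition of a high-level goal
    into executable subtasks."""
    return [f"draft {goal}", f"format {goal} as .docx", f"review {goal}"]

def execute(subtask: str, attempt: int) -> str:
    """Stand-in for the model performing one subtask."""
    return f"{subtask} (attempt {attempt})"

def quality_gate(result: str) -> bool:
    """Human-defined acceptance criterion; here, anything non-empty
    passes. In a real pipeline this is where review rules live."""
    return bool(result.strip())

def run_agent(goal: str, max_retries: int = 2) -> list[str]:
    accepted = []
    for subtask in plan(goal):
        for attempt in range(1, max_retries + 1):
            result = execute(subtask, attempt)
            if quality_gate(result):  # human sets the bar, AI executes
                accepted.append(result)
                break
    return accepted

results = run_agent("Q3 financial report")
print(len(results))  # 3 subtasks accepted
```

The design choice worth noting is that the gate sits between execution and acceptance, so raising the bar never requires retraining the model, only rewriting the acceptance policy.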
High performance
According to benchmarks from Artificial Analysis, GLM-5 is the new most powerful open source model in the world, surpassing Chinese rival Moonshot’s Kimi K2.5, released just two weeks ago, and showing how far Chinese AI companies have closed the gap with their much better-funded Western rivals.
According to z.ai’s own materials shared today, GLM-5 scores near state-of-the-art on several key benchmarks:
SWE-bench Verified: GLM-5 achieved a score of 77.8, outperforming Gemini 3 Pro (76.2) and approaching Claude Opus 4.6 (80.9).
Vending-Bench 2: In a simulation of running a business, GLM-5 ranked #1 among open source models with a closing balance of $4,432.12.
In addition to performance, GLM-5 aggressively undercuts the market. Live on OpenRouter as of February 11, 2026, it costs approximately $0.80 – $1.00 per million input tokens and $2.56 – $3.20 per million output tokens. That puts it in the mid-range of leading LLMs, but given its top-tier performance it is what you might call a ‘steal’.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Total cost (1M in + 1M out) | Source |
|---|---|---|---|---|
| Qwen3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| deepseek-chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| deepseek-reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi K2.5 | $0.60 | $3.00 | $3.60 | Moonshot |
| GLM-5 | $1.00 | $3.20 | $4.20 | z.ai |
| ERNIE 5.0 | $0.85 | $3.40 | $4.25 | Qianfan |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max (23-01-2026) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |
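The headline savings claims can be checked directly against the discounted GLM-5 rates quoted earlier in the article ($0.80 in / $2.56 out) and Claude Opus 4.6’s $5/$25 list prices; the 10M-in/2M-out workload below is an arbitrary example, not a benchmark figure.

```python
# Rates in $ per million tokens, from the article's pricing figures.
glm5_in, glm5_out = 0.80, 2.56
opus_in, opus_out = 5.00, 25.00

input_ratio = round(opus_in / glm5_in, 2)
output_ratio = round(opus_out / glm5_out, 2)
print(input_ratio)   # 6.25x cheaper on input
print(output_ratio)  # 9.77x cheaper on output

def workload_cost(rate_in: float, rate_out: float, m_in: float, m_out: float) -> float:
    """Total cost for a workload of m_in million input tokens and
    m_out million output tokens at the given per-million rates."""
    return rate_in * m_in + rate_out * m_out

# Example workload: 10M input tokens, 2M output tokens.
print(workload_cost(glm5_in, glm5_out, 10, 2))  # 13.12 on GLM-5
print(workload_cost(opus_in, opus_out, 10, 2))  # 100.0 on Claude Opus 4.6
```

So the “6x cheaper on input, almost 10x on output” framing holds at these rates, and the gap widens on output-heavy agentic workloads where models generate far more tokens than they consume.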
This is about 6x cheaper on input and almost 10x cheaper on output than Claude Opus 4.6 ($5/$25). The release also confirms rumors that Zhipu AI was behind “Pony Alpha,” a stealth model that previously crushed coding benchmarks on OpenRouter.
However, despite the high benchmarks and low costs, not all early adopters are enthusiastic about the model, noting that the high performance doesn’t tell the whole story.
Lukas Petersson, co-founder of safety-focused autonomous AI evaluation startup Andon Labs, commented on X: “After hours of reading GLM-5 traces: an incredibly effective model, but much less situationally aware. Achieves goals through aggressive tactics, but doesn’t reason about the situation or use experience. This is scary. This is how you get a paperclip maximizer.”
The “paperclip maximizer” refers to a hypothetical scenario described by Oxford philosopher Nick Bostrom in 2003, in which an AI or other autonomous creation follows an apparently benign instruction – such as maximizing the number of paperclips produced – to such an extreme degree that it causes an apocalyptic outcome or human extinction, repurposing the resources necessary for human (or other) life in its single-minded dedication to the goal.
Should your company adopt GLM-5?
Companies looking to escape vendor lock-in will find GLM-5’s MIT license and open-weights availability a significant strategic advantage. Unlike closed-source competitors that keep their models behind proprietary walls, GLM-5 lets organizations self-host frontier-level intelligence.
Adoption is not without friction. The sheer scale of GLM-5 (744B parameters) requires a massive hardware floor that may be out of reach for smaller companies without significant cloud or on-premise GPU clusters.
Security leaders must weigh the geopolitical implications of a flagship model from a China-based laboratory, especially in regulated industries where the location and provenance of data are tightly controlled.
Furthermore, the shift to more autonomous AI agents introduces new governance risks. As models move from ‘chatting’ to ‘working’, they begin to act autonomously across apps and files. Without robust agent-specific permissions and human-in-the-loop quality gates put in place by enterprise data leaders, the risk of autonomous errors grows exponentially.
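What an agent-specific permission layer looks like in practice can be sketched as a default-deny policy with a human-approval path for destructive actions. The action names and policy below are purely hypothetical, not drawn from any real agent framework or from z.ai’s tooling.

```python
# Illustrative permission policy for an autonomous agent's tool calls.
ALLOWLIST = {"read_file", "draft_document"}        # safe, auto-approved
NEEDS_APPROVAL = {"send_email", "delete_file"}     # destructive or external

def authorize(action: str, human_approved: bool = False) -> bool:
    """Gate every tool call: allowlisted actions run freely,
    sensitive ones require explicit human sign-off, and anything
    unknown is denied by default."""
    if action in ALLOWLIST:
        return True
    if action in NEEDS_APPROVAL:
        return human_approved  # the human-in-the-loop quality gate
    return False               # default-deny unrecognized actions

print(authorize("read_file"))                         # True
print(authorize("delete_file"))                       # False until approved
print(authorize("delete_file", human_approved=True))  # True
```

The default-deny fallback is the important design choice: as an agent gains new tools, each one starts out blocked until a human deliberately classifies it, rather than silently inheriting broad permissions.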
Ultimately, GLM-5 is a ‘buy’ for organizations that have outgrown simple copilots and are ready to build a truly autonomous office.
It’s for engineers who need to refactor an outdated backend or need a “self-healing” pipeline that doesn’t sleep.
While Western labs continue to optimize for “thinking” and depth of reasoning, z.ai optimizes for execution and scale.
Companies adopting GLM-5 today are not just purchasing a cheaper model; they’re betting on a future where the most valuable AI is the one that can complete the project without being asked twice.

