But 1,000 tokens per second is actually modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI’s own gpt-oss-120B open-weight model, suggesting that Codex-Spark’s relatively lower speed reflects the overhead of a larger or more complex model.
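To put those throughput figures in perspective, here is a minimal sketch of the arithmetic. The tokens-per-second numbers come from the article; the 10,000-token response length is a hypothetical example chosen for illustration, and real generations also include prompt-processing and network latency not modeled here.

```python
# Illustrative only: wall-clock time to stream a fixed-length response
# at the throughput figures quoted above. The 10,000-token length is a
# hypothetical example, not a number from the article.
SPEEDS_TPS = {
    "Codex-Spark": 1_000,
    "Llama 3.1 70B on Cerebras": 2_100,
    "gpt-oss-120B on Cerebras": 3_000,
}

def seconds_to_generate(num_tokens: int, tokens_per_second: int) -> float:
    """Seconds to stream num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_second

for name, tps in SPEEDS_TPS.items():
    t = seconds_to_generate(10_000, tps)
    print(f"{name}: {t:.1f} s for 10,000 tokens")
```

Even at the "modest" 1,000 tokens per second, a 10,000-token response streams in about ten seconds; at 3,000 it takes roughly a third of that.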
AI coders have had a breakout year, with tools like OpenAI's Codex and Anthropic's Claude Code reaching new levels of usability for quickly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic have all raced to deliver more capable coding agents, and latency has become a key differentiator: a model that codes faster lets a developer iterate faster.
With stiff competition from Anthropic, OpenAI has rapidly iterated on its Codex line, releasing GPT-5.2 in December after CEO Sam Altman issued an internal “code red” memo about Google’s competitive pressure and then releasing GPT-5.3-Codex just days ago.
Diversification beyond Nvidia
Spark’s deeper hardware story could be more consequential than its benchmark scores. The model runs on Cerebras’ Wafer Scale Engine 3, a processor fabricated from an entire silicon wafer that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to emerge from it.
OpenAI has systematically reduced its dependence on Nvidia over the past year. The company signed a massive multi-year deal with AMD in October 2025, struck a $38 billion cloud computing deal with Amazon in November and has designed its own custom AI chip for eventual manufacturing by TSMC.
Meanwhile, a planned $100 billion infrastructure deal with Nvidia has so far failed to materialize, although Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI became dissatisfied with the speed of some Nvidia chips for inference tasks, which is exactly the kind of workload OpenAI designed Codex-Spark for.
Regardless of which chip is under the hood, speed matters, although it can come at the expense of accuracy. For developers who spend their days in a code editor waiting for AI suggestions, 1,000 tokens per second feels less like guiding a careful jigsaw cut and more like running a rip saw: fast and powerful, but unforgiving. Just pay attention to what you cut.