Since 2024, Anthropic’s performance optimization team has been giving applicants a test to ensure they know their stuff. But as AI coding tools have gotten better, the test has had to change a lot to stay ahead of AI-assisted cheating.
Team leader Tristan Hume described the history of the challenge in a blog post on Wednesday. “Each new Claude model has forced us to redesign the test,” Hume writes. “With the same time limit, Claude Opus 4 outperformed most human applicants. That still allowed us to distinguish the strongest candidates, but Claude Opus 4.5 matched even those.”
The result is a serious candidate-assessment problem. Without in-person proctoring, there is no way to ensure a candidate isn't using AI to cheat on the test, and anyone who does will quickly score near the top. "Under the limitations of the take-home test, we no longer had a way to distinguish between the output of our top candidates and our most capable model," Hume writes.
AI-enabled cheating is already wreaking havoc on schools and universities around the world, so there's some irony in an AI lab facing the same problem. Then again, Anthropic is uniquely well-equipped to tackle it.
Ultimately, Hume designed a new test less focused on hardware optimization, making it novel enough to stump contemporary AI tools. But as part of the post, he shared the original test, inviting readers to see whether they could come up with a better solution.
"If you can beat Opus 4.5," the post reads, "we would love to hear from you."