August 25, 2025
For a design system program, where the goal is often stable, high-quality, sustainable output, how much instability, lack of quality, or transience can we tolerate in our tools? That is the question Nathan Curtis and I asked the design system community for edition 060 of The Question.
On August 21, 2025, Nathan and I facilitated a deep-dive discussion into the data we collected from 95 design system practitioners. We asked only two questions:
- Is there an expectation in your organization that digital interface production will be done with AI in the near future?
- Where, in your daily work, is AI too inaccurate for your needs – where does it fail you?
As you'd expect, the answers were complicated. In the discussion, we recognized that AI is really good at some parts of our process and pretty terrible at others. It turns out, the places where AI is failing design systems may be the places where we are failing AI.
Organizational context
As people weave AI into design system work, AI's inherently probabilistic output keeps colliding with the deterministic promises our systems must make. We think of our design systems as a contract – a promise we make to our consumers. By their nature, the large language models disrupting every other segment of our work are very fast and really good guessers. But guessing is not what we promise.
Jesse expressed it well: "Our [IT] team wants the kind of security assurance you can get from a deterministic system, which makes stochastic responses … difficult … from a security position."
Although it's not exactly the same, the phrase he used ("security assurance") usually maps to trustworthy patterns, tested code, and shared language that product teams can rely on. When you insert AI into the flow, the thing generating an answer can produce a different response every time you ask. In this context, that variability means eroded trust.
Deterministic contracts versus probabilistic helpers
Nathan spoke about holding both truths at once – use AI, but keep the contract deterministic: "This balance between predictive stochastic behavior and deterministic specification is the work … I could use AI to predict what the data might be, but the system still needs explicit specifications."
He's right – the work shifts from clever prompts to complex context. Others call this context engineering, and there's a lot going on there. If you're someone who laughed at the idea that writing prompts could be a full-time job (hey, guilty), this is different.
Context engineering usually looks like entire systems that assemble the necessary data, boundaries, instructions, output schemas, and more, so that what you get from a model is much more controlled. You can imagine a scenario in which these context engineering systems must be tested and versioned, especially because context changes over time. This adds a whole new set of rows to your already growing backlog. This is, of course, a lot of work. And that's where the conversation started to shift toward where AI is easy to work with and where it's a challenge.
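To make that concrete, here is a minimal sketch of what a tested, versioned "context package" might look like – every name, field, and value below is a hypothetical illustration, not a real tool's API:

```typescript
// Sketch of a versioned context package assembled before an LLM call.
// All names here are hypothetical illustrations, not a real API.

interface ContextPackage {
  version: string;                      // contexts get versioned like any other asset
  instructions: string;                 // boundaries and rules the model must follow
  designTokens: Record<string, string>; // grounding data pulled from the system
  outputSchema: string;                 // the shape we require back, enforced downstream
}

function buildContext(tokens: Record<string, string>): ContextPackage {
  return {
    version: "2025.08.1",
    instructions:
      "Use only the tokens provided. Never invent color or spacing values.",
    designTokens: tokens,
    outputSchema: '{ "component": string, "tokensUsed": string[] }',
  };
}

// Because the package is plain data, it can be snapshot-tested in CI
// so that context drift is caught in review, not in production.
const ctx = buildContext({ "color.primary": "#0050b3", "space.sm": "8px" });
console.log(ctx.version); // prints "2025.08.1"
```

The point isn't the specific fields – it's that context becomes an artifact you can diff, review, and roll back, rather than a prompt someone retypes.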
A spectrum of precision
As I read the raw data, I started to see a pattern in the AI successes and failures you shared. Most of you are probably familiar with the double diamond design process.
"The Double Diamond is a visual representation of the design and innovation process. It's a simple way to describe the steps taken in any design and innovation project, irrespective of the methods and tools used. The two diamonds represent a process of exploring an issue more widely or deeply (divergent thinking) and then taking focused action (convergent thinking)."
The double diamond design process – it seems easier to get successful AI results on the divergent sides of the diamonds than on the convergent sides.
The pattern I saw in your answers hinted that it's much easier to get the current generation of AI tools to help with divergent tasks, and much more challenging to get good AI results for convergent tasks.
On the left, inaccuracy is often fine – even useful. On the right, it isn't. From the raw data, this is what I heard:
Divergent, where AI works today: This includes tasks like brainstorming, synthesizing notes, research assistance, throwaway prototypes, and repetitive or low-risk work. One respondent summarized it: "It offers a fast starting point and can automate some repetitive tasks." (anonymized survey response)
Convergent, where AI breaks trust: This includes tasks like production-quality code, strict design system fidelity, pixel-perfect visual decisions, and reproducibility. As one person said: "Nobody trusts it to write [code] at the moment, certainly not without proper review and testing." (anonymized survey response)
Other thoughts from the raw data echoed the same pattern: "Figma Make does not adhere to our design system … even when asked to do so." And: "We've only reached about 80% accuracy … less than 100% doesn't make sense for us." (anonymized survey responses)
On component generation, one respondent was blunt: "It produces questionable results when generating new components … duplicate APIs, non-ideal TypeScript practices." (anonymized survey response)
But you also shared where it really helps: "AI is great as a search engine and for highlighting best practices," wrote one respondent, while another shared that it "helps automate easy work and speeds up our workflows." (anonymized survey responses)
Lean in where it works, experiment where it doesn't
From the discussion, the survey, and the working notes, here's a simple model I'll keep using:
Lean in where AI works well
- Discovery and exploration (brainstorming, research, competitive analysis)
- Summaries and first drafts (high-level synthesis, meeting summaries, and notes)
- Structured refactors (bulk renames, token diff reports, CSS cleanup with clear specifications)
- Low-risk prototyping (fast flows to discuss options, not for usability evidence)
Experiment where AI doesn't work well (yet)
- Design system fidelity ("components/tokens must be precise")
- Production code ("reproducibility > novelty")
- Pixel-perfect visual decisions ("brand language and white-space judgment still fail")
- Consistency and repeatability ("you can give AI the same prompt several times and get a different result")
So how can you help on both sides?
Some next steps
If you're piloting or hardening AI in your DS program, here are a few things you can do immediately, drawn directly from the deep dive and the raw data:
- Write down the contract. Name the non-negotiables your system guarantees (APIs, accessibility rules, token semantics, etc.) and instrument tests for them.
- Be strict on the convergent side. Require rigorous checks whenever you use AI for convergent tasks. If it affects published system assets, it must pass those tests.
- Measure trust. Track the amount of AI output accepted without edits, average clean-up time, and bugs or deviations recorded after rollout. That can be a measure of AI trust debt, a phrase Brandon used in the deep dive.
- Be more intentional with context. For example, you could annotate a "style bible" with 20 side-by-side good/bad examples drawn from your product and use it as part of your context.
- Ask why. When AI doesn't meet your expectations, make it part of your practice to ask for an explanation – perhaps dig into which design system guidelines the model based the faulty output on. In the long run, this can help your human users as much as your AI ones. (See XAI.)
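The "write down the contract" step can be as small as a handful of assertions that run in CI. Here's a hedged sketch of a token contract test – the token names, the 4px-grid rule, and the required list are all hypothetical examples, not a prescription:

```typescript
// Sketch of a contract test for design tokens. The idea: non-negotiables
// are asserted in CI, so any AI-generated (or human) change that breaks
// the contract fails fast. All token names and rules are hypothetical.

const tokens: Record<string, string> = {
  "color.text.primary": "#1a1a1a",
  "color.bg.default": "#ffffff",
  "space.sm": "8px",
};

// Non-negotiable 1: every spacing token stays on a 4px grid.
function onGrid(value: string): boolean {
  const px = parseInt(value, 10);
  return Number.isInteger(px) && px % 4 === 0;
}

// Non-negotiable 2: required semantic tokens must exist.
const required = ["color.text.primary", "color.bg.default"];

for (const name of required) {
  if (!(name in tokens)) throw new Error(`missing contract token: ${name}`);
}
for (const [name, value] of Object.entries(tokens)) {
  if (name.startsWith("space.") && !onGrid(value)) {
    throw new Error(`${name} violates the 4px grid: ${value}`);
  }
}
console.log("token contract holds");
```

Once the contract is executable, "did the AI output pass?" stops being a judgment call and becomes a build status.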
To sum it all up: until a model can guarantee the promises your system makes, AI is best treated as a collaborator that accelerates and proposes scaffolding, not a factory that ships.
The human part is still the difficult part
All of this makes me think that the reasons AI fails are the same reasons people often fail. Without the right information to start with, success is really difficult.
I see this every day in my coaching practice. 99% of the problems we work through are people problems. That doesn't change just because you insert a shiny AI model into your process. Murphy Trueman recently wrote about this:
"We pour energy into the wrong places – we keep solving the wrong problems. Perfect component libraries, beautifully crafted tokens, extensive documentation sites that win awards. Meanwhile, teams are building around the system because it's easier than using it."
Just a reminder not to forget the people on the other side of your components.
Curious how others do this? Come hang out with us in Redwoods. It's a space for people who want to support each other on the journey of building better design system programs.
Learning mode
I am constantly inspired by the people who show up week after week to dive into the answers we collect. Each of you shows up in learning mode. That's why we all walk away with broadened perspectives and an appreciation for the experiences we each bring to these conversations.
To those of you who were present, thanks for participating with such a gracious attitude.
Thank you
Many thanks to everyone who participated.
If you missed this week, register for The Question and be ready to answer next time.


