Six principles for production AI agents



Every now and then people ask me:

“I am new to agent development. I am building something, but I feel I am missing some tribal knowledge. Help me catch up!”

I am tempted to suggest something serious, such as a multi-week course (e.g. from Hugging Face or Berkeley), but not everyone is interested in that level of depth.

So I decided to collect six simple empirical lessons that helped me a lot during app.build development. This post is somewhat inspired by the design decisions behind app.build, but has been generalized into a quick guideline for newcomers to agentic engineering.

I have long been skeptical about prompt engineering; it looked more like shamanic ritual than anything close to engineering. All those tricks like “I’ll tip you $100”, “My grandmother is dying and needs this” or “Be 100% accurate or else” can be useful as local fluctuations that exploit inefficiencies of a specific model, but they never worked in the longer run.

I changed my mind about prompt/context engineering when I realized something simple: modern LLMs just need direct, detailed context; no tricks, only clarity and lack of contradictions. That is it, no manipulation necessary. Models are good at following instructions, and the problem is often just the ambiguous nature of the instructions.

All LLM providers publish educational resources on prompting best practices for their models (e.g. Anthropic’s and Google’s). Simply follow them and make sure your instructions are direct and detailed; no smarty-pants tricks required. Here is, for example, one system prompt we use to generate Claude rules for ast-grep: nothing difficult, just details about using a tool the agent barely knows.

A trick we like is to bootstrap the first version of a system prompt with a Deep Research-like LLM variant. It usually needs human refinement, but it is a solid baseline.

Keeping a shared prefix of the context is beneficial for prompt caching mechanisms. Technically, user messages can hit the cache too, but structure the context so that the system part is large and static while the user part is small and dynamic.
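A minimal sketch of this split, assuming Anthropic-style request shapes (the `cache_control` field follows Anthropic's prompt-caching convention; other providers cache shared prefixes automatically):

```python
# Sketch: keep the large, static instructions in the system part so the
# provider's prompt cache can reuse them across requests, and keep the
# per-request part small and dynamic.

STATIC_SYSTEM_PROMPT = (
    "You are a code-editing agent. Follow the project conventions..."
    # ...imagine thousands of tokens of stable instructions and tool docs...
)

def build_request(user_task):
    """Large static prefix first, small dynamic suffix last."""
    return {
        "system": [
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Anthropic-style marker: everything up to here is cacheable
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_task}],
    }

req = build_request("Rename function foo to bar in utils.py")
```

The key property is that the cacheable prefix is byte-identical across requests, while only the short user message changes.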

Okay, so a solid system prompt is in place. But there is a reason why “context engineering” has taken over from “prompt engineering” as the latest trend.

Context management deserves careful consideration. Without the right context, models tend to hallucinate, go off track, or simply refuse to answer; with too large a context, they suffer from attention degradation (models struggle to focus on the relevant parts of very long contexts, so performance breaks on important details buried in the middle), higher costs, and higher latency.

A principle we found useful is to provide the absolute minimum of knowledge upfront, plus the option to gather more context on demand via tools. In our case, for example, this can mean listing all project files in the prompt and providing a tool to read the files relevant to the requested change; though if we are sure some file’s content is crucial, we can include it in the context in advance.
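A toy sketch of this idea (the function names and the in-memory file dict are illustrative, not app.build's actual implementation):

```python
# Sketch: the agent starts with only a file listing; a read_file tool
# fetches full contents on demand. Files are modeled as an in-memory
# dict {path: content} to keep the example self-contained.

def initial_context(files):
    """Minimal upfront context: paths only, no file bodies."""
    return "Project files:\n" + "\n".join(sorted(files))

def read_file(files, path, max_chars=50_000):
    """Tool exposed to the agent for on-demand context gathering."""
    if path not in files:
        return "error: no such file: " + path
    return files[path][:max_chars]  # cap size so one file can't flood the context

project = {"app/main.py": "print('hi')", "README.md": "# demo"}
ctx = initial_context(project)
```

Note that the listing itself stays cheap (paths only), and the size cap on `read_file` keeps any single tool call from blowing up the context.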

Logs and other artifacts of the feedback loop can quickly bloat the context. Simple context compaction tools, applied automatically, can help a lot. Encapsulation was a buzzword of object-oriented programming, but for context management it matters even more: separate concerns and give each part of your agentic solution only the context it absolutely needs.
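One of the simplest compaction strategies is keeping only the head and tail of a noisy artifact, since build and test output usually carries its signal at the edges. A minimal sketch (parameters are illustrative):

```python
# Sketch: compact noisy artifacts (build logs, test output) before they
# reach the context window; keep the head and tail, drop the middle.

def compact(text, head=20, tail=20):
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # already small enough
    omitted = len(lines) - head - tail
    marker = "... [" + str(omitted) + " lines omitted] ..."
    return "\n".join(lines[:head] + [marker] + lines[-tail:])

log = "\n".join("line " + str(i) for i in range(1000))
short = compact(log)
```

A smarter variant could summarize the omitted middle with an LLM, but even this deterministic version removes most of the bloat.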

The core capability of an AI agent is tool calling: the combination of an LLM, exposed tools, and a basic orchestration loop makes an agent.

Designing a tool set for an agent is somewhat similar to designing an API for human developers... but actually harder. Human API users can read between the lines, navigate complex documentation, and find workarounds. Tool sets built for agents should usually be more constrained (too many tools is a way to pollute the context), should have direct, simple interfaces, and should bring some order to the stochastic LLM world. When building for a human user it can be fine to design one main path and a few tricks for edge cases; LLMs will most likely abuse any loopholes, so you don’t want to leave them.

Good tools usually operate at a similar level of granularity and have a limited number of strictly typed parameters. They are focused and well tested, like an API you would ship to a smart but easily distracted junior developer. Idempotency is highly desirable to avoid state-management problems. Most software engineering agents have fewer than ten multifunctional tools (such as read_file, write_file, edit_file, execute, ...) with 1-3 parameters each (see the app.build example and the opencode example), and attaching extra tools based on context can also work.
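A sketch of what such a small, strictly typed tool set can look like, using the JSON-schema style shared by most LLM tool-calling APIs (the tool names follow the article's examples; the exact schema fields vary slightly by provider):

```python
# Sketch: a minimal tool set with few, strictly typed parameters.
# Schemas follow the common JSON-schema convention for tool calling.

TOOLS = [
    {
        "name": "read_file",
        "description": "Return the contents of a file in the project.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Create or overwrite a file (idempotent: writing the "
                       "same content twice leaves the same state).",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
]

# Enforce the design rule from the text: few parameters per tool.
assert all(len(t["parameters"]["properties"]) <= 3 for t in TOOLS)
```

Keeping every parameter required and typed removes a whole class of malformed calls before any execution happens.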

In some cases, designing the agent to write code in a DSL (domain-specific language) over a set of actions, instead of calling tools one by one, is a great idea. This approach was popularized by smolagents; it does, however, require a properly designed set of functions to be exposed to the agent’s runtime. Despite the change in top-level structure, the main idea remains valid: simple, sufficient, unambiguous, and non-redundant tools are crucial for agent performance.

Good agentic solutions combine the strengths of LLMs and traditional software. A crucial part of this combination is designing a two-phase algorithm reminiscent of the actor-critic approach: an actor decides on actions and a critic evaluates them.

We find it useful to let LLM actors be creative while keeping critics strict. In our app generation world, this means actors create or edit files, and critics ensure the code matches our expectations. The expectations are based on handcrafted criteria: we want the code to compile and to pass tests, type checks, linters, and other validators. The critic’s work is mostly deterministic, but not 100%: for example, we can generate tests with an independent LLM and run the test suite later.
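A toy sketch of a deterministic critic as a chain of validators. Real validators would shell out to compilers, linters, and test runners; here they are illustrative pure functions so the structure stands on its own:

```python
# Sketch: the critic runs every validator over the actor's output and
# collects complaints; an empty list means the output passes.

def compiles(code):
    """Validator: the candidate must at least be syntactically valid Python."""
    try:
        compile(code, "<candidate>", "exec")
        return None
    except SyntaxError as e:
        return "syntax error: " + str(e.msg)

def no_todos(code):
    """Validator: reject half-finished work left by the actor."""
    return "TODO left in code" if "TODO" in code else None

VALIDATORS = [compiles, no_todos]

def critic(code):
    """Return all complaints about the actor's output."""
    complaints = []
    for validator in VALIDATORS:
        msg = validator(code)
        if msg is not None:
            complaints.append(msg)
    return complaints
```

The complaints double as the reflective feedback fed back to the actor on the next attempt.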

When building agents for a vertical, it is crucial to capture domain-specific validation. This requires defining and checking domain invariants that must hold regardless of the agent’s specific approach, a concept ML engineers refer to as an “inductive bias”.

Software engineering is the industry most affected by AI agents for precisely this reason. The feedback loop is incredibly effective: it is easy to filter out bad outcomes using very simple validators such as compilers, linters, and tests. This affects performance on two levels: foundation models are trained on such verifiable rewards at scale, and product engineers can later exploit these learned properties.

The same thinking applies to other domains. For example, if a travel-planning agent suggests a multi-leg flight, the first thing to check is whether those connections exist. Similarly, if a bookkeeping agent’s output violates the principles of double-entry accounting, it is a bad result and should not be accepted.
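The double-entry invariant makes a compact example of such a hard guardrail (the entry representation here is a simplification I chose for illustration):

```python
# Sketch: a domain invariant as a hard guardrail. For a bookkeeping agent,
# every transaction must balance; an output violating this is rejected no
# matter how plausible it looks. Entries are (account, amount) pairs,
# debits positive and credits negative.

def is_balanced(entries, tolerance=1e-9):
    """Double-entry invariant: debits and credits must sum to zero."""
    return abs(sum(amount for _, amount in entries)) < tolerance

proposed = [("cash", 100.0), ("revenue", -100.0)]
bad = [("cash", 100.0), ("revenue", -90.0)]
```

The point is that the check is independent of how the agent produced the transaction: the invariant holds for any correct output.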

Feedback loops are tightly linked to the concept of “guardrails” found in many frameworks. Agents are only moderately good at self-recovery. Sometimes a bad result is worth trying to fix (sending a follow-up message to the LLM with a reflective “hey, your previous solution is not acceptable because of X”); other times a chain of bad fixes is beyond repair, and you should just throw it away and try again.

Agentic systems must be ready for both hard and soft failures, with different recovery strategies for each; these recovery strategies, together with the guardrails, are the essence of a feedback loop. You can think of it in a way similar to Monte Carlo Tree Search: some branches are promising and should be developed further, some are dead ends and should be pruned.
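The two recovery strategies can be sketched as one loop: soft failures get a reflective retry, and after too many failed fixes the whole branch is pruned and restarted fresh (the function and its limits are illustrative, not a specific framework's API):

```python
# Sketch: soft failure -> reflective fix; chain of bad fixes -> prune the
# branch and restart, MCTS-style.
#   attempt(feedback) -> candidate result (feedback is None on a fresh try)
#   validate(result)  -> None if acceptable, else a complaint string

def run_with_recovery(attempt, validate, max_fixes=2, max_restarts=3):
    for _ in range(max_restarts):
        feedback = None
        for _ in range(max_fixes + 1):
            result = attempt(feedback)
            complaint = validate(result)
            if complaint is None:
                return result  # this branch succeeded
            # soft failure: retry with a reflective message
            feedback = "your previous solution is not acceptable: " + complaint
        # chain of bad fixes: prune this branch, restart from scratch
    raise RuntimeError("all branches exhausted")

# Demo: an actor that succeeds on its third attempt.
attempts = []
def flaky_actor(feedback):
    attempts.append(feedback)
    return len(attempts)

result = run_with_recovery(flaky_actor, lambda r: None if r >= 3 else "too small")
```

Raising on exhaustion is the hard-failure path: the caller decides whether to escalate to a human or give up entirely.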

Once you have a basic agent and a feedback loop in place, you can iterate and improve. Error analysis has always been a cornerstone of AI/ML engineering, and AI agents are no different.

One approach to error analysis is to review common failures manually, but agents are so prolific! It is easy to spawn dozens of agents, set them on different tasks, and generate tons of logs (your feedback loop has observability built in, right?). No matter how diligent you are, it is very unlikely you can read the full log stream of your agents yourself.

That is why a simple meta-agent loop is very powerful:

  1. Establish a baseline
  2. Collect some trajectories/logs
  3. Analyze them with an LLM (kudos to Gemini’s 1M context)
  4. Improve the baseline based on the insights gained.
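The steps above can be sketched as a single function; the `llm` callable is a stand-in for a real long-context model call (e.g. Gemini), injected here so the loop itself stays testable:

```python
# Sketch of one iteration of the meta-agent loop: analyze trajectories
# with an LLM, fold the insights back into the baseline prompt.

def improve_baseline(baseline, logs, llm):
    """baseline: current system prompt; logs: agent trajectories."""
    transcript = "\n---\n".join(logs)
    insights = llm(
        "Here are agent trajectories:\n" + transcript +
        "\nList recurring failure modes and suggest prompt/tool fixes."
    )
    return baseline + "\n\n# Insights from error analysis\n" + insights

# Stand-in LLM returning a canned analysis, so the loop runs offline.
fake_llm = lambda prompt: "- agent retries failed API calls without reading errors"
new_baseline = improve_baseline("You are a coding agent.", ["log1", "log2"], fake_llm)
```

In practice you would run this repeatedly, diffing baselines between iterations and keeping only changes that improve your eval metrics.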

Very often this will reveal blind spots in context management or in the tools provided.

Modern LLMs are powerful, and that is why people get frustrated fairly quickly when agents do really stupid things or completely ignore instructions. The reality is that instruction-tuned models are also very prone to reward hacking: they will do whatever they can to satisfy the goal as they interpret it, which is not necessarily the goal the original system designer had in mind.

The insight is: an annoying problem may be caused not by LLM stupidity but by a system error, such as a missing tool for solving the problem or an ambiguous section in the system prompt.

Recently I was cursing loudly: why on earth does the agent not use the provided integration to fetch the data, and instead uses simulated random data, despite my explicit request not to do this? I read the logs and realized I was the fool here: I had not given the agent the right API keys, so it tried to fetch the data, failed several times in the same way, and went for a workaround instead. That was not an isolated incident: we also observed similar behavior with agents trying to write a file while lacking file system access.

Building effective AI agents is not about finding a silver bullet in a great prompt or an advanced framework; it is about system design and solid software engineering. Focus on clear instructions, lean context management, robust tool interfaces, and automated validation loops. When your agent frustrates you, check the system first: missing tools, unclear instructions, or insufficient context are usually the culprits, not model limitations.

Most importantly, treat error analysis as a first-class citizen in your development process. Let LLMs help you understand where your agents fail, and address those failure modes systematically. The goal is not perfect agents; it is reliable systems that fail gracefully, can be debugged, and improve iteratively.

