Visual imitation learning: Guidde trains AI agents on human ‘expert video’ instead of documentation

For years, the “last mile” of digital transformation has been littered with forgotten PDFs and ignored training manuals.

Organizations spend millions on advanced software such as SAP or Salesforce, only to watch employees struggle with basic navigation. As the age of agentic AI dawns, companies face a dual challenge: they must teach human employees to collaborate with AI while simultaneously teaching AI agents to navigate the labyrinthine interfaces of the modern enterprise.

One idea gaining momentum among AI-forward companies is to take screen recordings and walkthroughs of someone performing a business task (creating a new ticket or processing an invoice, for example) and train AI to replicate the flow from the recording. This week a startup called Standard Intelligence went viral on X with an early demo of an open version of this approach spanning the physical and digital worlds.

But there are already players tackling this problem inside the enterprise. Case in point: Guidde, an Israeli startup born during the video-centric years of the COVID-19 pandemic, today announced an oversubscribed $50 million Series B round led by PSG Equity to address precisely this knowledge infrastructure crisis.

Instead of giving an agent a static PDF manual, Guidde provides high-fidelity “Video Ground Truth”: a rich stream of data collected by real human experts as they navigate complex software.

The investment signals a shift in the way the technology industry views documentation – not as a static byproduct of work, but as the critical telemetry needed to train the next generation of autonomous digital agents.

Technology: from video recording to world models

At its core, Guidde is an AI Digital Adoption Platform (ADAP). However, the technological breakthrough lies in what happens behind the scenes during a recording.

Guidde doesn’t just record pixels; it captures every click, scroll and latent interaction with the HTML page: the subtle pauses, the specific scroll depths and the corrections a human makes when a system lags. This telemetry turns raw video into a Vision-Language-Action (VLA) training set.

Meanwhile, the platform’s Magic Redaction automatically hides sensitive data like passwords or credit card numbers during capture, keeping materials secure and HIPAA-aligned.

“Every time you click a button, drag and drop, scroll, type, we collect the interaction… everything, we sanitize it – there is no private information,” explains Yoav Einav, co-founder and CEO of Guidde, in an exclusive interview with VentureBeat.
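The sanitization Einav describes can be pictured with a minimal sketch, assuming each captured interaction is a dictionary with hypothetical `input_type` and `value` fields. It masks anything typed into a password field and any digit run that passes the Luhn checksum used by payment card numbers. This illustrates the general technique only; it is not how Magic Redaction actually works.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
MASK = "[REDACTED]"


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_text(text: str) -> str:
    """Replace digit runs that look like valid card numbers with MASK."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return MASK if luhn_valid(digits) else match.group()
    return CARD_CANDIDATE.sub(replace, text)


def redact_event(event: dict) -> dict:
    """Return a copy of a captured event (hypothetical schema) with sensitive values masked."""
    clean = dict(event)
    if clean.get("input_type") == "password":
        clean["value"] = MASK  # never keep anything typed into a password field
    elif isinstance(clean.get("value"), str):
        clean["value"] = redact_text(clean["value"])
    return clean
```

The Luhn check keeps ordinary long numbers, such as order IDs, from being masked unnecessarily, while field-level context (a password input) is treated as sensitive regardless of content.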

Under the hood, the platform captures the underlying metadata and DOM (Document Object Model) changes in sync with the video frames. The differentiator is the telemetry hidden beneath the surface.
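Syncing telemetry with video frames is essentially a timestamp join. The sketch below assumes a hypothetical schema (an `InteractionEvent` with a time offset, action, CSS selector and screen coordinates) and pairs each event with the last video frame and DOM snapshot captured at or before it, yielding observation/action steps of the kind a VLA model could train on. Guidde's real data format is not described here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class InteractionEvent:
    t: float        # seconds since the recording started
    action: str     # e.g. "click", "scroll", "type"
    selector: str   # CSS selector of the target element
    x: int          # screen coordinates of the interaction
    y: int


@dataclass
class TrainingStep:
    frame_index: int          # video frame the model "sees"
    dom_snapshot_index: int   # DOM state at that moment
    event: InteractionEvent   # action the human took


def align_events(
    frame_times: List[float],
    dom_times: List[float],
    events: List[InteractionEvent],
) -> List[TrainingStep]:
    """Pair each event with the latest frame and DOM snapshot at or before it.

    frame_times and dom_times must be sorted in ascending order.
    """
    steps = []
    for ev in sorted(events, key=lambda e: e.t):
        frame = bisect_right(frame_times, ev.t) - 1
        snap = bisect_right(dom_times, ev.t) - 1
        if frame < 0 or snap < 0:
            continue  # event predates the first frame or snapshot
        steps.append(TrainingStep(frame, snap, ev))
    return steps
```

Binary search keeps the join cheap even for long recordings, since each event lookup is logarithmic in the number of frames.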

This rich metadata creates a ‘digital world model’ of enterprise software. And because every enterprise uses its own unique mix of apps and processes, Guidde is creating a data moat that allows enterprise agents to reason through legacy user interfaces with the same spatial insight as a human, so that automation actually works in a production environment instead of just in a lab demo.

To a human it is a tutorial. For an AI agent, it is a high-fidelity map of the interface. This allows agents to “see” and navigate complex user interfaces as humans do, solving the “last mile” of automation, where agents previously failed for lack of specific business context and in-situ usage context.

In a sense, Guidde is building a Waymo-style ‘self-driving car’ for computing.

Product: Three Pillars of Guidance

The platform has evolved into three different products designed to grow with the maturity of an organization:

Guidde Create: the engine that lets subject matter experts turn workflows into documentation in minutes.
Guidde Broadcast: a personalized recommendation engine (often compared to Netflix) that delivers answers within the tools people actually use. It knows who the user is and what department they are in, so relevant content surfaces exactly when it is needed.
Guidde Discover: the recently launched “agentic” pillar. Just as Waze maps roads by observing drivers, Discover maps workflows by observing how employees work. It understands the workflow, creates the content and updates it automatically when the user interface changes.
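Updating content “automatically when the user interface changes” implies some way to detect drift between a recorded guide and the live app. A minimal sketch, assuming each step stores the CSS selector and visible label of its target element, and that the current UI is available as a hypothetical selector-to-label mapping (a simplification of a real DOM snapshot):

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GuideStep:
    number: int     # position of the step in the guide
    selector: str   # CSS selector recorded for the target element
    label: str      # visible text of the element at recording time


def find_stale_steps(
    steps: List[GuideStep], current_ui: Dict[str, str]
) -> List[Tuple[int, str]]:
    """Return (step number, reason) for each step that no longer matches the live UI."""
    stale = []
    for step in steps:
        if step.selector not in current_ui:
            stale.append((step.number, "element missing"))
        elif current_ui[step.selector] != step.label:
            stale.append((step.number, f"label changed to {current_ui[step.selector]!r}"))
    return stale
```

Flagged steps are the ones a system would need to re-record or regenerate; unflagged steps can be left as they are.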

Training people to use AI – and AI with the help of people

The least obvious aspect of Guidde’s growth is its dual mission. “We are the only platform that trains both people and agents,” says Einav.

As companies roll out AI tools like Microsoft 365 Copilot or ServiceNow agents, they encounter a skills gap. One of Guidde’s largest customers revealed that it was paying more than $1 million a year for an advanced AI tool, but “no one knows how to use it because they had a 30-minute training session and that was it.” Guidde closes this gap by offering bite-sized video tutorials in the flow of work.

At the same time, these videos train the AI agents themselves. Foundation models like Gemini or GPT-4 often hallucinate when they need to execute specific enterprise workflows, because they are not trained on the highly specific, internal ‘non-vanilla’ workflows found in private enterprise systems. Guidde provides the ‘starting point’, the ‘metadata’ and the ‘x, y coordinates of the button’ that an agent needs to complete an action without getting stuck.
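How an agent might use that metadata and those coordinates can be sketched as follows. The assumption here (hypothetical, not Guidde's documented behavior) is that the agent first looks for the recorded element by its DOM selector and, if the selector has changed, falls back to the recorded x, y position when some element still occupies that spot.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    selector: str   # CSS selector of an element on the current page
    left: int       # bounding box in screen pixels
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.left + self.width and self.top <= y < self.top + self.height

    def center(self) -> Tuple[int, int]:
        return (self.left + self.width // 2, self.top + self.height // 2)


def resolve_click(
    recorded_selector: str,
    recorded_xy: Tuple[int, int],
    elements: List[Element],
) -> Optional[Tuple[int, int]]:
    """Return the (x, y) to click: selector match first, recorded coordinates as fallback."""
    for el in elements:
        if el.selector == recorded_selector:
            return el.center()
    x, y = recorded_xy
    for el in elements:
        if el.contains(x, y):
            return (x, y)
    return None  # neither anchor matched: escalate instead of clicking blindly
```

Preferring the selector makes replay robust to layout shifts, while the coordinates act as a second anchor when element IDs change between releases.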

The multimodal advantage

To maintain this level of accuracy, Guidde uses a multimodal infrastructure. The system does not depend on a single model; instead, it uses a “fleet” of models that evaluate each other.

Google Gemini: Generally used for visual tasks such as analyzing PDFs or PowerPoints.
Anthropic Claude: Used for writing the storyline and narrative scripts.
Feedback loops: When a user edits a video, that data is fed back into the model to prevent the same errors from occurring in future recordings.
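The fleet-and-feedback design can be sketched with stand-in functions in place of real model calls. In this illustration, the routing table, the `CorrectionMemory` class and the reviewer callable are all hypothetical: a task is routed to a generator by type, past user edits are re-applied to each new draft, and a second model reviews the draft until it passes or a round limit is reached.

```python
from typing import Callable, Dict, Optional

# Hypothetical routing table: task type -> model label (not API identifiers).
ROUTES: Dict[str, str] = {"visual": "gemini", "script": "claude"}


class CorrectionMemory:
    """Stores user edits (original -> corrected) and re-applies them to future drafts."""

    def __init__(self) -> None:
        self.fixes: Dict[str, str] = {}

    def record(self, original: str, corrected: str) -> None:
        self.fixes[original] = corrected

    def apply(self, text: str) -> str:
        for original, corrected in self.fixes.items():
            text = text.replace(original, corrected)
        return text


def produce(
    task_type: str,
    prompt: str,
    models: Dict[str, Callable[[str], str]],
    reviewer: Callable[[str], Optional[str]],
    memory: CorrectionMemory,
    max_rounds: int = 3,
) -> str:
    """Generate a draft, apply known corrections, and let a reviewer model check it."""
    generate = models[ROUTES[task_type]]
    draft = memory.apply(generate(prompt))
    for _ in range(max_rounds):
        issue = reviewer(draft)  # None means the reviewer accepts the draft
        if issue is None:
            return draft
        draft = memory.apply(generate(prompt + "\nReviewer note: " + issue))
    return draft  # round limit reached; return the latest draft
```

Once a user has corrected a mistake, the memory fixes it before the reviewer ever sees the next draft, which is the "same errors" loop the list describes.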

With this approach, Guidde can replace a legacy stack of six or seven disconnected tools, such as Loom for capture, Adobe Premiere for editing, ElevenLabs for text-to-speech and Synthesia for avatars, with a single AI-native platform. “We basically package everything for you,” says Einav, “and automate the entire process based on your brand guidelines.”

Video-first origin story

Guidde’s origins lie in a frustration that every product leader knows. Before founding the company, Einav and co-founder Dan Sahar spent years managing video traffic at Qwilt, a company they started in 2010 to analyze how people watched Netflix and Disney+.

When COVID-19 hit, they saw a huge opportunity to apply that video expertise to the workplace. They noted that short video explainers could increase conversions from free to paid accounts by 30%, but producing them was unsustainably labor-intensive.

In an interview, Einav recalled the “boring work” of the old world: “My team in Israel would create the content, someone in the US with an American accent would do the narration, someone on the marketing team would write the script… and someone on the enablement team would do the editing.” This fragmented workflow meant that a single video took two to three weeks to produce. “And then two weeks later the product changes and you have to do it all over again,” Einav added.

Guidde is built to compress this cycle into seconds. By automating the ‘Magic Capture’ of a workflow, the platform instantly generates a structured narrative script and a professional AI voice-over. This removes the editorial bottleneck and turns subject matter experts into “training powerhouses.”

Licensing and market impact

Guidde’s pricing structure reflects the transition from a utility to a core part of the business infrastructure:

Free: $0 (up to 25 videos, web app support).
Pro: $18/creator/month (unlimited videos, branded kits).
Company: $39/creator/month (unlimited text-to-speech, analytics).
Enterprise: Custom pricing (multilingual translation, SSO, Magic Redaction).

The impact of the platform is already visible in the figures: a 41% reduction in video creation time and 34% fewer incoming support tickets.

For customers like Emerson, this translates into creating manuals 40-60% faster. Support teams in particular are finding that they can offload 80% of their ticket volume to agents, but only if those agents have useful content to draw on.

“The agent without the content is useless,” Einav warns, noting that most company documentation is years out of date or not documented at all.

Early community and industry reception

Guidde already claims 4,500 business customers and aims to grow that number with the new funding. Support and operations leaders have praised the platform’s ease of use. Christopher Cummings, VP of Client Experience at DocNetwork, highlighted its ability to provide “fast, personalized video responses to customer queries.”

Meanwhile, Director of Customer Support Wren Cotrone noted, “Once you get the branding set up the way you want it, you can really zoom through this.”

Ronen Nir, Managing Director at PSG, summarized the investment thesis: “Guidde solves one of the biggest blockers to successful AI adoption: the knowledge infrastructure.”

Why this is important now

The paradigm shift from text-only LLMs to agentic video intelligence is the defining trend of 2026. Guidde’s Series B indicates that the “ground truth” for enterprise agents will come from raw video observation, not static documentation.

By capturing how work is done across tens of millions of workflows, Guidde is building a data set that few others possess.

As Einav put it: “It starts with people involved in the process, and over time it evolves to full autonomy.” For the modern enterprise, the map is no longer a static document; it’s a living, breathing video intelligence layer that guides both staff and the agents who support them.
