We threw out our agent

We automate desktops and legacy portals using Claude. The sort of stuff that an enterprise hires 100+ person offshore teams to do, or pays UiPath and a team of consultants millions of dollars for.

Our home-baked Golang agent runtime has served us well for the last 18 months. It was pretty good (if I do say so myself) and we automated tens of thousands of hours of back-office work in high-risk industries.

We threw most of this stack out and have rebuilt it on top of two companies (@AnthropicAI and @daytonaio). It took 4 months and our browser agent platform is now 80% cheaper, up to 90% faster, and just as reliable.

To explain our motivations and how we got here, I want to walk you through the major issues that led up to this decision:

a) our agent couldn't really write code

It doesn't really matter what type of agent you're building anymore. Our take is that if you're building an agent that's not narrowly constrained, it needs to be able to write and execute code. Before, if a customer wanted us to simply take some input data, combine it with the agent output, and put it into a form, we had to have the agent do it at inference time. Now we can run code to do it with 100% accuracy in a fraction of the time and tokens.

b) we couldn't do desktop

We're built to service the long tail of terrible tasks that drain the soul out of employees at healthcare companies. A lot of that is on Windows. We exist for the companies that need to automate a portal built in ASP.NET WebForms, running inside a frameset, on an internal network served over a custom protocol, etc. If you can't run desktop, that's a problem.

c) we couldn't remember things well

The browser spits out vast amounts of context. Some executions are millions of tokens long. We did try to remember things using a variety of techniques, but the lack of a native file system was really an issue. Agents love files. They love cp'ing things around, grepping over files, finding things, awking, seding etc.
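
To make the file-system point concrete, here is a minimal sketch of file-backed memory: notes are appended to plain files and recalled with a grep-style search. Everything in it is an assumption for illustration (the `remember` and `recall` helpers, the one-file-per-topic layout, the example notes); it is not the runtime described in this post.

```python
import re
import tempfile
from pathlib import Path


def remember(memory_dir: Path, topic: str, note: str) -> Path:
    """Append one note to a per-topic file (hypothetical layout: <topic>.md)."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{topic}.md"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(note.rstrip("\n") + "\n")
    return path


def recall(memory_dir: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Grep-style search: (file name, line number, line) for every matching line."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(memory_dir.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.name, lineno, line))
    return hits


if __name__ == "__main__":
    # Made-up notes, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        mem = Path(tmp) / "memory"
        remember(mem, "portal-login", "Login form lives inside the second frame.")
        remember(mem, "portal-login", "Session expires after a period of inactivity.")
        remember(mem, "claims-form", "Date fields expect MM/DD/YYYY.")
        for name, lineno, line in recall(mem, r"frame|date"):
            print(f"{name}:{lineno}: {line}")
```

In a real sandbox the agent would more likely reach for `grep`, `sed` and friends directly; the sketch only shows why plain files make memory cheap to write, patch and search.
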
Agents use these tools to write and patch their memories and it works really well. Without a real file system it was a tough problem. We gave the agent a file system and it's noticeably happier.

d) bonus round: lab harnesses keep getting more capable

...and we kept having to maintain our own versions of features that we could get for free. Compaction? We had our own version of that. Sub-agents? We've been maintaining our own version for a year. Context management? All home grown.

So what did we do?

1: All-in on the frontier lab harnesses

We got rid of most of our custom Golang harness. You can argue that this was a skill issue but I'd like to plead our case. Ultimately, we are not a frontier AI lab or a coding agent company like Cursor building a competitor to Anthropic's Claude Code or OpenAI's Codex. We're an automation company that makes terrible work less bad for the companies doing frontline work. We knew deep down that our effort should not be spent reinventing the agentic wheel.*

The battleground of the labs is now in harness territory, and we saw what happened when these companies fought head to head on frontier models. The same will happen for the harnesses. I think rolling your own harness from scratch might go the way of pretraining your own models in 2022. The models will be trained in the context of the harnesses too, meaning they'll operate better in Claude Code / Codex etc. than they will in our home-rolled harness.

We chose Anthropic because we like Claude, and at the time of our decision it was SOTA on browser tasks. By trusting that the labs will do this work and do it well, we free ourselves up to focus on what really matters: automating processes.

2: Put our agents inside sandboxes

By putting our agents onto virtual machines (we use @daytonaio), we give each execution an ephemeral environment that it can use to solve the task in the way the model sees fit. The environment no longer constrains the model.
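
As an illustration of the kind of script an agent can run once it has a sandbox, here is a rough sketch of the task from point (a): combining customer input data with agent output into a form payload deterministically, rather than at inference time. The field names, the `FormField` mapping and `build_form_payload` are all hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    name: str    # field name on the target form (hypothetical)
    source: str  # which dict to read from: "input" or "agent"
    key: str     # key to look up in that dict


def build_form_payload(input_row: dict, agent_output: dict, fields: list) -> dict:
    """Merge customer input and agent output into one form payload.

    Fails loudly on an unknown source or a missing key instead of guessing,
    so a bad mapping is caught before anything is typed into a form.
    """
    sources = {"input": input_row, "agent": agent_output}
    payload = {}
    for field in fields:
        if field.source not in sources:
            raise ValueError(f"unknown source {field.source!r} for field {field.name!r}")
        data = sources[field.source]
        if field.key not in data:
            raise KeyError(f"{field.source} data has no key {field.key!r}")
        payload[field.name] = str(data[field.key]).strip()
    return payload


if __name__ == "__main__":
    # Made-up example data.
    fields = [
        FormField(name="memberId", source="input", key="member_id"),
        FormField(name="dateOfBirth", source="input", key="dob"),
        FormField(name="status", source="agent", key="claim_status"),
    ]
    input_row = {"member_id": " A123 ", "dob": "1980-02-01"}
    agent_output = {"claim_status": "Approved"}
    print(build_form_payload(input_row, agent_output, fields))
```

The same inputs always produce the same payload, and the merge itself costs no tokens.
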
The agent has access to bash, to a file system, and to code writing and reading abilities. It can use its cloud browser programmatically instead of with tools.

In a similar vein to the above, browser and virtual machine infrastructure is not our core value prop. This is the remit of Daytona, who do an incredible job.

The result is that our agents can frequently run almost entirely end to end, even on complex branching workflows, using learned code from previous executions. They cost a fraction of what they used to. The agents are not smarter, but their environment makes them more powerful in that they can express their abilities in a much less restricted manner. For example, we now frequently see agents inspecting network traffic and writing scripts to bypass the UI entirely.

Browser employees

Web automation has gone through many phases. I think we're in the phase of 'browser employees'. Give an agent a task, give it a virtual machine, and let it rip. The results have been pretty staggering.

Agents on repeated tasks tend towards the baseline latency induced
