Building Spectre | Scrollback

Harvey’s internal collaborative cloud agent platform. By @ZongZiWang and Gabe Pereyra Spectre is Harvey's internal collaborative cloud agent platform. A request can start in Slack, the web app, or an automation; Spectre turns that request into a durable run, executes it inside an isolated sandbox, connects it to systems like GitHub, Datadog, and Linear through explicit boundaries, and returns reviewable artifacts such as summaries, diffs, branches, and pull requests. Spectre is not just a background agent runtime. It is a new collaborative surface for building software in the open inside Harvey. We built Spectre because local coding agents, while already indispensable, break down at the organizational boundary. They are tied to one laptop, one working directory, one set of credentials, and one engineer’s private context. That makes them hard to share, hard to secure, and hard to integrate deeply with the systems where engineering work actually happens. Existing tools were improving quickly, and many teams were converging on the same direction: take the increasingly capable local loop and move it into the cloud, then connect it to the systems where engineering work already happens. But they still did not give us the combination of isolation, durable history, resumability, explicit permissions, shared review surfaces, and company-context integration that we needed. That shift changes what an agent system is useful for. Spectre lets engineers investigate incidents in public threads, hand off work without re-explaining context, and wake up to reviewable pull requests and recurring maintenance runs. It also expands who can participate: engineers, product managers, and designers can all collaborate around the same runs and artifacts instead of routing everything through one person’s terminal. This post is about how we built that system, why the architecture looks the way it does, and what building it is teaching us about the infrastructure and organizational transformation that other knowledge industries, including legal, will need next. The First Version Take the local coding-agent loop that was already powerful, point it at a repository in a remote sandbox, and get the result back somewhere actionable. In practice, that meant: create a run, start a fresh sandbox, hydrate the repository, run the agent, stream the output, and persist the artifacts. Success meant a branch, a PR, a summary, and enough history to see what happened. Failure meant a human got the result back in a place where they could decide what to do next. That version was useful quickly because it changed the ergonomics of coding agents in a way that felt qualitatively different from IDE assistance. You could kick off real work and walk away, run several things at once, ask the agent to investigate instead of only implement, and hand the run to another engineer without turning your terminal into the source of truth. Just as importantly, the work stopped being trapped in one person’s local session. The run, its context, and its artifacts became shareable, which made it much easier for engineers and non-engineers alike to collaborate around the same piece of work. It also made the next layer of the problem obvious. To make Spectre effective for a team of engineers working together, we needed a remote coding agent that was not just a model in a container. Once it is operating on behalf of a team, you need a durable run rather than a fragile process, an execution boundary rather than an ambient desktop, a harness that can coordinate providers and tools, and shared surfaces where the work can be streamed, reviewed, resumed, and scheduled. The Core Technical Primitives Durable runs, ephemeral workers The most important design choice is that the durable object is the run record, not the worker process. Users collaborate around one stable record that carries ownership, sharing, attachments, execution history, artifacts, and provider session references. Spectre workers are short-lived. They exist to execute one slice of work and then terminate. This simplifies failure recovery because the system never has to preserve a live process as the primary state holder. It makes follow-ups tractable because a new worker can resume from an archived session state rather than trying to reanimate an old sandbox. It also keeps the product model stable across surfaces. Slack, web, CLI, and scheduled work all point at the same durable run instead of inventing their own session semantics. In practice, a follow-up does not wake up an old container. The control plane appends new interaction state to the durable run, starts a fresh worker, restores the relevant provider session context, and continues. Sandboxes as the execution boundary The sandbox defines the execution boundary. Spectre workers are isolated environments that are allowed to see a repository workspace, a configured set of tools, and a constrained set of credentials. They are not allowed to mutate the core sys

Scraped Article