2026-06-11

Apache Burr Bets the Agent-Framework Race on State Machines and Observability

Burr enters Apache incubation by wagering that the agent-framework battle is shifting from capability to reliability: visible state, replay, recovery.

agents frameworks devtools


Summary

Apache Burr has entered the Apache Software Foundation incubator as a Python framework for “building reliable AI agents and applications.” Its pitch is not smarter prompting or fancier multi-agent choreography. It is a set of decidedly old-fashioned engineering ideas: model the application as a state machine, attach a built-in observability interface to every step, let state be persisted to disk or a database, and allow past runs to be replayed after the fact. For two years that bundle sat at the margins of the agent-framework boom. Burr drags it to the center.

The move matters more than the project. When an agent framework leads with “visible, debuggable, recoverable” instead of “more capable,” it signals that the center of gravity in this race is shifting, from which framework lets the model do the most to which one lets the model’s work survive production and stay auditable. That is the sound of a market moving from excitement to deployment.

What happened

Burr’s site describes the framework as pure Python with “no magic.” You declare each step with an @action decorator that names which state fields it reads and writes, wire actions together with transitions to form a state machine, and call build() to get a runnable application. No domain-specific language, no YAML, just Python functions and decorators. The advertised core capabilities are: a simple Python API; built-in observability (the Burr UI monitors, debugs, and traces every step in real time, showing state changes as they happen); persistence and state management (state is saved to disk, databases, or custom backends, and applications resume from where they stopped); human-in-the-loop (pause at any step to wait for input); branching and parallelism (run actions in parallel, fan out and fan in, compose sub-applications into complex directed acyclic graphs); and testing plus replay (replay past runs, unit-test individual actions, validate state transitions).
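To make the declare, wire, and build pattern concrete, here is a minimal standard-library sketch of the idea. It is not Burr's API: the action decorator, the build function, the (source, target, condition) transition tuples, and the increment and report steps are all hypothetical stand-ins written for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

def action(reads: List[str], writes: List[str]):
    """Hypothetical decorator: records which state fields a step reads and writes."""
    def wrap(fn):
        fn.reads, fn.writes = list(reads), list(writes)
        return fn
    return wrap

@action(reads=["count"], writes=["count"])
def increment(state: Dict) -> Dict:
    return {"count": state["count"] + 1}

@action(reads=["count"], writes=["message"])
def report(state: Dict) -> Dict:
    return {"message": f"finished at {state['count']}"}

# Each transition is (source, target, condition); the first match wins.
TRANSITIONS: List[Tuple[str, str, Callable[[Dict], bool]]] = [
    ("increment", "increment", lambda s: s["count"] < 3),
    ("increment", "report", lambda s: True),
]

def build(actions, transitions, entrypoint: str, initial_state: Dict):
    """Hypothetical builder: returns a callable that runs the state machine."""
    by_name = {fn.__name__: fn for fn in actions}

    def run():
        state = dict(initial_state)
        current: Optional[str] = entrypoint
        visited = []
        while current is not None:
            fn = by_name[current]
            # The step only sees the fields it declared in `reads`.
            result = fn({key: state[key] for key in fn.reads})
            undeclared = set(result) - set(fn.writes)
            if undeclared:
                raise ValueError(f"{current} wrote undeclared fields: {undeclared}")
            state = {**state, **result}
            visited.append(current)
            # Stop when no outgoing transition matches the new state.
            current = next(
                (dst for src, dst, cond in transitions
                 if src == current and cond(state)),
                None,
            )
        return state, visited

    return run

app = build([increment, report], TRANSITIONS, "increment", {"count": 0})
final_state, visited = app()
print(visited)
print(final_state)
```

Because each step declares its reads and writes up front, the runner can refuse undeclared writes and the control flow lives in a plain data table rather than inside the step bodies.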

On integrations, the site stresses “no lock-in”: Burr connects directly to OpenAI, Anthropic, LangChain, Hamilton, Streamlit, FastAPI, Haystack, Instructor, Pydantic, and PostgreSQL, which positions it as an orchestration layer rather than an all-in-one suite. The name is worth a footnote: per the author on Hacker News, Burr is named after Aaron Burr, the U.S. vice president who killed Alexander Hamilton in a duel; Hamilton is the earlier open-source library from the same team (DAGWorks). One does directed acyclic graphs, the other does state machines, and the rivalry is an inside joke the team planted on purpose. The name is a gag, but the technical claim underneath is serious: you need a state machine, not just a DAG. That is exactly what sets Burr apart from the pack.

Why it matters

For two years, agent frameworks sold on the capability axis: smoother prompt chains, smarter tool calls, more elaborate multi-agent collaboration. Burr inverts that and bets on the reliability-and-operability axis. The state machine makes control flow explicit and inspectable. The observability UI lets you see what happened at each step. Persistence lets an application resume after a crash. Replay lets you reproduce and test against historical data. None of that makes for a flashier feature set, but it is the piece that is actually missing when you turn a demo prototype into a production system.

The bet is worth noting because it lands squarely on a real pain point. In the Hacker News thread (167 points), one comment put it well: pitching a new framework with “this is what writing an agent looks like” is the wrong move; the better pitch is “look how easy observability, guardrails, monitoring, deployment, evals, versioning, and A/B testing are with our framework.” Another long-time agent builder added that the hard part was never the agent loop itself; it is the orchestration, context management, guardrails, and monitoring that surround it. Burr’s feature list reads almost like a transcription of that pain list, which suggests its positioning is well aimed.

A caveat, though: turning state machines and observability into a framework is not the same as getting them right. Visible state is fundamentally a discipline problem, not something a framework solves for you. A team unwilling to model state explicitly will stuff it into globals no matter what framework you hand them. Burr provides scaffolding, not discipline.

Builder impact

If you are choosing an agent framework, Burr’s arrival hands you a concrete evaluation checklist, and that checklist is worth more than the framework itself. First, is state visible: can you print, at any moment, where the application is and what it remembers, instead of having it buried in closures and implicit context? Second, can you replay: when a weird case shows up in production, can you rerun it with the exact state it had to diagnose it, rather than guessing from logs? Third, can you recover from failure: when the process crashes or the third step’s model call times out, can the application resume from the checkpoint instead of burning tokens from scratch? Visible state, replay, and recovery should be the hard criteria for any agent framework, ahead of how flashy its multi-agent demo looks.
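The recovery criterion can be tested with a very small experiment. The sketch below is a standard-library illustration of checkpoint and resume, not Burr's persistence layer: run_with_checkpoints, resume, the JSON checkpoint file, and the three steps are hypothetical names, and the timeout in the third step is simulated.

```python
import json
import tempfile
from pathlib import Path

def run_with_checkpoints(steps, state, checkpoint_path, start_at=0):
    """Run steps in order, saving the next step index and state after each one."""
    for index in range(start_at, len(steps)):
        state = steps[index](state)
        checkpoint_path.write_text(
            json.dumps({"next_step": index + 1, "state": state})
        )
    return state

def resume(steps, checkpoint_path):
    """Reload the last checkpoint and continue from the step that did not finish."""
    saved = json.loads(checkpoint_path.read_text())
    return run_with_checkpoints(
        steps, saved["state"], checkpoint_path, saved["next_step"]
    )

# Count how often each step runs, to show what a resume does not repeat.
calls = {"fetch": 0, "summarize": 0, "publish": 0}

def fetch(state):
    calls["fetch"] += 1
    return {**state, "doc": "raw text"}

def summarize(state):
    calls["summarize"] += 1
    return {**state, "summary": state["doc"].upper()}

def publish(state):
    calls["publish"] += 1
    if calls["publish"] == 1:
        # Simulated failure on the first attempt of the third step.
        raise TimeoutError("simulated model timeout")
    return {**state, "published": True}

steps = [fetch, summarize, publish]
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_with_checkpoints(steps, {}, checkpoint)
except TimeoutError:
    pass  # The run died in step three; the checkpoint from step two survives.

final_state = resume(steps, checkpoint)
print(final_state)
print(calls)
```

The checkpoint file doubles as visible state: it can be opened and read at any time, and loading it is also the starting point for replaying the failed step with the exact inputs it had.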

A second impact is just as practical: entering the Apache incubator carries real weight for enterprise users. The Apache Software Foundation provides governance, community continuity, and the assurance that a project will not vanish because one startup folded, which is genuine reassurance for teams that must write a framework into production code and justify their dependency provenance to security and compliance reviewers. But incubation is a start, not a finish. Apache history has projects that graduate into household names and projects that are retired and archived, with some ending up in the “attic.” So the badge lowers the risk of the project disappearing; it does not lower your burden of deciding whether the framework fits you.

Third, be honest about the dissent. More than one HN commenter expressed broad skepticism of agent frameworks: one built a client MVP by deliberately using no framework, letting Claude/Codex write the agent loop, tools, and streaming directly, and it worked fine; another noted that opening a coding assistant and asking it to write an agent is not hard to begin with. That objection holds. For small projects and one-off zero-to-one validation, the abstraction cost of a framework may exceed its payoff. Burr’s value rises with the complexity of your application, how long it must be maintained, and how many people must collaborate on it. The more your thing resembles a system that needs to be operated, the more a framework like Burr pays for itself. Until you are sure you are building a “system,” hold off on the framework.

What to ignore

Ignore the romance of the name. The Burr-kills-Hamilton lore is fun, but it has nothing to do with whether the framework fits you. Do not let team in-jokes color a technical decision.

Ignore the three metric slots on the homepage that are stuck at “0” (GitHub Stars, PyPI Downloads, Discord Members). On the page as retrieved, those counters all read zero, most likely placeholder values for a front-end count-up animation that did not render rather than real adoption figures. To gauge traction, look at the actual stars and release cadence on the GitHub repo. There is a related fact worth noting: an HN commenter pointed out the project has stayed in the 0.4x version range for two years, and the author replied that pushing through the Apache process plus other commitments has kept the pace slow. So “now in Apache” does not mean “already mature.” Version number and release frequency are the more honest maturity signals.

Also ignore the “Reddit User” line in the testimonial wall. Named CTOs, founders, and architects carry some weight, but an anonymous Reddit user’s “take a look at Burr, thank me later” reads like filler. Do not treat it as social proof. What is actually informative are the named developers who compare Burr directly to LangChain, CrewAI, and AutoGen; their shared claim is that Burr is easier for modeling complex behavior and for debugging, which is consistent with the site’s state-machine-plus-observability story. Believable, but verify it yourself.

Technical takeaway

Burr’s core abstraction is a trio: actions (declared with a decorator that names which state fields they read and write), transitions (which define how actions jump to one another), and State (an explicit, immutably updated data carrier). Wired into a state machine, control flow no longer hides inside prompts or if-else branches; it becomes a graph you can print, visualize, and test. The real payoff is not speed. It is accountability. When something breaks, you can point at the graph and say which step ran, what state it read, and why it transitioned where it did. For an agent you have to operate over time, that accountability is often worth more than a model being a few percent smarter.
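The accountability claim is easier to judge with a toy version in hand. The following standard-library sketch assumes a hypothetical immutable State class, a TraceEntry record, and a draft, review, publish loop; none of these names come from Burr, and the transition labels exist only to show how a trace can record why each step led to the next.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Optional

class State:
    """Hypothetical immutable state carrier: update() returns a new object."""
    def __init__(self, **fields):
        self._data = MappingProxyType(dict(fields))

    def __getitem__(self, key):
        return self._data[key]

    def update(self, **changes):
        return State(**{**self._data, **changes})

    def as_dict(self):
        return dict(self._data)

@dataclass(frozen=True)
class TraceEntry:
    action: str                 # which step ran
    read: dict                  # the declared fields it saw before running
    next_action: Optional[str]  # where control went afterwards
    reason: str                 # label of the transition that fired

def draft(s):
    n = s["attempts"] + 1
    return s.update(attempts=n, text=f"draft v{n}")

def review(s):
    return s.update(approved=s["attempts"] >= 2)

def publish(s):
    return s.update(published=True)

# name -> (fields read, function)
ACTIONS = {
    "draft": (["attempts"], draft),
    "review": (["attempts"], review),
    "publish": (["text"], publish),
}
# (source, target, label, condition); the first match wins.
TRANSITIONS = [
    ("draft", "review", "always", lambda s: True),
    ("review", "draft", "not approved", lambda s: not s["approved"]),
    ("review", "publish", "approved", lambda s: s["approved"]),
]

def run(actions, transitions, entrypoint, state):
    trace, current = [], entrypoint
    while current is not None:
        reads, fn = actions[current]
        snapshot = {key: state[key] for key in reads}
        state = fn(state)
        next_action, reason = None, "terminal"
        for src, dst, label, cond in transitions:
            if src == current and cond(state):
                next_action, reason = dst, label
                break
        trace.append(TraceEntry(current, snapshot, next_action, reason))
        current = next_action
    return state, trace

initial = State(attempts=0)
final, trace = run(ACTIONS, TRANSITIONS, "draft", initial)
for entry in trace:
    print(entry)
```

Each trace entry answers the three questions from the paragraph above: which step ran, what state it read, and which labeled transition sent control onward. Because update() returns a new object, the initial state is still intact after the run and can be compared with the final one.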

Sources

No official primary source available; this analysis is based on reliable secondary reporting (named outlets, cross-confirmed).