AgentSpan
Durable runtime and SDK for AI agents that runs workflows server-side so executions survive crashes, pauses, and long-running tasks.
AgentSpan is an open-source runtime and SDK for building production-grade AI agents as durable workflows. Instead of running agents inside ephemeral application processes, AgentSpan moves execution state to a dedicated server layer, allowing workflows to continue even if the original process crashes, restarts, or disconnects.
The system is designed around the concept of “durable execution for agents,” where every agent run is treated as a persisted workflow. Tool calls, intermediate states, and multi-step reasoning are stored server-side, enabling agents to resume exactly where they left off without losing progress or context.
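The idea of persisting each step so a run can resume can be sketched in plain Python. This is only an illustration of the general checkpoint-and-resume technique, not AgentSpan's implementation or API: `run_durable`, the JSON state file, and the step names are all hypothetical, and a local file stands in for the server-side store.

```python
import json
import os
import tempfile


def run_durable(steps, state_path):
    """Run named steps in order, persisting each result so a rerun resumes.

    `steps` is a list of (name, fn) pairs; each fn receives the dict of
    results recorded so far. Hypothetical helper, not an AgentSpan call.
    """
    # Load results persisted by an earlier (possibly crashed) run.
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)
    else:
        done = {}

    for name, fn in steps:
        if name in done:
            continue  # completed before the crash; do not execute again
        done[name] = fn(done)
        # Write to a temp file, then atomically swap it into place, so a
        # crash mid-write never leaves a half-written state file.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, state_path)
    return done


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "run.json")
    steps = [
        ("search", lambda done: ["doc-a", "doc-b"]),
        ("summarize", lambda done: f"{len(done['search'])} documents found"),
    ]
    print(run_durable(steps, path))  # first run executes both steps
    print(run_durable(steps, path))  # second run reloads results, runs nothing
```

Because results are written after every step, a process that dies during `summarize` repeats only that step on restart; `search` is read back from the state file.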
AgentSpan compiles agent definitions into orchestrated workflows that can run across distributed environments. It supports retries, long-running tasks, human-in-the-loop pauses, and multi-agent coordination patterns such as sequential pipelines, parallel execution, and routing between specialized agents.
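The coordination patterns named above can be illustrated with ordinary functions. The sketch below assumes an "agent" is just a callable that takes one input and returns one output; `sequential`, `parallel`, `router`, and `with_retries` are hypothetical helper names chosen for this example and are not taken from the AgentSpan SDK.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(agent, attempts=3, delay=0.0):
    """Wrap an agent so transient failures are retried up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def run(x):
        for attempt in range(attempts):
            try:
                return agent(x)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(delay)

    return run


def sequential(*agents):
    """Pipeline: each agent receives the previous agent's output."""
    def run(x):
        for agent in agents:
            x = agent(x)
        return x
    return run


def parallel(*agents):
    """Fan-out: every agent receives the same input; results keep agent order."""
    def run(x):
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda agent: agent(x), agents))
    return run


def router(choose, routes):
    """Routing: `choose` picks a key, and the matching agent handles the input."""
    def run(x):
        return routes[choose(x)](x)
    return run
```

Because every helper returns a callable with the same shape as an agent, the patterns compose: for example, `sequential(router(...), with_retries(summarizer))` is itself an agent.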
A key feature of AgentSpan is its execution model: agents are defined in code but executed as stateful workflows on a server built on a Conductor-based orchestration layer. This separates compute (workers) from state (the runtime), so a failed or restarted worker does not take the workflow's state down with it.
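A toy model may help show why separating compute from state matters. In the sketch below, an in-memory `Runtime` object stands in for the server and owns every pending task and result, while `Worker` objects hold only handler functions. Both classes and all method names are assumptions made for illustration; they do not reflect the AgentSpan or Conductor APIs.

```python
from collections import deque


class Runtime:
    """Stands in for the server: owns all pending tasks and recorded results."""

    def __init__(self):
        self.pending = deque()
        self.results = {}

    def schedule(self, task_id, task_type, payload):
        self.pending.append((task_id, task_type, payload))

    def poll(self):
        """Hand out the next task, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def requeue(self, task):
        """Put a task back at the front after a worker failed on it."""
        self.pending.appendleft(task)

    def complete(self, task_id, output):
        self.results[task_id] = output


class Worker:
    """Stateless compute: knows how to run task types, keeps nothing between tasks."""

    def __init__(self, handlers):
        self.handlers = handlers

    def run_once(self, runtime):
        task = runtime.poll()
        if task is None:
            return False
        task_id, task_type, payload = task
        try:
            output = self.handlers[task_type](payload)
        except Exception:
            runtime.requeue(task)  # the task survives the worker's failure
            raise
        runtime.complete(task_id, output)
        return True
```

Since a worker keeps no task state of its own, a second worker can pick up a requeued task and finish the workflow without any handoff from the one that failed.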
The platform also includes built-in observability tools that let developers inspect every step of an agent run: tool inputs and outputs, LLM calls, timing, token usage, and failures. This makes debugging and replaying workflows easier than with traditional agent frameworks.
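As a rough picture of what step-level recording involves, the sketch below captures the name, inputs, output, error, and duration of each step in a run. `Trace` and `StepRecord` are hypothetical names for this example, not AgentSpan types, and token usage is left out because it depends on the LLM provider's response format.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepRecord:
    name: str
    inputs: dict
    output: Any = None
    error: Optional[str] = None
    duration_s: float = 0.0


@dataclass
class Trace:
    steps: list = field(default_factory=list)

    def record(self, name, fn, **inputs):
        """Call fn(**inputs) and log what went in, what came out, and how long it took."""
        rec = StepRecord(name=name, inputs=inputs)
        start = time.perf_counter()
        try:
            rec.output = fn(**inputs)
            return rec.output
        except Exception as exc:
            rec.error = repr(exc)  # keep the failure visible in the trace
            raise
        finally:
            # Runs on success and on failure, so every step is recorded.
            rec.duration_s = time.perf_counter() - start
            self.steps.append(rec)
```

A failed step still lands in `trace.steps` with its inputs and error text, which is the information needed to reproduce the call later.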
AgentSpan integrates with popular agent ecosystems such as LangGraph, OpenAI Agents SDK, and Google ADK, allowing existing agents to be “wrapped” without rewriting logic while gaining durability and orchestration features.
Key features include:
- Durable execution engine for AI agents
- Server-side workflow state persistence across crashes
- Multi-agent orchestration (sequential, parallel, router, handoff)
- Human-in-the-loop pauses and approvals
- Automatic retries and fault recovery
- Full observability of tool calls and LLM steps
- Integration with existing frameworks (LangGraph, OpenAI Agents SDK, Google ADK)
- Streaming execution and runtime event tracking
- Self-hostable open-source architecture
- CLI + Python SDK for agent development
Common use cases include:
- Production AI agent workflows with high reliability requirements
- Long-running research, data processing, and automation tasks
- Multi-step pipelines with multiple specialized agents
- Human-in-the-loop approval systems
- Enterprise-grade AI orchestration and monitoring
- Resumable agent executions for unreliable environments
AgentSpan is positioned as an infrastructure layer for AI agents, focusing on durability, orchestration, and observability. It targets the “agents work in demo but fail in production” problem by treating each agent execution as a persisted workflow running on a distributed system.