Building a browser-native agent runtime

A solo project that started with a simple question, “Why can’t AI actually do things?”, and grew into a full execution system with 50+ tools, four execution planes, and pipelines running in production.

50+ built-in tools
4 execution planes
3 task modes
900+ lines of protocol

How this project evolved

1. Problem Discovery

LLMs are great at generating text, but terrible at finishing real tasks. The gap between 'AI said it' and 'AI did it' became the core motivation.

2. Protocol Design

Developed the ΩHERE heredoc format, batch execution, and high-fidelity transport to solve escaping failures and multi-step reliability issues.

3. Runtime Evolution

Added async tasks, background execution, state management, and a browser execution bridge — turning a simple tool caller into a full task system.

4. Ecosystem Integration

Connected MCP tools, SSH remotes, cloud storage, media processing, and AI APIs into a unified dispatch layer with error handling and fallbacks.

5. Production Workflows

Built real pipelines: automated news video generation (script → TTS → images → FFmpeg → YouTube upload), tutorial recording, music processing, and more.
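The news-video pipeline is a chain of dependent steps: a failed stage should stop everything downstream of it, but not unrelated work. The following is a minimal sketch of that idea using Python's standard-library graphlib.TopologicalSorter (Python 3.9+). The function run_pipeline and the stage names are hypothetical and are not the project's actual code.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical step signature: receives results of steps that already succeeded.
Step = Callable[[dict[str, Any]], Any]


def run_pipeline(steps: dict[str, Step], deps: dict[str, set[str]]) -> dict[str, str]:
    """Run steps in dependency order; skip any step whose prerequisites did not all succeed."""
    unknown = {d for ds in deps.values() for d in ds} - set(steps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    graph = {name: deps.get(name, set()) for name in steps}
    outcome: dict[str, str] = {}
    results: dict[str, Any] = {}

    # static_order() yields every dependency before its dependents
    # and raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        if any(outcome[dep] != "ok" for dep in graph[name]):
            outcome[name] = "skipped"  # a prerequisite failed or was itself skipped
            continue
        try:
            results[name] = steps[name](results)
            outcome[name] = "ok"
        except Exception as exc:
            outcome[name] = f"failed: {exc}"
    return outcome
```

With the dependency map {"tts": {"script"}, "images": {"script"}, "compose": {"tts", "images"}, "upload": {"compose"}}, a failed image step marks compose and upload as skipped, while tts still runs.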

Key engineering decisions

Chrome Extension Architecture

Content scripts intercept SSE/WebSocket streams from AI chat pages, parse structured commands from model output, and bridge them to a local execution server.
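Before a content script can look for commands, it has to turn raw stream chunks back into whole messages, and chunk boundaries can fall anywhere. Below is a minimal sketch of that framing step for text/event-stream. It is written in Python to match the other examples; inside an extension it would be JavaScript. It covers only data: fields, comments, and \n or \r\n line endings, which is a subset of the SSE format. The class name is hypothetical.

```python
class SSEParser:
    """Reassemble text/event-stream chunks into event payloads (data-only subset of SSE)."""

    def __init__(self) -> None:
        self._buffer = ""            # trailing partial line carried between chunks
        self._data: list[str] = []   # data: lines of the event being assembled

    def feed(self, chunk: str) -> list[str]:
        """Consume one network chunk and return the payloads of events it completed."""
        # Normalize CRLF after joining, so a "\r" and "\n" split across chunks still pair up.
        # Bare "\r" line endings are not supported in this sketch.
        self._buffer = (self._buffer + chunk).replace("\r\n", "\n")
        *lines, self._buffer = self._buffer.split("\n")
        events = []
        for line in lines:
            if line == "":                      # blank line dispatches the event
                if self._data:
                    events.append("\n".join(self._data))
                    self._data = []
            elif line.startswith(":"):          # comment, often a keep-alive ping
                continue
            elif line.startswith("data:"):
                value = line[len("data:"):]
                self._data.append(value[1:] if value.startswith(" ") else value)
            # event:, id: and retry: fields are ignored in this sketch
        return events
```

How each payload maps to model text depends on the chat site's own message schema, which is not described here. The decoded text would then be accumulated and scanned for commands.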

Same-Origin Browser Execution

Instead of relying on external automation tools, the system executes JavaScript directly in the page context, giving it access to the page's own APIs, the DOM, and authenticated sessions that external tools can't reach.
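Executing code in page context usually means the local server sends a request and then waits for the extension to post the result back. Below is a minimal server-side sketch of that request/response correlation. It assumes a JSON message shape ({"id", "type", "code"} outbound, {"id", "result" or "error"} inbound) and a send callable that writes to whatever channel the extension keeps open. Both are hypothetical and are not the project's actual wire format.

```python
import asyncio
import itertools
import json
from typing import Any, Callable


class BrowserBridge:
    """Server side of a hypothetical bridge: send JS to the page, await its result."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send  # writes one message to the channel the extension holds open
        self._ids = itertools.count(1)
        self._pending: dict[int, asyncio.Future] = {}

    async def eval_in_page(self, code: str, timeout: float = 10.0) -> Any:
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        self._send(json.dumps({"id": request_id, "type": "eval", "code": code}))
        try:
            # Raises a timeout error if the page never answers.
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)

    def on_message(self, raw: str) -> None:
        """Route a reply from the extension to the call that is waiting for it."""
        msg = json.loads(raw)
        future = self._pending.get(msg.get("id"))
        if future is None or future.done():
            return  # unknown id, or a late reply after a timeout
        if "error" in msg:
            future.set_exception(RuntimeError(msg["error"]))
        else:
            future.set_result(msg.get("result"))
```

Keeping one future per request id lets several page-side calls be in flight at once. The timeout stops an unresponsive tab from hanging the agent indefinitely.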

Robust Protocol Parsing

A custom heredoc-style format eliminates JSON escaping hell. It supports nested content, binary-safe transport, and conditional batch execution with dependency chains.
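The exact ΩHERE syntax is not described here, so the sketch below assumes a hypothetical layout to show why a heredoc removes escaping. It uses an opener line "ΩHERE <tool> <TERMINATOR> key=value ...", a body taken verbatim, and a closing line containing only the terminator. The names parse_heredocs and Command are illustrative.

```python
import re
from dataclasses import dataclass, field

# Hypothetical opener line: "ΩHERE <tool> <TERMINATOR> [key=value ...]"
OPENER = re.compile(
    r"^ΩHERE\s+(?P<tool>\w+)\s+(?P<term>[A-Z_]+)(?P<args>(?:\s+\w+=\S+)*)\s*$"
)


@dataclass
class Command:
    tool: str
    args: dict[str, str] = field(default_factory=dict)
    body: str = ""


def parse_heredocs(text: str) -> list[Command]:
    """Pull commands out of model output; bodies are copied verbatim, never unescaped."""
    commands = []
    lines = text.split("\n")
    i = 0
    while i < len(lines):
        match = OPENER.match(lines[i])
        if match is None:
            i += 1  # ordinary prose from the model
            continue
        args = dict(pair.split("=", 1) for pair in match["args"].split())
        terminator = match["term"]
        body = []
        i += 1
        # Only the exact terminator line closes the block, so the body may contain
        # quotes, backslashes, JSON, or even other ΩHERE blocks with other terminators.
        while i < len(lines) and lines[i] != terminator:
            body.append(lines[i])
            i += 1
        if i == len(lines):
            raise ValueError(f"unterminated block for tool {match['tool']!r}")
        commands.append(Command(match["tool"], args, "\n".join(body)))
        i += 1  # step past the terminator
    return commands
```

Because nothing inside the body is ever unescaped, a file containing quotes, backslashes, and JSON arrives exactly as the model wrote it. The only rule the model must follow is to pick a terminator that never appears as a line on its own in the body.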

Multi-Modal Task Pipeline

Orchestrates text generation, TTS, image generation, video compositing, and upload across multiple services — with retry logic, fallbacks, and progress tracking.
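The following is a minimal sketch of the retry-and-fallback pattern. It assumes each service call is wrapped as a zero-argument callable. with_retry retries with exponential backoff and jitter. first_success moves on to the next provider (for example, a backup TTS service) only after the current one has exhausted its retries. Both names and the default values are illustrative.

```python
import random
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def with_retry(fn: Callable[[], T], *, attempts: int = 3, base_delay: float = 0.5,
               on_progress: Callable[[str], None] = print) -> T:
    """Retry fn with exponential backoff and jitter; the final attempt's error propagates."""
    for attempt in range(1, attempts):
        try:
            return fn()
        except Exception as exc:
            # Delay doubles each attempt; the random factor spreads out concurrent retries.
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.random())
            on_progress(f"attempt {attempt}/{attempts} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
    return fn()  # last attempt: let any exception reach the caller


def first_success(providers: Sequence[Callable[[], T]], **retry_opts) -> T:
    """Try each provider in order, with retries, and return the first result."""
    errors = []
    for provider in providers:
        try:
            return with_retry(provider, **retry_opts)
        except Exception as exc:
            errors.append(f"{getattr(provider, '__name__', provider)}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Progress messages go through on_progress. A real run could route them to a progress display instead of print.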

What I learned

Escaping is the #1 killer of agent reliability — protocol design matters more than model capability

Browser same-origin access unlocks capabilities that no external API can replicate

Background execution transforms agents from 'assistants' into 'workers'

Real-world pipelines need error budgets, not just error handling

The best agent UX is invisible — it should feel like the AI just does things
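The error-budget lesson can be made concrete. Instead of aborting on the first failed item or silently ignoring failures, a run tolerates a fixed share of failures and stops once that share is exceeded. Below is a minimal sketch; the class name and the 20% default are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorBudget:
    """Allow a fraction of items in a run to fail before the whole run is aborted."""
    total: int                       # items planned for this run, e.g. scene images
    max_failure_ratio: float = 0.2   # hypothetical default: tolerate 20% failures
    failures: list[str] = field(default_factory=list)

    def record_failure(self, item: str) -> None:
        """Note a failed item; raise once failures exceed the budget."""
        self.failures.append(item)
        if len(self.failures) > self.total * self.max_failure_ratio:
            raise RuntimeError(
                f"error budget exhausted: {len(self.failures)}/{self.total} items failed"
            )

    @property
    def degraded(self) -> bool:
        """True when the run survived but some items were dropped."""
        return bool(self.failures)
```

With total=10 and the default ratio, two failed scene images are absorbed and the third aborts the run. degraded lets the final report flag that items were dropped.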