Project Profile

Building a browser-native agent runtime

A solo project that started from a simple question, “Why can’t AI actually do things?”, and evolved into a full execution system with 50+ tools, multiple execution planes, and pipelines running in production.

50+ Built-in Tools

Execution Planes

Task Modes

900+ Lines of Protocol

Journey

How this project evolved

Problem Discovery

LLMs are great at generating text, but terrible at finishing real tasks. The gap between 'AI said it' and 'AI did it' became the core motivation.

Protocol Design

Developed the ΩHERE heredoc format, batch execution, and high-fidelity transport to solve escaping failures and multi-step reliability issues.
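To illustrate why a heredoc-style format sidesteps escaping, here is a minimal sketch. The real ΩHERE syntax is not shown in this profile, so the layout below (an opening line `ΩHERE <tool> <TAG>`, a verbatim body, and a closing line equal to `<TAG>`) is purely an assumption, as are the names `parse_heredoc_blocks` and `run_shell`.

```python
import json


def parse_heredoc_blocks(text: str) -> list[dict]:
    """Extract commands from a hypothetical 'ΩHERE <tool> <TAG>' ... '<TAG>' layout."""
    commands = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) == 3 and parts[0] == "ΩHERE":
            _, tool, tag = parts
            body = []
            i += 1
            # Everything up to the closing tag is taken verbatim: no unescaping step.
            while i < len(lines) and lines[i] != tag:
                body.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError(f"unterminated block for tool {tool!r}")
            commands.append({"tool": tool, "body": "\n".join(body)})
        i += 1
    return commands


# A shell snippet full of quotes and backslashes, the kind that breaks JSON payloads.
script = 'echo "path: C:\\temp" && printf \'%s\\n\' "$HOME"'
message = f"Running it now.\nΩHERE run_shell END_7f3\n{script}\nEND_7f3\nDone."

parsed = parse_heredoc_blocks(message)
json_encoded = json.dumps(script)  # the same payload as a JSON string, for comparison
```

In this sketch the body comes back byte-for-byte identical to what the model wrote, whereas the JSON encoding of the same snippet needs every quote and backslash escaped correctly by the model.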

Runtime Evolution

Added async tasks, background execution, state management, and a browser execution bridge — turning a simple tool caller into a full task system.
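As a rough sketch of what background execution with state tracking can look like, the example below wraps a thread pool in a small task registry. `TaskManager` and its state names are hypothetical and are not taken from the project itself.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional


class TaskManager:
    """Runs callables in the background and reports their state by task id."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: dict[str, Future] = {}

    def submit(self, fn: Callable[..., Any], *args: Any) -> str:
        task_id = uuid.uuid4().hex[:8]
        self._futures[task_id] = self._pool.submit(fn, *args)
        return task_id  # the caller gets control back immediately

    def status(self, task_id: str) -> str:
        future = self._futures[task_id]
        if future.running():
            return "running"
        if not future.done():
            return "queued"
        return "failed" if future.exception() else "done"

    def result(self, task_id: str, timeout: Optional[float] = None) -> Any:
        # Blocks until the task finishes; re-raises the task's exception if it failed.
        return self._futures[task_id].result(timeout=timeout)


manager = TaskManager()
task_id = manager.submit(sum, [1, 2, 3])
value = manager.result(task_id, timeout=5)
```

The point of the design is that `submit` returns an id right away, so a long-running job does not block the conversation that started it.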

Ecosystem Integration

Connected MCP tools, SSH remotes, cloud storage, media processing, and AI APIs into a unified dispatch layer with error handling and fallbacks.

Production Workflows

Built real pipelines: automated news video generation (script → TTS → images → FFmpeg → YouTube upload), tutorial recording, music processing, and more.

Technical Highlights

Key engineering decisions

Chrome Extension Architecture

Content scripts intercept SSE/WebSocket streams from AI chat pages, parse structured commands from model output, and bridge them to a local execution server.
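A real content script would do this in JavaScript, but the reassembly logic is language-independent, so here is a hedged Python sketch of it. Network chunks can split a Server-Sent Event anywhere, so the decoder buffers text until a blank line completes an event. `SSEDecoder` and the `"delta"` payload field are assumptions for illustration, and only `\n` line endings are handled.

```python
import json


class SSEDecoder:
    """Reassembles Server-Sent Events from arbitrarily split network chunks."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list[str]:
        """Return the data payload of every event completed by this chunk."""
        self._buffer += chunk
        payloads = []
        # A blank line terminates an event; a trailing partial event stays buffered.
        while "\n\n" in self._buffer:
            raw_event, self._buffer = self._buffer.split("\n\n", 1)
            data_lines = [
                line[5:].removeprefix(" ")  # drop "data:" and one optional space
                for line in raw_event.split("\n")
                if line.startswith("data:")
            ]
            if data_lines:
                payloads.append("\n".join(data_lines))
        return payloads


decoder = SSEDecoder()
# The first event is deliberately split in the middle of its JSON payload.
chunks = ['data: {"delta": "Hel', 'lo"}\n\ndata: {"delta": " world"}\n', "\n"]

text = ""
for chunk in chunks:
    for payload in decoder.feed(chunk):
        text += json.loads(payload)["delta"]  # "delta" is a hypothetical field name
```

Once the model output has been reassembled into plain text like this, a command parser can scan it for structured blocks and forward them to the local execution server.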

Same-Origin Browser Execution

Instead of external automation tools, the system executes JavaScript directly in page context — accessing APIs, DOM, and authenticated sessions that external tools can't reach.

Robust Protocol Parsing

Custom heredoc-style format eliminates JSON escaping hell. Supports nested content, binary-safe transport, and conditional batch execution with dependency chains.
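Conditional batch execution with dependency chains can be sketched as follows. This is a minimal illustration under my own assumptions, not the project's actual scheduler: `Step` and `run_batch` are hypothetical names, and steps are assumed to be listed after the steps they depend on.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], object]
    depends_on: list[str] = field(default_factory=list)


def run_batch(steps: list[Step]) -> dict[str, str]:
    """Run steps in listed order; skip any step whose dependencies did not succeed."""
    status: dict[str, str] = {}
    for step in steps:
        if any(status.get(dep) != "ok" for dep in step.depends_on):
            status[step.name] = "skipped"
            continue
        try:
            step.action()
            status[step.name] = "ok"
        except Exception:
            status[step.name] = "failed"
    return status


def fail() -> None:
    raise RuntimeError("download failed")


result = run_batch([
    Step("fetch", fail),
    Step("convert", lambda: None, depends_on=["fetch"]),  # skipped: fetch failed
    Step("cleanup", lambda: None),                        # independent, still runs
])
```

A failed step only blocks the steps that depend on it, so one bad command does not abort unrelated work in the same batch.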

Multi-Modal Task Pipeline

Orchestrates text generation, TTS, image generation, video compositing, and upload across multiple services — with retry logic, fallbacks, and progress tracking.
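The retry-and-fallback behavior described above might look roughly like this for a single pipeline stage. It is a sketch under assumptions: `call_with_fallbacks`, the provider functions, and the backoff parameters are all hypothetical, and the `sleep` argument exists only so the delay can be swapped out in tests.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallbacks(
    providers: Sequence[Callable[[], T]],
    attempts_per_provider: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying with exponential backoff before moving on."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(attempts_per_provider):
            try:
                return provider()
            except Exception as error:
                last_error = error
                if attempt < attempts_per_provider - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed") from last_error


calls = []


def flaky_tts() -> str:
    calls.append("primary")
    raise ConnectionError("primary TTS service unavailable")


def backup_tts() -> str:
    calls.append("backup")
    return "audio.mp3"


audio = call_with_fallbacks([flaky_tts, backup_tts], sleep=lambda _: None)
```

Here the primary service is retried three times before the backup is used, so a transient outage costs a short delay while a persistent one still produces a result.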

Lessons

What I learned

Escaping is the #1 killer of agent reliability — protocol design matters more than model capability

Browser same-origin access unlocks capabilities that no external API can replicate

Background execution transforms agents from 'assistants' into 'workers'

Real-world pipelines need error budgets, not just error handling

The best agent UX is invisible — it should feel like the AI just does things
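To make the error-budget lesson concrete, here is one possible sketch. Instead of reacting to each failure in isolation, the run tolerates failures until their share of all attempts crosses a threshold, then stops. `ErrorBudget`, its thresholds, and the sample outcomes are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Tolerates failures until their share of all attempts exceeds max_failure_ratio."""

    max_failure_ratio: float
    min_attempts: int = 5  # avoid tripping on the first unlucky failure
    attempts: int = 0
    failures: int = 0

    def record(self, ok: bool) -> None:
        self.attempts += 1
        if not ok:
            self.failures += 1

    @property
    def exhausted(self) -> bool:
        if self.attempts < self.min_attempts:
            return False
        return self.failures / self.attempts > self.max_failure_ratio


budget = ErrorBudget(max_failure_ratio=0.2)
outcomes = [True, False, True, True, True, False, False, True]  # made-up job results

processed = 0
for ok in outcomes:
    budget.record(ok)
    processed += 1
    if budget.exhausted:
        break  # stop the run and escalate instead of retrying forever
```

With these sample outcomes the run halts after the sixth job, when two failures out of six attempts push the failure ratio past 20%.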