OmniServe — Production API Server

Turn any agent into a production-ready REST/SSE API with a single command. OmniServe ships as an optional extra; install it with:
pip install "omnicoreagent[serve]"

Agent File Requirements

Your Python file must define one of the following:
# Option 1: Define an `agent` variable
from omnicoreagent import OmniCoreAgent

agent = OmniCoreAgent(
    name="MyAgent",
    system_instruction="You are a helpful assistant.",
    model_config={"provider": "openai", "model": "gpt-4o-mini"},
)
# Option 2: Define a `create_agent()` function
from omnicoreagent import OmniCoreAgent

def create_agent():
    """Factory function that returns an agent instance."""
    return OmniCoreAgent(
        name="MyAgent",
        system_instruction="You are a helpful assistant.",
        model_config={"provider": "openai", "model": "gpt-4o-mini"},
    )
OmniServe looks for a module-level `agent` variable first, then for a `create_agent()` function. Your file must define at least one of them.
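The lookup order can be made concrete with a minimal loader sketch. This is an illustration under stated assumptions, not OmniServe's actual loader: `load_module_from_path` and `resolve_agent` are hypothetical helpers built on the standard library's `importlib`, and they treat an `agent` that is `None` as missing.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Any


def load_module_from_path(path: str) -> ModuleType:
    """Import a Python file by path without requiring it to be on sys.path."""
    file_path = Path(path).resolve()
    spec = importlib.util.spec_from_file_location(file_path.stem, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load agent file: {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


def resolve_agent(module: ModuleType) -> Any:
    """Prefer a module-level `agent`, then fall back to calling `create_agent()`."""
    agent = getattr(module, "agent", None)
    if agent is not None:
        return agent
    factory = getattr(module, "create_agent", None)
    if callable(factory):
        return factory()  # the factory builds a fresh agent instance
    raise AttributeError("Agent file must define `agent` or `create_agent()`")
```

Usage: `resolve_agent(load_module_from_path("my_agent.py"))` returns the agent from the Quick Start file below. Note that `exec_module` executes the whole file, so any top-level side effects run at load time.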

Quick Start

Step 1: Create your agent file (my_agent.py)

from omnicoreagent import OmniCoreAgent, ToolRegistry

tools = ToolRegistry()

@tools.register_tool("greet")
def greet(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"

@tools.register_tool("calculate")
def calculate(expression: str) -> dict:
    """Evaluate a math expression."""
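    import math
    # An empty __builtins__ plus an allow-list of names narrows what eval can
    # reach, but it is not a security sandbox: never pass untrusted input here.
    result = eval(expression, {"__builtins__": {}}, {"sqrt": math.sqrt, "pi": math.pi})
    return {"expression": expression, "result": result}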

agent = OmniCoreAgent(
    name="MyAgent",
    system_instruction="You are a helpful assistant with access to greeting and calculation tools.",
    model_config={"provider": "openai", "model": "gpt-4o-mini"},
    local_tools=tools,
)

Step 2: Set environment variables

export LLM_API_KEY=your_api_key_here

Step 3: Run the server

omniserve run --agent my_agent.py

Step 4: Test the API

# Health check
curl http://localhost:8000/health

# Run a query (sync)
curl -X POST http://localhost:8000/run/sync \
  -H "Content-Type: application/json" \
  -d '{"query": "Greet Alice and calculate 2+2"}'

# Run a query (streaming SSE; -N turns off curl's output buffering)
curl -N -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is sqrt(144)?"}'

# Open interactive docs (macOS `open`; on Linux use xdg-open)
open http://localhost:8000/docs
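The streaming endpoint speaks Server-Sent Events. The sketch below consumes it from Python with only the standard library. `iter_sse_events` and `stream_query` are hypothetical helpers: the parser applies the basic text/event-stream framing rules and makes no assumption about the shape of OmniServe's event payloads, which it passes through as raw strings.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from decoded text/event-stream lines."""
    event_type: Optional[str] = None
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data_lines:
                yield (event_type or "message", "\n".join(data_lines))
            event_type, data_lines = None, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the SSE format strips one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch.


def stream_query(base_url: str, query: str) -> None:
    """POST a query to /run and print each SSE event as it arrives."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/run",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields raw byte lines.
        lines = (chunk.decode("utf-8") for chunk in response)
        for event_type, data in iter_sse_events(lines):
            print(event_type, data)
```

For example, `stream_query("http://localhost:8000", "What is sqrt(144)?")` prints the same events the curl command above shows. If `data` carries JSON, decode it with `json.loads` once you have confirmed the payload format for your server.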

CLI Commands

| Command | Description |
|---|---|
| omniserve run | Run your agent file as an API server |
| omniserve quickstart | Zero-code server with defaults |
| omniserve config | View or generate configuration |
| omniserve generate-dockerfile | Generate a production Dockerfile |

CLI Options: omniserve run

omniserve run \
  --agent my_agent.py \
  --host 0.0.0.0 \
  --port 8000 \
  --workers 1 \
  --auth-token YOUR_TOKEN \
  --rate-limit 100 \
  --cors-origins "*" \
  --no-docs
Examples:
# Basic run
omniserve run --agent my_agent.py

# With authentication
omniserve run --agent my_agent.py --auth-token secret123

# With rate limiting
omniserve run --agent my_agent.py --rate-limit 100

# Production settings
omniserve run --agent my_agent.py \
  --port 8000 \
  --auth-token $AUTH_TOKEN \
  --rate-limit 100 \
  --cors-origins "https://myapp.com,https://api.myapp.com"


CLI Options: omniserve quickstart

Start a server instantly without writing any code:
omniserve quickstart \
  --provider openai \
  --model gpt-4o \
  --name QuickAgent \
  --instruction "You are..." \
  --port 8000
Use the provider that matches your LLM_API_KEY. For an OpenAI key, run omniserve quickstart --provider openai --model gpt-4o-mini. Examples:
# OpenAI
omniserve quickstart --provider openai --model gpt-4o

# Google Gemini
omniserve quickstart --provider gemini --model gemini-2.0-flash

# Anthropic Claude
omniserve quickstart --provider anthropic --model claude-3-5-sonnet-20241022

API Endpoints

Core Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /run | Yes* | SSE streaming response |
| POST | /run/sync | Yes* | JSON response (blocking) |
| GET | /health | No | Health check |
| GET | /ready | No | Readiness check |
| GET | /prometheus | No | Prometheus metrics |
| GET | /tools | Yes* | List available tools |
| GET | /metrics | Yes* | Agent usage metrics |
| GET | /events/{session_id} | Yes* | Replay stored telemetry events, then follow live telemetry events over SSE; add ?run_id=... to isolate one run |
| GET | /events/{session_id}/list | Yes* | Return stored telemetry events as JSON; add ?run_id=... to isolate one run |
| GET | /events/{session_id}/trace | Yes* | Compact telemetry trace summary for the latest session trace, or ?run_id=... for one run |
| GET | /telemetry/events | Yes* | Return stored telemetry events filtered by trace_id, run_id, session_id, task_id, or event_type |
| GET | /telemetry/events/stream | Yes* | Replay and follow telemetry events over SSE for session_id; add run_id to isolate one run |
| GET | /telemetry/traces | Yes* | List traces filtered by trace, run, session, task, agent, workflow, model, or status |
| GET | /telemetry/traces/{trace_id} | Yes* | Return one exact trace |
| GET | /telemetry/runs/{run_id}/trace | Yes* | Return the latest trace correlated to one run |
| GET | /telemetry/sessions/{session_id}/trace | Yes* | Return the latest trace for one session |
| GET | /docs | No | Swagger UI |
| GET | /redoc | No | ReDoc UI |
/ready returns ready: true only after the FastAPI lifespan has completed startup, the served agent is initialized, and any configured MCP servers are connected. Agents without MCP servers do not need an MCP client to pass readiness.
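A deploy script or orchestrator hook can poll /ready before sending traffic. This is a hedged sketch: it assumes /ready returns a JSON object with a boolean `ready` field, and `fetch_ready` and `wait_until_ready` are hypothetical names. The polling loop takes the check as a callable so it can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_ready(base_url: str) -> bool:
    """Return True only if GET /ready answers with a JSON body containing ready: true."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/ready", timeout=5) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (server down or non-2xx status);
        # ValueError covers a body that is not valid JSON.
        return False
    return isinstance(body, dict) and body.get("ready") is True


def wait_until_ready(check: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call `check` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready(lambda: fetch_ready("http://localhost:8000"), timeout=30)` returns True once the agent and any configured MCP servers are up, or False after roughly 30 seconds.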

Background Task Endpoints

These routes are mounted when background execution is enabled. It is enabled by default and can be turned off with OMNICOREAGENT_BACKGROUND_ENABLED=false or OmniServeConfig(background_enabled=False).
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /background/status | Yes* | Inspect background manager counts and run status totals |
| POST | /background/agents | Yes* | Register the served agent or an agent spec for background execution |
| GET | /background/agents | Yes* | List registered background agents |
| GET | /background/agents/{agent_id} | Yes* | Inspect a background agent spec |
| DELETE | /background/agents/{agent_id} | Yes* | Delete a background agent spec |
| POST | /background/tasks | Yes* | Create a background task |
| GET | /background/tasks | Yes* | List background tasks |
| GET | /background/tasks/{task_id} | Yes* | Inspect a background task |
| GET | /background/tasks/{task_id}/status | Yes* | Inspect schedule state, run counts, and latest run |
| PATCH | /background/tasks/{task_id} | Yes* | Patch a background task |
| POST | /background/tasks/{task_id}/run | Yes* | Queue a manual background run, optionally waiting for terminal state |
| POST | /background/tasks/{task_id}/pause | Yes* | Pause scheduled dispatch |
| POST | /background/tasks/{task_id}/resume | Yes* | Resume scheduled dispatch |
| DELETE | /background/tasks/{task_id} | Yes* | Delete a background task |
| POST | /background/runs/{run_id}/cancel | Yes* | Cancel a queued or running run |
| GET | /background/runs | Yes* | List background runs |
| GET | /background/runs/{run_id} | Yes* | Inspect run status |
| GET | /background/runs/{run_id}/attempts | Yes* | List run attempts |
| GET | /background/runs/{run_id}/events | Yes* | Replay lifecycle events |
| GET | /background/runs/{run_id}/workspace | Yes* | Inspect run workspace files |
*Auth required only if --auth-token is set or OMNICOREAGENT_SERVE_AUTH_ENABLED=true with OMNICOREAGENT_SERVE_AUTH_TOKEN.

Request/Response Examples

# Sync request (with auth)
curl -X POST http://localhost:8000/run/sync \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"query": "What is 2+2?", "session_id": "user123"}'

# Response:
# {"response": "2+2 equals 4", "session_id": "user123", ...}

# Streaming SSE request (-N prints events as they arrive)
curl -N -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain quantum computing"}'

# Replay and follow one session's events
curl -N http://localhost:8000/events/user123

# Replay one run inside a shared session
curl -N "http://localhost:8000/events/user123?run_id=RUN_ID"
curl "http://localhost:8000/events/user123/list?run_id=RUN_ID"
curl "http://localhost:8000/events/user123/trace?run_id=RUN_ID"

# Production telemetry API
curl "http://localhost:8000/telemetry/events?session_id=user123&run_id=RUN_ID"
curl "http://localhost:8000/telemetry/events?run_id=RUN_ID&event_type=tool_result&limit=100"
curl "http://localhost:8000/telemetry/traces?session_id=user123&limit=20"
curl "http://localhost:8000/telemetry/traces/TRACE_ID"
curl "http://localhost:8000/telemetry/runs/RUN_ID/trace"
curl "http://localhost:8000/telemetry/sessions/user123/trace"
curl -N "http://localhost:8000/telemetry/events/stream?session_id=user123&run_id=RUN_ID"

# List tools
curl http://localhost:8000/tools \
  -H "Authorization: Bearer YOUR_TOKEN"
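The authenticated sync call above can be reproduced from Python without third-party packages. In this minimal sketch, `build_run_sync_request` and `run_sync` are hypothetical helpers; the request body uses only the `query` and `session_id` fields shown in the curl example, and the JSON response is returned as a plain dict. The 300-second client timeout is chosen to match the default OMNICOREAGENT_SERVE_REQUEST_TIMEOUT.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_run_sync_request(
    base_url: str,
    query: str,
    session_id: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST /run/sync request; add the Bearer header only when a token is given."""
    payload: Dict[str, str] = {"query": query}
    if session_id is not None:
        payload["session_id"] = session_id
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/run/sync",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_sync(
    base_url: str,
    query: str,
    session_id: Optional[str] = None,
    token: Optional[str] = None,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_run_sync_request(base_url, query, session_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect or log the exact request before it goes out.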
For telemetry traces, trace_id is the exact lookup handle returned by the agent runtime, and run_id is the serving/runtime correlation handle returned from /run/sync and SSE completion payloads. Use trace_id when you need one exact trace. Use run_id when your UI or application needs all telemetry for one execution inside a shared session.

/telemetry/events defaults to limit=200 and /telemetry/traces defaults to limit=100; both accept lower limits per request. The exact trace, run trace, and session trace detail endpoints return 404 when no matching trace exists or when an accessor returns a trace that does not match the requested selector.
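The choice between trace_id, run_id, and session_id can be captured in a small URL helper. This sketch is illustrative: `telemetry_events_url` and `trace_detail_url` are hypothetical names, the filter names come from the endpoint table, and the helpers only build URLs, leaving limit validation to the server.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Query filters listed for GET /telemetry/events, plus the limit parameter.
EVENT_FILTERS = ("trace_id", "run_id", "session_id", "task_id", "event_type", "limit")


def telemetry_events_url(base_url: str, **filters: object) -> str:
    """Build a /telemetry/events URL, rejecting filter names not listed above."""
    unknown = sorted(set(filters) - set(EVENT_FILTERS))
    if unknown:
        raise ValueError(f"Unsupported filters: {unknown}")
    params = {k: v for k, v in filters.items() if v is not None}
    url = f"{base_url.rstrip('/')}/telemetry/events"
    return f"{url}?{urlencode(params)}" if params else url


def trace_detail_url(
    base_url: str,
    *,
    trace_id: Optional[str] = None,
    run_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> str:
    """Pick the most exact trace endpoint: trace_id, then run_id, then session_id."""
    base = base_url.rstrip("/")
    if trace_id:
        return f"{base}/telemetry/traces/{quote(trace_id, safe='')}"
    if run_id:
        return f"{base}/telemetry/runs/{quote(run_id, safe='')}/trace"
    if session_id:
        return f"{base}/telemetry/sessions/{quote(session_id, safe='')}/trace"
    raise ValueError("Provide trace_id, run_id, or session_id")
```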

Background Task Example

OmniServe registers the served agent as a background-capable agent during server startup. The default background agent id is default; override it with OMNICOREAGENT_BACKGROUND_AGENT_ID or OmniServeConfig(background_agent_id="...").
curl -X POST http://localhost:8000/background/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "daily_report",
    "query": "Write today'\''s operations report and save the output.",
    "schedule": {"type": "manual"},
    "timeout_seconds": 120,
    "retry_policy": {"max_retries": 1, "initial_delay_seconds": 30}
  }'

curl -X POST http://localhost:8000/background/tasks/daily_report/run \
  -H "Content-Type: application/json" \
  -d '{"wait": false}'
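If you create background tasks from application code, a typed payload keeps the request shape in one place. The sketch below mirrors only the fields used in the curl example; `BackgroundTask`, `RetryPolicy`, and `create_task` are hypothetical names, the default values are copied from that example rather than from documented server defaults, and `create_task` assumes the server replies with a JSON body.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RetryPolicy:
    max_retries: int = 1
    initial_delay_seconds: int = 30


@dataclass
class BackgroundTask:
    task_id: str
    query: str
    # Values below mirror the curl example; they are not server defaults.
    schedule: Dict[str, Any] = field(default_factory=lambda: {"type": "manual"})
    timeout_seconds: int = 120
    retry_policy: RetryPolicy = field(default_factory=RetryPolicy)

    def to_json(self) -> str:
        # asdict() also converts the nested RetryPolicy into a plain dict.
        return json.dumps(asdict(self))


def create_task(base_url: str, task: BackgroundTask, token: Optional[str] = None) -> Dict[str, Any]:
    """POST the task to /background/tasks and decode the (assumed JSON) reply."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/background/tasks",
        data=task.to_json().encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```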
Set {"wait": true} when the response should wait for the run to reach a terminal state. If the run does not finish within the background wait budget, OmniServe returns 504 with the run_id, latest status, wait_timeout_seconds, and request_timeout_seconds in detail; use the run_id with /background/runs/{run_id} to inspect the run later.

If auth is enabled with --auth-token or OMNICOREAGENT_SERVE_AUTH_ENABLED=true, add -H "Authorization: Bearer YOUR_TOKEN" to protected requests.

Each run stores operational state in the task store and writes lifecycle files into the configured workspace namespace. Use the default in-memory task store for local development, or choose sql, redis, or mongodb when runs must survive restarts. Durable stores preserve queued runs across server restarts: OmniServe can queue a run, stop, start again with the same task store, and the new manager can claim and complete that run.

Choose one durable backend per deployment:
  • SQL/SQLite for local durability or simple single-node services.
  • Redis when your deployment already operates Redis with persistence and no eviction for task-store keys.
  • MongoDB when MongoDB is your durable operational store.

Common inspection endpoints:
curl http://localhost:8000/background/status
curl http://localhost:8000/background/tasks/daily_report/status
curl http://localhost:8000/background/runs/$RUN_ID
curl http://localhost:8000/background/runs/$RUN_ID/events
curl http://localhost:8000/background/runs/$RUN_ID/workspace
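A client that posts {"wait": true} should treat 504 as "still running" rather than as a failure. This sketch assumes the 504 body follows the FastAPI convention of a top-level `detail` object carrying run_id and status, and that a successful reply may carry a `run_id` field; both are assumptions to check against your server. `interpret_wait_response` and `run_and_wait` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Optional, Tuple
from urllib.parse import quote


def interpret_wait_response(status: int, body: bytes) -> Tuple[str, Optional[str], Dict[str, Any]]:
    """Classify the reply to POST /background/tasks/{task_id}/run with {"wait": true}."""
    try:
        payload = json.loads(body.decode("utf-8")) if body else {}
    except ValueError:
        payload = {}  # e.g. an HTML error page from a proxy
    if not isinstance(payload, dict):
        payload = {}
    if status == 504:
        # Assumed shape: {"detail": {"run_id": ..., "status": ..., ...}}
        detail = payload.get("detail")
        detail = detail if isinstance(detail, dict) else {}
        return "still_running", detail.get("run_id"), detail
    if 200 <= status < 300:
        return "finished", payload.get("run_id"), payload
    raise RuntimeError(f"Unexpected status {status}: {payload!r}")


def run_and_wait(base_url: str, task_id: str, token: Optional[str] = None) -> Tuple[str, Optional[str], Dict[str, Any]]:
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/background/tasks/{quote(task_id, safe='')}/run",
        data=json.dumps({"wait": True}).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return interpret_wait_response(response.status, response.read())
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 504; its body is still readable.
        return interpret_wait_response(err.code, err.read())
```

When the outcome is "still_running", poll /background/runs/{run_id} with the returned run_id instead of retrying the run request.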

Environment Variables

Server settings use the OMNICOREAGENT_SERVE_* prefix. Background task settings use the OMNICOREAGENT_BACKGROUND_* prefix. Environment variables always override code values. You can run OmniServe without adding any OMNICOREAGENT_SERVE_* variables. The defaults enable the background API, start the worker, and use in-memory background task state. Add variables only when you want to change server behavior, authentication, rate limits, or background storage.
| Variable | Default | Description |
|---|---|---|
| OMNICOREAGENT_SERVE_HOST | 0.0.0.0 | Server host. Must not be empty |
| OMNICOREAGENT_SERVE_PORT | 8000 | Server port. Must be 1-65535 |
| OMNICOREAGENT_SERVE_WORKERS | 1 | Direct OmniServe worker count. Must be 1; scale by running multiple processes |
| OMNICOREAGENT_SERVE_API_PREFIX | "" | API path prefix (e.g., /api/v1). Normalized to a leading slash with no trailing slash; whitespace is invalid |
| OMNICOREAGENT_SERVE_ENABLE_DOCS | true | Swagger UI at /docs |
| OMNICOREAGENT_SERVE_ENABLE_REDOC | true | ReDoc at /redoc |
| OMNICOREAGENT_SERVE_CORS_ENABLED | true | Enable CORS |
| OMNICOREAGENT_SERVE_CORS_ORIGINS | * | Allowed origins (comma-separated) |
| OMNICOREAGENT_SERVE_CORS_METHODS | * | Allowed methods (comma-separated) |
| OMNICOREAGENT_SERVE_CORS_HEADERS | * | Allowed headers (comma-separated) |
| OMNICOREAGENT_SERVE_CORS_CREDENTIALS | true | Allow credentials |
| OMNICOREAGENT_SERVE_AUTH_ENABLED | false | Enable Bearer token auth. Requires a non-empty auth token |
| OMNICOREAGENT_SERVE_AUTH_TOKEN | | Bearer token value used when auth is enabled |
| OMNICOREAGENT_SERVE_RATE_LIMIT_ENABLED | false | Enable rate limiting |
| OMNICOREAGENT_SERVE_RATE_LIMIT_REQUESTS | 100 | Requests per window. Must be at least 1 when enabled |
| OMNICOREAGENT_SERVE_RATE_LIMIT_WINDOW | 60 | Window in seconds. Must be at least 1 when enabled |
| OMNICOREAGENT_SERVE_REQUEST_LOGGING | true | Log requests |
| OMNICOREAGENT_SERVE_LOG_LEVEL | INFO | Log level: CRITICAL, ERROR, WARNING, INFO, DEBUG, or TRACE |
| OMNICOREAGENT_SERVE_REQUEST_TIMEOUT | 300 | Request timeout in seconds |
| OMNICOREAGENT_BACKGROUND_ENABLED | true | Expose background task endpoints |
| OMNICOREAGENT_BACKGROUND_AGENT_ID | default | Agent id for the served agent in background tasks |
| OMNICOREAGENT_BACKGROUND_TASK_STORE | in_memory | Background control-plane store: in_memory, sql, redis, or mongodb |
| OMNICOREAGENT_BACKGROUND_TASK_STORE_URL | | SQL or Redis URL. Use OMNICOREAGENT_BACKGROUND_TASK_STORE=redis for Redis URLs |
| OMNICOREAGENT_BACKGROUND_TASK_STORE_URI | | MongoDB URI |
| OMNICOREAGENT_BACKGROUND_TASK_STORE_DATABASE | omnicoreagent | MongoDB database name |
| OMNICOREAGENT_BACKGROUND_TASK_STORE_PREFIX | | Redis key prefix |
| OMNICOREAGENT_BACKGROUND_TASK_STORE_COLLECTION_PREFIX | | MongoDB collection prefix |
| OMNICOREAGENT_BACKGROUND_TASK_STORE_CONNECT_TIMEOUT | | Backend connect timeout in seconds |
| OMNICOREAGENT_BACKGROUND_START_WORKER | true | Start scheduler and worker loop during server lifespan |
The background task store is not conversation memory. Memory stays in MemoryRouter. The task store is the control plane for schedules, runs, attempts, leases, retries, and cancellation. Leave it at the in-memory default to start; choose sql, redis, or mongodb when that state must survive restarts. Redis durable deployments need persistence enabled and a no-eviction policy for task-store keys. MongoDB durable deployments use majority writes.
Example shell environment:
# Model credential
export LLM_API_KEY=your_api_key_here

# Optional OmniServe overrides
export OMNICOREAGENT_SERVE_PORT=8000
export OMNICOREAGENT_SERVE_AUTH_ENABLED=true
export OMNICOREAGENT_SERVE_AUTH_TOKEN=my-secret-token
export OMNICOREAGENT_SERVE_RATE_LIMIT_ENABLED=true
export OMNICOREAGENT_SERVE_RATE_LIMIT_REQUESTS=100
export OMNICOREAGENT_SERVE_CORS_ORIGINS=https://myapp.com,https://api.myapp.com
export OMNICOREAGENT_SERVE_CORS_METHODS=GET,POST,OPTIONS
export OMNICOREAGENT_SERVE_CORS_HEADERS=Authorization,Content-Type

# Optional durable background task store. Pick ONE backend: each export of
# OMNICOREAGENT_BACKGROUND_TASK_STORE replaces the previous value.

# SQL / SQLite
export OMNICOREAGENT_BACKGROUND_TASK_STORE=sql
export OMNICOREAGENT_BACKGROUND_TASK_STORE_URL=sqlite:///.omnicoreagent/background.db

# Redis
# export OMNICOREAGENT_BACKGROUND_TASK_STORE=redis
# export OMNICOREAGENT_BACKGROUND_TASK_STORE_URL=redis://localhost:6379/0

# MongoDB
# export OMNICOREAGENT_BACKGROUND_TASK_STORE=mongodb
# export OMNICOREAGENT_BACKGROUND_TASK_STORE_URI=mongodb://localhost:27017
# export OMNICOREAGENT_BACKGROUND_TASK_STORE_DATABASE=omnicoreagent
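The rule that environment variables always override code values, together with the validation rules in the table, can be illustrated with a small settings merger. This is a sketch of the precedence behavior, not OmniServe's implementation: `ServeSettings`, `apply_env_overrides`, and `normalize_api_prefix` are hypothetical names covering only a subset of fields, and the set of accepted truthy strings for booleans is an assumption.

```python
import os
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Mapping, Optional

ENV_PREFIX = "OMNICOREAGENT_SERVE_"
TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: accepted truthy spellings


@dataclass(frozen=True)
class ServeSettings:
    """Hypothetical subset of server settings, named after the table above."""
    host: str = "0.0.0.0"
    port: int = 8000
    api_prefix: str = ""
    auth_enabled: bool = False
    auth_token: Optional[str] = None
    rate_limit_requests: int = 100


def normalize_api_prefix(value: str) -> str:
    """Leading slash, no trailing slash, whitespace rejected; an empty prefix stays empty."""
    if any(ch.isspace() for ch in value):
        raise ValueError("API prefix must not contain whitespace")
    trimmed = value.strip("/")
    return f"/{trimmed}" if trimmed else ""


def validate(settings: ServeSettings) -> None:
    if not settings.host:
        raise ValueError("host must not be empty")
    if not 1 <= settings.port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    if settings.auth_enabled and not settings.auth_token:
        raise ValueError("auth_enabled requires a non-empty auth_token")


def apply_env_overrides(settings: ServeSettings, env: Mapping[str, str] = os.environ) -> ServeSettings:
    """Return a copy of `settings` in which any matching env var wins over the code value."""
    updates: Dict[str, Any] = {}
    for f in fields(settings):
        key = ENV_PREFIX + f.name.upper()
        if key not in env:
            continue
        raw, current = env[key], getattr(settings, f.name)
        if isinstance(current, bool):  # check bool before int: bool subclasses int
            updates[f.name] = raw.strip().lower() in TRUE_VALUES
        elif isinstance(current, int):
            updates[f.name] = int(raw)
        else:
            updates[f.name] = raw
    merged = replace(settings, **updates)
    validate(merged)
    return replace(merged, api_prefix=normalize_api_prefix(merged.api_prefix))
```

Passing `env` explicitly, as the usage below does with a plain dict, keeps the merge deterministic and testable; in production it defaults to `os.environ`.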

Docker Deployment

Generate a Dockerfile

omniserve generate-dockerfile --file my_agent.py

Build and run

docker build -t omnicoreagent-serve .
docker run -p 8000:8000 -e LLM_API_KEY=$LLM_API_KEY omnicoreagent-serve
The generator creates a Dockerfile for the current project directory and stores only non-sensitive defaults in the image:
| Setting | Value |
|---|---|
| AGENT_PATH | In-container path to the selected agent file |
| OMNICOREAGENT_WORKSPACE_BACKEND | local |
| OMNICOREAGENT_WORKSPACE_DIR | /tmp/workspace |
The agent file must be inside the current Docker build context. The generator does not import or execute the agent file. S3/R2 workspace credentials are passed at runtime with -e, never baked into the image.

Cloud deployment examples

# Local workspace (ephemeral)
docker run -p 8000:8000 -e LLM_API_KEY=$LLM_API_KEY omnicoreagent-serve

# AWS S3 workspace (persistent)
docker run -p 8000:8000 \
  -e LLM_API_KEY=$LLM_API_KEY \
  -e OMNICOREAGENT_WORKSPACE_BACKEND=s3 \
  -e AWS_S3_BUCKET=my-bucket \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  -e AWS_REGION=us-east-1 \
  omnicoreagent-serve

# Cloudflare R2 workspace (persistent)
docker run -p 8000:8000 \
  -e LLM_API_KEY=$LLM_API_KEY \
  -e OMNICOREAGENT_WORKSPACE_BACKEND=r2 \
  -e R2_BUCKET_NAME=my-bucket \
  -e R2_ACCOUNT_ID=... \
  -e R2_ACCESS_KEY_ID=... \
  -e R2_SECRET_ACCESS_KEY=... \
  omnicoreagent-serve

Python API (Programmatic Control)

For full programmatic control, use OmniServe directly in your Python script:
from omnicoreagent import OmniCoreAgent, OmniServe, OmniServeConfig, ToolRegistry

tools = ToolRegistry()

@tools.register_tool("get_time")
def get_time() -> dict:
    """Return the current local time as an ISO 8601 string."""
    from datetime import datetime
    return {"time": datetime.now().isoformat()}

agent = OmniCoreAgent(
    name="MyAgent",
    system_instruction="You are a helpful assistant.",
    model_config={"provider": "openai", "model": "gpt-4o-mini"},
    local_tools=tools,
)

config = OmniServeConfig(
    host="0.0.0.0",
    port=8000,
    auth_enabled=True,
    auth_token="my-secret-token",
    rate_limit_enabled=True,
    rate_limit_requests=100,
    rate_limit_window=60,
    cors_origins=["*"],
    enable_docs=True,
)

if __name__ == "__main__":
    server = OmniServe(agent, config=config)
    server.start()
Save the script above as server.py, then run it with Python directly:
export LLM_API_KEY=your_api_key
python server.py
CLI vs Python API:
  • omniserve run --agent my_agent.py — CLI loads your agent file and applies CLI flags
  • python server.py — You control everything programmatically via OmniServeConfig
Environment Variable Precedence: environment variables always override values set in OmniServeConfig.

OmniServe suits deploying agents as microservices, webhooks, chatbots, or any other HTTP-accessible AI capability.

Learn more: see the OmniServe Cookbook for more examples.