OmniServe — Production API Server

Turn any agent into a production-ready REST/SSE API with a single command.

Agent File Requirements

Your Python file must define one of the following:
# Option 1: Define an `agent` variable
from omnicoreagent import OmniCoreAgent

agent = OmniCoreAgent(
    name="MyAgent",
    system_instruction="You are a helpful assistant.",
    model_config={"provider": "gemini", "model": "gemini-2.0-flash"},
)
# Option 2: Define a `create_agent()` function
from omnicoreagent import OmniCoreAgent

def create_agent():
    """Factory function that returns an agent instance."""
    return OmniCoreAgent(
        name="MyAgent",
        system_instruction="You are a helpful assistant.",
        model_config={"provider": "gemini", "model": "gemini-2.0-flash"},
    )
OmniServe looks for an `agent` variable first, then a `create_agent()` function. Your file must export one of these.
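
Conceptually, the loader resolves your file roughly like the sketch below (the names here are illustrative, not OmniServe's actual internals):
import importlib.util

def load_agent(path: str):
    spec = importlib.util.spec_from_file_location("agent_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    if hasattr(module, "agent"):         # 1. a module-level `agent` wins
        return module.agent
    if hasattr(module, "create_agent"):  # 2. otherwise call the factory
        return module.create_agent()
    raise RuntimeError(f"{path} defines neither `agent` nor `create_agent()`")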

Quick Start

Step 1: Create your agent file (my_agent.py)

from omnicoreagent import OmniCoreAgent, ToolRegistry

tools = ToolRegistry()

@tools.register_tool("greet")
def greet(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"

@tools.register_tool("calculate")
def calculate(expression: str) -> dict:
    """Evaluate a math expression."""
    import math
    result = eval(expression, {"__builtins__": {}}, {"sqrt": math.sqrt, "pi": math.pi})
    return {"expression": expression, "result": result}

agent = OmniCoreAgent(
    name="MyAgent",
    system_instruction="You are a helpful assistant with access to greeting and calculation tools.",
    model_config={"provider": "gemini", "model": "gemini-2.0-flash"},
    local_tools=tools,
)

Step 2: Set environment variables

echo "LLM_API_KEY=your_api_key_here" > .env

Step 3: Run the server

omniserve run --agent my_agent.py

Step 4: Test the API

# Health check
curl http://localhost:8000/health

# Run a query (sync)
curl -X POST http://localhost:8000/run/sync \
  -H "Content-Type: application/json" \
  -d '{"query": "Greet Alice and calculate 2+2"}'

# Run a query (streaming SSE)
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is sqrt(144)?"}'

# Open interactive docs
open http://localhost:8000/docs
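
You can exercise the same endpoints from Python as well. A minimal client sketch using the requests library (the exact shape of each SSE event may vary, so adapt the parsing to what your server emits):
import requests

# Blocking call: returns a single JSON document
print(requests.post(
    "http://localhost:8000/run/sync",
    json={"query": "Greet Alice"},
).json())

# Streaming call: Server-Sent Events arrive as `data: ...` lines
with requests.post(
    "http://localhost:8000/run",
    json={"query": "What is sqrt(144)?"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line.startswith("data:"):
            print(line.removeprefix("data:").strip())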

CLI Commands

Command                         Description
omniserve run                   Run your agent file as an API server
omniserve quickstart            Zero-code server with defaults
omniserve config                View or generate configuration
omniserve generate-dockerfile   Generate a production Dockerfile

CLI Options: omniserve run

omniserve run \
  --agent my_agent.py \        # Path to agent file (required)
  --host 0.0.0.0 \             # Host to bind (default: 0.0.0.0)
  --port 8000 \                # Port to bind (default: 8000)
  --workers 1 \                # Worker processes (default: 1)
  --auth-token YOUR_TOKEN \    # Enable Bearer token auth
  --rate-limit 100 \           # Rate limit (requests per minute)
  --cors-origins "*" \         # Comma-separated CORS origins
  --no-docs \                  # Disable Swagger UI
  --reload                     # Enable hot reload (development)
Examples:
# Basic run
omniserve run --agent my_agent.py

# With authentication
omniserve run --agent my_agent.py --auth-token secret123

# With rate limiting
omniserve run --agent my_agent.py --rate-limit 100

# Production settings
omniserve run --agent my_agent.py \
  --port 8000 \
  --auth-token $AUTH_TOKEN \
  --rate-limit 100 \
  --cors-origins "https://myapp.com,https://api.myapp.com"

# Development with hot reload
omniserve run --agent my_agent.py --reload

CLI Options: omniserve quickstart

Start a server instantly without writing any code:
omniserve quickstart \
  --provider openai \          # LLM provider (openai, gemini, anthropic)
  --model gpt-4o \             # Model name
  --name QuickAgent \          # Agent name (default: QuickAgent)
  --instruction "You are..." \ # System instruction
  --port 8000                  # Port (default: 8000)
Examples:
# OpenAI
omniserve quickstart --provider openai --model gpt-4o

# Google Gemini
omniserve quickstart --provider gemini --model gemini-2.0-flash

# Anthropic Claude
omniserve quickstart --provider anthropic --model claude-3-5-sonnet-20241022

API Endpoints

Method  Endpoint     Auth  Description
POST    /run         Yes*  SSE streaming response
POST    /run/sync    Yes*  JSON response (blocking)
GET     /health      No    Health check
GET     /ready       No    Readiness check
GET     /prometheus  No    Prometheus metrics
GET     /tools       Yes*  List available tools
GET     /metrics     Yes*  Agent usage metrics
GET     /docs        No    Swagger UI
GET     /redoc       No    ReDoc UI

*Auth required only if --auth-token is set.
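
The unauthenticated probes make liveness checks easy to script; a small sketch using requests:
import requests

for probe in ("/health", "/ready"):
    r = requests.get(f"http://localhost:8000{probe}", timeout=5)
    print(probe, r.status_code)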

Request/Response Examples

# Sync request (with auth)
curl -X POST http://localhost:8000/run/sync \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"query": "What is 2+2?", "session_id": "user123"}'

# Response:
# {"response": "2+2 equals 4", "session_id": "user123", ...}

# Streaming SSE request
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain quantum computing"}'

# List tools
curl http://localhost:8000/tools \
  -H "Authorization: Bearer YOUR_TOKEN"

Environment Variables

All settings can be configured via environment variables with the OMNISERVE_ prefix. Environment variables always override code values.
Variable                        Default   Description
OMNISERVE_HOST                  0.0.0.0   Server host
OMNISERVE_PORT                  8000      Server port
OMNISERVE_WORKERS               1         Worker processes
OMNISERVE_API_PREFIX            ""        API path prefix (e.g., /api/v1)
OMNISERVE_ENABLE_DOCS           true      Swagger UI at /docs
OMNISERVE_ENABLE_REDOC          true      ReDoc at /redoc
OMNISERVE_CORS_ENABLED          true      Enable CORS
OMNISERVE_CORS_ORIGINS          *         Allowed origins (comma-separated)
OMNISERVE_CORS_CREDENTIALS      true      Allow credentials
OMNISERVE_AUTH_ENABLED          false     Enable Bearer token auth
OMNISERVE_AUTH_TOKEN            (unset)   Bearer token value
OMNISERVE_RATE_LIMIT_ENABLED    false     Enable rate limiting
OMNISERVE_RATE_LIMIT_REQUESTS   100       Requests per window
OMNISERVE_RATE_LIMIT_WINDOW     60        Window in seconds
OMNISERVE_REQUEST_LOGGING       true      Log requests
OMNISERVE_LOG_LEVEL             INFO      Log level (DEBUG/INFO/WARNING/ERROR)
OMNISERVE_REQUEST_TIMEOUT       300       Request timeout in seconds
Example .env file:
# Required
LLM_API_KEY=your_api_key_here

# OmniServe settings
OMNISERVE_PORT=8000
OMNISERVE_AUTH_ENABLED=true
OMNISERVE_AUTH_TOKEN=my-secret-token
OMNISERVE_RATE_LIMIT_ENABLED=true
OMNISERVE_RATE_LIMIT_REQUESTS=100
OMNISERVE_CORS_ORIGINS=https://myapp.com,https://api.myapp.com

Docker Deployment

Generate a Dockerfile

omniserve generate-dockerfile --file my_agent.py

Build and run

docker build -t omniserver .
docker run -p 8000:8000 -e LLM_API_KEY=$LLM_API_KEY omniserver
Smart Configuration — The generator inspects your agent and configures storage automatically:
Your Agent Uses   Dockerfile Sets
No memory tools   AGENT_PATH, OMNICOREAGENT_ARTIFACTS_DIR
Local memory      + OMNICOREAGENT_MEMORY_DIR=/tmp/memories
S3/R2 memory      Pass credentials at runtime with -e

Cloud deployment examples

# Local memory (ephemeral)
docker run -p 8000:8000 -e LLM_API_KEY=$LLM_API_KEY omniserver

# AWS S3 memory (persistent)
docker run -p 8000:8000 \
  -e LLM_API_KEY=$LLM_API_KEY \
  -e AWS_S3_BUCKET=my-bucket \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  -e AWS_REGION=us-east-1 \
  omniserver

# Cloudflare R2 memory (persistent)
docker run -p 8000:8000 \
  -e LLM_API_KEY=$LLM_API_KEY \
  -e R2_BUCKET_NAME=my-bucket \
  -e R2_ACCOUNT_ID=... \
  -e R2_ACCESS_KEY_ID=... \
  -e R2_SECRET_ACCESS_KEY=... \
  omniserver

Python API (Programmatic Control)

For full programmatic control, use OmniServe directly in your Python script:
from omnicoreagent import OmniCoreAgent, OmniServe, OmniServeConfig, ToolRegistry

tools = ToolRegistry()

@tools.register_tool("get_time")
def get_time() -> dict:
    from datetime import datetime
    return {"time": datetime.now().isoformat()}

agent = OmniCoreAgent(
    name="MyAgent",
    system_instruction="You are a helpful assistant.",
    model_config={"provider": "gemini", "model": "gemini-2.0-flash"},
    local_tools=tools,
)

config = OmniServeConfig(
    host="0.0.0.0",
    port=8000,
    auth_enabled=True,
    auth_token="my-secret-token",
    rate_limit_enabled=True,
    rate_limit_requests=100,
    rate_limit_window=60,
    cors_origins=["*"],
    enable_docs=True,
)

if __name__ == "__main__":
    server = OmniServe(agent, config=config)
    server.start()
Run with Python directly:
echo "LLM_API_KEY=your_api_key" > .env
python server.py
CLI vs Python API:
  • omniserve run --agent my_agent.py — CLI loads your agent file and applies CLI flags
  • python server.py — You control everything programmatically via OmniServeConfig
Environment Variable Precedence: .env variables always override values set in OmniServeConfig.
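For example, if .env contains OMNISERVE_PORT=9000 while your code asks for 8000, the server binds to 9000:
# .env contains: OMNISERVE_PORT=9000
config = OmniServeConfig(port=8000)        # code requests 8000
OmniServe(agent, config=config).start()    # binds 9000: the .env value wins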

Advanced: Resilience Patterns

The retry and circuit-breaker helpers can also be imported for use in your own code:
from omnicoreagent import RetryConfig, CircuitBreaker, with_retry

# Retry up to 5 times with exponential backoff between attempts
@with_retry(RetryConfig(max_retries=5, strategy="exponential"))
async def call_external_api():
    ...

# Named breaker: trips after 3 consecutive failures, with a 60-second timeout
breaker = CircuitBreaker("api", failure_threshold=3, timeout=60)
async with breaker:
    result = await risky_call()
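
The two primitives compose; a minimal sketch, with risky_call standing in for your own flaky dependency (the stub and retry values are illustrative):
import asyncio
import random
from omnicoreagent import RetryConfig, CircuitBreaker, with_retry

async def risky_call():
    # Stand-in for an unreliable upstream service
    if random.random() < 0.5:
        raise ConnectionError("upstream unavailable")
    return "ok"

breaker = CircuitBreaker("payments", failure_threshold=3, timeout=60)

@with_retry(RetryConfig(max_retries=3, strategy="exponential"))
async def call_with_protection():
    # A tripped breaker fails fast, so retries stop hammering the dependency
    async with breaker:
        return await risky_call()

print(asyncio.run(call_with_protection()))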
OmniServe is perfect for deploying agents as microservices, webhooks, chatbots, or any HTTP-accessible AI capability.

Learn More: See the OmniServe Cookbook for more examples.