Context Engineering System

OmniCoreAgent implements a context engineering system inspired by patterns from Anthropic and Cursor. This dual-layer approach keeps agents within token limits, even during long coding sessions or multi-step research tasks.

How the Two Layers Work Together

| Layer | Scope | What It Manages | When It Triggers |
|---|---|---|---|
| Context Management | Agent loop messages | User/assistant conversation, tool call history | When context exceeds `threshold_percent` of the limit |
| Tool Offloading | Individual tool responses | Large API responses, file contents, search results | When a response exceeds `threshold_tokens` |

Layer 1: Agent Loop Context Management

Prevent token exhaustion during long-running tasks with automatic context management. When enabled, the agent monitors context size and applies truncation or summarization when thresholds are exceeded.
```python
agent_config = {
    "context_management": {
        "enabled": True,
        "mode": "token_budget",  # or "sliding_window"
        "value": 100000,  # max tokens (token_budget) or max messages (sliding_window)
        "threshold_percent": 75,  # trigger at 75% of the limit
        "strategy": "summarize_and_truncate",  # or "truncate"
        "preserve_recent": 4,  # always keep the last N messages
    }
}
```

Modes

| Mode | Description | Best For |
|---|---|---|
| `token_budget` | Manage by total token count | Cost control, API limits |
| `sliding_window` | Manage by message count | Predictable context size |
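As an illustration, the trigger condition for each mode can be sketched as follows (a simplified model, not OmniCoreAgent's internals; the function name and signature are hypothetical):

```python
def should_manage(mode: str, value: int, threshold_percent: int,
                  current_tokens: int, message_count: int) -> bool:
    """Decide whether context management should fire (illustrative sketch)."""
    if mode == "token_budget":
        # Trigger once token usage crosses threshold_percent of the budget.
        return current_tokens >= value * threshold_percent / 100
    if mode == "sliding_window":
        # Trigger once the message count exceeds the window size.
        return message_count > value
    raise ValueError(f"unknown mode: {mode}")
```

With the config above (`value=100000`, `threshold_percent=75`), management kicks in at roughly 75,000 tokens.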

Strategies

| Strategy | Behavior | Trade-off |
|---|---|---|
| `truncate` | Drop oldest messages | Fast, no extra LLM calls |
| `summarize_and_truncate` | Summarize then drop | Preserves context, adds latency |
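The trade-off between the two strategies can be sketched as follows (a simplified model; `summarize` stands in for a hypothetical LLM summarization call, and the real implementation may differ):

```python
def apply_strategy(messages, strategy, preserve_recent=4, summarize=None):
    """Trim a message list per the chosen strategy (illustrative sketch).

    `summarize` is a hypothetical callable (e.g. an LLM call) that condenses
    the dropped messages into a single summary string.
    """
    if len(messages) <= preserve_recent:
        return messages
    old, recent = messages[:-preserve_recent], messages[-preserve_recent:]
    if strategy == "truncate":
        return recent  # fast: just drop the oldest messages
    if strategy == "summarize_and_truncate":
        content = summarize(old) if summarize else "[summary of earlier conversation]"
        return [{"role": "system", "content": content}] + recent
    raise ValueError(f"unknown strategy: {strategy}")
```

`truncate` costs nothing but loses the dropped turns entirely; `summarize_and_truncate` spends one extra LLM call to keep a compressed record of them.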

Layer 2: Tool Response Offloading

Large tool responses are automatically saved to files, with only a preview kept in context. The agent can retrieve the full content on demand using built-in artifact tools.
```python
agent_config = {
    "tool_offload": {
        "enabled": True,
        "threshold_tokens": 500,  # offload responses larger than 500 tokens
        "max_preview_tokens": 150,  # keep the first 150 tokens in context
        "storage_dir": "workspace/artifacts",
    }
}
```
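Conceptually, offloading replaces an oversized response with a preview plus a pointer to the saved file. A minimal sketch, using whitespace-split words as a stand-in for real tokenization (the function below is illustrative, not the library's API):

```python
import os
import uuid


def maybe_offload(response: str, threshold_tokens=500, max_preview_tokens=150,
                  storage_dir="workspace/artifacts") -> str:
    """Offload an oversized tool response to disk (illustrative sketch)."""
    tokens = response.split()  # crude proxy for the model's tokenizer
    if len(tokens) <= threshold_tokens:
        return response  # small enough to stay in context
    os.makedirs(storage_dir, exist_ok=True)
    artifact_id = uuid.uuid4().hex[:8]
    with open(os.path.join(storage_dir, f"{artifact_id}.txt"), "w") as f:
        f.write(response)  # full content lives on disk
    preview = " ".join(tokens[:max_preview_tokens])
    return (f"{preview}\n[truncated: full content saved as artifact "
            f"{artifact_id}; use read_artifact('{artifact_id}')]")
```

The agent only ever sees the preview string; the pointer tells it how to pull the rest back in when a later step actually needs it.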

Token Savings

| Tool Response | Without Offloading | With Offloading | Savings |
|---|---|---|---|
| Web search (50 results) | ~10,000 tokens | ~200 tokens | ~98% |
| Large API response | ~5,000 tokens | ~150 tokens | ~97% |
| File read (1,000 lines) | ~8,000 tokens | ~200 tokens | ~97% |

Built-in Artifact Tools

Automatically available when offloading is enabled:
| Tool | Purpose |
|---|---|
| `read_artifact(artifact_id)` | Read full content when needed |
| `tail_artifact(artifact_id, lines)` | Read last N lines (great for logs) |
| `search_artifact(artifact_id, query)` | Search within large responses |
| `list_artifacts()` | See all offloaded data in the current session |
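As a mental model, these tools are thin wrappers over the saved files. Here is a sketch of `search_artifact` and `tail_artifact` under that assumption (illustrative only; the actual implementations live inside OmniCoreAgent):

```python
import os

ARTIFACT_DIR = "workspace/artifacts"  # matches storage_dir above


def search_artifact(artifact_id: str, query: str, max_chars: int = 80):
    """Return numbered lines containing `query` (case-insensitive sketch)."""
    path = os.path.join(ARTIFACT_DIR, f"{artifact_id}.txt")
    hits = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if query.lower() in line.lower():
                hits.append(f"{lineno}: {line.strip()[:max_chars]}")
    return hits or [f"no matches for {query!r}"]


def tail_artifact(artifact_id: str, lines: int = 20):
    """Return the last N lines of an artifact (sketch) -- handy for logs."""
    path = os.path.join(ARTIFACT_DIR, f"{artifact_id}.txt")
    with open(path) as f:
        return f.readlines()[-lines:]
```

The key design point is that the model pays tokens only for the matching lines or the tail it asks for, never for the whole file again.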

Combined Power

Enable both for maximum efficiency:
```python
agent = OmniCoreAgent(
    name="research_agent",
    agent_config={
        "context_management": {"enabled": True, "strategy": "summarize_and_truncate"},
        "tool_offload": {"enabled": True, "threshold_tokens": 500},
    },
)
# Result: agents that can run through very long sessions without token exhaustion
```

Enable both layers for long-running tasks (research, multi-step workflows) where context or tool responses can grow unbounded.