Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs-omnicoreagent.omnirexfloralabs.com/llms.txt

Use this file to discover all available pages before exploring further.

Context Engineering

OmniCoreAgent has three context-control layers that work together:
LayerScopePurposeDefault
Session memoryAcross agent.run() callsControls how much conversation history is loaded for a session.On through the memory router
Agent loop context managementInside one agent.run() loopChecks active messages before every model call and reduces them before the configured budget is exceeded.Off until enabled
Tool output offloadingIndividual tool responsesMoves large tool outputs into workspace artifacts and keeps only a preview in the prompt.Off until enabled
When agent loop context management is enabled and configured with a budget below your model’s real context window, OmniCoreAgent acts before the provider context limit is hit. The runtime checks context before the LLM call, not after an error.
active messages
  -> context threshold check
  -> truncate or summarize+truncate when needed
  -> model call
This is why long tasks can keep moving without waiting for the provider to reject an oversized prompt.

Layer 1: Session Memory

Session memory decides what historical messages are loaded when a new agent.run() starts. This is the cross-request layer.
agent = OmniCoreAgent(
    name="assistant",
    system_instruction="You are helpful.",
    model_config={"provider": "openai", "model": "gpt-4o"},
    agent_config={
        "memory_config": {
            "mode": "sliding_window",
            "value": 10000,
            "summary": {"enabled": False},
        }
    },
)
Use persistent memory backends such as Redis, MongoDB, or SQL database storage when session history must survive process restarts.

Layer 2: Agent Loop Context Management

Agent loop context management runs inside the ReAct loop. Before each model call, OmniCoreAgent asks the context manager whether the current messages crossed the configured threshold. If yes, it reduces the message list before the LLM request is sent.
agent_config = {
    "context_management": {
        "enabled": True,
        "mode": "token_budget",          # or "sliding_window"
        "value": 100000,                 # token budget or message count
        "threshold_percent": 75,
        "strategy": "summarize_and_truncate",  # or "truncate"
        "preserve_recent": 6,
    }
}
With the config above, management triggers around 75,000 tokens. If the selected model has a larger context window, the harness reduces context before the model request reaches the provider limit.

What Is Preserved

PartBehavior
System promptAlways preserved.
Recent messagesPreserved according to preserve_recent.
Middle historyTruncated, or summarized then truncated, depending on strategy.
Summary metadataAdded when summarize_and_truncate creates a context summary.

Modes

ModeDescriptionBest For
token_budgetManage context by estimated token count.Provider context limits and cost control.
sliding_windowManage context by message count.Predictable, low-latency history windows.

Strategies

StrategyBehaviorTrade-Off
truncateDrop older middle messages while preserving system and recent messages.Fast and deterministic.
summarize_and_truncateSummarize older middle history, insert a summary message, then preserve recent messages.Keeps more intent, adds an LLM call and latency.

Layer 3: Tool Output Offloading

Tool offloading handles large individual tool responses. It keeps the agent from burning context on a full payload when a preview and a file reference are enough for the next reasoning step.
agent_config = {
    "tool_offload": {
        "enabled": True,
        "threshold_tokens": 500,
        "threshold_bytes": 2000,
        "max_preview_tokens": 150,
        "max_preview_lines": 10,
    }
}
When a tool result crosses the configured threshold, OmniCoreAgent writes the full payload to the active workspace artifacts/ area. The observation sent to the model contains the preview and the artifact reference.
large tool result
  -> workspace artifact
  -> preview + artifact reference in the observation
The artifact uses the same workspace backend as the rest of the agent: local, S3, or R2.

Built-In Artifact Tools

Artifact tools are available when offloading is enabled:
ToolPurpose
read_artifactRead the full offloaded payload.
tail_artifactRead the last lines of an artifact, useful for logs.
search_artifactSearch inside an offloaded payload.
list_artifactsList artifacts available in the current workspace/session scope.

Full Context Configuration

Use all layers together for long-running research, coding, data, and operational agents:
agent = OmniCoreAgent(
    name="research_agent",
    system_instruction=(
        "Use workspace files and artifact references for long tasks. "
        "Read artifacts only when the full payload is needed."
    ),
    model_config={"provider": "openai", "model": "gpt-4o"},
    agent_config={
        "memory_config": {
            "mode": "sliding_window",
            "value": 10000,
        },
        "context_management": {
            "enabled": True,
            "mode": "token_budget",
            "value": 100000,
            "threshold_percent": 75,
            "strategy": "summarize_and_truncate",
            "preserve_recent": 6,
        },
        "tool_offload": {
            "enabled": True,
            "threshold_tokens": 500,
            "threshold_bytes": 2000,
        },
    },
)
Set context_management.value to a budget below your model’s real context window. OmniCoreAgent checks the budget before each model call and reduces the prompt when the threshold is crossed.