Documentation Index
Fetch the complete documentation index at: https://docs-omnicoreagent.omnirexfloralabs.com/llms.txt
Use this file to discover all available pages before exploring further.
Context Engineering
OmniCoreAgent has three context-control layers that work together:
| Layer | Scope | Purpose | Default |
|---|
| Session memory | Across agent.run() calls | Controls how much conversation history is loaded for a session. | On through the memory router |
| Agent loop context management | Inside one agent.run() loop | Checks active messages before every model call and reduces them before the configured budget is exceeded. | Off until enabled |
| Tool output offloading | Individual tool responses | Moves large tool outputs into workspace artifacts and keeps only a preview in the prompt. | Off until enabled |
When agent loop context management is enabled and configured with a budget below
your model’s real context window, OmniCoreAgent acts before the provider context
limit is hit. The runtime checks context before the LLM call, not after an error.
active messages
-> context threshold check
-> truncate or summarize+truncate when needed
-> model call
This is why long tasks can keep moving without waiting for the provider to reject
an oversized prompt.
Layer 1: Session Memory
Session memory decides what historical messages are loaded when a new
agent.run() starts. This is the cross-request layer.
agent = OmniCoreAgent(
name="assistant",
system_instruction="You are helpful.",
model_config={"provider": "openai", "model": "gpt-4o"},
agent_config={
"memory_config": {
"mode": "sliding_window",
"value": 10000,
"summary": {"enabled": False},
}
},
)
Use persistent memory backends such as Redis, MongoDB, or SQL database storage
when session history must survive process restarts.
Layer 2: Agent Loop Context Management
Agent loop context management runs inside the ReAct loop. Before each model call,
OmniCoreAgent asks the context manager whether the current messages crossed the
configured threshold. If yes, it reduces the message list before the LLM request
is sent.
agent_config = {
"context_management": {
"enabled": True,
"mode": "token_budget", # or "sliding_window"
"value": 100000, # token budget or message count
"threshold_percent": 75,
"strategy": "summarize_and_truncate", # or "truncate"
"preserve_recent": 6,
}
}
With the config above, management triggers around 75,000 tokens. If the selected
model has a larger context window, the harness reduces context before the model
request reaches the provider limit.
What Is Preserved
| Part | Behavior |
|---|
| System prompt | Always preserved. |
| Recent messages | Preserved according to preserve_recent. |
| Middle history | Truncated, or summarized then truncated, depending on strategy. |
| Summary metadata | Added when summarize_and_truncate creates a context summary. |
Modes
| Mode | Description | Best For |
|---|
token_budget | Manage context by estimated token count. | Provider context limits and cost control. |
sliding_window | Manage context by message count. | Predictable, low-latency history windows. |
Strategies
| Strategy | Behavior | Trade-Off |
|---|
truncate | Drop older middle messages while preserving system and recent messages. | Fast and deterministic. |
summarize_and_truncate | Summarize older middle history, insert a summary message, then preserve recent messages. | Keeps more intent, adds an LLM call and latency. |
Tool offloading handles large individual tool responses. It keeps the agent from
burning context on a full payload when a preview and a file reference are enough
for the next reasoning step.
agent_config = {
"tool_offload": {
"enabled": True,
"threshold_tokens": 500,
"threshold_bytes": 2000,
"max_preview_tokens": 150,
"max_preview_lines": 10,
}
}
When a tool result crosses the configured threshold, OmniCoreAgent writes the
full payload to the active workspace artifacts/ area. The observation sent to
the model contains the preview and the artifact reference.
large tool result
-> workspace artifact
-> preview + artifact reference in the observation
The artifact uses the same workspace backend as the rest of the agent: local,
S3, or R2.
Artifact tools are available when offloading is enabled:
| Tool | Purpose |
|---|
read_artifact | Read the full offloaded payload. |
tail_artifact | Read the last lines of an artifact, useful for logs. |
search_artifact | Search inside an offloaded payload. |
list_artifacts | List artifacts available in the current workspace/session scope. |
Full Context Configuration
Use all layers together for long-running research, coding, data, and operational
agents:
agent = OmniCoreAgent(
name="research_agent",
system_instruction=(
"Use workspace files and artifact references for long tasks. "
"Read artifacts only when the full payload is needed."
),
model_config={"provider": "openai", "model": "gpt-4o"},
agent_config={
"memory_config": {
"mode": "sliding_window",
"value": 10000,
},
"context_management": {
"enabled": True,
"mode": "token_budget",
"value": 100000,
"threshold_percent": 75,
"strategy": "summarize_and_truncate",
"preserve_recent": 6,
},
"tool_offload": {
"enabled": True,
"threshold_tokens": 500,
"threshold_bytes": 2000,
},
},
)
Set context_management.value to a budget below your model’s real context
window. OmniCoreAgent checks the budget before each model call and reduces the
prompt when the threshold is crossed.