Skip to main content

Prompt Injection Guardrails

Protect your agents against malicious inputs, jailbreaks, and instruction overrides before they reach the LLM.
agent_config = {
    "guardrail_config": {
        "strict_mode": True,      # Block all suspicious inputs
        "sensitivity": 0.85,      # 0.0 to 1.0 (higher = more sensitive)
        "enable_pattern_matching": True,
        "enable_heuristic_analysis": True
    }
}

agent = OmniCoreAgent(..., agent_config=agent_config)

# If a threat is detected:
# result['response'] -> "I'm sorry, but I cannot process this request..."
# result['guardrail_result'] -> Full metadata about the detected threat

Key Protections

  • Instruction Overrides: “Ignore previous instructions…”
  • Jailbreaks: DAN mode, roleplay escapes, etc.
  • Toxicity & Abuse: Built-in pattern recognition.
  • Payload Splitting: Detects fragmented attack attempts.

Configuration Options

ParameterTypeDefaultDescription
strict_modeboolFalseWhen True, any detection (even low confidence) blocks the request.
sensitivityfloat1.0Scaling factor for threat scores (0.0 to 1.0). Higher = more sensitive.
max_input_lengthint10000Maximum allowed query length before blocking.
enable_encoding_detectionboolTrueDetects base64, hex, and other obfuscation attempts.
enable_heuristic_analysisboolTrueAnalyzes prompt structure for typical attack patterns.
enable_sequential_analysisboolTrueChecks for phased attacks across multiple tokens.
enable_entropy_analysisboolTrueDetects high-entropy payloads common in injections.
allowlist_patternslist[]List of regex patterns that bypass safety checks.
blocklist_patternslist[]Custom regex patterns to always block.
Always enable guardrails in user-facing applications to prevent prompt injection attacks and ensure agent reliability.