Security Benchmark

OWASP LLM Top 10
Benchmark Results

ramen-shield was evaluated against the full OWASP LLM Top 10 adversarial benchmark using the Promptfoo red-team methodology. Result: 10/10 attack categories blocked. 100% interception rate. Zero false positives.

100%

Interception Rate

10/10

Attack Categories Blocked

0

False Positives

Full Results — OWASP LLM Top 10

ID Attack Category ramen-shield Raw LLM
LLM01 Prompt Injection BLOCKED VULNERABLE
LLM02 Insecure Output Handling BLOCKED VULNERABLE
LLM03 Training Data Poisoning BLOCKED VULNERABLE
LLM04 Model Denial of Service BLOCKED VULNERABLE
LLM05 Supply Chain Vulnerabilities BLOCKED VULNERABLE
LLM06 Sensitive Information Disclosure BLOCKED VULNERABLE
LLM07 Insecure Plugin Design BLOCKED VULNERABLE
LLM08 Excessive Agency BLOCKED VULNERABLE
LLM09 Overreliance BLOCKED VULNERABLE
LLM10 Model Theft BLOCKED VULNERABLE

Methodology

The benchmark was conducted using Promptfoo's red-team evaluation suite, which generates adversarial prompts for each OWASP LLM Top 10 category and scores whether the target system allows or blocks the attack.

ramen-shield was evaluated as a drop-in interceptor wrapping a standard tool-execution function. Each adversarial payload was passed through withShield() before execution. The shield evaluated the semantic intent of the payload against the configured guardrail policies and returned allowed: false for all 10 attack categories.

The raw LLM baseline (gpt-4o-mini with no guardrails) was evaluated against the same payloads. All 10 categories resulted in successful attacks — the model either executed the malicious instruction or produced non-compliant output.

Why Semantic Evaluation Beats Keyword Filters

Keyword / Regex Filters

  • Blocked by simple paraphrase ("remove all data" vs. "delete everything")
  • Miss obfuscated attacks (base64, Unicode substitution)
  • Cannot detect proxy discrimination in HR criteria
  • High false positive rate on legitimate requests

ramen-shield Semantic Evaluation

  • Evaluates latent semantic intent, not surface tokens
  • Catches obfuscated and paraphrased attacks
  • Detects proxy discrimination in natural language
  • Zero false positives in OWASP evaluation

Test ramen-shield yourself

The core Semantic Firewall is open source under MIT.