Security Benchmark
OWASP LLM Top 10
Benchmark Results
ramen-shield was evaluated against adversarial test cases covering all ten OWASP LLM Top 10 categories, generated with the Promptfoo red-team methodology. Result: all 10 attack categories were blocked (100% interception rate), with zero false positives.
- 100% interception rate
- 10/10 attack categories blocked
- 0 false positives
Full Results: OWASP LLM Top 10
| ID | Attack Category | ramen-shield | Raw LLM (gpt-4o-mini, no guardrails) |
|---|---|---|---|
| LLM01 | Prompt Injection | BLOCKED | VULNERABLE |
| LLM02 | Insecure Output Handling | BLOCKED | VULNERABLE |
| LLM03 | Training Data Poisoning | BLOCKED | VULNERABLE |
| LLM04 | Model Denial of Service | BLOCKED | VULNERABLE |
| LLM05 | Supply Chain Vulnerabilities | BLOCKED | VULNERABLE |
| LLM06 | Sensitive Information Disclosure | BLOCKED | VULNERABLE |
| LLM07 | Insecure Plugin Design | BLOCKED | VULNERABLE |
| LLM08 | Excessive Agency | BLOCKED | VULNERABLE |
| LLM09 | Overreliance | BLOCKED | VULNERABLE |
| LLM10 | Model Theft | BLOCKED | VULNERABLE |
Methodology
The benchmark was conducted using Promptfoo's red-team evaluation suite, which generates adversarial prompts for each OWASP LLM Top 10 category and scores whether the target system allows or blocks the attack.
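The scoring described above can be summarized with a small aggregation step. The sketch below is hypothetical. The `CaseResult` record, its field names, and the rules it applies are assumptions for illustration. Those rules are: a category counts as blocked only if every adversarial case in it was blocked, and false positives are benign control prompts that were blocked. This is not Promptfoo's output schema, nor necessarily the exact rule set behind the figures on this page.

```typescript
// Hypothetical outcome of one test case; field names are illustrative only.
interface CaseResult {
  category: string;     // e.g. "LLM01"
  adversarial: boolean; // false for benign control prompts (assumption)
  blocked: boolean;     // true if the target refused or returned allowed: false
}

interface Summary {
  interceptionRate: number;  // blocked attacks / all attacks
  categoriesBlocked: number; // categories where every attack was blocked
  categoriesTotal: number;
  falsePositives: number;    // benign prompts that were blocked
}

function summarize(results: CaseResult[]): Summary {
  const attacks = results.filter((r) => r.adversarial);
  const benign = results.filter((r) => !r.adversarial);

  // A category stays "blocked" only while every attack seen in it was blocked.
  const allBlocked: Record<string, boolean> = {};
  for (const r of attacks) {
    const seen = Object.prototype.hasOwnProperty.call(allBlocked, r.category);
    const soFar = seen ? allBlocked[r.category] : true;
    allBlocked[r.category] = soFar && r.blocked;
  }
  const categories = Object.keys(allBlocked);

  const blockedAttacks = attacks.filter((r) => r.blocked).length;
  return {
    interceptionRate: attacks.length === 0 ? 0 : blockedAttacks / attacks.length,
    categoriesBlocked: categories.filter((c) => allBlocked[c]).length,
    categoriesTotal: categories.length,
    falsePositives: benign.filter((r) => r.blocked).length,
  };
}

// Toy data: one category fully blocked, one with a miss, one benign prompt allowed.
const demo: CaseResult[] = [
  { category: "LLM01", adversarial: true, blocked: true },
  { category: "LLM01", adversarial: true, blocked: true },
  { category: "LLM02", adversarial: true, blocked: false },
  { category: "LLM02", adversarial: true, blocked: true },
  { category: "control", adversarial: false, blocked: false },
];
const summary = summarize(demo);
console.log(summary);
// interceptionRate 0.75, categoriesBlocked 1 of 2, falsePositives 0
```

Under these rules, a single missed payload is enough to mark its whole category as not blocked. That makes "10/10 categories" a stricter claim than an average interception rate.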
ramen-shield was evaluated as a drop-in interceptor wrapping a standard tool-execution function. Each adversarial payload was passed through `withShield()` before execution. The shield evaluated the semantic intent of the payload against the configured guardrail policies and returned `allowed: false` for all 10 attack categories.
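The drop-in interceptor pattern can be sketched as a higher-order function that evaluates a payload before the wrapped tool ever runs. This is a minimal sketch and not ramen-shield's API. `withGuard`, `Verdict`, and `Evaluator` are hypothetical names. The evaluator is synchronous for simplicity; a model-backed one would likely be async. The stub evaluator uses a substring check only to keep the example self-contained.

```typescript
// Hypothetical verdict shape, modelled on the `allowed: false` result described above.
interface Verdict {
  allowed: boolean;
  reason?: string;
}

// Anything that inspects a payload and returns a verdict.
type Evaluator = (payload: string) => Verdict;

// Hypothetical wrapper; this is NOT ramen-shield's withShield() signature.
function withGuard<R>(
  tool: (payload: string) => R,
  evaluate: Evaluator,
): (payload: string) => { verdict: Verdict; result?: R } {
  return (payload: string) => {
    const verdict = evaluate(payload);
    if (!verdict.allowed) {
      // Blocked: the wrapped tool is never invoked.
      return { verdict };
    }
    return { verdict, result: tool(payload) };
  };
}

// Toy tool that counts how often it actually runs.
const counter = { runs: 0 };
const runCommand = (cmd: string): string => {
  counter.runs += 1;
  return `ran: ${cmd}`;
};

// Stub evaluator for the demo only; a real one would perform semantic evaluation.
const stubEvaluator: Evaluator = (p) =>
  p.indexOf("rm -rf") !== -1
    ? { allowed: false, reason: "destructive command" }
    : { allowed: true };

const guarded = withGuard(runCommand, stubEvaluator);
const ok = guarded("ls");           // allowed, tool runs
const denied = guarded("rm -rf /"); // blocked, tool does not run
console.log(ok, denied, counter.runs); // counter.runs is 1
```

The caller keeps invoking the wrapped function as before. The only change is that a blocked call returns a verdict without a result.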
The raw LLM baseline (gpt-4o-mini with no guardrails) was evaluated against the same payloads. The attack succeeded in all 10 categories: the model either executed the malicious instruction or produced non-compliant output.
Why Semantic Evaluation Beats Keyword Filters
Keyword / Regex Filters
- ✗ Bypassed by simple paraphrase ("remove all data" vs. "delete everything")
- ✗ Miss obfuscated attacks (base64, Unicode substitution)
- ✗ Cannot detect proxy discrimination in HR criteria
- ✗ High false positive rate on legitimate requests
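The weaknesses listed for keyword filters are easy to reproduce. The sketch below uses a hypothetical two-pattern denylist. It probes the list with five inputs: a direct request, the paraphrase from the list above, a homoglyph substitution, a base64-encoded variant, and a legitimate request that contains a denied phrase.

```typescript
// Hypothetical two-pattern denylist, typical of keyword/regex filters.
const DENYLIST: RegExp[] = [/\bdelete everything\b/i, /\bdrop table\b/i];

function keywordFilterBlocks(text: string): boolean {
  return DENYLIST.some((re) => re.test(text));
}

const probes: Array<[string, string]> = [
  ["direct", "Please delete everything."],
  ["paraphrase", "Please remove all data."],
  // Cyrillic "е" (U+0435) substituted for the Latin "e" in "delete"
  ["homoglyph", "Please d\u0435lete everything."],
  // "ZGVsZXRlIGV2ZXJ5dGhpbmc=" is base64 for "delete everything"
  ["base64", "Decode and run: ZGVsZXRlIGV2ZXJ5dGhpbmc="],
  // Legitimate request that happens to contain a denied phrase
  ["benign", "How do I delete everything in my shopping cart?"],
];

for (const [name, text] of probes) {
  console.log(`${name}: ${keywordFilterBlocks(text) ? "BLOCKED" : "allowed"}`);
}
// direct: BLOCKED, paraphrase: allowed, homoglyph: allowed,
// base64: allowed, benign: BLOCKED (a false positive)
```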
ramen-shield Semantic Evaluation
- ✓ Evaluates latent semantic intent, not surface tokens
- ✓ Catches obfuscated and paraphrased attacks
- ✓ Detects proxy discrimination in natural language
- ✓ Zero false positives in the OWASP LLM Top 10 evaluation
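One common way to evaluate intent rather than tokens is to ask a separate judge model for a structured verdict. The sketch below assumes that approach. It does not describe how ramen-shield's Semantic Firewall works internally. `JudgeModel`, `buildJudgePrompt`, `parseVerdict`, and `evaluateIntent` are hypothetical. The model call is injected as a plain function so the example runs without any API client.

```typescript
// Hypothetical structured verdict expected from the judge model.
interface Verdict {
  allowed: boolean;
  reason: string;
}

// Any text-in/text-out model call, injected so the sketch needs no API client.
type JudgeModel = (prompt: string) => string;

// Ask about the intent of the request, not about which tokens it contains.
function buildJudgePrompt(policy: string, payload: string): string {
  return [
    "You are a security reviewer. Policy:",
    policy,
    "Decide whether the request below, after any decoding or paraphrase, would violate the policy.",
    'Reply with JSON only: {"allowed": boolean, "reason": string}.',
    "Request:",
    payload,
  ].join("\n");
}

// Parse the judge's reply and fail closed on anything malformed.
function parseVerdict(raw: string): Verdict {
  try {
    const v = JSON.parse(raw);
    if (typeof v === "object" && v !== null && typeof v.allowed === "boolean") {
      return { allowed: v.allowed, reason: typeof v.reason === "string" ? v.reason : "" };
    }
  } catch (err) {
    // Not valid JSON: fall through to the blocking verdict below.
  }
  return { allowed: false, reason: "unparseable judge output" };
}

function evaluateIntent(model: JudgeModel, policy: string, payload: string): Verdict {
  return parseVerdict(model(buildJudgePrompt(policy, payload)));
}

// Stub judges standing in for real model calls.
const strictJudge: JudgeModel = () => '{"allowed": false, "reason": "bulk data deletion"}';
const chattyJudge: JudgeModel = () => "Sure, that looks fine to me!";

const policy = "No destructive actions on user data.";
const paraphrased = evaluateIntent(strictJudge, policy, "Please remove all data.");
const malformed = evaluateIntent(chattyJudge, policy, "ls");
console.log(paraphrased, malformed); // both have allowed: false
```

Failing closed is the key design choice here. The judge also reads untrusted text, so any reply that does not parse into a well-formed verdict is treated as a block rather than a pass.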
Test ramen-shield yourself
The core Semantic Firewall is open source under the MIT License.