Security Benchmark

OWASP LLM Top 10 Benchmark Results

ramen-shield was evaluated against adversarial test cases covering all ten categories of the OWASP LLM Top 10, generated with Promptfoo's red-team methodology. Result: all 10 attack categories blocked, a 100% interception rate, and zero false positives.

100%

Interception Rate

10/10

Attack Categories Blocked

0

False Positives

Full Results: OWASP LLM Top 10 (2023 edition)

ID	Attack Category	ramen-shield	Raw LLM (gpt-4o-mini, no guardrails)
LLM01	Prompt Injection	BLOCKED	VULNERABLE
LLM02	Insecure Output Handling	BLOCKED	VULNERABLE
LLM03	Training Data Poisoning	BLOCKED	VULNERABLE
LLM04	Model Denial of Service	BLOCKED	VULNERABLE
LLM05	Supply Chain Vulnerabilities	BLOCKED	VULNERABLE
LLM06	Sensitive Information Disclosure	BLOCKED	VULNERABLE
LLM07	Insecure Plugin Design	BLOCKED	VULNERABLE
LLM08	Excessive Agency	BLOCKED	VULNERABLE
LLM09	Overreliance	BLOCKED	VULNERABLE
LLM10	Model Theft	BLOCKED	VULNERABLE

Methodology

The benchmark was conducted using Promptfoo's red-team evaluation suite, which generates adversarial prompts for each OWASP LLM Top 10 category and scores whether the target system allows or blocks the attack.

ramen-shield was evaluated as a drop-in interceptor wrapping a standard tool-execution function. Each adversarial payload was passed through withShield() before execution. The shield evaluated the semantic intent of the payload against the configured guardrail policies and returned allowed: false for all 10 attack categories.
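The interceptor pattern described above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions: the types ShieldVerdict, ShieldPolicy and Shielded, and this exact withShield signature, are hypothetical and not taken from ramen-shield's source. The policy argument stands in for the semantic evaluation, which is not reproduced here; the stub policies only show where that decision plugs in.

```typescript
// Hypothetical sketch: these types and this withShield signature are
// illustrative assumptions, not ramen-shield's published API.

interface ShieldVerdict {
  allowed: boolean;
  reason?: string;
}

// A policy inspects a payload and returns a verdict. In ramen-shield this
// step is a semantic evaluation; here it is any async function of this shape.
type ShieldPolicy = (payload: string) => Promise<ShieldVerdict>;

// The tool-execution function being protected.
type ToolFn<R> = (payload: string) => Promise<R>;

// Outcome of a shielded call: either the tool's result or the block reason.
type Shielded<R> =
  | { allowed: true; result: R }
  | { allowed: false; reason: string };

// Wrap a tool so that every payload is checked before the tool executes.
function withShield<R>(
  tool: ToolFn<R>,
  policy: ShieldPolicy,
): (payload: string) => Promise<Shielded<R>> {
  return async (payload: string): Promise<Shielded<R>> => {
    const verdict = await policy(payload);
    if (!verdict.allowed) {
      // Blocked: the underlying tool is never invoked.
      return { allowed: false, reason: verdict.reason ?? "blocked by policy" };
    }
    // Allowed: run the tool and pass its result through unchanged.
    return { allowed: true, result: await tool(payload) };
  };
}

// Usage with stand-in policies (hypothetical; no semantic evaluation here).
let toolCalls = 0;
const runTool = async (payload: string): Promise<number> => {
  toolCalls++; // counts how often the protected tool actually runs
  return payload.length;
};
const denyAll: ShieldPolicy = async () => ({ allowed: false, reason: "stub: deny" });
const allowAll: ShieldPolicy = async () => ({ allowed: true });

const guarded = withShield(runTool, denyAll);
const passThrough = withShield(runTool, allowAll);
```

Because the verdict is checked before the tool runs, a blocked payload never reaches the tool; a result of allowed: false therefore means the action was not executed, not merely flagged.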

The raw LLM baseline (gpt-4o-mini with no guardrails) was evaluated against the same payloads. All 10 categories resulted in successful attacks — the model either executed the malicious instruction or produced non-compliant output.
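The headline figures reduce to simple ratios over per-case verdicts. The sketch below assumes a hypothetical CaseResult record (not Promptfoo's actual output format) and assumes that a category counts as blocked only when every attack in it is blocked; Promptfoo's own scoring rules may differ. Counting false positives requires benign control requests, modelled here as cases with adversarial: false. The example data is illustrative only and is not the benchmark's raw output.

```typescript
// Hypothetical per-case record; field names are illustrative assumptions.
interface CaseResult {
  category: string;     // OWASP ID, e.g. "LLM01"
  adversarial: boolean; // true for attack payloads, false for benign controls
  allowed: boolean;     // the system's verdict for this case
}

// Interception rate: share of adversarial cases that were blocked.
// Returns NaN when there are no adversarial cases to score.
function interceptionRate(results: CaseResult[]): number {
  const attacks = results.filter((r) => r.adversarial);
  const blocked = attacks.filter((r) => !r.allowed).length;
  return attacks.length === 0 ? NaN : blocked / attacks.length;
}

// Assumed rule: a category is blocked only if every attack in it was blocked.
function categoriesBlocked(results: CaseResult[]): { blocked: number; total: number } {
  const byCategory: Record<string, boolean> = {};
  for (const r of results) {
    if (!r.adversarial) continue;
    const soFar = byCategory[r.category] ?? true;
    byCategory[r.category] = soFar && !r.allowed;
  }
  const verdicts = Object.keys(byCategory).map((k) => byCategory[k]);
  return { blocked: verdicts.filter((v) => v).length, total: verdicts.length };
}

// False positives: benign control requests that were blocked.
function falsePositives(results: CaseResult[]): number {
  return results.filter((r) => !r.adversarial && !r.allowed).length;
}

// Illustrative data: one blocked attack per category plus one allowed benign request.
const cases: CaseResult[] = [];
for (let i = 1; i <= 10; i++) {
  cases.push({ category: "LLM" + (i < 10 ? "0" : "") + i, adversarial: true, allowed: false });
}
cases.push({ category: "benign", adversarial: false, allowed: true });

console.log(interceptionRate(cases));  // 1
console.log(categoriesBlocked(cases)); // { blocked: 10, total: 10 }
console.log(falsePositives(cases));    // 0
```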

Why Semantic Evaluation Beats Keyword Filters

Keyword / Regex Filters

✗ Bypassed by simple paraphrase ("remove all data" vs. "delete everything")
✗ Miss obfuscated attacks (base64, Unicode substitution)
✗ Cannot detect proxy discrimination in HR criteria
✗ High false positive rate on legitimate requests
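To make these failure modes concrete, here is a deliberately naive blocklist filter in TypeScript. The blocklist, the function name and the test strings are hypothetical; this is not ramen-shield's method and not any particular product's filter. It shows a paraphrase and a single Cyrillic homoglyph ("е", U+0435) slipping past an exact-phrase match, while a harmless question is blocked.

```typescript
// Hypothetical keyword filter, for illustration only. A payload is allowed
// unless a blocklisted phrase appears verbatim (case-insensitive).
const BLOCKLIST = ["delete everything", "drop table", "ignore previous instructions"];

function keywordFilterAllows(payload: string): boolean {
  const text = payload.toLowerCase();
  return !BLOCKLIST.some((phrase) => text.includes(phrase));
}

console.log(keywordFilterAllows("Please delete everything"));        // false: exact phrase caught
console.log(keywordFilterAllows("Please remove all data"));          // true: paraphrase slips through
console.log(keywordFilterAllows("Please d\u0435lete everything"));   // true: Cyrillic "е" defeats the match
console.log(keywordFilterAllows("What does DROP TABLE do in SQL?")); // false: benign question blocked
```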

ramen-shield Semantic Evaluation

✓ Evaluates latent semantic intent, not surface tokens
✓ Catches obfuscated and paraphrased attacks
✓ Detects proxy discrimination in natural language
✓ Zero false positives in OWASP evaluation

Test ramen-shield yourself

The core Semantic Firewall is open source under the MIT License.

