Open source · Free forever

Find what your AI is hiding.

Point autoredteam at any model, agent, or AI workflow. Get a behavioral benchmark across 19 attack categories in minutes. No security expertise required.

pip install glacis-autoredteam

# Point at any AI system. Get behavioral benchmarks.
$ pip install glacis-autoredteam

$ autoredteam run --provider openai --model gpt-4o-mini

Provider: openai
Model: gpt-4o-mini
Packs: generic_taxonomy
Probes: 38

prompt_injection blocked
jailbreak blocked
pii_extraction bypassed
system_prompt_leakage blocked
encoding_bypass bypassed

Total probes: 38
Bypassed: 6 (15.8% ASR)
Governance: 72/100 (Tier 3)

Why autoredteam

The only open-source red-teaming tool that attacks, hardens, and proves improvement in a single loop.

19 Attack Categories

Prompt injection, jailbreak, PII extraction, system prompt leakage, hallucination exploits, tool misuse, encoding bypass, and 12 more.

Autonomous Hardening

Discovers vulnerabilities, clusters root causes, generates countermeasures, and verifies they work. Loops until governance score hits target.
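Under the hood, that loop reduces to a few lines. A minimal sketch, assuming a simple report shape; the function names here are illustrative placeholders, not autoredteam's actual API:

```python
def harden_until(target_score, run_assessment, apply_countermeasures,
                 max_rounds=10):
    """Attack, score, harden, repeat until the governance score hits
    the target. All names here are illustrative placeholders."""
    report = run_assessment()
    for _ in range(max_rounds):
        if report["governance_score"] >= target_score:
            break
        apply_countermeasures(report["findings"])
        report = run_assessment()
    return report
```

The `max_rounds` cap is the important design choice: the loop is autonomous, but it always terminates even if the target score is never reached.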

Cryptographic Evidence

Every attack, score, and hardening decision is SHA-256 hash-chained. Tamper-evident, locally verifiable, no data egress.
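The principle is the same as any append-only log: each record commits to the digest of the record before it, so altering one entry invalidates every digest after it. A minimal sketch in Python, assuming a simple JSON record shape (the real attestation format may differ):

```python
import hashlib
import json

def chain_records(records):
    """Link records so each entry commits to the previous SHA-256 digest."""
    prev = "0" * 64  # genesis digest
    chained = []
    for rec in records:
        body = json.dumps({"prev": prev, **rec}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        chained.append({"prev": prev, **rec, "digest": digest})
        prev = digest
    return chained

def verify_chain(chained):
    """Recompute every digest locally; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in chained:
        body = {k: v for k, v in entry.items() if k != "digest"}
        if entry["prev"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if recomputed != entry["digest"]:
            return False
        prev = entry["digest"]
    return True
```

Because verification only needs the records themselves plus a hash function, anyone can re-check the chain offline, with no data leaving the machine.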

Multi-Provider Targets

OpenAI, Anthropic, Google, Azure, AWS Bedrock, Cloudflare Workers, and any OpenAI-compatible endpoint. One tool, every model.

Immune System Loop

Collects bypass examples as training data, so you can retrain your judge and defender on what broke them. The system learns from its own failures.
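As a rough sketch of the idea (the record fields are assumptions, not autoredteam's schema), harvesting bypasses into a JSONL training file looks like this:

```python
import json

def collect_bypasses(results, path="bypass_examples.jsonl"):
    """Keep only probes that bypassed defenses; write them out as
    labeled training examples for a judge or defender model."""
    bypasses = [r for r in results if r["outcome"] == "bypassed"]
    with open(path, "w") as f:
        for r in bypasses:
            f.write(json.dumps({"prompt": r["prompt"], "label": "unsafe"}) + "\n")
    return len(bypasses)
```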

Governance Scoring

Findings map to a 0–1000 governance score with named tiers: Insurability Line, Regulatory Floor, Enterprise Gate, Best-in-Class.
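Mapping a score to a tier is a simple threshold lookup. The cut-offs below are invented for illustration; only the tier names come from the tool:

```python
# Hypothetical cut-offs for illustration only; the actual thresholds
# are defined by autoredteam, not here.
TIERS = [
    (900, "Best-in-Class"),
    (750, "Enterprise Gate"),
    (600, "Regulatory Floor"),
    (450, "Insurability Line"),
]

def tier_for(score):
    """Map a 0-1000 governance score to the highest tier it clears."""
    for floor, name in TIERS:
        if score >= floor:
            return name
    return "Below Insurability Line"
```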

Attack Surface

Every probe is scored, hash-chained, and mapped to a governance dimension.

Prompt Injection · Jailbreak · System Prompt Leakage · PII Extraction · Role Confusion · Tool Misuse · Hallucination Exploit · Ethical Bypass · Multi-Turn Manipulation · Authority Manipulation · Encoding Bypass · Payload Splitting · Social Engineering · Indirect Injection · Refusal Suppression · Context Window Poisoning · Continuation Attack · Multilingual Attack · Output Formatting Exploit

How It Works

Four stages, fully autonomous, cryptographically attested.

01 ATTACK

Probe

Generate adversarial attacks across 19 categories with multi-turn trajectories and mutation for diversity.

02 SCORE

Evaluate

Deterministic pipeline plus an optional SLM (small language model) judge. Four-component scoring: breadth, depth, novelty, reliability.
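A four-component score like this typically reduces to a weighted sum over normalized components. The weights below are placeholders, not the tool's actual values:

```python
def probe_score(breadth, depth, novelty, reliability,
                weights=(0.3, 0.3, 0.2, 0.2)):
    """Weighted aggregate of four components, each normalized to [0, 1].
    The weights here are illustrative, not autoredteam's real values."""
    components = (breadth, depth, novelty, reliability)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("components must be in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))
```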

03 HARDEN

Fix

Cluster vulnerabilities by root cause. Generate countermeasures. Apply and verify with a before/after attack success rate (ASR) delta.
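ASR (attack success rate) is just bypassed probes over total probes; the run above reported 6 of 38, i.e. about 15.8%. A small sketch of the delta check (the function names are ours, not the tool's):

```python
def attack_success_rate(bypassed, total):
    """Fraction of probes that got past the target's defenses."""
    return bypassed / total

def asr_delta(before, after):
    """Negative delta means hardening reduced the attack success rate."""
    return after - before

baseline = attack_success_rate(6, 38)  # the run above: ~0.158
```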

04 PROVE

Attest

Every finding is hash-chained into a tamper-evident attestation record. Your compliance artifact builds itself.

You found the risks. Now what?

autoredteam discovers what’s wrong. Glacis Enforce stops it. Glacis Notary proves it.

Discover

autoredteam — free, open source. Point-in-time assessment. You’re here.

Stop

Enforce — AI safety guardrails that block bad outputs before they reach users. From $49/mo.

Prove

Notary — cryptographic attestation for every decision. Proof builds itself. From $499/mo.

Not a developer?

We run automated red-teaming and hardening for your AI systems as a managed service. No terminal required.

Talk to us →

Start in 30 seconds.

Free, open source, no account required. Point it at your AI and see what you find.

pip install glacis-autoredteam