Open source · Free forever

Find what your AI is hiding.

Point autoredteam at any model, agent, or AI workflow. Get a behavioral benchmark across 19 attack categories in minutes. No security expertise required.

pip install glacis-autoredteam

# Point at any AI system. Get behavioral benchmarks.
$ pip install glacis-autoredteam

$ autoredteam run --provider openai --model gpt-4o-mini

Provider: openai
Model: gpt-4o-mini
Packs: generic_taxonomy
Probes: 38

prompt_injection blocked
jailbreak blocked
pii_extraction bypassed
system_prompt_leakage blocked
encoding_bypass bypassed

Total probes: 38
Bypassed: 6 (15.8% ASR)
Governance: 72/100 (Tier 3)

Why autoredteam

The only open-source red-teaming tool that attacks, hardens, and proves improvement in a single loop.

19 Attack Categories

Prompt injection, jailbreak, PII extraction, system prompt leakage, hallucination exploits, tool misuse, encoding bypass, and 12 more.

Autonomous Hardening

Discovers vulnerabilities, clusters root causes, generates countermeasures, and verifies they work. Loops until governance score hits target.
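Under the hood, that loop reduces to a few lines. A minimal sketch, assuming a simple report shape; the function names here are illustrative placeholders, not autoredteam's actual API:

```python
def harden_until(target_score, run_assessment, apply_countermeasures,
                 max_rounds=10):
    """Attack, score, harden, repeat until the governance score hits
    the target. All names here are illustrative placeholders."""
    report = run_assessment()
    for _ in range(max_rounds):
        if report["governance_score"] >= target_score:
            break
        apply_countermeasures(report["findings"])
        report = run_assessment()
    return report
```

The `max_rounds` cap is the important design choice: the loop is autonomous, but it always terminates even if the target score is never reached.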

Cryptographic Evidence

Every attack, score, and hardening decision is SHA-256 hash-chained. Tamper-evident, locally verifiable, no data egress.
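The principle is the same as any append-only log: each record commits to the digest of the record before it, so altering one entry invalidates every digest after it. A minimal sketch in Python, assuming a simple JSON record shape (the real attestation format may differ):

```python
import hashlib
import json

def chain_records(records):
    """Link records so each entry commits to the previous SHA-256 digest."""
    prev = "0" * 64  # genesis digest
    chained = []
    for rec in records:
        body = json.dumps({"prev": prev, **rec}, sort_keys=True)
        digest = hashlib.sha256(body.encode()).hexdigest()
        chained.append({"prev": prev, **rec, "digest": digest})
        prev = digest
    return chained

def verify_chain(chained):
    """Recompute every digest locally; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in chained:
        body = {k: v for k, v in entry.items() if k != "digest"}
        if entry["prev"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if recomputed != entry["digest"]:
            return False
        prev = entry["digest"]
    return True
```

Because verification only needs the records themselves plus a hash function, anyone can re-check the chain offline, with no data leaving the machine.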

Multi-Provider Targets

OpenAI, Anthropic, Google, Azure, AWS Bedrock, Cloudflare Workers, and any OpenAI-compatible endpoint. One tool, every model.

Immune System Loop

Collects bypass examples as training data, so you can retrain your judge and defender on what broke them. The system learns from its own failures.
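As a rough sketch of the idea (the record fields are assumptions, not autoredteam's schema), harvesting bypasses into a JSONL training file looks like this:

```python
import json

def collect_bypasses(results, path="bypass_examples.jsonl"):
    """Keep only probes that bypassed defenses; write them out as
    labeled training examples for a judge or defender model."""
    bypasses = [r for r in results if r["outcome"] == "bypassed"]
    with open(path, "w") as f:
        for r in bypasses:
            f.write(json.dumps({"prompt": r["prompt"], "label": "unsafe"}) + "\n")
    return len(bypasses)
```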

Governance Scoring

Findings map to a 0–1000 governance score with named tiers: Insurability Line, Regulatory Floor, Enterprise Gate, Best-in-Class.
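Mapping a score to a tier is a simple threshold lookup. The cut-offs below are invented for illustration; only the tier names come from the tool:

```python
# Hypothetical cut-offs for illustration only; the actual thresholds
# are defined by autoredteam, not here.
TIERS = [
    (900, "Best-in-Class"),
    (750, "Enterprise Gate"),
    (600, "Regulatory Floor"),
    (450, "Insurability Line"),
]

def tier_for(score):
    """Map a 0-1000 governance score to the highest tier it clears."""
    for floor, name in TIERS:
        if score >= floor:
            return name
    return "Below Insurability Line"
```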

Attack Surface

Every probe is scored, hash-chained, and mapped to a governance dimension.

Prompt Injection · Jailbreak · System Prompt Leakage · PII Extraction · Role Confusion · Tool Misuse · Hallucination Exploit · Ethical Bypass · Multi-Turn Manipulation · Authority Manipulation · Encoding Bypass · Payload Splitting · Social Engineering · Indirect Injection · Refusal Suppression · Context Window Poisoning · Continuation Attack · Multilingual Attack · Output Formatting Exploit

How It Works

Four stages, fully autonomous, cryptographically attested.

01 ATTACK

Probe

Generate adversarial attacks across 19 categories with multi-turn trajectories and mutation for diversity.

02 SCORE

Evaluate

Deterministic pipeline plus an optional SLM (small language model) judge. Four-component scoring: breadth, depth, novelty, reliability.
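A four-component score like this typically reduces to a weighted sum over normalized components. The weights below are placeholders, not the tool's actual values:

```python
def probe_score(breadth, depth, novelty, reliability,
                weights=(0.3, 0.3, 0.2, 0.2)):
    """Weighted aggregate of four components, each normalized to [0, 1].
    The weights here are illustrative, not autoredteam's real values."""
    components = (breadth, depth, novelty, reliability)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("components must be in [0, 1]")
    return sum(w * c for w, c in zip(weights, components))
```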

03 HARDEN

Fix

Cluster vulnerabilities by root cause. Generate countermeasures. Apply and verify with a before/after attack success rate (ASR) delta.
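ASR (attack success rate) is just bypassed probes over total probes; the run above reported 6 of 38, i.e. about 15.8%. A small sketch of the delta check (the function names are ours, not the tool's):

```python
def attack_success_rate(bypassed, total):
    """Fraction of probes that got past the target's defenses."""
    return bypassed / total

def asr_delta(before, after):
    """Negative delta means hardening reduced the attack success rate."""
    return after - before

baseline = attack_success_rate(6, 38)  # the run above: ~0.158
```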

04 PROVE

Attest

Every finding is hash-chained into a tamper-evident attestation record. Your compliance artifact builds itself.

You found the risks. Now what?

autoredteam discovers what’s wrong. Glacis Enforce stops it. Glacis Notary proves it.

Discover

autoredteam — free, open source. Point-in-time assessment. You’re here.

Stop

Enforce — AI safety guardrails that block bad outputs before they reach users. From $49/mo.

Prove

Notary — cryptographic attestation for every decision. Proof builds itself. From $499/mo.

Not a developer?

We run automated red-teaming and hardening for your AI systems as a managed service. No terminal required.

Talk to us →

Start in 30 seconds.

Free, open source, no account required. Point it at your AI and see what you find.

pip install glacis-autoredteam