The LLM Judge cross-references every AI claim against actual tool outputs. When the AI says "the test passes" but the exit code was 1 — that's a critical contradiction, flagged with full evidence.
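The cross-check can be sketched as a simple comparison of claimed outcomes against recorded exit codes. This is a minimal illustration, not SessionFS's actual data model: the record fields (`run_id`, `asserts_success`, `message_id`) and the `find_contradictions` helper are hypothetical names.

```python
# Hypothetical sketch of a claim-vs-output cross-check; field names are
# illustrative, not SessionFS's actual schema.

def find_contradictions(claims, tool_outputs):
    """Flag claims of success whose linked tool run actually failed."""
    findings = []
    for claim in claims:
        run = tool_outputs.get(claim["run_id"])
        if run is None:
            continue
        # "the test passes" but a nonzero exit code is a critical contradiction
        if claim["asserts_success"] and run["exit_code"] != 0:
            findings.append({
                "severity": "CRITICAL",
                "claim": claim["text"],
                "evidence": f"exit code {run['exit_code']}",
                "message_id": claim["message_id"],  # link back to the exact message
            })
    return findings

claims = [{"run_id": "r1", "asserts_success": True,
           "text": "the test passes", "message_id": "m42"}]
tool_outputs = {"r1": {"exit_code": 1}}
print(find_contradictions(claims, tool_outputs))
```

Keeping the evidence (the raw exit code) attached to each finding is what lets the audit show the contradiction rather than merely assert it.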
Severity-classified findings: CRITICAL for test failures and failed commands. HIGH for file mismatches and data misreads. Every finding links to the exact message.
BYOK — your key, never stored. Works with any OpenAI-compatible endpoint: OpenRouter, LiteLLM, vLLM, Ollama.
CRITICAL for test failures and failed commands. HIGH for file mismatches. LOW for ambiguous claims. Severity is auto-assigned from the finding's category, fully deterministic, with no LLM judgment involved.
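Because severity is a pure function of category, it amounts to a lookup table. The category identifiers below are assumptions inferred from the examples in the text, not SessionFS's actual names:

```python
# Hypothetical category-to-severity table; category names are assumptions
# drawn from the surrounding text, not SessionFS's actual identifiers.
SEVERITY_BY_CATEGORY = {
    "test_failure": "CRITICAL",
    "failed_command": "CRITICAL",
    "file_mismatch": "HIGH",
    "data_misread": "HIGH",
    "ambiguous_claim": "LOW",
}

def severity_for(category: str) -> str:
    # Pure lookup: no model call, so the same input always yields the same output.
    return SEVERITY_BY_CATEGORY[category]

print(severity_for("test_failure"))  # → CRITICAL
```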
Trigger audits automatically: on every sync push, when a PR/MR is opened, or manual only. Configure the trigger mode in dashboard Settings.
Every audit stored in the database. Compare runs over time, track trust score trends, and see which models produce better results.
BYOK means your API key is never sent to SessionFS servers. Connect directly to any OpenAI-compatible endpoint — your company gateway, a local Ollama instance, or an external provider.
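Every OpenAI-compatible endpoint accepts the same `/v1/chat/completions` request shape, so switching providers is just a matter of changing the base URL. A minimal sketch of building such a request with the standard library, where the base URL, model name, and key are placeholders:

```python
# Sketch of an OpenAI-compatible chat request; the base URL and model name
# are placeholders -- point them at OpenRouter, LiteLLM, vLLM, or Ollama.
import json
from urllib import request

def build_chat_request(base_url: str, api_key: str, model: str, messages):
    """Return a ready-to-send Request for any OpenAI-compatible endpoint."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # key goes straight to the provider
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example: a local Ollama instance (request is built here, not sent)
req = build_chat_request(
    "http://localhost:11434",  # placeholder local endpoint
    "unused-for-local",        # a local server may ignore the key; header shape is identical
    "llama3",
    [{"role": "user", "content": "ping"}],
)
print(req.full_url)  # → http://localhost:11434/v1/chat/completions
```

The key only ever appears in the `Authorization` header of a request aimed at your chosen endpoint, which is the whole point of BYOK.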