Features / LLM Judge

Your AI's claims.
Verified.

The LLM Judge cross-references every AI claim against actual tool outputs. When the AI says "the test passes" but the exit code was 1 — that's a critical contradiction, flagged with full evidence.

Severity-classified findings: CRITICAL for test failures and failed commands. HIGH for file mismatches and data misreads. Every finding links to the exact message.

BYOK — your key, never stored. Works with any OpenAI-compatible endpoint: OpenRouter, LiteLLM, vLLM, Ollama.

Read the docs
audit report
Trust Score
74%
3 contradictions · 9 unverified · 42 verified
CRITICAL test_result
AI: "The test passes successfully"
FAILED tests/test_auth.py
assert status_code == 200, got 401
exit code: 1
HIGH file_existence
AI: "I've created validator.py"
Reality: No Write call to that path

How the Judge works

Severity classification

CRITICAL for test failures and failed commands. HIGH for file mismatches. LOW for ambiguous claims. Auto-assigned from category — no LLM judgment on severity, fully deterministic.
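Because severity is a pure function of the finding's category, the logic can be sketched in a few lines. A minimal illustration in Python; the exact category names beyond those shown in the report above (`test_result`, `file_existence`) are assumptions, not SessionFS's actual taxonomy:

```python
# Illustrative sketch: deterministic category -> severity lookup.
# No LLM is involved in assigning severity; it is a fixed mapping.
# Category names other than test_result / file_existence are hypothetical.
SEVERITY_BY_CATEGORY = {
    "test_result": "CRITICAL",     # AI claimed a test passed; it failed
    "command_result": "CRITICAL",  # AI claimed success; exit code was non-zero
    "file_existence": "HIGH",      # AI claimed a file was written; no matching tool call
    "data_misread": "HIGH",        # AI misquoted tool output
    "ambiguous_claim": "LOW",      # claim could not be checked against any output
}

def classify(category: str) -> str:
    """Severity comes purely from the category; unknown categories default to LOW."""
    return SEVERITY_BY_CATEGORY.get(category, "LOW")
```

The point of keeping severity out of the LLM's hands is reproducibility: the same finding always gets the same severity, run after run.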

Auto-audit

Trigger audits automatically on every sync push or whenever a PR/MR is opened, or keep them manual-only. Configure the trigger mode in the dashboard Settings.

Audit history

Every audit is stored in the database, so you can compare runs over time, track trust-score trends, and see which models produce better results.

Your key. Your endpoint.

BYOK means your API key is never sent to SessionFS servers. Connect directly to any OpenAI-compatible endpoint — your company gateway, a local Ollama instance, or an external provider.

OpenRouter (multi-model consensus)
LiteLLM / vLLM (internal gateway)
Ollama (local, fully air-gapped)
Azure OpenAI, Anthropic, Google
judge config
$ sfs judge run ses_abc \
--model gpt-4o \
--base-url https://litellm.company.com/v1 \
--key $COMPANY_GATEWAY_KEY
Running audit... 54 claims extracted.
Trust score: 74% · 3 contradictions found.
Report saved. View: sfs judge report ses_abc
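The same flags shown above can point at a local Ollama instance for a fully air-gapped audit. A hedged sketch reusing the documented `--model`, `--base-url`, and `--key` flags; the model name is illustrative, and `http://localhost:11434/v1` is Ollama's default OpenAI-compatible endpoint:

```shell
# Air-gapped audit against a local Ollama server (default port 11434).
# Ollama does not check the API key, but a value may still be required by the flag.
sfs judge run ses_abc \
  --model llama3.1 \
  --base-url http://localhost:11434/v1 \
  --key ollama
```

Swapping the base URL is the whole integration: anything speaking the OpenAI chat API shape works the same way.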