Features / LLM Judge

Your AI's claims.
Verified.

The LLM Judge cross-references every AI claim against actual tool outputs. When the AI says "the test passes" but the exit code was 1 — that's a critical contradiction, flagged with full evidence.

Severity-classified findings: CRITICAL for test failures and failed commands. HIGH for file mismatches and data misreads. Every finding links to the exact message.

BYOK — your key, never stored. Works with any OpenAI-compatible endpoint: OpenRouter, LiteLLM, vLLM, Ollama.

Read the docs
audit report
Trust Score
74%
3 contradictions · 9 unverified · 42 verified
CRITICAL test_result
AI: "The test passes successfully"
FAILED tests/test_auth.py
assert status_code == 200, got 401
exit code: 1
HIGH file_existence
AI: "I've created validator.py"
Reality: No Write call to that path

How the Judge works

Severity classification

CRITICAL for test failures and failed commands. HIGH for file mismatches. LOW for ambiguous claims. Auto-assigned from category — no LLM judgment on severity, fully deterministic.
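Because severity is a pure function of the finding's category, the logic can be sketched in a few lines. A minimal illustration in Python; the exact category names beyond those shown in the report above (`test_result`, `file_existence`) are assumptions, not SessionFS's actual taxonomy:

```python
# Illustrative sketch: deterministic category -> severity lookup.
# No LLM is involved in assigning severity; it is a fixed mapping.
# Category names other than test_result / file_existence are hypothetical.
SEVERITY_BY_CATEGORY = {
    "test_result": "CRITICAL",     # AI claimed a test passed; it failed
    "command_result": "CRITICAL",  # AI claimed success; exit code was non-zero
    "file_existence": "HIGH",      # AI claimed a file was written; no matching tool call
    "data_misread": "HIGH",        # AI misquoted tool output
    "ambiguous_claim": "LOW",      # claim could not be checked against any output
}

def classify(category: str) -> str:
    """Severity comes purely from the category; unknown categories default to LOW."""
    return SEVERITY_BY_CATEGORY.get(category, "LOW")
```

The point of keeping severity out of the LLM's hands is reproducibility: the same finding always gets the same severity, run after run.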

Auto-audit

Trigger audits automatically on every sync push or whenever a PR/MR is opened, or keep them manual-only. Configure the trigger mode in the dashboard Settings.

Audit history

Every audit is stored in the database, so you can compare runs over time, track trust-score trends, and see which models produce better results.

Your key. Your endpoint.

BYOK means your API key is never sent to SessionFS servers. Connect directly to any OpenAI-compatible endpoint — your company gateway, a local Ollama instance, or an external provider.

OpenRouter (multi-model consensus)
LiteLLM / vLLM (internal gateway)
Ollama (local, fully air-gapped)
Azure OpenAI, Anthropic, Google
judge config
$ sfs judge run ses_abc \
--model gpt-4o \
--base-url https://litellm.company.com/v1 \
--key $COMPANY_GATEWAY_KEY
Running audit... 54 claims extracted.
Trust score: 74% · 3 contradictions found.
Report saved. View: sfs judge report ses_abc
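The same flags shown above can point at a local Ollama instance for a fully air-gapped audit. A hedged sketch reusing the documented `--model`, `--base-url`, and `--key` flags; the model name is illustrative, and `http://localhost:11434/v1` is Ollama's default OpenAI-compatible endpoint:

```shell
# Air-gapped audit against a local Ollama server (default port 11434).
# Ollama does not check the API key, but a value may still be required by the flag.
sfs judge run ses_abc \
  --model llama3.1 \
  --base-url http://localhost:11434/v1 \
  --key ollama
```

Swapping the base URL is the whole integration: anything speaking the OpenAI chat API shape works the same way.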