# LLM Judge
The LLM Judge cross-references every AI claim against actual tool outputs. When the AI says “the test passes” but the exit code was 1, that’s a critical contradiction flagged with full evidence.
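For context, the contradiction check leans on the standard shell convention: exit status 0 means success, anything non-zero is a failure. A quick illustration:

```bash
# `false` exits non-zero, the same signal a failing test run produces.
# A transcript claiming "the test passes" contradicts this evidence.
false
echo $?   # prints 1
```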
## Quick start

```bash
sfs audit ses_abc --model gpt-4o
```

## Severity classification

Findings are auto-classified by category:
| Severity | Categories | Example |
|---|---|---|
| CRITICAL | test_result, command_output, dependency | “Test passes” but exit code 1 |
| HIGH | file_existence, data_misread, code_claim | “Created file” but no Write call |
| LOW | other | Ambiguous claims |
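As a rough sketch of the mapping above (not the judge's actual implementation, and the helper name is hypothetical):

```bash
# Hypothetical helper mirroring the severity table.
severity_for() {
  case "$1" in
    test_result|command_output|dependency)  echo CRITICAL ;;
    file_existence|data_misread|code_claim) echo HIGH ;;
    *)                                      echo LOW ;;
  esac
}

severity_for test_result     # CRITICAL
severity_for file_existence  # HIGH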
## Custom LLM endpoint

Works with any OpenAI-compatible endpoint:

```bash
# LiteLLM
sfs audit ses_abc --base-url https://litellm.internal/v1 --model gpt-4o

# Ollama (no API key needed)
sfs audit ses_abc --base-url http://localhost:11434/v1 --model llama3

# vLLM
sfs audit ses_abc --base-url http://gpu-server:8000/v1 --model my-model
```
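If your endpoint requires authentication, most OpenAI-compatible clients read an API key from the environment; the exact variable `sfs` honors is an assumption here:

```bash
# Assumption: sfs follows the common OPENAI_API_KEY convention.
export OPENAI_API_KEY=sk-...
sfs audit ses_abc --base-url https://litellm.internal/v1 --model gpt-4o
```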
## Auto-audit

Configure automatic auditing in the dashboard Settings or via CLI:
```bash
sfs config set audit.trigger on_sync   # Audit after every push
sfs config set audit.trigger on_pr     # Audit when a PR/MR is opened
sfs config set audit.trigger manual    # Only when you run sfs audit
```
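With the trigger left on `manual`, audits can still run from your own CI. A minimal sketch, assuming your pipeline records the session id in a `SESSION_ID` variable (hypothetical; not something `sfs` sets for you):

```bash
# Hypothetical CI step: audit the session recorded earlier in the pipeline.
sfs audit "$SESSION_ID" --format json > audit-report.json
```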
## Consensus mode

Run 3 passes and only report findings where 2+ agree:
```bash
sfs audit ses_abc --consensus   # 3x cost, higher confidence
```
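Assuming flags combine as usual, consensus also works against a custom endpoint, which keeps the 3x cost on local hardware:

```bash
# Three judge passes against a local Ollama model; no per-token API bill.
sfs audit ses_abc --consensus --base-url http://localhost:11434/v1 --model llama3
```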
## Export

```bash
sfs audit ses_abc --format json       # JSON
sfs audit ses_abc --format markdown   # Markdown report
sfs audit ses_abc --format csv        # CSV table
```
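The JSON export pipes cleanly into jq. The field names below are assumptions about the schema; adjust them to the actual output:

```bash
# Assumed schema: a top-level "findings" array with "severity" and "claim" fields.
sfs audit ses_abc --format json \
  | jq -r '.findings[] | select(.severity == "CRITICAL") | .claim'
```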