Live eval · last run 30d ago
The agent grades itself.
The same RAG agent that powers the homepage chat runs against 89 hand-designed recruiter cases on every build. Pass/fail results for each case are shown below; failures aren't hidden, they're cited.
All questions: 88/89 (99% pass)
Mandatory: 38/38 (100% pass)
Nice-to-have: 37/37 (100% pass)
Out-of-scope: 13/14 (93% pass)
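As a rough illustration of how per-case pass/fail results could roll up into the tallies above, here is a minimal sketch. It is not the site's actual harness: `CaseResult`, `summarize`, and `synthetic_run` are hypothetical names, the case IDs are placeholders, and rounding to the nearest whole percent is an assumption that happens to reproduce the figures shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: str   # hypothetical identifier, not a real case name
    tier: str      # "Mandatory", "Nice-to-have", or "Out-of-scope"
    passed: bool


def summarize(results):
    """Return {row name: (passed, total, whole-number percent)}."""
    totals = Counter(r.tier for r in results)
    passes = Counter(r.tier for r in results if r.passed)

    rows = {"All questions": (sum(passes.values()), len(results))}
    for tier in totals:
        rows[tier] = (passes[tier], totals[tier])

    # Assumption: percentages are rounded to the nearest integer.
    return {
        name: (p, t, round(100 * p / t) if t else 0)
        for name, (p, t) in rows.items()
    }


def synthetic_run():
    """Build placeholder results that match the counts reported on this page."""
    spec = {
        "Mandatory": (38, 38),
        "Nice-to-have": (37, 37),
        "Out-of-scope": (13, 14),
    }
    results = []
    for tier, (n_pass, n_total) in spec.items():
        for i in range(n_total):
            results.append(CaseResult(f"{tier}-{i}", tier, i < n_pass))
    return results


for name, (p, t, pct) in summarize(synthetic_run()).items():
    print(f"{name}: {p}/{t} ({pct}% pass)")
```

The three tiers sum to the overall row (38 + 37 + 13 = 88 passed of 38 + 37 + 14 = 89), so the single failing case sits in the out-of-scope tier.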
Per-question results
Identity & background
Current role / Enidus
AI Copilot deep-dive
Custom Reports & Dashboards deep-dive
Carrier API Gateway / BFF
ClaudeJob deep-dive
Denari RAG capstone
Weather pipeline / distributed systems coursework
Earlier roles
Education
Engineering judgment
AI engineering specifics
Behavioral / soft
Career goals
Hardball / probing
Out-of-scope sanity checks