sahil_mehta.

Live eval · last run 30d ago

The agent grades itself.

The same RAG agent that powers the homepage chat runs against 89 hand-designed recruiter cases on every build. Pass/fail status for each case is shown below. Failures aren't hidden; they're cited.

All questions: 88/89 (99% pass)
Mandatory: 38/38 (100% pass)
Nice-to-have: 37/37 (100% pass)
Out-of-scope: 13/14 (93% pass)
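
The summary figures above follow from the per-case pass/fail results by counting. Below is a minimal sketch of that tally. It is not the site's actual harness: the `CaseResult` record, the tier labels, and the `summarize` helper are hypothetical names introduced for illustration, and the sample data only reproduces the counts shown on this page.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    # Hypothetical record shape; the real eval output format is not shown here.
    case_id: str
    tier: str      # "mandatory", "nice-to-have", or "out-of-scope"
    passed: bool


def summarize(results):
    """Return {tier: (passed, total, percent)} including an "all" row."""
    total = Counter()
    passed = Counter()
    for r in results:
        for key in (r.tier, "all"):
            total[key] += 1
            passed[key] += int(r.passed)
    return {
        key: (passed[key], total[key], round(100 * passed[key] / total[key]))
        for key in total
    }


# Synthetic results matching the counts reported on this page (assumed layout).
results = (
    [CaseResult(f"m{i}", "mandatory", True) for i in range(38)]
    + [CaseResult(f"n{i}", "nice-to-have", True) for i in range(37)]
    + [CaseResult(f"o{i}", "out-of-scope", i != 0) for i in range(14)]
)

summary = summarize(results)
for tier, (ok, n, pct) in summary.items():
    print(f"{tier}: {ok}/{n} ({pct}% pass)")
```

Percentages are rounded to the nearest whole number, which is why 88/89 displays as 99% rather than 100%.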

Per-question results

Identity & background

Current role / Enidus

AI Copilot deep-dive

Custom Reports & Dashboards deep-dive

Carrier API Gateway / BFF

ClaudeJob deep-dive

Denari RAG capstone

Weather pipeline / distributed systems coursework

Earlier roles

Education

Engineering judgment

AI engineering specifics

Behavioral / soft

Career goals

Hardball / probing

Out-of-scope sanity checks