Related lectures: Lecture 09. Stop agents from declaring victory early · Lecture 10. Only a full-pipeline run counts as real verification Template files: templates/
Project 05. Make the Agent Verify Its Own Work
What You Do
Implement role separation — a generator that implements, an evaluator that reviews, and optionally a planner. Run three times to measure the effect of each added role.
Choose a substantive feature upgrade (multi-turn conversation, citation panel redesign, or document filtering) and keep it consistent across all runs.
Tools
- Claude Code or Codex
- Git
- Node.js + Electron
Harness Mechanism
Self-verification + grounded Q&A + evidence-based completion