Hackathon 2 — Path 1 of 3

Critique

Examine AI claims and outputs with an evaluator's eye

🕐 10–20 minutes · 3 options to choose from · Free AI tool of your choice

What this path involves

The Critique path asks you to bring your evaluation expertise to bear on AI outputs, claims, or competency frameworks. You'll use an AI tool as both subject and collaborator — examining what it gets right, what it misses, and what it might actively distort.

This isn't about finding flaws for sport. It's about developing and sharing the analytical habits evaluators need when working alongside AI. Your submission becomes a reusable resource for the community.

Choose your option

🕐 10–15 min

Red-Team a Claim

Take a claim about AI in evaluation — a statement about how AI should or shouldn't be used in evaluation practice — and systematically stress-test its assumptions, evidence base, and potential harms.

Steps:

  1. Identify a claim about AI in evaluation. Use one you've encountered in the field, generate one with an AI tool, or start from a provided example.
  2. Ask the AI: "What assumptions underlie this claim about AI in evaluation?" — then surface the assumptions it misses and push back on the ones it glosses over.
  3. Ask: "Who might this recommendation harm, and how?" Probe for blind spots specific to evaluation contexts (e.g. marginalized communities, low-resource settings, data privacy).
  4. Draft a short critique: one paragraph summarizing what the claim gets right and where it falls short as guidance for evaluation practitioners.

Include in your submission:

  • The claim about AI in evaluation (copied or paraphrased)
  • Your one-line framing of the core issue with the claim
  • Evidence of your critique (pasted exchange, notes, or written summary)
  • A 140-character reflection on what surprised you

Can't think of a claim? Check out our 100% not fabricated claim library →

🕐 15–20 min

AEA Competencies Remix

The AEA Evaluator Competencies were written before AI became a practical tool in evaluation work. Pick one competency and make the case for how it should evolve — then use AI to help draft the updated version.

Steps:

  1. Browse the AEA Evaluator Competencies Framework and select one competency that feels most in tension with — or most affected by — AI tools in practice.
  2. Write 2–3 sentences on how you think the competency should be updated for the AI era: what should be added, reframed, or retired?
  3. Paste your notes into an AI tool and ask it to draft an updated competency statement in formal language, grounded in your recommendations (see the sample prompt after these steps).
  4. Review the AI's draft. Revise it until it reflects your intent — or note where the AI pushed back in useful or unhelpful ways.
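
A sample prompt for step 3 (illustrative wording only; the bracketed parts are placeholders, not required phrasing):

  "Here is AEA Evaluator Competency [number/name] and my notes on how it should change for the AI era: [paste both]. Draft an updated competency statement in the formal register of the original framework, staying faithful to my recommendations, and flag anywhere you had to go beyond them."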

Include in your submission:

  • The original competency you selected
  • Your 2–3 sentence update rationale
  • The AI-assisted draft of the updated competency
  • A 140-character reflection on what this process revealed about AI as a co-author of professional standards

🕐 15–20 min

Case Snapshot: Failure or Success

Find a real instance where AI use in an evaluation context produced a notable outcome — a failure, a success, or something in between. Your case must come from first-hand experience (an incident you witnessed or took part in) or a documented source (published report, peer account). Do not construct a hypothetical or ask an AI tool to fabricate a scenario. Document it as a brief structured case.

Steps:

  1. Identify a case from your own practice (first-hand) or from a published report, peer account, or documented incident (second-hand).
  2. Note the source type so readers can calibrate how to use the case: first-hand experience, published report, peer account, etc.
  3. Document the case using the "Case Snapshot" format: Context → AI tool used → What happened → Why it matters (see the template sketch after these steps).
  4. Add your evaluator's verdict: what should practitioners learn from this case?
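
A minimal template sketch (the four headings are the required format; the verdict line comes from step 4; everything in brackets is placeholder wording, not prescribed content):

  Context: [the evaluation setting and what was at stake]
  AI tool used: [which tool, and what it was asked to do]
  What happened: [the outcome: what worked, failed, or surprised]
  Why it matters: [the implication for evaluation practice]
  Evaluator's verdict: [your one-line takeaway for practitioners]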

Include in your submission:

  • The Case Snapshot (4-part format above)
  • Source type (first-hand, published report, peer account, etc.)
  • Your one-line verdict
  • Evidence of work (written case or link to source)
  • A 140-character reflection

Before you submit

Every Critique submission needs these four things:

  • One-line need or claim — a single sentence stating what you're examining or what you found (max 140 characters)
  • Evidence of work — your pasted exchange, critique notes, updated competency draft, or case snapshot (no minimum length, but be substantive)
  • Ethics check — confirm your submission contains no PII or real client data; any data referenced is synthetic or anonymized
  • 140-character reflection — one tweet-length thought on what you learned or what surprised you

Submit your Critique entry →