Research · Thread R-05

When Does AI Actually Improve Decisions?

Most organizations measure whether AI improves productivity. Few measure whether people actually make better decisions because of it. This thread studies how decision-support systems should be evaluated when the outcome of interest is judgment quality rather than speed.

AI · Decision Science
Contents
  01 · Question
  02 · Why It Matters
  03 · Current Direction
  04 · Research Notes
  05 · Related Systems
  06 · Materials
01 · Question

Research Question

What would it take to evaluate a decision-support system on the quality of the decisions it enables, rather than on its own predictive accuracy?

02 · Why It Matters

Why This Matters

Accuracy is a property of the model. Judgment quality is a property of the human-and-model system. Most current evaluation frameworks measure only the first.

03 · Current Direction

Current Direction

Sketching an evaluation harness in which the unit of analysis is a decision, not a prediction, with counterfactual baselines drawn from unaided operators.
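
To make the idea concrete, here is a minimal sketch of what a decision-level harness could look like. It is an illustration under stated assumptions, not the thread's actual design: the `DecisionRecord` fields, the function names, and the binary "decision was correct" label are all hypothetical, and the example records are made up purely to exercise the code. The point it shows is that model accuracy and decision quality are computed from different columns of the same log, and that the unaided arm supplies the baseline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DecisionRecord:
    """One decision, the unit of analysis (hypothetical schema)."""
    case_id: str
    aided: bool                     # True if the operator saw the model's output
    model_correct: Optional[bool]   # None in the unaided arm (no model shown)
    decision_correct: bool          # was the final decision judged correct?


def _rate(flags: Sequence[bool]) -> float:
    """Fraction of True values; refuses to average an empty arm."""
    if not flags:
        raise ValueError("no records in this arm")
    return sum(flags) / len(flags)


def model_accuracy(records: Sequence[DecisionRecord]) -> float:
    """Accuracy of the model alone, over cases where it made a prediction."""
    return _rate([r.model_correct for r in records if r.model_correct is not None])


def decision_quality(records: Sequence[DecisionRecord], aided: bool) -> float:
    """Share of correct final decisions within one arm (aided or unaided)."""
    return _rate([r.decision_correct for r in records if r.aided == aided])


def decision_uplift(records: Sequence[DecisionRecord]) -> float:
    """Aided decision quality minus the unaided baseline."""
    return decision_quality(records, aided=True) - decision_quality(records, aided=False)


# Illustrative, made-up records: four aided decisions, four unaided ones.
records = [
    DecisionRecord("a1", True, True, True),
    DecisionRecord("a2", True, True, True),
    DecisionRecord("a3", True, False, True),   # model wrong, operator overrode correctly
    DecisionRecord("a4", True, True, False),   # model right, operator overrode wrongly
    DecisionRecord("u1", False, None, True),
    DecisionRecord("u2", False, None, False),
    DecisionRecord("u3", False, None, False),
    DecisionRecord("u4", False, None, True),
]

print("model accuracy:  ", model_accuracy(records))
print("aided quality:   ", decision_quality(records, aided=True))
print("unaided quality: ", decision_quality(records, aided=False))
print("decision uplift: ", decision_uplift(records))
```

A simple difference in rates is only a fair counterfactual if the aided and unaided arms see comparable cases (for example through random assignment); the sketch assumes that and does nothing to check it.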

04 · Research Notes

Early Notes

Open question: how to score cases where, judged against the prediction target, the model is right and the operator is wrong, but the operator's override turns out to be the correct decision for reasons outside the model's training distribution.

06 · Materials

Materials

Paper · Coming Soon
Notes · Coming Soon
Dataset · Coming Soon
Code · Coming Soon