Research · Thread R-05

When Does AI Actually Improve Decisions?

Most organizations measure whether AI improves productivity. Few measure whether people actually make better decisions because of it. This thread studies how decision-support systems should be evaluated when the output is not speed, but judgment quality.

AI · Decision Science

Contents

01 · Question
02 · Why It Matters
03 · Current Direction
04 · Research Notes
05 · Related Systems
06 · Materials

01 · Question

Research Question

What would it take to evaluate a decision-support system on the quality of the decisions it enables, rather than on its own predictive accuracy?

02 · Why It Matters

Why This Matters

Accuracy is a property of the model. Judgment quality is a property of the human-and-model system. Most current evaluation frameworks measure only the first.

03 · Current Direction

Current Direction

The current work sketches an evaluation harness in which the unit of analysis is a decision rather than a prediction, with counterfactual baselines drawn from operators working unaided.
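To make the idea concrete, here is a minimal sketch of what such a harness could look like. It is illustrative only, not the thread's actual implementation: the record fields, function names, and toy data are all assumptions introduced here, and it presumes each final decision can be rated good or bad from its outcome.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DecisionRecord:
    """One decision, the unit of analysis. All field names are hypothetical."""
    case_id: str
    aided: bool                           # True if the operator saw the model's output
    decision_good: bool                   # outcome-based rating of the final decision
    model_correct: Optional[bool] = None  # None for unaided cases (no model involved)


def decision_quality(records: Sequence[DecisionRecord], aided: bool) -> float:
    """Share of good decisions within one arm (aided or unaided)."""
    arm = [r for r in records if r.aided == aided]
    if not arm:
        raise ValueError("no records in the requested arm")
    return sum(r.decision_good for r in arm) / len(arm)


def model_accuracy(records: Sequence[DecisionRecord]) -> float:
    """Share of correct model outputs, ignoring what the operator decided."""
    scored = [r for r in records if r.model_correct is not None]
    if not scored:
        raise ValueError("no records with a model output")
    return sum(r.model_correct for r in scored) / len(scored)


def decision_uplift(records: Sequence[DecisionRecord]) -> float:
    """Aided decision quality minus the unaided (counterfactual) baseline."""
    return decision_quality(records, aided=True) - decision_quality(records, aided=False)


# Toy data, invented purely for illustration.
RECORDS = [
    DecisionRecord("a1", aided=True, decision_good=True, model_correct=True),
    DecisionRecord("a2", aided=True, decision_good=True, model_correct=True),
    DecisionRecord("a3", aided=True, decision_good=True, model_correct=False),  # good override
    DecisionRecord("a4", aided=True, decision_good=False, model_correct=True),  # bad override
    DecisionRecord("u1", aided=False, decision_good=True),
    DecisionRecord("u2", aided=False, decision_good=True),
    DecisionRecord("u3", aided=False, decision_good=False),
    DecisionRecord("u4", aided=False, decision_good=False),
]

if __name__ == "__main__":
    print("model accuracy: ", model_accuracy(RECORDS))
    print("aided quality:  ", decision_quality(RECORDS, aided=True))
    print("unaided quality:", decision_quality(RECORDS, aided=False))
    print("decision uplift:", decision_uplift(RECORDS))
```

On this toy data the model's accuracy is 0.75 while the decision uplift over unaided operators is 0.25, which shows that the two numbers answer different questions. A real harness would also need matched or randomized assignment of cases to arms before the difference could be read as a counterfactual effect.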

04 · Research Notes

Early Notes

Open question: how to handle cases where the model is right and the operator is wrong, but the override turns out to be correct for reasons outside the model's training distribution.

06 · Materials

Materials

Paper · Coming Soon
Notes · Coming Soon
Dataset · Coming Soon
Code · Coming Soon
