When Does AI Actually Improve Decisions?
Most organizations measure whether AI improves productivity. Few measure whether people actually make better decisions because of it. This thread studies how decision-support systems should be evaluated when the outcome that matters is judgment quality rather than speed.
Research Question
What would it take to evaluate a decision-support system on the quality of the decisions it enables, rather than on its own predictive accuracy?
Why This Matters
Accuracy is a property of the model. Judgment quality is a property of the human-and-model system. Most current evaluation frameworks measure only the first.
Current Direction
Sketching an evaluation harness in which the unit of analysis is a decision, not a prediction, with counterfactual baselines drawn from unaided operators.
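A minimal sketch of what a decision-level harness could look like, assuming each logged record is one decision tagged with the arm it came from and a judgment of whether it was good. The names `DecisionRecord`, `decision_quality_by_arm`, and `uplift`, the arm labels, and the sample records are all hypothetical and exist only for illustration. How `decision_good` gets assigned is left open here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """One decision, not one prediction. All field names are illustrative."""
    case_id: str
    arm: str             # "aided" (operator plus model) or "unaided" (operator alone)
    decision_good: bool  # judged quality of the final decision, however that is assessed


def decision_quality_by_arm(records):
    """Return the share of good decisions in each arm."""
    totals = defaultdict(int)
    good = defaultdict(int)
    for r in records:
        totals[r.arm] += 1
        good[r.arm] += int(r.decision_good)
    return {arm: good[arm] / totals[arm] for arm in totals}


def uplift(records, treated="aided", baseline="unaided"):
    """Difference in good-decision rate between the aided arm and the unaided baseline."""
    rates = decision_quality_by_arm(records)
    return rates[treated] - rates[baseline]


# Made-up records, purely to show the shape of the computation.
records = [
    DecisionRecord("c1", "aided", True),
    DecisionRecord("c2", "aided", True),
    DecisionRecord("c3", "aided", False),
    DecisionRecord("c4", "aided", True),
    DecisionRecord("c5", "unaided", True),
    DecisionRecord("c6", "unaided", False),
    DecisionRecord("c7", "unaided", False),
    DecisionRecord("c8", "unaided", True),
]

print(decision_quality_by_arm(records))  # {'aided': 0.75, 'unaided': 0.5}
print(uplift(records))                   # 0.25
```

Nothing in this sketch refers to the model's predictive accuracy. The only quantity compared is decision quality with and without the system, which is the point of making the decision the unit of analysis. A raw difference in rates is only a fair counterfactual if the two arms face comparable cases, which this sketch assumes and does not check.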
Early Notes
Open question: how to handle cases where the model is right and the operator is wrong by the evaluation's reference labels, but the override turns out to be correct for reasons outside the model's training distribution.
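This does not resolve the question, but one possible way to keep such cases visible is to set them aside for separate review instead of scoring them with the rest. The sketch below assumes three flags are available for each aided decision. The names `AidedDecision`, `needs_review`, and `split_for_review`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AidedDecision:
    """One aided decision. Field names are illustrative assumptions."""
    case_id: str
    model_correct: bool  # model recommendation matched the reference label
    overridden: bool     # operator departed from the recommendation
    outcome_good: bool   # realized outcome of the decision was judged good


def needs_review(d):
    """Flag the ambiguous case: model right on the label, overridden, outcome still good."""
    return d.model_correct and d.overridden and d.outcome_good


def split_for_review(decisions):
    """Separate decisions that can be scored normally from those needing a closer look."""
    flagged = [d for d in decisions if needs_review(d)]
    scored = [d for d in decisions if not needs_review(d)]
    return scored, flagged


# Made-up decisions, purely for illustration.
decisions = [
    AidedDecision("d1", model_correct=True, overridden=False, outcome_good=True),
    AidedDecision("d2", model_correct=True, overridden=True, outcome_good=True),
    AidedDecision("d3", model_correct=False, overridden=True, outcome_good=True),
    AidedDecision("d4", model_correct=True, overridden=True, outcome_good=False),
]

scored, flagged = split_for_review(decisions)
print([d.case_id for d in flagged])  # ['d2']
```

Splitting the log this way only defers the judgment. Whether a flagged override counts for or against the human-and-model system remains the open question.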