The Heisenberg Principle in Evaluations
When great emphasis is placed on one element of an evaluation, people will focus on that measure, sometimes with good results that are in line with the desired objective, and sometimes by gaming the system or cheating.
New York Times columnist Eduardo Porter says that this phenomenon is best known as Goodhart’s Law, after British economist Charles Goodhart. Porter adds that Luis Garicano, of the London School of Economics, calls it the “Heisenberg Principle of incentive design,” likening it to the Heisenberg uncertainty principle in quantum mechanics. As Porter summarizes, “a performance metric is only useful as a performance metric as long as it isn’t being used as a performance metric.”
Porter’s recent column cites several examples of Goodhart’s Law. In 2004, the Chinese government decreed that there should be far fewer accidental deaths, and provincial authorities began a “no safety, no promotion” campaign, which tied bureaucrats’ fates to accidental death rates. In seven years the death rate fell by half. But scholars who studied the figures found that local officials had gamed the system. People severely injured in traffic accidents were counted as accidental deaths only if they died within seven days, so officials who arranged to have the victims kept alive for eight days improved their statistics. Porter writes that U.S. hospitals have been known to do “whatever it takes” to keep fragile patients alive at least 31 days after an operation, to beat the Medicare 30-day survival yardstick. Further, Porter writes, Chicago magazine found that the city was able to report a reduced crime rate because some incidents were reclassified as non-criminal.
American education has begun an experiment in incentive design, Porter says, in which most states have established teacher evaluation systems based on the gains their students make on standardized tests, along with some more conventional criteria such as evaluations by principals. The relevance of testing is based on sophisticated research. A study by Columbia Professor Jonah Rockoff and two Harvard colleagues, Raj Chetty and John Friedman, found that teachers who improved student test scores, called high value-added teachers, raised their students’ chances of success in higher education and careers. But heavy reliance on testing has been extremely controversial and has generated heated debate about the impact on children, and about whether education becomes less meaningful if there is a relentless focus on testing success.
Porter doesn’t take sides on specific matters of research and testing. But he says it’s a good idea to keep Goodhart’s Law in mind because when the fate of individual teachers and schools depends on high-stakes testing, the temptation for bad behavior is high. Several districts across the country have been accused of blatant cheating on tests, and others have used more subtle manipulations to create illusions of improvement. A recent New York Post story and Diane Ravitch’s blog showed state test scores artificially elevated by the deletion of four questions that large numbers of students got wrong or left blank.
So will schools face massive unintended consequences as states institute more fully developed teacher evaluation systems? Porter quotes educators who say that, while evaluations are necessary, any system needs to be examined for unintended consequences. He quotes Professor Rockoff, who has defended the results of his own study, on the preconditions for successful evaluations: “The obvious answer is do not put too much weight on any single measure.” Read Porter’s column here.