False Positive Rate Reflects Condition Rate
Posted by jeremy on August 25th, 2009
I just watched a good TED talk by Peter Donnelly, a statistician who explains that most people have just enough knowledge of statistics to be dangerous. Among other examples, he walks through claims of test accuracy and how deceptive they can be. He leads us through the following hypothetical example with HIV:
In a population of 1,000,000, 100 are HIV-positive; 999,900 are HIV-negative.
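A test for HIV is 99% accurate: it gives the right answer for 99% of the people tested, whether they are actually positive or negative.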
What is the chance that a person who tests positive is actually negative?
About 99%.
See, out of the 100 people who are actually positive, 99 will test positive and 1 will test negative – that’s 99% accuracy. Of the 999,900 people who are negative, 989,901 will test negative while 9,999 (1 out of every 100) will test positive even though they are actually negative.
So then we have 10,098 people who tested positive (99 + 9,999), but 9,999 of them are actually negative. 9,999 / 10,098 ≈ 0.9902. About 99% of the positive results are false even though the test has 99% accuracy.
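The arithmetic is easy to check in a few lines of Python. This is only a minimal sketch, not something from Donnelly's talk: the function name `split_positives` is made up for illustration, and it assumes that "99% accurate" means the test is right 99% of the time for positive and negative people alike.

```python
from fractions import Fraction

def split_positives(population, with_condition, accuracy):
    """Return (true positives, false positives) for a test.

    Assumption: `accuracy` applies equally to people with and
    without the condition.
    """
    without_condition = population - with_condition
    true_positives = with_condition * accuracy
    false_positives = without_condition * (1 - accuracy)
    return true_positives, false_positives

# Exact fractions avoid floating-point rounding in the counts.
accuracy = Fraction(99, 100)
true_pos, false_pos = split_positives(1_000_000, 100, accuracy)
false_share = false_pos / (true_pos + false_pos)

print(true_pos, false_pos)          # 99 9999
print(f"{float(false_share):.4f}")  # 0.9902
```

Changing the second argument from 100 to a larger number is a quick way to see how much the result depends on how many people actually have the condition.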
Weird, eh? The BBC draws on this principle in an editorial about cameras that supposedly identify terrorists. And I heard that a running magazine completed a similar analysis of tests for performance-enhancing drugs: If there are 100 dopers out of 10,000 runners, and the test is 99% accurate, half of those caught will be innocent (99 of the dopers test positive, but so do 99 of the 9,900 clean runners).
But don’t make the mistake of assuming that accuracy statistics are meaningless; on the contrary, they can be very useful if we know how common the target condition is in the tested sample. Donnelly’s hypothetical HIV test produced so many false positives because HIV was actually very rare in his hypothetical sample.
Suppose we develop a test for a condition that is more common than HIV (say, a DNA test for left-handedness) and this test is 99% accurate. About 10% of the population is left-handed, so out of 1,000 people, 99 of the 100 left-handers will test positive, and so will 9 of the 900 right-handers. That means 8.3% (9 out of 108, or 1 out of 12) of those our test identifies as left-handed are actually right-handed. This still may not be what you expect from a 99% accurate test, but it’s a whole lot better than our HIV test that also claimed 99% accuracy. In other words, even if two tests have equal accuracy, the one that tests for the more common condition will have a lower ratio of false positives to true positives.
With a 99% accurate DNA test for an even-more-common condition – femaleness – we will get results very close to the claimed accuracy because being female is more common than having HIV or being left-handed. In fact, if we assume (wrongly, I know) that each sex occurs equally in the world, 1% of those identified as males will actually be females, and 1% of those identified as females will be males. That’s pretty good.
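To compare these cases side by side, here is a rough sketch that computes the share of positive results that are false from nothing but the prevalence and the accuracy. The helper `false_share_of_positives` is a hypothetical name, and as before it assumes the 99% figure holds equally for people with and without the condition.

```python
from fractions import Fraction

def false_share_of_positives(prevalence, accuracy=Fraction(99, 100)):
    """Fraction of positive results that are false.

    Assumption: the same accuracy applies to both groups.
    """
    true_pos = prevalence * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return false_pos / (true_pos + false_pos)

# Prevalence figures taken from the examples in this post.
examples = [
    ("HIV, 100 in 1,000,000", Fraction(100, 1_000_000)),
    ("doping, 100 in 10,000", Fraction(100, 10_000)),
    ("left-handedness, 1 in 10", Fraction(1, 10)),
    ("sex, 1 in 2", Fraction(1, 2)),
]
for label, prevalence in examples:
    share = false_share_of_positives(prevalence)
    print(f"{label}: {float(share):.1%} of positives are false")
```

Run as written, this prints 99.0%, 50.0%, 8.3% and 1.0% for the four cases, matching the figures worked out by hand.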
So, we just need to make sure to only test for common conditions and we won’t have this problem, right? Well, no. See, the tradeoff is that, as the condition becomes more common, the share of your negative results that are false increases. The test still misses only 1% of the people who have the condition, but there are fewer true negatives to dilute those misses. This figure shows how 99% accurate tests produce different percentages of false positives and false negatives depending on how common or rare the condition is in the tested sample.
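The same tradeoff can be worked out numerically. The sketch below is an illustration under the same assumption as the rest of this post (99% accuracy for both groups); the function `false_shares` and the list of prevalence values are my own choices, not taken from any particular source.

```python
from fractions import Fraction

ACCURACY = Fraction(99, 100)  # assumed equal for both groups

def false_shares(prevalence, accuracy=ACCURACY):
    """Return (false share of positives, false share of negatives)."""
    true_pos = prevalence * accuracy
    false_neg = prevalence * (1 - accuracy)
    true_neg = (1 - prevalence) * accuracy
    false_pos = (1 - prevalence) * (1 - accuracy)
    return (false_pos / (true_pos + false_pos),
            false_neg / (true_neg + false_neg))

# Arbitrary prevalence levels, chosen to span rare to common.
print("prevalence  false positives  false negatives")
for percent in (1, 10, 50, 90, 99):
    pos_share, neg_share = false_shares(Fraction(percent, 100))
    print(f"{percent:>9}%  {float(pos_share):>15.1%}  {float(neg_share):>15.1%}")
```

The two columns mirror each other: at 50% prevalence both shares are 1%, and at 90% prevalence 1 out of 12 negative results is false, the same ratio the left-handedness test had for its positives.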


