Thursday, May 17, 2007

crimes and misdestimation

There's a story on an NYT blog today about a woman in the Netherlands who may be serving a life sentence because of an error in statistical reasoning, one that is also an example of sampling on the dependent variable.

Although the physicist explaining the error makes an error of his own:
[S]uppose that police pick up a suspect and match his or her DNA to evidence collected at a crime scene. Suppose that the likelihood of a match, purely by chance, is only 1 in 10,000. Is this also the chance that they are innocent? It’s easy to make this leap, but you shouldn’t.

Here’s why. Suppose the city in which the person lives has 500,000 adult inhabitants. Given the 1 in 10,000 likelihood of a random DNA match, you’d expect that about 50 people in the city would have DNA that also matches the sample. So the suspect is only 1 of 50 people who could have been at the crime scene. Based on the DNA evidence only, the person is almost certainly innocent, not certainly guilty.
His error is this: if the police had picked the person up as a suspect completely at random and then found that their DNA matched that found at the scene of the crime, with a 1 in 10,000 chance of a match by coincidence, then, yes, the person is most likely innocent. But police tend to pick up suspects for nonrandom reasons, and the more closely those reasons are related to the actual probability that the person is the culprit, the less relevant the 1 in 50 calculation is and the more relevant the 1 in 10,000 probability is. Because there isn't a neat way of synthesizing this into a new probability estimate, people jump from one bad way of reasoning about the problem to another bad way of reasoning about the problem.
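To make the dependence on how the suspect was picked concrete, here is a rough sketch using Bayes' rule. It is only an illustration: the function name, the assumption that a guilty person always matches, and the two nonrandom priors (1 in 100, and even odds) are all made up for the example. Only the 1 in 10,000 match rate and the 500,000 adults come from the quoted passage. The prior is exactly the part nobody knows how to pin down, so the sketch shows how much the answer moves rather than what the answer is.

```python
def posterior_guilt(prior, p_match_if_innocent, p_match_if_guilty=1.0):
    """Bayes' rule: probability of guilt given a DNA match.

    prior               -- probability of guilt before the DNA test (an assumption)
    p_match_if_innocent -- chance an innocent person matches by coincidence
    p_match_if_guilty   -- chance the true culprit matches (assumed 1.0 here)
    """
    guilty = prior * p_match_if_guilty
    innocent = (1.0 - prior) * p_match_if_innocent
    return guilty / (guilty + innocent)


RANDOM_MATCH = 1 / 10_000   # from the quoted passage
CITY_ADULTS = 500_000       # from the quoted passage

# Hypothetical priors: only the first corresponds to the physicist's scenario.
scenarios = [
    ("picked at random from the city", 1 / CITY_ADULTS),
    ("one of 100 plausible suspects", 1 / 100),
    ("even odds before the DNA test", 0.5),
]

for label, prior in scenarios:
    print(f"{label}: P(guilty | match) = {posterior_guilt(prior, RANDOM_MATCH):.4f}")
```

With a purely random pickup the posterior comes out at roughly 2%, in line with the physicist's "1 of 50" figure. With even a modest nonrandom reason for suspicion it climbs toward the other extreme.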


Gabriel said...

as 1 in 50 is less than .05 even in the relaxed interpretation the suspect's guilt would pass peer review.

Anonymous said...

Doesn't the physicist cover the possibility of non-random elements by mentioning 'based on DNA only'? Usually when a scenario like this is described in probability books the author invents several pieces of evidence, assigns probabilities to each, and then considers the joint probability. Would you still consider that an error in reasoning? (I suppose there's always the argument that there are unmeasured "intuitions" that police have when making an arrest.)

jeremy said...

Yeah, I guess what seems dodgy to me is that he talks about police picking up a suspect and then inserts that "only" later.

OmahaJoe said...

A better example of this problem is the challenge in trolling bank records for potential terrorist activity. Here, there is no "non-random" leg-work, so even if the heuristic is 99% accurate, that still leaves a large probability that a suspected terrorist is actually not one.

Anonymous said...

Part of the problem, I think, is worse than what Jeremy pointed out--that is, the power of proof for biological evidence is often overstated. Whenever it comes to DNA, people forget their undergraduate multivariate statistics entirely. The trial would be more fair if other pieces of evidence were also taken into account. The suspect deserves more thorough scrutiny than just a lab test.