
Simple math? Not so simple

Published on Sunday, May 10, 2015

[Photo: mattbuck's Bayes' Theorem neon sign]

Just over a month ago, TheWeek.com posted an article titled The simple math problem that blows apart the NSA's surveillance justifications. It concerned the probability of detecting terrorists when you have a near-perfect terrorist-detecting machine.

It turns out that the simple math isn't so simple.

Let's start with the question itself:

Suppose one out of every million people is a terrorist (if anything, an overestimate), and you've got a machine that can determine whether someone is a terrorist with 99.9 percent accuracy. You've used the machine on your buddy Jeff Smith, and it gives a positive result. What are the odds Jeff is a terrorist?
A better way to state the question is, “Given that the machine has identified Jeff as a terrorist, what is the probability Jeff is actually a terrorist?” Questions like this involve conditional probability, and Bayes' Theorem answers them very effectively. If you're not already familiar with Bayes' Theorem, read that post and watch the videos to better understand it before proceeding.
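
For the impatient, here's a minimal Python sketch of the whole calculation done directly with Bayes' Theorem (the function and variable names are my own, not from the original article). The rest of this post arrives at the same number step by step, using a table instead.

    def p_terrorist_given_flagged(prior, true_positive_rate, false_positive_rate):
        """Bayes' Theorem: P(terrorist | flagged) =
        P(flagged | terrorist) * P(terrorist) / P(flagged)."""
        p_flagged = (true_positive_rate * prior
                     + false_positive_rate * (1 - prior))
        return true_positive_rate * prior / p_flagged

    # One-in-a-million prior, 99.9% accurate machine
    print(p_terrorist_given_flagged(1e-6, 0.999, 0.001))  # ~0.000998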

Unfortunately, the linked article doesn't walk through such computations, so we have to work through them ourselves. Let's assume the machine's 99.9% (0.999) accuracy applies not only to detecting terrorists, but also to correctly identifying innocent people. In turn, that means the machine has a 0.1% (0.001) chance of identifying an innocent person as a terrorist, or a terrorist as an innocent person. So, we have four different probabilities (sketched in code just after this list):

Chance that an actual innocent is identified as a terrorist: 0.001 (False +)
Chance that an actual innocent is NOT identified as a terrorist: 0.999 (True -)
Chance that an actual terrorist is identified as a terrorist: 0.999 (True +)
Chance that an actual terrorist is NOT identified as a terrorist: 0.001 (False -)
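
Before building the table, here's a tiny sanity-check sketch in Python (the variable names are mine) showing how all four rates fall out of the single 99.9% accuracy figure:

    accuracy = 0.999           # assumed to apply to terrorists and innocents alike
    error    = 1 - accuracy    # 0.001

    TRUE_POSITIVE  = accuracy  # terrorist correctly flagged
    FALSE_NEGATIVE = error     # terrorist mistakenly cleared
    TRUE_NEGATIVE  = accuracy  # innocent person correctly cleared
    FALSE_POSITIVE = error     # innocent person flagged as a terrorist

    # Each person is either flagged or cleared, so each pair must sum to 1.
    assert abs(TRUE_POSITIVE + FALSE_NEGATIVE - 1) < 1e-12
    assert abs(TRUE_NEGATIVE + FALSE_POSITIVE - 1) < 1e-12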

Let's put these numbers in the following table:

                          Is a terrorist      Is innocent
Identified as terrorist   0.999 (True +)      0.001 (False +)
Identified as innocent    0.001 (False -)     0.999 (True -)

Now that we've got the probabilities in order, let's see what happens when 1 terrorist and 999,999 innocent people are thrown into the mix. We'll multiply both entries in the “Is a terrorist” column by 1, to represent the 1 terrorist, and both entries in the “Is innocent” column by 999,999, to represent the 999,999 innocent people:

                          Is a terrorist       Is innocent
Identified as terrorist   0.999 (1 × 0.999)    999.999 (999,999 × 0.001)
Identified as innocent    0.001 (1 × 0.001)    998,999.001 (999,999 × 0.999)

We can double-check that the table has been correctly constructed, because all the numbers add up to 1 million. This covers all the data, so now we're ready to tackle the original question.
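
Here's that scaling step as a short Python sketch (the dictionary layout is my own), including the double-check that the four cells add up to one million:

    population = 1_000_000
    terrorists = 1                        # 1 in a million
    innocents  = population - terrorists  # 999,999

    expected_counts = {
        ("identified as terrorist", "is a terrorist"): 0.999 * terrorists,  # 0.999
        ("identified as terrorist", "is innocent"):    0.001 * innocents,   # 999.999
        ("identified as innocent",  "is a terrorist"): 0.001 * terrorists,  # 0.001
        ("identified as innocent",  "is innocent"):    0.999 * innocents,   # 998,999.001
    }

    # Double-check: the four cells should account for the entire population.
    assert abs(sum(expected_counts.values()) - population) < 1e-6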

Remember that the question itself is “Given that the machine has identified Jeff as a terrorist, what is the probability Jeff is actually a terrorist?” In other words, we aren't concerned with the possibility of being identified as an innocent, as identification as a terrorist is already a given. All we have to do here is trim the “Identified as innocent” row out of the table completely:

                          Is a terrorist    Is innocent
Identified as terrorist   0.999             999.999

At this point, don't forget the basic probability formula: Probability = (targeted outcome) ÷ (total possibilities). What are the total possibilities here? 0.999 + 999.999 = 1000.998. What is the targeted outcome? It's that Jeff is a terrorist, which is 0.999. So, the probability is 0.999 ÷ 1000.998 ≈ 0.000998, or about a 0.0998% chance.
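
And here's that final division as a sketch in code, with the same answer expressed as odds; it lands on the same ~0.000998 as the Bayes' Theorem function near the top of this post:

    flagged_terrorists = 0.999    # expected terrorists flagged, per million people
    flagged_innocents  = 999.999  # expected innocents flagged, per million people

    total_flagged = flagged_terrorists + flagged_innocents  # 1000.998
    probability   = flagged_terrorists / total_flagged

    print(probability)      # ~0.000998
    print(1 / probability)  # ~1002, i.e. about 1 chance in 1,002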

In more practical terms, once the 99.9% accurate machine has identified Jeff as a terrorist, there's still only a 1 in 1,002 chance that he's actually a terrorist! Granted, this isn't radically different from the 1 in 1,000 chance posted in the original article. However, in math, the path you take is just as important as the results.