Blame Marilyn vos Savant. Back in 1990, Craig F. Whitaker of Columbia, Maryland wrote to her with a probability puzzle, and found he'd kicked up a hornet's nest! He asked, Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, 'Do you want to pick door #2?' Is it to your advantage to switch your choice of doors?
Marilyn replied that, if you switch, your odds of winning the car are ⅔, and if you don't switch, your odds are only ⅓. It was difficult for many to believe. Even subsequent discussions about these probabilities, such as the Scam School episode on the Monty Hall Dilemma, find that the belief in a 50-50 chance prevails.
Despite all the numerous ways there are to explain it, practical demonstration if often the most effective way to see that the ⅔ odds of winning is correct.
Oxford Mathematics Professor Marcus Du Sautoy shows the effectiveness of practical demonstration to English comedian Alan Davies when it comes to the Monty Hall Dilemma:
While the practical demonstration in this video is effective, it's a little surprising that the switch approach won 4 times as often as it lost. This is one of the classic problems with using a small sample size (such as playing this game only 20 times). Over at Epanechnikov's blog entry on the Monty Hall Dilemma, he features a graph of repeated simulations that shows the problem with just 20 runs:
The probabilities don't even really start settling down to the calculations until about 300 trials have been run! To help see the true odds, why not use a computer to run thousands of simulations very quickly? Inspired by the spreadsheet approach used by Presh Talwalkar to simulate trials for a different probability puzzle, I decided to do the same for the Monty Hall Dilemma.
How do you set it up? The first column states the door which holds the car, and this is generated as a random integer ranging from 1 to 3. The second column states the door chosen by the player, and this is also generated as a random integer ranging from 1 to 3.
The next column is a little trickier. It's going to hold the door which is shown by the host, but there are restrictions on which door can be shown by the host, so we can't just randomly generate a number. The host will only show a door that was NOT chosen by the player and that the host knows will contain a goat. How do we communicate these restrictions to a spreadsheet? There are 2 cases to consider here. First, what happens when the door which contains the car DOESN'T match the door chosen by the player, such as when the car is behind door #1 and the player initially chooses door #2? In this case, the host can only show door #3. In fact, since 1 + 2 + 3 = 6, we can simply subtract the number of the door with the car and the number of the door chosen by the player from 6 to get the number of the door shown by the host.
That only works when the door chosen by the player and the door holding the car are different. What do we do when those two doors are the same? If the player chooses door #1 and the car is behind door #1, we can have the computer choose randomly between door #2 and door #3. A similar approach is used for the other 2 doors, of course. The final spreadsheet entry reads this way:
=IF(A6=B6,IF(RANDBETWEEN(1,2)=1,CHOOSE(A6,3,1,2),CHOOSE(A6,2,3,1)),6-A6-B6)Translated into English, that says, If the first two columns (the door hiding the car and the door chosen by the player) are the same, then choose a number, either 1 or 2, at random. If 1 is chosen, look at the number of the door hiding the car, and choose that item from the following list of numbers: 3, 1, 2 (So, if door #1 is hiding the car, choose the 1st number, 3, and so on). If 2 is chosen, look at the number of the door hiding the car, and choose that item from the following list of numbers: 2, 1, 3 (So, if door #1 is hiding the car, choose the 1st number, 2, and so on). Finally, if the first two columns don't match, just take the number 6, subtract the number of the door hiding the car, then subtract the number of the door chosen by the player, and use that as number of the door shown by the host.
Fortunately, the final 2 columns are easier. The 4th column shows either a 1 if the players wins without switching, and a 0 if the player loses by not switching. Since the player only wins without switching when they chose the door containing the car initially, this column is only a 1 if the first 2 columns have the same number. For the opposite case, the 5th column shows either a 1 if the players wins by switching, and a 0 if the player loses by switching. In this case, a 1 is displayed only if the first 2 columns have different numbers.
Once we've got those columns set up as described above, we can copy them for as many trials as desired! I've set this up on Google Sheets to run 10,000 trials. The results are reported at the top, and take a few seconds to run (keep an eye on the progress bar in the upper right which will disappear when all the calculations are finished). For the "Player stays & wins" percentage calculation, the spreadsheet totals up all the 1s and 0s in the 4th column and divides by 10,000. For the "Player switches & wins" percentage calculation, the spreadsheet does the same thing for the 1s and 0s in the 5th column.
What do you think? Are 10,000 trials enough to convince you of the proper odds of the Monty Hall dilemma?