probability


3
Dec 11

The first thing you learned about probability is wrong*


*or dangerously incomplete.

I’ve just started reading Against the Gods: The Remarkable Story of Risk, a book by Peter Bernstein that’s been high on my “To Read” list for a while. I suspect it will be quite interesting, though it’s clearly targeted at a general audience with no technical background. In Chapter 1 Bernstein makes the distinction between games that require some skill and games of pure chance. Of the latter, Bernstein notes:

“The last sequence of throws of the dice conveys absolutely no information about what the next throw will bring. Cards, coins, dice, and roulette wheels have no memory.”

This is often the very first lesson that gets presented in a book or a lecture on probability theory. And so far as theory goes, it’s correct. For that celestially perfect fair coin, the odds of getting heads remain forever fixed at 1 to 1, toss after platonic toss. The coin has no memory of its past history. As a general rule, however, to say that the last sequence tells you nothing about what the next throw will bring is dangerously inaccurate.

In the real world, there’s no such thing as a perfectly fair coin, die, or computer-generated random number. Ok, I see you growling at your computer screen. Yes, that’s a very obvious point to make. Yes, yes, we all know that our models aren’t perfect, but they are very close approximations and that’s good enough, right? Perhaps, but good enough is still wrong, and assuming that your theory will always match up with reality in a “good enough” way puts you on the express train to ruin, despair and sleepless nights.

Let’s make this a little more concrete. Suppose you have just tossed a coin 10 times, and 6 out of the ten times it came up heads. What is the probability you will get heads on the very next toss? If you had to guess, using just this information, you might guess 1/2, despite the empirical evidence that heads is more likely to come up.

Now suppose you flipped that same coin 10,000 times and it came up heads exactly 6,000 times. All of a sudden you have a lot more information, and that information tells you a much different story than the one about the coin being perfectly fair. Unless you are completely certain of your prior belief that the coin is perfectly fair, this new evidence should be strong enough to convince you that the coin is biased towards heads.
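To put rough numbers on that intuition, here is a minimal sketch in R. It assumes a Beta prior on the coin’s probability of heads; the uniform Beta(1, 1) prior, the far more stubborn Beta(500, 500) prior, and the helper name posterior_summary are all my own choices for illustration, not anything from Bernstein’s book.

```r
# Posterior for a coin's heads probability under a Beta(a, b) prior.
# The prior parameters and this helper's name are illustrative assumptions.
posterior_summary <- function(heads, tosses, a = 1, b = 1) {
  post_a <- a + heads
  post_b <- b + (tosses - heads)
  c(mean = post_a / (post_a + post_b),
    p_favors_heads = pbeta(0.5, post_a, post_b, lower.tail = FALSE))
}

few      <- posterior_summary(6, 10)                   # 6 heads in 10 tosses
many     <- posterior_summary(6000, 10000)             # 6,000 heads in 10,000 tosses
stubborn <- posterior_summary(6000, 10000, 500, 500)   # strong prior belief in fairness

print(round(rbind(few, many, stubborn), 4))
```

Under the uniform prior, ten tosses leave real doubt (the posterior probability that the coin favors heads is roughly 0.73), while ten thousand tosses all but settle the matter, even against the stubborn prior. Only a prior that puts all of its mass on exactly one-half would refuse to move.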

Of course, that doesn’t mean that the coin itself has memory! It’s simply that the more often you flip it, the more information you get. Let me rephrase that: every coin toss or dice roll tells you more about what’s likely to come up on the next toss. Even if the tosses converge to one-half heads and one-half tails, you now know with a high degree of certainty what before you had only assumed: the coin is fair.

The more you flip, the more you know! Go back up and reread Bernstein’s quote. If that’s the first thing you learned about probability theory, then instead of knowledge you were given a very nasty set of blinders. Astronomers spent century after long century trying to figure out how to fit their data with the incontrovertible fact that the earth was the center of the universe and all orbits were perfectly circular. If you have a prior belief that’s one-hundred-percent certain, be it about fair coins or the orbits of the planets, then no new data will change your opinion. Theory has blinded you to information. You’ve left the edifice of science and are now floating in the ether of faith.


23
Nov 11

Monty Hall revisited

Chances are you’ve already heard about the Monty Hall problem. I wouldn’t be mentioning it at all, except that I keep reading descriptions of the problem that miss the absolutely critical point. For those who are new to the problem, here’s a summary:

Suppose you’re a contestant on a game show. The host, Monty Hall, shows you three numbered doors. Two of these doors hide goats, which you don’t want, and one of them hides a shiny new convertible, which you do. Pick the right door and you go home with the convertible, pick the wrong door and you get the goat (which I suspect they don’t even really give you). You make your best guess and choose a door. But before showing what’s behind it, Mr. Hall opens one of the other two doors to reveal a goat. “Now”, he asks, “do you want to stick with your original choice, or do you want to switch doors to the other one that hasn’t been opened yet?”

While you try desperately to remember the rules for conditional probability, the studio audience yells out suggestions and an attractive model smiles at you, making you wonder if you should ask if she comes with the car, but then you realize she probably gets that question all the time. Time is running out! Should you switch doors?

The correct decision, at least in terms of maximizing your chances of winning the car (but, alas, not the model), is to switch. IQ Test Grand Champion and writer Marilyn Vos Savant famously answered the question in one of her columns. Her answer, that you should switch, was widely controversial. The math behind the solution is surprisingly simple, though it rarely seems to be presented in a simple way. Your first guess has a one-in-three chance of being right. That means your first guess has a two-in-three chance of being wrong. If your first guess was wrong, that means the car must be behind one of the other two doors. Since Monty just showed you a goat behind one of them, the car must be behind the remaining one. Switch and you will get the car for sure. If you don’t switch, your chance of winning remains one-in-three. If you do switch, it jumps up to two-in-three. So ignore the studio audience and don’t get distracted by the model. Just call out the number of that other door!
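If the argument still feels slippery, a quick simulation can help. The sketch below assumes the textbook rules: the car and your first pick are both uniformly random, and Monty always opens a goat door you didn’t pick, choosing at random when he has two goats available. The variable names and the seed are arbitrary.

```r
# Standard Monty Hall: Monty always opens a goat door you did not pick,
# choosing at random when he has two goats to choose from.
set.seed(1)  # arbitrary seed so the run is repeatable
n_games <- 10000
stay_wins <- logical(n_games)
switch_wins <- logical(n_games)

for (g in 1:n_games) {
  car  <- sample(1:3, 1)
  pick <- sample(1:3, 1)
  goat_doors <- setdiff(1:3, c(car, pick))   # doors Monty may open
  # sample() on a length-one vector would sample from 1:x, so index instead
  opened <- goat_doors[sample.int(length(goat_doors), 1)]
  other  <- setdiff(1:3, c(pick, opened))    # the door you could switch to
  stay_wins[g]   <- (pick == car)
  switch_wins[g] <- (other == car)
}

cat("stay:", mean(stay_wins), " switch:", mean(switch_wins), "\n")
```

Under these assumptions the switching rate should land near two-in-three and the staying rate near one-in-three.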

But wait! Did you catch the missing assumptions needed to make this solution work? The big one, for me, is that Monty Hall will always follow the same procedure of opening up a door with a goat, regardless of what’s behind the door you picked. If you distrust Monty, you might suspect that he will only show you a goat when you’ve picked the car, in order to entice you to switch and lose the car. In that case you should stick with the door you have. Or perhaps Monty shows the goat more frequently when the car is picked first (but not all the time), in which case switching may or may not be the best strategy.
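One way to see how much Monty’s procedure matters is to make it a parameter. The sketch below is my own framing, not part of the classic problem statement: it assumes Monty opens a goat door and offers the switch with one probability when your first pick hides the car, and with another when it hides a goat. The function and argument names are hypothetical.

```r
# Chance that switching wins, given that Monty has opened a goat door and
# offered the switch. Hypothetical parameters:
#   offer_if_car  = P(Monty makes the offer | your first pick hides the car)
#   offer_if_goat = P(Monty makes the offer | your first pick hides a goat)
p_switch_wins <- function(offer_if_car, offer_if_goat) {
  picked_car  <- (1/3) * offer_if_car
  picked_goat <- (2/3) * offer_if_goat
  picked_goat / (picked_car + picked_goat)   # Bayes' theorem
}

always   <- p_switch_wins(1, 1)     # Monty always offers: the textbook case
trickery <- p_switch_wins(1, 0)     # Monty offers only when you hold the car
leaning  <- p_switch_wins(1, 0.5)   # Monty offers twice as readily when you hold the car

print(c(always = always, trickery = trickery, leaning = leaning))
```

The three cases give two-in-three, zero, and one-half respectively, so the “right” answer depends entirely on what you believe about Monty.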

The part where I yell
The problem here is that the Monty Hall problem MAKES NO SENSE WITHOUT AN EXPLICIT PRIOR on Monty Hall’s behavior. Sorry for the yelling, but the point is too important to miss. In this case, the prior is your belief about the procedure Monty is using, and how strongly you hold that belief to be true. The notion of a “prior” might be difficult to explain to a general audience, but assuming a particular one without stating it directly is poisonous. The Monty Hall problem, like many others, can’t be turned into math without first assuming some kind of probability distribution for the inputs.

Usually, when the distribution of an input isn’t specified, we tend to assume that every possible option has an equal chance of occurring; in other words, that we have a uniform probability distribution. This makes sense for another hidden assumption in the problem: that either the game show contestant has made his first pick randomly, or that the prizes were placed behind doors randomly. Though even here I tend to agree with mathematical historian Byron Wall, who argues that our default assumption of a set of equally likely events is problematic. But in the case of the Monty Hall problem, there’s no uniform to even assume. The set of possible ways that Mr. Hall could decide to act is infinite and unknowable.

How does Hall pick between the goats?
Another hidden assumption is that Monty randomizes which door to reveal if the unpicked doors are both hiding goats. If he didn’t, and you knew for sure that Monty would always pick the door with the lower number whenever he had a choice, then the math works out differently. Now, if you pick door number 1 and Monty shows you a goat behind door number 3, you know for sure the car must be behind door number 2. Switching guarantees you a win! If you pick door number 1 and Monty opens door number 2, that could mean either a car or a goat is behind door number 3. To calculate your odds of winning by switching, you can use Bayes’ theorem to find the probability that a car is behind door number 3, given that Monty revealed a goat behind door 2.

Work out the math, and you should get one-half. In other words, if Monty shows you door number 2, and if he’s using the rule stated above, then switching doors gives you a one-half probability of winning, as does staying with the door you have. It doesn’t matter. No matter which door Monty reveals, switching your pick is never worse than not switching, and sometimes it’s better to switch. That means it’s what game theorists call a dominant strategy, one you would always want to employ. Even so, since Monty’s door revealing rules can change your odds of winning, this is another hidden assumption that should have been made explicit.
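For anyone who would rather check that than take my word for it, here is the Bayes’ theorem calculation as a short R sketch. It assumes you picked door 1, the car is equally likely to be behind any door, and Monty follows the lowest-number rule; the variable names are mine.

```r
# You picked door 1. Monty opens the lowest-numbered goat door among 2 and 3.
prior <- c(1/3, 1/3, 1/3)            # P(car behind door 1, 2, 3)

# Likelihoods P(Monty opens door d | car location) under the lowest-number rule
opens_2 <- c(1, 0, 1)   # car at 1 -> opens 2; car at 2 -> cannot; car at 3 -> must
opens_3 <- c(0, 1, 0)   # he opens 3 only when the car is behind door 2

posterior <- function(likelihood) likelihood * prior / sum(likelihood * prior)

post_given_2 <- posterior(opens_2)   # switch target is door 3
post_given_3 <- posterior(opens_3)   # switch target is door 2

# Overall chance of winning with an always-switch strategy
overall <- sum(opens_2 * prior) * post_given_2[3] +
           sum(opens_3 * prior) * post_given_3[2]

cat("P(win by switching | opens 2):", post_given_2[3], "\n")
cat("P(win by switching | opens 3):", post_given_3[2], "\n")
cat("Overall:", overall, "\n")
```

Averaged over both cases, always switching still wins two times in three; the rule only changes how that probability is split between the two doors Monty might open.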

Back when goats were golden
When the Monty Hall problem was originally described to me, I assumed that Monty had chosen a door to reveal at random, and that this door just happened to contain a goat. Perhaps not the most reasonable assumption to make, but at the time I was still young enough to think that winning a goat might be cooler than winning some K-car convertible (hum… maybe I still believe that). At any rate, I didn’t have the skills to work out a solution under my assumptions back then, but doing it now takes just a little bit of work.

The probability that you will win after switching, given that Monty “accidentally” reveals a goat, is actually the sum of two other probabilities. The first probability, that you will win by switching if both of the other doors contain goats, is zero. The second is the probability that only one of the two other doors was hiding a goat, in which case you will win for sure, since we already assumed that Monty revealed a goat. Because we know that Monty picked a goat by accident, we gain no additional information about the door we picked or the alternative we might switch to. Each one is equally likely to have the car, so switch or not, our probability of winning is one-half.
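A simulation makes the conditioning explicit. This sketch assumes Monty opens one of the two unpicked doors uniformly at random, and then throws away every game in which he accidentally exposed the car. The seed and variable names are arbitrary.

```r
# "Accidental" Monty: he opens one of the two unpicked doors at random,
# and we keep only the games where that door happened to hide a goat.
set.seed(2)  # arbitrary seed
n_games <- 30000
car    <- sample(1:3, n_games, replace = TRUE)
pick   <- sample(1:3, n_games, replace = TRUE)
# Pick one of the two other doors uniformly at random
offset <- sample(1:2, n_games, replace = TRUE)
opened <- ((pick - 1 + offset) %% 3) + 1
other  <- 6 - pick - opened          # door numbers sum to 6

goat_shown <- opened != car          # condition on the accident
p_switch <- mean(other[goat_shown] == car[goat_shown])
p_stay   <- mean(pick[goat_shown]  == car[goat_shown])

cat("P(switch wins | goat shown):", p_switch, "\n")
cat("P(stay wins   | goat shown):", p_stay, "\n")
```

Both rates should come out close to one-half, in contrast to the two-in-three you get when Monty reveals a goat on purpose.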

If you find this explanation confusing, you might want to try Jeffrey Rosenthal’s explanation, which shows how to re-normalize probabilities of events within your target condition.

The Man Who Loved Only Bayes
After publishing her solution, Vos Savant was flooded with letters telling her she got it wrong. I suspect that many of those readers were ignorant of her assumptions, though Vos Savant says that most people fully understood the problem, and simply didn’t accept her solution. One of the few accounts to mention the importance of Monty Hall’s procedural rules, even though that part only comes after 8 pages of discussion, is in Paul Hoffman’s “The Man Who Loved Only Numbers”. To explain why so many people, many of them with advanced degrees, got it wrong, Hoffman quotes mathematician Andrew Vázsonyi:

“Physical scientists tend to believe in the idea that probability is attached to things. Take a coin. You know the probability of a head is one-half. Physical scientists seem to have the idea that the probability of one-half is fused with the coin. It’s a property. It’s a physical thing. But say I take that coin and toss it a hundred times and each time it comes up tails. You will say something is wrong. The coin is false. But the coin hasn’t changed. It’s the same coin that it was when I started to toss it. So why did I change my mind? Because my mind has been upgraded with information. This is the Bayesian view of probability. It took me much effort to understand that probability is a state of mind.”

I might view probability more in terms of degrees of (rational) belief, but the Vázsonyi quote highlights a key component missing in much of science: the direct recognition that you have a prior, and that this prior is a form of bias, very often baked right into the model you have chosen. There is no escape from this bias! The frequentist approach to probability is really just a special case within the world of Bayesian inference, where you have picked an uninformative (or minimally informative?) prior. But even here you have to model the prior. You have to know: how are we assuming that Monty Hall makes his decision about showing the contestant a goat? Is it based on some fixed probability regardless of which door the contestant picks? Does Monty consult the entrails of a chicken? As mentioned before, the world of possibilities is infinite, and no progress can be made in terms of our understanding until we delineate a space in which our prior beliefs will live. Only once we’ve done that, implicitly or (preferably!) explicitly, can we test out our beliefs, and update them based on Monty Hall’s actions.


21
May 11

Problematic quote of the day

“Ellsberg offered several groups of people a chance to bet on drawing either a red ball or a black ball from two different urns, each holding 100 balls. Urn 1 held 50 balls of each color; the breakdown in Urn 2 was unknown. Probability theory would suggest that Urn 2 was split 50-50, for there was no basis for any other distribution.”

From page 280 of Against the Gods: The Remarkable Story of Risk.


5
Nov 10

Livin’ la Vida Poisson

Yes, I did just mix English, Spanish and French. And no, I’m not living the “fishy” life, popular opinion to the contrary. Here’s the story. As someone who spends the majority of his time working online, with no oversight, I notice that I tend to drift a lot. I don’t play solitaire, or farm for virtual carrots, but I do wander over to Reddit more than I should, or poke around in this or that market in virtual assets to see if anything interesting has shown up. To some extent this can be justified. Many, perhaps all, of my profitable ventures have come from keeping my eyes open, poking around, doing my best to understand the digital world. On the other hand, at times I feel like I’ve been drifting aimlessly, that I’m all drift and no focus. My existing projects are gathering dust while I chase after shiny new things.

That’s the feeling, anyway. What does the evidence say? To keep track of what I was really doing, and perhaps nudge me towards more focus, I set a stopwatch to go off every 15 minutes. When it did, I would stop, write down what I was doing at that moment, and continue on. Perhaps you can see how these set intervals might provide an incentive to, shall we say, cheat? Especially right after the stopwatch chimed, I knew that whatever I did for the next few minutes was “free”, untracked. So I decided that I would have to write down everything I did during those 15 minute intervals, which worked sometimes, other times not so well.

My current solution? Set up a bell which chimes at random intervals, with an average time between chimes of 15 minutes. To hear what the bell sounds like, go ahead and try it out; I think you’ll find it makes a nice sound. Go ahead and leave that page open while you read the rest of this post, and see how many times it rings.

At any rate, in order to randomize how long the wait was between chimes, I used a little something called a Poisson process. Actually, what I used was the Binomial approximation to the Poisson built from multiple Bernoulli trials, which results in wait times that are approximately Exponential. Wait! Did you get all that? If so, then skip ahead until things look interesting. Otherwise, here’s more detail about how this works:

In order to determine the length of time between chimes, my computer generates a random number between 0 and 1. If this random number is less than 1/15, then the next chime is in just one minute. Otherwise, the computer generates another random number and adds one minute to the time between chimes. On average, it will take 15 tries to get a number below 1/15, so the average time between chimes will be 15 minutes. However, to call 15 minutes the average is somewhat misleading. Here are the frequencies of different wait times (source code in R at the end):


As you can see, the most common time between chimes is just one minute. Strange, no? What’s going on here is that each test to see if the random number is below 1/15 is a Bernoulli trial, which is basically Italian for “maybe it succeeds, maybe it fails”. In this case “success” has probability of 1/15, and failure happens the other 14 out of 15 times. In cases where the probability of success is small, and you end up doing a lot of trials, the total number of successes over a given time period will have (approximately) the Poisson distribution. The “Poisson” here is a Frenchman, who may or may not have smelled like his surname, but who certainly understood The Calculus as well as anyone in the early 1800s. To get an even better approximation of the Poisson, I could have used trials with probability of success of 1/900, then treated each failure as another second of waiting time. That would have made the graph above smoother.
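To see how close the approximation gets, you can compare the number of chimes per hour under the two trial schemes with the Poisson itself. This is a sketch using R’s built-in density functions; it assumes one independent trial per minute (or per second), which works out to an average of 4 chimes per hour either way.

```r
# Chimes per hour: Bernoulli trials per minute and per second versus Poisson.
k <- 0:12
per_minute <- dbinom(k, size = 60,   prob = 1/15)    # one trial each minute
per_second <- dbinom(k, size = 3600, prob = 1/900)   # one trial each second
poisson    <- dpois(k, lambda = 4)                   # 4 chimes per hour on average

print(round(cbind(k, per_minute, per_second, poisson), 4))
cat("largest gap, per-minute:", max(abs(per_minute - poisson)), "\n")
cat("largest gap, per-second:", max(abs(per_second - poisson)), "\n")
```

The per-second column should track the Poisson column much more closely than the per-minute one does.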

But wait! I didn’t show you a graph of the Poisson. I showed you a graph of something that approximates the exponential distribution. The number of chimes per hour is (roughly) Poisson distributed, but the waiting time between each chime is exponential, which means shorter wait times are more frequent, but no length of time, no matter how long, can be ruled out. In fact, the exponential distribution is the only (continuous) distribution which is “memoryless”. If you have waited 15 minutes for a chime, your expected wait time is still… 15 minutes. In fact, your expected wait is independent of how long you have waited so far. The exponential distribution is a “maximal entropy” distribution, where entropy is related to how much you know. With the exponential, no matter how long you’ve waited, you still don’t know anything more than when you started waiting.
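The memoryless property is easy to check numerically. The sketch below assumes truly exponential waits with a mean of 15 minutes (drawn with rexp, rather than the minute-by-minute approximation the bell actually uses) and looks only at the waits that ran past the 15-minute mark.

```r
# Memorylessness: after waiting 15 minutes with no chime, the remaining
# wait has the same distribution as a fresh one.
set.seed(3)  # arbitrary seed
waits <- rexp(200000, rate = 1/15)        # exponential waits, mean 15 minutes

still_waiting <- waits[waits > 15]        # chimes that took longer than 15 minutes
remaining <- still_waiting - 15           # time left once 15 minutes have passed

cat("mean fresh wait:     ", mean(waits), "\n")
cat("mean remaining wait: ", mean(remaining), "\n")
cat("median fresh wait:   ", median(waits), " (theory:", qexp(0.5, rate = 1/15), ")\n")
```

Both means should sit near 15 minutes, and the median near 15 × ln 2, or about 10.4 minutes.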

If you’ve been tuning out and scanning this post, now would be a good time to tune back in. I promise new and interesting things ahead!

It’s one thing to understand the memoryless property of the exponential, even down to the mathematical nitty-gritty. It’s quite another to actually live with the exponential. No matter how well I know the formulas, I can’t shake the feeling that the longer I have waited between bell rings, the sooner the next chime must be coming. Certainly, it should be due any time now! While I “know” that any given minute has exactly the same probability as the next of bringing the bell with it, the longer I wait, the nearer I feel the next chime must be. After all, the back of my mind insists, once the page loads the wait time has been set in stone. However it was distributed before, it’s now a constant. Every minute you wait you are getting closer to the next bell, whenever it might have been set to come. I keep wanting to know more than I did a minute ago about when the next bell will arrive.

This isn’t the only way in which I find my psyche battling with my intellect. I would also swear that over time the distribution of short waits and long waits evens out. Now, by the law of large numbers, it’s true that the more chimes I sit through, the closer the mean wait time will approach 15 minutes. However, even if you’ve just heard three quick bells in a row, that has absolutely no bearing on how long the wait will be between the next three chimes. The expected wait times going forward are completely independent of the wait times in the past. The mean remains 15 minutes, the median remains 10.4 minutes. Yet that’s not what I feel is happening, and over the past two weeks of experimenting with this I would swear that on days when there are a number of unusually quick intervals, these have been followed, later that very same day, with unusually long intervals. And vice versa. It feels like things are evening out.

It’s possible that when my computer wakes up from a sleep mode, my web browser doesn’t remember where it was in a countdown to refreshing the chime page. So I reload it. Now, in theory, if you “reload” an exponential wait time while in process, this has absolutely no effect on your eventual wait time until the next chime. Yet anytime I reload the page, I have a moment of doubt as to whether I’m “cheating” in some way, to make what would have been a long wait shorter. In this case, the back of my mind says the exact opposite of its previous bias: because I am reloading a page that has been waiting a long time, this means that the wait time would have been really long. By starting the process anew, I’m increasing the chances of a short chime time.
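To quiet that back-of-the-mind voice, here is a rough simulation of impatient reloading. It assumes exponential waits with a mean of 15 minutes and a made-up habit of reloading the page whenever 20 minutes pass without a chime; the 20-minute threshold and all of the names are mine, chosen only for illustration.

```r
# Reloading the page discards the pending wait and draws a fresh one.
# Assumption for illustration: I reload whenever 20 minutes pass with no chime.
set.seed(4)  # arbitrary seed
reload_after <- 20
n_chimes <- 20000
total_wait <- numeric(n_chimes)

for (i in 1:n_chimes) {
  elapsed <- 0
  repeat {
    wait <- rexp(1, rate = 1/15)
    if (wait <= reload_after) {        # the chime arrives before I lose patience
      elapsed <- elapsed + wait
      break
    }
    elapsed <- elapsed + reload_after  # give up, reload, start a new countdown
  }
  total_wait[i] <- elapsed
}

cat("mean wait with reloading:", mean(total_wait), "\n")
cat("mean wait without:       ", mean(rexp(n_chimes, rate = 1/15)), "\n")
```

If the waits really are exponential, the two means should agree to within simulation noise: reloading neither helps nor hurts.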

Before you call me a nut, try living for a while with the timer running in the background. Keep track of what you are doing if you want (and BTW I’ve found this to be very enlightening and more than a little sad), but mostly keep track of how you feel about the timing. Try reloading the page if you don’t hear a chime for a while. How does that feel? I suspect that in some ways humans were very well hard wired to understand probabilities. Yet I also suspect our wiring hinders how we understand probability, a suspicion backed up by all those gamblers out there waiting for the lucky break that’s well overdue.

CODE:

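iters = 1000 # Number of simulated waits
results = rep(0,iters)
for (i in 1:iters) {
	minutes = 1 # The shortest possible wait is one minute
	# Each failed Bernoulli trial (probability 14/15) adds another minute
	while(runif(1)>(1/15)){
		minutes = minutes + 1
	}

	results[i] = minutes
}

# Histogram of the simulated wait times between chimes
hist(results, breaks=40, col="blue", xlab="Minutes")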

30
Aug 10

The Chosen One

Toss one hundred different balls into your basket. Shuffle them up and select one with equal probability amongst the balls. That ball you just selected, it’s special. Before you put it back, increase its weight by 1/100th. Then put it back, mix up the balls and pick again. If you do this enough, at some point there will be a consistent winner which begins to stand out.

The graph above shows the results of 1000 iterations with 20 balls (each victory increases the weight of the winner by 5%). The more balls you have, the longer it takes before a clear winner appears. Here’s the graph for 200 balls (0.5% weight boost for each victory).

As you can see, in this simulation it took about 85,000 iterations before a clear winner appeared.

I contend that as the number of iterations grows, the probability of seeing a Chosen One approaches unity, no matter how many balls you use. In other words, for any number of balls, a single one of them will eventually see its relative weight, compared to the others, diverge. Can you prove this is true?

BTW this is a good Monte Carlo simulation of the Matthew Effect (no relation).

Here is the code in R to replicate:

numbItems = 200
items = 1:numbItems
itemWeights = rep(1/numbItems,numbItems) # Start out uniform
iterations = 100000
itemHistory = rep(0,iterations) # Which ball won each draw

for(i in 1:iterations) {
	# Draw one ball with probability proportional to its current weight
	chosen = sample(items, 1, prob=itemWeights)
	# Boost the winner's weight by a factor of (1 + 1/numbItems)
	itemWeights[chosen] = itemWeights[chosen] + (itemWeights[chosen] * (1/numbItems))
	itemWeights = itemWeights / sum(itemWeights) # Re-normalize
	itemHistory[i] = chosen
}

# Ball number on the x-axis, iteration on the y-axis
plot(itemHistory, 1:iterations, pch=".", col="blue")

After many trials using a fixed large number of balls and iterations, I found that the moment of divergence was amazingly consistent. Do you get the same results?
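If you want something more precise than eyeballing the plot, one option is to track the leading ball’s share of the total weight. The sketch below reuses the same update rule; the function name leader_share and the choice of “more than half the total weight” as the marker of divergence are my own assumptions, so treat it as one possible way to measure the effect rather than the definition of it.

```r
# Track the leader's share of total weight in the weighted-ball scheme.
# The function name and the 0.5 threshold for "divergence" are assumptions.
leader_share <- function(numbItems, iterations) {
  w <- rep(1 / numbItems, numbItems)
  share <- numeric(iterations)
  for (i in 1:iterations) {
    chosen <- sample.int(numbItems, 1, prob = w)
    w[chosen] <- w[chosen] * (1 + 1 / numbItems)   # boost the winner
    w <- w / sum(w)                                # re-normalize
    share[i] <- max(w)
  }
  share
}

set.seed(5)  # arbitrary seed
share <- leader_share(numbItems = 20, iterations = 2000)
first_majority <- which(share > 0.5)[1]   # NA if no ball reaches half the weight
cat("final leader share:", tail(share, 1), " first majority at:", first_majority, "\n")
```

Running this over many seeds and collecting first_majority would give a distribution for the moment of divergence, which you could then compare across different numbers of balls.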