probability


30
Jan 14

Probability Podcast

UPDATE:

RSS feed is here, let me know if you have any problems with it: http://feeds.soundcloud.com/users/soundcloud:users:18793848/sounds.rss

I’ve produced a pilot episode of a “Probability Podcast”. Please have a listen and let me know if you’d be interested in hearing more episodes. Thanks!

The different approaches of Fermat and Pascal
Pascal’s solution, which may have come first (we don’t have all of the letters between Pascal and Fermat, and the order of the letters we do have is the matter of some debate), is to start at a point where the score is even and the next point wins, then work backwards solving a series of recursive equations. To find the split at any score, you would first note that if, at a score of (x,x), the next point for either player results in a win, then the pot at (x,x) would be split evenly. The pot split for player A at (x-1,x) would be the chance of his winning the next game, times the pot amount due him at (x,x). Once you know the split in the case where player A (or B) lacks a point, you can then solve for the case where a player is down by two and so on.

Fermat took a combinatorial approach. Suppose that the winner is the first person to score N points, and that Player A has a points and Player B has b points when the game is stopped. Fermat first noted that the maximum number of games left to be played was 2N-a-b-1 (supposing both players brought their score up to N-1, and then a final game was played to determine the winner). Then Fermat calculated the number of distinct ways these 2N-a-b-1 games might play out, and which ones resulted in a victory for player A or player B. Each of these combinations being equally likely, the pot should be split in proportion to the number of combinations favoring a player, divided by the total number of combinations.

To understand the two approaches to solving the problem of points I have created the diagram shown at right.

Suppose each pair of numbers in parentheses represents the scores of players A and B, respectively. The current score, 3 to 2, is circled. The first person to score 4 points wins. All of the paths that could have led to the current score are shown above the point (3,2). If player A wins the next point then the game is over. If player B wins, either player can win the game by winning the next point. Squares represent games won by player A; the star means that player B would win. The dashed lines are paths that make up combinations in Fermat’s solution, even though these points would not be played out.

Pascal’s solution for the pot distribution at (3,2) would be to note that if the score were tied (3,3), then we would split the pot evenly. However, since we are at point (3,2), there is a one-in-two chance that player A wins the next point and the game outright, and only a one-in-two chance that we will reach point (3,3), at which point there is a one-in-two chance that player A will win the game. Therefore the proportion of the pot that goes to player A is 1/2 + (1/2)(1/2) = 3/4, whereas player B is due (1/2)(1/2) = 1/4.
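Pascal’s backward recursion can be sketched in a few lines of Python. This is a minimal illustration rather than Pascal’s own notation: the function name pascal_share and the parameter target are made up for this example, and it assumes every point is a fair 50/50 toss.

```python
from fractions import Fraction
from functools import lru_cache


def pascal_share(a, b, target):
    """Share of the pot due to player A at score (a, b), first to `target` wins.

    Assumes each remaining point is won by either player with probability 1/2.
    """
    half = Fraction(1, 2)

    @lru_cache(maxsize=None)
    def share(x, y):
        if x == target:   # player A has already won: A takes the whole pot
            return Fraction(1)
        if y == target:   # player B has already won: A gets nothing
            return Fraction(0)
        # Work backwards: average the shares at the two possible next scores.
        return half * share(x + 1, y) + half * share(x, y + 1)

    return share(a, b)


print(pascal_share(3, 3, 4))  # tied, next point wins
print(pascal_share(3, 2, 4))  # the circled score in the diagram
```

Using exact fractions keeps the result readable as a pot split (for example 3/4) instead of a rounded decimal.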

Fermat’s approach would be to note that there are a total of 4 paths that lead from point (3,2) to the level where a total of 7 points have been played:

(3,2)→(4,2)→(5,2)
(3,2)→(4,2)→(4,3)
(3,2)→(3,3)→(4,3)
(3,2)→(3,3)→(3,4)

Of these, 3 represent victories for player A and 1 is a victory for player B. Therefore player A should get 3/4 of the pot and player B gets 1/4 of the pot.
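Fermat’s counting argument can also be checked by brute force. The sketch below is one possible way to do it, with a hypothetical function name (fermat_share): it lists every way the remaining 2N-a-b-1 games could go, including games that would never actually be played, and counts the sequences in which player A collects the points he still needs.

```python
from fractions import Fraction
from itertools import product


def fermat_share(a, b, target):
    """Share of the pot due to player A at score (a, b), first to `target` wins.

    Enumerates all equally likely outcomes of the maximum number of
    remaining games, as in Fermat's combinatorial approach.
    """
    games_left = 2 * target - a - b - 1   # maximum number of games still needed
    a_needs = target - a                  # points player A still has to score
    outcomes = list(product("AB", repeat=games_left))
    # A wins the match exactly when A takes at least `a_needs` of these games.
    a_wins = sum(1 for seq in outcomes if seq.count("A") >= a_needs)
    return Fraction(a_wins, len(outcomes))


print(fermat_share(3, 2, 4))  # 3 of the 4 paths favor player A
```

Enumeration grows as a power of two, so for long matches one would count with binomial coefficients instead; the listing here is only meant to mirror the four paths above.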

As you can see, both Pascal’s and Fermat’s solutions yield the same split. This is true for any starting point. Fermat’s approach is generally agreed to be superior, as the recursive equations of Pascal can become very complicated. By contrast, Fermat’s combinatorial method can be solved quickly using what we now call Pascal’s Triangle or its related equations. However, both approaches are important for the development of probability theory.


31
Jul 13

A probability cookbook

Randomness – Probability = Chance

Chance – Randomness = Fate

Fate + God = Predestination

Probability + Epistemology = Types of Randomness

Subjective Probability = Betting + Coherence

Propensity theory = Probability + Animism

Kolmogorov Axioms = Probability – randomness – chance

Probability + Complexity = Cryptography

Chaos + Ignorance = Randomness

Regression: Data = Signal + Noise

Bayesian:
Posterior = Prior × Likelihood
Prior + Data → Probability

Probabilist:
Probability → Frequency

Statistical:
Frequency → Probability

Big Data:
Predictive value ≫ Model simplicity
High dimensions + Fast computers = De chao ordo


1
Jul 13

Morality needs probability, manifesto addendum

Just added to my Big Bright Green Manifesto Machine. You might need to read this through a couple times; it’s a difficult concept since it lives in a collective blind spot for us:

Doing ethics without probability is like performing surgery with a wooden spoon — it’s a blunt instrument capable of only the most basic operations, and more likely to kill the patient than heal them. Implicitly, we understand this need for probability in making ethical judgements, yet most people recoil when the calculus of probabilities is made explicit, because it seems cold, because the math frightens and confuses them, or because letting odds remain unestimated and unacknowledged allows people to confuse positive outcomes with moral behavior, sweeping hidden risks under the rug when things go well, or claiming ignorance when they don’t. It’s time to acknowledge — directly, explicitly, mathematically — that morality needs probability. For ethics to move forward it must be integrated with our knowledge of randomness and partial entailment.

Here’s an example of how we already take probability into account implicitly. If we retrieve our lost ball from someone’s yard without asking first, we justify this based on our belief that the owner is more likely to be bothered by us interrupting their dinner than by our temporary trespass on their lawn. The greater the probability of great harm, the higher the level of certainty we demand. Our most heated debates involve situations where the probability of harm from both action and inaction is high. If someone’s dog is stuck in a hot car on a sunny day, should you break in and try to save it? Does the chance of a dog dying of heatstroke justify a forced entry that will probably result in expensive damage and an irate owner (though it’s possible they would be grateful instead)? If you decide to break in, how long should you wait first? What prior distribution should you put on the owner’s return time, and how do you update your prior as time goes by? If the waiting time is chi-square on low degrees of freedom, your concern for the dog might be unjustified. If it follows the unreliable friend distribution, you may be that dog’s only hope.

As I hope is becoming clear, questions of morality cannot be resolved without asking questions about probability. If the example above seems trivial (perhaps the owner’s property rights trump your concern for a dog), then replace the animal with a toddler who looks uncomfortably warm. Now how long do you wait, and how do you deal with the risk that smashing a window might harm the child?


21
May 13

What are the chances this headline will still be true in 10 years?

In this post I’ll be discussing the ideas presented in The Half-Life of Facts, by Samuel Arbesman. The book argues that facts, which we often take to be iron-clad, unchanging laws of the universe, are regularly discovered to be false or replaced by updated versions. He argues that while it’s impossible to predict in advance how long a particular fact will endure, in aggregate truth values decay at stable rates. In effect, Arbesman is proposing a kind of Law of Large Numbers for belief.

Arbesman’s thesis, I should say right up front, is highly appealing to me. It fits my belief that all facts are, to some extent, fuzzy, uncertain, contingent, and most importantly prone to revision over time as new information comes in. Of course, some facts, or categories of facts, are more likely to be revised than others. What I hoped to get from Arbesman’s book was a deep analysis of why some facts (or fictions) last longer than others, and how you might quantify different categories of facts from the viewpoint of survival analysis.

What are facts?
Arbesman defines facts as “individual states of knowledge awareness.” His main way of subdividing facts is on the basis of how quickly they change, from those constantly in flux (the current weather) to the very stable (the number of continents). In between are what Arbesman calls “mesofacts,” those which change at an intermediate timescale. Most of our scientific knowledge fits in this category.

When I mentioned the continents, you may have wondered whether I was referring to the number of huge landmasses on earth (a slow-changing fact, by any measure), or what we consider to be a continent. For example, if scientists decide that Madagascar or Baffin Island should be called a continent, the quantity of large land-masses on earth hasn’t changed.

Brawndo, the thirst mutilator!
This may seem like an obvious distinction, but it’s one that Arbesman fails to make. He conflates facts about the earth with nomenclature, confusing words with objects. The worst example of this confusion occurs in the chapter on how facts spread. Arbesman explains how we came to use the word “brontosaurus” for what, by scientific convention, should be called the apatosaurus, as this name came first. Here the “fact” that changed doesn’t really have anything to do with the nature of dinosaurs; it has to do with the name we’ve decided to give them (which is, of course, a matter of convention, and arbitrary). To Arbesman, though, this issue of nomenclature becomes an “erroneous” fact which has “sadly” persisted for way too long.

The conflation of semantics and understanding allows Arbesman to hide a normative decree in a linguistic assessment. If my explanation of the confusion between the descriptive and prescriptive is, itself, confusing, consider Mike Judge’s wonderful illustration from the film Idiocracy. The main character tries to explain to the people of the future that their plants are dying because they are being irrigated with Brawndo, a sports drink. Here’s how their conversation goes:

Arbesman’s failure to draw a line around what are facts, and what aren’t, leads to even deeper confusions. Making this distinction clear would be, no doubt, a very difficult task. But instead of attempting it, and risking falling into “an epistemological rabbit hole,” Arbesman shrugs and paraphrases the supremely weaselly Supreme Court Justice Potter Stewart, who said that no precise, legal definition of pornography was needed, because “I know it when I see it.”

Without a line (however fuzzy) drawn around his subject, Arbesman quickly wanders off from an insightful discussion of the decay rate of information in physics, medicine and scientific models in general, to a broad discussion of the things in our world that change. This transition is completed in the chapter titled “Moore’s law of everything,” in which Arbesman compares exponential growth in computing power to other technologies with accelerating levels of change, like transportation. At this point it’s no longer clear which are the facts under consideration. Is it the maximum number of transistors per chip? Is it our model of how technology changes? Or is it the rate of change of change itself?

Is change a constant?
This last question might be the most interesting one of all. More clearly stated, what is the derivative of the half-life of facts, for a given category? And even one more step beyond, are these derivatives themselves stable? I want to know what the evidence says. Are medical facts becoming obsolete faster than ever? Has our knowledge about basic physical concepts like inertia begun to solidify? Arbesman hints at these questions, but just barely. I was very disappointed by his lack of rigor and quantification. Perhaps this field of study still needs its Darwin or John Graunt, someone willing to spend years or decades compiling and analyzing the minutiae of how facts change, before coming up with a well-informed model of truth decay.

My own suspicion? The stability of a fact is proportional to how well the related field of study is established, and to how long that particular fact has been considered valid. Thus the lifespan of facts would be Weibull distributed, or have some variant of the Unreliable Friend distribution (more about that in a future post). Arbesman hints at this possibility when discussing the history of mathematical proof. He notes that the waiting time for a conjecture to be settled follows a heavy-tailed distribution, which makes it difficult to predict how much longer it will take for mathematicians to come to a conclusion about long-standing problems.
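One way to see what this suspicion implies: a Weibull distribution with a shape parameter below 1 has a hazard rate that falls over time, so a fact that has already survived a long while becomes less likely to be overturned in the next instant. The sketch below uses the standard Weibull hazard formula; the function name and the particular shape and age values are illustrative assumptions, not estimates from any data.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate of a Weibull lifetime at age t > 0.

    h(t) = (shape / scale) * (t / scale) ** (shape - 1)
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical ages of a fact, in arbitrary time units.
for age in (1, 10, 100):
    falling = weibull_hazard(age, shape=0.5)   # shape < 1: older facts are safer
    constant = weibull_hazard(age, shape=1.0)  # shape = 1: a true fixed half-life
    print(age, round(falling, 4), round(constant, 4))
```

With shape equal to 1 the Weibull reduces to the exponential distribution, which is the only case in which a single constant half-life describes the decay.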

But even this attempt at a more nuanced view of half-lives hints at another problem with Arbesman’s incomplete taxonomy of facts, and his unwillingness to specify which facts we are discussing. In this case of mathematics, it seems at first that he might be referring to the underlying proposition itself. This leads me to wonder if Arbesman is positing (at least implicitly) a Schrödinger’s cat view of mathematics, where Fermat’s Last Theorem (FLT) exists in a state of superposition, both true and false and indeterminate all at once, waiting for Andrew Wiles to come along to open the lid, peer into the box, and declare it “true.” Another interpretation is that the fact being discussed is the social phenomenon; mathematicians went from believing that FLT was probably true but definitely unproven, to believing that FLT was indisputably true. Based on his initial definition in terms of awareness, I assume it’s the latter. Unfortunately, no clarification is forthcoming, and Arbesman misses out on an opportunity to comment on the two most interesting twists in the FLT saga, especially from the point of view of evaluating “facts”. For one, Wiles made a crucial mistake in his first official version of the proof, and for the other, Wiles’s proof depends on a newer, and somewhat controversial, mathematical assumption (the Axiom of Choice).

Chart from The Half-Life of Facts showing the increase in transportation speeds over time.

The depths of shallowness
I suppose there’s a limit to how much depth we can expect from a general interest book. Still, I’m disappointed that the author seems to explicitly avoid discussing the basic, hard puzzles of knowledge: How close to the (real?) truth are the “facts” we are learning today? What is the probability that these will be later found out to be untrue? Does that probability go to one on a long enough timeline, and to what extent can we quantify that timeline?

Instead of rigorous analysis, Arbesman fills out his short book by rehashing famous stories from well-known research papers (if I have to read about the gorilla on the basketball court one more time, I just might go apeshit). We do get occasional bits of insight, usually in the form of quotes, like Lord Kelvin’s insistence that anything that can be measured, can be measured incorrectly, or John M. Smith’s quip that “Statistics is the science that lets you do twenty experiments a year and publish one false result in Nature.”

This last quote refers to the p-value, which Arbesman does a decent job of explaining, though I’m not sure he fully understands it. He quotes John Ioannidis saying that, “If a study is small, it can yield a positive result more easily due to random chance.” However, the use of a fixed p-value cutoff generally ensures that the exact opposite is true (see this delightfully humorous video about “The power of the test”). The structure of hypothesis testing can be tricky, but since Arbesman is described on the book jacket as an applied mathematician, I’m not willing to grade him on a curve.
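The point about fixed cutoffs can be checked with a small simulation. The sketch below is a toy under stated assumptions, not anything taken from the book: it uses a two-sided z-test with known standard deviation 1, a cutoff of 0.05, and hypothetical helper names (z_test_p_value, rejection_rate). When the true effect is zero, the share of “positive” results should sit near 5% for small and large studies alike; when a real effect exists (0.3 here), the small study detects it less often.

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, with known standard deviation."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(n, true_mean, trials=5000, alpha=0.05, seed=0):
    """Fraction of simulated studies of size n with p-value below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            hits += 1
    return hits / trials


for n in (10, 100):
    print(n,
          rejection_rate(n, true_mean=0.0),   # false positives: near alpha
          rejection_rate(n, true_mean=0.3))   # power: grows with sample size
```

The z-test with known variance is chosen only because its p-value can be computed with the standard library; a t-test would behave the same way with respect to the cutoff.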

There’s one other confusion in Arbesman’s book that I feel compelled to point out, since it may just be the most insidious (and common) epistemological mistake of all: the conflation of facts, predictions, and models. Arbesman mixes them all together in a short passage. In describing a computer simulation of a social network, he says:

“When [the researchers] ran this experiment, they discovered that weak ties aren’t that important to spreading knowledge. While weak ties do in fact hold the network together, much as Granovetter suspected, they aren’t integral for spreading facts.”

Did you catch that? Arbesman went from describing a model (in this case a computer simulation) that generated a prediction (about the spread of information), to asserting a fact about our world (weak ties “aren’t integral for spreading facts”).

Am I just being annoying, noxious, always lingering?
Am I being overly fussy (to use the nicer word)? Am I too focused on precise definitions and picky distinctions, at the cost of missing the bigger picture? I don’t think so. The history of scientific progress, and in particular statistics, shows a strong correlation between linguistic and taxonomic advances. We can look back and see how progress is stifled by a lack of common, well-defined terms. For example, some of the early attempts to understand probability disintegrated into confused debates that could have been avoided with a clear stating of terms. More recently, E.T. Jaynes resolved Bertrand’s paradox of the random chord by explicitly defining the characteristics a “random” chord would need to have.

If Arbesman is sloppy with the details, can he at least get credit for presenting the broader story in context? To some extent, I think so. As a general tour of how facts change, there’s no mistaking the basic message: facts do change, and we can be particularly blind (or caught off guard) when it comes to changes which happen at a medium pace. I wish, though, that Arbesman had explicitly connected this broader story with what is, to me, the central lesson: all of our beliefs should come with a measure of doubt!

To understand this doubt mathematically, we use probability theory. To understand it in practice, we use a framework for statistical inference. There are a number of these frameworks available, each with its own strengths and weaknesses. Hume said we could never infer anything from anything, giving us a kind of historical “null hypothesis” of inference, one that’s been soundly rejected by the evidence of scientific and technological progress. Fisher and von Mises maintained that probability should be restricted to long term frequencies. Keynes and Jeffreys spoke of subjective probabilities and degrees of rational belief. Jaynes viewed probability theory as an extension of logical deduction.

All modern approaches to inference share the assumption that knowledge is not static, and that empirical evidence provides partial information. Full certainty, to the extent that it exists at all, is to be found only in the very long run (mathematically speaking, at the infinite limit). As such, we need to recognize the provisional nature of all facts.


5
Mar 13

My favorite randomization device

My recent look at JavaScript as a contender for statistical modeling got me thinking about the different methods used to create random variates. All computer algorithms create Type 1 randomness, which is to say, completely deterministic once you either figure out the underlying algorithm or once you see every number in the algorithm’s period. Jumping outside of software to the hard world around us, it seems possible to create Type 2 or even Type 3 randomness, at least from the perspective of an observer who can’t base their predictions on real-time analysis of the generating mechanism (i.e., they can’t watch it tick).

My favorite example of a real-world solution to randomizing is shown in the video at top. More details about the construction of the device are here.

What’s your favorite (hardware or virtual) randomization device?