Mar 16

Probability Podcast Ep2: Imprecise probabilities with Gert de Cooman


I happened to be travelling through Brussels, so I stopped by Ghent, the world hotspot for research into imprecise probabilities, and set up an interview with Gert de Cooman. Gert has been working in imprecise probabilities for more than twenty years, is a founding member and former President of SIPTA, the Society for Imprecise Probability: Theories and Applications, and has helped organize many of the ISIPTA conferences and SIPTA Schools.

Topics include fair betting rates, Dutch books, Monte Carlo methods, Markov chains, utility, and the foundations of probability theory. We had a rich, wide-ranging discussion. You may need to listen two (or more!) times to process everything.

Episode on SoundCloud

Jan 14

Probability Podcast

I’ve produced a pilot episode of a “Probability Podcast”. Please have a listen and let me know if you’d be interested in hearing more episodes. Thanks!

The different approaches of Fermat and Pascal
Pascal’s solution, which may have come first (we don’t have all of the letters between Pascal and Fermat, and the order of the letters we do have is a matter of some debate), is to start at a point where the score is even and the next point wins, then work backwards solving a series of recursive equations. To find the split at any score, you would first note that if, at a score of (x,x), the next point for either player results in a win, then the pot at (x,x) would be split evenly. The pot split for player A at (x-1,x) would be the chance of his winning the next game, times the pot amount due him at (x,x). Once you know the split in the case where player A (or B) lacks a point, you can then solve for the case where a player is down by two, and so on.
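A minimal sketch of this backward recursion, assuming every point is an even coin flip between the two players. The function name `pascal_share` and the use of exact fractions are illustrative choices, not anything taken from the letters.

```python
from fractions import Fraction
from functools import lru_cache


def pascal_share(a, b, target):
    """Fraction of the pot due to player A at score (a, b).

    Assumes the first player to reach `target` points wins and that
    each point is won by either player with probability 1/2.
    """
    half = Fraction(1, 2)

    @lru_cache(maxsize=None)
    def share(x, y):
        if x == target:      # A has already won: A takes the whole pot
            return Fraction(1)
        if y == target:      # B has already won: A gets nothing
            return Fraction(0)
        # Work backwards: average the shares at the two possible next scores.
        return half * share(x + 1, y) + half * share(x, y + 1)

    return share(a, b)


print(pascal_share(3, 3, 4))  # tied, next point wins: 1/2
print(pascal_share(2, 3, 4))  # A lacks a point: 1/2 * 1/2 = 1/4
```

The second call is the (x-1,x) case described above with x = 3: player A has to win the next point just to reach the even split at (3,3).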

Fermat took a combinatorial approach. Suppose that the winner is the first person to score N points, and that Player A has a points and Player B has b points when the game is stopped. Fermat first noted that the maximum number of games left to be played was 2N-a-b-1 (supposing both players brought their score up to N-1, and then a final game was played to determine the winner). Then Fermat calculated the number of distinct ways these 2N-a-b-1 games might play out, and which ones resulted in a victory for player A or player B. Each of these combinations being equally likely, each player’s share of the pot should be the number of combinations favoring that player, divided by the total number of combinations.
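Fermat’s counting argument can be sketched by brute force: list every way the remaining 2N-a-b-1 games could go and count those in which player A collects the N-a wins he still needs. The function name below is hypothetical, and the sketch assumes, as above, that all sequences are equally likely.

```python
from fractions import Fraction
from itertools import product


def fermat_share_by_enumeration(a, b, n):
    """Fraction of the pot due to player A, found by listing every outcome.

    a, b: current scores of players A and B; n: points needed to win.
    Assumes each remaining game is equally likely to go to either player.
    """
    games = 2 * n - a - b - 1          # maximum number of games still to play
    wins_needed = n - a                # wins A still needs to reach n
    outcomes = list(product("AB", repeat=games))
    # A takes the match exactly when A wins at least `wins_needed` of these
    # games, because B then cannot also collect the n - b wins B needs.
    favorable = sum(1 for seq in outcomes if seq.count("A") >= wins_needed)
    return Fraction(favorable, len(outcomes))


# A first-to-4 game stopped at 3 to 2: two games at most remain.
print(fermat_share_by_enumeration(3, 2, 4))  # 3/4
```

Enumeration doubles in size with every extra game, so this is only meant to make the counting visible, not to be efficient.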

To understand the two approaches to solving the problem of points I have created the diagram shown at right.

Suppose each pair of numbers in parentheses represents the score of players A and B, respectively. The current score, 3 to 2, is circled. The first person to score 4 points wins. All of the paths that could have led to the current score are shown above the point (3,2). If player A wins the next point then the game is over. If player B wins, either player can win the game by winning the next point. Squares represent games won by player A; the star means that player B would win. The dashed lines are paths that make up combinations in Fermat’s solution, even though these points would not be played out.

Pascal’s solution for the pot distribution at (3,2) would be to note that if the score were tied (3,3), then we would split the pot evenly. However, since we are at point (3,2), there is a one-in-two chance that player A wins the next point and the game outright, and only a one-in-two chance that we will reach point (3,3), at which point there is a one-in-two chance that player A will win the game. Therefore the proportion of the pot that goes to player A is 1/2 + (1/2)(1/2) = 3/4, whereas player B is due (1/2)(1/2) = 1/4.

Fermat’s approach would be to note that there are a total of 4 paths that lead from point (3,2) to the level where a total of 7 points have been played:

* (3,2) → (4,2) → (5,2)
* (3,2) → (4,2) → (4,3)
* (3,2) → (3,3) → (4,3)
* (3,2) → (3,3) → (3,4)

Of these, 3 represent victories for player A and 1 is a victory for player B. Therefore player A should get 3/4 of the pot and player B gets 1/4 of the pot.

As you can see, both Pascal’s and Fermat’s solutions yield the same split. This is true for any starting point. Fermat’s approach is generally agreed to be superior, as the recursive equations of Pascal can become very complicated. By contrast, Fermat’s combinatorial method can be solved quickly using what we now call Pascal’s Triangle or its related equations. However, both approaches are important for the development of probability theory.
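To illustrate the Pascal’s Triangle shortcut, here is a rough sketch that builds one row of the triangle and reads the split straight off it. The helper names are invented for this example, and it again assumes every point is a fair coin flip. Entry k of row m counts the sequences of m games in which player A wins exactly k of them.

```python
from fractions import Fraction


def pascal_triangle_row(m):
    """Return row m of Pascal's Triangle, e.g. row 2 is [1, 2, 1]."""
    row = [1]
    for _ in range(m):
        # Each interior entry is the sum of the two entries above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row


def share_from_triangle(a, b, n):
    """Fraction of the pot due to player A at score (a, b), first to n wins."""
    games = 2 * n - a - b - 1           # maximum number of games left
    row = pascal_triangle_row(games)
    favorable = sum(row[n - a:])        # A wins at least the n - a games needed
    return Fraction(favorable, sum(row))


print(pascal_triangle_row(2))        # [1, 2, 1]
print(share_from_triangle(3, 2, 4))  # 3/4
```

For the score of 3 to 2, row 2 is [1, 2, 1]: player A needs at least one win, so 2 + 1 = 3 of the 4 sequences favor him.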

Dec 13

Prize for statistics students?

In order to promote work on statistical simulations, as well as thinking about deeper issues in data analysis, I’m considering starting a prize for students.

Here are my ideas:

* One prize would be for the most innovative use of Monte Carlo methods to model a problem in pure or applied statistics. This prize would be offered in two divisions: undergraduate and graduate.

* One prize would be for an essay that explores the foundations of probability theory or statistics with an emphasis on epistemological issues. This would be open to all students.

* Prizes would be in the $3,000 – $6,000 range.

* The judging committee would be drawn from professors, students and industry.

What are your thoughts? Specifically:

* If you’re a student, is this something you’d apply for?

* If you’re a professor or instructor, do you think your students would be interested in this? Would you pass along the information to them?

* If you represent a company, could you see advantages to sponsoring one of the prizes?

* What changes or suggestions do you have?

Oct 13

The disgrace of the mandatory census

In 2011, Audrey Tobias refused to provide Statistics Canada with a filled-out copy of her census form, as mandated by law. Her refusal, and her decision to stand by it, led to a trial in which the 89-year-old faced jail time. Although Tobias stated that her act was a protest against the use of US military contractor Lockheed Martin to process the forms, and not against the mandatory nature of the census itself, this was really a trial of the government’s power to compel citizens to provide it with private information. As Tobias’ lawyer, Peter Rosenthal, argued, compelling Tobias to fill out the form on threat of jail was a violation of the Canadian Charter of Rights and Freedoms, and its provisions for freedom of conscience and expression.

The judge in the case, Ramez Khawly, rejected Rosenthal’s argument, but found a way to find Tobias not guilty anyway on the basis of his doubt about her intent in not filling out the form. Perhaps sensing the outrage that might ensue over punishing an octogenarian for a non-violent act of civil disobedience, Khawly was nevertheless too fearful, or obtuse, to uphold an argument that would set a highly inconvenient precedent from the standpoint of the state. The judge both justified and exposed his particular mix of cowardice and compassion by asking, “Could they [the Crown] not have found a more palatable profile to prosecute as a test case?”

I suppose I shouldn’t be surprised by the judge’s politically expedient decision. What shocks me is the reaction of many regular citizens, and in particular of some fellow statisticians. Let me be as clear as possible about this: support for the mandatory census is a moral abomination and a professional disgrace. It should go without saying that informed consent is a baseline, a bare minimum for morality when conducting experiments with human subjects. Forcing citizens to divulge information they would otherwise wish to keep private, on pain of throwing them in a locked cage, does not qualify as informed consent!

There is no point here in arguing that what’s being requested is a minor inconvenience, or an inconsequential imposition. Informed consent doesn’t mean “what we think you should consent to.” More than anything else, statistics is about understanding the inherent uncertainties in measurement, prediction, and extrapolation. That you might not object to answering certain questions is no reason to assume that your preferences are universal. Finally, note that to at least a small group of revolutionaries, the right not to divulge certain information to authorities was so important that it was written right into the Bill of Rights.

Besides the argument that the census is minimally invasive, I’ve also heard it argued that the value of obtaining complete data outweighs concerns of privacy and choice. To this I say that our desire, as statisticians, for complete and reliable data isn’t some ethical trump card, nor is it the scientific version of a religious indulgence that purifies our transgressions.

Dealing with incomplete and imprecise data isn’t some unique problem that can be overcome at the point of a gun; it’s the very heart and soul of statistics! In the real world, there is no such thing as indisputably complete or infinitely precise data. That’s why we have confidence intervals, likelihood estimates, rules for data cleaning, and a wide variety of sampling procedures. In fact, these sampling procedures, if properly chosen and well executed, can be more accurate than a census.

I call on all those who work for StatsCan or other organizations to refuse to participate in any non-consensual surveys, to stand up for their own good name and the good name of the profession, and to focus their energies on finding creative, scientifically sound, non-coercive ways to obtain high quality data.

Jul 13

A probability cookbook

Randomness – Probability = Chance

Chance – Randomness = Fate

Fate + God = Predestination

Probability + Epistemology = Types of Randomness

Subjective Probability = Betting + Coherence

Propensity theory = Probability + Animism

Kolmogorov Axioms = Probability – randomness – chance

Probability + Complexity = Cryptography

Chaos + Ignorance = Randomness

Regression: Data = Signal + Noise

Posterior = Prior × Likelihood
Prior + Data → Probability

Probability → Frequency

Frequency → Probability

Big Data:
Predictive value ≫ Model simplicity
High dimensions + Fast computers = De chao ordo