- Larry Wasserman, Professor at Carnegie Mellon University, is a graduate of University of Toronto, a COPSS Award winner, and a leading statistician in Bayesian analysis and inference. In this post, he discusses his views on the question
*Is Bayesian Inference a Religion?* - Two people will each spend 15 consecutive minutes in a bar between 12:00pm – 1:00pm. Assuming uniform and independent arrival times, what is the probability that they will have a chance to clink glasses?
- Have you ever wondered which statistical package gives the fastest computational speeds? This quick comparison of Julia, Python, R and pqR provides some guidence.
- An interesting analysis of the most popular porn searches in the US.
- A quiz for everyone in the data visualization industry: Identify at least three problems with this chart and explain what you can do to make it better.
- R user groups continue to thrive worldwide. Joseph Rickert from Revolution Analytics prepares the following compilation of the locations of 127 R user groups around the world.

## r

9

Sep 13

## The week in stats (Sept. 9th edition)

3

Jul 13

## The hat trick

In his book Quantum Computing Since Democritus, Scott Aaronson poses the following question:

Suppose that you’re at a party where every guest is given a hat as they walk in. Each hat has either a pineapple or a watermelon on top, picked at random with equal probability. The guests don’t get to see the fruit on their own hats, but they can see all of the other hats. At no point in the evening can they communicate about what’s on their heads. At midnight, each person predicts the fruit on their own hat, simultaneously. If more than 50% of the guests get the correct answer, they’re given new Tesla cars. If less than 50% of the guests get it right, they’re given anxious goats to take care of. What strategy (if any) can they use to maximize their chances of winning the cars?

Answer: there is no strategy that works.

Kidding! Of course there’s a strategy, as you can tell by the length of this post. Did you come up with any ideas? At first glance, it seems like the problem has no solution. If you can’t communicate with the other party goers, how can you find out any information about the fruit on your own head? Since each person was independently given a pineapple or a watermelon with equal probability, what they have on *their* heads tells you nothing about what you have on *your* head, right?

My own initial strategy, after considerable (but not enough!) thought, was to bet on regression to the mean. Suppose you see 7 pineapples and 2 watermelons. The process of handing out hats is more likely to generate a pineapple/watermelon ratio of 7 to 3 than 8 to 2 (it’s *most* likely to generate an equal number of each type, with every step away from a 5/5 ratio less and less likely). Thus, I figured it would be best to vote that my own hat moved the group closer towards the mean. Following my strategy, we all ended up with goats. What did I do wrong?

The key to solving this problem is to realize that the initial process for handing out hats is irrelevant. All that matters is that, from the perspective of a given person, they are a random sampling of 1 from a distribution that is *known* to have either 7 pineapples and 3 watermelons, or 8 pineapples and 2 watermelons. Thus, each person knows that the probability a randomly sampled guest will have a pineapple on their head is somewhere between 70% and 80%. More precisely, it’s *either* 70% or 80%. In any case, so long as every person votes for themselves being in the majority, then the majority of guests will be voting that they are in the majority.

I simulated this strategy using parties of different sizes, all of them odd (to avoid the issue of having and equal number of each hat type). Here’s the plot, with each point representing the mean winning percentage with 500 trials for each group size. As always, you can find my code at the end of the post.

As you can tell from the chart, once we have 11 or more guests, it’s highly likely that we all win Teslas.

One way to look at this problem is through the lens of the anthropic principle. That is, we need to take into account how what we observe gives us information about ourselves, irrespective of the original process that made each of our hats what they are. What matters is that from the perspective of each party goer, their view comprises a random sampling from the particular, finite distribution of pineapples and watermelons that was set in stone once everyone had entered the room. In other words, even if the original probably of getting a pineapple was 99%, if you see more watermelons than pineapples, that’s what you should vote for.

This problem, by the way, is related to Condorcet’s Jury Theory (featured on the most recent episode of Erik Seligman’s Math Mutation podcast). Condorcet showed, using the properties of the binomial distribution, that if each juror has a better than 50% chance of voting in accordance with the true nature of the defendant, then the more jurors you add, the more likely the majority vote will be correct. And vice versa. Condorcet assumed independence, which we don’t have because our strategy ensures that every person will vote the same way, so long as the difference between types of hats is more than 2.

# Code by Matt Asher for StatisticsBlog.com # Feel free to modify and redistribute, but please keep this header set.seed(101) iters = 500 numbPeople = seq(1, 41, 2) wins = rep(0, length(numbPeople)) cntr = 1 for(n in numbPeople) { for(i in 1:iters) { goodGuesses = 0 hats = sample(c(-1,1), n, replace = T) disc = sum(hats) for(h in 1:n) { personHas = hats[h] # Cast a vote based on what this person sees personSees = disc - personHas # In case of a tie, the person chooses randomly. if(personSees == 0) { personSees = sample(c(-1,1),1) } personBelievesHeHas = sign(personSees) if(personBelievesHeHas == personHas) { goodGuesses = goodGuesses + 1 break } } if(goodGuesses > .5) { # We win the cars, wooo-hooo! wins[cntr] = wins[cntr] + 1 } } cntr = cntr + 1 } winningPercents = wins/iters plot(numbPeople, winningPercents, col="blue", pch=20, xlab="Number of people", ylab="Probability that the majority votes correctly")

30

May 13

## Uncovering the Unreliable Friend Distribution

Head down to your local hardware store and pick up a smoke detector. Pop off the cover and look inside. You’ll see a label that mentions Americium 241, a radioactive isotope. Put on your HEV suit, grab a pair of tweezers and a fine-tipped pen, and remove the 0.3 millionths of a gram of Americium. If you need reading glasses, now might be a good time to put them on. Pick out one of atoms and label it with an X. Now watch closely. Sooner or later, it will spit out an Alpha particle.

Just how long will you have to wait? Decay rates are measured in half-lives, which is the amount of time needed for half of the particles to decay (any particular atom has a 1/2 chance of decaying in this time as well). The stated half-life for this isotope is 432 years, and your waiting time will follow an exponential distribution. The strange, oddly beguiling quality about this distribution is that the conditional probabilities remains constant. In other words, no matter how long you’ve waited, there’s still a 1 in 2 chance that your Americium isotope will decay in the next 432 years. Waiting for an exponentially distributed event to happen leads to an odd feeling, at least for me. The longer you wait, the more you “expect” the event to happen soon, even knowing that your expected wait time never changes. I wrote about that feeling previously, and created an exponential timer you can try out for yourself. I would suggest setting it to less than 432 years.

**Cranking uncertainty up to 11**

Recently, as I waited patiently for my own particle of Americium to give up its Alpha, I got to thinking about conditional uncertainty. No matter how long we wait for our event, we never get any smarter about when it will happen. But we don’t get any dumber, either. Would it be possible, I wondered, to build a kind of “super-exponential” distribution, where the longer we wait, the less we know. In other words, can we take our level of uncertainty up to 11?

Imagine the following scenario: first we sample from a standard uniform distribution, which gives us a number somewhere between 0 and 1. Call this number . Then we take (without looking at it!), and plug it into the exponential distribution as the parameter . This gives us a random variable with a mean waiting time of for the first occurrence. (Note that this mean *isn’t* the same as the half-life, which is actually the median. To convert from mean to half-life, multiply by the natural log of 2).

My prediction was that this method would increase the *overall* level of uncertainty about our waiting time, and, even worse, make our uncertainty grow over time. Why? The longer we’ve waited, I figured, the more likely our (presumed) will be small, which in turn means the expectation and variance of our exponential waiting time grow, widening our confidence intervals.

At this point, I had the vague feeling that this probability distribution should already exist as a known thing, that it may even be a version of something I’ve encountered before. Another way to look at the exponential is in terms of the failure rate, or, conversely, the survival rate. When Ed Norton, the un-named narrator of Fight Club (I know, I know), says that “on a long enough timeline, the survival rate for everyone drops to zero,” this is what he means. Only Norton is referring to the cumulative survival rate, whereas it’s usually most interesting to look at the instantaneous (or marginal, for my economist friends) rate. For the exponential this rate is constant, ie flat. There is a distribution specifically crafted to let you simulate failure rates when the rate itself is variable, it’s called the Weibull. It can be used to model products whose expected durability increases with time (note that we are not saying the product becomes more durable over time, but that the fact that it *has* survived tells us that it is highly durable). Did I just rediscover the Weibull, or one it’s friends in the same family of extreme distributions?

Before breaking out my great big Compendium of Probability Distributions, I dove right in with a quick Monte Carlo simulation. As with all my posts using R, you’ll find the code at the end of this post.

**A wave of plots**

Here’s the histogram for our sample, with the rightmost tail chopped off (because your screen, unlike mine, is just too damn small):

So it looks like a variant of the exponential, but this plot doesn’t tell us much. To really understand the distribution we have to see it as if we were inside the distribution, waiting for the event to happen. All we know is the process, and we have to come up with a guess about our distribution curve conditional on how long we’ve waited so far. In order to understand this curve, we first need to make a guess about , which is to say . Can we put a probability distribution on given how long we’ve been waiting so far? Yes, we most certainly can! And, because our prior distribution on is uniform (of course), our posterior is our likelihood. Here’s what our (posterior) curves look like:

Each curve is a probability distribution on our belief about . In other words, the peaks represent what we believe to be the *most likely* value for , given how long we’ve waited so far. The biggest curve is our distribution for after waiting for one unit of time (let’s just call them “minutes”). As you can tell, if we continue to wait, our maximum likelihood estimate (MLE) for shifts left, and it looks like our curve flattens out. But wait! Each of these curves has a different area. To treat them like a true probability distribution, we should normalize each of the areas to one. Here’s what those same curves look like after normalization:

From this handsome chart (the same one from the beginning of the post), we can tell that expected range of values for is narrowing, not broadening. So could our uncertainty be decreasing along with our wait, as we hone in on the true value of ? Let’s take a look at what happens to our additional wait time as time passes.

You can think of these curves as the chance that your friend will show up in the coming minutes, given how long you’ve already been waiting. At the very beginning of your wait, modeled by the orange curve at the far left, you can be almost certain that your friend will show up in the next 10 minutes. But by the time you’ve been waiting for 500 minutes, as seen in the blue curve at the far right, you are only 50% sure that she will show up in the next 500 minutes. Are those probabilities exact? It seems like it, but let’s zoom in on the first 25 minutes:

The X’s represent the median time for your friend’s arrival. If this was always equal to your wait time so far, all of the X’s would be in a straight line at 0.5. From this plot, it’s clear that this is not the case from the beginning, but only becomes so as you wait longer. So what have we got here? At this point I’m at the limit of what I can get out of Monte Carlo. It’s time to do math! (or not, feel free to skip this next section).

**The formula**

To get the pdf for this distribtion, I start by noting that if we had two possible choices for with a one-half chance each of being picked, then the probability our waiting time would be less than would be:

where is the cumulative distribution function (CDF) of the exponential distribution with parameter . If you really know your exponential, you may have noticed some similarities with the hyperexponential distribution, but we’re gonna take it to the limit, and create a kind of hyper-hyperexponential. More generally, for a sample of :

Since the are uniformly distributed, the more of them we sample, the more our order statistics are going to look like where our sample size is n (proof is left as an exercise for you, my dear reader).

Ready to take it to the limit?

Solving this integral, we get:

**Did we get it right?**

Maybe you trust my math, maybe you don’t and skimmed over the last section. Either way, let’s see how well the math matches the data. Here I’ve plotted the log of the observed (Monte Carlo) density versus what the math says it should be:

Looks like we nailed it, no? But wait, why are the blue points at the beginning of the curve in between the red points? That’s because we took the differences between points on the empirical CDF, so each density reading is really in-between the true pdf values. So far as Monte Carlo confirmation goes, it doesn’t get much better than this.

**Introducing, the Unreliable Friend Distribution!**

So far as I can tell, other than the hyperexponential, which is merely similar and more limited, this is a brand new distribution. Have you ever been waiting for someone, and the more they make you wait, the more you suspect they’ve forgotten about you completely? In that person’s honor, I’m calling this the Unreliable Friend Distribution (UFD).

As seems appropriate for such a distribution, the expected wait time for the UFD is infinite. Which means that no matter how late your unreliable friend shows up, you should be grateful that he came early.

The code:

# Code by Matt Asher for StatisticsBlog.com # Feel free to modify and redistribute, but please keep this header set.seed(943) #I remembered this time! # Initial MC sampling trials = 10^7 results = rexp(trials, runif(trials)) # Plot of liklihood curves for U based on waiting time # colr = sample(colours(), 1000, replace=T) lik = function(p, t){ return((1 - p)^(t-1)*p) } # x-values to plot p = seq(0,1,0.0001) # Waiting times t = 1:20 dataMatrix = matrix(nrow=length(t), ncol=length(p)) for(i in t) { dataMatrix[i,]=lik(p,rep(i+1,length(p))) } plot(p, dataMatrix[1,], col=colr[1], pch=".", cex=3, bty="n" ) for(i in 2:max(t)) { points(p, dataMatrix[i,], col=colr[i], pch=".", cex=3) } # Let's standardize the area of each curve standardMatrix = dataMatrix/rowSums(dataMatrix) plot(p, standardMatrix[1,], col=colr[1], pch=".", cex=3, bty="n", ylim=c(0,max(standardMatrix))) for(i in 2:max(t)) { points(p, standardMatrix[i,], col=colr[i], pch=".", cex=3) } # Find wait time curves conditional on having waited t minutes # We need tail probabilities, let's find them! t = 0:1000 tailP = rep(0,max(t)) for(i in t) { tailP[(1+i)] = length(results[results>i])/trials } show = seq(1,25,1) # Blank Plot plot(0,0,col="white", xlim = c(0,2*max(show)), ylim = c(0, 1), ylab="Probability that your friend will have shown up", xlab="Time") for(i in show) { # Normalizing the probabilies so that tailP[i] = 1 tmp = tailP[(i+1):(max(t)+1)] tmp = tmp * 1/tmp[1] tmp = 1-tmp print(length(tmp[tmp<.5])) # par(new = TRUE) lines(i:(max(t)), tmp, col=sample(colours(), 1), lwd=3) # Find the index of the closest tmp to tmp[i] xloc = which.min(abs(tmp[i] - tmp)) # Put a point where we cross time 2t on the curve points(i+xloc-1, tmp[i], pch=4, col="black", cex=2, lwd=3) } plot(0,0,col="white", xlim = c(0,100), ylim = c(0, 0.25)) t = 1:20 tmp = results[results<quantile(results, .99)] for (i in t){ par(new = TRUE) plot(density(tmp[tmp>i]), xlim = c(0,100), ylim = c(0, 0.25), col=colr[i], cex=3) } tpdf = function(x) { toReturn = (-x*exp(-x)+1-exp(-x))/x^2 return(toReturn) } tF = function(x) { toReturn = (exp(-x) + x - 1)/x return(toReturn) } lengths = rep(0,1000) for(i in 0:1000) { lengths[(i+1)] = length(results[results>i]) } empericalF = 1 - (lengths/trials) empericalf = diff(empericalF) # Because this the the perfect size for the dots! plot(log(tpdf(1:1000)), col=rgb(0,0,1,.2), pch=20, cex=1.3728, xlab="Wait time", ylab="Log of density") points(log(empericalf), col=rgb(1,0,0,.2), pch=20, cex=1.3728)

11

Apr 13

## High Obesity levels found among fat-tailed distributions

In my never ending quest to find the perfect measure of tail fatness, I ran across this recent paper by Cooke, Nieboer, and Misiewicz. They created a measure called the “Obesity index.” Here’s how it works:

- Step 1: Sample four times from a distribution. The sample points should be independent and identically distributed (did your mind just say “IID”?)
- Step 2: Sort the points from lowest to highest (that’s right, order statistics)
- Step 3: Test whether the sum of the smallest and greatest number is larger than the sum of the two middle.

The Obesity index is the probability that the sum of these end points is larger than the sum of the middle numbers. In mathy symbols:

The graph at the top of this post shows how the Obesity index converges for different distributions. As always, I’ve included my R code at the end of this article, so you can run this simulation for yourself (though, as usual, I forgot to set a random seed so that you can run it *exactly* like I did).

The dots in the graph represent the mean results from 8, 16, 32, and so on, up to 4096 trials from each of the distributions I tested. Note that each trial involves taking 4 sample points. Confused? Think of it this way: each sample of 4 points gives us one Bernoulli trial from a single distribution, which returns a 0 or 1. Find the average result after doing 4096 of these trials, and you get one of the colored dots at the far right of the graph. For example, the red dots are averages from a Uniform distribution. The more trials you do, the closer results from the Uniform will cluster around 0.5, which is the “true” Obesity value for this distribution. The Uniform distribution is, not coincidentally, symmetric. For symmetric distributions like the Normal, we only consider positive values.

The graph gives a feel for how many trials would be needed to distinguish between different distributions based on their Obesity index. I’ve done it this way as part of my Grand Master Plan to map every possible distribution based on how it performs in a variety of tail indices. Apparently the Obesity index can be used to estimate quantiles; I haven’t done this yet.

My initial impressions of this measure (and these are very initial!) are mixed. With a large enough number of trials, it does a good job of ordering distributions in a way that seems intuitively correct. On the other hand, I’d like to see a greater distance between the Uniform and Beta(0.01, 0.01) distribution, as the latter is an extreme case of small tails.

Note that Obesity is invariant to scaling:

but not to translations:

This could be a bug or a feature, depending on what you want to use the index for.

**Extra special karma points** to the first person who comes up with a distribution whose Obesity index is between the Uniform and Normal, and that isn’t a variant of one I already tested.

Here’s the code:

# Code by Matt Asher for StatisticsBlog.com # Feel free to redistribute, but please keep this notice # Create random varaibles from the function named in the string generateFromList = function(n, dist, ...) { match.fun(paste('r', dist, sep=''))(n, ...) } # Powers of 2 for testAt testAt = 3:12 testAtSeq = 2^testAt testsPerLevel = 30 distros = c() distros[1] = 'generateFromList(4,"norm")' distros[2] = 'generateFromList(4,"unif")' distros[3] = 'generateFromList(4,"cauchy")' distros[4] = 'generateFromList(4,"exp")' distros[5] = 'generateFromList(4,"chisq",1)' distros[6] = 'generateFromList(4,"beta",.01,.01)' distros[7] = 'generateFromList(4,"lnorm")' distros[8] = 'generateFromList(4,"weibull",1,1)' # Gotta be a better way to do this. dWords = c("Normal", "Uniform", "Cauchy", "Exponential", "Chisquare", "Beta", "Lognormal", "Weibull") par(mar=c(4,5,1.5,.5)) plot(0,0,col="white",xlim=c(min(testAt),max(testAt)), ylim=c(-.5,1), xlab="Sample size, expressed in powers of 2", ylab="Obesity index measure", main="Test of tail fatness using Obesity index") abline(h=0) colorList = list() colorList[[1]]=rgb(0,0,1,.2) colorList[[2]]=rgb(1,0,0,.2) colorList[[3]]=rgb(0,1,0,.2) colorList[[4]]=rgb(1,1,0,.2) colorList[[5]]=rgb(1,0,1,.2) colorList[[6]]=rgb(0,1,1,.2) colorList[[7]]=rgb(0,0,0,.2) colorList[[8]]=rgb(.5,.5,0,.2) # Create the legend for(d in 1:length(distros)) { x = abs(rnorm(20,min(testAt),.1)) y = rep(-d/16,20) points(x, y, col=colorList[[d]], pch=20) text(min(testAt)+.25, y[1], dWords[d], cex=.7, pos=4) } dCounter = 1 for(d in 1:length(distros)) { for(l in testAtSeq) { for(i in 1:testsPerLevel) { count = 0 for(m in 1:l) { # Get the estimate at that level, plot it testsPerLevel times x = sort(abs(eval(parse( text=distros[dCounter] )))) if ( (x[4]+x[1])>(x[2]+x[3]) ) { count = count + 1 } } # Tiny bit of scatter added ratio = count/l points(log(l, base=2), ( ratio+rnorm(1,0,ratio/100)), col=colorList[[dCounter]], pch=20) } } dCounter = dCounter + 1 }

18

Mar 13

## Review of Mathematica 9 and R-link

VIDEO TRANSCRIPT: Hello, this is Matt Asher from StatisticsBlog.com. I’m going to be reviewing Mathematica 9, from Wolfram Research. In particular, I’ll be focusing on using it with R and to do Monte Carlo simulations and other statistical work. You can find a full transcript of this video at my blog, including the source code and links to all of the webpages I mention.

Before I begin I’d like to thank Jeff Hara and Andy Ross from Wolfram for their time. Also thanks to the folks at the Mathematica Stack Exchange, who helped with a couple of my questions.

I’m going to get started with a blank notebook. I’m going to clear out all of the variables that exist. I’ve found sometimes that if you have existing variables that can cause problems.

ClearAll["Global`*"]

After each line I’m hitting Shift+Enter to run the command, if you just hit enter Mathematica won’t run things yet.

So I’ve cleared my variables and I’m going to run

Needs["RLink`"]

which will bring in the link between Mathematica and R.

InstallR[]

I’m going to make sure it’s installed.

REvaluate["R.Version()"]

And then I’m going to run a test command here to make sure everything is up and running. As you can see this is the version of R I’m running and the connection succeeded.

Note that the free version of Mathematica, the evaluation version, doesn’t come with support for R, so if you want to test out Mathematica’s and its interactions with R you either have to have to buy the full version or maybe if you call up or contact Wolfram they’d be willing to let you have a free evaluation version that is full and allows you to test out R.

So how does the interface between R and Mathematica work?

Basically, you can run operations in R, then store the results to variables in R. You can also pass data types back and forth between R and Mathematica.

Here I’m setting a variable and this variable is set in R, not in Mathematica

RSet["hal", 9000]

So if I were to type just

hal

There is no response back. This is Mathematica’s way of saying that the variable is undefined or that it doesn’t know what to do with your input. So to get back information from R we have to use:

REvaluate["hal"]

We are putting “hal” in quotes so we are parsing this in R and not in Mathematica.

For example we can do things like grab a dataset from R

iris = REvaluate["iris"]

I’m grabbing the famous “iris” dataset in R and I am pulling it into Mathematica.

or I could do things like evaluate a command in R:

REvaluate["det(matrix(sample(-50:49),nrow=10))"]

and bring back the results. This grabs the determinant of a random matrix.

We can even do things like create our own functions in R, and this gets put into a variable in Mathematica.

perfectSample = RFunction["function(n, dist, ...) match.fun(paste('q', dist, sep=''))((1:n) / (n+1), ...)"]

This function creates a perfect sample of the length that you specify of a particular distribution. Then we can call that function directly in Mathematica.

perfectSample[100, "pois", 10]

and the results are returned.

Of course, if we just wanted to do things in R, we would be continuing to just use R, instead of having to learn this new interface between R and Mathematica. So then what can we do in Mathematica that goes beyond what we can easily do in R?

One of the biggest advantages to using Mathematica is that you get access to tools for creating interactive plots and simulations that can be changed on the fly.

I’m going to do an example using the Benini Distribution, which, according to Wolfram’s web page, can be used to model the weight of cats.

So to do that, what I’m going to do is use the Mathematica command “Manipulate”

Manipulate[Block[{data, dist, kmd}, data = RandomVariate[dist = BeniniDistribution[\[Alpha],\[Beta], \[Sigma]], n]; kmd = KernelMixtureDistribution[data, h, MaxMixtureKernels -> All]; Plot[{PDF[dist, x], PDF[kmd, x]}, {x, 0, xRng}, PlotRange -> Automatic]], {{\[Alpha], 1}, .01, 5}, {{\[Beta], 1}, 0.01, 5}, {{\[Sigma], 1}, .01, 2}, {{n, 100}, {10, 100, 1000, 10000}}, {{xRng, 5}, 1, 10}, {{h, .5}, 0.01, 1}]

And then I get back the results and what I’ve got here is a live Monte Carlo simulation where I am specifying the different parameters of the distribution and I’m also specifying how many variates that I’m creating. This is the smoothing, the kernel bandwidth that I am adjusting.

And I can adjust the size of it here. Make it bigger. And do all of these adjustments on the fly.

As you can see, you’ve got some good power here for generating interactive plots and simulations. You can do these natively in Mathematica, or you do live manipulation of results coming from R. This example comes from the Mathematica guys:

mathematicaRPlotWrapper = RFunction["function(filename, plotfun){ pdf(filename) plotfun() dev.off() }"]; Clear[getRPlot]; getRPlot[plotFun_RFunction] := With[{tempfile = FileNameJoin[{$TemporaryDirectory, "temp.pdf"}]}, If[FileExistsQ[tempfile], DeleteFile[tempfile]]; mathematicaRPlotWrapper[tempfile, plotFun]; If[! FileExistsQ[tempfile], Return[$Failed]]; Import[tempfile]]; myFun[t_] := Show[#, ImageSize -> Medium, PlotRange -> All] &@ getRPlot[RFunction["function(){ x<- seq(1," <> ToString@t <> ",0.1) y<- sin(x) plot(y~x) }"]]

What’s going to happen here is I am calling an R function, doing all of my calculations, bringing them back into Mathematica.

I forgot the “Manipulate” part:

Manipulate[myFun[t], {t, 2, 10}]

So here we go. And what’s happening is everything is being sent to R for processing then coming all the way back to Mathematica. As you can see even though we are making that round trip the results are coming back at a good pace, it’s almost instantaneous this particular example.

What about speed though, more generally?

I tried creating some random variates like I did in my examination of JavaScript versus R. So I’m going to create 10 million random variates from a Normal distribution

Timing[data = RandomVariate[NormalDistribution[], 10^7];]

and that takes about a quarter of a second, a little bit more, which is about twice as fast as R for generating those.

But then let’s look at more of a worst-case scenario, a bunch of nested loops.

Timing[ l = 0; For [i = 0, i < 10^2, i = i + 1, For[j = 0, j < 10^2, j = j + 1, For[k = 0, k < 10^2, k = k + 1, l = l + 1 ] ] ] ]

Three of them, a total of a million loops it’s going through, and this takes about 1.2 seconds, and it will increase by an order of magnitude if I add an order of magnitude in here. That is slow, it’s about twice as slow as the same code would take to run in R, a language not known for it’s speed with loops. Of course, this is generally speaking not what you want to do if you want to run fast code. Mathematica itself on their website advises against using these kinds of procedural codes.

But at the same time I’ve found that there are times when there really is no other way to do things, especially if you are doing simulations with a bunch of objects that take place over time, that are iterative, in which case you do need a programming structure that is like this, so you’ll have to keep this in mind in terms of the speed.

Beside the live graphical manipulation of the results, another key benefit to using Mathematica is that you can do direct probability calculations, and sometimes even find closed form solutions for combining random variables.

I’ve got some code here that determines the probability that one standard normal variable will be greater than another.

Probability[ Subscript[x, 1] > Subscript[x, 2], {Subscript[x, 1] \[Distributed] NormalDistribution[0, 1], Subscript[x, 2] \[Distributed] NormalDistribution[0, 1]}]

And it is, of course, one-half. That’s a simple example. We can do things that are more complicated. I’m going to look here for a more general solution when you have two Normal variables with means μ1 and μ2 and we’re going to try and find the probability that one of them is greater than the other.

Probability[ Subscript[x, 1] > Subscript[x, 2], {Subscript[x, 1] \[Distributed] NormalDistribution[Subscript[\[Mu], 1], 1], Subscript[x, 2] \[Distributed] NormalDistribution[Subscript[\[Mu], 2], 1]}]

As you can see it’s taking more time to run this calculation. And eventually we get back the response. And we do have a closed form solution.

“Erfc” stands for the complementary error function.

Unfortunately, not all of the problems that I tried to do work and sometimes they just freeze up completely.

Probability[ Subscript[x, 3] > Subscript[x, 4], {Subscript[x, 3] \[Distributed] PoissonDistribution[5], Subscript[x, 4] \[Distributed] PoissonDistribution[10]}]

Here I’m running a query to Mathematica to try and find the probability that a Poisson(5) will be greater than a Poission(10) random variable. I found that this just freezes up and will not return a response. Though I only waited a certain number of minutes. Actually, one time it locked my compter up entirely, the other time I gave up after a few minutes. I’m going to hit Alt-comma to abort the command and back out of that.

So, by comparison to R, you can do the same calculation of two Poissions. I’m going to make sure that’s run in R:

x = rpois(10^6,5); y=rpois(10^6,10); z = x<y; length(z[z==TRUE])/length(x)

(NOTE: This code actually finds the probability that a Poission(5) is LESS than a Poission(10))

As you can see I’ve run that a couple times already, and it takes about .9 seconds to run this. Of course, this is not trying to find a closed form solution, but for me anyway, these are perfectly good solutions, numerical solutions, to the problems.

So, going back in to Mathematica. Besides getting closed form solutions for probabilities, you can combine distributions, do convolutions and see what kind of formula you get back out.

I found this worked fairly well for simple, well-know distributions:

dist = TransformedDistribution[u + v, {u \[Distributed] NormalDistribution[μ1, σ1], v \[Distributed] NormalDistribution[μ2, σ2]}]; PDF[dist, x]

I do it here for the Normal. I’m adding two Normally distributed variables together and we get back out very quickly the formula for that.

But what happens if we try to work with a less well known distribution. In fact, lets go ahead and see what happens if we want to add together cats and find out the final weight distribution.

dist = TransformedDistribution[u + v, {u \[Distributed] BeniniDistribution[\[Alpha]1,\[Beta]1,\[Sigma]1], v \[Distributed] BeniniDistribution[\[Alpha]2, \[Beta]2,\[Sigma]2]}]; PDF[dist, x]

And, apparently, cats don’t add so well. I tried this one as well a couple times and wasn’t able to get results returned from Mathematica unfortunately.

So, besides these bugs and issues, there are a couple other significant downsides to Mathematica. Sharing results over the Internet can be done: you can export notebooks to HTML, but people if they want to use the interactive graphs they’ll need to have a special browser plugin installed. I asked the guys from Wolfram if they know what percent of web users already have it installed. They didn’t know, I suspect the number is very low, much lower than, say, the number who have Flash installed. Of course R by itself doesn’t provide much support for sharing results over the web, though Rstudio makes something called Shiny that can do some exporting over the web. I haven’t checked that out very much yet. I plan to do that soon.

So, beyond sharing, the biggest impediment to using Mathematica on a daily basis is the interface. The GUI. I’ll bring in some of the buttons here, the palettes of buttons. Overall the look is extremely crude. Things are disordered. The floating palettes have buttons of various sizes different places, and it looks essentially like something you might have hacked together in Visual Basic 15 years ago and then hadn’t been touched since. Clearly, the modern era of interface design has passed Mathematica by. At this point even open source programs that began with horrific interfaces, like the Gimp, have now begun to focus on making their interfaces look good. And looks may seem superficial, but if you are going to be working with an interface like this, if you are going to be paying $1000 for a license of Mathematica, I don’t think it’s too much to expect that the design be easy to use, that it be easy to find things, and that there are cues as to where to find things. Color would certainly go a long way to help with that, as would other cues to help you.

I’m going to bring back in Mathematica here.

Based on the GUI, I wonder… One more thing about the GUI, if you’re moving the palettes along it doesn’t do ghosting, so it pops back in.

So, despite these issues, I can see using Mathematica as an occasional tool to find exact results for probabilities and distributions, or quickly testing out how changing parameters affects a distribution’s shape. For now though, I’m going to continue experimenting with JavaScript as an alternative to R that runs code quickly and also I’ll be looking some more into Shiny.

Make sure to check out the links related to this video at StatisticsBlog.com, and if you like these videos click on the subscribe button.