The week in stats (Dec. 9th edition)

  • The problems with using a p-value as a fixed cutoff for hypothesis testing are well known. Probabilities and P-Values is another article that discusses the weaknesses of the p-value. However, as with every author who claims the p-value is horrible, no satisfactory substitute is offered.
  • PirateGrunt is currently producing a series of 24 articles called 24 Days of R. In every post, he shares a few neat R tricks and explains how you can use them. You may find his first post here and the subsequent ones on his blog.
  • Coursera – an online education startup – has rapidly expanded its curriculum of statistics and data analysis courses. There are now 33 modules directly linked to the field, excluding the courses where statistics and data science are used as a supporting tool (e.g. finance). These courses make use of multiple languages and statistical software packages such as Python, MATLAB and of course R. Here’s the complete list of Coursera courses using R, ranked by “popularity”.
  • For those interested in machine learning, a preview of Data Mining Applications with R by Yanchang Zhao and Yonghua Cen is available here.
  • A tutorial on the R package Plotly, and how to make beautiful visuals and graphs with it.
  • A recent article by Matt Asay claims that “Python is displacing R as the language for data science.” David Smith of Revolution Analytics shares his thoughts on the competition between R and Python.
  • Consider n points uniformly distributed on a sphere. What is the probability that all points lie in the same hemisphere (not necessarily the north or south hemisphere)? Arthur Charpentier of Freakonometrics presents a simulation-based solution, along with some very nice visuals.
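
For readers who want to try the hemisphere problem themselves, here is a minimal Monte Carlo sketch in Python. It is not Charpentier's code (his post uses R); the function names and the pairwise great-circle check are my own choices for illustration. The idea: if the points fit in a hemisphere, then (ignoring degenerate configurations, which have probability zero) some great circle through two of the points has every other point on one side. The estimate is compared against the known closed form for the sphere, (n² − n + 2) / 2ⁿ, which follows from Wendel's theorem.

```python
import random
from itertools import combinations


def random_unit_vector(rng):
    """Uniform point on the unit sphere: normalise three independent Gaussians."""
    while True:
        x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = (x * x + y * y + z * z) ** 0.5
        if norm > 0.0:
            return (x / norm, y / norm, z / norm)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def in_common_hemisphere(points):
    """True if some plane through the origin and two of the points
    leaves every other point on one side (degenerate cases are not handled)."""
    if len(points) <= 2:
        return True  # one or two points always share a hemisphere
    for i, j in combinations(range(len(points)), 2):
        normal = cross(points[i], points[j])
        sides = [dot(normal, p) for k, p in enumerate(points) if k not in (i, j)]
        if all(s >= 0 for s in sides) or all(s <= 0 for s in sides):
            return True
    return False


def estimate(n, trials=20000, seed=0):
    """Monte Carlo estimate of P(all n points lie in some hemisphere)."""
    rng = random.Random(seed)
    hits = sum(
        in_common_hemisphere([random_unit_vector(rng) for _ in range(n)])
        for _ in range(trials)
    )
    return hits / trials


def exact_probability(n):
    """Closed form for points on the sphere in 3D (Wendel's theorem)."""
    return (n * n - n + 2) / 2 ** n


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, estimate(n, trials=5000), exact_probability(n))
```

The pairwise check costs on the order of n³ operations per trial, which is fine for the small n where the probability is not already close to zero.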

2 comments

  1. Hey Matt,

    This year I have become enamored of equivalence tests: tests in which the null hypothesis is framed in terms of a difference at least as large as some selected tolerance, so that if you reject the null, you conclude equivalence within the tolerance (see Schuirmann 1987, or Wellek’s 2010 textbook). The cool thing about the framework is that all the familiar workhorse tests (null of sameness) have equivalence formulations. When the results are combined with the more familiar tests of difference, four interesting possibilities result:

    1. not reject null of difference, reject null of sameness: conclude relevant difference
    2. reject null of difference, reject null of sameness: conclude trivial difference (i.e. too small to matter)
    3. not reject null of difference, not reject null of sameness: conclude underpowered tests
    4. reject null of difference, not reject null of sameness: conclude equivalence

    This is a bit obliquely related to the p-value, in that the p-value approach is still used, only with power and relevant effect size made explicit parts of the framework.
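
The four outcomes listed in the comment can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are mine, not the commenter's: a z-test with a known standard error, a symmetric tolerance `delta`, and the two one-sided tests (TOST) form of the equivalence test. The function name and its parameters are hypothetical.

```python
from statistics import NormalDist


def difference_and_equivalence(diff, se, delta, alpha=0.05):
    """Combine a test of difference with a TOST equivalence test.

    diff  : observed difference (e.g. between two means)
    se    : its standard error, assumed known (z-test approximation)
    delta : equivalence tolerance, assumed symmetric around zero
    """
    z = NormalDist()

    # Familiar test: null of sameness (true difference = 0), two-sided.
    p_same = 2.0 * (1.0 - z.cdf(abs(diff) / se))

    # Equivalence test: null of difference (|true difference| >= delta),
    # rejected only if BOTH one-sided nulls are rejected.
    p_lower = 1.0 - z.cdf((diff + delta) / se)  # null: difference <= -delta
    p_upper = z.cdf((diff - delta) / se)        # null: difference >= +delta
    p_difference = max(p_lower, p_upper)

    reject_sameness = p_same < alpha
    reject_difference = p_difference < alpha

    if reject_sameness and not reject_difference:
        return "relevant difference"
    if reject_sameness and reject_difference:
        return "trivial difference"
    if not reject_sameness and not reject_difference:
        return "underpowered"
    return "equivalence"


if __name__ == "__main__":
    print(difference_and_equivalence(diff=0.1, se=0.02, delta=0.5))
```

With real data one would normally use t-based versions of both tests; the z approximation here just keeps the sketch dependency-free.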