Recommendation of the week

“[I]f you have performed any statistical analysis that is more complex than calculating the mean and the standard deviation, you should perform the same analysis on noise to make sure that whatever effect you observe is indeed a unique feature of your data and not an artefact of the analysis.”

Found this one over at Stefan’s sieste blog. I couldn’t agree more, especially now that computers and big data sets entice us to make ever more complex models. Oh, and that’s not a bad thing! As I’ve argued, we’ll need to give up on simple, easy to interpret models in order to get more predictive power.

I’d go even more meta than Stefan and argue that you should re-test your entire model-creating process on noise (perhaps he meant this with his quote). If you started with a data set, then ran a stepwise variable selection algorithm, then added in a new non-linear term to get a better fit, do the same on noise, trying to get the best fit. Are you able to get a statistically significant result? Better still, run the same procedure on different types of noise, not just Gaussian White (I know, sounds like something you’d load into a syringe. Normality, the gateway drug?).

Tags: , ,

2 comments

Leave a comment