Nice! Reminds me of trading cards for McDonald’s Monopoly game as a kid, trying to complete a particular set. We always suspected that they made the last card in each set very rare, to prevent people from winning many of the big prizes. Given that we were just a tiny fraction of the people who collected the cards, we were effectively sampling with replacement from the overall collection.

Makes me wonder about trading, though, and how much it would increase your chances of finding your “unicorn”. Could make the simulation more interesting.

The relation with your problem: instead of saying that the probability of finding a species is 1/20 at each trip, it is roughly equivalent to say that at each trip you find one (random) species, and then to count 50 trips as one (this approximation is OK, because 50 is not much bigger than the square root of 1000). The expected number of trips necessary to find the 1000 species should therefore be roughly 1000 * log(1000) / 50, which is approximately 140.
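For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The variable names are mine, and the harmonic-sum line is an addition not in the comment above: it uses the exact coupon collector expectation N * H_N in place of the N * log(N) approximation, under the same "one species per draw, 50 draws per trip" assumption.

```python
import math

N_SPECIES = 1000
SPECIES_PER_TRIP = 50  # 1000 species, each found with probability 1/20 per trip

# Approximation used above: about N * log(N) single draws to see every species.
draws_log = N_SPECIES * math.log(N_SPECIES)

# Exact coupon collector expectation: N * H_N, with H_N the N-th harmonic number.
draws_harmonic = N_SPECIES * sum(1.0 / k for k in range(1, N_SPECIES + 1))

# Count 50 single draws as one trip.
trips_log = draws_log / SPECIES_PER_TRIP
trips_harmonic = draws_harmonic / SPECIES_PER_TRIP

print(f"log approximation: {trips_log:.1f} trips")
print(f"harmonic sum:      {trips_harmonic:.1f} trips")
```

The two figures differ by roughly ten trips, so "approximately 140" should be read as a ballpark rather than a precise expectation.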

Just checking against the graph: seems to fit.

You can also do the calculations directly for your model. This is actually very easy, since the random variables N_1,…,N_k,…,N_1000, where N_k is the number of trips you need to make to find species k, are independent. Each one is geometric with success probability 1/20, therefore their maximum is roughly 20*log(1000) (standard result from extreme value theory).
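A quick Monte Carlo check of this claim could look like the sketch below. It is only an illustration under the stated model (1000 independent geometric waiting times with success probability 1/20); the function name, seed, and trial count are my own choices, not taken from the original post.

```python
import math
import random

def trips_to_find_all(n_species, p, rng):
    """Largest of n_species independent geometric(p) waiting times."""
    log_q = math.log(1.0 - p)
    # Inverse-transform sampling: floor(log(U) / log(1 - p)) + 1 is geometric(p)
    # on {1, 2, ...} when U is uniform on (0, 1].
    return max(
        int(math.log(1.0 - rng.random()) / log_q) + 1
        for _ in range(n_species)
    )

rng = random.Random(1)  # fixed seed so the run is repeatable
n_trials = 500
results = [trips_to_find_all(1000, 1 / 20, rng) for _ in range(n_trials)]

mean_trips = sum(results) / n_trials
approx = 20 * math.log(1000)

print(f"simulated mean of the maximum: {mean_trips:.1f}")
print(f"20 * log(1000):                {approx:.1f}")
```

The simulated mean should land in the same range as 20*log(1000), though somewhat above it, since the formula gives the typical location of the maximum rather than its exact expectation.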

Very interesting problem! Kind of like the German Tank Problem (http://www.statisticsblog.com/2010/05/how-many-tanks-gtp-gets-put-to-the-test/) for species. Would be interesting to test the theorem you linked with an MC simulation.

References for students of the matter, as I am:

http://oregonstate.edu/instruct/st571/urquhart/var_prob/sld011.htm

Overton, Stehman, “The Horvitz-Thompson Theorem as a unifying perspective for probability sampling: With examples from natural resources sampling,” THE AMERICAN STATISTICIAN, 49(3), August 1995, pp 261ff.

A. R. Solow, W. K. Smith, “Estimating species number under an inconvenient abundance model,” JOURNAL OF AGRICULTURAL, BIOLOGICAL, AND ENVIRONMENTAL STATISTICS, 14: 242-252, 2009.
