The guessing game in R (with a twist, of course)

Maybe you remember playing this one as a kid. If you are about my age, you may have even created a version of this game as one of your first computer programs. You guess a number, the computer tells you if you if you are too low or high. I’ve limited the number of maximum guesses, and randomized the computer’s choice based on the Poisson distribution (more on that later).

Here’s the code. This was part of my attempt to understand how R reads input from the command line. One of the things I learned: you may need to save this to a file and run it with “source()”, instead of running it directly from the console, line by line.

# Classic guessing game with twist
x = 0
gotRight = 0
failed = 0

# Initial lambda for our random var
correct = 2000
initial = correct

# How many guesses should we allow per number
maxGuesses = 7
	
while(x != Inf) {
	# The +1 part makes sure we never get zero, which would trigger 0's forever
	correct = rpois(1,correct) + 1
	
	# The advantage of using "cat" instead of "print" is that you remove those pesky quotation marks
	cat("I am thinking of a number between 1 and infinity. What is it? (Type Inf to quit)\n")
	
	# Solicit input from the user
	x = scan(n=1) # Just one item in this vector
	
	# Be nice and let the user quit. 
	if(x == Inf) {
		cat("The correct answer was", correct, "\n")
		cat("You got", gotRight, "right and failed", failed, "times. Maximum allowed guesses was", maxGuesses, "and initial lambda was", initial, ". Goodbye.\n")
		cat("Post your score to http://www.statisticsblog.com/2010/05/the-guessing-game-in-r-with-a-twist-of-course/#comments \n")
		break
	}
	
	for(i in 1:maxGuesses) {
		if(x == correct) {
			print("You rock!")
			gotRight = gotRight + 1
			break
		} else {		
			if(i == maxGuesses) {
				cat("You ran out of guesses. I will pick a new random number based on the last one.\n")
				failed = failed + 1
			} else {
				if(x < correct) {
					cat("You are too low. Guess again.\n")
				} else {
					cat("You are too high. Guess again.\n")
				}
				
				x = scan(n=1)
			}			
		}
	}
}

Note 1: My code makes a couple uses of the aparently controversial "break" function. I can still recall a heated debate I had with a CS professor who believed that calling "break" (in Python) was as bad as crossing the streams of your Proton Pack. That said, I have sucessfully used it on several occasions now without any appearance by Stay Puft Marshmallow Man or changing the natural order between dogs and cats. In R, the biggest problem with using constructs like "break" and "while" is that, for reasons clear only to readers of this blog but not myself, if you ask R for help about either of these tokens using

?term

you get an sent an error or to purgatory, respectively.

Hint: Because the random guesses are Poisson based, using a "half the distance" strategy for guessing may not be the best way to go. The hardcore amongst yourselves might want to calculate the median of the expected value conditional on having guessed too low or high.

Note 2: The Poisson isn't a very good distribution for for this. Maybe you can find a better one, or at least jack up the dispersion like an overzealous offroader tweaking the suspension of his 4Runner.

Tags: , , , ,

6 comments

  1. I’d probably use a zeta distribution… with parameter between 1 and 2 . No mean!

    And then scale it up to get a nice big median.

    Or, since I am lazy, you could get close enough to that by generating from a Pareto and truncating.

  2. Just use the Cauchy distribution. It is the prototype of a distribution with moments that do not exist or are infinite.

  3. btw, you may want to have a look at this little problem:
    pick the largest of two numbers
    http://www.stanford.edu/~cover/papers/paper73.pdf

  4. @efrique:

    Interesting suggestions. I like the idea of using a discrete power-law type distribution, where the higher the number is, the higher it is likely to be.

    @Napo:

    1. I considered the Cauchy and a couple other continuous distors, but working with any of these would make the code more complicated. For starters, I would have to round the answer off to an integer (A game of “guess the real number” might not be much fun, eh?). Next, since allowing negative numbers seems to violate the spirit of game and make it less fun, I would have to take the absolute value of the number. Unless the distro was centered at zero (and the center is supposed to move over time), this makes finding conditional probabilities harder since part of the tail gets added back in to the positive. Or I could truncate at zero, which might cause other issues like attraction to 0 over time, which is the problem with using the exponential, another possibility I considered. So far as the Cauchy itself goes…. that just seemed, well, kind of mean, don’t you think?

    2. Thanks for the link. Interesting game, hadn’t heard of it. Seems like the strategy would only work if the person writing down numbers on the slips of paper did all their writing in advance, then shuffled the papers. Otherwise they could tweak the strategy of what they write down with each round.

  5. About picking two numbers from Napo, there was a solution proposed at http://sas-and-r.blogspot.com/2010/03/example-727-probability-question.html

  6. 1. You are right, in practical terms the distribution should be dominated by the counting measure in order to produce useful results.

    2. This Strategy does always work, because it exploits that you not only know (ex ante) there are two different numbers, but also that these two numbers are distinct. You just draw a RV Z~N(0,1). If Z is between the two numbers, you always decide right. Since on IR you have a positive probability to be in that interval, you decide with p>0.5 right.
    You can use any measure which is absolutely continuous w.r.t lebesgue measure, because the 1-dim lebesgue measure give mass >0 to any non-empty interval on IR.
    If the two numbers are very big (or small) and very near each other the probability to hit this interval will be small, but it stays always strictly positive.

    If you use only positive integers, you may have to require that the distance of the two numbers is at least 2. That way you can construct an interval about the integer in the middle and round every random variable Z in this interval to this integer.

    You may want to increase the probability by using a pdf other than the gaussian, because its tails decay so rapidly. The cauchy distribution might be better.