After completing an essay for an elective at university, I checked my word count and found it was an exact multiple of 100. For a moment I thought to myself: what a remarkable coincidence, what are the odds of that! Hmm... what are the odds of that? I've written many papers. Is it still unlikely that at least one of them would have a word count that is a multiple of 100? Or is it in fact likely?
You are a magician.
A bad one. To impress people at parties, you try to guess cards they pick at random from a deck. Each time you try, it's unlikely you will guess correctly: 1 in 52.
So most people will identify you as a bad magician and probably avoid you. Your strategy is that eventually, after enough attempts, you will get it right and impress someone, they will become your friend, maybe you can quit magic, etc.
If you try it 52 times a night, every night, on average you will get one correct guess a night. But you do not have a 100% chance of getting a correct guess every night. There is always a chance you will go home a failure, and there is always a chance you will guess correctly every time. This difference is important!
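To put a number on that difference, here is a minimal Python sketch. The function name `prob_at_least_one` is my own label, and it assumes every guess is an independent 1 in 52 shot (the card is drawn from a full deck each time).

```python
def prob_at_least_one(attempts, deck_size=52):
    """Chance of one or more correct guesses, assuming independent attempts."""
    miss_every_time = (1 - 1 / deck_size) ** attempts
    return 1 - miss_every_time


# Average number of correct guesses in a 52-attempt night.
expected_correct = 52 * (1 / 52)

print(f"expected correct guesses: {expected_correct:.2f}")
print(f"chance of at least one:   {prob_at_least_one(52):.3f}")
```

Under those assumptions the second number comes out around 0.64: an average of one correct guess a night still leaves roughly a one in three chance of going home with none.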
Things are amazing or horrible. Impressing no one and going home alone - horrible. Impressing any number of people - amazing. So you might want to know how many times you need to try the trick to get some percentage chance of going home happy, i.e. getting one or more guesses right. You know that you will never reach 100% confidence of going home happy, because this is a random event with a chance of total failure every night.
Maybe you would like to have a 50% chance of going home happy every night. The plot below shows experimental data about the odds of getting more than 0 guesses right after a given number of attempts.
CRITICAL: all we care about is that we impressed at least one person. Impressing 1, 2, 3 or 50 people is all treated the same: not crying yourself to sleep.
After 36 attempts, the probability that you will go home happy is about 0.5. I used the average of 3000 trials to generate each point, but there is still noticeable noise from the random data.
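A simulation along these lines can be sketched in a few lines of Python. This is not the exact script behind the plot: the function name `estimate_happy_probability`, the fixed seed, and the choice to model each guess as an independent 1 in 52 event are my assumptions for illustration.

```python
import random


def estimate_happy_probability(attempts, trials=3000, deck_size=52, rng=None):
    """Fraction of simulated nights with at least one correct guess."""
    if rng is None:
        rng = random.Random()
    happy_nights = 0
    for _ in range(trials):
        # A guess is correct when the drawn card index matches a fixed target (0).
        if any(rng.randrange(deck_size) == 0 for _ in range(attempts)):
            happy_nights += 1
    return happy_nights / trials


rng = random.Random(0)  # seeded only so that repeated runs match
for attempts in (10, 36, 52, 100):
    simulated = estimate_happy_probability(attempts, rng=rng)
    exact = 1 - (1 - 1 / 52) ** attempts
    print(f"{attempts:3d} attempts: simulated {simulated:.3f}, exact {exact:.3f}")
```

The simulated values wobble around the exact ones by a percent or so at 3000 trials per point, which is the kind of noise visible in the data.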
We have shown that once you have written 69 or more essays (papers, etc.), the odds are over 50% that at least one of them will have a word count that is an exact multiple of 100. On top of this, the number of words in a document is likely not uniformly random if there is a minimum word count; it is probably clustered around multiples of 100 (depending on how lazy you are). So the paper I wrote was likely not special or magical, and if I had spent less time determining this I would likely have proofread it more thoroughly.
We have seen that if you have written 69 or more essays, papers, etc., there is a greater than 50% chance that you have written one or more with an exact multiple of 100 words. Likely, the distribution of essay lengths is not uniformly random, but rather clusters around multiples of 100 (assuming you are busy or... lazy). So this is to say that my essay was not special, or magical - and had I decided this right after writing it, I would have saved much math analysis, and likely had time to proofread the paper before submitting it for a mediocre grade.
This turns out to be a scaled geometric distribution. If we cared about the odds of x failures followed by one success, it would be exactly geometric: the probability of x straight failures multiplied by the per-attempt probability of success. For our purposes this is mostly irrelevant. Do not remember this.
With this groundwork laid, let's take another look at the essay example, with a 1 in 100 chance of "success", because the numbers are nicer to work with. How many essays would I have to write before there is a 50% chance that one of them has a multiple of 100 words? Our intuition from earlier - knowing that 100 attempts does not give a 100% chance of success - suggests that we will have to try more than 50 times.
I approached this by checking what the odds were that I did not succeed. This is easier, as there is only one route to failure to check. The odds of failure are 0.99 on each attempt. The probability of an independent event with probability P occurring n times consecutively is P^n, in this example 0.99^n.
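Written out, with n as the number of essays and assuming each essay independently has a 1 in 100 chance of landing on a multiple of 100, the step from the failure probability to the number of attempts looks like this:

```latex
\begin{align*}
P(\text{no success in } n \text{ attempts}) &= 0.99^{n} \\
P(\text{at least one success}) &= 1 - 0.99^{n} \\
0.99^{n} = 0.5 \quad\Longrightarrow\quad n &= \frac{\ln 0.5}{\ln 0.99} \approx 68.97
\end{align*}
```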
Solving 0.99^n = 0.5 for n leaves us with the solution to this case: 68.968 attempts. Interesting. Can we generalize this for other cases? It would be nice to know some factor I could multiply the odds by, as an offhand calculation, to find out when something would be likely to occur. For this example, if the odds are 1 in M (here M=100), we must attempt the task 0.6897*M times.
What if the odds were different? For the card example we saw before, it was 35.7 attempts with odds of 1 in 52, giving 0.686*M. What if the odds were, say, 1 in 1000? Would we need to attempt 0.686*1000 times? For this we will need to generalize the formula as the number of attempts goes to infinity.
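Before generalizing, the same calculation can be run for a few values of M as a quick check. This is only a sketch: the helper name `attempts_for_even_odds` is mine, and it assumes independent attempts with a fixed 1 in M chance each.

```python
import math


def attempts_for_even_odds(m):
    """Attempts n at which 1 - (1 - 1/m)**n reaches 0.5, for 1-in-m odds."""
    return math.log(0.5) / math.log(1 - 1 / m)


for m in (52, 100, 1000):
    n = attempts_for_even_odds(m)
    print(f"1 in {m:4d}: {n:8.3f} attempts, factor {n / m:.4f}")
```

For 1 in 1000 the factor comes out near 0.693, a little higher than the 0.686 from the card example, so the multiplier is not quite fixed: it drifts upward slowly as M grows.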
The proof for this I will try to present legibly in another post. While not onerous, it is slightly tedious in keeping track of notation, and deserves its own article.