A randomized definition of the natural logarithm constant

The number e is the base of the natural logarithm. It is an important constant in mathematics, which is approximately 2.718281828. This post discusses a charming little problem involving the the number e. This number can be represented in many ways. In a calculus course, the number e may be defined as the upper limit of the following integral:

    \displaystyle \int_1^e \frac{1}{t} \ dt=1

Another representation is that it is the sum \displaystyle e=\sum_{n=1}^\infty \frac{1}{n!}. Still another is that it is the limit \displaystyle e=\lim_{n \rightarrow \infty} \biggl( 1+\frac{1}{n} \biggr)^n. According to the authors of a brief article from the Mathematics Magazine [1], these textbook definitions do not give immediate insight about the number e (article can be found here). As a result, students come away from the course without a solid understanding of the number e and may have to resort to rote memorization on what the number e is. Instead, the article gives six probability oriented ways in which the number e can occur. These occurrences of e are more interesting and, according to the authors of [1], can potentially increase students’ appreciation of the number e. In two of these six examples (Example 2 and Example 5), the number e is defined by drawing random numbers from the interval (0,1). This post discusses Example 2.


Random Experiment

Here’s the description of the random experiment that generates the number e. Randomly and successively select numbers from the interval (0, 1). The experiment terminates when the sum of the random numbers exceeds 1. What is the average number of selections in this experiment? In other words, we are interested in the average length of the experiment. According to the article [1], the average length is the number e. The goal here is to give a proof of this result.

As illustration, the experiment is carried out 1 million times using random numbers that are generated in Excel using the Rand() function. The following table summarizes the results.

    \left[\begin{array}{rrrrrr}      \text{Length of} & \text{ } & \text{ } & \text{ } & \text{Relative} \\      \text{Experiment} & \text{ } & \text{Frequency} & \text{ } & \text{Frequency} \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\      2 & \text{ } & 500777 & \text{ } & 0.500777 \\      3 & \text{ } & 332736 & \text{ } & 0.332736 \\      4 & \text{ } & 124875 & \text{ } & 0.124875 \\      5 & \text{ } & 33465 & \text{ } & 0.033465 \\      6 & \text{ } & 6827 & \text{ } & 0.006827 \\      7 & \text{ } & 1130 & \text{ } & 0.001130 \\      8 & \text{ } & 172 & \text{ } & 0.000172 \\      9 & \text{ } & 14 & \text{ } & 0.000014 \\      10 & \text{ } & 4 & \text{ } & 0.000004 \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\       \text{Total} & \text{ } & 1000000 & \text{ } & \text{ }      \end{array}\right]

The average length of experiment in these 1 million experiments is 2.717001. Even though the rate of convergence to the number e is fairly slow, the simulated data demonstrates that on average it takes approximately e number of simulations of numbers in the unit interval to get a sum that exceeds 1.


A proof

Let U_1,U_2,U_3,\cdots be a sequence of independent and identically distributed random variables such that the common distribution is a uniform distribution on the unit interval (0,1). Let N be defined as follows:

    \displaystyle N=\text{min}\left\{n: U_1+U_2+\cdots+U_n>1 \right\} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The objective is to determine E(N). On the surface, it seems that we need to describe the distribution of the independent sum X_n=U_1+U_2+\cdots+U_n for all possible n. Doing this may be possible but the result would be messy. It turns out that we do not need to do so. We need to evaluate the probability P(X_n \le 1) for all n. We show that

    \displaystyle F_n(x)=P(X_n \le x)=\frac{x^n}{n!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

for 0 \le x \le 1 and for n=1,2,3,\cdots. This is accomplished by an induction proof. This is true for n=1 since X_1=U_1 is a uniform distribution. Suppose (2) holds for the integer n-1. This means that \displaystyle F_{n-1}(x)=P(X_{n-1} \le x)=\frac{x^{n-1}}{(n-1)!} for 0 \le x \le 1. Note that X_n=X_{n-1}+U_n, which is an independent sum. Let’s write out the convolution formula for this independent sum:

    \displaystyle \begin{aligned} F_n(x)&=\int_{0}^x F_{n-1}(x-y) \cdot f_{U_n}(y) \ dy \\&=\int_{0}^x \frac{(x-y)^{n-1}}{(n-1)!} \cdot 1 \ dy \\&=\frac{1}{(n-1)!} \int_0^x (x-y)^{n-1} \ dy \\&=\frac{x^n}{n!}  \end{aligned}

The above derivation completes the proof of the claim. We now come back to the problem of evaluating the mean of the random variable N defined in (1). First, note that N>n if and only if X_n=U_1+U_2+\cdots+U_n \le 1. So we have the probability statement \displaystyle P(N>n)=F_n(1)=\frac{1}{n!}.

As a result, the following is the probability function of the random variable N.

    \displaystyle \begin{aligned} P(N=n)&=P(N>n-1)-P(N>n) \\&=F_{n-1}(1)-F_n(1) \\&=\frac{1}{(n-1)!}-\frac{1}{n!} \\&=\frac{n-1}{n!}  \end{aligned}

Now evaluate the mean.

    \displaystyle \begin{aligned} E(N)&=\sum \limits_{n=2}^\infty \frac{n(n-1)}{n!} \\&=\sum \limits_{n=2}^\infty \frac{1}{(n-2)!} \\&=\sum \limits_{m=0}^\infty \frac{1}{m!} \\&=e \end{aligned}

With the above derivation, the proof that e= 2.718281828… is the average number of random numbers to select in order to obtain a sum that exceeds 1 is completed.



  1. Shultz H. S., Leonard B., Unexpected Occurrences of the Number e,Mathematics Magazine, October 1989, Volume 62, Number 4, pp. 269–271.

\copyright \ \text{2015 by Dan Ma}

Picking Two Types of Binomial Trials

We motivate the discussion with the following example. The notation W \sim \text{binom}(n,p) denotes the statement that W has a binomial distribution with parameters n and p. In other words, W is the number of successes in a sequence of n independent Bernoulli trials where p is the probability of success in each trial.

Example 1
Suppose that a student took two multiple choice quizzes in a course for probability and statistics. Each quiz has 5 questions. Each question has 4 choices and only one of the choices is correct. Suppose that the student answered all the questions by pure guessing. Furthermore, the two quizzes are independent (i.e. results of one quiz will not affect the results of the other quiz). Let X be the number of correct answers in the first quiz and Y be the number of correct answers in the second quiz. Suppose the student was told by the instructor that she had a total of 4 correct answers in these two quizzes. What is the probability that she had 3 correct answers in the first quiz?

On the face of it, the example is all about binomial distribution. Both X and Y are binomial distributions (both \sim \text{binom}(5,\frac{1}{4})). The sum X+Y is also a binomial distribution (\sim \text{binom}(10,\frac{1}{4})). The question that is being asked is a conditional probability, i.e., P(X=3 \lvert X+Y=4). Surprisingly, this conditional probability can be computed using the hypergeometric distribution. One can always work this problem from first principle using binomial distributions. As discussed below, for a problem such as Example 1, it is always possible to replace the binomial distributions using a thought process involving the hypergeometric distribution.

Here’s how to think about the problem. This student took the two quizzes and was given the news by the instructor that she had 4 correct answers in total. She now wonders what the probability of having 3 correct answers in the first quiz is. The thought process is this. She is to pick 4 questions from 10 questions (5 of them are from Quiz 1 and 5 of them are from Quiz 2). So she is picking 4 objects from a group of two distinct types of objects. This is akin to reaching into a jar that has 5 red balls and 5 blue balls and pick 4 balls without replacement. What is the probability of picking 3 red balls and 1 blue ball? The probability just described is from a hypergeometric distribution. The following shows the calculation.

\displaystyle (1) \ \ \ \ P(X=3 \lvert X+Y=4)=\frac{\displaystyle \binom{5}{3} \ \binom{5}{1}}{\displaystyle \binom{10}{4}}=\frac{50}{210}

We will show below why this works. Before we do that, let’s describe the above thought process. Whenever you have two independent binomial distributions X and Y with the same probability of success p (the number of trials does not have to be the same), the conditional distribution X \lvert X+Y=a is a hypergeometric distribution. Interestingly, the probability of success p has no bearing on this observation. For Example 1, we have the following calculation.

\displaystyle (2a) \ \ \ \ P(X=0 \lvert X+Y=4)=\frac{\displaystyle \binom{5}{0} \ \binom{5}{4}}{\displaystyle \binom{10}{4}}=\frac{5}{210}

\displaystyle (2b) \ \ \ \ P(X=1 \lvert X+Y=4)=\frac{\displaystyle \binom{5}{1} \ \binom{5}{3}}{\displaystyle \binom{10}{4}}=\frac{50}{210}

\displaystyle (2c) \ \ \ \ P(X=2 \lvert X+Y=4)=\frac{\displaystyle \binom{5}{2} \ \binom{5}{2}}{\displaystyle \binom{10}{4}}=\frac{100}{210}

\displaystyle (2d) \ \ \ \ P(X=3 \lvert X+Y=4)=\frac{\displaystyle \binom{5}{3} \ \binom{5}{1}}{\displaystyle \binom{10}{4}}=\frac{50}{210}

\displaystyle (2e) \ \ \ \ P(X=4 \lvert X+Y=4)=\frac{\displaystyle \binom{5}{4} \ \binom{5}{0}}{\displaystyle \binom{10}{4}}=\frac{5}{210}

Interestingly, the conditional mean E(X \lvert X+Y=4)=2, while the unconditional mean E(X)=5 \times \frac{1}{4}=1.25. The fact that the conditional mean is higher is not surprising. The student was lucky enough to have obtained 4 correct answers by guessing. Given this, she had a greater chance of doing better on the first quiz.

Why This Works

Suppose X \sim \text{binom}(5,p) and Y \sim \text{binom}(5,p) and they are independent. The joint distribution of X and Y has 36 points in the sample space. See the following diagram.

Figure 1

The probability attached to each point is

\displaystyle \begin{aligned}(3) \ \ \ \  P(X=x,Y=y)&=P(X=x) \times P(Y=y) \\&=\binom{5}{x} p^x (1-p)^{5-x} \times \binom{5}{y} p^y (1-p)^{5-y}  \end{aligned}

where x=0,1,2,3,4,5 and y=0,1,2,3,4,5.

The conditional probability P(X=k \lvert X+Y=4) involves 5 points as indicated in the following diagram.

Figure 2

The conditional probability P(X=k \lvert X+Y=4) is simply the probability of one of the 5 sample points as a fraction of the sum total of the 5 sample points encircled in the above diagram. The following is the sum total of the probabilities of the 5 points indicated in Figure 2.

\displaystyle \begin{aligned}(4) \ \ \ \  P(X+Y=4)&=P(X=0) \times P(Y=4)+P(X=1) \times P(Y=3)\\&\ \ \ \ +P(X=2) \times P(Y=3)+P(X=3) \times P(Y=2)\\&\ \ \ \ +P(X=4) \times P(Y=0)  \end{aligned}

We can plug (3) into (4) and work out the calculation. But (4) is actually equivalent to the following because X+Y \sim \text{binom}(10,p).

\displaystyle (5) \ \ \ \ P(X+Y=4)=\ \binom{10}{4} p^4 \ (1-p)^{6}

As stated earlier, the conditional probability P(X=k \lvert X+Y=4) is simply the probability of one of the 5 sample points as a fraction of the sum total of the 5 sample points encircled in Figure 2. Thus we have:

\displaystyle \begin{aligned}(6) \ \ \ \  P(X=k \lvert X+Y=4)&=\frac{P(X=k) \times P(Y=4-k)}{P(X+Y=4)} \\&=\frac{\displaystyle \binom{5}{k} p^k (1-p)^{5-k} \times \binom{5}{4-k} p^{4-k} (1-p)^{5-(4-k)}}{\displaystyle  \binom{10}{4} p^4 \ (1-p)^{6}}   \end{aligned}

With the terms involving p and 1-p cancel out, we have:

\displaystyle (7) \ \ \ \  P(X=k \lvert X+Y=4)=\frac{\displaystyle \binom{5}{k} \times \binom{5}{4-k}}{\displaystyle  \binom{10}{4}}


Suppose X \sim \text{binom}(N,p) and Y \sim \text{binom}(M,p) and they are independent. Then X+Y is also a binomial distribution, i.e., \sim \text{binom}(N+M,p). Suppose that both binomial experiments \text{binom}(N,p) and \text{binom}(M,p) have been performed and it is known that there are a successes in total. Then X \lvert X+Y=a has a hypergeometric distribution.

\displaystyle (8) \ \ \ \  P(X=k \lvert X+Y=a)=\frac{\displaystyle \binom{N}{k} \times \binom{M}{a-k}}{\displaystyle  \binom{N+M}{a}}

where k=0,1,2,3,\cdots,\text{min}(N,a).

As discussed earlier, think of the N trials in \text{binom}(N,p) as red balls and think of the M trials in \text{binom}(M,p) as blue balls in a jar. Think of the a successes as the number of balls you are about to draw from the jar. So you reach into the jar and select a balls without replacement. The calculation in (8) gives the probability that you select k red balls and a-k blue balls.

The probability of success p in the two binomial distributions have no bearing on the result since it gets canceled out in the derivation. One can always work a problem like Example 1 using first principle. Once the thought process using hypergeometric distribution is understood, it is a great way to solve this problem, that is, you can by pass the binomial distributions and go straight to the hypergeometric distribution.

Additional Practice
Practice problems are found in the following blog post.

How to pick binomial trials