A randomized definition of the natural logarithm constant

The number e is the base of the natural logarithm. It is an important mathematical constant, approximately 2.718281828. This post discusses a charming little problem involving the number e. This number can be represented in many ways. In a calculus course, the number e may be defined as the upper limit of integration that makes the following integral equal to 1:

    \displaystyle \int_1^e \frac{1}{t} \ dt=1

Another representation is the sum \displaystyle e=\sum_{n=0}^\infty \frac{1}{n!}. Still another is the limit \displaystyle e=\lim_{n \rightarrow \infty} \biggl( 1+\frac{1}{n} \biggr)^n. According to the authors of a brief article from Mathematics Magazine [1], these textbook definitions do not give immediate insight into the number e (the article can be found here). As a result, students come away from the course without a solid understanding of the number e and may have to resort to rote memorization of what the number e is. Instead, the article gives six probability-oriented ways in which the number e can occur. These occurrences of e are more interesting and, according to the authors of [1], can potentially increase students’ appreciation of the number e. In two of these six examples (Example 2 and Example 5), the number e is defined by drawing random numbers from the interval (0,1). This post discusses Example 2.
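
As a quick numerical check, here is a small Python sketch (my own, not part of [1]) comparing the limit and series representations:

    from math import e, factorial

    n = 10**6
    limit_approx = (1 + 1 / n) ** n                           # (1 + 1/n)^n for large n
    series_approx = sum(1 / factorial(k) for k in range(20))  # 1/0! + 1/1! + ... + 1/19!

    print(limit_approx)   # 2.7182804..., the limit converges rather slowly
    print(series_approx)  # 2.718281828459045, matches e to double precision
    print(e)              # 2.718281828459045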

________________________________________________________________________

Random Experiment

Here’s the description of the random experiment that generates the number e. Randomly and successively select numbers from the interval (0, 1). The experiment terminates when the sum of the random numbers exceeds 1. What is the average number of selections in this experiment? In other words, we are interested in the average length of the experiment. According to the article [1], the average length is the number e. The goal here is to give a proof of this result.

As an illustration, the experiment is carried out 1 million times using random numbers generated in Excel with the Rand() function. The following table summarizes the results.

    \left[\begin{array}{rrrrrr}      \text{Length of} & \text{ } & \text{ } & \text{ } & \text{Relative} \\      \text{Experiment} & \text{ } & \text{Frequency} & \text{ } & \text{Frequency} \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\      2 & \text{ } & 500777 & \text{ } & 0.500777 \\      3 & \text{ } & 332736 & \text{ } & 0.332736 \\      4 & \text{ } & 124875 & \text{ } & 0.124875 \\      5 & \text{ } & 33465 & \text{ } & 0.033465 \\      6 & \text{ } & 6827 & \text{ } & 0.006827 \\      7 & \text{ } & 1130 & \text{ } & 0.001130 \\      8 & \text{ } & 172 & \text{ } & 0.000172 \\      9 & \text{ } & 14 & \text{ } & 0.000014 \\      10 & \text{ } & 4 & \text{ } & 0.000004 \\      \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\       \text{Total} & \text{ } & 1000000 & \text{ } & \text{ }      \end{array}\right]

The average length of the experiment across these 1 million trials is 2.717001. Even though the rate of convergence to the number e is fairly slow, the simulated data demonstrate that, on average, it takes approximately e draws of numbers from the unit interval to obtain a sum that exceeds 1.
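
For readers who prefer code to spreadsheets, the same simulation can be written in a few lines of Python (a minimal sketch; the function name experiment_length is my own):

    import random

    def experiment_length():
        """Draw uniform(0, 1) numbers until the running sum exceeds 1;
        return the number of draws needed."""
        total, draws = 0.0, 0
        while total <= 1:
            total += random.random()
            draws += 1
        return draws

    trials = 1_000_000
    average = sum(experiment_length() for _ in range(trials)) / trials
    print(average)  # typically near 2.718, in line with the Excel results above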

________________________________________________________________________

A proof

Let U_1,U_2,U_3,\cdots be a sequence of independent and identically distributed random variables, each having the uniform distribution on the unit interval (0,1). Let N be defined as follows:

    \displaystyle N=\text{min}\left\{n: U_1+U_2+\cdots+U_n>1 \right\} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The objective is to determine E(N). On the surface, it seems that we need to describe the distribution of the independent sum X_n=U_1+U_2+\cdots+U_n for every possible n. Doing so may be possible, but the result would be messy. It turns out that the full distributions are not needed; it suffices to evaluate the probability P(X_n \le 1) for each n. We show that

    \displaystyle F_n(x)=P(X_n \le x)=\frac{x^n}{n!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

for 0 \le x \le 1 and for n=1,2,3,\cdots. This is accomplished by an induction proof. The claim is true for n=1 since X_1=U_1 has a uniform distribution, so that F_1(x)=x=\frac{x^1}{1!}. Suppose (2) holds for the integer n-1, that is, \displaystyle F_{n-1}(x)=P(X_{n-1} \le x)=\frac{x^{n-1}}{(n-1)!} for 0 \le x \le 1. Note that X_n=X_{n-1}+U_n, an independent sum in which U_n has density f_{U_n}(y)=1 on (0,1). Let’s write out the convolution formula for this independent sum:

    \displaystyle \begin{aligned} F_n(x)&=\int_{0}^x F_{n-1}(x-y) \cdot f_{U_n}(y) \ dy \\&=\int_{0}^x \frac{(x-y)^{n-1}}{(n-1)!} \cdot 1 \ dy \\&=\frac{1}{(n-1)!} \int_0^x (x-y)^{n-1} \ dy \\&=\frac{x^n}{n!}  \end{aligned}

The above derivation completes the proof of the claim. We now come back to the problem of evaluating the mean of the random variable N defined in (1). First, note that N>n if and only if X_n=U_1+U_2+\cdots+U_n \le 1. So we have the probability statement \displaystyle P(N>n)=F_n(1)=\frac{1}{n!}.
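
Before moving on, formula (2) can also be checked by simulation. The following Python sketch (names are my own) compares a Monte Carlo estimate of P(X_n \le x) with the exact value x^n/n!:

    import random
    from math import factorial

    def cdf_estimate(n, x, trials=500_000):
        """Monte Carlo estimate of P(U_1 + ... + U_n <= x), U_i uniform on (0, 1)."""
        hits = sum(sum(random.random() for _ in range(n)) <= x
                   for _ in range(trials))
        return hits / trials

    n, x = 3, 0.8
    print(cdf_estimate(n, x))   # simulated probability, roughly 0.0853
    print(x**n / factorial(n))  # exact value from (2): 0.8^3/3! = 0.08533...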

Coming back to the derivation, the probability statement above yields the probability function of the random variable N for n=2,3,4,\cdots (note that N \ge 2, since a single number drawn from (0,1) is always less than 1).

    \displaystyle \begin{aligned} P(N=n)&=P(N>n-1)-P(N>n) \\&=F_{n-1}(1)-F_n(1) \\&=\frac{1}{(n-1)!}-\frac{1}{n!} \\&=\frac{n-1}{n!}  \end{aligned}
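
This probability function agrees closely with the simulated relative frequencies in the table above, as a quick computation shows (a small Python sketch):

    from math import factorial

    # Exact P(N = n) = (n - 1)/n! for n = 2, 3, ..., 10; compare with the
    # simulated relative frequencies in the table above.
    for n in range(2, 11):
        print(n, (n - 1) / factorial(n))
    # 2 -> 0.5, 3 -> 0.3333..., 4 -> 0.125, 5 -> 0.0333..., 6 -> 0.00694...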

Now evaluate the mean.

    \displaystyle \begin{aligned} E(N)&=\sum \limits_{n=2}^\infty n \cdot P(N=n) \\&=\sum \limits_{n=2}^\infty \frac{n(n-1)}{n!} \\&=\sum \limits_{n=2}^\infty \frac{1}{(n-2)!} \\&=\sum \limits_{m=0}^\infty \frac{1}{m!} \\&=e \end{aligned}
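
A truncated version of this sum is easy to verify numerically (a small Python sketch):

    from math import e, factorial

    expected = sum(n * (n - 1) / factorial(n) for n in range(2, 25))
    print(expected, e)  # both print 2.718281828459045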

This completes the proof that the average number of random numbers that must be drawn from (0,1) in order to obtain a sum exceeding 1 is e = 2.718281828….

________________________________________________________________________

Reference

  1. Shultz H. S., Leonard B., Unexpected Occurrences of the Number e, Mathematics Magazine, October 1989, Volume 62, Number 4, pp. 269–271.

________________________________________________________________________
\copyright \ \text{2015 by Dan Ma}