Are digits of pi random?

What better way to celebrate Pi Day than to have a blog post on the digits of \pi!

Giving pi as tips

The number \pi is the ratio of the circumference of a circle to its diameter. It is an irrational number. This means that the decimal expansion of \pi never ends and never repeats. Any decimal representation of \pi with just a finite number of decimal places written out is an approximation.

In any calculation involving \pi, the more decimal places used, the more precise the result. Approximations of \pi used in practice range from 3.14 to 3.14159 for hand calculation, and from 3.141592654 to 3.14159265358979 with a calculator or software. In fact, the calculations involving \pi performed at NASA use 15 or 16 significant digits (the last approximation cited has 15 significant digits).
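To make the link between decimal places and precision concrete, here is a small Python sketch (our own illustration; the helper names are hypothetical) that compares each approximation quoted above with Python's `math.pi`, which is itself a double-precision value accurate to about 16 significant digits.

```python
import math

# Approximations of pi quoted in the text, kept as strings so digits can be counted.
APPROXIMATIONS = ["3.14", "3.14159", "3.141592654", "3.14159265358979"]

def significant_digits(approx: str) -> int:
    """Number of significant digits in a decimal string such as '3.14159'."""
    return len(approx.replace(".", ""))

def approximation_errors():
    """Return (approximation, significant digits, absolute error vs math.pi)."""
    return [(s, significant_digits(s), abs(float(s) - math.pi)) for s in APPROXIMATIONS]

for approx, digits, err in approximation_errors():
    print(f"{approx:<18} {digits:>2} significant digits, error about {err:.1e}")
```

The error shrinks by roughly a factor of ten for each added digit; the last approximation is already near the limit of what a double can resolve.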

As of the writing of this post, \pi has been calculated to over 22 trillion decimal places, or 22,459,157,718,361 decimal places to be exact (see here). If 15 or 16 digits are good enough for space travel, why the fascination with billions and trillions of digits of \pi? Why strive for more digits of \pi than we will ever need for practical applications on Earth or in space?

Jumping on the pi bandwagon could be all about pi mania. People have been fascinated with \pi since antiquity. In our contemporary society, a day is even set aside to celebrate the number pi. Actually, two days are set aside: March 14 (3.14) and July 22 (based on the approximation 22/7), although the first is better known than the second. There are plenty of other well-known special numbers, but I have never heard of a Natural Log Constant Day or a Square Root of 2 Day.

There are indeed some specialized uses of \pi that require billions of digits or more. One is that \pi is used to test the precision of computers. The other is that the digits of \pi are sometimes used as a source of random digits. Even these uses speak more of pi mania than of any natural or intrinsic quality of \pi, since other special numbers could be used for the same purposes.
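As a minimal sketch of what using digits of \pi as random digits can look like, the Python below cuts a digit string into non-overlapping 5-digit blocks and reads each block as a number in [0, 1). The 50 digits are hard-coded for illustration; the block width of 5 and the function name are our own arbitrary choices, not a standard scheme.

```python
# First 50 decimal digits of pi (after "3."), hard-coded for illustration.
PI_DECIMALS = "14159265358979323846264338327950288419716939937510"

def digits_to_uniforms(digits: str, width: int = 5) -> list:
    """Split `digits` into non-overlapping blocks of `width` digits and
    read each block as a fraction in [0, 1), e.g. '14159' -> 0.14159."""
    scale = 10 ** width
    return [int(digits[i:i + width]) / scale
            for i in range(0, len(digits) - width + 1, width)]

print(digits_to_uniforms(PI_DECIMALS))
```

Any leftover digits that do not fill a whole block are simply ignored.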

Of course, \pi as a number is not random. The digits are fixed and determined ahead of time. The first decimal place is always 1, the second decimal place is always 4 and so on. The third decimal place in the number \pi could not be any digit other than 1. Instead, we should ask a different question.

Are the decimal digits of \pi uniformly distributed? In other words, in the long run, does every digit (from 0 to 9) in the decimal expansion of \pi appear one-tenth of the time? Does every pair of digits appear one-hundredth of the time? Does every triple of digits appear one-thousandth of the time, and so on? If that is the case, we say that \pi is a normal number in base 10.
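These questions can be checked empirically by counting blocks of digits. The Python sketch below (function name hypothetical) tallies every overlapping block of k consecutive digits and returns relative frequencies; for a number that is normal in base 10, each frequency should approach 1/10^k as more digits are used. Only the first 50 decimals are hard-coded here, far too few to say anything about normality.

```python
from collections import Counter

# First 50 decimal digits of pi (after "3."), hard-coded for illustration.
PI_DECIMALS = "14159265358979323846264338327950288419716939937510"

def block_frequencies(digits: str, k: int) -> dict:
    """Relative frequency of each overlapping block of k consecutive digits."""
    n_blocks = len(digits) - k + 1
    counts = Counter(digits[i:i + k] for i in range(n_blocks))
    return {block: count / n_blocks for block, count in sorted(counts.items())}

singles = block_frequencies(PI_DECIMALS, 1)  # compare each value with 1/10
pairs = block_frequencies(PI_DECIMALS, 2)    # compare each value with 1/100
print(singles)
print(f"{len(pairs)} of the 100 possible pairs occur in 50 digits")
```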

The concept of a normal number applies to other bases too. If each digit in the binary expansion of a number appears half the time, each pair of binary digits (00, 01, 10 and 11) appears one-quarter of the time, and so on, the number is called a normal number in base 2. In general, a number is normal in base b if, for every length k, every string of k base-b digits appears in its expansion with limiting frequency 1/b^k, so that all strings of the same length are equally likely. A number that is normal in every base is called absolutely normal.
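Written out in symbols (the notation N_x(w, n) is our own choice for illustration): fix a base b \ge 2 and, for a string w of k base-b digits, let N_x(w, n) be the number of times w occurs, possibly overlapping, among the first n digits of the base-b expansion of x. Then x is normal in base b when

```latex
\lim_{n \to \infty} \frac{N_x(w, n)}{n} = \frac{1}{b^{k}}
\qquad \text{for every } k \ge 1 \text{ and every string } w \text{ of length } k.
```

For b = 10 and k = 1 this says each digit has limiting frequency 1/10; for k = 2, each pair of digits has limiting frequency 1/100.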

Is \pi normal in base 10? In base 2? In base 16 (hexadecimal)? There is a great deal of empirical evidence that the digits of \pi behave like those of a normal number in base 10. The following table tabulates the first 10 million digits of \pi (source).

    Table 1 – First 10 million digits of Pi
    \begin{array}{cr}
    \text{Digit} & \text{Count} \\
    \hline
    0 & \text{999,440} \\
    1 & \text{999,333} \\
    2 & \text{1,000,306} \\
    3 & \text{999,965} \\
    4 & \text{1,001,093} \\
    5 & \text{1,000,466} \\
    6 & \text{999,337} \\
    7 & \text{1,000,206} \\
    8 & \text{999,814} \\
    9 & \text{1,000,040} \\
    \hline
    \text{Total} & \text{10,000,000}
    \end{array}

Note that each frequency is 1,000,000 plus or minus at most about 1,100 (the largest deviation is 1,093, for the digit 4). So each digit appears about 10% of the time among the first 10 million digits of \pi. This is also confirmed by a chi-squared test (see below). Here is another statistical analysis of the first 10 million digits of \pi.
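One way to judge whether deviations of this size are small: if the 10 million digits were independent and uniformly distributed (a modeling assumption, not a fact about \pi), each digit count would be binomial with mean 1,000,000 and standard deviation \sqrt{10^7 \cdot 0.1 \cdot 0.9} \approx 949. The Python sketch below (function name hypothetical) expresses each deviation in Table 1 in those units.

```python
import math

# Table 1: counts of digits 0-9 among the first 10 million digits of pi.
TABLE1 = [999_440, 999_333, 1_000_306, 999_965, 1_001_093,
          1_000_466, 999_337, 1_000_206, 999_814, 1_000_040]

def standardized_deviations(counts):
    """(count - n/10) / sqrt(n * 0.1 * 0.9) for each digit: the deviation
    measured in binomial standard deviations, assuming i.i.d. uniform digits."""
    n = sum(counts)
    expected = n / 10
    sd = math.sqrt(n * 0.1 * 0.9)
    return [(c - expected) / sd for c in counts]

z = standardized_deviations(TABLE1)
print([round(v, 2) for v in z])
```

The largest deviation, +1,093 for the digit 4, is about 1.15 standard deviations, well within what chance alone would produce.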

The following table is the tabulation of the first 1 trillion digits of \pi (source).

    Table 2 – First one trillion digits of Pi
    \begin{array}{cr}
    \text{Digit} & \text{Count} \\
    \hline
    0 & \text{99,999,485,134} \\
    1 & \text{99,999,945,664} \\
    2 & \text{100,000,480,057} \\
    3 & \text{99,999,787,805} \\
    4 & \text{100,000,357,857} \\
    5 & \text{99,999,671,008} \\
    6 & \text{99,999,807,503} \\
    7 & \text{99,999,818,723} \\
    8 & \text{100,000,791,469} \\
    9 & \text{99,999,854,780} \\
    \hline
    \text{Total} & \text{1,000,000,000,000}
    \end{array}

The first trillion digits of \pi appear uniform too (also confirmed by a chi-squared test below). The count for each digit is within 1 million of 100 billion (100,000,000,000); the largest deviation is 791,469, for the digit 8.
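Expressed as proportions, the trillion-digit counts are remarkably close to one-tenth. The short Python sketch below (function name hypothetical) converts the counts in Table 2 into relative frequencies.

```python
# Table 2: counts of digits 0-9 among the first one trillion digits of pi.
TABLE2 = [99_999_485_134, 99_999_945_664, 100_000_480_057, 99_999_787_805,
          100_000_357_857, 99_999_671_008, 99_999_807_503, 99_999_818_723,
          100_000_791_469, 99_999_854_780]

def relative_frequencies(counts):
    """Each count divided by the total number of digits tabulated."""
    total = sum(counts)
    return [c / total for c in counts]

freqs = relative_frequencies(TABLE2)
for digit, f in enumerate(freqs):
    print(f"{digit}: {f:.9f}  (differs from 0.1 by {f - 0.1:+.1e})")
```

Every proportion differs from 0.1 by less than 10^{-6}.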

The calculation of 22.4 trillion digits of \pi was completed in November 2016. Statistical analysis was subsequently performed on these 22.4 trillion decimal digits of \pi (abstract and paper). The analysis is another empirical check of the normality of \pi: it examines the frequencies of digit sequences of length one, two and three in the base 10 and base 16 representations. The conclusion is that the observed frequencies are consistent with the hypothesis that \pi is normal in base 10 and in base 16.

Is \pi a normal number? The empirical evidence, though promising, is not enough to prove that \pi is normal in base 10 or in any other base. Although many mathematicians believe that \pi is normal in base 10, and possibly in other bases, no one has been able to find a proof. It is also not known whether other special numbers, such as the natural log constant e, or other irrational numbers such as \sqrt{2}, are normal.

Thus determining whether \pi is a normal number is a profound, classic unsolved problem. If \pi is normal, even just in base 10, the implications would be quite interesting. If \pi is normal in base 10, the sequence of its decimal digits, when appropriately converted into letters, would contain the complete works of Shakespeare, the entire text of War and Peace, or any other classic work of literature you might be interested in. Finding the works of Shakespeare, however, would require computing vastly more digits of \pi than the current world record of 22.4 trillion.
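A rough back-of-the-envelope sketch shows how far off that is. Assume a hypothetical encoding of two decimal digits per letter (A = 01, ..., Z = 26) and model the digits as independent and uniform; a specific text of L letters is then a specific string of 2L digits, which would typically not appear until on the order of 10^{2L} digits have been scanned. The Python below (names hypothetical) finds the longest text whose expected waiting time fits within the 22.4-trillion-digit record.

```python
# Record from the text: 22,459,157,718,361 decimal digits of pi.
RECORD_DIGITS = 22_459_157_718_361
# Hypothetical encoding: each letter A-Z becomes two decimal digits (A=01, ..., Z=26).
DIGITS_PER_LETTER = 2

def expected_wait(num_letters: int) -> int:
    """Order of magnitude of digits to scan before a specific text of this
    length first appears, treating the digits as i.i.d. uniform."""
    return 10 ** (DIGITS_PER_LETTER * num_letters)

def longest_findable_text() -> int:
    """Largest number of letters whose expected wait fits within the record."""
    letters = 0
    while expected_wait(letters + 1) <= RECORD_DIGITS:
        letters += 1
    return letters

print(longest_findable_text(), "letters")
```

Under these assumptions the answer is only about six letters, let alone an entire play.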

One practical consideration in using the digits of \pi as random numbers is computational speed. For large-scale simulations, computing fresh digits of \pi takes a considerable amount of time. It took 105 days to compute the current world record of 22.4 trillion digits of \pi, and the verification took 28 hours (see here). Each new record makes further digits of \pi harder to come by.
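For a feel of the cost, here is a Python sketch of Jeremy Gibbons' unbounded spigot algorithm, which produces the decimal digits of \pi one at a time using only integer arithmetic. It is not the method used for record computations (those rely on much faster formulas such as the Chudnovsky series); it is shown only because it is short. Its integer state keeps growing as digits are produced, so each new digit costs more than the last, and the timing loop should typically show the time growing faster than the digit count. The helper `first_digits` is our own.

```python
import itertools
import time

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time.

    Gibbons' unbounded spigot: (q, r, t) encode the linear fractional
    transformation z -> (q*z + r) / t, k indexes the series term,
    n is the candidate next digit, and l = 2k + 1.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The candidate digit n is now certain: emit it, then shift it out.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not certain yet: absorb one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def first_digits(count: int) -> str:
    """First `count` digits of pi as a string, starting with the leading 3."""
    return "".join(str(d) for d in itertools.islice(pi_digits(), count))

for count in (250, 500, 1000):
    start = time.perf_counter()
    first_digits(count)
    print(f"{count:>5} digits: {time.perf_counter() - start:.4f} s")
```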

The number pi is a fascinating mathematical object. It has something for everyone, from the practical to the fanciful to the mysterious. On the practical side, it has a precise mathematical meaning: it describes the relationship between the circumference of a circle and its diameter, and it helps solve practical problems here on Earth and in space. It also exhibits the random-looking behavior explored in this post. It is a mysterious object that captures the imagination of everyone from young students to professional mathematicians. Proving that pi is a normal number will not be easy, yet everyone wants to uncover more secrets about the number pi.

Happy Pi Day!


Chi-Squared Test

The two tables of pi digit frequencies above provide good exercises in using the chi-squared test. The question is: do the frequencies of digits in Table 1 and in Table 2 follow a uniform distribution? More specifically, do the first 10 million (and the first trillion) digits of pi follow a uniform distribution? The chi-squared goodness-of-fit test is an excellent way to determine whether the observed frequencies fit the hypothesized uniform distribution.

The null hypothesis is that the observed frequencies follow the uniform distribution. Under the null hypothesis, the expected frequency of each digit is one million for Table 1 and 100 billion for Table 2. Next, compute the chi-squared statistic for each table: for each digit, square the difference between the observed and expected frequencies, divide by the expected frequency to normalize the squared difference, and then sum these normalized squared differences over all ten digits. In symbols, \chi^2=\sum_{i=0}^{9} \frac{(O_i-E_i)^2}{E_i}, where O_i and E_i are the observed and expected counts of digit i.
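The computation just described takes only a few lines of Python. The sketch below (function name hypothetical) computes the statistic against a uniform distribution for both tables; the results agree with the values reported below, 2.783356 for Table 1 and about 14.97246681 for Table 2.

```python
# Digit counts (digits 0-9) copied from Table 1 and Table 2.
TABLE1 = [999_440, 999_333, 1_000_306, 999_965, 1_001_093,
          1_000_466, 999_337, 1_000_206, 999_814, 1_000_040]
TABLE2 = [99_999_485_134, 99_999_945_664, 100_000_480_057, 99_999_787_805,
          100_000_357_857, 99_999_671_008, 99_999_807_503, 99_999_818_723,
          100_000_791_469, 99_999_854_780]

def chi_squared_uniform(counts):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E, with every
    expected count E equal to total / number of categories."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

print(f"Table 1: {chi_squared_uniform(TABLE1):.6f}")
print(f"Table 2: {chi_squared_uniform(TABLE2):.8f}")
```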

The chi-squared statistic measures how much the observed frequencies deviate from the expected frequencies. When the observed and expected frequencies are very different, the chi-squared statistic is large, so large values of the statistic provide evidence against the null hypothesis. If the null hypothesis is true, the chi-squared statistic has an approximate chi-squared distribution with 9 degrees of freedom for both Table 1 and Table 2 (10 digit categories minus 1). See here for a more detailed look at the chi-squared goodness-of-fit test.
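To turn a chi-squared statistic into a p-value without a statistics package, one can evaluate the upper tail of the chi-squared distribution directly. The Python sketch below (function name hypothetical) uses the standard power series for the regularized lower incomplete gamma function P(a, z), with a = df/2 and z = x/2, and returns 1 - P. In practice `scipy.stats.chi2.sf` would be the usual choice; the hand-rolled version only shows what that p-value is.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-squared(df), computed as 1 - P(df/2, x/2).

    P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    the series for the regularized lower incomplete gamma function.
    Adequate for moderate x such as the statistics in this post.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

# Chi-squared statistics reported in the text, each with 9 degrees of freedom.
for label, stat in (("Table 1", 2.783356), ("Table 2", 14.97246681)):
    print(f"{label}: chi-squared = {stat}, p-value ~ {chi2_sf(stat, 9):.4f}")
```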

For Table 1, the chi-squared statistic is 2.783356 with 9 degrees of freedom (df = 9). The p-value is 0.9723. With the p-value so large, we do not reject the null hypothesis. Thus the frequencies of the digits in the first 10 million digits of \pi are consistent with a uniform distribution.

For Table 2, the chi-squared statistic is 14.97246681 with 9 degrees of freedom (df = 9). The p-value is 0.091695048. The p-value is still fairly large, though not as large as the one for Table 1. At the 0.05 and 0.01 levels of significance, we do not reject the null hypothesis (it would be rejected only at a level as lenient as 0.10). There is still reason to believe that the frequencies of the digits in the first trillion digits of \pi are consistent with a uniform distribution. The larger chi-squared value of 14.97 comes largely from the digit 8, which contributes about 6.26 of the total: the digit 8 appears slightly more often than expected, but not enough to make the overall test significant.
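To see where the Table 2 statistic comes from, the Python sketch below (function name hypothetical) lists each digit's term (O_i - E_i)^2 / E_i.

```python
# Table 2: counts of digits 0-9 among the first one trillion digits of pi.
TABLE2 = [99_999_485_134, 99_999_945_664, 100_000_480_057, 99_999_787_805,
          100_000_357_857, 99_999_671_008, 99_999_807_503, 99_999_818_723,
          100_000_791_469, 99_999_854_780]

def chi_squared_terms(counts):
    """Per-category terms (O - E)^2 / E against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return [(c - expected) ** 2 / expected for c in counts]

terms = chi_squared_terms(TABLE2)
for digit, term in enumerate(terms):
    print(f"digit {digit}: {term:.4f}")
print(f"total: {sum(terms):.8f}")
```

The term for digit 8 is about 6.26, roughly 42% of the total, with digits 0 and 2 next at about 2.65 and 2.30.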

\copyright 2017 – Dan Ma