An Introduction to the Bayes’ Formula

We open up a discussion of the Bayes’ formula by going through a basic example. The Bayes’ formula or theorem is a method that can be used to compute “backward” conditional probabilities such as the examples described here. The formula will be stated after we examine the calculation from Example 1. The following diagram describes Example 1. Example 2 is presented at the end of the post and is left as exercise. For a basic discussion of the Bayes’ formula, see [1] and chapter 4 of [2].

Example 1

As indicated in the diagram, Box 1 has 1 red ball and three white balls and Box 2 has 2 red balls and 2 white balls. The example involves a sequence of two steps. In the first step (the green arrow in the above diagram), a box is randomly chosen from two boxes. In the second step (the blue arrow), a ball is randomly selected from the chosen box. We assume that the identity of the chosen box is unknown to the participants of this random experiment (e.g. suppose the two boxes are identical in appearance and a box is chosen by your friend and its identity is kept from you). Since a box is chosen at random, it is easy to see that P(B_1)=P(B_2)=0.5.

The example involves conditional probabilities. Some of the conditional probabilities are natural and are easy to see. For example, if the chosen box is Box 1, it is clear that the probability of selecting a red ball is \displaystyle \frac{1}{4}, i.e. \displaystyle P(R \lvert B_1)=\frac{1}{4}. Likewise, the conditional probability P(R \lvert B_2) is \displaystyle \frac{2}{4}. These two conditional probabilities are “forward” conditional probabilities since the events R \lvert B_1 and R \lvert B_2 occur in a natural chronological order.

What about the reversed conditional probabilities P(B_1 \lvert R) and P(B_2 \lvert R)? In other words, if the selected ball from the unknown box (unknown to you) is red, what is the probability that the ball is from Box 1?

The above question seems a little backward. After the box is randomly chosen, it is fixed (though the identity is unknown to you). Since it is fixed, shouldn’t the probability that the box being Box 1 is \displaystyle \frac{1}{2}? Since the box is already chosen, how can the identity of the box be influenced by the color of the ball selected from it? The answer is of course no.

We should not look at the chronological sequence of events. Instead, the key to understanding the example is through performing the random experiment repeatedly. Think of the experiment of choosing one box and then selecting one ball from the chosen box. Focus only on the trials that result in a red ball. For the result to be a red ball, we need to get either Box 1/ Red or Box 2/Red. Compute the probabilities of these two cases. Then add these two probabilities, we will obtain the probability that the selected ball is red. The following diagram illustrates this calculation.

Example 1 – Tree Diagram

The outcomes with red border in the above diagram are the outcomes that result in a red ball. The diagram shows that if we perform this experiment many times, about 37.5% of the trials will result in a red ball (on average 3 out of 8 trials will result in a red ball). In how many of these trials, is Box 1 the source of the red ball? In the diagram, we see that the case Box 2/Red is twice as likely as the case Box 1/Red. We conclude that the case Box 1/Red accounts for about one third of the cases when the selected ball is red. In other words, one third of the red balls come from Box 1 and two third of the red balls come from Box 2. We have:

\displaystyle (1) \ \ \ \ \ P(B_1 \lvert R)=\frac{1}{3}

\displaystyle (2) \ \ \ \ \ P(B_2 \lvert R)=\frac{2}{3}

Instead of using the tree diagram or the reasoning indicated in the paragraph after the tree diagram, we could just as easily apply the Bayes’ formula:

\displaystyle \begin{aligned}(3) \ \ \ \ \ P(B_1 \lvert R)&=\frac{P(R \lvert B_1) \times P(B_1)}{P(R)} \\&=\frac{\frac{1}{2} \times \frac{1}{4}}{\frac{3}{8}} \\&=\frac{1}{3}  \end{aligned}

In the calculation in (3) (as in the tree diagram), we use the law of total probability:

\displaystyle \begin{aligned}(4) \ \ \ \ \ P(R)&=P(R \lvert B_1) \times P(B_1)+P(R \lvert B_2) \times P(B_2) \\&=\frac{1}{4} \times \frac{1}{2}+\frac{2}{4} \times \frac{1}{2} \\&=\frac{3}{8}  \end{aligned}


We are not saying that an earlier event (the choosing of the box) is altered in some way by a subsequent event (the observing of a red ball). The above probabilities are subjective. How strongly do you believe that the “unknown” box is Box 1? If you use probabilities to quantify your belief, without knowing any additional information, you would say the probability that the “unknown” box being Box 1 is \frac{1}{2}.

Suppose you reach into the “unknown” box and get a red ball. This additional information alters your belief about the chosen box. Since Box 2 has more red balls, the fact that you observe a red ball will tell you that it is more likely that the “unknown” chosen box is Box 2. According to the above calculation, you update the probability of the chosen box being Box 1 to \frac{1}{3} and the probability of it being Box 2 as \frac{2}{3}.

In the language of Bayesian probability theory, the initial belief of P(B_1)=0.5 and P(B_2)=0.5 is called the prior probability distribution. After a red ball is observed, the updated belief as in the probabilities \displaystyle P(B_1 \lvert R)=\frac{1}{3} and \displaystyle P(B_2 \lvert R)=\frac{2}{3} is called the posterior probability distribution.

As demonstrated by this example, the Bayes’ formula is for updating probabilities in light of new information. Though the updated probabilities are subjective, they are not arbitrary. We can make sense of these probabilities by assessing the long run results of the experiment objectively.

An Insurance Perspective

The example discussed here has an insurance interpretation. Suppose an insurer has two groups of policyholders, both equal in size. One group consists of low risk insureds where the probability of experiencing a claim in a year is \frac{1}{4} (i.e. the proportion of red balls in Box 1). The insureds in other group, a high risk group, have a higher probability of experiencing a claim in a year, which is \frac{2}{4} (i.e. the proportion of red balls in Box 2).

Suppose someone just purchase a policy. Initially, the risk profile of this newly insured is uncertain. So the initial belief is that it is equally likely for him to be in the low risk group as in the high risk group.

Suppose that during the first policy year, the insured has incurred one claim. The observation alters our belief about this insured. With the additional information of having one claim, the probability that the insured belong to the high risk group is increased to \frac{2}{3}. The risk profile of this insured is altered based on new information. The insurance point of view described here has the exact same calculation as in the box-ball example and is that of using past claims experience to update future claims experience.

Bayes’ Formula

Suppose we have a collection of mutually exclusive events B_1, B_2, \cdots, B_n. That is, the probabilities P(B_i) sum to 1.0. Suppose R is an event. Think of the events B_i as “causes” that can explain the event R, an observed result. Given R is observed, what is the probability that the cause of R is B_k? In other words, we are interested in finding the conditional probability P(B_k \lvert R).

Before we have the observed result R, the probabilities P(B_i) are the prior probabilities of the causes. We also know the probability of observing R given a particular cause (i.e. we know P(R \lvert B_i). The probabilities P(R \lvert B_i) are “forward” conditional probabilities.

Given that we observe R, we are interested in knowing the “backward” probabilities P(B_i \lvert R). These probabilities are called the posterior probabilities of the causes. Mathematically, the Bayes’ formula is simply an alternative way of writing the following conditional probability.

\displaystyle (5) \ \ \ \ \ P(B_k \lvert R)=\frac{P(B_k \cap R)}{P(R)}

In (5), as in the discussion of the random experiment of choosing box and selecting ball, we are restricting ourselves to only the cases where the event R is observed. Then we ask, out of all the cases where R is observed, how many of these cases are caused by the event B_k?

The numerator of (5) can be written as

\displaystyle (6) \ \ \ \ \ P(B_k \cap R)=P(R \lvert B_k) \times P(B_k)

The denominator of (5) is obtained from applying the total law of probability.

\displaystyle (7) \ \ \ \ \ P(R)=P(R \lvert B_1) P(B_1) + P(R \lvert B_2) P(B_2)+ \cdots + P(R \lvert B_n) P(B_n)

Plugging (6) and (7) into (5), we obtain a statement of the Bayes’ formula.

\displaystyle (8) \ \ \ \ \ P(B_k \lvert R)=\frac{P(P(R \lvert B_k) \times P(B_k)}{\sum \limits_{j=1}^n P(R \lvert B_j) \times P(B_j)} \ \ \ \ \ \ \ \text{(Bayes' Formula)}

Of course, for any computation problem involving the Bayes’ formula, it is best not to memorize the formula in (8). Instead, simply apply the thought process that gives rise to the formula (e.g. the tree diagram shown above).

The Bayes’ formula has some profound philosophical implications, evidenced by the fact that it spawned a separate school of thought called Bayesian statistics. However, our discussion here is solely on its original role in finding certain backward conditional probabilities.

Example 2

Example 2 is left as exercise. The event that both selected balls are red would give even more weight to Box 2. In other words, in the event that a red ball is selected twice in a row, we would believe that it is even more likely that the unknown box is Box 2.

  1. Feller, W., An Introduction to Probability Theory and Its Applications, third edition, John Wiley & Sons, New York, 1968.
  2. Grinstead, C. M., Snell, J. L. Introduction to Probability, Online Book in PDF format.