We illustrate the thought process behind conditional distributions with a series of examples, presented across a series of blog posts. In this post, we look at some conditional distributions derived from discrete probability distributions.

Practice problems are found in the companion blog.

_____________________________________________________________________________________________________________________________

**The Setting**

Suppose we have a discrete random variable $X$ with $P(X=x)$ as the probability mass function. Suppose some random experiment can be modeled by the discrete random variable $X$. The sample space $S$ for this probability experiment is the set of sample points with positive probability masses, i.e. $S$ is the set of all $x$ for which $P(X=x)>0$. In the examples below, $S$ is either a subset of the real line $\mathbb{R}$ or a subset of the plane $\mathbb{R}^2$. Conceivably the sample space could be a subset of a Euclidean space of any higher dimension.

Suppose that we are informed that some event $A$ in the random experiment has occurred ($A$ is a subset of the sample space $S$). Given this new information, all the sample points outside of the event $A$ are irrelevant. Or perhaps, in this random experiment, we are only interested in those outcomes that are elements of some subset $A$ of the sample space $S$. In either of these scenarios, we wish to treat the event $A$ as a new sample space.

The probability of the event $A$, denoted by $P(A)$, is derived by summing the probabilities of all the sample points $x$ in $A$. We have:

$\displaystyle P(A)=\sum_{x \in A} P(X=x)$

The probability $P(A)$ may not be 1.0. So the probability masses $P(X=x)$ for the sample points $x$ in $A$, if they are unadjusted, may not form a probability distribution. However, if we consider each such probability mass as a proportion of the probability $P(A)$, then the adjusted probability masses on the event $A$ will form a probability distribution. For example, say the event $A$ consists of two probability masses 0.2 and 0.3, which sum to 0.5. Then in the new sample space, the first probability mass is 0.4 (0.2 divided by 0.5) and the second probability mass is 0.6 (0.3 divided by 0.5).

We now summarize the above paragraph. Using the event $A$ as the new sample space, the probability mass function is:

$\displaystyle P(X=x \mid A)=\frac{P(X=x)}{P(A)}, \ \ \ \ x \in A$

The above probability distribution is called the conditional distribution of $X$ given the event $A$, denoted by $X \mid A$. This new probability distribution incorporates new information about the results of a random experiment.

Once this new probability distribution is established, we can compute various distributional quantities (e.g. cumulative distribution function, mean, variance and other higher moments).
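The renormalization step described above can be sketched in a few lines of Python. The `conditional_pmf` helper and the three-outcome pmf are illustrative inventions, not part of the post; the two-mass example mirrors the 0.2/0.3 example in the text.

```python
def conditional_pmf(pmf, event):
    """Restrict a pmf (dict mapping outcome -> probability) to an event
    (a set of outcomes), renormalizing by the probability of the event."""
    p_event = sum(p for x, p in pmf.items() if x in event)
    return {x: p / p_event for x, p in pmf.items() if x in event}

# The two-mass example from the text: 0.2 and 0.3 sum to 0.5,
# so the adjusted masses are 0.2/0.5 = 0.4 and 0.3/0.5 = 0.6.
pmf = {"a": 0.2, "b": 0.3, "c": 0.5}
print(conditional_pmf(pmf, {"a", "b"}))  # {'a': 0.4, 'b': 0.6}
```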

_____________________________________________________________________________________________________________________________

**Examples**

Suppose that two students take a multiple choice test that has 5 questions. Let $X$ be the number of correct answers of one student and $Y$ be the number of correct answers of the other student (these can be considered as test scores for the purpose of the examples here). Assume that $X$ and $Y$ are independent. The following shows the probability functions.

$P(X=0)=0.4$, $P(X=1)=0.2$, $P(X=2)=P(X=3)=P(X=4)=P(X=5)=0.1$

$P(Y=0)=P(Y=1)=0.1$, $P(Y=2)=P(Y=3)=P(Y=4)=P(Y=5)=0.2$

Note that $E[X]=1.6$ and $E[Y]=2.9$. Without knowing any additional information, we can expect that on average one student gets 1.6 correct answers and the other gets 2.9 correct answers. If having 3 or more correct answers is considered passing, then the student represented by $X$ has a 30% chance of passing while the student represented by $Y$ has a 60% chance of passing. The following examples show how these expectations can change as soon as new information is known.

The following examples are based on these two test scores $X$ and $Y$.
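As a concrete check, the following Python sketch encodes a pair of pmfs consistent with the quantities quoted above ($E[X]=1.6$, $E[Y]=2.9$, and the 30%/60% passing probabilities). The specific mass values are a reconstruction, so treat them as an assumption rather than the post's own tables.

```python
# Assumed pmfs for the two test scores (reconstructed, not verbatim).
pX = {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}
pY = {0: 0.1, 1: 0.1, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}

def mean(pmf):
    """Expected value of a discrete pmf given as a dict."""
    return sum(x * p for x, p in pmf.items())

def prob_passing(pmf, cutoff=3):
    """Probability of scoring at or above the passing cutoff."""
    return sum(p for x, p in pmf.items() if x >= cutoff)

print(round(mean(pX), 4), round(mean(pY), 4))                   # 1.6 2.9
print(round(prob_passing(pX), 4), round(prob_passing(pY), 4))   # 0.3 0.6
```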

**Example 1**

In this example, we only consider the student whose correct answers are modeled by the random variable $X$. In addition to knowing the probability function $P(X=x)$, we also know that this student has at least one correct answer (i.e. the new information is $X \ge 1$).

In light of the new information, the new sample space is $A=\{1,2,3,4,5\}$. Note that $P(X \ge 1)=0.6$. In this new sample space, each probability mass is the original one divided by 0.6. For example, for the sample point $x=1$, we have $P(X=1 \mid X \ge 1)=\frac{0.2}{0.6}=\frac{1}{3}$. The following is the conditional probability distribution of $X$ given $X \ge 1$.

$P(X=1 \mid X \ge 1)=\frac{1}{3}$

$P(X=2 \mid X \ge 1)=P(X=3 \mid X \ge 1)=P(X=4 \mid X \ge 1)=P(X=5 \mid X \ge 1)=\frac{1}{6}$

The conditional mean is the mean of the conditional distribution. We have $E[X \mid X \ge 1]=\frac{1.6}{0.6}=\frac{8}{3} \approx 2.67$. Given that this student is knowledgeable enough to answer some questions correctly, the expectation is higher than before knowing the additional information. Also, given the new information, the student in question has a 50% chance of passing ($P(X \ge 3 \mid X \ge 1)=\frac{0.3}{0.6}=0.5$, vs. 30% before the new information is known).
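The Example 1 numbers can be reproduced with a short Python sketch. The pmf of $X$ below is the reconstruction assumed earlier, not the post's own table.

```python
pX = {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}  # assumed pmf of X

# Condition on the event X >= 1 and renormalize.
event = {x for x in pX if x >= 1}
p_event = sum(pX[x] for x in event)              # P(X >= 1) = 0.6
cond = {x: pX[x] / p_event for x in event}       # pmf of X | X >= 1

cond_mean = sum(x * p for x, p in cond.items())          # E[X | X >= 1] = 8/3
cond_pass = sum(p for x, p in cond.items() if x >= 3)    # P(X >= 3 | X >= 1) = 0.5
```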

**Example 2**

We now look at a joint distribution that has a 2-dimensional sample space. Consider the joint distribution of the test scores $X$ and $Y$. If the new information is that the total number of correct answers between them is 4 (i.e. $X+Y=4$), how would this change our expectation of their performance?

Since $X$ and $Y$ are independent, the sample space is a square as indicated in the figure below.

**Figure 1 – Sample Space of Test Scores**

Because the two scores are independent, the joint probability at each of these 36 sample points is the product of the individual probabilities. We have $P(X=x,Y=y)=P(X=x) \times P(Y=y)$. The following figure shows one such joint probability.

**Figure 2 – Joint Probability Function**

After taking the test, suppose that we have the additional information that the two students have a total of 4 correct answers. With this new information, we can focus our attention on the new sample space that is indicated in the following figure.

**Figure 3 – New Sample Space**

Now we wish to discuss the conditional probability distribution of $X \mid X+Y=4$ and the conditional probability distribution of $Y \mid X+Y=4$. In particular, given that there are 4 correct answers between the two students, what would be their expected numbers of correct answers and what would be their chances of passing?

There are 5 sample points in the new sample space (the 5 points circled above). The conditional probability distribution is obtained by making each probability mass a fraction of the sum of the 5 probability masses. First we calculate the 5 joint probabilities.

$P(X=0,Y=4)=0.4 \times 0.2=0.08$

$P(X=1,Y=3)=0.2 \times 0.2=0.04$

$P(X=2,Y=2)=0.1 \times 0.2=0.02$

$P(X=3,Y=1)=0.1 \times 0.1=0.01$

$P(X=4,Y=0)=0.1 \times 0.1=0.01$

The sum of these 5 joint probabilities is $0.08+0.04+0.02+0.01+0.01=0.16$. Making each of these joint probabilities a fraction of 0.16, we have the following two conditional probability distributions.

$P(X=0 \mid X+Y=4)=0.5$, $P(X=1 \mid X+Y=4)=0.25$, $P(X=2 \mid X+Y=4)=0.125$, $P(X=3 \mid X+Y=4)=0.0625$, $P(X=4 \mid X+Y=4)=0.0625$

$P(Y=0 \mid X+Y=4)=0.0625$, $P(Y=1 \mid X+Y=4)=0.0625$, $P(Y=2 \mid X+Y=4)=0.125$, $P(Y=3 \mid X+Y=4)=0.25$, $P(Y=4 \mid X+Y=4)=0.5$

The following are the conditional means given $X+Y=4$, compared against the unconditional means.

$E[X \mid X+Y=4]=0.9375$ (vs. $E[X]=1.6$)

$E[Y \mid X+Y=4]=3.0625$ (vs. $E[Y]=2.9$)

Now compare the chances of passing.

$P(X \ge 3 \mid X+Y=4)=0.125$ (vs. $P(X \ge 3)=0.3$)

$P(Y \ge 3 \mid X+Y=4)=0.75$ (vs. $P(Y \ge 3)=0.6$)

Based on the new information that $X+Y=4$, we have a lower expectation for the student represented by $X$ and a higher expectation for the student represented by $Y$. Observe that the conditional probability at $X=0$ increases to 0.5 from 0.4, while the conditional probability at $Y=4$ increases to 0.5 from 0.2.
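The Example 2 calculation (build the joint pmf by independence, restrict to the event $X+Y=4$, renormalize, then marginalize) can be sketched as follows. The two pmfs are the reconstruction assumed earlier, not the post's own tables.

```python
from itertools import product

pX = {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}  # assumed pmf of X
pY = {0: 0.1, 1: 0.1, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}  # assumed pmf of Y

# Joint pmf via independence: P(X=x, Y=y) = P(X=x) * P(Y=y).
joint = {(x, y): pX[x] * pY[y] for x, y in product(pX, pY)}

# Condition on the event X + Y = 4 and renormalize.
event = {xy: p for xy, p in joint.items() if sum(xy) == 4}
p_event = sum(event.values())                    # P(X + Y = 4) = 0.16
cond = {xy: p / p_event for xy, p in event.items()}

# Conditional marginal of X and the conditional means.
cond_X = {}
for (x, y), p in cond.items():
    cond_X[x] = cond_X.get(x, 0.0) + p
mean_X = sum(x * p for x, p in cond_X.items())   # E[X | X+Y=4] = 0.9375
mean_Y = 4 - mean_X                              # E[Y | X+Y=4] = 3.0625
```

Since every outcome in the new sample space satisfies $x+y=4$, the conditional means must add to 4, which is why `mean_Y` can be obtained by subtraction.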

**Example 3**

Now suppose the new information is that the two students do well on the test. Specifically, their combined number of correct answers is greater than or equal to 5, i.e., $X+Y \ge 5$. How would this impact the conditional distributions?

First we discuss the conditional distributions for $X$ and $Y$. Taking the new information into account, the following is the new sample space.

**Figure 4 – New Sample Space**

To derive the conditional distribution of $X$, sum the joint probabilities within the new sample space for each value $x$ of $X$. By independence, $P(X=x, X+Y \ge 5)=P(X=x) \times P(Y \ge 5-x)$. The calculation is shown below.

$P(X=0, X+Y \ge 5)=0.4 \times 0.2=0.08$

$P(X=1, X+Y \ge 5)=0.2 \times 0.4=0.08$

$P(X=2, X+Y \ge 5)=0.1 \times 0.6=0.06$

$P(X=3, X+Y \ge 5)=0.1 \times 0.8=0.08$

$P(X=4, X+Y \ge 5)=0.1 \times 0.9=0.09$

$P(X=5, X+Y \ge 5)=0.1 \times 1.0=0.10$

The sum of these probabilities is $0.08+0.08+0.06+0.08+0.09+0.10=0.49$, which is $P(X+Y \ge 5)$. The conditional distribution of $X$ is obtained by taking each of the above probabilities as a fraction of 0.49. We have:

$P(X=0 \mid X+Y \ge 5)=\frac{8}{49}$, $P(X=1 \mid X+Y \ge 5)=\frac{8}{49}$, $P(X=2 \mid X+Y \ge 5)=\frac{6}{49}$, $P(X=3 \mid X+Y \ge 5)=\frac{8}{49}$, $P(X=4 \mid X+Y \ge 5)=\frac{9}{49}$, $P(X=5 \mid X+Y \ge 5)=\frac{10}{49}$

We have the conditional mean $E[X \mid X+Y \ge 5]=\frac{130}{49} \approx 2.65$ (vs. $E[X]=1.6$). The conditional probability of passing is $P(X \ge 3 \mid X+Y \ge 5)=\frac{27}{49} \approx 0.55$ (vs. $P(X \ge 3)=0.3$).

Note that the above conditional distribution for $X$ is not as skewed as the original one for $X$. With the information that both test takers do well, the expected score for the student represented by $X$ is much higher.

With similar calculations we have the following results for the conditional distribution of $Y$.

$P(Y=0 \mid X+Y \ge 5)=\frac{1}{49}$, $P(Y=1 \mid X+Y \ge 5)=\frac{2}{49}$, $P(Y=2 \mid X+Y \ge 5)=\frac{6}{49}$, $P(Y=3 \mid X+Y \ge 5)=\frac{8}{49}$, $P(Y=4 \mid X+Y \ge 5)=\frac{12}{49}$, $P(Y=5 \mid X+Y \ge 5)=\frac{20}{49}$

We have the conditional mean $E[Y \mid X+Y \ge 5]=\frac{186}{49} \approx 3.80$ (vs. $E[Y]=2.9$). The conditional probability of passing is $P(Y \ge 3 \mid X+Y \ge 5)=\frac{40}{49} \approx 0.82$ (vs. $P(Y \ge 3)=0.6$). Indeed, with the information that both test takers do well, we can expect much higher results from each individual test taker.
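The Example 3 calculation follows the same pattern with the event $X+Y \ge 5$; the sketch below conditions the joint pmf on that event and recovers both conditional marginals. As before, the two pmfs are the assumed reconstruction, not the post's own tables.

```python
from itertools import product

pX = {0: 0.4, 1: 0.2, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}  # assumed pmf of X
pY = {0: 0.1, 1: 0.1, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}  # assumed pmf of Y

joint = {(x, y): pX[x] * pY[y] for x, y in product(pX, pY)}

# Restrict the joint pmf to the event X + Y >= 5.
event = {xy: p for xy, p in joint.items() if sum(xy) >= 5}
p_event = sum(event.values())                    # P(X + Y >= 5) = 0.49

def cond_marginal(var):
    """Conditional marginal pmf of coordinate `var` (0 for X, 1 for Y)."""
    out = {}
    for xy, p in event.items():
        out[xy[var]] = out.get(xy[var], 0.0) + p / p_event
    return out

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

mX, mY = cond_marginal(0), cond_marginal(1)
# mean(mX) = 130/49 ≈ 2.65 and mean(mY) = 186/49 ≈ 3.80
```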

**Example 4**

In Examples 2 and 3, the new information involves both test takers (both random variables). If the new information involves just one test taker, it may be immaterial to the exam score of the other student. For example, suppose that $Y=5$. Then what is the conditional distribution of $X$ given $Y=5$? Since $X$ and $Y$ are independent, the high score $Y=5$ has no impact on the score $X$; the conditional distribution of $X$ is just its original distribution. However, a high joint score does have an impact on each of the individual scores (Example 3).

_____________________________________________________________________________________________________________________________

**Summary**

We conclude with a summary of the thought process of conditional distributions.

Suppose $X$ is a discrete random variable and $P(X=x)$ is its probability function. Further suppose that $X$ is the probability model of some random experiment. The sample space of this random experiment is $S$, the set of all $x$ for which $P(X=x)>0$.

Suppose we have some new information that in this random experiment, some event $A$ has occurred. The event $A$ is a subset of the sample space $S$.

To incorporate this new information, the event $A$ is the new sample space. The random variable incorporating the new information, denoted by $X \mid A$, has a conditional probability distribution. The following is the probability function of the conditional distribution.

$\displaystyle P(X=x \mid A)=\frac{P(X=x)}{P(A)}, \ \ \ \ x \in A$

where $P(A)=\sum_{x \in A} P(X=x)$.

The thought process is that the conditional distribution is derived by taking each original probability mass as a fraction of the total probability $P(A)$. The probability function derived in this manner reflects the new information that the event $A$ has occurred.

Once the conditional probability function is derived, it can be used just like any other probability function, e.g. for computing various distributional quantities.

**Practice Problems**

Practice problems are found in the companion blog.