Bayes’ formula gives better perspective on medical testing

Suppose that a patient is to be screened for a certain disease or medical condition. There are two important questions at the outset. First, how accurate is the screen or test? That is, what is the probability of the test giving the correct result? Second, once the patient obtains the test result (positive or negative), how reliable is that result? These two questions may seem one and the same, and confusing them is a common misconception. Sometimes even medical doctors get it wrong. This post demonstrates how to sort out these questions using Bayes’ formula, also called Bayes’ theorem.

Example

Before a patient is screened, a relevant question is the accuracy of the test. Once the test result comes back, an important question is whether a positive result means having the disease and a negative result means being healthy. Here are the two questions of interest:

  • What is the probability of the test giving a correct result, positive for someone with the disease and negative for someone who is healthy?
  • Once the test result is back, what is the probability that the test result is correct? More specifically, if a patient tests positive, what is the probability that the patient has the disease? If a patient tests negative, what is the probability that the patient is healthy?

Both questions involve conditional probabilities. In fact, the conditional probabilities in the second question are the reverse of the ones in the first question. To illustrate, we use the following example.

Example. Suppose that the prevalence of a disease is 1%. This disease has no particular symptoms but can be screened by a medical test that is 90% accurate. This means that the test result is positive about 90% of the time when the test is applied to patients who have the disease, and the test result is negative about 90% of the time when it is applied to patients who do not have the disease. Suppose that you take the test and the result is positive. Then the burning question is: how likely is it that you have the disease? Similarly, how likely is it that you are healthy if the test result is negative?

The accuracy of the test is 90% (0.90 as a probability). Since there is a 90% chance the test works correctly, if the patient has the disease, there is a 90% chance that the test will come back positive; if the patient is healthy, there is a 90% chance that the test will come back negative. If a patient tests positive, wouldn’t that mean there is a 90% chance that the patient has the disease?

Note that the given number of 90% is for the conditional events “if disease, then positive” and “if healthy, then negative.” The example asks for the probabilities of the reversed events – “if positive, then disease” and “if negative, then healthy.” It is a common misconception that the two probabilities are the same.

Tree Diagrams

Let’s use a tree diagram to look at this problem in a systematic way. First let H be the event that the patient being tested is healthy (does not have the disease in question) and let S be the event that the patient being tested is sick (has the disease in question). Let + \lvert S denote the event that the test result is positive if the patient has the disease. Let - \lvert H denote the event that the test result is negative if the patient is healthy.

Then P[+ \lvert S]=0.90 and P[- \lvert H]=0.90. These two conditional probabilities are based on the accuracy of the test. These probabilities are in a sense chronological – the patient is either healthy or sick and is then tested. The example asks for the conditional probabilities P[S \lvert +] and P[H \lvert -], which are backward from the given conditional probabilities, and are also backward in a chronological sense. We call P[+ \lvert S] and P[- \lvert H] forward conditional probabilities. We call P[S \lvert +] and P[H \lvert -] backward conditional probabilities. Bayes’ formula is a good way to compute the backward conditional probabilities. The following diagram shows the structure of the tree diagram.

Figure 1 – Structure of Tree Diagram

At the root of the tree diagram is a randomly chosen patient being tested. The first level of the tree shows the disease status (H or S). The events at the first level are unconditional events. The next level of the tree shows the test status (+ or -). Note that the test status is a conditional event. For example, the + that follows H is the event + \lvert H and the - that follows H is the event - \lvert H. The next diagram shows the probabilities that are in the tree diagram.

Figure 2 – Tree Diagram with Probabilities

The probabilities at the first level of the tree are the unconditional probabilities P[H] and P[S], where P[S] is the prevalence of the disease. The probabilities at the second level of the tree are conditional probabilities (the probabilities of the test status conditional on the disease status). A path probability is the product of the probabilities in a given path. For example, the path probability of the first path is P[H] \times P[+ \lvert H], which equals P[H \text{ and } +]. Thus a path probability is the probability of the event “disease status and test status.” The next diagram displays the numerical probabilities.

Figure 3 – Tree Diagram with Numerical Probabilities

Figure 3 shows four paths – “H and +”, “H and -”, “S and +” and “S and -”. The four path probabilities sum to 1.0: 0.099 + 0.891 + 0.009 + 0.001 = 1.0. These probabilities give the long run proportions of the patients that fall into these four categories. The most likely path is “H and -”, which happens 89.1% of the time. This makes sense since the disease in question has low prevalence (only 1%). The two paths marked in red are the paths with positive test status. Thus P[+]=0.099+0.009=0.108, so about 10.8% of the patients being tested will show a positive result. Of these, how many actually have the disease?

    \displaystyle P[S \lvert +]=\frac{0.009}{0.009+0.099}=\frac{0.009}{0.108}=0.0833=8.33 \%

In this example, the forward conditional probability is P[+ \lvert S]=0.9. As the tree diagrams have shown, the backward conditional probability is P[S \lvert +]=0.0833. Of all the positive cases, only 8.33% of them are actually sick. The other 91.67% are false positives. Confusing the forward conditional probability with the backward conditional probability is a common mistake. In fact, sometimes even medical doctors get it wrong, according to a 1978 article in the New England Journal of Medicine. Though we are using tree diagrams to present the solution, the answer of 8.33% is obtained by Bayes’ formula. We will discuss this in more detail below.

According to Figure 3, P[-]=0.891+0.001=0.892. Of these patients, how many of them are actually healthy?

    \displaystyle P[H \lvert -]=\frac{0.891}{0.891+0.001}=\frac{0.891}{0.892}=0.9989=99.89 \%

Most of the negative results are actual negatives. So there are very few false negatives. Once again, the forward conditional probability P[- \lvert H] is not to be confused with the backward conditional probability P[H \lvert -].
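The tree diagram arithmetic is easy to verify with a few lines of code. The following Python snippet is a minimal sketch (the variable names are illustrative, not from the figures) that reproduces the four path probabilities in Figure 3 and the two backward conditional probabilities.

```python
# Prevalence 1%, test accuracy 90% for both sick and healthy patients.
p_sick, accuracy = 0.01, 0.90
p_healthy = 1 - p_sick

# The four path probabilities from Figure 3.
p_h_pos = p_healthy * (1 - accuracy)   # "H and +" -> 0.099
p_h_neg = p_healthy * accuracy         # "H and -" -> 0.891
p_s_pos = p_sick * accuracy            # "S and +" -> 0.009
p_s_neg = p_sick * (1 - accuracy)      # "S and -" -> 0.001

p_pos = p_h_pos + p_s_pos              # P[+] = 0.108
p_neg = p_h_neg + p_s_neg              # P[-] = 0.892
print(p_s_pos / p_pos)                 # P[S | +] ≈ 0.0833
print(p_h_neg / p_neg)                 # P[H | -] ≈ 0.9989
```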

Bayes’ Formula

The result P[S \lvert +]=0.0833 seems startling. If a patient tests positive, there is only a slightly more than 8% chance that the patient actually has the disease! The test seems neither very accurate nor very reliable. Before commenting on this result, let’s summarize the calculation implicit in the tree diagrams.

Though not mentioned by name, the above tree diagrams use the idea of Bayes’ formula or Bayes’ rule to reverse the forward conditional probabilities, obtaining the backward conditional probabilities. This process has been described in this previous post.

The above tree diagrams describe a two-stage experiment. Pick a patient at random and the patient is either healthy or sick (the first stage in the experiment). Then the patient is tested and the result is either positive or negative (the second stage). A forward conditional probability is a probability of the status in the second stage given the status in the first stage of the experiment. The backward conditional probability is the probability of the status in the first stage given the status in the second stage. A backward conditional probability is also called a Bayes probability.

Let’s examine the backward conditional probability P[S \lvert +]. The following is the definition of the conditional probability P[S \lvert +].

    \displaystyle P[S \lvert +]=\frac{P[S \text{ and } +]}{P[+]}

Note that two of the paths in Figure 3 have positive test results (marked in red). Thus P[+] is the sum of two quantities, with P[+]=P[H] \times P[+ \lvert H]+P[S] \times P[+ \lvert S]. One of the quantities is for the case of the patient being healthy and the other is for the case of the patient being sick. With P[S \text{ and } +]=P[S] \times P[+ \lvert S], and plugging in P[+],

    \displaystyle P[S \lvert +]=\frac{P[S] \times P[+ \lvert S]}{P[H] \times P[+ \lvert H]+P[S] \times P[+ \lvert S]}

The above is Bayes’ formula in the specific context of a medical diagnostic test. Though it is a famous formula, there is no need to memorize it. If using the tree diagram approach, look for the two paths with positive test results. The ratio of the path probability for “sick” patients to the sum of the two path probabilities is the backward conditional probability P[S \lvert +].

Whether or not tree diagrams are used, the Bayesian idea is that a positive test result is explained by two causes. One is that the patient is healthy. Then the contribution to a positive result is P[H \text{ and } +]=P[H] \times P[+ \lvert H]. The other cause of a positive result is that the patient is sick. Then the contribution to a positive result is P[S \text{ and } +]=P[S] \times P[+ \lvert S]. The ratio of the “sick” cause to the sum total of the two causes is the backward conditional probability P[S \lvert +]. In any case, a tree diagram is a very handy device for clarifying the Bayesian calculation.
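The two-cause calculation translates directly into a short Python function. This is a sketch only; the function and parameter names are my own labels for the quantities in the formula above.

```python
def posterior_sick(prevalence, p_pos_given_sick, p_pos_given_healthy):
    """P[S | +]: the ratio of the 'sick' cause to the sum of the
    two causes that can produce a positive test result."""
    sick_cause = prevalence * p_pos_given_sick                # P[S] * P[+ | S]
    healthy_cause = (1 - prevalence) * p_pos_given_healthy    # P[H] * P[+ | H]
    return sick_cause / (sick_cause + healthy_cause)

# The running example: prevalence 1%, a 90% accurate test.
print(posterior_sick(0.01, 0.9, 0.1))  # ≈ 0.0833
```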

Further Discussion of the Example

The calculation in Figure 3 is based on a disease prevalence of 1%, i.e. P[S]=0.01. The hypothetical disease in the example affects one person in 100. With P[S \lvert +] being relatively small (just 8.33%), we cannot place much confidence in a positive result. One important point to understand is that the confidence in a positive result is determined by the prevalence of the disease in addition to the accuracy of the test. The less common the disease, the less confidence we can place in a positive result. On the other hand, the more common the disease, the more confidence we can place in a positive result.

Let’s try some extreme examples. Suppose that we are to test for a disease that nobody has (think testing for ovarian cancer among men or prostate cancer among women). Then we would have no confidence in a positive test result. In such a scenario, all positives would be healthy people. Any healthy patient who receives a positive result would be called a false positive. Thus in the extreme scenario of a disease with 0% prevalence among the patients being tested, we do not have any confidence in a positive result being correct.

On the other hand, suppose we are to test for a disease that everybody has. Then it would be clear that a positive result is always a correct result. In such a scenario, all positives would be sick patients. Any sick patient who receives a positive test result is called a true positive. Thus in the extreme scenario of a disease with 100% prevalence, we would have great confidence in a positive result being correct.

Thus the prevalence of a disease has to be taken into account when calculating the backward conditional probability P[S \lvert +]. For the hypothetical disease discussed here, let’s look at the long run results of applying the test to 10,000 patients. The next tree diagram shows the results.

Figure 4 – Tree Diagram with 10,000 Patients

Out of 10,000 patients being tested, 100 of them are expected to have the disease in question and 9,900 of them are healthy. With the test being 90% accurate, about 90 of the 100 sick patients would show positive results (these are the true positives). On the other hand, there would be about 990 false positives (10% of the 9,900 healthy patients). There are 990 + 90 = 1,080 positives in total and only 90 of them are true positives. Thus P[S \lvert +] is 90/1080 = 8.33%.
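The same frequency argument can be written directly in code, counting expected patients rather than probabilities. A minimal Python sketch mirroring Figure 4:

```python
total = 10_000
sick = total // 100                     # 1% prevalence -> 100 sick patients
healthy = total - sick                  # 9,900 healthy patients

true_positives = int(0.9 * sick)        # 90 sick patients test positive
false_positives = int(0.1 * healthy)    # 990 healthy patients test positive

all_positives = true_positives + false_positives   # 1,080 positives in total
print(true_positives / all_positives)              # 90/1080 ≈ 0.0833
```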

What if the disease in question has a prevalence of 8.33%? What would be the backward conditional probability P[S \lvert +] assuming that the test is still 90% accurate?

    \displaystyle \begin{aligned} P[S \lvert +]&=\frac{P[S] \times P[+ \lvert S]}{P[H] \times P[+ \lvert H]+P[S] \times P[+ \lvert S]} \\&=\frac{0.0833 \times 0.9}{0.9167 \times 0.1+0.0833 \times 0.9} =0.44989 \approx 45 \%  \end{aligned}

With P[S \lvert +]=0.45, there is a great deal more confidence in a positive result. With the test accuracy being the same (90%), the greater confidence is due to the greater prevalence of the disease. With P[S]=0.0833 being greater than 0.01, a greater portion of the positives are true positives. The higher the prevalence of the disease, the greater the probability P[S \lvert +]. To further illustrate this point, suppose the test for a disease has a 90% accuracy rate and the prevalence of the disease is 45%. The following calculation gives P[S \lvert +].

    \displaystyle \begin{aligned} P[S \lvert +]&=\frac{P[S] \times P[+ \lvert S]}{P[H] \times P[+ \lvert H]+P[S] \times P[+ \lvert S]} \\&=\frac{0.45 \times 0.9}{0.55 \times 0.1+0.45 \times 0.9} =0.88043 \approx 88 \%  \end{aligned}

With the prevalence being 45%, the probability of a positive being a true positive is 88%. The calculation shows that when the disease or condition is widespread, a positive result should be taken seriously.

One thing is clear. The backward conditional probability P[S \lvert +] is not to be confused with the forward conditional probability P[+ \lvert S]. Furthermore, it will not be easy to invert the forward conditional probability without using Bayes’ formula (either using the formula explicitly or using a tree diagram).

Bayesian Updating Based on New Information

The calculation shown above using Bayes’ formula can be interpreted as updating probabilities in light of new information – in this case, updating the risk of having a disease based on test results. For the hypothetical disease discussed above, the initial risk is the prevalence of 1%. For the patients who test positive in a first round of testing with a 90% accurate test, the risk is updated to 8.33%. They can then go through a second round of testing using another test (also with 90% accuracy). For the patients who test positive in the second round, the risk is updated to 45%. For the positives in a third round of testing, the risk is updated to 88%. The successive Bayesian calculations can be regarded as sequential updating of probabilities. Such updating would not be easy without the idea of Bayes’ rule or formula.
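The sequential updating is easy to express in code: each round’s posterior becomes the next round’s prior. A minimal Python sketch, assuming the rounds use independent tests with the same 90% accuracy:

```python
def update(prior, accuracy=0.9):
    # Posterior risk after one positive result from a test that is
    # `accuracy` correct for both sick and healthy patients.
    p_pos = prior * accuracy + (1 - prior) * (1 - accuracy)
    return prior * accuracy / p_pos

risk = 0.01                      # initial risk = prevalence
for test_round in range(1, 4):
    risk = update(risk)
    print(test_round, round(risk, 4))
# 1 0.0833
# 2 0.45
# 3 0.8804
```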

Sensitivity and Specificity

The sensitivity of a medical diagnostic test is its ability to give correct results for the people who have the disease. Put another way, the sensitivity is the true positive rate: the percentage of sick people who are correctly identified as having the disease. In our discussion, the sensitivity is the forward conditional probability P[+ \lvert S].

The specificity of a medical diagnostic test is its ability to give correct results for the people who do not have the disease. The specificity is then the true negative rate: the percentage of healthy people who are correctly identified as not having the disease. In our discussion, the specificity is the forward conditional probability P[- \lvert H].

With the sensitivity being the forward conditional probability P[+ \lvert S], the discussion in this post shows that the sensitivity of a test is not the same as the backward conditional probability P[S \lvert +]. The sensitivity may be 90% but the probability P[S \lvert +] can be much lower depending on the prevalence of the disease. The sensitivity only tells us that 90% of the people who have the disease will have a positive result. It does not take into account the prevalence of the disease (called the base rate). The above calculation shows that the rarer the disease (the lower the base rate), the lower the likelihood that a positive test result is a true positive. Likewise, the more common the disease, the higher the likelihood that a positive test result is a true positive.

In the example discussed here, both the sensitivity and specificity are 90%. This scenario is certainly idealized. In medical testing, the accuracy of a test for a disease may not be the same for the sick people as for the healthy people. For a simple example, let’s say we use chest pain as a criterion to diagnose a heart attack. This would be a very sensitive test since almost all people experiencing a heart attack will have chest pain. However, it would be a test with low specificity since there are plenty of other causes of chest pain.

Thus it is possible that a test may be very accurate for the people who have the disease but nonetheless identify many healthy people as positive. In other words, some tests have high sensitivity but have much lower specificity.

In medical testing, the overriding concern is to use a test with high sensitivity. The reason is that a high true positive rate means a low false negative rate. The goal is to have as few false negatives as possible in order to correctly identify as many sick people as possible. The trade-off is that there may be a higher number of false positives, which is considered less alarming than missing people who have the disease. The usual practice is that a first test for a disease has high sensitivity but lower specificity. To weed out the false positives, the positives from the first round of testing are given another test that has higher specificity.


© 2017 – Dan Ma


An Introduction to the Bayes’ Formula

We open up a discussion of the Bayes’ formula by going through a basic example. The Bayes’ formula or theorem is a method that can be used to compute “backward” conditional probabilities such as the examples described here. The formula will be stated after we examine the calculation in Example 1. The following diagram describes Example 1. Example 2 is presented at the end of the post and is left as an exercise. For a basic discussion of the Bayes’ formula, see [1] and chapter 4 of [2].

Example 1

As indicated in the diagram, Box 1 has one red ball and three white balls and Box 2 has two red balls and two white balls. The example involves a sequence of two steps. In the first step (the green arrow in the above diagram), a box is randomly chosen from the two boxes. In the second step (the blue arrow), a ball is randomly selected from the chosen box. We assume that the identity of the chosen box is unknown to the participants of this random experiment (e.g. suppose the two boxes are identical in appearance and a box is chosen by your friend and its identity is kept from you). Since a box is chosen at random, it is easy to see that P(B_1)=P(B_2)=0.5.

The example involves conditional probabilities. Some of the conditional probabilities are natural and are easy to see. For example, if the chosen box is Box 1, it is clear that the probability of selecting a red ball is \displaystyle \frac{1}{4}, i.e. \displaystyle P(R \lvert B_1)=\frac{1}{4}. Likewise, the conditional probability P(R \lvert B_2) is \displaystyle \frac{2}{4}. These two conditional probabilities are “forward” conditional probabilities since the events R \lvert B_1 and R \lvert B_2 occur in a natural chronological order.

What about the reversed conditional probabilities P(B_1 \lvert R) and P(B_2 \lvert R)? In other words, if the selected ball from the unknown box (unknown to you) is red, what is the probability that the ball is from Box 1?

The above question seems a little backward. After the box is randomly chosen, it is fixed (though its identity is unknown to you). Since it is fixed, shouldn’t the probability of the box being Box 1 remain \displaystyle \frac{1}{2}? After all, since the box is already chosen, the identity of the box cannot be influenced by the color of the ball selected from it. Indeed, the physical identity of the box does not change; what changes is our assessment of the probability (see the Remark below).

We should not dwell on the chronological sequence of events. Instead, the key to understanding the example is to think of performing the random experiment repeatedly. Think of the experiment of choosing one box and then selecting one ball from the chosen box. Focus only on the trials that result in a red ball. For the result to be a red ball, we need to get either Box 1/Red or Box 2/Red. Compute the probabilities of these two cases; adding them gives the probability that the selected ball is red. The following diagram illustrates this calculation.

Example 1 – Tree Diagram

The outcomes with red borders in the above diagram are the outcomes that result in a red ball. The diagram shows that if we perform this experiment many times, about 37.5% of the trials will result in a red ball (on average 3 out of 8 trials will result in a red ball). In how many of these trials is Box 1 the source of the red ball? In the diagram, we see that the case Box 2/Red is twice as likely as the case Box 1/Red. We conclude that the case Box 1/Red accounts for about one-third of the cases in which the selected ball is red. In other words, one-third of the red balls come from Box 1 and two-thirds of the red balls come from Box 2. We have:

\displaystyle (1) \ \ \ \ \ P(B_1 \lvert R)=\frac{1}{3}

\displaystyle (2) \ \ \ \ \ P(B_2 \lvert R)=\frac{2}{3}

Instead of using the tree diagram or the reasoning indicated in the paragraph after the tree diagram, we could just as easily apply the Bayes’ formula:

\displaystyle \begin{aligned}(3) \ \ \ \ \ P(B_1 \lvert R)&=\frac{P(R \lvert B_1) \times P(B_1)}{P(R)} \\&=\frac{\frac{1}{4} \times \frac{1}{2}}{\frac{3}{8}} \\&=\frac{1}{3}  \end{aligned}

In the calculation in (3) (as in the tree diagram), we use the law of total probability:

\displaystyle \begin{aligned}(4) \ \ \ \ \ P(R)&=P(R \lvert B_1) \times P(B_1)+P(R \lvert B_2) \times P(B_2) \\&=\frac{1}{4} \times \frac{1}{2}+\frac{2}{4} \times \frac{1}{2} \\&=\frac{3}{8}  \end{aligned}
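The computations in (3) and (4) can be checked with exact fractions. A minimal Python sketch using the standard fractions module:

```python
from fractions import Fraction

prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}  # box chosen at random
like = {"B1": Fraction(1, 4), "B2": Fraction(2, 4)}   # P(R | box)

p_red = sum(prior[b] * like[b] for b in prior)        # law of total probability, (4)
posterior = {b: prior[b] * like[b] / p_red for b in prior}

print(p_red)      # 3/8
print(posterior)  # {'B1': Fraction(1, 3), 'B2': Fraction(2, 3)}
```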

______________________________________________________________
Remark

We are not saying that an earlier event (the choosing of the box) is altered in some way by a subsequent event (the observing of a red ball). The above probabilities are subjective. How strongly do you believe that the “unknown” box is Box 1? If you use probabilities to quantify your belief, without knowing any additional information, you would say the probability that the “unknown” box being Box 1 is \frac{1}{2}.

Suppose you reach into the “unknown” box and get a red ball. This additional information alters your belief about the chosen box. Since Box 2 has more red balls, the fact that you observe a red ball will tell you that it is more likely that the “unknown” chosen box is Box 2. According to the above calculation, you update the probability of the chosen box being Box 1 to \frac{1}{3} and the probability of it being Box 2 as \frac{2}{3}.

In the language of Bayesian probability theory, the initial belief of P(B_1)=0.5 and P(B_2)=0.5 is called the prior probability distribution. After a red ball is observed, the updated belief as in the probabilities \displaystyle P(B_1 \lvert R)=\frac{1}{3} and \displaystyle P(B_2 \lvert R)=\frac{2}{3} is called the posterior probability distribution.

As demonstrated by this example, the Bayes’ formula is for updating probabilities in light of new information. Though the updated probabilities are subjective, they are not arbitrary. We can make sense of these probabilities by assessing the long run results of the experiment objectively.

______________________________________________________________
An Insurance Perspective

The example discussed here has an insurance interpretation. Suppose an insurer has two groups of policyholders, equal in size. One group consists of low risk insureds whose probability of experiencing a claim in a year is \frac{1}{4} (i.e. the proportion of red balls in Box 1). The insureds in the other group, a high risk group, have a higher probability of experiencing a claim in a year, namely \frac{2}{4} (i.e. the proportion of red balls in Box 2).

Suppose someone has just purchased a policy. Initially, the risk profile of this newly insured is uncertain. So the initial belief is that he is equally likely to be in the low risk group as in the high risk group.

Suppose that during the first policy year, the insured incurs one claim. The observation alters our belief about this insured. With the additional information of having one claim, the probability that the insured belongs to the high risk group is increased to \frac{2}{3}. The risk profile of this insured is altered based on new information. The insurance point of view described here involves the exact same calculation as the box-ball example: past claims experience is used to update the assessment of future claims experience.

______________________________________________________________
Bayes’ Formula

Suppose we have a collection of mutually exclusive and exhaustive events B_1, B_2, \cdots, B_n; that is, exactly one of these events must occur, so the probabilities P(B_i) sum to 1.0. Suppose R is an event. Think of the events B_i as “causes” that can explain the event R, an observed result. Given that R is observed, what is the probability that the cause of R is B_k? In other words, we are interested in finding the conditional probability P(B_k \lvert R).

Before we have the observed result R, the probabilities P(B_i) are the prior probabilities of the causes. We also know the probability of observing R given a particular cause (i.e. we know P(R \lvert B_i)). The probabilities P(R \lvert B_i) are “forward” conditional probabilities.

Given that we observe R, we are interested in knowing the “backward” probabilities P(B_i \lvert R). These probabilities are called the posterior probabilities of the causes. Mathematically, the Bayes’ formula is simply an alternative way of writing the following conditional probability.

\displaystyle (5) \ \ \ \ \ P(B_k \lvert R)=\frac{P(B_k \cap R)}{P(R)}

In (5), as in the discussion of the random experiment of choosing a box and selecting a ball, we are restricting ourselves to only the cases where the event R is observed. Then we ask, out of all the cases where R is observed, how many of these cases are caused by the event B_k?

The numerator of (5) can be written as

\displaystyle (6) \ \ \ \ \ P(B_k \cap R)=P(R \lvert B_k) \times P(B_k)

The denominator of (5) is obtained from applying the law of total probability.

\displaystyle (7) \ \ \ \ \ P(R)=P(R \lvert B_1) P(B_1) + P(R \lvert B_2) P(B_2)+ \cdots + P(R \lvert B_n) P(B_n)

Plugging (6) and (7) into (5), we obtain a statement of the Bayes’ formula.

\displaystyle (8) \ \ \ \ \ P(B_k \lvert R)=\frac{P(R \lvert B_k) \times P(B_k)}{\sum \limits_{j=1}^n P(R \lvert B_j) \times P(B_j)} \ \ \ \ \ \ \ \text{(Bayes' Formula)}

Of course, for any computation problem involving the Bayes’ formula, it is best not to memorize the formula in (8). Instead, simply apply the thought process that gives rise to the formula (e.g. the tree diagram shown above).
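Formula (8) also translates directly into a short function. The Python sketch below (the names are my own) takes the priors P(B_j) and the forward probabilities P(R \lvert B_j) and returns all the posterior probabilities at once:

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(B_k | R) for every cause B_k, per formula (8)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_k and R), as in (6)
    p_r = sum(joint)                                      # law of total probability, (7)
    return [j / p_r for j in joint]

# Example 1: priors 1/2, 1/2 and forward probabilities 1/4, 2/4.
print(bayes_posterior([0.5, 0.5], [0.25, 0.5]))  # [0.333..., 0.666...]
```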

The Bayes’ formula has some profound philosophical implications, evidenced by the fact that it spawned a separate school of thought called Bayesian statistics. However, our discussion here is solely on its original role in finding certain backward conditional probabilities.

______________________________________________________________
Example 2

Example 2 is left as an exercise. The event that both selected balls are red gives even more weight to Box 2. In other words, if a red ball is selected twice in a row, we would believe that it is even more likely that the unknown box is Box 2.
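As a check on the exercise, here is a Python sketch of one natural reading of Example 2, assuming the two balls are drawn with replacement from the chosen box (note that without replacement, two red balls from Box 1 would be impossible, and the posterior probability for Box 2 would be 1):

```python
from fractions import Fraction

prior = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}
# Assumption: draws are with replacement, so P(RR | box) = P(R | box)^2.
like_rr = {"B1": Fraction(1, 4) ** 2, "B2": Fraction(2, 4) ** 2}

p_rr = sum(prior[b] * like_rr[b] for b in prior)
posterior = {b: prior[b] * like_rr[b] / p_rr for b in prior}
print(posterior)  # {'B1': Fraction(1, 5), 'B2': Fraction(4, 5)}
```

Under this reading, the posterior probability of Box 2 rises from \frac{2}{3} after one red ball to \frac{4}{5} after two.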
______________________________________________________________
Reference

  1. Feller, W., An Introduction to Probability Theory and Its Applications, third edition, John Wiley & Sons, New York, 1968.
  2. Grinstead, C. M., and Snell, J. L., Introduction to Probability, online book in PDF format.