Appendix A Probability Theory
STEVEN N. TANI
A reasonable probability is the only certainty.
—E.W. Howe
A.1 Introduction
A.2 Distinctions and the Clarity Test
A.3 Possibility Tree Representation of a Distinction
A.4 Probability as an Expression of Degree of Belief
A.5 Inferential Notation
A.6 Multiple Distinctions
A.7 Joint, Conditional, and Marginal Probabilities
A.7.1 Joint Probability
A.7.2 Marginal Probability
A.7.3 Conditional Probability
A.8 Calculating Joint Probabilities
A.9 Dependent and Independent Probabilities
A.10 Reversing Conditional Probabilities: Bayes’ Rule
A.11 Probability Distributions
A.11.1 Summary Statistics for a Probability Distribution
A.12 Combining Uncertain Quantities
References
Handbook of Decision Analysis, First Edition. Gregory S. Parnell, Terry A. Bresnick, Steven N. Tani,
and Eric R. Johnson.
© 2013 John Wiley & Sons, Inc. Published 2013 by John Wiley & Sons, Inc.
A.1 Introduction
This appendix is not intended to be a comprehensive discourse on probability
theory. For that, please refer to any good textbook on probability, such as those
by William Feller (1968) and by K.L. Chung and Farid AitSahlia (2003). Rather,
we here present a number of key ideas from probability theory that every decision practitioner should know.
A.2 Distinctions and the Clarity Test
A fundamental requirement for addressing the important uncertainties in a
decision is to create distinctions that are both clear and useful. A distinction is a
separation of the universe of possibilities (outcomes) into two or more subsets.
By definition, these subsets will be mutually exclusive (i.e., a possibility cannot
be in more than one subset) and collectively exhaustive (i.e., every possibility
must be in a subset). An example of a distinction is whether or not a new
product achieves high sales volume. This distinction defines two events that are
mutually exclusive (the product cannot both have high sales and not have high
sales) and collectively exhaustive (either the product will have high sales or it
will not, which includes the possibility that it is not launched and therefore has
zero sales).
A distinction is clear when there is no ambiguity in its definition. To determine if a distinction is clear, we apply the clarity test (Howard, 2007), which
employs the concept of a clairvoyant: someone who perfectly sees and truthfully
reports any observable event or quantity, either in the future or in the past.¹ A
distinction is deemed to be clear if the clairvoyant, without exercising any judgment, can say whether or not an event that is defined by the distinction will
occur. The distinction in the example above would fail the clarity test. The
clairvoyant would not be able to say whether the product will achieve high sales
volume without judging what level of sales volume qualifies as “high.” A much
clearer distinction in this example would be whether or not the new product
achieves worldwide sales volume of at least 1 million units during the next calendar year. The clairvoyant might still wonder whether or not free samples given
to prospective customers should be included in “sales volume,” so the distinction
might need to be refined to make that clear. Creating distinctions that are clear
is important in decision making because it avoids a situation in which subject
matter experts give different probabilities for the same event because they have
different interpretations of its definition.
¹ This definition is similar to the definition of an ideal observer used in Chapter 10, insofar as it
highlights the application of a criterion to events without use of judgment. This definition differs
slightly from the definition of clairvoyant used in Chapter 11. The definition used in Chapter 11
also requires that the event in question be unaffected by the asker’s actions. This additional requirement is necessary for value of information to be formulated properly.
A distinction is useful when it contributes to a full understanding of why
one decision alternative is preferred to others. The distinction can be a direct
measure of value or it can be an important parameter in the calculation of value.
For example, the level of sales volume is a useful distinction in the calculation
of net present value. Creating distinctions that are both useful and clear is an
important skill of a decision practitioner.
A.3 Possibility Tree Representation of a Distinction
A highly useful diagrammatic representation of a distinction is a possibility tree.
The distinction is shown as a tree node from which branches emerge representing
the possibilities. The number of branches at a node is called the degree of the
distinction. Figure A.1 shows the possibility tree for the example distinction of
high sales or not high sales (two degrees), suitably defined to pass the clarity test.
It is easy to imagine a distinction with three or more degrees. For example, a
distinction defined by a student’s grade in a course could have six degrees: A, B,
C, D, E, or F.
A.4 Probability as an Expression of Degree of Belief
A probability is the quantitative expression of someone’s uncertainty about a
distinction based on his or her state of information. More precisely, it expresses
the person’s degree of belief that an event will occur, ranging from 0 if the person
is certain that the event will not occur to 1 if the person is certain that it will
occur. A person’s belief about the occurrence of any event, such as high sales
volume, depends, of course, on the information that the person possesses. If the
person’s information changes (by observing the results of a test market, for
example), then the probability may change.
FIGURE A.1 Possibility tree.
A.5 Inferential Notation
Using inferential notation (Howard, 1966) emphasizes that every probability is
based on a particular state of information. The probability of occurrence of an
event A is notated as follows:
Pr(A|&),
where the event is represented to the left of the vertical bar and the information
on which the probability is based is represented to the right of the bar. The
ampersand character “&” stands for all of the background information possessed
by the person making the probability statement.
A.6 Multiple Distinctions
In most decision-making situations, we must deal with multiple distinctions.
Suppose, for example, that in addition to the previous distinction of whether or
not the product achieves high sales volume (as defined to meet the clarity test),
we have a second distinction of whether or not our major competitor launches
a similar product (again, suitably defined to pass the clarity test). In the discussion that follows, we use the following notation:
SY = High sales (≥1 M)
SN = Not high sales (<1 M)
CY = Competitive product
CN = No competitive product.
As shown in the possibility tree in Figure A.2, these two distinctions, each with
two degrees, together define four elemental possibilities: (1) high sales, competitive product (SY, CY), (2) high sales, no competitive product (SY, CN), (3) not
high sales, competitive product (SN, CY), and (4) not high sales, no competitive
product (SN, CN).
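The way two distinctions combine into elemental possibilities can be sketched in a few lines of Python. This is only an illustration: the dictionary name `distinctions` and the labels "sales" and "competition" are assumptions introduced here, while the outcome codes SY, SN, CY, and CN follow the notation above.

```python
from itertools import product

# Each distinction lists its mutually exclusive, collectively exhaustive outcomes.
# The keys "sales" and "competition" are illustrative labels, not from the text.
distinctions = {
    "sales": ["SY", "SN"],        # high sales (>= 1 M) / not high sales (< 1 M)
    "competition": ["CY", "CN"],  # competitive product / no competitive product
}

# An elemental possibility picks exactly one outcome from every distinction,
# so the full set is the Cartesian product of the outcome lists.
elemental = list(product(*distinctions.values()))

for number, possibility in enumerate(elemental, start=1):
    print(number, possibility)
```

With two distinctions of two degrees each, the product has 2 × 2 = 4 members, listed in the same order as in the text.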
A.7 Joint, Conditional, and Marginal Probabilities
There are three types of probabilities that we use to express uncertainty about
multiple distinctions—joint, marginal, and conditional.
A.7.1 JOINT PROBABILITY
The probability of an elemental possibility with multiple events is called a joint
probability. In the example, the probability of high sales combined with the
competitive product is a joint probability, notated as follows:
Pr(SY, CY|&).
FIGURE A.2 Possibility tree with two distinctions.
A.7.2 MARGINAL PROBABILITY
The probability of an event defined by just one of the multiple distinctions is
called a marginal probability. In the example, the probability of high sales is a
marginal probability, notated as follows:
Pr(SY|&).
A.7.3 CONDITIONAL PROBABILITY
The probability of an event defined by one distinction given that we know of
the occurrence of an event defined by a second distinction is called a conditional
probability. In the example, the probability of competitive product given high
sales is a conditional probability, notated as follows:
Pr(CY|SY, &).
We can illustrate these types of probabilities in a tree diagram as shown in
Figure A.3.
Since the probabilities of a set of mutually exclusive and collectively exhaustive
events, given the same state of information, must sum to 100%, we know that
Pr(CY|SY, &) + Pr(CN|SY, &) = 20% + 80% = 100%.
It is clear from the tree diagram that a marginal probability is the sum of the
appropriate joint probabilities:
Pr(SY|&) = Pr(SY, CY|&) + Pr(SY, CN|&) = 12% + 48% = 60%.
FIGURE A.3 Probability tree with two distinctions.
A.8 Calculating Joint Probabilities
One of the fundamental results from probability theory is that a joint probability
can be calculated as the product of a conditional probability and a marginal
probability:
Pr(SY, CN|&) = Pr(CN|SY, &) × Pr(SY|&).
When applied to the tree diagram in Figure A.3, this result means that each joint
probability at the right-hand end of a path through the tree is equal to the
product of the marginal and conditional probabilities of the branches comprising
that path. For example,
Pr(SY, CN|&) = 60% × 80% = 48%.
Note that since
Pr(SY, CN|&) = Pr(CN, SY|&),
we can reverse the conditionality and get the same result:
Pr(SY, CN|&) = Pr(SY|CN, &) × Pr(CN|&).
Also note in the diagram that if we know the four joint probabilities at the end
of the tree, we can calculate all of the other probabilities on the tree. For example,
the marginal probability of high sales is the sum of two joint probabilities
(12% + 48% = 60%). The conditional probability of a competitive product
given high sales is a joint probability divided by a marginal probability (12% /
60% = 20%).
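The arithmetic in this section can be checked with a short Python sketch. It starts from the four joint probabilities and recovers the marginal and conditional probabilities quoted in the text. The function names are assumptions made for illustration, and the 10% for (SN, CN) is not stated directly but inferred as the remainder after 12%, 48%, and 30%.

```python
# Joint probabilities at the ends of the tree in Figure A.3.
# (SN, CN) = 10% is inferred as 100% - 12% - 48% - 30%.
joint = {
    ("SY", "CY"): 0.12,
    ("SY", "CN"): 0.48,
    ("SN", "CY"): 0.30,
    ("SN", "CN"): 0.10,
}

def marginal_sales(s):
    """Pr(s|&): sum of the joint probabilities whose sales outcome is s."""
    return sum(p for (sales, _), p in joint.items() if sales == s)

def conditional_competition(c, s):
    """Pr(c|s, &): a joint probability divided by the marginal of the condition."""
    return joint[(s, c)] / marginal_sales(s)

print(round(marginal_sales("SY"), 2))                 # 0.6
print(round(conditional_competition("CY", "SY"), 2))  # 0.2
print(round(conditional_competition("CY", "SN"), 2))  # 0.75
```

Multiplying back, `conditional_competition("CN", "SY") * marginal_sales("SY")` gives the 48% joint probability again, which is the product rule stated at the start of the section.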
A.9 Dependent and Independent Probabilities
The probabilities of two distinctions are either dependent or independent. The
test is whether knowing the outcome of one distinction affects the probabilities
of the other. If the probabilities for one distinction are different depending on
the outcome of the other distinction, the probabilities are dependent. If the
probabilities for one distinction are unaffected by the outcome of the other, the
probabilities are independent. Stated mathematically, events A and B are probabilistically independent if
Pr(A|&) = Pr(A|B, &),
which implies that
Pr(A, B|&) = Pr(A|&) × Pr(B|&).
It is readily apparent that the probabilities of the two distinctions shown in
Figure A.3 are dependent because the probability of a competitive product given
high sales (20%) differs from the probability of a competitive product given not
high sales (75%).
Probabilistic dependence is a mutual property. If the probabilities of one
distinction are dependent on a second distinction, then the reverse must also be
true.
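The independence test can be sketched in Python by comparing each joint probability with the product of its two marginals. The function names and the tolerance are assumptions made for illustration. The first table uses the Figure A.3 values, with 10% for (SN, CN) inferred as the remainder. The second table is hypothetical, built with the same marginals (60% and 42%) so that the product condition holds.

```python
def marginal(joint, position, outcome):
    """Sum the joint probabilities whose key has `outcome` at `position`."""
    return sum(p for key, p in joint.items() if key[position] == outcome)

def is_independent(joint, tol=1e-9):
    """True if every joint probability equals the product of its two marginals."""
    return all(
        abs(p - marginal(joint, 0, s) * marginal(joint, 1, c)) <= tol
        for (s, c), p in joint.items()
    )

# Figure A.3 values; (SN, CN) = 10% is inferred as the remainder.
figure_a3 = {
    ("SY", "CY"): 0.12, ("SY", "CN"): 0.48,
    ("SN", "CY"): 0.30, ("SN", "CN"): 0.10,
}

# Hypothetical table with the same marginals, constructed to be independent.
hypothetical = {
    ("SY", "CY"): 0.252, ("SY", "CN"): 0.348,
    ("SN", "CY"): 0.168, ("SN", "CN"): 0.232,
}

print(is_independent(figure_a3))     # False
print(is_independent(hypothetical))  # True
```

Because the check treats both positions of the key symmetrically, it also reflects the point that dependence is a mutual property.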
A.10 Reversing Conditional Probabilities: Bayes’ Rule
As indicated in Section A.8, when dealing with two distinctions with mutually
dependent probabilities, it is often useful to be able to reverse the order of conditioning. That is, if we know the conditional probability of event A given that
event B has occurred, we would like to calculate the conditional probability of
B given A.
This operation, which is known as Bayes’ rule, is done quite easily using
tree diagrams, as illustrated in Figure A.4. The original order of the distinctions
is shown in the tree on the left.
The following steps are taken to create the tree on the right with the order
reversed.
Step 1. Draw the tree structure with the order reversed.
Step 2. Copy the four joint probabilities from the original tree, taking care
to put them in the correct position in the reversed tree.
Step 3. Calculate the marginal probabilities in the reversed tree as the sum
of the appropriate joint probabilities. For example, Pr(CY|&) = 12% +
30% = 42%.
Step 4. Calculate the conditional probabilities in the reversed tree as
the ratio of a joint probability to a marginal probability. For example,
Pr(SY|CY, &) = 12%/42% ≈ 29%.
FIGURE A.4 Reversing the order of a tree.
We can express Bayes’ rule mathematically for events A and B as
Pr(A|B, &) = Pr(B|A, &) × Pr(A|&) / Pr(B|&).
Bayes’ rule plays a central role in decision analysis since it provides a sound
mathematical way to update our probabilities based on new information (see
Chapter 11).
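The four steps for reversing a tree can be sketched in Python as follows. The variable names are assumptions made for illustration. The 60%, 20%, 80%, and 75% values come from the example, and the remaining branch probabilities (40% and 25%) are inferred as complements.

```python
# Original tree: marginal on sales, then competition conditional on sales.
pr_sales = {"SY": 0.60, "SN": 0.40}            # 40% inferred as 100% - 60%
pr_comp_given_sales = {
    "SY": {"CY": 0.20, "CN": 0.80},
    "SN": {"CY": 0.75, "CN": 0.25},            # 25% inferred as 100% - 75%
}
comp_outcomes = ("CY", "CN")

# Steps 1-2: compute the joint probabilities and key them in the reversed
# order (competition first, then sales).
joint = {
    (c, s): pr_comp_given_sales[s][c] * pr_sales[s]
    for s in pr_sales
    for c in comp_outcomes
}

# Step 3: marginals of the reversed tree are sums of joint probabilities.
pr_comp = {c: sum(joint[(c, s)] for s in pr_sales) for c in comp_outcomes}

# Step 4: reversed conditionals are joint divided by marginal.
pr_sales_given_comp = {
    c: {s: joint[(c, s)] / pr_comp[c] for s in pr_sales}
    for c in comp_outcomes
}

print(round(pr_comp["CY"], 2))                    # 0.42
print(round(pr_sales_given_comp["CY"]["SY"], 2))  # 0.29
```

Step 4 is the same computation as the formula form of Bayes’ rule: the numerator `joint[(c, s)]` is Pr(B|A, &) × Pr(A|&), and the denominator `pr_comp[c]` is Pr(B|&).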
A.11 Probability Distributions
Often, the distinctions that we create in a decision situation are described by
quantitative measures. For example, for a new product introduction decision,
the distinction of the average unit cost of manufacturing the product might be
important. In such cases, we characterize our uncertainty about the quantity as
a probability distribution, which specifies the probability that the quantity is in
any given interval. One way to visualize a probability distribution is as a histogram, as shown in Figure A.5. We divide the possible range for the quantity into
intervals and draw a bar for each interval whose height is proportional to the
probability that the quantity will be in that interval. If the intervals are suitably
small, the histogram in some cases resembles the familiar “bell-shaped” curve.
An alternate way to portray a probability distribution is in its cumulative
form, which is sometimes called an “S-shaped” curve. The cumulative form
makes it especially easy to gauge the probability that the quantity will be in a
given range. For example, the cumulative curve in Figure A.6 shows that the
probability that cost will be less than $250 is 82% while the probability that it
will be less than $150 is 10%. We can therefore deduce that the probability that
cost will be between $150 and $250 is 82% − 10% = 72%.
FIGURE A.5 Probability distribution as a histogram. (x-axis: Unit Cost ($), in twelve $20 intervals from 120 to 360; bar height: probability of each interval.)
FIGURE A.6 Probability distribution in cumulative form. (x-axis: Unit Cost, $100 to $400; y-axis: Cumulative Probability (%), 0 to 100; annotations: 10% probability that unit cost is less than $150, 82% probability that unit cost is less than $250.)
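The relationship between the histogram and the cumulative form can be sketched in Python: running sums of the interval probabilities give the cumulative curve, and the probability of a range is a difference of two cumulative values. The interval probabilities below are hypothetical and are not read from Figure A.5. They were chosen only so that the cumulative values at $150 and $250 match the 10% and 82% quoted in the text.

```python
from itertools import accumulate

# Hypothetical histogram: (upper edge of interval in $, probability of interval).
# Illustrative values only, chosen to reproduce the 10% and 82% read-offs.
histogram = [(150, 0.10), (200, 0.32), (250, 0.40), (300, 0.13), (350, 0.05)]

upper_edges = [edge for edge, _ in histogram]
# Cumulative form: running sum of the interval probabilities.
cumulative = list(accumulate(p for _, p in histogram))

def prob_below(edge):
    """Cumulative probability that the quantity is below a given interval edge."""
    return cumulative[upper_edges.index(edge)]

def prob_between(low, high):
    """Probability of a range, as a difference of two cumulative probabilities."""
    return prob_below(high) - prob_below(low)

print(round(prob_between(150, 250), 2))  # 0.72
```

This sketch only answers queries at interval edges. Reading the curve between edges would need an interpolation assumption that the text does not specify.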
The quantitative measure on which a probability distribution is defined can
be either continuous or discrete. A continuous measure is one which can take any
value in a specified range while a discrete measure is restricted to a countable
number of possible values. The unit cost example above exemplifies a continuous
measure. An example of a discrete measure is the number of unplanned outages
experienced by a power plant in a year. The cumulative probability distribution
of a discrete measure has a stairstep shape (see Fig. A.7).
FIGURE A.7 Cumulative probability distribution of a discrete measure. (x-axis: Outages, 0 to 6; y-axis: Cumulative Probability (%), 0 to 100.)
A.11.1 SUMMARY STATISTICS FOR A PROBABILITY DISTRIBUTION
A number of key statistics are often used to summarize a probability distribution.
The most widely used is the probability-weighted average, commonly called the
mean or the expected value. Two other summary statistics of interest are the variance, which measures the dispersion about the mean, and the skewness, which
measures the degree of asymmetry of the distribution. Percentiles are another set
of useful summary statistics for a distribution. The P-th percentile is the value
of the quantity such that there is a probability of P% that the quantity will not
exceed that value. The cumulative curve is a display of the percentiles of the
distribution (read in reverse: the x-axis value of a point on the curve is the P-th
percentile, where P is the y-axis value of the point). The frequently used 50th
percentile is given the special name median.
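These summary statistics can be sketched in Python for a discrete measure. The outage probabilities below are hypothetical and are not read from Figure A.7. The sketch also assumes that "skewness" means the third central moment, and it takes the P-th percentile of a discrete measure to be the smallest value whose cumulative probability reaches P%. Other conventions exist for both.

```python
# Hypothetical distribution of unplanned outages per year (value -> probability).
outages = {0: 0.30, 1: 0.35, 2: 0.20, 3: 0.10, 4: 0.05}

def mean(dist):
    """Probability-weighted average (expected value)."""
    return sum(x * p for x, p in dist.items())

def central_moment(dist, k):
    """k-th central moment: k=2 is the variance, k=3 the (unscaled) skewness."""
    m = mean(dist)
    return sum(p * (x - m) ** k for x, p in dist.items())

def percentile(dist, p_percent):
    """Smallest value whose cumulative probability is at least p_percent."""
    cumulative = 0.0
    for value in sorted(dist):
        cumulative += dist[value]
        if cumulative >= p_percent / 100:
            return value
    return max(dist)  # guards against rounding when p_percent is 100

print(mean(outages))               # mean
print(central_moment(outages, 2))  # variance
print(central_moment(outages, 3))  # third central moment
print(percentile(outages, 50))     # median
print(percentile(outages, 90))     # 90th percentile
```

For this hypothetical distribution the third central moment is positive, which is consistent with its longer right-hand tail.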
A.12 Combining Uncertain Quantities
When uncertain quantities are combined via mathematical operations, the combined quantity is also uncertain and therefore has a probability distribution.
Indeed, the primary goal of analysis in a decision situation is to determine the
probability distribution of the value measure of interest, which is a (usually
complicated) combination of the decision choices taken and external uncertain
factors.
TABLE A.1 Equalities When Combining Uncertain Quantities
Equality                                            Quantities Are Mutually Independent    Quantities Are Not Mutually Independent
Mean of sum = Sum of means?                         True                                   True
Variance of sum = Sum of variances?                 True                                   False
Skewness of sum = Sum of skewnesses?                True                                   False
Percentile of sum = Sum of percentiles?             False                                  False
Mean of product = Product of means?                 True                                   False
Variance of product = Product of variances?         False                                  False
Percentile of product = Product of percentiles?     False                                  False
Probability theory tells us some very useful facts about simple combinations of uncertain quantities, namely sums and products. The most important of
these facts is that the mean of the sum of uncertain quantities is always equal
to the sum of the means of the individual quantities. For example, if a corporate portfolio comprises a number of business units, each of which has
uncertain profits, then the mean of corporate profit is always equal to the
sum of the means of the business unit profits. The word “always” means that
the equality holds even if the business unit profits are probabilistically dependent on each other.
Table A.1 lists the results of probability theory regarding the sums and
products of uncertain quantities. It is important to note that the percentiles of
sums and products are, in general, not equal to the sums and products of corresponding percentiles. That is, one should not attempt to calculate the median,
for example, of the sum of uncertain quantities by summing the individual
medians.
Another property of note in the table is that the mean, variance, and skewness of the sum of independent quantities can be found by summing the corresponding measures of the individual quantities. So, a very quick way to get a
good approximation of the probability distribution for the value of a portfolio
of independent assets is to calculate the mean, variance, and skewness of the
portfolio by summing across the individual assets and fitting a probability distribution to those three statistics.
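The sum rows of Table A.1 can be checked for independent quantities with a small Python sketch. The two profit distributions are hypothetical, and the helper names are assumptions made for illustration. The sketch also assumes that "skewness" refers to the third central moment, which is the measure that adds across independent quantities. A skewness coefficient scaled by the standard deviation would not add in this way.

```python
from collections import defaultdict

# Hypothetical, mutually independent profit distributions (value -> probability).
unit_a = {0: 0.6, 10: 0.4}
unit_b = {0: 0.7, 20: 0.3}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def central_moment(dist, k):
    """k-th central moment: k=2 is the variance, k=3 the third central moment."""
    m = mean(dist)
    return sum(p * (x - m) ** k for x, p in dist.items())

def median(dist):
    """Smallest value whose cumulative probability reaches 50%."""
    cumulative = 0.0
    for value in sorted(dist):
        cumulative += dist[value]
        if cumulative >= 0.5:
            return value
    return max(dist)

def sum_of_independent(a, b):
    """Distribution of a + b; independence lets joint probabilities multiply."""
    result = defaultdict(float)
    for xa, pa in a.items():
        for xb, pb in b.items():
            result[xa + xb] += pa * pb
    return dict(result)

total = sum_of_independent(unit_a, unit_b)

print(mean(total), mean(unit_a) + mean(unit_b))
print(central_moment(total, 2), central_moment(unit_a, 2) + central_moment(unit_b, 2))
print(central_moment(total, 3), central_moment(unit_a, 3) + central_moment(unit_b, 3))
print(median(total), median(unit_a) + median(unit_b))
```

The first three printed pairs agree up to floating-point rounding, while the last pair differs: each unit has a median of 0, but the median of the total is 10. That is the reason the text warns against summing medians.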
REFERENCES
Chung, K.L. & AitSahlia, F. (2003). Elementary Probability Theory, 4th ed. Springer.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd
ed. John Wiley & Sons.
Howard, R. (1966). Information value theory. IEEE Transactions on Systems Science and
Cybernetics, 2(1), pp. 22–26, reprinted in R. Howard, & J. Matheson, The Principles
and Applications of Decision Analysis (pp. 779–783). Menlo Park, CA: Strategic Decisions Group.
Howard, R.A. (2007). The foundations of decision analysis revisited. In W. Edwards,
R.F. Miles, & D. von Winterfeldt (eds.), Advances in Decision Analysis: From Foundations to Applications, pp. 32–56. Cambridge University Press.