Appendix A
Probability Theory

STEVEN N. TANI

A reasonable probability is the only certainty.
—E.W. Howe

A.1 Introduction
A.2 Distinctions and the Clarity Test
A.3 Possibility Tree Representation of a Distinction
A.4 Probability as an Expression of Degree of Belief
A.5 Inferential Notation
A.6 Multiple Distinctions
A.7 Joint, Conditional, and Marginal Probabilities
    A.7.1 Joint Probability
    A.7.2 Marginal Probability
    A.7.3 Conditional Probability
A.8 Calculating Joint Probabilities
A.9 Dependent and Independent Probabilities
A.10 Reversing Conditional Probabilities: Bayes' Rule
A.11 Probability Distributions
    A.11.1 Summary Statistics for a Probability Distribution
A.12 Combining Uncertain Quantities
References

Handbook of Decision Analysis, First Edition. Gregory S. Parnell, Terry A. Bresnick, Steven N. Tani, and Eric R. Johnson. © 2013 John Wiley & Sons, Inc. Published 2013 by John Wiley & Sons, Inc.

A.1 Introduction

This appendix is not intended to be a comprehensive discourse on probability theory. For that, please refer to any good textbook on probability, such as those by William Feller (1968) and by K.L. Chung and Farid AitSahlia (2003). Rather, we here present a number of key ideas from probability theory that every decision practitioner should know.

A.2 Distinctions and the Clarity Test

A fundamental requirement for addressing the important uncertainties in a decision is to create distinctions that are both clear and useful. A distinction is a separation of the universe of possibilities (outcomes) into two or more subsets. By definition, these subsets will be mutually exclusive (i.e., a possibility cannot be in more than one subset) and collectively exhaustive (i.e., every possibility must be in a subset). An example of a distinction is whether or not a new product achieves high sales volume. This distinction defines two events that are mutually exclusive (the product cannot both have high sales and not have high sales) and collectively exhaustive (either the product will have high sales or it will not, which includes the possibility that it is not launched and therefore has zero sales).

A distinction is clear when there is no ambiguity in its definition. To determine if a distinction is clear, we apply the clarity test (Howard, 2007), which employs the concept of a clairvoyant: someone who perfectly sees and truthfully reports any observable event or quantity, either in the future or in the past.[1] A distinction is deemed to be clear if the clairvoyant, without exercising any judgment, can say whether or not an event that is defined by the distinction will occur. The distinction in the example above would fail the clarity test. The clairvoyant would not be able to say whether the product will achieve high sales volume without judging what level of sales volume qualifies as "high." A much clearer distinction in this example would be whether or not the new product achieves worldwide sales volume of at least 1 million units during the next calendar year. The clairvoyant might still wonder whether or not free samples given to prospective customers should be included in "sales volume," so the distinction might need to be refined to make that clear.

[1] This definition is similar to the definition of an ideal observer used in Chapter 10, insofar as it highlights the application of a criterion to events without use of judgment. This definition differs slightly from the definition of clairvoyant used in Chapter 11. The definition used in Chapter 11 also requires that the event in question be unaffected by the asker's actions. This additional requirement is necessary for value of information to be formulated properly.
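To see what it takes for a distinction to pass the clarity test, the short Python sketch below encodes the refined sales-volume distinction as an unambiguous yes/no function of observable quantities, which a clairvoyant (or a computer) could evaluate without exercising judgment. This is only an illustrative sketch: the function and parameter names, and the choice to exclude free samples, are assumptions made here, not definitions from the text.

    # A vague distinction such as "high sales" cannot be settled without judging what
    # counts as "high."  The refined distinction below is a pure yes/no function of
    # observable quantities, so it can be evaluated without judgment.
    # (Names and the exclusion of free samples are illustrative assumptions.)

    def achieves_high_sales(worldwide_units_sold_next_year: int,
                            free_sample_units: int) -> bool:
        """True if worldwide sales during the next calendar year are at least
        1 million units, excluding free samples given to prospective customers."""
        paid_units = worldwide_units_sold_next_year - free_sample_units
        return paid_units >= 1_000_000

    print(achieves_high_sales(1_250_000, 300_000))  # False: only 950,000 paid units
    print(achieves_high_sales(1_250_000, 100_000))  # True: 1,150,000 paid units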
Creating distinctions that are clear is important in decision making because it avoids a situation in which subject matter experts give different probabilities for the same event because they have different interpretations of its definition.

A distinction is useful when it contributes to a full understanding of why one decision alternative is preferred to others. The distinction can be a direct measure of value or it can be an important parameter in the calculation of value. For example, the level of sales volume is a useful distinction in the calculation of net present value. Creating distinctions that are both useful and clear is an important skill of a decision practitioner.

A.3 Possibility Tree Representation of a Distinction

A highly useful diagrammatic representation of a distinction is a possibility tree. The distinction is shown as a tree node from which branches emerge representing the possibilities. The number of branches at a node is called the degree of the distinction. Figure A.1 shows the possibility tree for the example distinction of high sales or not high sales (two degrees), suitably defined to pass the clarity test. It is easy to imagine a distinction with three or more degrees. For example, a distinction defined by a student's grade in a course could have six degrees: A, B, C, D, E, or F.

FIGURE A.1 Possibility tree.

A.4 Probability as an Expression of Degree of Belief

A probability is the quantitative expression of someone's uncertainty about a distinction based on his or her state of information. More precisely, it expresses the person's degree of belief that an event will occur, ranging from 0 if the person is certain that the event will not occur to 1 if the person is certain that it will occur. A person's belief about the occurrence of any event, such as high sales volume, depends, of course, on the information that the person possesses. If the person's information changes (by observing the results of a test market, for example), then the probability may change.

A.5 Inferential Notation

Using inferential notation (Howard, 1966) emphasizes that every probability is based on a particular state of information. The probability of occurrence of an event A is notated as follows:

Pr(A | &),

where the event is represented to the left of the vertical bar and the information on which the probability is based is represented to the right of the bar. The ampersand character "&" stands for all of the background information possessed by the person making the probability statement.

A.6 Multiple Distinctions

In most decision-making situations, we must deal with multiple distinctions. Suppose, for example, that in addition to the previous distinction of whether or not the product achieves high sales volume (as defined to meet the clarity test), we have a second distinction of whether or not our major competitor launches a similar product (again, suitably defined to pass the clarity test). In the discussion that follows, we use the following notation:

SY = High sales (≥1M)
SN = Not high sales (<1M)
CY = Competitive product
CN = No competitive product.

As shown in the possibility tree in Figure A.2, these two distinctions, each with two degrees, together define four elemental possibilities: (1) high sales, competitive product (SY, CY); (2) high sales, no competitive product (SY, CN); (3) not high sales, competitive product (SN, CY); and (4) not high sales, no competitive product (SN, CN).

FIGURE A.2 Possibility tree with two distinctions.
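As a small illustration of how multiple distinctions define elemental possibilities, the Python sketch below enumerates the four combinations shown in Figure A.2. The variable names are illustrative; only the SY/SN and CY/CN labels come from the notation above.

    from itertools import product

    # Two distinctions, each with two degrees.
    sales = ["SY", "SN"]       # high sales (>= 1M units) or not
    competitor = ["CY", "CN"]  # competitive product launched or not

    # The elemental possibilities are all combinations of one degree from each
    # distinction; they are mutually exclusive and collectively exhaustive.
    elemental_possibilities = list(product(sales, competitor))
    print(elemental_possibilities)
    # [('SY', 'CY'), ('SY', 'CN'), ('SN', 'CY'), ('SN', 'CN')]

    # More generally, n distinctions with d1, d2, ..., dn degrees define
    # d1 * d2 * ... * dn elemental possibilities (here 2 * 2 = 4).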
A.7 Joint, Conditional, and Marginal Probabilities

There are three types of probabilities that we use to express uncertainty about multiple distinctions: joint, marginal, and conditional.

A.7.1 JOINT PROBABILITY

The probability of an elemental possibility, that is, of the joint occurrence of events from multiple distinctions, is called a joint probability. In the example, the probability of high sales combined with a competitive product is a joint probability, notated as follows:

Pr(SY, CY | &).

A.7.2 MARGINAL PROBABILITY

The probability of an event defined by just one of the multiple distinctions is called a marginal probability. In the example, the probability of high sales is a marginal probability, notated as follows:

Pr(SY | &).

A.7.3 CONDITIONAL PROBABILITY

The probability of an event defined by one distinction, given that we know of the occurrence of an event defined by a second distinction, is called a conditional probability. In the example, the probability of a competitive product given high sales is a conditional probability, notated as follows:

Pr(CY | SY, &).

We can illustrate these types of probabilities in a tree diagram as shown in Figure A.3. Since the probabilities of the branches emanating from any node of the tree must sum to 100%, we know that

Pr(CY | SY, &) + Pr(CN | SY, &) = 20% + 80% = 100%.

It is clear from the tree diagram that a marginal probability is the sum of the appropriate joint probabilities:

Pr(SY | &) = Pr(SY, CY | &) + Pr(SY, CN | &) = 12% + 48% = 60%.

FIGURE A.3 Probability tree with two distinctions.

A.8 Calculating Joint Probabilities

One of the fundamental results from probability theory is that a joint probability can be calculated as the product of a conditional probability and a marginal probability:

Pr(SY, CN | &) = Pr(CN | SY, &) × Pr(SY | &).

When applied to the tree diagram in Figure A.3, this result means that each joint probability at the right-hand end of a path through the tree is equal to the product of the marginal and conditional probabilities of the branches comprising that path. For example,

Pr(SY, CN | &) = 60% × 80% = 48%.

Note that since Pr(SY, CN | &) = Pr(CN, SY | &), we can reverse the conditionality and get the same result:

Pr(SY, CN | &) = Pr(SY | CN, &) × Pr(CN | &).

Also note in the diagram that if we know the four joint probabilities at the end of the tree, we can calculate all of the other probabilities on the tree. For example, the marginal probability of high sales is the sum of two joint probabilities (12% + 48% = 60%). The conditional probability of a competitive product given high sales is a joint probability divided by a marginal probability (12% / 60% = 20%).
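The relationships in Sections A.7 and A.8 can be verified with a few lines of arithmetic. The Python sketch below starts from the four joint probabilities displayed in the tree of Figure A.3 (12%, 48%, 30%, and 10%; the last two also appear in Sections A.9 and A.10) and recovers the marginal and conditional probabilities from them. The dictionary layout and function names are illustrative, not part of the text's notation.

    # Joint probabilities from Figure A.3, indexed by (sales outcome, competitor outcome).
    joint = {
        ("SY", "CY"): 0.12,
        ("SY", "CN"): 0.48,
        ("SN", "CY"): 0.30,
        ("SN", "CN"): 0.10,
    }
    assert abs(sum(joint.values()) - 1.0) < 1e-9  # the joint probabilities sum to 100%

    # A marginal probability is the sum of the appropriate joint probabilities.
    def marginal_sales(s):
        return sum(p for (si, ci), p in joint.items() if si == s)

    # A conditional probability is a joint probability divided by a marginal probability.
    def prob_competitor_given_sales(c, s):
        return joint[(s, c)] / marginal_sales(s)

    print(round(marginal_sales("SY"), 2))                     # 0.6  = 12% + 48%
    print(round(prob_competitor_given_sales("CY", "SY"), 2))  # 0.2  = 12% / 60%
    print(round(prob_competitor_given_sales("CY", "SN"), 2))  # 0.75 = 30% / 40%

    # A joint probability is the product of a conditional and a marginal:
    # Pr(SY, CN | &) = Pr(CN | SY, &) * Pr(SY | &)
    print(round(prob_competitor_given_sales("CN", "SY") * marginal_sales("SY"), 2))  # 0.48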
A.9 Dependent and Independent Probabilities

The probabilities of two distinctions are either dependent or independent. The test is whether knowing the outcome of one distinction affects the probabilities of the other. If the probabilities for one distinction are different depending on the outcome of the other distinction, the probabilities are dependent. If the probabilities for one distinction are unaffected by the outcome of the other, the probabilities are independent. Stated mathematically, events A and B are probabilistically independent if

Pr(A | &) = Pr(A | B, &),

which implies that

Pr(A, B | &) = Pr(A | &) × Pr(B | &).

It is readily apparent that the probabilities of the two distinctions shown in Figure A.3 are dependent because the probability of a competitive product given high sales (20%) differs from the probability of a competitive product given not high sales (75%). Probabilistic dependence is a mutual property. If the probabilities of one distinction are dependent on a second distinction, then the reverse must also be true.

A.10 Reversing Conditional Probabilities: Bayes' Rule

As indicated in Section A.8, when dealing with two distinctions with mutually dependent probabilities, it is often useful to be able to reverse the order of conditioning. That is, if we know the conditional probability of event A given that event B has occurred, we would like to calculate the conditional probability of B given A. This operation, which is known as Bayes' rule, is done quite easily using tree diagrams, as illustrated in Figure A.4. The original order of the distinctions is shown in the tree on the left. The following steps are taken to create the tree on the right with the order reversed.

Step 1. Draw the tree structure with the order reversed.
Step 2. Copy the four joint probabilities from the original tree, taking care to put them in the correct position in the reversed tree.
Step 3. Calculate the marginal probabilities in the reversed tree as the sum of the appropriate joint probabilities. For example, Pr(CY | &) = 12% + 30% = 42%.
Step 4. Calculate the conditional probabilities in the reversed tree as the ratio of a joint probability to a marginal probability. For example, Pr(SY | CY, &) = 12% / 42% ≈ 29%.

FIGURE A.4 Reversing the order of a tree.

We can express Bayes' rule mathematically for events A and B as

Pr(A | B, &) = Pr(B | A, &) × Pr(A | &) / Pr(B | &).

Bayes' rule plays a central role in decision analysis since it provides a sound mathematical way to update our probabilities based on new information (see Chapter 11).
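The four steps above are purely mechanical, as the following Python sketch shows by reversing the tree of Figure A.4 with the same joint probabilities. The code structure and variable names are illustrative assumptions; the final line checks the same answer with the algebraic form of Bayes' rule.

    # Step 2: the four joint probabilities, which are the same in either tree order.
    joint = {
        ("SY", "CY"): 0.12, ("SY", "CN"): 0.48,
        ("SN", "CY"): 0.30, ("SN", "CN"): 0.10,
    }

    # Step 3: marginal probabilities of the competitor distinction (the first node of
    # the reversed tree) as sums of the appropriate joint probabilities.
    marginal_c = {c: sum(p for (s, ci), p in joint.items() if ci == c)
                  for c in ("CY", "CN")}

    # Step 4: conditional probabilities of sales given competitor (the second node of
    # the reversed tree) as joint probability divided by marginal probability.
    cond_s_given_c = {(s, c): joint[(s, c)] / marginal_c[c] for (s, c) in joint}

    print(round(marginal_c["CY"], 2))              # 0.42 = 12% + 30%
    print(round(cond_s_given_c[("SY", "CY")], 2))  # 0.29 = 12% / 42%

    # Equivalently, Bayes' rule: Pr(SY | CY, &) = Pr(CY | SY, &) * Pr(SY | &) / Pr(CY | &)
    pr_sy, pr_cy_given_sy = 0.60, 0.20
    print(round(pr_cy_given_sy * pr_sy / marginal_c["CY"], 2))  # 0.29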
A.11 Probability Distributions

Often, the distinctions that we create in a decision situation are described by quantitative measures. For example, for a new product introduction decision, the distinction of the average unit cost of manufacturing the product might be important. In such cases, we characterize our uncertainty about the quantity as a probability distribution, which specifies the probability that the quantity is in any given interval.

One way to visualize a probability distribution is as a histogram, as shown in Figure A.5. We divide the possible range for the quantity into intervals and draw a bar for each interval whose height is proportional to the probability that the quantity will be in that interval. If the intervals are made small enough, the histogram in some cases approaches the familiar "bell-shaped" curve.

FIGURE A.5 Probability distribution as a histogram (unit cost, in $20 intervals from $120 to $360).

An alternate way to portray a probability distribution is in its cumulative form, which is sometimes called an "S-shaped" curve. The cumulative form makes it especially easy to gauge the probability that the quantity will be in a given range. For example, the cumulative curve in Figure A.6 shows that the probability that cost will be less than $250 is 82%, while the probability that it will be less than $150 is 10%. We can therefore deduce that the probability that cost will be between $150 and $250 is 82% − 10% = 72%.

FIGURE A.6 Probability distribution in cumulative form (unit cost; 10% probability below $150, 82% probability below $250).

The quantitative measure on which a probability distribution is defined can be either continuous or discrete. A continuous measure is one which can take any value in a specified range, while a discrete measure is restricted to a countable number of possible values. The unit cost example above exemplifies a continuous measure. An example of a discrete measure is the number of unplanned outages experienced by a power plant in a year. The cumulative probability distribution of a discrete measure has a stairstep shape (see Fig. A.7).

FIGURE A.7 Cumulative probability distribution of a discrete measure (number of outages).

A.11.1 SUMMARY STATISTICS FOR A PROBABILITY DISTRIBUTION

A number of key statistics are often used to summarize a probability distribution. The most widely used is the probability-weighted average, commonly called the mean or the expected value. Two other summary statistics of interest are the variance, which measures the dispersion about the mean, and the skewness, which measures the degree of asymmetry of the distribution.

Percentiles are another set of useful summary statistics for a distribution. The P-th percentile is the value of the quantity such that there is a probability of P that the quantity will not exceed that value. The cumulative curve is a display of the percentiles of the distribution (read in reverse: the x-axis value of a point on the curve is the P-th percentile, where P is the y-axis value of the point). The frequently used 50th percentile is given the special name median.
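As a worked illustration of these summary statistics, the Python sketch below computes the mean, variance, skewness, and median of a discretized unit-cost distribution. The interval midpoints follow the $120 to $360 range of Figure A.5, but the probabilities assigned to them are illustrative placeholders, not values read from the figure.

    # Interval midpoints (unit cost, $) and illustrative probabilities (sum to 1).
    values = [130, 150, 170, 190, 210, 230, 250, 270, 290, 310, 330, 350]
    probs  = [0.02, 0.08, 0.15, 0.17, 0.16, 0.12, 0.12, 0.06, 0.05, 0.04, 0.02, 0.01]

    # Mean: the probability-weighted average.
    mean = sum(p * x for x, p in zip(values, probs))
    # Variance: probability-weighted squared deviation from the mean.
    variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    # Skewness: third central moment divided by the standard deviation cubed.
    skewness = sum(p * (x - mean) ** 3 for x, p in zip(values, probs)) / variance ** 1.5

    # Percentiles come from the cumulative distribution: the P-th percentile is the
    # smallest value whose cumulative probability reaches P.
    def percentile(p_target):
        cumulative = 0.0
        for x, p in zip(values, probs):
            cumulative += p
            if cumulative >= p_target:
                return x

    median = percentile(0.50)
    print(round(mean, 1))   # probability-weighted average, here 216.8
    print(round(variance, 1), round(skewness, 2))
    print(median)           # 50th percentile, here 210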
A.12 Combining Uncertain Quantities

When uncertain quantities are combined via mathematical operations, the combined quantity is also uncertain and therefore has a probability distribution. Indeed, the primary goal of analysis in a decision situation is to determine the probability distribution of the value measure of interest, which is a (usually complicated) combination of the decision choices taken and external uncertain factors. Probability theory tells us some very useful facts about simple combinations of uncertain quantities: sums and products.

The most important of these facts is that the mean of the sum of uncertain quantities is always equal to the sum of the means of the individual quantities. For example, if a corporate portfolio comprises a number of business units, each of which has uncertain profits, then the mean of corporate profit is always equal to the sum of the means of the business unit profits. The word "always" means that the equality holds even if the business unit profits are probabilistically dependent on each other.

Table A.1 lists the results of probability theory regarding the sums and products of uncertain quantities.

TABLE A.1 Equalities When Combining Uncertain Quantities

Equality                                          Quantities Are Mutually Independent   Quantities Are Not Mutually Independent
Mean of sum = Sum of means?                       True                                  True
Variance of sum = Sum of variances?               True                                  False
Skewness of sum = Sum of skewnesses?              True                                  False
Percentile of sum = Sum of percentiles?           False                                 False
Mean of product = Product of means?               True                                  False
Variance of product = Product of variances?       False                                 False
Percentile of product = Product of percentiles?   False                                 False

It is important to note that the percentiles of sums and products are, in general, not equal to the sums and products of corresponding percentiles. That is, one should not attempt to calculate the median, for example, of the sum of uncertain quantities by summing the individual medians. Another property of note in the table is that the mean, variance, and skewness of the sum of independent quantities can be found by summing the corresponding measures of the individual quantities. So, a very quick way to get a good approximation of the probability distribution for the value of a portfolio of independent assets is to calculate the mean, variance, and skewness of the portfolio by summing across the individual assets and fitting a probability distribution to those three statistics.

REFERENCES

Chung, K.L. & AitSahlia, F. (2003). Elementary Probability Theory, 4th ed. Springer.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd ed. John Wiley & Sons.
Howard, R.A. (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics, 2(1), 22–26. Reprinted in R.A. Howard & J. Matheson (eds.), The Principles and Applications of Decision Analysis (pp. 779–783). Menlo Park, CA: Strategic Decisions Group.
Howard, R.A. (2007). The foundations of decision analysis revisited. In W. Edwards, R.F. Miles, & D. von Winterfeldt (eds.), Advances in Decision Analysis: From Foundations to Applications (pp. 32–56). Cambridge University Press.