UNCERTAINTY AND
INFORMATION
Foundations of Generalized
Information Theory
George J. Klir
Binghamton University—SUNY
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2006 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act,
without either the prior written permission of the Publisher, or authorization through payment
of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive,
Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com.
Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. No warranty may be created
or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional
where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any
other commercial damages, including but not limited to special, incidental, consequential, or
other damages.
For general information on our other products and services or for technical support, please
contact our Customer Care Department within the United States at (800) 762-2974, outside the
United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic formats. For more information about Wiley products,
visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Klir, George J., 1932–
Uncertainty and information : foundations of generalized information theory / George J. Klir.
p. cm.
Includes bibliographical references and indexes.
ISBN-13: 978-0-471-74867-0
ISBN-10: 0-471-74867-6
1. Uncertainty (Information theory) 2. Fuzzy systems. I. Title.
Q375.K55 2005
003′.54—dc22
2005047792
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
A book is never finished.
It is only abandoned.
—Honoré de Balzac
CONTENTS
Preface / xiii
Acknowledgments / xvii
1 Introduction / 1
1.1. Uncertainty and Its Significance / 1
1.2. Uncertainty-Based Information / 6
1.3. Generalized Information Theory / 7
1.4. Relevant Terminology and Notation / 10
1.5. An Outline of the Book / 20
Notes / 22
Exercises / 23
2 Classical Possibility-Based Uncertainty Theory / 26
2.1. Possibility and Necessity Functions / 26
2.2. Hartley Measure of Uncertainty for Finite Sets / 27
2.2.1. Simple Derivation of the Hartley Measure / 28
2.2.2. Uniqueness of the Hartley Measure / 29
2.2.3. Basic Properties of the Hartley Measure / 31
2.2.4. Examples / 35
2.3. Hartley-Like Measure of Uncertainty for Infinite Sets / 45
2.3.1. Definition / 45
2.3.2. Required Properties / 46
2.3.3. Examples / 52
Notes / 56
Exercises / 57
3 Classical Probability-Based Uncertainty Theory / 61
3.1. Probability Functions / 61
3.1.1. Functions on Finite Sets / 62
3.1.2. Functions on Infinite Sets / 64
3.1.3. Bayes’ Theorem / 66
3.2. Shannon Measure of Uncertainty for Finite Sets / 67
3.2.1. Simple Derivation of the Shannon Entropy / 69
3.2.2. Uniqueness of the Shannon Entropy / 71
3.2.3. Basic Properties of the Shannon Entropy / 77
3.2.4. Examples / 83
3.3. Shannon-Like Measure of Uncertainty for Infinite Sets / 91
Notes / 95
Exercises / 97
4 Generalized Measures and Imprecise Probabilities / 101
4.1. Monotone Measures / 101
4.2. Choquet Capacities / 106
4.2.1. Möbius Representation / 107
4.3. Imprecise Probabilities: General Principles / 110
4.3.1. Lower and Upper Probabilities / 112
4.3.2. Alternating Choquet Capacities / 115
4.3.3. Interaction Representation / 116
4.3.4. Möbius Representation / 119
4.3.5. Joint and Marginal Imprecise Probabilities / 121
4.3.6. Conditional Imprecise Probabilities / 122
4.3.7. Noninteraction of Imprecise Probabilities / 123
4.4. Arguments for Imprecise Probabilities / 129
4.5. Choquet Integral / 133
4.6. Unifying Features of Imprecise Probabilities / 135
Notes / 137
Exercises / 139
5 Special Theories of Imprecise Probabilities / 143
5.1. An Overview / 143
5.2. Graded Possibilities / 144
5.2.1. Möbius Representation / 149
5.2.2. Ordering of Possibility Profiles / 151
5.2.3. Joint and Marginal Possibilities / 153
5.2.4. Conditional Possibilities / 155
5.2.5. Possibilities on Infinite Sets / 158
5.2.6. Some Interpretations of Graded Possibilities / 160
5.3. Sugeno λ-Measures / 160
5.3.1. Möbius Representation / 165
5.4. Belief and Plausibility Measures / 166
5.4.1. Joint and Marginal Bodies of Evidence / 169
5.4.2. Rules of Combination / 170
5.4.3. Special Classes of Bodies of Evidence / 174
5.5. Reachable Interval-Valued Probability Distributions / 178
5.5.1. Joint and Marginal Interval-Valued Probability Distributions / 183
5.6. Other Types of Monotone Measures / 185
Notes / 186
Exercises / 190
6 Measures of Uncertainty and Information / 196
6.1. General Discussion / 196
6.2. Generalized Hartley Measure for Graded Possibilities / 198
6.2.1. Joint and Marginal U-Uncertainties / 201
6.2.2. Conditional U-Uncertainty / 203
6.2.3. Axiomatic Requirements for the U-Uncertainty / 205
6.2.4. U-Uncertainty for Infinite Sets / 206
6.3. Generalized Hartley Measure in Dempster–Shafer Theory / 209
6.3.1. Joint and Marginal Generalized Hartley Measures / 209
6.3.2. Monotonicity of the Generalized Hartley Measure / 211
6.3.3. Conditional Generalized Hartley Measures / 213
6.4. Generalized Hartley Measure for Convex Sets of Probability Distributions / 214
6.5. Generalized Shannon Measure in Dempster–Shafer Theory / 216
6.6. Aggregate Uncertainty in Dempster–Shafer Theory / 226
6.6.1. General Algorithm for Computing the Aggregate Uncertainty / 230
6.6.2. Computing the Aggregated Uncertainty in Possibility Theory / 232
6.7. Aggregate Uncertainty for Convex Sets of Probability Distributions / 234
6.8. Disaggregated Total Uncertainty / 238
6.9. Generalized Shannon Entropy / 241
6.10. Alternative View of Disaggregated Total Uncertainty / 248
6.11. Unifying Features of Uncertainty Measures / 253
Notes / 253
Exercises / 255
7 Fuzzy Set Theory / 260
7.1. An Overview / 260
7.2. Basic Concepts of Standard Fuzzy Sets / 262
7.3. Operations on Standard Fuzzy Sets / 266
7.3.1. Complementation Operations / 266
7.3.2. Intersection and Union Operations / 267
7.3.3. Combinations of Basic Operations / 268
7.3.4. Other Operations / 269
7.4. Fuzzy Numbers and Intervals / 270
7.4.1. Standard Fuzzy Arithmetic / 273
7.4.2. Constrained Fuzzy Arithmetic / 274
7.5. Fuzzy Relations / 280
7.5.1. Projections and Cylindric Extensions / 281
7.5.2. Compositions, Joins, and Inverses / 284
7.6. Fuzzy Logic / 286
7.6.1. Fuzzy Propositions / 287
7.6.2. Approximate Reasoning / 293
7.7. Fuzzy Systems / 294
7.7.1. Granulation / 295
7.7.2. Types of Fuzzy Systems / 297
7.7.3. Defuzzification / 298
7.8. Nonstandard Fuzzy Sets / 299
7.9. Constructing Fuzzy Sets and Operations / 303
Notes / 305
Exercises / 308
8 Fuzzification of Uncertainty Theories / 315
8.1. Aspects of Fuzzification / 315
8.2. Measures of Fuzziness / 321
8.3. Fuzzy-Set Interpretation of Possibility Theory / 326
8.4. Probabilities of Fuzzy Events / 334
8.5. Fuzzification of Reachable Interval-Valued Probability Distributions / 338
8.6. Other Fuzzification Efforts / 348
Notes / 350
Exercises / 351
9 Methodological Issues / 355
9.1. An Overview / 355
9.2. Principle of Minimum Uncertainty / 357
9.2.1. Simplification Problems / 358
9.2.2. Conflict-Resolution Problems / 364
9.3. Principle of Maximum Uncertainty / 369
9.3.1. Principle of Maximum Entropy / 369
9.3.2. Principle of Maximum Nonspecificity / 373
9.3.3. Principle of Maximum Uncertainty in GIT / 375
9.4. Principle of Requisite Generalization / 383
9.5. Principle of Uncertainty Invariance / 387
9.5.1. Computationally Simple Approximations / 388
9.5.2. Probability–Possibility Transformations / 390
9.5.3. Approximations of Belief Functions by Necessity Functions / 399
9.5.4. Transformations Between λ-Measures and Possibility Measures / 402
9.5.5. Approximations of Graded Possibilities by Crisp Possibilities / 403
Notes / 408
Exercises / 411
10 Conclusions / 415
10.1. Summary and Assessment of Results in Generalized Information Theory / 415
10.2. Main Issues of Current Interest / 417
10.3. Long-Term Research Areas / 418
10.4. Significance of GIT / 419
Notes / 421
Appendix A Uniqueness of the U-Uncertainty / 425
Appendix B Uniqueness of Generalized Hartley Measure in the Dempster–Shafer Theory / 430
Appendix C Correctness of Algorithm 6.1 / 437
Appendix D Proper Range of Generalized Shannon Entropy / 442
Appendix E Maximum of GSa in Section 6.9 / 447
Appendix F Glossary of Key Concepts / 449
Appendix G Glossary of Symbols / 455
Bibliography / 458
Subject Index / 487
Name Index / 494
PREFACE
The concepts of uncertainty and information studied in this book are tightly
interconnected. Uncertainty is viewed as a manifestation of some information
deficiency, while information is viewed as the capacity to reduce uncertainty.
Whenever these restricted notions of uncertainty and information may be confused with their other connotations, it is useful to refer to them as information-based uncertainty and uncertainty-based information, respectively.
The restricted notion of uncertainty-based information does not cover the
full scope of the concept of information. For example, it does not fully capture
our common-sense conception of information in human communication and
cognition or the algorithmic conception of information. However, it does play
an important role in dealing with the various problems associated with
systems, as I already recognized in the late 1970s. It is this role of uncertainty-based information that motivated me to study it.
One of the insights emerging from systems science is the recognition that
scientific knowledge is organized, by and large, in terms of systems of various
types. In general, systems are viewed as relations among states of some variables. In each system, the relation is utilized, in a given purposeful way, for
determining unknown states of some variables on the basis of known states of
other variables. Systems may be constructed for various purposes, such as prediction, retrodiction, diagnosis, prescription, planning, and control. Unless the
predictions, retrodictions, diagnoses, and so forth made by the system are
unique, which is a rather rare case, we need to deal with predictive uncertainty,
retrodictive uncertainty, diagnostic uncertainty, and the like. This respective
uncertainty must be properly incorporated into the mathematical formalization of the system.
In the early 1990s, I introduced a research program under the name "generalized information theory" (GIT), whose objective is to study information-based uncertainty and uncertainty-based information in all their
manifestations. This research program, motivated primarily by some fundamental issues emerging from the study of complex systems, was intended to
expand classical information theory based on probability. As is well known,
the latter emerged in 1948, when Claude Shannon established his measure of
probabilistic uncertainty and information.
GIT expands classical information theory in two dimensions. In one dimension, additive probability measures, which are inherent in classical information
theory, are expanded to various types of nonadditive measures. In the other
dimension, the formalized language of classical set theory, within which probability measures are formalized, is expanded to more expressive formalized
languages that are based on fuzzy sets of various types. As in classical information theory, uncertainty is the primary concept in GIT, and information is
defined in terms of uncertainty reduction.
Each uncertainty theory that is recognizable within the expanded framework is characterized by: (a) a particular formalized language (classical or
fuzzy); and (b) a generalized measure of some particular type (additive or nonadditive). The number of possible uncertainty theories that are subsumed
under the research program of GIT is thus equal to the product of the number
of recognized formalized languages and the number of recognized types of
generalized measures. This number has been growing quite rapidly. The full
development of any of these uncertainty theories requires that issues at each
of the following four levels be adequately addressed: (1) the theory must be
formalized in terms of appropriate axioms; (2) a calculus of the theory must
be developed by which this type of uncertainty can be properly manipulated;
(3) a justifiable way of measuring the amount of uncertainty (predictive, diagnostic, etc.) in any situation formalizable in the theory must be found; and (4)
various methodological aspects of the theory must be developed.
GIT, as an ongoing research program, offers us a steadily growing inventory of distinct uncertainty theories, some of which are covered in this book.
Two complementary features of these theories are significant. One is their
great and steadily growing diversity. The other is their unity, which is manifested by properties that are invariant across the whole spectrum of uncertainty theories or, at least, within some broad classes of these theories. The
growing diversity of uncertainty theories makes it increasingly more realistic
to find a theory whose assumptions are in harmony with each given application. Their unity allows us to work with all available theories as a whole, and
to move from one theory to another as needed.
The principal aim of this book is to provide the reader with a comprehensive and in-depth overview of the two-dimensional framework by which the
research in GIT has been guided, and to present the main results that have been
obtained by this research. Also covered are the main features of two classical
information theories. One of them, covered in Chapter 3, is based on the concept
of probability. This classical theory is well known and is extensively covered in
the literature. The other one, covered in Chapter 2, is based on the dual
concepts of possibility and necessity. This classical theory is older and more
fundamental, but it is considerably less visible and has often been incorrectly
dismissed in the literature as a special case of the probability-based information theory. These two classical information theories, which are formally incomparable, are the roots from which distinct generalizations are
obtained.
Principal results regarding generalized uncertainty theories that are based
on classical set theory are covered in Chapters 4–6. While the focus in Chapter
4 is on the common properties of uncertainty representation in all these theories, Chapter 5 is concerned with special properties of individual uncertainty
theories. The issue of how to measure the amount of uncertainty (and the associated information) in situations formalized in the various uncertainty theories is thoroughly investigated in Chapter 6. Chapter 7 presents a concise
introduction to the fundamentals of fuzzy set theory, and the fuzzification of
uncertainty theories is discussed in Chapter 8, in both general and specific
terms. Methodological issues associated with GIT are discussed in Chapter 9.
Finally, results and open problems emerging from GIT are summarized and
assessed in Chapter 10.
The book can be used in several ways and, due to the universal applicability of GIT, it is relevant to professionals in virtually any area of human affairs.
While it is written primarily as a textbook for a one-semester graduate course,
its utility extends beyond the classroom environment. Due to the comprehensive and coherent presentation of the subject and coverage of some previously unpublished results, the book is also a useful resource for researchers.
Although the treatment of uncertainty and information in the book is mathematical, the required mathematical background is rather modest: the reader
is only required to be familiar with the fundamentals of classical set theory,
probability theory and the calculus. Otherwise, the book is completely self-contained, and it is thus suitable for self-study.
While working on the book, clarity of presentation was always on my mind.
To achieve it, I use examples and visual illustrations copiously. Each chapter
is also accompanied by an adequate number of exercises, which allow readers
to test their understanding of the studied material. The main text is only rarely
interrupted by bibliographical, historical, or any other references. Almost all
references are covered in specific Notes, organized by individual topics and
located at the end of each chapter. These notes contain ample information for
further study.
For many years, I have been pursuing research on GIT while, at the same
time, teaching an advanced graduate course in this area to systems science students at Binghamton University in New York State (SUNY-Binghamton). Due
to rapid developments in GIT, I have had to change the content of the course
each year to cover the emerging new results. This book is based, at least to
some degree, on the class notes that have evolved for this course over the
years. Some parts of the book, especially in Chapters 6 and 9, are based on my
own research.
It is my hope that this book will establish a better understanding of the very
complex concepts of information-based uncertainty and uncertainty-based
information, and that it will stimulate further research and education in the
important and rapidly growing area of generalized information theory.
Binghamton, New York
December 2004
George J. Klir
ACKNOWLEDGMENTS
Over more than three decades of my association with Binghamton University,
I have had the good fortune to advise and work with many outstanding doctoral students. Some of them contributed in a significant way to generalized
information theory, especially to the various issues regarding uncertainty measures. These students, whose individual contributions to generalized information theory are mentioned in the various notes in this book, are (in
alphabetical order): David Harmanec, Masahiko Higashi, Cliff Joslyn,
Matthew Mariano, Yin Pan, Michael Pittarelli, Arthur Ramer, Luis Rocha,
Richard Smith, Mark Wierman, and Bo Yuan. A more recent doctoral student,
Ronald Pryor, read carefully the initial version of the manuscript of this book
and suggested many improvements. In addition, he developed several computer programs that helped me work through some intricate examples in the
book. I gratefully acknowledge all this help.
As far as the manuscript preparation is concerned, I am grateful to two
persons for their invaluable help. First, and foremost, I am grateful to Monika
Fridrich, my Editorial Assistant and a close friend, for her excellent typing of
a very complex, mathematically oriented manuscript, as well as for drawing
many figures that appear in the book. Second, I am grateful to Stanley Kauffman, a graphic artist at Binghamton University, for drawing figures that
required special skills.
Last, but not least, I am grateful to my wife, Milena, for her contribution to
the appearance of this book: it is one of her photographs that the publisher
chose to facilitate the design for the front cover. In addition, I am also
grateful for her understanding, patience, and encouragement during my
concentrated, disciplined and, at times, frustrating work on this challenging
book.
1
INTRODUCTION
The mind, once expanded to the dimensions of larger ideas, never returns to its
original size.
—Oliver Wendell Holmes
1.1. UNCERTAINTY AND ITS SIGNIFICANCE
It is easy to recognize that uncertainty plays an important role in human
affairs. For example, making everyday decisions in ordinary life is inseparable from uncertainty, as expressed with great clarity by George Shackle
[1961]:
In a predestinate world, decision would be illusory; in a world of a perfect foreknowledge, empty; in a world without natural order, powerless. Our intuitive attitude to life implies non-illusory, non-empty, non-powerless decision. . . . Since
decision in this sense excludes both perfect foresight and anarchy in nature, it
must be defined as choice in face of bounded uncertainty.
Conscious decision making, in all its varieties, is perhaps the most fundamental capability of human beings. It is essential for our survival and well-being.
In order to understand this capability, we need to understand the notion of
uncertainty first.
In decision making, we are uncertain about the future. We choose a particular action, from among a set of conceived actions, on the basis of our anticipation of the consequences of the individual actions. Our anticipation of future
events is, of course, inevitably subject to uncertainty. However, uncertainty in
ordinary life is not confined to the future alone, but may pertain to the past
and present as well. We are uncertain about past events, because we usually
do not have complete and consistent records of the past. We are uncertain
about many historical events, crime-related events, geological events, events
that caused various disasters, and a myriad of other kinds of events, including
many in our personal lives. We are uncertain about present affairs because we
lack relevant information. A typical example is diagnostic uncertainty in medicine or engineering. As is well known, a physician (or an engineer) is often
not able to make a definite diagnosis of a patient (or a machine) in spite of
knowing outcomes of all presumably relevant medical (or engineering) tests
and other pertinent information.
While ordinary life without uncertainty is unimaginable, science without
uncertainty was traditionally viewed as an ideal for which science should
strive. According to this view, which had been predominant in science prior to
the 20th century, uncertainty is incompatible with science, and the ideal is to
completely eliminate it. In other words, uncertainty is unscientific and its elimination is one manifestation of progress in science. This traditional attitude
toward uncertainty in science is well expressed by the Scottish physicist and
mathematician William Thomson (1824–1907), better known as Lord Kelvin,
in the following statement made in the late 19th century (Popular Lectures
and Addresses, London, 1891):
In physical science a first essential step in the direction of learning any subject
is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure
what you are speaking about and express it in numbers, you know something
about it; but when you cannot measure it, when you cannot express it in numbers,
your knowledge is of meager and unsatisfactory kind; it may be the beginning of
knowledge but you have scarcely, in your thought, advanced to the state of
science, whatever the matter may be.
This statement captures concisely the spirit of science in the 19th century: scientific knowledge should be expressed in precise numerical terms; imprecision
and other types of uncertainty do not belong to science. This preoccupation
with precision and certainty was responsible for neglecting any serious study
of the concept of uncertainty within science.
The traditional attitude toward uncertainty in science began to change in
the late 19th century, when some physicists became interested in studying
processes at the molecular level. Although the precise laws of Newtonian
mechanics were relevant to these studies in principle, they were of no use in
practice due to the enormous complexities of the systems involved. A fundamentally different approach to deal with these systems was needed. It was
eventually found in statistical methods. In these methods, specific manifestations of microscopic entities (positions and momenta of individual molecules)
were replaced with their statistical averages. These averages, calculated under
certain reasonable assumptions, were shown to represent relevant macroscopic entities such as temperature and pressure. A new field of physics, statistical mechanics, was an outcome of this research.
Statistical methods, developed originally for studying motions of gas molecules in a closed space, have found utility in other areas as well. In engineering, they have played a major role in the design of large-scale telephone
networks, in dealing with problems of engineering reliability, and in numerous
other problems. In business, they have been essential for dealing with problems of marketing, insurance, investment, and the like. In general, they have
been found applicable to problems that involve large-scale systems whose
components behave in a highly random way. The larger the system and the
higher the randomness, the better these methods perform.
When statistical mechanics was accepted, by and large, by the scientific community as a legitimate area of science at the beginning of the 20th century, the
negative attitude toward uncertainty was for the first time revised. Uncertainty
became recognized as useful, or even essential, in certain scientific inquiries.
However, it was taken for granted that uncertainty, whenever unavoidable in
science, can adequately be dealt with by probability theory. It took more than
half a century to recognize that the concept of uncertainty is too broad to be
captured by probability theory alone, and to begin to study its various other
(nonprobabilistic) manifestations.
Analytic methods based upon the calculus, which had dominated science
prior to the emergence of statistical mechanics, are applicable only to problems that involve systems with a very small number of components that are
related to each other in a predictable way. The applicability of statistical
methods based upon probability theory is exactly opposite: they require
systems with a very large number of components and a very high degree of
randomness. These two classes of methods are thus complementary. When
methods in one class excel, methods in the other class totally fail. Despite their
complementarity, these classes of methods can deal only with problems that
are clustered around the two extremes of complexity and randomness scales.
In his classic paper “Science and Complexity” [1948], Warren Weaver refers
to them as problems of organized simplicity and disorganized complexity,
respectively. He argues that these classes of problems cover only a tiny fraction of all conceivable problems. Most problems are located somewhere
between the two extremes of complexity and randomness, as illustrated by the
shaded area in Figure 1.1. Weaver calls them problems of organized complexity for reasons that are well described in the following quote from his paper:
The new method of dealing with disorganized complexity, so powerful an
advance over the earlier two-variable methods, leaves a great field untouched.
One is tempted to oversimplify, and say that scientific methodology went from
one extreme to the other—from two variables to an astronomical number—and
[Figure 1.1. Three classes of systems and associated problems that require distinct mathematical treatments [Weaver, 1948]. The axes are complexity and randomness; the regions distinguished are organized simplicity, disorganized complexity, and organized complexity.]
left untouched a great middle region. The importance of this middle region,
moreover, does not depend primarily on the fact that the number of variables is
moderate—large compared to two, but small compared to the number of atoms
in a pinch of salt. The problems in this middle region, in fact, will often involve
a considerable number of variables. The really important characteristic of the
problems in this middle region, which science has as yet little explored and conquered, lies in the fact that these problems, as contrasted with the disorganized
situations with which statistics can cope, show the essential feature of organization. In fact, one can refer to this group of problems as those of organized complexity. . . . These new problems, and the future of the world depends on many
of them, require science to make a third great advance, an advance that must be
even greater than the nineteenth-century conquest of problems of organized simplicity or the twentieth-century victory over problems of disorganized complexity. Science must, over the next 50 years, learn to deal with these problems of
organized complexity.
The emergence of computer technology in World War II and its rapidly
growing power in the second half of the 20th century made it possible to deal
with increasingly complex problems, some of which began to resemble the
notion of organized complexity. However, this gradual penetration into the
domain of organized complexity revealed that high computing power, while
important, is not sufficient for making substantial progress in this problem
domain. It was again felt that radically new methods were needed, methods
based on fundamentally new concepts and the associated mathematical theories. An important new concept (and mathematical theories formalizing its
various facets) that emerged from this cognitive tension was a broad concept
of uncertainty, liberated from its narrow confines of probability theory. To
introduce this broad concept of uncertainty and the associated mathematical
theories is the very purpose of this book.
A view taken in this book is that scientific knowledge is organized, by and
large, in terms of systems of various types (or categories in the sense of mathematical theory of categories). In general, systems are viewed as relations
among states of given variables. They are constructed from our experiential
domain for various purposes, such as prediction, retrodiction, extrapolation in
space or within a population, prescription, control, planning, decision making,
scheduling, and diagnosis. In each system, its relation is utilized in a given purposeful way for determining unknown states of some variables on the basis of
known states of some other variables. Systems in which the unknown states
are always determined uniquely are called deterministic systems; all other
systems are called nondeterministic systems. Each nondeterministic system
involves uncertainty of some type. This uncertainty pertains to the purpose for
which the system was constructed. It is thus natural to distinguish predictive
uncertainty, retrodictive uncertainty, prescriptive uncertainty, extrapolative
uncertainty, diagnostic uncertainty, and so on. In each nondeterministic
system, the relevant uncertainty (predictive, diagnostic, etc.) must be properly
incorporated into the description of the system in some formalized language.
Deterministic systems, which were once regarded as ideals of scientific
knowledge, are now recognized as too restrictive. Nondeterministic systems
are far more prevalent in contemporary science. This important change in
science is well characterized by Richard Bellman [1961]:
It must, in all justice, be admitted that never again will scientific life be as satisfying and serene as in days when determinism reigned supreme. In partial
recompense for the tears we must shed and the toil we must endure is the satisfaction of knowing that we are treating significant problems in a more realistic
and productive fashion.
Although nondeterministic systems have been accepted in science since their
utility was demonstrated in statistical mechanics, it was tacitly assumed for a
long time that probability theory is the only framework within which uncertainty in nondeterministic systems can be properly formalized and dealt with.
This presumed equality between uncertainty and probability was challenged
in the second half of the 20th century, when interest in problems of organized
complexity became predominant. These problems invariably involve uncertainty of various types, but rarely uncertainty resulting from randomness,
which can yield meaningful statistical averages.
Uncertainty liberated from its probabilistic confines is a phenomenon of
the second half of the 20th century. It is closely connected with two important
generalizations in mathematics: a generalization of the classical measure
theory and a generalization of the classical set theory. These generalizations,
which are introduced later in this book, enlarged substantially the framework
for formalizing uncertainty. As a consequence, they made it possible to
conceive of new uncertainty theories distinct from the classical probability
theory.
To develop a fully operational theory for dealing with uncertainty of some
conceived type requires that a host of issues be addressed at each of the following four levels:
• Level 1—We need to find an appropriate mathematical formalization of the conceived type of uncertainty.
• Level 2—We need to develop a calculus by which this type of uncertainty can be properly manipulated.
• Level 3—We need to find a meaningful way of measuring the amount of relevant uncertainty in any situation that is formalizable in the theory.
• Level 4—We need to develop methodological aspects of the theory, including procedures of making the various uncertainty principles operational within the theory.
Although each of the uncertainty theories covered in this book is examined
at all these levels, the focus is on the various issues at levels 3 and 4. These
issues are presented in greater detail.
1.2. UNCERTAINTY-BASED INFORMATION
As a subject of this book, the broad concept of uncertainty is closely connected
with the concept of information. The most fundamental aspect of this connection is that uncertainty involved in any problem-solving situation is a result
of some information deficiency pertaining to the system within which the
situation is conceptualized. There are various manifestations of information
deficiency. The information may be, for example, incomplete, imprecise, fragmentary, unreliable, vague, or contradictory. In general, these various information deficiencies determine the type of the associated uncertainty.
Assume that we can measure the amount of uncertainty involved in a
problem-solving situation conceptualized in a particular mathematical theory.
Assume further that this amount of uncertainty is reduced by obtaining relevant information as a result of some action (performing a relevant experiment
and observing the experimental outcome, searching for and discovering a relevant historical record, requesting and receiving a relevant document from an
archive, etc.). Then, the amount of information obtained by the action can be
measured by the amount of reduced uncertainty. That is, the amount of information pertaining to a given problem-solving situation that is obtained by
taking some action is measured by the difference between a priori uncertainty
and a posteriori uncertainty, as illustrated in Figure 1.2.
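As a small numerical illustration outside the book's own text, the following Python sketch uses the Hartley measure (log2 of the number of possible alternatives, developed in Chapter 2) as the uncertainty functional; the outcome names and numbers are arbitrary.

    from math import log2

    def hartley(alternatives):
        # Hartley measure of uncertainty: log2 of the number of possible alternatives (in bits).
        return log2(len(alternatives))

    # A priori, any of eight outcomes is considered possible.
    a_priori = {"o1", "o2", "o3", "o4", "o5", "o6", "o7", "o8"}
    # An observation (the "action") rules out all but two outcomes.
    a_posteriori = {"o3", "o7"}

    U1 = hartley(a_priori)        # 3.0 bits of a priori uncertainty
    U2 = hartley(a_posteriori)    # 1.0 bit of a posteriori uncertainty
    print(U1 - U2)                # 2.0 bits of uncertainty-based information gained

Any other justified uncertainty functional could be substituted for hartley in the same way; the measurement of information as the difference U1 - U2 is unchanged.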
[Figure 1.2. The meaning of uncertainty-based information: an action reduces the a priori uncertainty U1 to the a posteriori uncertainty U2, and the information gained is the difference U1 - U2.]

Information measured solely by the reduction of relevant uncertainty
within a given mathematical framework is an important, even though
restricted, notion of information. It does not capture, for example, the
common-sense conception of information in human communication and cognition, or the algorithmic conception of information, in which the amount of
information needed to describe an object is measured by the shortest possible description of the object in some standard language. To distinguish information conceived in terms of uncertainty reduction from the various other
conceptions of information, it is common to refer to it as uncertainty-based
information.
Notwithstanding its restricted nature, uncertainty-based information is very
important for dealing with nondeterministic systems. The capability of measuring uncertainty-based information in various situations has the same utility
as any other measuring instrument. It allows us, in general, to analyze and
compare systems from the standpoint of their informativeness. By asking a
given system any question relevant to the purpose for which the system has
been constructed (prediction, retrodiction, diagnosis, etc.), we can measure the
amount of information in the obtained answer. How well we utilize this capability to measure information depends of course on the questions we ask.
Since this book is concerned only with uncertainty-based information, the
adjective “uncertainty-based” is usually omitted. It is used only from time to
time as a reminder or to emphasize the connection with uncertainty.
1.3. GENERALIZED INFORMATION THEORY
A formal treatment of uncertainty-based information has two classical roots,
one based on the notion of possibility, and one based on the notion of probability. Overviews of these two classical theories of information are presented
in Chapters 2 and 3, respectively. The rest of the book is devoted to various
generalizations of the two classical theories. These generalizations have been
developing and have commonly been discussed under the name “Generalized
Information Theory” (GIT). In GIT, as in the two classical theories, the
primary concept is uncertainty, and information is defined in terms of uncertainty reduction.
The ultimate goal of GIT is to develop the capability to deal formally with
any type of uncertainty and the associated uncertainty-based information that
we can recognize on intuitive grounds. To be able to deal with each recognized
type of uncertainty (and uncertainty-based information), we need to address
scores of issues. It is useful to associate these issues with four typical levels of
development of each particular uncertainty theory, as suggested in Section 1.1.
We say that a particular theory of uncertainty, T, is fully operational when the
following issues have been resolved adequately at the four levels:
• Level 1—Relevant uncertainty functions, u, of theory T have been characterized by appropriate axioms (examples of these functions are probability measures).
• Level 2—A calculus has been developed for dealing with functions u (an example is the calculus of probability theory).
• Level 3—A justified functional U in theory T has been found, which for each function u in the theory measures the amount of uncertainty associated with u (an example of functional U is the well-known Shannon entropy in probability theory).
• Level 4—A methodology has been developed for dealing with the various problems in which theory T is involved (an example is the Bayesian methodology, combined with the maximum and minimum entropy principles, in probability theory).
Clearly, the functional U for measuring the amount of uncertainty
expressed by the uncertainty function u can be investigated only after this
function is properly formalized and a calculus is developed for dealing with it.
The functional assigns to each function u in the given theory a nonnegative
real number. This number is supposed to measure, in an intuitively meaningful way, the amount of uncertainty of the type considered that is embedded
in the uncertainty function. To be acceptable as a measure of the amount of
uncertainty of a given type in a particular uncertainty theory, the functional
must satisfy several intuitively essential axiomatic requirements. Specific
mathematical formulation of each of the requirements depends on the uncertainty theory involved. For the classical uncertainty theories, specific formulations of the requirements are introduced and discussed in Chapters 2 and 3.
For the various generalized uncertainty theories, these formulations are introduced and examined in both generic and specific terms in Chapter 6.
The strongest justification of a functional as a meaningful measure of the
amount of uncertainty of a considered type in a given uncertainty theory is
obtained when we can prove that it is the only functional that satisfies the
relevant axiomatic requirements and measures the amount of uncertainty in
some specific measurement units. A suitable measurement unit is uniquely
defined by specifying what the amount of uncertainty should be for a particular (and usually very simple) uncertainty function.
GIT is essentially a research program whose objective is to develop a
broader treatment of uncertainty-based information, not restricted to its classical notions. Making a blueprint for this research program requires that a sufficiently broad framework be employed. This framework should encompass a
broad spectrum of special mathematical areas that are fitting to formalize the
various types of uncertainty conceived.
The framework employed in GIT is based on two important generalizations
in mathematics that emerged in the second half of the 20th century. One of
them is the generalization of classical measure theory to the theory of monotone measures. The second one is the generalization of classical set theory to
the theory of fuzzy sets. These two generalizations expand substantially the
classical, probabilistic framework for formalizing uncertainty, which is based
on classical set theory and classical measure theory. This expansion is 2-dimensional. In one dimension, the additivity requirement of classical measures is
replaced with the less restrictive requirement of monotonicity with respect to
the subsethood relationship. The result is a considerably broader theory of
monotone measures, within which numerous branches are distinguished that
deal with monotone measures with various special properties. In the other
dimension, the formalized language of classical set theory is expanded to the
more expressive language of fuzzy set theory, where further distinctions are
based on various special types of fuzzy sets.
The 2-dimensional expansion of the classical framework for formalizing
uncertainty theories is illustrated in Figure 1.3. The rows in this figure represent various branches of the theory of monotone measures, while the columns
represent various types of formalized languages. An uncertainty theory of a
particular type is formed by choosing a particular formalized language and
expressing the relevant uncertainty (predictive, prescriptive, etc.) involved in
situations described in this language in terms of a monotone measure of a
chosen type. This means that each entry in the matrix in Figure 1.3 represents
an uncertainty theory of a particular type. The shaded entries indicate uncertainty theories that are currently fairly well developed and are covered in this
book.
As a research program, GIT has been motivated by the following attitude
toward dealing with uncertainty. One aspect of this attitude is the recognition
of multiple types of uncertainty and the associated uncertainty theories.
Another aspect is that we should not a priori commit to any particular theory.
Our choice of uncertainty theory for dealing with each given problem should
be determined solely by the nature of the problem. The chosen theory should
allow us to express fully our ignorance and, at the same time, it should not
allow us to ignore any available information. It is remarkable that these principles were expressed with great simplicity and beauty more than two millennia ago by the ancient Chinese philosopher Lao Tsu (ca. 600 b.c.) in his famous
book Tao Te Ching (Vintage Books, New York, 1972):
Knowing ignorance is strength.
Ignoring knowledge is sickness.
[Figure 1.3. A framework for conceptualizing uncertainty theories, which is used as a blueprint for research within generalized information theory (GIT). Columns represent formalized languages: classical sets, standard fuzzy sets, and nonclassical sets (nonstandard fuzzy sets such as interval-valued, type-2, level-2, and lattice-based fuzzy sets). Rows represent branches of the theory of monotone measures: additive measures (classical numerical probability) and various nonadditive measures, including possibility/necessity, Sugeno λ-measures, belief/plausibility measures (capacities of order ∞), capacities of various finite orders, interval-valued probability distributions, and general lower and upper probabilities.]

The primacy of problems in GIT is in sharp contrast with the primacy of
methods that is a natural consequence of choosing to use one particular theory
for all problems involving uncertainty. The primary aim of GIT is to pursue
the development of new uncertainty theories, through which we gradually
extend our capability to deal with uncertainty honestly: to be able to fully recognize our ignorance without ignoring available information.
1.4. RELEVANT TERMINOLOGY AND NOTATION
The purpose of this section is to introduce names and symbols for some
general mathematical concepts, primarily from the area of classical set theory,
which are frequently used throughout this book. Names and symbols of many
other concepts that are used in the subsequent chapters are introduced locally
in each individual chapter.
A set is any collection of some objects that are considered for some purpose
as a whole. Objects that are included in a set are called its members (or elements). Conventionally, sets are denoted by capital letters and elements of sets
are denoted by lowercase letters. Symbolically, the statement “a is a member
of set A" is written as a ∈ A.
A set is defined by one of three methods. In the first method, members (or
elements) of the set are explicitly listed, usually within curly brackets, as in
A = {1, 3, 5, 7, 9}. This method is, of course, applicable only to a set that contains a finite number of elements. The second method for defining a set is to
specify a property that an object must possess to qualify as a member of the
set. An example is the following definition of set A:
A = {x | x is a real number that is greater than 0 and smaller than 1}.
The symbol | in this definition (and in other definitions in this book) stands
for “such that.” As can be seen from this example, this method allows us to
define sets that include an infinite number of elements.
Both of the introduced methods for defining sets tacitly assume that
members of the sets of concern in each particular application are drawn from
some underlying universal set. This is a collection of all objects that are of interest in the given application. Some common universal sets in mathematics have
standard symbols to represent them, such as ℕ for the set of all natural
numbers, ℕn for the set {1, 2, 3, . . . , n}, ℤ for the set of all integers, ℝ for the
set of all real numbers, and ℝ+ for the set of all nonnegative real numbers.
Except for these standard symbols, letter X is reserved in this book to denote
a universal set.
The third method to define a set is through a characteristic function. If χA
is the characteristic function of a set A, then χA is a function from the universal set X to the set {0, 1}, where

    χA(x) = 1 if x is a member of A,
    χA(x) = 0 if x is not a member of A,

for each x ∈ X. For the set A of odd natural numbers less than 10, the characteristic function is defined for each x ∈ ℕ by the formula

    χA(x) = 1 if x = 1, 3, 5, 7, 9,
    χA(x) = 0 otherwise.
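For readers who want to experiment, this characteristic function can be written as a short Python function (illustrative only, not part of the book's notation):

    def chi_A(x):
        # Characteristic function of A = {1, 3, 5, 7, 9}, the odd natural numbers less than 10.
        return 1 if x in {1, 3, 5, 7, 9} else 0

    print([chi_A(x) for x in range(1, 11)])   # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]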
Set A is contained in or is equal to another set B, written A ⊆ B, if every
element of A is an element of B, that is, if x ∈ A implies x ∈ B. If A is contained in B, then A is said to be a subset of B, and B is said to be a superset of
A. Two sets are equal, symbolically A = B, if they contain exactly the same elements; therefore, if A ⊆ B and B ⊆ A then A = B. If A ⊆ B and A is not equal
to B, then A is called a proper subset of B, written A ⊂ B. The negation of each
Table 1.1. Definition of All Subsets, A, of Set X = {x1, x2, x3} by Their
Characteristic Functions

A:                χA(x1)   χA(x2)   χA(x3)
∅                    0        0        0
{x1}                 1        0        0
{x2}                 0        1        0
{x3}                 0        0        1
{x1, x2}             1        1        0
{x1, x3}             1        0        1
{x2, x3}             0        1        1
{x1, x2, x3}         1        1        1
of these propositions is expressed symbolically by a slash crossing the operator. That is, x ∉ A, A ⊄ B, and A ≠ B represent, respectively, x is not an element
of A, A is not a proper subset of B, and A is not equal to B.
The family of all subsets of a given set A is called the power set of A, and
it is usually denoted by P(A). The family of all subsets of P(A) is called a
second-order power set of A; it is denoted by P²(A), which stands for P(P(A)).
Similarly, higher-order power sets P³(A), P⁴(A), . . . can be defined.
For any finite universal set, it is convenient to define its various subsets by
their characteristic functions arranged in a tabular form, as shown in Table 1.1
for X = {x1, x2, x3}. In this case, each set, A, of X is defined by a triple ⟨χA(x1),
χA(x2), χA(x3)⟩. The order of these triples in the table is not significant, but it
is useful for discussing typical examples in this book to list subsets containing
one element first, followed by subsets containing two elements and so on.
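The content of Table 1.1 can also be generated mechanically: each of the 2³ = 8 binary triples is the characteristic-function description of one subset of X. The following Python sketch is purely illustrative and prints the triples in a different, equally acceptable order.

    from itertools import product

    X = ["x1", "x2", "x3"]

    # Each subset A of X corresponds to one triple of characteristic values
    # (chi_A(x1), chi_A(x2), chi_A(x3)); there are 2**3 = 8 such triples.
    for triple in product((0, 1), repeat=len(X)):
        subset = {x for x, bit in zip(X, triple) if bit == 1}
        print(triple, subset if subset else "{}")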
The intersection of sets A and B is a new set, A ∩ B, that contains every
object that is simultaneously an element of both the set A and the set B. If A
= {1, 3, 5, 7, 9} and B = {1, 2, 3, 4, 5}, then A ∩ B = {1, 3, 5}. The union of sets A
and B is a new set, A ∪ B, which contains all the elements that are in set A or
in set B. With the sets A and B defined previously, A ∪ B = {1, 2, 3, 4, 5, 7, 9}.
The complement of a set A, denoted Ā, is the set of all elements of the universal set that are not elements of A. With A = {1, 3, 5, 7, 9} and the universal
set X = {1, 2, 3, 4, 5, 6, 7, 8, 9}, the complement of A is Ā = {2, 4, 6, 8}. A related
set operation is the set difference, A - B, which is defined as the set of all elements of A that are not elements of B. With A and B as defined previously, A
- B = {7, 9} and B - A = {2, 4}. The complement of A is equivalent to X - A.
All the concepts of set theory can be recast in terms of the characteristic
functions of the sets involved. For example, we have that A ⊆ B if and only if
χA(x) ≤ χB(x) for all x ∈ X. Similarly,

    χA∩B(x) = min{χA(x), χB(x)},
    χA∪B(x) = max{χA(x), χB(x)}.
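A short, purely illustrative Python check of these min/max identities, using the sets A, B, and the universal set X from this section:

    A = {1, 3, 5, 7, 9}
    B = {1, 2, 3, 4, 5}
    X = set(range(1, 10))                      # the universal set {1, ..., 9} used above

    def chi(S):
        # Characteristic function of a subset S of X.
        return lambda x: 1 if x in S else 0

    for x in X:
        # Intersection and union expressed through min and max of characteristic values.
        assert chi(A & B)(x) == min(chi(A)(x), chi(B)(x))
        assert chi(A | B)(x) == max(chi(A)(x), chi(B)(x))

    print(A & B)    # {1, 3, 5}
    print(A | B)    # {1, 2, 3, 4, 5, 7, 9}
    print(X - A)    # {2, 4, 6, 8}, the complement of A in X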
The phrase "for all" occurs so often in set theory that a special symbol, ∀,
is used as an abbreviation. Similarly, the phrase "there exists" is abbreviated
as ∃. For example, the definition of set equality can be restated as A = B if and
only if χA(x) = χB(x), ∀x ∈ X.
The size of a finite set, called its cardinality, is the number of elements it
contains. If A = {1, 3, 5, 7, 9}, then the cardinality of A, denoted by |A|, is 5. A
set may be empty, that is, it may contain no elements. The empty set is given a
special symbol ∅; thus ∅ = {} and |∅| = 0. When A is finite, then

    |P(A)| = 2^|A|,  |P²(A)| = 2^(2^|A|),  etc.
The most fundamental properties of the set operations of absolute complement, union, and intersection are summarized in Table 1.2, where sets A,
B, and C are assumed to be elements of the power set P(X) of a universal set
X. Note that all the equations in this table that involve the set union and intersection are arranged in pairs. The second equation in each pair can be obtained
from the first by replacing ∅, ∪, and ∩ with X, ∩, and ∪, respectively, and vice
versa. These pairs of equations exemplify a general principle of duality: for each
valid equation in set theory that is based on the union and intersection operations, there is a corresponding dual equation, also valid, that is obtained by
the replacement just specified.
Any two sets that have no common members are called disjoint. That is,
every pair of disjoint sets, A and B, satisfies the equation
    A ∩ B = ∅.
Table 1.2. Fundamental Properties of Set Operations

Involution:                 (Ā)‾ = A
Commutativity:              A ∪ B = B ∪ A
                            A ∩ B = B ∩ A
Associativity:              (A ∪ B) ∪ C = A ∪ (B ∪ C)
                            (A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributivity:             A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
                            A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Idempotence:                A ∪ A = A
                            A ∩ A = A
Absorption:                 A ∪ (A ∩ B) = A
                            A ∩ (A ∪ B) = A
Absorption by X and ∅:      A ∪ X = X
                            A ∩ ∅ = ∅
Identity:                   A ∪ ∅ = A
                            A ∩ X = A
Law of contradiction:       A ∩ Ā = ∅
Law of excluded middle:     A ∪ Ā = X
De Morgan's laws:           (A ∩ B)‾ = Ā ∪ B̄
                            (A ∪ B)‾ = Ā ∩ B̄
A family of pairwise disjoint nonempty subsets of a set A is called a partition
on A if the union of these subsets yields the original set A. A partition on A
is usually denoted by the symbol π(A). Formally,

    π(A) = {Ai | i ∈ I, Ai ⊆ A, Ai ≠ ∅}

is a partition on A iff (i.e., if and only if)

    Ai ∩ Aj = ∅

for each pair i, j ∈ I, i ≠ j, and

    ∪{Ai | i ∈ I} = A.
Members of π(A), which are subsets of A, are usually referred to as blocks of
the partition. Each member of A belongs to one and only one block of π(A).
Given two partitions π1(A) and π2(A), we say that π1(A) is a refinement of
π2(A) iff each block of π1(A) is included in some block of π2(A). The refinement relation on the set of all partitions of A, Π(A), which is denoted by ≤
(i.e., π1(A) ≤ π2(A) in our case), is a partial ordering. The pair ⟨Π(A), ≤⟩ is a
lattice, referred to as the partition lattice of A.
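The refinement relation can be tested directly from its definition; the Python sketch below and its partitions p1 and p2 are illustrative examples only.

    def is_refinement(p1, p2):
        # True iff every block of partition p1 is included in some block of partition p2.
        return all(any(block1 <= block2 for block2 in p2) for block1 in p1)

    A = {1, 2, 3, 4}
    p1 = [{1}, {2}, {3, 4}]     # a finer partition of A
    p2 = [{1, 2}, {3, 4}]       # a coarser partition of A

    print(is_refinement(p1, p2))   # True:  p1 precedes p2 in the partition lattice of A
    print(is_refinement(p2, p1))   # False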
Let A = {A1, A2, . . . , An} be a family of sets such that
    Ai ⊆ Ai+1
for all i = 1, 2, . . . , n - 1.
Then, A is called a nested family, and the sets A1 and An are called the innermost set and the outermost set, respectively. This definition can easily be
extended to infinite families.
The ordered pair formed by two objects x and y, where x ∈ X and y ∈ Y, is
denoted by ⟨x, y⟩. The set of all ordered pairs, where the first element is contained in a set X and the second element is contained in a set Y, is called a
Cartesian product of X and Y and is denoted as X × Y. If, for example, X = {1,
2} and Y = {a, b}, then X × Y = {⟨1, a⟩, ⟨1, b⟩, ⟨2, a⟩, ⟨2, b⟩}. Note that the size of
X × Y is the product of the size of X and the size of Y when X and Y are finite:
|X × Y| = |X| · |Y|. It is not required that the Cartesian product be defined on
distinct sets. A Cartesian product X × X is perfectly meaningful. The symbol
X² is often used instead of X × X. If, for example, X = {0, 1}, then X² = {⟨0, 0⟩,
⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}. Any subset of X × Y is called a binary relation.
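A few illustrative lines of Python confirm the size formula |X × Y| = |X| · |Y|; the element names below are arbitrary.

    from itertools import product

    X = {1, 2}
    Y = {"a", "b"}

    XY = set(product(X, Y))             # the Cartesian product X x Y as a set of ordered pairs
    print(XY)                           # {(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')}
    print(len(XY) == len(X) * len(Y))   # True: |X x Y| = |X| * |Y|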
Several important properties are defined for binary relations R ⊆ X². They
are: R is reflexive iff ⟨x, x⟩ ∈ R for all x ∈ X; R is symmetric iff for every
⟨x, y⟩ ∈ R it is also ⟨y, x⟩ ∈ R; R is antisymmetric iff ⟨x, y⟩ ∈ R and ⟨y, x⟩ ∈ R implies
x = y; R is transitive iff ⟨x, y⟩ ∈ R and ⟨y, z⟩ ∈ R implies ⟨x, z⟩ ∈ R. Relations that
are reflexive, symmetric, and transitive are called equivalence relations. Relations that are reflexive and symmetric are called compatibility relations. Relations that are reflexive, antisymmetric, and transitive are called partial orderings. When R is a partial ordering and ⟨x, y⟩ ∈ R, it is common to write x ≤ y
and say that x precedes y or, alternatively, that x is smaller than or equal to y.
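These four properties translate directly into executable tests. In the illustrative Python sketch below, the relation R is the ordinary "less than or equal to" relation on a three-element set, which the tests classify as a partial ordering.

    def is_reflexive(R, X):
        return all((x, x) in R for x in X)

    def is_symmetric(R):
        return all((y, x) in R for (x, y) in R)

    def is_antisymmetric(R):
        return all(x == y for (x, y) in R if (y, x) in R)

    def is_transitive(R):
        return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

    X = {1, 2, 3}
    R = {(x, y) for x in X for y in X if x <= y}   # the ordinary "less than or equal to" relation

    print(is_reflexive(R, X), is_symmetric(R), is_antisymmetric(R), is_transitive(R))
    # True False True True  -> reflexive, antisymmetric, and transitive: a partial ordering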
A partial ordering ≤ on X does not guarantee that all pairs of elements x,
y in X are comparable in the sense that either x ≤ y or y ≤ x. If all pairs of elements are comparable, the partial ordering becomes a total ordering (or linear
ordering). Such an ordering is characterized by—in addition to reflexivity, transitivity, and antisymmetry—a property of connectivity: for all x, y ∈ X, x ≠ y
implies either x ≤ y or y ≤ x.
Let X be a set on which a partial ordering is defined and let A be a subset
of X. If x ∈ X and x ≤ y for every y ∈ A, then x is called a lower bound of A on
X with respect to the partial ordering. If x ∈ X and y ≤ x for every y ∈ A, then
x is called an upper bound of A on X with respect to the partial ordering. If a
particular lower bound of A succeeds (is greater than) any other lower bound of A,
then it is called the greatest lower bound, or infimum, of A. If a particular upper
bound precedes (is smaller than) every other upper bound of A, then it is
called the least upper bound, or supremum, of A.
A partially ordered set X, any two elements of which have a greatest lower
bound (also referred to as a meet) and a least upper bound (also referred to
as a join), is called a lattice. The meet and join elements of x and y in X are
often denoted by x ∧ y and x ∨ y, respectively. Any lattice on X can thus be
defined not only by the pair ⟨X, ≤⟩, where ≤ is an appropriate partial ordering
of X, but also by the triple ⟨X, ∧, ∨⟩, where ∧ and ∨ denote the operations of
meet and join.
A partially ordered set, any two elements of which have only a greatest
lower bound, is called a lower semilattice or meet semilattice. A partially
ordered set, any two elements of which have only a least upper bound, is called
an upper semilattice or join semilattice.
Elements of the power set P(X) of a universal set X (or any subset of X)
can be ordered by the set inclusion Õ. This ordering, which is only partial,
forms a lattice in which the join (least upper bound, supremum) and meet
(greatest lower bound, infimum) of any pair of sets A, B ŒP(X) is given by A
» B and A « B, respectively. This lattice is distributive (due to the distributive
properties of » and « listed in Table 1.2) and complemented (since each set
in P(X) has its complement in P(X)); it is usually called a Boolean lattice
or a Boolean algebra. The connection between the two formulations of
this important lattice, ·P(X), ÕÒ and ·P(X), », «Ò, is facilitated by the
equivalence
A Õ B iff A » B = B and A « B = A for any A, B Œ P (X ),
where “iff” is a common abbreviation of the phrase “if and only if” or its alternative “is equivalent to.” This convenient abbreviation is used throughout this
book.
If R ⊆ X × Y, then we call R a binary relation between X and Y. If ⟨x, y⟩ ∈ R, then we also write R(x, y) or xRy to signify that x is related to y by R. The inverse of a binary relation R on X × Y, which is denoted by R⁻¹, is a binary relation on Y × X such that

    ⟨y, x⟩ ∈ R⁻¹  iff  ⟨x, y⟩ ∈ R.

For any pair of binary relations R ⊆ X × Y and Q ⊆ Y × Z, the composition of R and Q, denoted by R ∘ Q, is a binary relation on X × Z defined by the formula

    R ∘ Q = {⟨x, z⟩ | ⟨x, y⟩ ∈ R and ⟨y, z⟩ ∈ Q for some y}.
If a binary relation on X × Y is such that each element x ∈ X is related to exactly one element y ∈ Y, the relation is called a function, and it is usually denoted by a lowercase letter. Given a function f, this unique assignment of one particular element y ∈ Y to each element x ∈ X is often expressed as f(x) = y. Set X is called the domain of f and Y is called its range. The domain and range of a function f are usually specified in the form f: X → Y; the arrow indicates that function f maps elements of set X to elements of set Y; f is called a completely specified function iff each element x ∈ X is included in at least one pair ⟨x, y = f(x)⟩, and it is called an onto function iff each element y ∈ Y is included in at least one pair ⟨x, y = f(x)⟩. If the domain of a function (and possibly also its range) is a set of functions, then it is common to call such a function a functional.
The inverse of a function f is another function, f⁻¹, which maps elements of set Y to disjoint subsets of set X. If f is a completely specified and onto function, then f⁻¹ maps elements of set Y to blocks of the unique partition, π_f(X), that is induced on the set X by function f. This partition consists of |Y| subsets of X,

    π_f(X) = {X_y | y ∈ Y},

where

    X_y = {x ∈ X | f(x) = y}

for each y ∈ Y. Function f⁻¹ thus has the form

    f⁻¹: Y → π_f(X)

and is defined by the assignment f⁻¹(y) = X_y for each y ∈ Y.
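The induced partition can be computed directly from this definition. The following is a minimal Python sketch; the sets X and Y and the function f are illustrative choices, not taken from the text.

    # A minimal sketch: the partition pi_f(X) induced on X by a completely
    # specified, onto function f (X, Y, and f are illustrative assumptions).
    X = {1, 2, 3, 4, 5, 6}
    Y = {'a', 'b', 'c'}
    f = {1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b', 6: 'c'}

    def induced_partition(X, Y, f):
        """Return the blocks X_y = {x in X | f(x) = y}, i.e., f^-1(y) for each y in Y."""
        return {y: {x for x in X if f[x] == y} for y in Y}

    print(induced_partition(X, Y, f))
    # e.g. {'a': {1, 2}, 'b': {3, 4, 5}, 'c': {6}}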
The notion of a Cartesian product is not restricted to ordered pairs. It may involve ordered n-tuples for any n ≥ 2. An n-dimensional Cartesian product for some particular n is the set of all ordered n-tuples that can be formed from the designated sets in a manner analogous to forming ordered pairs of a 2-dimensional Cartesian product. When the n-tuples are formed from a single set, say, set X, then the n-dimensional Cartesian product is usually denoted by the symbol Xⁿ. For example, if X = {0, 1}, then X³ = {⟨0, 0, 0⟩, ⟨0, 0, 1⟩, ⟨0, 1, 0⟩, ⟨0, 1, 1⟩, ⟨1, 0, 0⟩, ⟨1, 0, 1⟩, ⟨1, 1, 0⟩, ⟨1, 1, 1⟩}. Any subset of a given n-dimensional Cartesian product is called an n-dimensional relation.
Several important concepts are associated with n-dimensional relations for any finite n ≥ 2. For the sake of simplicity, let us define them in terms of a ternary relation R ⊆ X × Y × Z. Generalizations to n > 3 are obvious. A projection of R into one of its dimensions, say, dimension X, is the set

    [R ↓ X] = {x ∈ X | ⟨x, y, z⟩ ∈ R for some ⟨y, z⟩ ∈ Y × Z}.

The symbol [R ↓ X] indicates that R is projected into dimension X. Projections [R ↓ Y] and [R ↓ Z] are defined in a similar way. A projection of R into two of its dimensions, say, X × Y, is the set

    [R ↓ X × Y] = {⟨x, y⟩ ∈ X × Y | ⟨x, y, z⟩ ∈ R for some z ∈ Z}.

Projections [R ↓ X × Z] and [R ↓ Y × Z] are defined in a similar way.
A cylindric extension of projection [R ↓ X] of a ternary relation R ⊆ X × Y × Z with respect to Y × Z is the set

    ([R ↓ X] ↑ Y × Z) = {⟨x, y, z⟩ ∈ X × Y × Z | x ∈ [R ↓ X]}.

Similarly, a cylindric extension of projection [R ↓ X × Y] with respect to dimension Z is the set

    ([R ↓ X × Y] ↑ Z) = {⟨x, y, z⟩ ∈ X × Y × Z | ⟨x, y⟩ ∈ [R ↓ X × Y]}.

The intersection of cylindric extensions of any given set P of projections of relation R is called a cylindric closure of R with respect to the projections in P.
For any pair of binary relations R ⊆ X × Y and Q ⊆ Y × Z, the join of R and Q, denoted by R * Q, is a ternary relation on X × Y × Z defined by the formula

    R * Q = {⟨x, y, z⟩ ∈ X × Y × Z | ⟨x, y⟩ ∈ R and ⟨y, z⟩ ∈ Q}.

Observe that

    R ∘ Q = [R * Q ↓ X × Z].
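The composition, the join, and the relationship between them just noted are easy to verify computationally. The following Python sketch uses small illustrative relations, which are assumptions made only for this example.

    # A small sketch of R ∘ Q and R * Q for relations represented as sets of tuples.
    R = {('x1', 'y1'), ('x1', 'y2'), ('x2', 'y2')}   # R ⊆ X × Y (illustrative)
    Q = {('y1', 'z1'), ('y2', 'z2'), ('y2', 'z3')}   # Q ⊆ Y × Z (illustrative)

    def compose(R, Q):
        """R ∘ Q = {(x, z) | (x, y) ∈ R and (y, z) ∈ Q for some y}."""
        return {(x, z) for (x, y) in R for (yy, z) in Q if y == yy}

    def join(R, Q):
        """R * Q = {(x, y, z) | (x, y) ∈ R and (y, z) ∈ Q}."""
        return {(x, y, z) for (x, y) in R for (yy, z) in Q if y == yy}

    # R ∘ Q equals the projection of R * Q onto X × Z.
    print(compose(R, Q) == {(x, z) for (x, _, z) in join(R, Q)})   # True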
Important subsets of ℝ are intervals of real numbers. Four types of intervals of real numbers between a and b are distinguished: closed intervals, [a, b], which contain the endpoints a and b; open intervals, (a, b), which do not contain the endpoints; and semiopen intervals, (a, b] (left-open intervals) and [a, b) (right-open intervals), which do not contain the left endpoint and the right endpoint, respectively.
An important and frequently used universal set is the set of all points in the n-dimensional Euclidean vector space ℝⁿ for some n ≥ 1 (i.e., all n-tuples of real numbers). Sets defined in terms of ℝⁿ are often required to possess a property referred to as convexity. A subset A of ℝⁿ is called convex iff, for every pair of points

    r = ⟨r_i | i ∈ ℕ_n⟩  and  s = ⟨s_i | i ∈ ℕ_n⟩

in A and every real number λ ∈ [0, 1], the point

    t = ⟨λr_i + (1 − λ)s_i | i ∈ ℕ_n⟩

is also in A. In other words, a subset A of ℝⁿ is convex iff, for every pair of points r and s in A, all points located on the straight-line segment connecting r and s are also in A.
In ℝ, any set defined by a single interval of real numbers is convex; any set defined by more than one interval that does not contain some points between the intervals is not convex. For example, the set A = [0, 2] ∪ [3, 5] is not convex, as can be shown by producing one of an infinite number of possible counterexamples: let r = 1, s = 4, and λ = 0.4; then λr + (1 − λ)s = 2.8 and 2.8 ∉ A.
Let R denote any set of real numbers (i.e., R ⊆ ℝ). If there is a real number r (or a real number s) such that x ≤ r (or x ≥ s, respectively) for every x ∈ R, then r is called an upper bound of R (or a lower bound of R), and we say that R is bounded above by r (or bounded below by s).
For any set of real numbers R that is bounded above, a real number r is
called the supremum of R iff:
(a) r is an upper bound of R.
(b) No number less than r is an upper bound of R.
If r is the supremum of R, we write r = sup R. If R has a maximum, then sup R = max R. For example, sup(0, 1) = sup[0, 1] = 1, but only the closed interval has a maximum, max[0, 1] = 1; the maximum of the open interval (0, 1) does not exist.
For any set of real numbers R that is bounded below, a real number s is
called the infimum of R iff:
(a) s is a lower bound of R.
(b) No number greater than s is a lower bound of R.
If s is the infimum of R, we write s = inf R. If R has a minimum, then
inf R = min R.
Classical sets must satisfy two basic requirements. First, members of each
set must be distinguishable from one another; and second, for any given set
and any given object, it must be possible to determine whether the object is,
or is not, a member of the set.
Fuzzy sets, which play an important role in GIT, differ from classical sets
by rejecting the second requirement. Contrary to classical sets, fuzzy sets are
not required to have sharp boundaries that distinguish their members from
other objects. The membership in a fuzzy set is not a matter of affirmation or
denial, as it is in a classical set, but a matter of degree.
Due to their sharp boundaries, classical sets are usually referred to in fuzzy
literature as crisp sets. This convenient and well-established term is adopted
in this book. Also adopted is the usual notation, according to which both crisp
and fuzzy sets are denoted by capital letters. This is justified by the fact that
crisp sets are special (degenerate) fuzzy sets.
Each fuzzy set is defined in terms of a relevant crisp universal set by a
function analogous to the characteristic function of crisp sets. This function
is called a membership function. As explained in Chapter 7, the form of this
function depends on the type of fuzzy set that is defined by it. For the
most common fuzzy sets, referred to as standard fuzzy sets, the membership function used for defining a fuzzy set A on a given universal set X
assigns to each element x of X a real number in the unit interval [0, 1].
This number is interpreted as the degree of membership of x in A. When only
the extreme values, 0 and 1, are assigned to each x ∈ X, the membership
function becomes formally equivalent to a characteristic function that defines
a crisp set. However, there is a subtle conceptual difference between the
two functions. Contrary to the symbolic role of the numbers in characteristic functions of crisp sets, numbers assigned to objects by membership
functions of standard fuzzy sets clearly have a numerical significance. This
significance is preserved when crisp sets are viewed (from the standpoint
of fuzzy set theory) as special fuzzy sets. For example, when we calculate an
average of two or more membership functions, we obtain a membership
function that defines a meaningful standard fuzzy set. On the other hand,
an average of two or more characteristic functions is not a meaningful
characteristic function.
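This difference is easy to see computationally. The following small Python sketch uses made-up membership grades, chosen only for illustration.

    # Averaging two standard fuzzy sets yields another standard fuzzy set.
    A = {'x1': 0.2, 'x2': 0.9, 'x3': 0.5}      # membership function of a fuzzy set A (illustrative)
    B = {'x1': 0.6, 'x2': 0.7, 'x3': 0.1}      # membership function of a fuzzy set B (illustrative)

    avg = {x: (A[x] + B[x]) / 2 for x in A}
    print(avg)                                 # {'x1': 0.4, 'x2': 0.8, 'x3': 0.3}
    # Averaging two characteristic functions (values only 0 or 1) can yield 0.5,
    # which is not a value any characteristic function may take.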
Two distinct notations are most commonly employed in the literature to denote membership functions. In one of them, the membership function of a fuzzy set A is denoted by μ_A, and its form for standard fuzzy sets is

    μ_A: X → [0, 1].

For each x ∈ X, the value μ_A(x) is the degree of membership of x in A. In the second notation, the membership function is denoted by A and, of course, has the same form

    A: X → [0, 1].
Clearly, A(x) is again the degree of membership of x in A.
According to the first notation, the symbol of the fuzzy set is distinguished
from the symbol of its membership function. According to the second notation, this distinction is not made, but no ambiguity results from this double use
of the same symbol, since each fuzzy set is uniquely defined by one particular
membership function. In this book, the second notation is adopted; it is
simpler, and, by and large, more popular in the current literature on fuzzy set
theory. Since crisp sets are viewed from the standpoint of fuzzy set theory as
special fuzzy sets, the same notation is used for them.
By exploiting degrees of membership, fuzzy sets are capable of expressing
gradual transitions from membership to nonmembership. This expressive
capability has wide utility. For example, it allows us to capture, at least in a
crude way, meanings of expressions in natural language, most of which are
inherently vague. Membership degrees in these fuzzy sets express compatibilities of relevant objects with the linguistic expression that the sets attempt to
capture. Crisp sets are hopelessly inadequate for this purpose.
Consider the four membership functions whose graphs are shown in Figure
1.4. These functions are defined on the set of nonnegative real numbers. Functions A and B define crisp sets, which are viewed here (from the fuzzy set perspective) as special fuzzy sets. Set A consists of a single object, the number 3;
set B consists of all real numbers in the closed interval [2, 4]. Functions C and
D define genuine fuzzy sets. Set C captures (in appropriate context) linguistic
expressions such as around 3, close to 3, or approximately 3. It may thus be
viewed as a fuzzy number. Similarly, fuzzy set D may be viewed as a fuzzy
interval.
Observe that the crisp set B in Figure 1.4 also consists of numbers that are
around 3. However, the sharp boundaries of the set are at odds with the vague
term around. The meaning of the term is certainly not captured, for example,
by excluding the number 1.999999 while including the number 2. The abrupt
transitions from membership to nonmembership make crisp sets virtually
unusable for capturing meanings of linguistic terms of natural language.
To explain the role of fuzzy set theory in GIT, an overview of its fundamentals is presented in Chapter 7.
1.5. AN OUTLINE OF THE BOOK
The objective of this book is to survey the current level of development of
GIT. The material, which is presented in a textbook-like manner, is organized
in the following way.
After setting the stage in this introductory chapter, the actual survey begins
with overviews of the two classical uncertainty theories. These are theories
based on the notion of possibility (Chapter 2) and the notion of probability
(Chapter 3). Due to extensive coverage of these theories in the literature,
especially the one based on probability, only the most fundamental features
[Figure 1.4. Examples of membership functions of fuzzy sets: graphs of A(x), B(x), C(x), and D(x) for 0 ≤ x ≤ 5.]
of these theories are covered. However, Notes in the two chapters guide the
reader through the literature dealing with these classical uncertainty theories.
The next part of the book (Chapters 4–6) is oriented toward introducing
some generalizations of the classical probability-based uncertainty theory.
These generalizations are obtained by replacing the additivity requirement of
probability measures with the weaker requirement of monotonicity of monotone measures, but they are still formalized within the language of classical set
theory. These theories may be viewed as theories of imprecise probabilities of
various types. While the focus in Chapter 4 is on common properties of uncertainty functions in all these theories, Chapter 5 is concerned with distinctive
properties of the individual theories. Covered are only theories that had been
well developed when this book was written. Functionals for measuring uncertainty in the introduced theories are examined in Chapter 6. These function-
22
1. INTRODUCTION
als, which are central to GIT, are shown to be largely invariant with respect to
the great diversity of uncertainty theories.
Further generalization of uncertainty theories by extending classical sets to
fuzzy sets of various types, which is usually referred to as fuzzification, is the
subject of Chapter 8. In order to make the book self-contained, relevant concepts of fuzzy set theory are introduced in Chapter 7. This important area of
GIT is still largely undeveloped. Only a few uncertainty theories have been
fuzzified thus far, and all these developed fuzzifications involve only the standard fuzzy sets.
The survey of GIT is concluded by examining four important principles of
uncertainty in Chapter 9. These are methodological principles justified on epistemological grounds. Their applications to a wide variety of practical problems,
which are discussed in this chapter, demonstrate the importance of GIT.
The closing chapter of the book (Chapter 10) is devoted to conclusions. GIT
is examined in this chapter in both retrospect and prospect. The overall conclusion from this examination is that GIT is still in an early stage of its development, notwithstanding the many important results that are surveyed in this
book.
NOTES
1.1. The recent book by Pollack [2003] is recommended as supplementary reading to
this chapter. It is a well-written and thorough discussion of the important role of
uncertainty in both science and ordinary life. Also recommended is the book by
Smithson [1989], in which the many facets of changing attitudes toward uncertainty in science and other areas of human affairs throughout the 20th century are
carefully examined.
1.2. The turning point leading to the acceptance of methods based on probability
theory in science was the publication of sound mathematical foundations of statistical mechanics by Willard Gibbs [1902].
1.3. Research on a broader conception of uncertainty-based information, liberated
from the confines of classical set theory and probability theory, began in the
early 1980s [Higashi and Klir, 1983a, b; Höhle, 1982; Yager, 1983]. The name
generalized information theory (GIT) was coined for this research program by
Klir [1991].
1.4. Information measured solely by the reduction of uncertainty is not explicitly concerned with the semantic and pragmatic aspects of information viewed in the
broader sense [Cherry, 1957; Jumarie, 1986, 1990; Kornwachs and Jacoby, 1996].
However, these aspects are not ignored, but they are assumed to be addressed
prior to each particular application. For example, when dealing with a system of
some kind (in general, a set of interrelated variables), we are assumed to understand the language (formalized or natural) in which the system is described (this
resolves the semantic aspect of information), and we are also assumed to know
the purpose for which the system has been constructed (this resolves the pragmatic aspect). These assumptions certainly restrict the applicability of uncertainty-
based information. However, an argument can be made [Dretske, 1981, 1983] that
the notion of uncertainty-based information is sufficiently rich as a basis for additional treatment, through which the broader concept of information, pertaining to
human communication and cognition, can adequately be formalized.
1.5. The concept of information has also been investigated in terms of the theory of
computability. In this approach, which is not covered in this book, the amount of
information represented by an object is measured by the length of its shortest
description in some standard language (e.g., by the shortest program for the standard Turing machine). Information of this type is usually referred to as descriptive
information or algorithmic information [Kolmogorov, 1965; Chaitin, 1987], and it
is connected with the concept of Kolmogorov complexity [Li and Vitányi, 1993].
1.6. Some additional approaches to information have appeared in the literature since
the early 1990s. For example, Devlin [1991] formulates and investigates information in terms of logic, while Stonier [1990] views information as a physical property defined as the capacity to organize a system or to maintain it in an organized
state. Another physics-based approach to information is known in the literature
as Fisher information [Fisher, 1950; Frieden, 1998]. A more recent, measurement-based approach was developed by Harmuth [1992]. Again, these various
approaches are not covered in this book.
1.7. A digest of most mathematical concepts that are relevant to the subject of this
book is in Section 1.4. A useful reference for strengthening the background in classical set theory, which is suitable for self-study is Set Theory and Related Topics by
S. Lipschutz (Schaum Series/McGraw-Hill, New York). Basic familiarity with calculus and some aspects of mathematical analysis are also needed for understanding this book. The book Mathematical Analysis by T.M. Apostol (Addison-Wesley,
Reading, MA) is recommended as a useful reference in this area.
EXERCISES
1.1. For which of the following pairs of sets is A = B?
(a) A = {0, 1, 2, 3}; B = {1, 3, 2, 0}
(b) A = {0, 1, 0, 2, 3}; B = {0, 1, 2, 3, 2}
(c) A = ∅; B = {∅}
(d) A = {0}; B = {∅}
1.2. Which of the following definitions are acceptable as definitions of classical (crisp) sets?
(a) A = {a | a is a real number}
(b) B = {b | b is a real number much greater than 1}
(c) C = {c | c is a living organism}
(d) D = {d | d is a section in this book}
(e) E = {e | e is a set}
(f) F = { f | f is a pretty girl}
1.3. Which of the following statements are correct provided that X = {∅, 0, 1, 2, {0, 1}}?
(a) {0, 1} ∈ X
(b) {0, 1} ⊂ X
(c) {0, 1, 2, {0, 1}} = X
(d) {{0, 1}} ⊂ X
(e) {0, 1, 2} ∈ X
(f) {∅} ∈ X
1.4. Let A = ℕ_40, B = {b | b is a natural number divisible by 3, b ≤ 30}, C = {c | c is an odd natural number, c ≤ 50}. Determine the following sets:
(a) A − B; A − C; C − A; C − B
(b) A ∩ B; A ∩ C; B ∩ C
(c) A ∪ B; A ∪ C; B ∪ C
(d) (A ∩ B) ∪ C; (A ∪ B) ∩ C; B ∪ (A ∩ C)
1.5. Determine the partition lattices of sets A = {1, 2, 3} and B = {1, 2, 3, 4}.
1.6. How many possible relations can be defined on the following Cartesian products of finite sets?
(a) X1 × X2 × . . . × Xn
(b) A × B² × C³
(c) P(A) × P(B)
(d) A² × P²(B) × C
1.7. For each of the following relations, determine whether or not it is reflexive, symmetric, antisymmetric, or transitive:
(a) R ⊆ P(X) × P(X); ⟨A, B⟩ ∈ R iff A ⊆ B for all A, B ∈ P(X).
(b) R ⊆ C × C, where C denotes the set of courses in a graduate program: ⟨a, b⟩ ∈ R iff course a is a prerequisite of course b.
(c) Rn ⊆ ℕ × ℕ: ⟨a, b⟩ ∈ Rn iff the remainders obtained by dividing a and b by n are the same, where n is some specific natural number greater than 1.
(d) R ⊆ W × W, where W denotes the set of all English words: ⟨a, b⟩ ∈ R iff a is a synonym of b.
(e) R ⊆ F × F, where F denotes the set of all five-letter English words: ⟨a, b⟩ ∈ R iff a differs from b in at most one position.
(f) R ⊆ A × A: ⟨a, b⟩ ∈ R iff f(a) = f(b), where f is a function of the form f: A → A.
(g) R ⊆ T × T, where T denotes all persons included in a family tree: ⟨a, b⟩ ∈ R iff a is an ancestor of b.
(h) R = {⟨0,0⟩, ⟨0,1⟩, ⟨1,0⟩, ⟨1,1⟩, ⟨1,2⟩, ⟨2,2⟩, ⟨0,2⟩, ⟨3,3⟩}
(i) R = {⟨0,0⟩, ⟨0,3⟩, ⟨1,1⟩, ⟨2,2⟩, ⟨1,0⟩, ⟨0,1⟩, ⟨3,1⟩, ⟨3,3⟩, ⟨3,0⟩}
1.8. Let X = ℕ_4. Determine which of the following relations on X² are equivalence relations, compatibility relations, or partial orderings:
(a) R1 = {⟨x, y⟩ | x < y}
(b) R2 = {⟨x, y⟩ | x ≤ y}
(c) R3 = {⟨x, y⟩ | x = y}
(d) R4 = {⟨x, y⟩ | 2x = y}
(e) R5 = {⟨x, y⟩ | x = y − 1}
(f) R6 = {⟨x, y⟩ | x < y or x > y}
(g) R7 = {⟨1, 2⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨4, 3⟩}
1.9. For the binary relations in Exercise 1.8, determine the following compositions and joins:
(a) R1 ∘ R2 and R1 * R2
(b) R1 ∘ R1⁻¹ and R1 * R2⁻¹
(c) R4 ∘ R5 and R3 * R7
(d) R6 ∘ R7⁻¹ and R7 * R6⁻¹
1.10. For each of the following functions, determine the supremum and infimum, as well as the maximum and minimum (if they exist):
(a) f(x) = 1 − x, where x ∈ [0, 1)
(b) f(x) = sin x, where x ∈ [0, 2π]
(c) f(x) = x/(1 − x), where x ∈ (0, 10)
(d) f(x) = (sin x)/x, where x ∈ ℝ
(e) f(x) = max[0, 2x − x²], where x ∈ [0, 1]
1.11. For each of the following binary relations on ℝ², determine its 1-dimensional projections, its cylindric extensions, and the cylindric closure of these projections:
(a) R = {⟨x, y⟩ | x² + y² ≤ 1}
(b) R = {⟨x, y⟩ | 0 ≤ y ≤ 2x − x²}
(c) R = {⟨x, y⟩ | |x| + |y| ≤ 1}
(d) R = {⟨x, y⟩ | x² + 2y² ≤ 1}
2
CLASSICAL POSSIBILITY-BASED UNCERTAINTY THEORY
When you have eliminated the impossible, whatever remains must have been the
case, however improbable it may seem to be.
—Sherlock Holmes
2.1. POSSIBILITY AND NECESSITY FUNCTIONS
One of the two classical theories of uncertainty, which is the subject of this
chapter, is based on the notion of possibility and the associated notion of necessity. The other classical theory, which is the subject of Chapter 3, is based on
the notion of probability. Of the two classical theories, the one based on possibility is simpler, more fundamental, and older. To describe this rather simple
theory, let X denote a finite set of mutually exclusive alternatives that are of
concern to us (diagnoses, predictions, etc.). This means that in any given situation only one of the alternatives is true. To identify the true alternative, we
need to obtain relevant information (e.g., by conducting relevant diagnostic
tests). The most elementary and, at the same time, the most fundamental kind
of information is a demonstration (based, for example, on outcomes of the
conducted diagnostic tests) that some of the alternatives in X are not possible. After excluding these alternatives from X, we obtain a subset E of X. This
subset contains only alternatives that, according to the obtained information,
are possible. We can say that alternatives in E are supported by the evidence.
Let the characteristic function of the set of all possible alternatives, E, be
called in this context a basic possibility function and be denoted by rE. Then,
    r_E(x) = 1  when x ∈ E,
             0  when x ∉ E.                          (2.1)

Using common sense, a possibility function, Pos_E, defined on the power set, P(X), is then given by the formula

    Pos_E(A) = max_{x∈A} r_E(x)                      (2.2)

for all A ∈ P(X). It is indeed correct to say that it is possible that the true alternative is in A when A contains at least one possible alternative (an alternative that is also contained in E). It follows immediately that

    Pos_E(A ∪ B) = max{Pos_E(A), Pos_E(B)}           (2.3)

for any pair of sets A, B ∈ P(X).
Given the possibility function Pos_E on the power set of X, it is useful to define another function, Nec_E, to describe for each A ∈ P(X) the necessity that the true alternative is in A. Clearly, the true alternative is necessarily in A if and only if it is not possible that it is in Ā, the complement of A. Hence,

    Nec_E(A) = 1 − Pos_E(Ā)                          (2.4)

for all A ∈ P(X).
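A minimal Python sketch of Eqs. (2.1)–(2.4) follows; the universal set X and the evidence set E are illustrative choices made only for this example.

    # Possibility and necessity functions induced by an evidence set E ⊆ X.
    X = {'a', 'b', 'c', 'd'}
    E = {'a', 'b'}                     # the alternatives supported by the evidence (illustrative)

    def r(x):                          # basic possibility function r_E, Eq. (2.1)
        return 1 if x in E else 0

    def Pos(A):                        # possibility function, Eq. (2.2)
        return max((r(x) for x in A), default=0)

    def Nec(A):                        # necessity function, Eq. (2.4)
        return 1 - Pos(X - A)

    A = {'a', 'c'}
    print(Pos(A), Nec(A))              # 1 0: the true alternative may be in A but need not be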
2.2. HARTLEY MEASURE OF UNCERTAINTY FOR FINITE SETS
The question of how to measure the amount of uncertainty associated with a finite set E of possible alternatives was addressed by Hartley [1928]. He showed that the only meaningful way to measure this amount is to use a functional of the form

    c log_b Σ_{x∈X} r_E(x)

or, alternatively,

    c log_b |E|,

where |E| denotes the cardinality of set E; b and c are positive constants, and it is required that b ≠ 1. Each choice of values b and c determines the unit in which the uncertainty is measured. Requiring, for example, that

    c log_b 2 = 1,

which is the most common choice, uncertainty would be measured in bits. One bit of uncertainty is equivalent to uncertainty regarding the truth or falsity of
one elementary proposition. Conveniently choosing b = 2 and c = 1 to satisfy the preceding equation, we obtain a unique functional, H, defined for any basic possibility function, r_E, by the formula

    H(r_E) = log₂|E|.

This functional is usually called a Hartley measure of uncertainty. It is easy to see that uncertainty measured by this functional results from the lack of specificity. Clearly, the larger the set of possible alternatives, the less specific are predictions, diagnoses, and the like. Full specificity is obtained when only one of the considered alternatives is possible. This type of uncertainty is thus well characterized by the term nonspecificity.
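In computational terms, the Hartley measure is simply the base-2 logarithm of the number of possible alternatives; a minimal Python sketch:

    from math import log2

    def hartley(E):
        """Hartley measure H(E) = log2 |E| in bits, for a nonempty finite set E."""
        return log2(len(E))

    print(hartley({'a', 'b'}))              # 1.0 bit: one binary distinction
    print(hartley({'a', 'b', 'c', 'd'}))    # 2.0 bits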
2.2.1. Simple Derivation of the Hartley Measure
This simple derivation is due to Hartley [1928]. Assume, as before, that from
among a finite set X of considered alternatives, the only possible alternatives
are those in a nonempty subset E of X. That is, there is evidence that alternatives that are not in set E are not possible.
Given now a particular set E of possible alternatives, sequences of its elements can be formed by successive selections. If s selections are made, then there are |E|^s different potential sequences. The amount of uncertainty in identifying one of the sequences, H(|E|^s), which is equivalent to the amount of information needed to remove this uncertainty, should be proportional to s. That is,

    H(|E|^s) = K(|E|) · s,

where K(|E|) is some function of |E|.
Consider now two nonempty subsets of X, E₁ and E₂, such that |E₁| ≠ |E₂|. Assume that after making s₁ selections from E₁ and s₂ selections from E₂, the number of sequences in both cases is the same. Then, the amounts of information needed to remove uncertainty associated with the two sets of sequences should be the same as well. That is, if

    |E₁|^{s₁} = |E₂|^{s₂},                           (2.5)

then

    K(|E₁|) · s₁ = K(|E₂|) · s₂.                     (2.6)

From Eqs. (2.5) and (2.6), we obtain

    s₂/s₁ = log_b|E₁| / log_b|E₂|

and

    s₂/s₁ = K(|E₁|) / K(|E₂|),

respectively. Hence,

    K(|E₁|) / K(|E₂|) = log_b|E₁| / log_b|E₂|.

This equation can be satisfied only by a function K of the form

    K(|E|) = c · log_b|E|,
where b and c are positive constants and b ≠ 1. Each choice of values of the constants b and c determines the unit in which uncertainty is measured. When b = 2 and c = 1, which is the most common choice, uncertainty is measured in bits, and we obtain

    H(E) = log₂|E|.                                  (2.7)

This can also be expressed in terms of the basic possibility function r_E as

    H(r_E) = log₂ Σ_{x∈X} r_E(x).                    (2.8)
2.2.2. Uniqueness of the Hartley Measure
Hartley’s derivation of the measure of possibilistic uncertainty (or nonspecificity) H, which is expressed by either Eq. (2.7) or Eq. (2.8), is certainly convincing. However, it does not explicitly prove that H is the only meaningful
functional to measure possibilistic uncertainty in bits. To be meaningful, the functional must satisfy some essential axiomatic requirements. The uniqueness of this functional H was proven on axiomatic grounds by Rényi [1970b].
Since, according to our intuition, the possibilistic uncertainty depends only on the number of possible alternatives (the number of elements in the set E ⊆ X), Rényi conceptualized the measure of possibilistic uncertainty, H, as a functional of the form

    H: ℕ → ℝ⁺.
Using this form, he characterized the functional by the following axioms:
Axiom (H1) Branching. H(n · m) = H(n) + H(m).
Axiom (H2) Monotonicity. H(n) ≤ H(n + 1).
Axiom (H3) Normalization. H(2) = 1.
Axiom (H1) involves a set with m · n elements, which can be partitioned
into n subsets each with m elements. A characterization of an element from
the full set requires the amount H(m · n) of information. However, we can
also proceed in two steps to characterize the element by taking advantage of
the partition of the set. First, we characterize the subset to which the element
belongs: the required information is H(n). Then, we characterize the element
within the subset: the required information is H(m). These two amounts of
information completely characterize an element of the full set and, hence, their
sum should equal H(m · n). This is exactly what the axiom requires.
Axiom (H2) expresses an essential and rather obvious requirement:
When the number of possible alternatives increases, the amount of information needed to characterize one of them cannot decrease. Axiom (H3) is
needed to define the measurement unit. As it is stated by Rényi, the defined
measurement unit is the bit.
Using the three axioms, Rényi established that H defined by Eq. (2.7) is the
only functional that satisfies these axioms. This is the subject of the following
uniqueness theorem.
Theorem 2.1. The functional H(n) = log₂n is the only functional that satisfies Axioms (H1)–(H3).
Proof. Let n be an integer greater than 2. For each integer i, define the integer q(i) such that

    2^{q(i)} ≤ n^i < 2^{q(i)+1}.                     (2.9)

These inequalities can be written as

    q(i) log₂2 ≤ i log₂n < (q(i) + 1) log₂2.

When we divide these inequalities by i and replace log₂2 with 1, we get

    q(i)/i ≤ log₂n < (q(i) + 1)/i.

Consequently,

    lim_{i→∞} q(i)/i = log₂n.                        (2.10)
Let H denote a functional that satisfies Axioms (H1)–(H3). Then, by Axiom
(H2),
    H(a) ≤ H(b)                                      (2.11)

for a < b. Combining Eq. (2.11) and Eq. (2.9), we obtain

    H(2^{q(i)}) ≤ H(n^i) ≤ H(2^{q(i)+1}).            (2.12)

By Axiom (H1), we obtain

    H(a^k) = k · H(a).

If we apply this to all three terms of Eq. (2.12), we get

    q(i) · H(2) ≤ i · H(n) ≤ (q(i) + 1) · H(2).

By Axiom (H3), H(2) = 1, so these inequalities become

    q(i) ≤ i · H(n) ≤ q(i) + 1.

Dividing through by i yields

    q(i)/i ≤ H(n) ≤ (q(i) + 1)/i,

and consequently,

    lim_{i→∞} q(i)/i = H(n).                         (2.13)

Comparing Eq. (2.10) with Eq. (2.13), we conclude that H(n) = log₂n for n > 2. Since log₂2 = 1 and log₂1 = 0, the functional H trivially satisfies all the axioms for n = 1, 2 as well. ∎
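The limit in Eq. (2.10) can also be illustrated numerically; the following short Python sketch is only an illustration and is not part of Rényi's proof.

    from math import log2

    # q(i) is the largest integer with 2^q(i) <= n^i; q(i)/i approaches log2(n).
    n = 5
    for i in (1, 10, 100, 1000):
        q = (n ** i).bit_length() - 1    # exact floor of log2(n^i)
        print(i, q / i)
    print(log2(n))                       # 2.3219...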
2.2.3. Basic Properties of the Hartley Measure
First, it is easy to see that the Hartley measure defined by Eq. (2.7) satisfies the inequalities

    0 ≤ H(E) ≤ log₂|X|

for any E ∈ P(X). The lower bound is obtained when only one of the considered alternatives is possible, as exemplified by deterministic systems. The upper bound is obtained when all considered alternatives are possible. This expresses the state of total ignorance.
In some applications, it is preferable to use a normalized Hartley measure, NH, which is defined by the formula

    NH(E) = H(E) / log₂|X|.                          (2.14)

The range of NH is clearly the unit interval [0, 1], independent of X and E ⊆ X. Moreover, NH is invariant with respect to the choice of measurement units.
Assume now that a given set of possible alternatives, E, is reduced by the outcome of an action to a smaller set E′ ⊂ E. Then the amount of information obtained by the action, I_H(E, E′), is measured by the difference H(E) − H(E′). That is,

    I_H(E, E′) = log₂(|E| / |E′|).                   (2.15)

When the action eliminates all alternatives in E except one (i.e., when |E′| = 1), we obtain I_H(E, E′) = log₂|E| = H(E). This means that H(E) may also be viewed as the amount of information needed to characterize one element of set E.
Consider now two universal sets, X and Y, and assume that a relation R ⊆ X × Y describes a set of possible alternatives in some situation of interest. Consider further the sets

    R_X = {x ∈ X | ⟨x, y⟩ ∈ R for some y ∈ Y},
    R_Y = {y ∈ Y | ⟨x, y⟩ ∈ R for some x ∈ X},

which are usually referred to as projections of R on sets X, Y, respectively. Then three distinct Hartley measures are applicable, H(R_X), H(R_Y), and H(R), which are defined on the power sets of X, Y, and X × Y, respectively. The first two,

    H(R_X) = log₂|R_X|,                              (2.16)
    H(R_Y) = log₂|R_Y|,                              (2.17)

are called simple or marginal Hartley measures. The third one,

    H(R) = log₂|R|,                                  (2.18)

is called a joint Hartley measure.
Two additional Hartley measures are defined,

    H(R_X | R_Y) = log₂(|R| / |R_Y|),                (2.19)
    H(R_Y | R_X) = log₂(|R| / |R_X|),                (2.20)
which are called conditional Hartley measures. These definitions can be generalized by restricting the set of possible conditions to R′_Y ⊆ R_Y and R′_X ⊆ R_X, respectively. The generalized definitions are:

    H(R_X | R′_Y) = log₂(|R| / |R′_Y|),              (2.21)
    H(R_Y | R′_X) = log₂(|R| / |R′_X|).              (2.22)

Observe that the ratio |R|/|R_Y| in Eq. (2.19) represents the average number of elements of R_X that are possible alternatives under the condition that an element of R_Y has already been selected. This means that H(R_X | R_Y) measures the average nonspecificity regarding possible choices from R_X for all possible choices from R_Y. Function H(R_Y | R_X), defined by Eq. (2.20), clearly has a similar meaning, with the roles of sets R_X and R_Y exchanged. The generalized forms of conditional Hartley measures defined by Eqs. (2.21) and (2.22) obviously have the same meaning under the restricted sets of possible conditions R′_Y and R′_X, respectively.
The marginal, joint, and conditional Hartley measures are related in numerous ways. To describe these various relations generically, it is useful (and a common practice) to identify only the universal sets involved and not the actual subsets of possible alternatives. That is, the generic symbols H(X), H(Y), H(X × Y), H(X | Y), and H(Y | X) are used instead of their specific counterparts H(R_X), H(R_Y), H(R), H(R_X | R_Y), and H(R_Y | R_X), respectively. As is shown later in this book, the generic descriptions of the relations have the same form in every uncertainty theory, even though the related entities are specific to each theory and change from theory to theory.
The equations

    H(X | Y) = H(X × Y) − H(Y),                      (2.23)
    H(Y | X) = H(X × Y) − H(X),                      (2.24)

which follow immediately from Eqs. (2.19) and (2.20), express in generic form the relationship between marginal, joint, and conditional Hartley measures. As is demonstrated later in this book, these important equations hold in every uncertainty theory when the Hartley measure is replaced with its counterpart in the other theory.
If possible alternatives from X do not depend on selections from Y, and vice versa, then R = X × Y and the sets R_X and R_Y are called noninteractive. Then, clearly,

    H(X | Y) = H(X),                                 (2.25)
    H(Y | X) = H(Y),                                 (2.26)
    H(X × Y) = H(X) + H(Y).                          (2.27)
In the general case, when sets R_X and R_Y are not necessarily noninteractive, these equations become the inequalities

    H(X | Y) ≤ H(X),                                 (2.28)
    H(Y | X) ≤ H(Y),                                 (2.29)
    H(X × Y) ≤ H(X) + H(Y).                          (2.30)

The following functional, which is usually referred to as information transmission, is a useful indicator of the strength of constraint between possible alternatives in sets X and Y:

    T_H(X, Y) = H(X) + H(Y) − H(X × Y).              (2.31)

When the sets are noninteractive, T_H(X, Y) = 0; otherwise, T_H(X, Y) > 0. Using Eqs. (2.23) and (2.24), T_H(X, Y) can also be expressed in terms of the conditional uncertainties:

    T_H(X, Y) = H(X) − H(X | Y),                     (2.32)
    T_H(X, Y) = H(Y) − H(Y | X).                     (2.33)

The maximum value, T̂_H(X, Y), of information transmission associated with relations R ⊆ X × Y is obtained when

    H(X | Y) = H(Y | X) = 0.

This means that

    H(X × Y) − H(Y) = 0,
    H(X × Y) − H(X) = 0,

and, hence,

    H(X × Y) = H(X) = H(Y).

This implies that |R| = |R_X| = |R_Y|. These equalities can be satisfied only for |R| = 1, 2, . . . , min{|X|, |Y|}. Clearly, the largest value of information transmission is obtained for

    |R| = |R_X| = |R_Y| = min{|X|, |Y|}.

Hence,

    T̂_H(X, Y) = min{log₂|X|, log₂|Y|}.               (2.34)
The normalized information transmission, NT_H, is then defined by the formula

    NT_H(X, Y) = T_H(X, Y) / T̂_H(X, Y).              (2.35)
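The quantities introduced in Eqs. (2.16)–(2.35) can be computed directly from a relation given as a set of pairs. The Python sketch below uses illustrative sets X, Y and an illustrative relation R, chosen only for this example.

    from math import log2

    X, Y = {1, 2, 3}, {'a', 'b', 'c'}                   # illustrative universal sets
    R = {(1, 'a'), (1, 'b'), (2, 'b'), (3, 'c')}        # illustrative relation R ⊆ X × Y

    RX = {x for (x, y) in R}                            # projection of R on X
    RY = {y for (x, y) in R}                            # projection of R on Y
    H_X, H_Y, H_XY = log2(len(RX)), log2(len(RY)), log2(len(R))

    H_X_given_Y = H_XY - H_Y                            # Eq. (2.23)
    H_Y_given_X = H_XY - H_X                            # Eq. (2.24)
    T = H_X + H_Y - H_XY                                # Eq. (2.31)
    T_max = min(log2(len(X)), log2(len(Y)))             # Eq. (2.34)
    print(round(T, 2), round(T / T_max, 2))             # Eq. (2.35): normalized transmission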
2.2.4. Examples
The meaning of uncertainty measured by the Hartley functional depends on
the meaning of the set E. For example, when E is a set of predicted states of
a variable (from the set X of all states defined for the variable), H(E) is a
measure of predictive uncertainty; when E is a set of possible diseases of a
patient determined from relevant medical evidence, H(E) is a measure of diagnostic uncertainty; when E is a set of possible answers to an unsettled historical question, H(E) is a measure of retrodictive uncertainty; when E is a set of possible policies, H(E) is a measure of prescriptive uncertainty. The purpose of
this section is to illustrate the utility of the Hartley measure on simple examples in some of these application contexts.
EXAMPLE 2.1. Consider a simple dynamic system with four states whose purpose is prediction. Let S = {s1, s2, s3, s4} denote the set of states of the system, and let R denote the state-transition relation on S² (the set of possible transitions from present states to next states) that is defined in matrix form by the basic possibility function r_R in Figure 2.1a. Entries in the matrix M_R are values r_R(s_i, s_j) for all pairs ⟨s_i, s_j⟩ ∈ S². All possible transitions from present states to next states (for which r_R(s_i, s_j) = 1) are also illustrated by the directed arcs (edges) in the diagram in Figure 2.1b. It is assumed that transitions occur only at specified discrete times. The system is clearly nondeterministic, which means
that its predictions inevitably involve some nonspecificity.

[Figure 2.1. Illustration to Examples 2.1 and 2.2: (a) the state-transition relation R ⊂ S_t × S_{t+1}, given in matrix form by the values r_R(s_i, s_j), with rows corresponding to present states and columns to next states:

            s1  s2  s3  s4
    s1       0   1   1   1
    s2       0   1   1   0     = M_R
    s3       1   1   0   0
    s4       0   1   0   0

(b) the corresponding state diagram, with a directed arc from s_i to s_j for each possible transition.]

For convenience, let
S_t denote the set of considered states of the system at some specified initial time, and let S_{t+k} for some k ∈ ℕ denote the set of considered states of the system at time t + k. Clearly, S_t = S_{t+k} = S for any k ∈ ℕ.
Since the purpose of the system is prediction, it makes sense to ask the system questions regarding possible future states or sequences of future states. For each question, the system provides us with a particular prediction that, in general, is not fully specific. The Hartley measure allows us to calculate the actual amount of nonspecificity in this prediction. The following are a few examples illustrating the use of the Hartley measure for this purpose:

(a) Assuming that any of the four states is possible at time t, what is the average nonspecificity in predicting the state at time t + 1? Applying Eq. (2.23) with X = S_{t+1} and Y = S_t, the answer is

    H(S_{t+1} | S_t) = H(S_t × S_{t+1}) − H(S_t) = log₂8 − log₂4 = 1.

(b) Assuming that only states s1 and s2 are possible at time t, what is the average nonspecificity in predicting the state at time t + 1? Applying Eq. (2.23) with X = S_{t+1} and Y = {s1, s2}, the answer is

    H(S_{t+1} | {s1, s2}) = H({s1, s2} × S_{t+1}) − H({s1, s2}) = log₂5 − log₂2 = 1.32.

(c) If any state is possible at time t, what is the average nonspecificity in predicting the sequence of states of length n? For any n ≥ 1, the answer is given by the formula

    H(S_{t+1} × S_{t+2} × . . . × S_{t+n} | S_t) = H(S_t × S_{t+1} × . . . × S_{t+n}) − H(S_t).    (2.36)

For n = 1 this formula becomes the one in Example 2.1a. To apply this formula for any n ≥ 2, we need to determine the number of possible sequences of the respective lengths. This can be done easily by using the matrix representation of the state-transition relation. For n = 2, the total number of possible sequences is obtained by adding all entries in the matrix product M_R × M_R. In our example,
    [0 1 1 1]   [0 1 1 1]   [1 3 1 0]
    [0 1 1 0] × [0 1 1 0] = [1 2 1 0]
    [1 1 0 0]   [1 1 0 0]   [0 2 2 1]
    [0 1 0 0]   [0 1 0 0]   [0 1 1 0]
       M_R         M_R       M_R × M_R
By adding all entries in the resulting matrix M_R × M_R, we obtain 16, and this
is exactly the number of possible sequences of length 2. Moreover, the sums
of entries in the individual rows of the resulting matrix are equal to the number
of possible sequences of length 2 that begin in states assigned to the respec-
tive rows. That is, there are 5, 4, 5, 2 possible sequences of length 2 that begin
in states s1, s2, s3, s4, respectively. Similarly, the sums of the entries in the individual columns of the matrix are equal to the number of possible sequences
of length 2 that terminate in states assigned to the respective columns. The
same results apply to sequences of lengths 3, 4, and so on, but we need to
perform, respectively, the matrix products

    (M_R × M_R) × M_R,  ((M_R × M_R) × M_R) × M_R,

and so on.
Determining the number of possible sequences for n ∈ ℕ_10 and calculating the average predictive nonspecificity for each n in this range by Eq. (2.36), we obtain the following sequence of predictive nonspecificities: 1, 2, 2.95, 4.11, 5.16, 6.21, 7.27, 8.32, 9.37, 10.43. As expected, the predictive nonspecificities increase with n. This means, qualitatively, that long-term predictions by a nondeterministic system are less specific than short-term predictions by the same system.
Assume now that only one state, s_i, is possible at time t and we want to calculate again the nonspecificity in predicting the sequence of states of length n. In this case,

    H(S_{t+1} × S_{t+2} × · · · × S_{t+n} | {s_i}) = H({s_i} × S_{t+1} × S_{t+2} × · · · × S_{t+n}) − H({s_i}).

As already mentioned, the number of sequences of states of length n that begin with state s_i, which we need for this calculation, is obtained by adding the entries in the respective row of the matrix resulting from the required chain of n − 1 matrix products. For s_t = s1 in our example and n ∈ ℕ_10, we obtain the following predictive nonspecificities: 1.58, 2.32, 3.46, 4.46, 5.55, 6.58, 7.65, 8.70, 9.75, 10.81. As expected from the high initial nonspecificity H(S_{t+1} | {s1}), all these values are above average. On the other hand, the following values for s_t = s4 are all below average: 0, 1, 2, 3.17, 4.17, 5.25, 6.29, 7.35, 8.40, 9.45.
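The counting procedure used throughout this example can be carried out mechanically with matrix powers. The following Python sketch assumes NumPy is available and uses the matrix M_R of Figure 2.1.

    from math import log2
    import numpy as np

    # Transition matrix M_R (rows: present states s1-s4, columns: next states).
    M = np.array([[0, 1, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]])

    def nonspecificity(n, start=None):
        """Nonspecificity of predicting a state sequence of length n: averaged over
        all four initial states when start is None, or conditioned on a single
        initial state (0-based index) otherwise."""
        P = np.linalg.matrix_power(M, n)   # P[i, j]: number of length-n sequences from s_{i+1} ending in s_{j+1}
        if start is None:
            return log2(P.sum()) - log2(M.shape[0])   # Eq. (2.36)
        return log2(P[start].sum())                   # H({s_i}) = 0, so nothing to subtract

    print([round(nonspecificity(n), 2) for n in (1, 2)])               # [1.0, 2.0]
    print([round(nonspecificity(n, start=0), 2) for n in (1, 2, 3)])   # [1.58, 2.32, 3.46]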
EXAMPLE 2.2. Consider the same system and the same types of predictions
as in Example 2.1. However, let the focus in this example be on the predictive
informativeness of the system rather than its predictive nonspecificity. That is,
the aim of this example is to calculate the amount of information contained
in each prediction of a certain type made by the system. In each case, we need
to calculate the maximum amount of predictive nonspecificity, obtained in the
face of total ignorance, and the actual amount of predictive nonspecificity associated with the prediction made by the system. The amount of information
provided by the system is then defined as the difference between the maximum
and actual amounts of predictive nonspecificity.
In general, the distinguishing feature of total ignorance within the classical
possibility theory is that all recognized alternatives are possible. In our
example, the recognized alternatives are transitions from states to states, each
of which is represented by one cell in the matrix in Figure 2.1a. Predictions of
states or sequences of states are determined via these transitions. Maximum
nonspecificity in each prediction is obtained when all the recognized transitions are possible. When, on the other hand, only one transition from each state
is possible, each prediction is fully specific and, hence, the system is
deterministic.
The following are examples that illustrate the use of the Hartley measure
for calculating informativeness of those types of predictions that are examined in Example 2.1:
(a) Let H(S_{t+1} | S_t) have the same meaning as in Example 2.1, and let Ĥ(S_{t+1} | S_t) be the average nonspecificity in predicting the state at time t + 1 in the face of total ignorance. Then, the average amount of information, I_H(S_{t+1} | S_t), contained in the prediction made by the system (or the informativeness of the system with respect to predicting the next state) is given by the formula

    I_H(S_{t+1} | S_t) = Ĥ(S_{t+1} | S_t) − H(S_{t+1} | S_t).

Since there are 16 possible transitions in the face of total ignorance,

    Ĥ(S_{t+1} | S_t) = log₂16 − log₂4 = 2.

From Example 2.1a, H(S_{t+1} | S_t) = 1. Hence, I_H(S_{t+1} | S_t) = 2 − 1 = 1.
(b) In this case,

    Ĥ(S_{t+1} | {s1, s2}) = Ĥ({s1, s2} × S_{t+1}) − H({s1, s2}) = log₂8 − log₂2 = 2.

Then, using the result in Example 2.1b, we have

    I_H(S_{t+1} | {s1, s2}) = Ĥ(S_{t+1} | {s1, s2}) − H(S_{t+1} | {s1, s2}) = 2 − 1.32 = 0.68.

(c) If any state is possible at time t, the number of possible sequences of states of length n in the face of total ignorance is clearly equal to 4^{n+1}. Hence,

    Ĥ(S_{t+1} × S_{t+2} × . . . × S_{t+n} | S_t) = Ĥ(S_t × S_{t+1} × . . . × S_{t+n}) − H(S_t)
                                                 = log₂4^{n+1} − log₂4 = 2n,

and consequently,

    I_H(S_{t+1} × S_{t+2} × . . . × S_{t+n} | S_t) = 2n − H(S_{t+1} × S_{t+2} × . . . × S_{t+n} | S_t).
Using the values of H(S_{t+1} × S_{t+2} × · · · × S_{t+n} | S_t) calculated for n ∈ ℕ_10 in Example 2.1c, we readily obtain the corresponding values of I_H(S_{t+1} × S_{t+2} × · · · × S_{t+n} | S_t): 1, 2, 3.05, 3.89, 4.84, 5.79, 6.73, 7.68, 8.63, 9.57.
When only one state, s_i, is possible at time t, the number of possible sequences of states of length n in the face of total ignorance is equal to 4ⁿ, which means that

    Ĥ(S_{t+1} × S_{t+2} × . . . × S_{t+n} | {s_i}) = Ĥ({s_i} × S_{t+1} × S_{t+2} × . . . × S_{t+n}) − H({s_i})
                                                   = Ĥ({s_i} × S_{t+1} × . . . × S_{t+n})
                                                   = log₂4ⁿ
                                                   = 2n.

Hence,

    I_H(S_{t+1} × S_{t+2} × . . . × S_{t+n} | {s_i}) = 2n − H(S_{t+1} × S_{t+2} × . . . × S_{t+n} | {s_i}).

Using the values of H(S_{t+1} × S_{t+2} × · · · × S_{t+n} | {s_i}) calculated for n ∈ ℕ_10 in Example 2.1c, we obtain the corresponding values of I_H(S_{t+1} × S_{t+2} × · · · × S_{t+n} | {s1}): 0.42, 1.68, 2.54, 3.54, 4.45, 5.42, 6.35, 7.30, 8.25, 9.19.
EXAMPLE 2.3. Consider a system with four variables, x1, x2, x3, x4, which take their values from the set {0, 1}. These variables are constrained via a particular 4-dimensional relation R ⊆ {0, 1}⁴, but this relation is not known. We only know how the following pairs of the four variables are related: ⟨x1, x2⟩, ⟨x1, x4⟩, ⟨x2, x3⟩, ⟨x3, x4⟩. Let R12, R14, R23, R34 denote, respectively, these partial relations on {0, 1}², and let P = {R12, R14, R23, R34}. The partial relations are defined in Figure 2.2a. All of the introduced relations can also be represented by their basic possibility functions. Let r, r12, r14, r23, r34 denote these functions.
If relation R were known, the four partial relations (or any of the other partial relations) would be uniquely determined as specific projections of R via the max operation of possibility theory. For example, using the labels introduced for all overall states (elements of the Cartesian product {0, 1}⁴) in Figure 2.2c, we have

    r12(0, 0) = max{r(s0), r(s1), r(s2), r(s3)},
    r12(0, 1) = max{r(s4), r(s5), r(s6), r(s7)},
    r12(1, 0) = max{r(s8), r(s9), r(s10), r(s11)},
    r12(1, 1) = max{r(s12), r(s13), r(s14), r(s15)}.

In our case, R is not known and we want to determine it on the basis of information in the partial relations (projections of R). This inverse problem, illustrated in Figure 2.2b, is usually referred to as system identification. In general,
[Figure 2.2. System identification (Example 2.3): (a) the given partial relations

    R12 = {⟨0,0⟩, ⟨0,1⟩, ⟨1,1⟩},  R14 = {⟨0,0⟩, ⟨0,1⟩, ⟨1,1⟩},
    R23 = {⟨0,0⟩, ⟨1,0⟩, ⟨1,1⟩},  R34 = {⟨0,0⟩, ⟨0,1⟩, ⟨1,0⟩};

(b) the identification problem: determine the unknown relation R from its projections R12, R14, R23, R34; (c) the potential states of R (elements of the Cartesian product {0, 1}⁴), labeled s0, s1, . . . , s15 in the order of increasing binary numbers ⟨x1, x2, x3, x4⟩; (d) the possible states of R (states consistent with R12, R14, R23, R34):

    s0 = ⟨0,0,0,0⟩, s1 = ⟨0,0,0,1⟩, s4 = ⟨0,1,0,0⟩, s5 = ⟨0,1,0,1⟩, s6 = ⟨0,1,1,0⟩, s13 = ⟨1,1,0,1⟩.]
R cannot be determined uniquely from its projections. We can only determine a family, R_P, of all relations that are consistent with the given projections in set P. Clearly, R_P ⊆ P({0, 1}⁴). It is convenient to determine R_P in two steps. First, we determine the set of all overall states (elements of {0, 1}⁴ in our case)
that are possible under the given information. These are the states that are consistent with the given projections. In our case, a particular overall state ⟨ẋ1, ẋ2, ẋ3, ẋ4⟩ is possible if and only if

    ⟨ẋ1, ẋ2⟩ ∈ R12 and ⟨ẋ1, ẋ4⟩ ∈ R14 and ⟨ẋ2, ẋ3⟩ ∈ R23 and ⟨ẋ3, ẋ4⟩ ∈ R34.

The possibility of each overall state ⟨ẋ1, ẋ2, ẋ3, ẋ4⟩ ∈ {0, 1}⁴ thus can be determined by the equation

    r(⟨ẋ1, ẋ2, ẋ3, ẋ4⟩) = min{r12(ẋ1, ẋ2), r14(ẋ1, ẋ4), r23(ẋ2, ẋ3), r34(ẋ3, ẋ4)}.

The resulting set of all possible overall states, which is usually called a cylindric closure of the given projections, is shown in Figure 2.2d. The term “cylindric closure” emerged from a classical method for determining the set of all possible overall states from given projections. In this method (less efficient than the one described here), the cylindric extension is constructed for each projection with respect to the remaining dimensions, and the intersection of all these cylindric extensions is the cylindric closure. The unknown relation R is guaranteed to be a subset of the cylindric closure.
Once the set of all possible overall states (the cylindric closure) is determined, the next step is to determine all its subsets that are complete in the sense that they cover all possible states of the given projections. In our example, there are eight such subsets, one of which is the cylindric closure itself:

    {s0, s1, s4, s5, s6, s13}
    {s0, s1, s4, s6, s13}
    {s0, s1, s5, s6, s13}
    {s0, s4, s5, s6, s13}
    {s1, s4, s5, s6, s13}
    {s0, s1, s6, s13}
    {s0, s5, s6, s13}
    {s1, s4, s6, s13}

Each of these subsets of the Cartesian product {0, 1}⁴ can be the unknown relation R, but we have no basis to decide which one it is. We have therefore identified a family, R_P, of all possible overall relations. Each of these relations is both consistent and complete with respect to the given projections in P. The identification nonspecificity is given by the Hartley measure

    H(R_P) = log₂|R_P| = log₂8 = 3.
The identification nonspecificity is of course uniquely determined by the given set P of projections of R. Its maximum is obtained when P = ∅, which expresses our total ignorance about R. Clearly,

    H(R_∅) = log₂|P({0, 1}⁴)| = log₂2¹⁶ = 16.

The amount of information, I(P), about R contained in the given set P of projections of R is then calculated by the formula

    I(P) = H(R_∅) − H(R_P) = 16 − 3 = 13.

The dependence of H(R_P) and I(P) on P is examined in the next example.
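The first step, determining the cylindric closure, amounts to a simple consistency test of every potential overall state against the given projections. A Python sketch using the projections of Figure 2.2:

    from itertools import product

    R12 = {(0, 0), (0, 1), (1, 1)}
    R14 = {(0, 0), (0, 1), (1, 1)}
    R23 = {(0, 0), (1, 0), (1, 1)}
    R34 = {(0, 0), (0, 1), (1, 0)}

    def cylindric_closure():
        """Overall states of {0,1}^4 that are consistent with all four projections."""
        closure = []
        for x1, x2, x3, x4 in product((0, 1), repeat=4):
            if (x1, x2) in R12 and (x1, x4) in R14 and (x2, x3) in R23 and (x3, x4) in R34:
                closure.append((x1, x2, x3, x4))
        return closure

    print(cylindric_closure())
    # [(0,0,0,0), (0,0,0,1), (0,1,0,0), (0,1,0,1), (0,1,1,0), (1,1,0,1)], i.e., s0, s1, s4, s5, s6, s13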
EXAMPLE 2.4. Consider a system with three variables, x1, x2, x3. The variables, whose values are in the set {0, 1}, are constrained by a ternary relation R ⊆ {0, 1}³ that is not known. Labels of all potential states of the system, that is, elements of {0, 1}³, are introduced in Table 2.1a. Assume that the three binary relations specified in Table 2.1b are projections of R. Assume further that we know either all of these projections or only two of them (see Table 2.1c, the first column). Our aim is to identify the unknown relation R from information in each of these four sets of projections, P1, P2, P3, P4. For each Pi (i ∈ ℕ_4), we determine first the cylindric closure and all its complete subsets, R_Pi, in the same way as in Example 2.3 (the second column in Table 2.1c). Then, we can calculate for each Pi the identification nonspecificity, H(R_Pi), and the information content, I(Pi), of the projections in Pi (columns 3 and 4 in Table 2.1c). This example is quite illustrative. It shows that the choice of projections is important. When we know all the projections (P1), the identification is fully specific and the information content is 8 bits (the maximum identification nonspecificity is log₂2⁸ = 8, and the information contained in the three projections reduces it to 0). When we know only projections R13 and R23 (P4), the identification is still fully specific. Therefore I(P1) = I(P4), which means that adding projection R12 to P4 does not increase the information content. Each of the remaining pairs of projections, P2 and P3, identifies seven possible overall relations, so their identification nonspecificity is log₂7 = 2.81 and their information content is 8 − 2.81 = 5.19.
EXAMPLE 2.5. The purpose of this example is to illustrate how the various
properties of the Hartley measure, expressed by Eqs. (2.16)–(2.35), can be utilized for analyzing n-dimensional relations (n ≥ 2). A simple system with four
variables, x1, x2, x3, x4, is employed here as an example. The variables take their
values in sets X1, X2, X3, X4, respectively, where X1 = X2 = {0, 1} and X3 = X4 =
{0, 1, 2}. All possible overall states of the system are listed in Table 2.2a. This
Table 2.1. System Identification (Example 2.4)

(a) Labels of the potential states (elements of {0, 1}³), in the order of increasing binary numbers ⟨x1, x2, x3⟩:

    s0 = ⟨0,0,0⟩, s1 = ⟨0,0,1⟩, s2 = ⟨0,1,0⟩, s3 = ⟨0,1,1⟩,
    s4 = ⟨1,0,0⟩, s5 = ⟨1,0,1⟩, s6 = ⟨1,1,0⟩, s7 = ⟨1,1,1⟩.

(b) The given projections of R:

    R12 = {⟨0,0⟩, ⟨0,1⟩, ⟨1,0⟩},  R13 = {⟨0,0⟩, ⟨0,1⟩, ⟨1,0⟩},  R23 = {⟨0,0⟩, ⟨0,1⟩, ⟨1,1⟩}.

(c) For each set of known projections Pi: the cylindric closure (CC) and its complete subsets (the family R_Pi), the identification nonspecificity H(R_Pi), and the information content I(Pi):

    P1 = {R12, R13, R23}:  {s0, s1, s3, s4} (CC);
                           H(R_P1) = 0,  I(P1) = 8.
    P2 = {R12, R13}:       {s0, s1, s2, s3, s4} (CC), {s0, s1, s2, s4}, {s0, s1, s3, s4},
                           {s0, s2, s3, s4}, {s1, s2, s3, s4}, {s0, s3, s4}, {s1, s2, s4};
                           H(R_P2) = 2.81,  I(P2) = 5.19.
    P3 = {R12, R23}:       {s0, s1, s3, s4, s5} (CC), {s0, s1, s3, s4}, {s0, s1, s3, s5},
                           {s0, s3, s4, s5}, {s1, s3, s4, s5}, {s0, s3, s5}, {s1, s3, s5};
                           H(R_P3) = 2.81,  I(P3) = 5.19.
    P4 = {R13, R23}:       {s0, s1, s3, s4} (CC);
                           H(R_P4) = 0,  I(P4) = 8.
set of overall states is a 4-dimensional relation R on the Cartesian product {0, 1}² × {0, 1, 2}². An important way of analyzing such a relation is to search for strong dependencies between various subsets of variables. The capability of measuring conditional uncertainty and information transmission for any two disjoint subsets of variables is essential for conducting such a search in a meaningful way.
Table 2.2. Information Analysis of a 4-Dimensional Relation (Example 2.5)

(a) Relation R (21 states ⟨x1, x2, x3, x4⟩), given by its four value columns:

x1:  0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
x2:  0 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 1
x3:  1 1 0 1 2 2 2 0 0 0 1 1 0 1 1 0 1 2 2 2 2
x4:  0 1 2 1 0 1 2 0 1 1 2 2 2 0 1 2 0 1 0 1 2
(b) Conditional uncertainties and information transmissions, with (c) their normalized counterparts:

H(X1 | X2 × X3 × X4)     = 4.39 − 3.91 = 0.48          normalized: 0.48/log₂ 2 = 0.48
H(X2 × X3 × X4 | X1)     = 4.39 − 1 = 3.39             normalized: 3.39/log₂ 18 = 0.81
T_H(X1, X2 × X3 × X4)    = 1 + 3.91 − 4.39 = 0.52      normalized: 0.52/log₂ 2 = 0.52
H(X2 | X1 × X3 × X4)     = 4.39 − 3.91 = 0.48          normalized: 0.48/log₂ 2 = 0.48
H(X1 × X3 × X4 | X2)     = 4.39 − 1 = 3.39             normalized: 3.39/log₂ 18 = 0.81
T_H(X2, X1 × X3 × X4)    = 1 + 3.91 − 4.39 = 0.52      normalized: 0.52/log₂ 2 = 0.52
H(X3 | X1 × X2 × X4)     = 4.39 − 3.32 = 1.07          normalized: 1.07/log₂ 3 = 0.68
H(X1 × X2 × X4 | X3)     = 4.39 − 1.58 = 2.81          normalized: 2.81/log₂ 12 = 0.78
T_H(X3, X1 × X2 × X4)    = 1.58 + 3.32 − 4.39 = 0.51   normalized: 0.51/log₂ 3 = 0.32
H(X4 | X1 × X2 × X3)     = 4.39 − 3.46 = 0.93          normalized: 0.93/log₂ 3 = 0.59
H(X1 × X2 × X3 | X4)     = 4.39 − 1.58 = 2.81          normalized: 2.81/log₂ 12 = 0.78
T_H(X4, X1 × X2 × X3)    = 1.58 + 3.46 − 4.39 = 0.65   normalized: 0.65/log₂ 3 = 0.41
H(X1 × X2 | X3 × X4)     = 4.39 − 3.17 = 1.22          normalized: 1.22/log₂ 4 = 0.61
H(X3 × X4 | X1 × X2)     = 4.39 − 2 = 2.39             normalized: 2.39/log₂ 9 = 0.75
T_H(X1 × X2, X3 × X4)    = 3.17 + 2 − 4.39 = 0.78      normalized: 0.78/log₂ 6 = 0.39
H(X1 × X3 | X2 × X4)     = 4.39 − 2.85 = 1.54          normalized: 1.54/log₂ 6 = 0.54
H(X2 × X4 | X1 × X3)     = 4.39 − 2.85 = 1.54          normalized: 1.54/log₂ 6 = 0.54
T_H(X1 × X3, X2 × X4)    = 2.85 + 2.85 − 4.39 = 1.31   normalized: 1.31/log₂ 6 = 0.51
H(X1 × X4 | X2 × X3)     = 4.39 − 2.85 = 1.54          normalized: 1.54/log₂ 6 = 0.54
H(X2 × X3 | X1 × X4)     = 4.39 − 2.85 = 1.54          normalized: 1.54/log₂ 6 = 0.54
T_H(X1 × X4, X2 × X3)    = 2.85 + 2.85 − 4.39 = 1.31   normalized: 1.31/log₂ 6 = 0.51
Suppose we want to calculate conditional uncertainties and information
transmissions for all partitions of {x1, x2, x3, x4} with two blocks, as listed in
Table 2.2b. Due to Eqs. (2.23), (2.24), and (2.31), all these calculations are
based on the following values of the Hartley measure, which are obtained
directly from the given relation R:
H(X1) = H(X2) = log₂ 2 = 1,
H(X3) = H(X4) = log₂ 3 = 1.58,
H(X1 × X2) = log₂ 4 = 2,
H(X3 × X4) = log₂ 9 = 3.17,
H(X1 × X3) = H(X1 × X4) = H(X2 × X3) = H(X2 × X4) = log₂ 6 = 2.85,
H(X1 × X2 × X3) = log₂ 11 = 3.46,
H(X1 × X2 × X4) = log₂ 10 = 3.32,
H(X1 × X3 × X4) = H(X2 × X3 × X4) = log₂ 15 = 3.91,
H(X1 × X2 × X3 × X4) = log₂ 21 = 4.39.
These values are shown in the calculations in Table 2.2b. Also shown in Table
2.2c are calculations of their normalized counterparts.
The capability of calculating conditional uncertainties and information
transmissions between groups of variables, illustrated here by a simple
example, is a particularly important tool for analyzing high-dimensional relations. Normally, we need to do these calculations only for some groups of variables that are of interest in each individual application.
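For relations of this kind, the entries of Table 2.2 can be generated directly from the relation itself. The sketch below is not from the book; the helper names are my own, and the small relation R is invented purely to illustrate the calls. It computes the Hartley measure of any projection and, from it, conditional uncertainties and information transmissions between two disjoint groups of variables (the transmission formula is the Hartley counterpart referred to as Eq. (2.31) later in the text).

```python
from math import log2

def hartley(rel, dims):
    # Hartley measure (in bits) of the projection of a relation onto the
    # coordinate positions listed in dims.
    return log2(len({tuple(state[d] for d in dims) for state in rel}))

def conditional(rel, dims_a, dims_b):
    # Conditional Hartley uncertainty: H(A | B) = H(A x B) - H(B).
    return hartley(rel, dims_a + dims_b) - hartley(rel, dims_b)

def transmission(rel, dims_a, dims_b):
    # Information transmission: T_H(A, B) = H(A) + H(B) - H(A x B).
    return (hartley(rel, dims_a) + hartley(rel, dims_b)
            - hartley(rel, dims_a + dims_b))

# A small hypothetical relation on {0,1} x {0,1} x {0,1,2}; a relation such as
# the one in Table 2.2a would be processed in exactly the same way.
R = {(0, 0, 0), (0, 1, 2), (1, 0, 1), (1, 1, 2), (1, 1, 0)}

print(hartley(R, (0, 1, 2)))          # H(X1 x X2 x X3)
print(conditional(R, (0,), (1, 2)))   # H(X1 | X2 x X3)
print(transmission(R, (0,), (1, 2)))  # T_H(X1, X2 x X3)
```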
2.3. HARTLEY-LIKE MEASURE OF UNCERTAINTY FOR INFINITE SETS
The Hartley measure is applicable only to finite sets. Its counterpart for bounded and convex subsets of the n-dimensional Euclidean space ℝⁿ (n ≥ 1, finite), which is called a Hartley-like measure, emerged only in the mid-1990s (see Note 2.3).
2.3.1. Definition
Let X denote a universal set of concern that is assumed to be a bounded and convex subset of ℝⁿ for some n ≥ 1, and let HL denote the Hartley-like measure defined on convex subsets of X. Then, HL is a functional of the form

HL: C → ℝ₊,
where C denotes the family of all convex subsets of X. Functionals in the class defined for all convex subsets A of X by the formula

HL(A) = min_{t∈T} { c log_b [ ∏_{i=1}^{n} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{n} μ(A_it) ] }        (2.37)
were found to satisfy all axiomatic requirements, stated in Section 2.3.2, that are essential for measuring uncertainty in this case. Symbols in Eq. (2.37) have the following meaning:

• μ denotes the Lebesgue measure;
• T denotes the set of all isometric transformations from one orthogonal coordinate system to another;
• A_it denotes the ith projection of A in coordinate system t;
• b and c denote positive constants (b ≠ 1), whose choice defines a measurement unit.
Equation (2.37) allows us to define a measurement unit for the Hartley-like
measure by choosing any positive values b and c except b = 1. Let the
measurement unit be defined by the requirement that HL(A) = 1 when A
is a closed interval of real numbers of length 1 in some assumed unit of
length, which must be specified in each particular application. That is, we
require that
c log_b 2 = 1
for the specified unit of length. It is convenient to choose b = 2 and c = 1 to
satisfy this equation. Then,
HL(A) = min_{t∈T} { log₂ [ ∏_{i=1}^{n} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{n} μ(A_it) ] }.        (2.38)
The chosen measurement unit is intuitively appealing, since HL(A) = 1 when
A is a unit interval, HL(A) = 2 when A is a unit square, HL(A) = 3 when A is
a unit cube, and so on.
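As a quick numerical illustration of Eq. (2.38), the following sketch (not from the book; the function name hl_rectangle is my own) approximates HL for a rectangle [0, a1] × [0, a2] in ℝ² by sampling the rotation angle. By the additivity established in Section 2.3.2, the result should agree with log₂[(1 + a1)(1 + a2)].

```python
import math

def hl_rectangle(a1, a2, steps=3600):
    """Numerical evaluation of Eq. (2.38) for the rectangle [0,a1] x [0,a2] in R^2."""
    area = a1 * a2                        # Lebesgue measure of A (rotation invariant)
    best = float("inf")
    for k in range(steps + 1):
        theta = (math.pi / 2) * k / steps
        # lengths of the two 1-D projections of the rotated rectangle
        p1 = a1 * abs(math.cos(theta)) + a2 * abs(math.sin(theta))
        p2 = a1 * abs(math.sin(theta)) + a2 * abs(math.cos(theta))
        value = math.log2((1 + p1) * (1 + p2) + area - p1 * p2)
        best = min(best, value)
    return best

print(hl_rectangle(1, 1))                      # ~2.0 for a unit square
print(hl_rectangle(8, 8))                      # ~6.34, as in Example 2.6
print(math.log2((1 + 8) * (1 + 8)))            # exact value log2(81)
```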
2.3.2. Required Properties

Let X = X1 × X2 × ··· × Xn denote a universal set of concern that is assumed to be a bounded and convex subset of ℝⁿ for some finite n ≥ 1. Then the Hartley-like measure defined on the family of all convex subsets of X, C, by Eq. (2.38) is expected to satisfy the following axiomatic requirements:
Axiom (HL1) Range. For each A ∈ C, HL(A) ∈ [0, ∞), where HL(A) = 0 if and only if A = {x} for some x ∈ X.

Axiom (HL2) Monotonicity. For all A, B ∈ C, if A ⊆ B, then HL(A) ≤ HL(B).

Axiom (HL3) Subadditivity. For each A ∈ C, HL(A) ≤ ∑_{i=1}^{n} HL(A_i), where A_i denotes the 1-dimensional projection of A to dimension i in some coordinate system.

Axiom (HL4) Additivity. For all A ∈ C such that A = A_1 × A_2 × ··· × A_n, where A_i has the same meaning as in Axiom (HL3),

HL(A) = ∑_{i=1}^{n} HL(A_i).

Axiom (HL5) Coordinate Invariance. Functional HL does not change under isometric transformations of the coordinate system.

Axiom (HL6) Continuity. HL is a continuous functional.
It is evident that HL defined by Eq. (2.38) is continuous, invariant with respect to isometric transformations of the coordinate system, and that it satisfies the required range. The monotonicity of the functional follows from the corresponding monotonicity of the Lebesgue measure, and its subadditivity is demonstrated as follows: for any A ∈ C,

HL(A) = min_{t∈T} { log₂ [ ∏_{i=1}^{n} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{n} μ(A_it) ] }
      ≤ min_{t∈T} { log₂ ∏_{i=1}^{n} [1 + μ(A_it)] }
      = min_{t∈T} { ∑_{i=1}^{n} log₂ [1 + μ(A_it)] }
      = ∑_{i=1}^{n} HL(A_i).
It remains to show that the proposed functional is additive, in order for it to be fully justified as a general measure of nonspecificity of convex subsets of X in ℝⁿ for any finite n ≥ 1.
To prove that the proposed functional is additive, we must prove that

HL(A) = ∑_{i=1}^{n} HL(A_i)
for any A ∈ C such that A = A_1 × A_2 × ··· × A_n. It has already been shown that

HL(A) ≤ ∑_{i=1}^{n} HL(A_i)

for any A ∈ C. Hence, it remains to prove that

HL(A) ≥ ∑_{i=1}^{n} HL(A_i)

when A = A_1 × A_2 × ··· × A_n. This, in turn, amounts to proving that for any rotation of the set A,

∏_{i=1}^{n} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{n} μ(A_it) ≥ ∏_{i=1}^{n} [1 + μ(A_i)].

Since μ(A) = ∏_{i=1}^{n} μ(A_i), this inequality can be written as

∏_{i=1}^{n} [1 + μ(A_it)] − ∏_{i=1}^{n} μ(A_it) ≥ ∏_{i=1}^{n} [1 + μ(A_i)] − ∏_{i=1}^{n} μ(A_i).        (2.39)
For n = 1, this inequality is trivially satisfied. For n = 2, 3, it is proved here by directly examining the effect of a rotation of the set A = A_1 × ··· × A_n (n = 2, 3) on its projections. The proof for an arbitrary finite n is not presented here, since it is based on some special results from convexity theory (see Note 2.3).
Generally, in the n-dimensional space ℝⁿ, any rotation can be represented by the orthogonal matrix

    I = [ cos α_11   cos α_12   ...   cos α_1n ]
        [ cos α_21   cos α_22   ...   cos α_2n ]
        [    ...        ...     ...      ...   ]
        [ cos α_n1   cos α_n2   ...   cos α_nn ],

where the parameters satisfy the following properties:

1. ∑_{i=1}^{n} cos² α_ij = 1 for all j ∈ ℕ_n, and ∑_{j=1}^{n} cos² α_ij = 1 for all i ∈ ℕ_n;
2. ∑_{k=1}^{n} cos α_ik cos α_jk = 0 for all i, j ∈ ℕ_n with i ≠ j, and ∑_{k=1}^{n} cos α_ki cos α_kj = 0 for all i, j ∈ ℕ_n with i ≠ j.
For each given rotation defined by matrix I, an arbitrary point x = ⟨x_1, x_2, ..., x_n⟩ᵀ in ℝⁿ is transformed to the point x′ = ⟨x′_1, x′_2, ..., x′_n⟩ᵀ by the matrix equation x′ = Ix. That is,

x′_1 = x_1 cos α_11 + x_2 cos α_12 + ··· + x_n cos α_1n
x′_2 = x_1 cos α_21 + x_2 cos α_22 + ··· + x_n cos α_2n
  ⋮
x′_n = x_1 cos α_n1 + x_2 cos α_n2 + ··· + x_n cos α_nn.
Let us consider, without any loss of generality, that

A = [0, a_1] × [0, a_2] × ··· × [0, a_n]

for some a_i ∈ ℝ, i ∈ ℕ_n. Then, the ith projection of this set subjected to the rotation defined by the matrix I is the set

A_i = { x′_i | x′ = Ix, x ∈ A }

for any i ∈ ℕ_n. The Lebesgue measure of the projection is

μ(A_i) = max_{x,y∈A} | x′_i − y′_i |.

That is,

μ(A_i) = max_{x,y∈A} | ∑_{j=1}^{n} (x_j − y_j) cos α_ij |

for any i ∈ ℕ_n. Since this maximum must be reached by two vertices of the set A, the Lebesgue measure of the projection can be rewritten as

μ(A_i) = ∑_{k=1}^{n} a_k |cos α_ik|
for any i ∈ ℕ_n.

Using the last formula, let us examine some aspects of the proposed functional pertaining to its additivity. First, let us present the following two basic properties:
(a) ∑_{i=1}^{n} μ(A_i) ≥ ∑_{i=1}^{n} a_i. This is because

∑_{i=1}^{n} μ(A_i) = ∑_{i=1}^{n} ∑_{k=1}^{n} a_k |cos α_ik|
                  = ∑_{k=1}^{n} ∑_{i=1}^{n} a_k |cos α_ik|
                  ≥ ∑_{k=1}^{n} ∑_{i=1}^{n} a_k cos² α_ik
                  = ∑_{k=1}^{n} a_k ∑_{i=1}^{n} cos² α_ik
                  = ∑_{k=1}^{n} a_k.

(b) ∏_{i=1}^{n} μ(A_i) ≥ ∏_{i=1}^{n} a_i. This is because the Lebesgue measure of the set is less than or equal to the Lebesgue measure of the Cartesian product of its 1-dimensional projections.
The Two-Dimensional Case. Let set A be a rectangle in the standard coordinate system, as shown in Figure 2.3. Since A = [0, a_1] × [0, a_2] in this system, we have

∏_{i=1}^{2} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{2} μ(A_it) = (1 + a_1)(1 + a_2).

Now we prove that this is the minimum for all rotations. In the 2-dimensional space, any rotation can be represented by the matrix

    I = [ cos θ   −sin θ ]
        [ sin θ    cos θ ].
Figure 2.3 illustrates a rotated rectangle A and its projections. It is easy to show
that
Figure 2.3. Rotation of set A in ℝ², with its projections A_1 and A_2 and rotation angle θ.
μ(A_1) = a_1 |cos θ| + a_2 |sin θ|,
μ(A_2) = a_1 |sin θ| + a_2 |cos θ|.

Then, under the new coordinate system,

∏_{i=1}^{2} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{2} μ(A_it)
    = 1 + μ(A_1) + μ(A_2) + a_1 a_2
    = 1 + a_1 |cos θ| + a_2 |sin θ| + a_1 |sin θ| + a_2 |cos θ| + a_1 a_2
    ≥ 1 + a_1 cos² θ + a_2 sin² θ + a_1 sin² θ + a_2 cos² θ + a_1 a_2
    = (1 + a_1)(1 + a_2).
Therefore, in the 2-dimensional case the measure is additive.
The Three-Dimensional Case. To prove the additivity in the 3-dimensional space, we only need to prove that for any rotation of set A ∈ C,

μ(A_1)μ(A_2) + μ(A_1)μ(A_3) + μ(A_2)μ(A_3) ≥ a_1 a_2 + a_1 a_3 + a_2 a_3.
The Cartesian product of the projections includes the original set as a subset, and both of them are rectangular boxes. The surface area of the Cartesian product of the projections is twice the left-hand side of the preceding inequality, and this area is greater than or equal to the surface area of the set A, which is twice the right-hand side. Therefore, the inequality holds.
n-Dimensional Case. In addition to n = 2, 3, it is easy to prove the additivity for an arbitrary n under the assumption that A is a set with equal edges. To show this, let a_i = a for all i ∈ ℕ_n. Then, we have

μ(A_i) = ∑_{k=1}^{n} a_k |cos α_ik| = ∑_{k=1}^{n} a |cos α_ik| = a ∑_{k=1}^{n} |cos α_ik| ≥ a ∑_{k=1}^{n} cos² α_ik = a,

and consequently, since each μ(A_it) ≥ a and the expression ∏_{i=1}^{n} [1 + b_i] − ∏_{i=1}^{n} b_i is nondecreasing in each b_i,

∏_{i=1}^{n} [1 + μ(A_it)] + μ(A) − ∏_{i=1}^{n} μ(A_it) ≥ (1 + a)ⁿ − aⁿ + μ(A) = (1 + a)ⁿ,

because μ(A) = aⁿ.
Hence, the additivity holds. For a fully general proof of additivity of the
Hartley-like measure, see Note 2.3.
2.3.3. Examples
The purpose of this section is to illustrate by specific examples some subtle
issues involved in computing the various forms of the Hartley-like measure
(basic, conditional, normalized), as well as information based on the Hartley-like measure. For the sake of simplicity, the examples are restricted to n = 2, and it is assumed that the universal set is the Cartesian product X × Y, where X = Y = [0, 100] in some units of length, which are assumed to be the same in all examples. This means that μ(X) = μ(Y) = 100 and μ(X × Y) = 100². Due to the additivity of HL, we have

HL(X × Y) = log₂ (101² + 100² − 100²) = log₂ 101² = 13.32.
EXAMPLE 2.6. Assume that, according to given evidence, the only possible
alternatives (points in X ¥ Y) are located in a square, S, whose side is equal
to 8. Due to the additivity of HL, we can readily calculate the nonspecificity
of this evidence, the associated conditional nonspecificities, and the information transmission:
HL(S) = log₂ (9² + 8² − 8²) = log₂ 81 = 6.34,
HL(S_X) = HL(S_Y) = log₂ 9 = 3.17,
HL(S_X | S_Y) = HL(S) − HL(S_Y) = 3.17,
HL(S_Y | S_X) = HL(S) − HL(S_X) = 3.17,
T_HL(S_X, S_Y) = HL(S_X) + HL(S_Y) − HL(S) = 0.
Next, we can calculate the amount of information contained in the given
evidence:
I_HL(S) = HL(X × Y) − HL(S) = 6.98.
It may also be desirable to calculate the normalized counterparts of the
nonspecificity and information, which are independent of the chosen units:
NHL(S) = HL(S) / HL(X × Y) = 0.48,
NI_HL(S) = I_HL(S) / HL(X × Y) = 0.52.
The last result means that the given evidence contains 52% of the total amount
of information needed to identify the true alternative. Clearly, NHL(S) +
NIHL(S) = 1.
EXAMPLE 2.7. In this example, the only possible alternatives are known to be located in a circle, C, with a radius r = 4. Clearly, μ(C) = π · 4² = 50.27. The two projections of C, C_X and C_Y, are in this case invariant with respect to rotations and, clearly, μ(C_X) = μ(C_Y) = 8. Hence, the calculation of the same quantities as in Example 2.6 is straightforward:

HL(C) = log₂ (9² + 50.27 − 8²) = 6.07,
HL(C_X) = HL(C_Y) = log₂ 9 = 3.17,
HL(C_X | C_Y) = HL(C) − HL(C_Y) = 2.9,
HL(C_Y | C_X) = HL(C) − HL(C_X) = 2.9,
T_HL(C_X, C_Y) = HL(C_X) + HL(C_Y) − HL(C) = 0.27,
I_HL(C) = HL(X × Y) − HL(C) = 7.25,
NHL(C) = HL(C) / HL(X × Y) = 0.46,
NI_HL(C) = I_HL(C) / HL(X × Y) = 0.54.
EXAMPLE 2.8. According to given evidence, the only possible alternatives are located in an equilateral triangle, E, with sides of length a (in some appropriate units of length). Assume, without any loss of generality, that one vertex of the triangle is located at the origin of the coordinate system, as shown in Figure 2.4a. When the triangle is rotated around the origin, the projections of E change and, hence, the logarithmic function in Eq. (2.38) changes as well. To calculate HL(E), we need to determine the minimum of this function.

As shown in Figure 2.4a, the position of the triangle can be expressed by the angle α. Due to the symmetry of E, it is sufficient to consider values α ∈ [0°, 30°]. Within this range, the dependence of the projections, E_X and E_Y,
on α is expressed by the equations

E_X(α) = a cos α,
E_Y(α) = a cos(30° − α).

Moreover, μ(E) = a²√3/4, which is the area of the triangle. The logarithmic function in Eq. (2.38) is then a function of α, f(α), expressed by the formula

f(α) = log₂ [1 + a cos α + a cos(30° − α) + a²√3/4].

A natural way to determine extremes of f(α) is to solve the equation f′(α) = 0 for α, where f′(α) denotes the derivative of f with respect to α. That is,

f′(α) = [−a sin α + a sin(30° − α)] / {ln 2 · [1 + a cos α + a cos(30° − α) + a²√3/4]} = 0.

This equation reduces to the simple equation

−sin α + sin(30° − α) = 0,

which is independent of a. The solution is α = 15°. However, by plotting the function f(α) for some value of a (or by determining that its second derivative is negative), we can easily find that the function attains its maximum at α = 15°. Due to the periodicity of the rotation with the cycle of 30°, maxima of f(α) are also obtained at α = (15 ± 30k)°, for any nonnegative integer k. Three cycles (k = 0, 1, 2) are shown in Figure 2.4b for a = 4. We can see that the minimum value of f(α) is attained at α = (0 ± 30k)°, which are values at which the function is not differentiable. When a = 4, the minimum value of f(α) is 3.944 (Figure 2.4b), which is also the value of HL(E). For an arbitrary a,

HL(E) = f(0) = log₂ [1 + a + a√3/2 + a²√3/4].
Moreover, we have
Figure 2.4. Calculation of HL(E) in Example 2.8: (a) the triangle E with projections E_X(α), E_Y(α) and rotation angle α; (b) the function f(α) for a = 4.
HL(E_X) = log₂ [1 + a]

and

HL(E_Y) = log₂ [1 + a√3/2].

For a = 4, HL(E_X) = 2.322 and HL(E_Y) = 2.158. Then,

HL(E_X | E_Y) = HL(E) − HL(E_Y) = 1.786,
HL(E_Y | E_X) = HL(E) − HL(E_X) = 1.622,
T_HL(E_X, E_Y) = HL(E_X) + HL(E_Y) − HL(E) = 0.536,
I_HL(E) = HL(X × Y) − HL(E) = 9.376,
NHL(E) = HL(E) / HL(X × Y) = 0.296,
NI_HL(E) = I_HL(E) / HL(X × Y) = 0.704.
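The minimization in Example 2.8 is easy to reproduce numerically. The sketch below is hypothetical code, not from the book; it samples f(α) over [0°, 30°] for the equilateral triangle and recovers the minimum value 3.944 for a = 4.

```python
import math

def hl_triangle(a, steps=3000):
    """Numerical sketch of HL(E) for an equilateral triangle with side a (Example 2.8)."""
    area = a * a * math.sqrt(3) / 4            # Lebesgue measure of E
    best = float("inf")
    for k in range(steps + 1):
        alpha = math.radians(30) * k / steps
        ex = a * math.cos(alpha)               # projection E_X(alpha)
        ey = a * math.cos(math.radians(30) - alpha)   # projection E_Y(alpha)
        value = math.log2((1 + ex) * (1 + ey) + area - ex * ey)
        best = min(best, value)
    return best

print(hl_triangle(4))                                          # ~3.944, at alpha = 0
print(math.log2(1 + 4 + 4 * math.sqrt(3) / 2 + 16 * math.sqrt(3) / 4))   # f(0) exactly
```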
NOTES
2.1. Possibility and necessity functions introduced in Section 2.1 are closely connected
with operators of possibility and necessity in classical modal logic [Chellas, 1980;
Hughes and Creswell, 1996]. To show this connection let PosE(A) defined by Eq.
(2.2) be interpreted for each A Œ P(X) as the truth value of the proposition “Given
evidence E, it is possible that the true alternative is in set A.” Then, according to
modal logic,
Pos_E(A or B)  iff  Pos_E(A) or Pos_E(B),
which is the counterpart of Eq. (2.3). Moreover, the following is a tautology in
modal logic: “Any proposition p is necessary iff its complement, p̄, is not possible.”
Its counterpart is Eq. (2.4).
2.2. Possibilistic measure of uncertainty was derived by Hartley [1928]. Its significance
is discussed by Kolmogorov [1965]. Axiomatic treatment of the Hartley measure
and the proof of its uniqueness presented in Section 2.2 are due to Rényi [1970b].
The uniqueness of the Hartley measure was also proved under different axioms
(somewhat less intuitive) and in more complicated ways by the Hungarian
mathematician Erdös in 1946 and by the Russian mathematician Fadeev in 1957.
References to these historical publications are given in [Rényi, 1970b], including
a reference to his own original publication of the proof in 1959.
2.3. The Hartley-like measure, HL, was proposed in a paper by Klir and Yuan [1995b].
The proposed measure was proved in the paper to satisfy all essential requirements with one exception: the additivity for sets with unequal edges when n > 3
remained an open problem. It was posed in SIAM Review [38(2), 1996, p. 315] in
the following form:
Let A = [0, a_1] × ··· × [0, a_n] be a block of ℝⁿ, and let A_t be the block obtained by rigid rotation t of A around the origin. For i = 1, 2, . . . , n, let b_i denote the length of the projection of A_t on the ith coordinate axis. Show that

∏_{i=1}^{n} a_i + ∏_{i=1}^{n} (1 + b_i) ≥ ∏_{i=1}^{n} (1 + a_i) + ∏_{i=1}^{n} b_i.
The problem has come from an attempt to find a measure for nonspecificity
of convex sets, which is a generalization of the Hartley measure for infinite
sets. The requirement of additivity leads to the above mentioned inequality.
The problem was solved by Ramer and the solution was presented first in a very
concise form in the SIAM Review [39(3), 1997, p. 516–51]. Its more elaborate version
is covered in Ramer and Padet [2001]. A possibility of extending the applicability
of the Hartley-like measure to nonconvex sets and some other related issues are
also discussed in this paper.
EXERCISES
2.1. Repeat the calculations in Examples 2.1c and 2.2c under the assumption
that the possible states at the time t are: (i) {s2}; (ii) {s3}; (iii) {s1, s4}, (iv)
{s2, s3, s4}.
2.2. Repeat Examples 2.1 and 2.2 for a system with three states, s1, s2, s3,
whose state-transition relation R is defined by the following matrix:
rR
s1
s2
s3
s1
s2
s3
1
1
0
1
0
1
0
1
0
= MR.
In addition, calculate the nonspecificity in predicting the state at time
t + k (k = 2, 3, 4).
2.3. Assume that the systems employed in the previous Exercises are used
for retrodiction (determining past states or sequences of states) rather
than prediction. Repeat, under this assumption, some of the calculations
done in the previous Exercises.
2.4. Repeat Example 2.3 for some other sets of projections defined by you.
Some projections may be 3-dimensional.
2.5. Consider a system with three variables, x1, x2, x3. The variables, whose
values are in the set {0, 1}, are constrained by a ternary relation R, which
is not known. It is only known that each of the three binary projections
is the full Cartesian product {0, 1}2. Determine the cylindric closure of
those projections and all its subsets that are consistent and complete with
respect to the projections. Then, calculate the amount of nonspecificity
in identifying the overall relation and the amount of information provided by the projections.
2.6. Consider a system with three variables, x1, x2, x3, whose values are in the
set {0, 1}. The variables are constrained by the following ternary relation:

  x1   x2   x3
  0    0    1
  0    1    0
  1    0    0
  1    1    1
Calculate conditional nonspecificities and information transmissions, as
well as their normalized versions, for all pairs of variables and for all
two-block partitions of {x1, x2, x3}.
2.7. Assume that the following messages were received with some missing
information. Determine the nonspecificity and informativeness of each
of the messages.
(a) A number with 10 decimal digits was received, but k of the digits
were not readable (k Œ ⺞10).
(b) A coded message with six letters of the English alphabet was
received, but k of the letters were not readable (k Œ ⺞6).
2.8. Consider three variables that are related by the equation v = v1 + v2.
Values of v1 and v2 are integers from 0 to 100, values of v are integers
from 0 to 200. Given a particular value v, what is the nonspecificity in
determining the corresponding values of v1 and v2, and what is the informativeness of v about v1 and v2?
2.9. Consider the equation d = a + bc, where a, b, c are input variables whose
values are integers in the set {0, 1, . . . , 9}, and d is an output variable
with values in the set {0, 1, . . . , 90}. Assume now that the values of the
input variables are not fully specific. We know only that a Œ {3, 4},
b Œ {1, 2, 3}, and c Œ {7, 8}. What are the input and output nonspecificities in this case? What are the amounts of information contained in the
input and in the output?
2.10. Assume that 1000 attractive design alternatives are conceived by an
engineering designer. After applying requirements r1, r2, r3, r4, r5 (in that
order), the number of alternatives is reduced to 200, 100, 64, 12, 1 (in the
respective order). What are the prescriptive nonspecificities at the individual stages of the design process, and what is the amount of prescriptive information contained in each of the requirements?
2.11. To test a particular digital electronic chip with n inputs and m outputs
for correctness means to determine the actual logic function the chip
implements at each output solely by manipulating the input variables
and observing the output variables. Initially, there are 2^(2ⁿ) possible logic functions at each output and, hence, the diagnostic nonspecificity is 2ⁿ bits. To resolve this nonspecificity and determine that the implemented function is the correct one, 2ⁿ tests must be conducted. If n is large, this
is not realistic. However, when less than 100% of the required tests have
been carried out, some diagnostic nonspecificity remains (unless a defect
in the chip was discovered by one of the tests). As an example, let n =
30 and m = 10, and assume that only 90% of the required tests have been
carried out and they are all positive. Calculate the information obtained
by the tests and the remaining diagnostic nonspecificity.
2.12. Consider the 2-dimensional Euclidean space ⺢2, and let the domain of
interest (the universal set) X ¥ Y be the square [0, 1000]2. This specification is expressed in some chosen units of length. Our aim is to determine the location of an object, which we know must be somewhere in
the square [0, 1000]2. From one information source, we know that the
object cannot be outside the square area A shown in Figure 2.5. From
another source, we know that it cannot be outside the circular area B
also shown in the figure. Calculate, assuming that a = 2 (in the chosen
units of length), the following:
(a) Basic and conditional nonspecificities of A, B, and A ∩ B;
(b) Normalized versions of these nonspecificities;
(c) Information obtained by source 1, source 2, and both sources taken
together and their normalized versions.
2.13. Assume that the chosen unit of length in Exercise 2.12 is a meter. Repeat
the calculations by expressing the same length in centimeters.
2.14. Repeat Example 2.8 for the following areas in ⺢2:
(a) A hexagon with sides equal to 1;
(b) An ellipse with semiaxes a = 2 and b = 1;
(c) A semicircle with radius r = 5.
2.15. Consider the 3-dimensional Euclidean space ⺢3 within which the domain
of interest, X ¥ Y ¥ Z, is the cube [0, 100]3. For the following convex
subsets of possible points in this domain, calculate the various basic and
conditional amounts of nonspecificity, and the associated information, as
well as values of relevant information transmissions:
(a) A unit cube;
(b) A sphere with radius r = 2;
(c) An ellipsoid with semiaxes a = 4, b = 2, c = 1;
(d) A regular tetrahedron with sides s = 2.
Figure 2.5. Illustration to Exercise 2.12: the universal set X × Y = [0, 1000]², the square area A, the circular area B, and their intersection A ∩ B.
2.16. In each of the following algebraic expressions, x and y are input variables whose values are in the interval [0, 10], and z is an output variable
whose range is determined by each of the expressions:
(a) z = x + y
(b) z = xy/(x + y)
(c) z = x/(x + y) + y/(x + y)
(d) z = (x + y)x
Assuming that the values of x and y are known only imprecisely,
x Œ [ x, x̄] and y Œ [ y, ȳ], determine for each expression the input and
output nonspecificity and the input informativeness about the output.
Consider, for example, x Œ [1, 2] and y Œ [5, 7], or x Œ [0.1, 1] and
y Œ [0, 1].
2.17. Consider the set A_l of all points on a straight-line segment in the 2-dimensional Euclidean space whose length is l (l ≥ 0). Calculate the nonspecificity of A_l.
3

CLASSICAL PROBABILITY-BASED UNCERTAINTY THEORY
Probability is degree of certainty and differs from absolute certainty as the part
differs from the whole.
—Jacques Bernoulli
3.1. PROBABILITY FUNCTIONS
Probability-based uncertainty theory, which is the subject of this chapter,
is one of the two classical theories of uncertainty. It is based on the notion
of probability, which, in turn, is based on the notion of classical (additive)
measure. The study of probability has a long history whose outcome is a theory
that is now well developed at each of the four fundamental levels (formalization, calculus, measurement, and methodology). The literature on probability
theory, including textbooks at various levels, is abundant. Since probability
theory is also covered in most academic programs, it is reasonable to assume that the reader is familiar with its fundamentals. Therefore, only a few
basic concepts from probability theory are briefly reviewed in this section.
These are concepts that are needed for examining in greater depth the various
issues regarding the measurement of probabilistic uncertainty (Sections 3.2
and 3.3).
3.1.1. Functions on Finite Sets
As in the case of possibilistic uncertainty, consider first a finite set X of mutually exclusive alternatives (predictions, diagnoses, etc.) that are of concern to
us in a given application context. In general, alternatives in set X may be
viewed as states of a variable X. Only one of the alternatives is true, but we
are not certain which one it is. In probability theory, this uncertainty about the
true alternative is expressed by a function

p: X → [0, 1]

for which

∑_{x∈X} p(x) = 1.        (3.1)

This function is called a probability distribution function, and the associated tuple of values p(x) for all x ∈ X,

p = ⟨p(x) | x ∈ X⟩,

is called a probability distribution. For each x ∈ X, the value p(x) expresses the degree of evidential support that x is the true alternative. A variable X whose states x ∈ X are associated with probabilities p(x) is usually called a random variable.

Given a probability distribution function p, the associated probability measure, Pro, is obtained for all A ∈ P(X) via the formula

Pro(A) = ∑_{x∈A} p(x).        (3.2)

However, it is often not necessary to consider all sets in P(X). Any family of subsets of X, C ⊆ P(X), is acceptable provided that it contains X and it is closed under complementation and finite unions. Members of C are called events. For any pair of disjoint events A and B,

Pro(A ∪ B) = Pro(A) + Pro(B).        (3.3)

This basic property of probability measures is referred to as additivity.

Given a probability distribution function p on X and any real-valued function f on X, the functional

a(f, p) = ∑_{x∈X} f(x) p(x)        (3.4)

is called an expected value of f. Clearly, a(f, p) is a weighted average of values f(x), in which the weights are probabilities p(x).
Now, consider two sets of alternatives, X and Y, which may be viewed, in general, as sets of states of variables X and Y, respectively. A probability function defined on X × Y is called a joint probability distribution function. The associated marginal probability distribution functions, p_X and p_Y, on X and Y, respectively, are determined by the formulas

p_X(x) = ∑_{y∈Y} p(x, y),        (3.5)

for each x ∈ X, and

p_Y(y) = ∑_{x∈X} p(x, y),        (3.6)

for each y ∈ Y. Variables X, Y with marginal probability distribution functions p_X, p_Y, respectively, are called noninteractive iff

p(x, y) = p_X(x) · p_Y(y)        (3.7)

for all x ∈ X and y ∈ Y. Conditional probability distribution functions, p(x | y) and p(y | x), are defined for all x ∈ X and y ∈ Y such that p_X(x) ≠ 0 and p_Y(y) ≠ 0 by the formulas

p_{X|Y}(x | y) = p(x, y) / p_Y(y),        (3.8)

p_{Y|X}(y | x) = p(x, y) / p_X(x).        (3.9)

When p_{X|Y}(x | y) = p_X(x) for all x ∈ X, variable X is said to be independent of variable Y. Similarly, when p_{Y|X}(y | x) = p_Y(y) for all y, variable Y is said to be independent of variable X.
It is easy to show that the concepts of probabilistic noninteraction and probabilistic independence are equivalent. Given two variables, X and Y, with probability distributions, p_X and p_Y, defined on their state sets, X and Y, assume that they are noninteractive. This means that their joint probability distribution satisfies Eq. (3.7) for all x ∈ X and y ∈ Y. Then, Eq. (3.8) becomes

p_{X|Y}(x | y) = p_X(x) · p_Y(y) / p_Y(y) = p_X(x)

and, similarly, Eq. (3.9) becomes
p_{Y|X}(y | x) = p_X(x) · p_Y(y) / p_X(x) = p_Y(y).

Hence, noninteraction implies independence.

Assume now that the variables are independent. This means that

p_{X|Y}(x | y) = p(x, y) / p_Y(y) = p_X(x)

and, similarly,

p_{Y|X}(y | x) = p(x, y) / p_X(x) = p_Y(y).
In both cases, clearly, we obtain Eq. (3.7), which means that independence
implies noninteraction. Hence, the two concepts, noninteraction and independence, are equivalent in probability theory. This equivalence does not hold in
other theories of uncertainty.
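A minimal sketch of these definitions follows. The joint distribution used here is invented purely for illustration, and the variable names are my own.

```python
import numpy as np

# A hypothetical joint distribution p(x, y) on a 2 x 3 state space
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])
assert abs(joint.sum() - 1.0) < 1e-12

p_x = joint.sum(axis=1)                  # marginal p_X, Eq. (3.5)
p_y = joint.sum(axis=0)                  # marginal p_Y, Eq. (3.6)

# noninteraction test, Eq. (3.7): p(x, y) = p_X(x) * p_Y(y) for all x, y
noninteractive = np.allclose(joint, np.outer(p_x, p_y))

# conditional distributions, Eqs. (3.8) and (3.9)
p_x_given_y = joint / p_y                # column j holds p(x | y_j)
p_y_given_x = (joint.T / p_x).T          # row i holds p(y | x_i)

print(noninteractive)                    # True for this particular joint
print(p_x_given_y)
```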
3.1.2. Functions on Infinite Sets
When X is the set of real numbers, ⺢, or a bounded interval of real numbers,
[x, x̄], the set of alternatives is infinite, and the way in which probability measures are defined for finite X is not applicable. It is not any more meaningful
to define probability measures on the full power set P(X ). In each particular
application, a relevant family of subsets of X (events), C, must be chosen,
which is required to contain X and be closed under complements and countable unions (these requirements imply that C is also closed under countable
intersections).Any such family together with the operations of set union, intersection, and complement is usually called a s-algebra. In many applications,
family C consists of all bounded, right-open subintervals of X.
Probability distribution function p cannot be defined for infinite sets in the same way in which it is defined for finite sets. For X = ℝ or X = [x, x̄], function p is defined for all x ∈ X by the equation

p(x) = Pro({a ∈ X | a < x}),        (3.10)

where Pro denotes, as before, a probability measure. This definition utilizes the ordering of real numbers. Function p is clearly nondecreasing, and it is usually expected to be continuous at each x ∈ X and differentiable everywhere except at a countable number of points.
Connected with the probability distribution function p is another function, q, defined for all x ∈ X as the derivative of p. This function is called a probability density function. Since p is a nondecreasing function, q(x) ≥ 0 for all x ∈ X.

Given a σ-algebra defined on a family C, and a probability density function q on X, the probability of any set A ∈ C, Pro(A), can be calculated via the integral

Pro(A) = ∫_A q(x) dx.        (3.11)

Since it is required that Pro(X) = 1 (the true alternative must be in X), the probability density function is constrained by the equation

∫_X q(x) dx = 1,        (3.12)

which is the counterpart of Eq. (3.1) for the infinite case.

Given a probability distribution function on X and another real-valued function f on X, the functional

a(f, p) = ∫_X f(x) dp        (3.13)

is called an expected value of f. Clearly, Eq. (3.13) is a counterpart of Eq. (3.4) for the infinite case. Observe that a(f, p) can also be expressed in terms of the probability density function q associated with p as

a(f, p) = ∫_X f(x) q(x) dx.        (3.14)
When function q is defined on a Cartesian product X × Y = [x, x̄] × [y, ȳ], it is called a joint probability density function. The associated marginal probability density functions, q_X and q_Y, on X and Y, respectively, are defined for each x ∈ X and each y ∈ Y by the formulas

q_X(x) = ∫_Y q(x, y) dy,        (3.15)

q_Y(y) = ∫_X q(x, y) dx.        (3.16)

Marginal probability density functions are called noninteractive iff

q(x, y) = q_X(x) · q_Y(y)        (3.17)

for all x ∈ X and each y ∈ Y. Conditional probability density functions, q_{X|Y}(x | y) and q_{Y|X}(y | x), are defined for all x ∈ X and all y ∈ Y such that q_X(x) ≠ 0 and q_Y(y) ≠ 0 by the formulas
q_{X|Y}(x | y) = q(x, y) / q_Y(y),        (3.18)

q_{Y|X}(y | x) = q(x, y) / q_X(x).        (3.19)
Clearly, Eqs. (3.15)–(3.19) are counterparts of Eqs. (3.5)–(3.9) for the infinite
case. Again, the concepts of probabilistic noninteraction and probabilistic
independence are equivalent in this case.
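A numerical sketch of the density case follows; the density q(x, y) = (x + y)/3 on [0, 1] × [0, 2] is an invented example, and the check of Eqs. (3.12) and (3.15)–(3.17) is done by simple trapezoidal integration.

```python
import numpy as np

# A hypothetical joint density q(x, y) = (x + y)/3 on [0, 1] x [0, 2]
x = np.linspace(0.0, 1.0, 1001)
y = np.linspace(0.0, 2.0, 2001)
X, Y = np.meshgrid(x, y, indexing="ij")
q = (X + Y) / 3.0

total = np.trapz(np.trapz(q, y, axis=1), x)        # Eq. (3.12): should equal 1
q_x = np.trapz(q, y, axis=1)                       # marginal density, Eq. (3.15)
q_y = np.trapz(q, x, axis=0)                       # marginal density, Eq. (3.16)
noninteractive = np.allclose(q, np.outer(q_x, q_y), atol=1e-3)   # Eq. (3.17)

print(round(total, 6), noninteractive)             # 1.0 False (x and y interact here)
```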
3.1.3. Bayes’ Theorem
Consider a σ-algebra with events in a family C ⊆ P(X) and a probability measure Pro on C. For each pair of sets A, B ∈ C such that Pro(B) ≠ 0, the conditional probability of A given B, Pro(A | B), is defined by the formula

Pro(A | B) = Pro(A ∩ B) / Pro(B).        (3.20)

Similarly, the conditional probability of B given A, Pro(B | A), is defined by the formula

Pro(B | A) = Pro(A ∩ B) / Pro(A).        (3.21)
Expressing Pro(A ∩ B) from Eqs. (3.20) and (3.21) results in the equation

Pro(A | B) · Pro(B) = Pro(B | A) · Pro(A)

that establishes a relationship between the two conditional probabilities. One of the conditional probabilities is then expressed in terms of the other one by the equation

Pro(A | B) = Pro(B | A) · Pro(A) / Pro(B),        (3.22)

which is usually referred to as Bayes' theorem. Since Pro(B) can be expressed in terms of given elementary, mutually exclusive events A_i (i ∈ ℕ_m) of the σ-algebra as

Pro(B) = ∑_{i∈ℕ_m} Pro(A_i ∩ B) = ∑_{i∈ℕ_m} Pro(B | A_i) · Pro(A_i)
when C is finite, Bayes' theorem may also be written in the form

Pro(A | B) = Pro(B | A) · Pro(A) / ∑_{i∈ℕ_m} Pro(B | A_i) · Pro(A_i).        (3.23)
For infinite sets, Bayes’ theorem must be properly reformulated in terms
of probability density functions and the summation in Eq. (3.23) must be
replaced with integration.
Bayes’ theorem is a simple procedure for updating given probabilities on
the basis of new evidence. From prior probabilities Pro(A) and new evidence
expressed in terms of conditional probabilities Pro(B | A), we calculate posterior probabilities Pro(A | B). When further evidence becomes available, the
posterior probabilities are employed as prior probabilities and the procedure
of probability updating is repeated.
EXAMPLE 3.1. Let X denote the population of a given town community. It
is known from statistical data that 1% of the town residents have tuberculosis. Using this information, the probability, Pro(A), that a randomly chosen
member of the community has tuberculosis (event A) is 0.01. Suppose that this
member takes a tuberculosis skin test (TST) and the outcome is positive. On
the basis of the information, the prior probability changes to a posterior probability Pro(A | B), where B denotes the event “positive outcome of the TST
test.” Clearly, the posterior probability depends on the reliability of the TST
test. Assume that the following is known about the test: (1) the probability
of a positive outcome for a person with tuberculosis, Pro(B | A), is 0.99; and
(2) the probability of a positive outcome for a person with no tuberculosis,
Pro(B | Ā), is 0.04. Using this information regarding the reliability of the
TST test, the posterior probability Pro(A | B) that the person has tuberculosis
is calculated from the prior probability Pro(A) via Eq. (3.23) as follows:
Pro(A | B) = Pro(B | A) Pro(A) / [Pro(B | A) Pro(A) + Pro(B | Ā) Pro(Ā)]
           = (0.99 · 0.01) / (0.99 · 0.01 + 0.04 · 0.99)
           = 0.2.
The probability that the person has tuberculosis is thus 0.2. Observe that if the
test were fully reliable (Pro(B | A) = 1 and Pro(B | Ā) = 0), we would conclude
(as expected) that the person has tuberculosis with probability 1.
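The updating procedure in Example 3.1 amounts to a one-line computation. The sketch below uses a hypothetical function name; it applies Eq. (3.23) for a binary partition {A, Ā} and then reuses the posterior as a new prior, as described above.

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Posterior Pro(A | B) via Eq. (3.23) for the binary partition {A, not A}."""
    numerator = p_b_given_a * prior
    evidence = numerator + p_b_given_not_a * (1 - prior)
    return numerator / evidence

# Numbers from Example 3.1
print(bayes_posterior(0.01, 0.99, 0.04))   # 0.2
# Using the posterior as the new prior (a second positive TST outcome)
print(bayes_posterior(0.2, 0.99, 0.04))    # ~0.861
```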
3.2. SHANNON MEASURE OF UNCERTAINTY FOR FINITE SETS
The question of how to measure the amount of uncertainty (and the associated information) in classical probability theory was first addressed by
Shannon [1948]. He established that the only meaningful way to measure the
amount of uncertainty in evidence expressed by a probability distribution
function p on a finite set is to use a functional of the form

−c ∑_{x∈X} p(x) log_b p(x),

where b and c are positive constants, and b ≠ 1. Each choice of values b and
c determines the unit in which the uncertainty is measured. The most common
choice is to define the measurement unit by the requirement that the amount
of uncertainty be 1 when X = {x1, x2} and p(x1) = p(x2) = 1/2. This requirement,
which is usually referred to as a normalization requirement, is formally
expressed by the equation

−c log_b (1/2) = 1.
It can be conveniently satisfied by choosing b = 2 and c = 1. The resulting measurement unit is called a bit. That is, 1 bit is the amount of uncertainty removed
(or information gained) upon learning the answer to a question whose two
possible answers were equally likely. The resulting functional,
S(p) = −∑_{x∈X} p(x) log₂ p(x),        (3.24)
is called a Shannon measure of uncertainty or, more frequently, a Shannon
entropy.
One way of getting insight into the type of uncertainty measured by the
Shannon entropy is to rewrite Eq. (3.24) in the form
S(p) = −∑_{x∈X} p(x) log₂ [1 − ∑_{y≠x} p(y)].        (3.25)
The term

Con(x) = ∑_{y≠x} p(y)
in Eq. (3.25) represents the total evidential claim pertaining to alternatives
that are different from x. That is, Con(x) expresses the sum of all evidential
claims that fully conflict with the one focusing on x. Clearly, Con(x) Œ [0, 1]
for each x ŒX. The function -log2[1 - Con(x)], which is employed in Eq. (3.25),
is monotonic increasing with Con(x) and extends its range from [0, 1] to
[0, •). The choice of the logarithmic function is a result of axiomatic requirements for S, which are discussed later in this chapter. It follows from these
facts and from the form of Eq. (3.25) that the Shannon entropy is the mean
(expected) value of the conflict among evidential claims expressed by each
given probability distribution function p.
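A small sketch, using an invented distribution, shows that Eqs. (3.24) and (3.25) define the same quantity:

```python
from math import log2

def shannon_entropy(p):
    """Shannon entropy, Eq. (3.24); terms with p(x) = 0 contribute nothing."""
    return -sum(px * log2(px) for px in p if px > 0)

def shannon_entropy_conflict(p):
    """Equivalent form, Eq. (3.25): mean conflict -sum p(x) log2[1 - Con(x)]."""
    return -sum(px * log2(1 - (sum(p) - px)) for px in p if px > 0)

p = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy(p))            # 1.75 bits
print(shannon_entropy_conflict(p))   # same value
```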
3.2.1. Simple Derivation of the Shannon Entropy
Suppose that a particular alternative in a finite set X of considered alternatives occurs with the probability p(x). When this probability is very high, say
p(x) = 0.999, then the occurrence of x is taken almost for granted and, consequently, we are not much surprised when it actually occurs. That is, our uncertainty in anticipating x is quite small and, therefore, our observation that x has
actually occurred contains very little information. On the other hand, when
the probability is very small, say p(x) = 0.001, then we are greatly surprised
when x actually occurs. This means, in turn, that we are highly uncertain in our
anticipation of x and, hence, the actual observation of x has very large information content. We can conclude from these considerations that the anticipatory uncertainty of x prior to the observation (and the information content of
observing x) should be expressed by a decreasing function of the probability
p(x): the more likely the occurrence of x, the less information its actual observation contains.
Consider a random experiment with n considered outcomes, i = 1, 2, . . . , n,
whose probabilities are p1, p2, . . . , pn, respectively. Assume that pi > 0 for all
i Œ ⺞n, which means that no outcomes with zero probabilities are considered.
The uncertainty in anticipating a particular outcome i (and the information
obtained by actually observing this outcome) should clearly be a function of
pi. Let
s: (0, 1] → [0, ∞)
denote this function. To measure in a meaningful way the anticipatory uncertainty, function s should satisfy the following properties:
(s1) s(pi) should decrease with increasing pi.
(s2) s(1) = 0.
(s3) s should behave properly when applied to joint outcomes of independent experiments.
To elaborate on property (s3), let rij denote the joint probabilities of outcomes
of two independent experiments. Assume that one of the experiments has n
outcomes with probabilities pi(i Œ ⺞n) and the other one has m outcomes with
probabilities qj( j Œ ⺞m). Then, according to the calculus of probability theory,
r_ij = p_i · q_j        (3.26)
for all i Œ ⺞n and all j Œ ⺞m. Since the experiments are independent, the
anticipatory uncertainty of a particular joint outcome ·i, jÒ should be equal to
the sum of anticipatory uncertainties of the individual outcomes i and j. That
is, the equation
s(rij ) = s( pi ) + s( q j )
should hold when Eq. (3.26) holds. This leads to the functional equation
s( pi ◊ q j ) = s( pi ) + s(q j ),
where pi, qj Œ (0, 1]. This is known as one form of the Cauchy equation whose
solution is the class of functions defined for each a Œ (0, 1] by the equation
s(a) = c log b a,
where c is an arbitrary constant and b is a nonnegative constant distinct from
1. Since s is required by (s1) to be a decreasing function on (0, 1] and the
logarithmic function is increasing, c must be negative. Furthermore, defining
the measurement unit by the requirement that s(1/2) = 1 and choosing
conveniently b = 2 and c = -1, we obtain a unique function s, defined for each
a Œ (0, 1] by the equation
s(a) = - log 2 a.
(3.27)
A graph of this function is shown in Figure 3.1.
Now consider a finite set X of considered alternatives with probabilities
p(x) for all x ŒX. Let S(p) denote the expected value of s[ p(x)] for all x ŒX.
Then,
Figure 3.1. Graph of the function s(a) = −log₂ a.

S(p) = ∑_{x∈X} p(x) · s[p(x)].
Substituting for s from Eq. (3.27), we obtain the functional
S(p) = −∑_{x∈X} p(x) · log₂ p(x),
which is the Shannon entropy (compare with Eq. (3.24)).
Observe that the term −p(x) · log₂ p(x) in the formula for S(p) is not defined when p(x) = 0. However, employing l'Hospital's rule for indeterminate forms, we can calculate its limit for p(x) → 0:

lim_{p(x)→0} (−p(x) log₂ p(x)) = lim_{p(x)→0} [−log₂ p(x)] / [1/p(x)]
                              = lim_{p(x)→0} [−1/(p(x) ln 2)] / [−1/p²(x)]
                              = lim_{p(x)→0} p(x)/ln 2 = 0.
When only two alternatives are considered, x1 and x2, whose probabilities
are p(x1) = a and p(x2) = 1 - a, the Shannon entropy, S(p), depends only on a
in the way illustrated in Figure 3.2a; graphs of the two components of S(p),
s1(a) = -a log2a and s2(a) = -(1 - a) log2(1 - a), are shown in Figure 3.2b.
3.2.2. Uniqueness of the Shannon Entropy
The issue of measuring uncertainty and information in probability theory
has also been treated axiomatically in various ways. It has been proved in
numerous ways, from several well-justified axiomatic characterizations, that
the Shannon entropy is the only meaningful functional for measuring uncertainty and information in probability theory. To survey this more rigorous
treatment, assume that X = {x_1, x_2, . . . , x_n}, and let p_i = p(x_i) for all i ∈ ℕ_n. In addition, let P_n denote the set of all probability distributions with n components. That is,

P_n = { ⟨p_1, p_2, . . . , p_n⟩ | p_i ∈ [0, 1] for all i ∈ ℕ_n and ∑_{i=1}^{n} p_i = 1 }.

Then, for each integer n, a measure of probabilistic uncertainty is a functional, S_n, of the form

S_n: P_n → [0, ∞),

which satisfies appropriate requirements. For the sake of simplicity, let S_n(p_1, p_2, . . . , p_n) be written as S(p_1, p_2, . . . , p_n).
Figure 3.2. Graphs of: (a) S(a, 1 − a); (b) components s1(a) = −a log₂ a and s2(a) = −(1 − a) log₂(1 − a) of S(a, 1 − a).
Different subsets of the following requirements, which are universally considered essential for a probabilistic measure of uncertainty and information,
are usually taken as axioms for characterizing the measure.
Axiom (S1) Expansibility. When a component with zero probability is added
to the probability distribution, the uncertainty should not change. Formally,
S ( p1 , p2 , . . . , pn ) = S ( p1 , p2 , . . . , pn , 0)
for all · p1, p2, . . . , pnÒ Œ Pn.
Axiom (S2) Symmetry. The uncertainty should be invariant with respect to
permutations of probabilities of a given probability distribution. Formally,
S ( p1 , p2 , . . . , pn ) = S (p ( p1 , p2 , . . . , pn ))
for all · p1, p2, . . . , pnÒ ŒPn and for all permutations p (p1, p2, . . . , pn).
Axiom (S3) Continuity. Functional S should be continuous in all its
arguments p1, p2, . . . , pn. This requirement is often replaced with a
weaker requirement: S(p, 1 - p) is a continuous functional of p in the
interval [0, 1].
Axiom (S4) Maximum. For each positive integer n, the maximum uncertainty
should be obtained when all the probabilities are equal to 1/n. Formally,
S(p_1, p_2, . . . , p_n) ≤ S(1/n, 1/n, . . . , 1/n).
Axiom (S5) Subadditivity. The uncertainty of any joint probability distribution should not be greater than the sum of the uncertainties of the corresponding marginal distributions. Formally,

S(p_11, p_12, . . . , p_1m, p_21, p_22, . . . , p_2m, . . . , p_n1, p_n2, . . . , p_nm)
    ≤ S(∑_{j=1}^{m} p_1j, ∑_{j=1}^{m} p_2j, . . . , ∑_{j=1}^{m} p_nj) + S(∑_{i=1}^{n} p_i1, ∑_{i=1}^{n} p_i2, . . . , ∑_{i=1}^{n} p_im)

for any given joint probability distribution ⟨p_ij | i ∈ ℕ_n, j ∈ ℕ_m⟩.
Axiom (S6) Additivity. The uncertainty of any joint probability distribution
that is noninteractive should be equal to the sum of the uncertainties of the
corresponding marginal distributions. Formally,
S ( p1 q1 , p1 q 2 , . . . , p1 q m , p2 q1 , p2 q 2 , . . . , p2 q m , . . . , pn q1 , pn q 2 , . . . , pn q m )
= S ( p1 , p2 , . . . , pn ) + S (q1 , q 2 , . . . , q m )
for any given marginal probability distributions ·p1, p2, . . . , pnÒ and ·q1, q2, . . . ,
qmÒ. This requirement is sometimes replaced with a restricted requirement of
weak additivity which applies only to uniform marginal probability distributions
with pi = 1/n and qj = 1/m. Formally,
S(1/nm, 1/nm, . . . , 1/nm) = S(1/n, 1/n, . . . , 1/n) + S(1/m, 1/m, . . . , 1/m).
Introducing a convenient function f such that f(n) = S(1/n, 1/n, . . . , 1/n), then
the weak additivity can be expressed by the equation
f (nm) = f (n) + f (m)
for positive integers n and m.
Axiom (S7) Monotonicity. For probability distributions with equal probabilities 1/n, the uncertainty should increase with increasing n. Formally, for any
positive integers m and n, when m < n, then f(m) < f(n), where f denotes the
function introduced in (S6).
Axiom (S8) Branching. Given a probability distribution p = ⟨p_i | i ∈ ℕ_n⟩ on set X = {x_i | i ∈ ℕ_n} for some integer n ≥ 3, let X be partitioned into two blocks, A = {x_1, x_2, . . . , x_s} and B = {x_{s+1}, x_{s+2}, . . . , x_n} for some integer s. Then, the equation

S(p_1, p_2, . . . , p_n) = S(p_A, p_B) + p_A S(p_1/p_A, p_2/p_A, . . . , p_s/p_A) + p_B S(p_{s+1}/p_B, p_{s+2}/p_B, . . . , p_n/p_B)

should hold, where p_A = ∑_{i=1}^{s} p_i and p_B = ∑_{i=s+1}^{n} p_i. On the left-hand side of the
equation, the uncertainty is calculated directly; on the right-hand side, it is calculated in two steps, following the calculus of probability theory. In the first
step (expressed by the first term), the uncertainty associated with the probability distribution · pA, pBÒ on the partition is calculated. In the second step
(expressed by the second and third terms), the expected value of uncertainty
associated with the conditional probability distributions within the blocks A
and B of the partition is calculated. This requirement, which is also called a
grouping requirement or a consistency requirement is sometimes presented in
various other forms. For example, one of its weaker forms is given by the
formula

S(p_1, p_2, p_3) = S(p_1 + p_2, p_3) + (p_1 + p_2) S(p_1/(p_1 + p_2), p_2/(p_1 + p_2)).
It matters little which of these forms is adopted since they can be derived
from one another. (The branching axiom is illustrated later in this chapter by
Example 3.6 and Figure 3.6.)
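The branching and additivity requirements are easy to check numerically for the Shannon entropy. A short sketch follows; the distributions are arbitrary and assumed only for illustration.

```python
from math import log2

def S(p):
    """Shannon entropy of a probability distribution (zero terms ignored)."""
    return -sum(x * log2(x) for x in p if x > 0)

# Numerical check of the branching axiom (S8) for the partition {p1, p2} | {p3, p4}
p = [0.1, 0.3, 0.4, 0.2]
pA, pB = p[0] + p[1], p[2] + p[3]
lhs = S(p)
rhs = S([pA, pB]) + pA * S([p[0] / pA, p[1] / pA]) + pB * S([p[2] / pB, p[3] / pB])
print(abs(lhs - rhs) < 1e-12)                   # True

# Additivity (S6) for a noninteractive joint distribution
q = [0.5, 0.5]
joint = [pi * qj for pi in p for qj in q]
print(abs(S(joint) - (S(p) + S(q))) < 1e-12)    # True
```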
Axiom (S9) Normalization. To ensure (if desirable) that the measurement
units of S are bits, it is essential that
S(1/2, 1/2) = 1.
This axiom must be appropriately modified when other measurement units are
preferred.
The listed axioms for a probabilistic measure of uncertainty and information are extensively discussed in the abundant literature on classical information theory. The following subsets of these axioms are the best known
examples of axiomatic characterization of the probabilistic measure of
uncertainty:
1. Continuity, weak additivity, monotonicity, branching, and normalization.
2. Expansibility, continuity, maximum, branching, and normalization.
3. Symmetry, continuity, branching, and normalization.
4. Expansibility, symmetry, continuity, subadditivity, additivity, and normalization.
Any of these collections of axioms (as well as some additional collections)
is sufficient to characterize the Shannon entropy uniquely. That is, it has been
proven that the Shannon entropy is the only functional that satisfies any of
these sets of axioms. To illustrate in detail this important issue of uniqueness,
which gives the Shannon entropy its great significance, the uniqueness proof
is presented here for the first of the listed sets of axioms.
Theorem 3.1. The only functional that satisfies the axioms of continuity, weak
additivity, monotonicity, branching, and normalization is the Shannon entropy.
Proof. (i) First, we prove the proposition f(nᵏ) = k f(n) for all positive integers n and k by induction on k, where

f(n) = S(1/n, 1/n, . . . , 1/n)

is the function that is used in the definition of weak additivity. For k = 1, the proposition is trivially true. By the axiom of weak additivity, we have
f(nᵏ⁺¹) = f(nᵏ · n) = f(nᵏ) + f(n).

Assume the proposition is true for some k ∈ ℕ. Then,

f(nᵏ⁺¹) = f(nᵏ) + f(n) = k f(n) + f(n) = (k + 1) f(n),

which demonstrates that the proposition is true for all k ∈ ℕ.
(ii) Next, we demonstrate that f(n) = log2n. This proof is identical to that of
Theorem 2.1, provided that we replace the Hartley measure H with f. Therefore, we do not repeat the derivation here. Observe that the proof requires
weak additivity, monotonicity, and normalization.
(iii) We prove now that S(p, 1 − p) = −p log₂ p − (1 − p) log₂(1 − p) for rational p. Let p = r/s, where r, s ∈ ℕ. Then, by the branching axiom,

f(s) = S(1/s, 1/s, . . . , 1/s)
     = S(r/s, (s − r)/s) + (r/s) f(r) + ((s − r)/s) f(s − r).

By (ii) and the definition of p we obtain

log₂ s = S(p, 1 − p) + p log₂ r + (1 − p) log₂(s − r).
Solving this equation for S(p, 1 − p) results in

S(p, 1 − p) = log₂ s − p log₂ r − (1 − p) log₂(s − r)
            = p log₂ s + (1 − p) log₂ s − p log₂ r − (1 − p) log₂(s − r)
            = −p log₂(r/s) − (1 − p) log₂((s − r)/s)
            = −p log₂ p − (1 − p) log₂(1 − p).
(iv) We now extend (iii) to the real numbers p ∈ [0, 1] with the help of the continuity axiom. Let p be any number in the unit interval and let p′ range over a sequence of rational numbers that approaches p as a limit. Then,

S(p, 1 − p) = lim_{p′→p} S(p′, 1 − p′)

by the continuity axiom. Moreover,

lim_{p′→p} S(p′, 1 − p′) = lim_{p′→p} [−p′ log₂ p′ − (1 − p′) log₂(1 − p′)]
                         = −p log₂ p − (1 − p) log₂(1 − p),
since all the functions involved are continuous.
(v) We now conclude the proof by showing that

S(p_1, p_2, . . . , p_n) = −∑_{i=1}^{n} p_i log₂ p_i.

This is accomplished by induction on n. The result is proved in (ii) and (iv) for n = 1, 2, respectively. For n ≥ 3, we may use the branching axiom to obtain

S(p_1, p_2, . . . , p_n) = S(p_A, p_n) + p_A S(p_1/p_A, p_2/p_A, . . . , p_{n−1}/p_A) + p_n S(p_n/p_n),

where p_A = ∑_{i=1}^{n−1} p_i. Since S(p_n/p_n) = S(1) = 0 by (ii), we obtain

S(p_1, p_2, . . . , p_n) = S(p_A, p_n) + p_A S(p_1/p_A, p_2/p_A, . . . , p_{n−1}/p_A).

By (iv) and assuming the proposition we want to prove to be true for n − 1, we may rewrite this equation as

S(p_1, p_2, . . . , p_n) = −p_A log₂ p_A − p_n log₂ p_n − p_A ∑_{i=1}^{n−1} (p_i/p_A) log₂(p_i/p_A)
    = −p_A log₂ p_A − p_n log₂ p_n − ∑_{i=1}^{n−1} p_i log₂(p_i/p_A)
    = −p_A log₂ p_A − p_n log₂ p_n − ∑_{i=1}^{n−1} p_i log₂ p_i + ∑_{i=1}^{n−1} p_i log₂ p_A
    = −p_A log₂ p_A − p_n log₂ p_n − ∑_{i=1}^{n−1} p_i log₂ p_i + p_A log₂ p_A
    = −∑_{i=1}^{n} p_i log₂ p_i.  ∎
3.2.3. Basic Properties of the Shannon Entropy
The literature dealing with information theory based on the Shannon entropy
is extensive. No attempt is made to give a comprehensive coverage of the
theory in this book. However, the most fundamental properties of the Shannon
entropy are surveyed.
First, a theorem is presented that plays an important role in classical information theory. This theorem is essential for proving some basic properties of
Shannon entropy, as well as introducing some additional important concepts
of classical information theory.
Theorem 3.2. The inequality

−∑_{i=1}^{n} p_i log₂ p_i ≤ −∑_{i=1}^{n} p_i log₂ q_i        (3.28)

is satisfied for all probability distributions ⟨p_i | i ∈ ℕ_n⟩ and ⟨q_i | i ∈ ℕ_n⟩ and for all n ∈ ℕ; the equality in (3.28) holds if and only if p_i = q_i for all i ∈ ℕ_n.
Proof. Consider the function
s( pi , q i ) = pi (ln pi - ln q i ) - pi + q i
for pi, qi Œ [0, 1]. This function is finite and differentiable for all values of
pi and qi except the pair pi = 0 and qi π 0. For each fixed qi π 0, the partial
derivative of s with respect to pi is
∂s(p_i, q_i)/∂p_i = ln p_i − ln q_i.

That is, the partial derivative is negative for p_i < q_i, zero for p_i = q_i, and positive for p_i > q_i,
and, consequently, s is a convex function of pi, with its minimum at pi = qi.
Hence, for any given i, we have
$$p_i(\ln p_i - \ln q_i) - p_i + q_i \ge 0,$$

where the equality holds if and only if pᵢ = qᵢ. This inequality is also satisfied for qᵢ = 0, since the expression on its left-hand side is +∞ if pᵢ ≠ 0 and qᵢ = 0, and it is zero if pᵢ = 0 and qᵢ = 0. Taking the sum of this inequality over all i ∈ ℕₙ, we obtain

$$\sum_{i=1}^{n}\bigl[p_i\ln p_i - p_i\ln q_i - p_i + q_i\bigr] \ge 0,$$
which can be rewritten as
$$\sum_{i=1}^{n} p_i\ln p_i - \sum_{i=1}^{n} p_i\ln q_i - \sum_{i=1}^{n} p_i + \sum_{i=1}^{n} q_i \ge 0.$$
The last two terms on the left-hand side of this inequality cancel each other
out, as they both sum up to one. Hence,
$$\sum_{i=1}^{n} p_i\ln p_i - \sum_{i=1}^{n} p_i\ln q_i \ge 0,$$
which is equivalent to Eq. (3.28) when multiplied through by 1/ln2.
∎
This theorem, sometimes referred to as Gibbs’ theorem, is quite useful in
studying properties of the Shannon entropy. For example, the theorem can be
used as follows for proving that the maximum of the Shannon entropy for
probability distributions with n elements is log2n.
Let qᵢ = 1/n for all i ∈ ℕₙ. Then, Eq. (3.28) yields

$$S(p_i \mid i \in \mathbb{N}_n) = -\sum_{i=1}^{n} p_i\log_2 p_i \le -\sum_{i=1}^{n} p_i\log_2\frac{1}{n} = -\log_2\frac{1}{n}\sum_{i=1}^{n} p_i = \log_2 n.$$

Thus, S(pᵢ | i ∈ ℕₙ) ≤ log₂ n. The upper bound is obtained for pᵢ = 1/n for all i ∈ ℕₙ.
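As a quick numerical check (an illustration added here, not part of the original text; function names are hypothetical), the following Python sketch verifies Gibbs' inequality (3.28) and the log₂ n bound for randomly generated distributions.

```python
import random
import math

def shannon_entropy(p):
    """Shannon entropy (in bits) of a probability distribution given as a list."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_term(p, q):
    """Right-hand side of Eq. (3.28): -sum_i p_i log2 q_i."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def random_distribution(n):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

n = 5
for _ in range(1000):
    p = random_distribution(n)
    q = random_distribution(n)
    # Gibbs' theorem: S(p) <= -sum_i p_i log2 q_i
    assert shannon_entropy(p) <= cross_term(p, q) + 1e-9
    # Choosing q_i = 1/n yields the upper bound log2 n
    assert shannon_entropy(p) <= math.log2(n) + 1e-9

print(shannon_entropy([1 / n] * n), math.log2(n))  # both equal log2 5 ~ 2.3219
```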
Let us now examine Shannon entropies of joint, marginal, and conditional probability distributions defined on sets X and Y. In agreement with a common practice in the literature dealing with the Shannon entropy, we simplify the notation in the rest of this section by using S(X) instead of S(pX(x) | x ∈ X) or S(p1, p2, . . . , pn). Furthermore, assuming x ∈ X and y ∈ Y, we use the symbols pX(x) and pY(y) to denote marginal probabilities on sets X and Y, respectively, the symbol p(x, y) for joint probabilities on X × Y, and the symbols p(x | y) and p(y | x) for the corresponding conditional probabilities. In this simplified notation for conditional probabilities, the meaning of each symbol is uniquely determined by the arguments shown in the parentheses.
Given two sets X and Y, which may be viewed, in general, as state sets of random variables X and Y, respectively, we can recognize the following three types of Shannon entropies:
1. A joint entropy defined in terms of the joint probability distribution on X × Y:

$$S(X \times Y) = -\sum_{\langle x, y\rangle \in X \times Y} p(x, y)\log_2 p(x, y) \tag{3.29}$$

2. Two simple entropies based on marginal probability distributions:
$$S(X) = -\sum_{x \in X} p_X(x)\log_2 p_X(x), \tag{3.30}$$
$$S(Y) = -\sum_{y \in Y} p_Y(y)\log_2 p_Y(y). \tag{3.31}$$
3. Two conditional entropies defined in terms of weighted averages of local conditional probabilities:

$$S(X \mid Y) = -\sum_{y \in Y} p_Y(y)\sum_{x \in X} p(x \mid y)\log_2 p(x \mid y), \tag{3.32}$$
$$S(Y \mid X) = -\sum_{x \in X} p_X(x)\sum_{y \in Y} p(y \mid x)\log_2 p(y \mid x). \tag{3.33}$$
In addition to these three types of Shannon entropies, the functional

$$T_S(X, Y) = S(X) + S(Y) - S(X \times Y) \tag{3.34}$$

is often used in the literature as a measure of the strength of the relationship (in the probabilistic sense) between elements of the sets X and Y. This functional is called an information transmission. It is analogous to the functional defined by Eq. (2.31) for the Hartley measure and can be generalized to more than two sets in the same way.
It remains to examine the relationship among the various types of entropies
and the information transmission. The key properties of this relationship are
expressed by the next several theorems.
Theorem 3.3
$$S(X \mid Y) = S(X \times Y) - S(Y). \tag{3.35}$$
Proof

$$\begin{aligned}
S(X \mid Y) &= -\sum_{y \in Y} p_Y(y)\sum_{x \in X} p(x \mid y)\log_2 p(x \mid y)\\
&= -\sum_{y \in Y} p_Y(y)\sum_{x \in X}\frac{p(x, y)}{p_Y(y)}\log_2\frac{p(x, y)}{p_Y(y)}\\
&= -\sum_{y \in Y}\sum_{x \in X} p(x, y)\log_2\frac{p(x, y)}{p_Y(y)}\\
&= -\sum_{y \in Y}\sum_{x \in X} p(x, y)\log_2 p(x, y) + \sum_{y \in Y}\sum_{x \in X} p(x, y)\log_2 p_Y(y)\\
&= S(X \times Y) + \sum_{y \in Y}\log_2 p_Y(y)\sum_{x \in X} p(x, y)\\
&= S(X \times Y) + \sum_{y \in Y} p_Y(y)\log_2 p_Y(y)\\
&= S(X \times Y) - S(Y).
\end{aligned}$$

∎
The same theorem can obviously be proved for the conditional entropy of Y given X as well:

$$S(Y \mid X) = S(X \times Y) - S(X). \tag{3.36}$$

The theorem can be generalized to more than two sets. The general form, which can be derived from either Eq. (3.35) or Eq. (3.36), is

$$S(X_1 \times X_2 \times \cdots \times X_n) = S(X_1) + S(X_2 \mid X_1) + S(X_3 \mid X_1 \times X_2) + \cdots + S(X_n \mid X_1 \times X_2 \times \cdots \times X_{n-1}). \tag{3.37}$$
This equation is valid for any permutation of the sets involved.
Theorem 3.4
$$S(X \times Y) \le S(X) + S(Y). \tag{3.38}$$
Proof

$$\begin{aligned}
S(X) &= -\sum_{x \in X} p_X(x)\log_2 p_X(x) = -\sum_{x \in X}\sum_{y \in Y} p(x, y)\log_2\sum_{y \in Y} p(x, y),\\
S(Y) &= -\sum_{y \in Y} p_Y(y)\log_2 p_Y(y) = -\sum_{y \in Y}\sum_{x \in X} p(x, y)\log_2\sum_{x \in X} p(x, y),\\
S(X) + S(Y) &= -\sum_{x \in X}\sum_{y \in Y} p(x, y)\Bigl[\log_2\sum_{y \in Y} p(x, y) + \log_2\sum_{x \in X} p(x, y)\Bigr]\\
&= -\sum_{\langle x, y\rangle \in X \times Y} p(x, y)\bigl[\log_2 p_X(x) + \log_2 p_Y(y)\bigr]\\
&= -\sum_{\langle x, y\rangle \in X \times Y} p(x, y)\log_2\bigl[p_X(x)\cdot p_Y(y)\bigr].
\end{aligned}$$
By Gibbs’ theorem we have
$$S(X \times Y) = -\sum_{\langle x, y\rangle \in X \times Y} p(x, y)\log_2 p(x, y) \le -\sum_{\langle x, y\rangle \in X \times Y} p(x, y)\log_2\bigl[p_X(x)\cdot p_Y(y)\bigr] = S(X) + S(Y).$$
Hence S(X × Y) ≤ S(X) + S(Y); furthermore (again by Gibbs' theorem), the equality holds if and only if

$$p(x, y) = p_X(x)\cdot p_Y(y),$$

which means that the random variables whose state sets are X and Y are noninteractive. ∎
Theorem 3.4 can easily be generalized to more than two sets. Its general
form is
$$S(X_1 \times X_2 \times \cdots \times X_n) \le \sum_{i=1}^{n} S(X_i), \tag{3.39}$$

which holds for every n ∈ ℕ.
Theorem 3.5
$$S(X) \ge S(X \mid Y). \tag{3.40}$$
Proof. From Theorem 3.3,

$$S(X \mid Y) = S(X \times Y) - S(Y),$$

and from Theorem 3.4,

$$S(X \times Y) \le S(X) + S(Y).$$

Hence,

$$S(X \mid Y) + S(Y) \le S(X) + S(Y),$$

and the inequality S(X | Y) ≤ S(X) follows immediately. ∎
Exchanging X and Y in Theorem 3.5, we obtain
$$S(Y) \ge S(Y \mid X).$$
Additional equations expressing the relationships among the various
entropies and the information transmission can be obtained by simple formula
manipulations with the aid of the key properties in Theorems 3.3 through 3.5. For example, when we substitute for S(X × Y) from Eq. (3.35) into Eq. (3.34), we obtain

$$T_S(X, Y) = S(X) - S(X \mid Y); \tag{3.41}$$

similarly, by substituting Eq. (3.36) into Eq. (3.34), we obtain

$$T_S(X, Y) = S(Y) - S(Y \mid X). \tag{3.42}$$

By comparing Eqs. (3.41) and (3.42), we also obtain

$$S(X) - S(Y) = S(X \mid Y) - S(Y \mid X). \tag{3.43}$$
For each type of the Shannon entropy, S, the normalized counterpart, NS, is calculated by dividing the respective entropy by its maximum value. Thus, for example,

$$NS(X) = \frac{S(X)}{\log_2 |X|}, \tag{3.44}$$
$$NS(X \times Y) = \frac{S(X \times Y)}{\log_2(|X|\cdot|Y|)}, \tag{3.45}$$
$$NS(X \mid Y) = \frac{S(X \mid Y)}{\log_2 |X|}. \tag{3.46}$$

The range of each of these counterparts is, of course, [0, 1]. The maximum value, T̂_S(X, Y), of information transmission associated with joint probability distributions on X × Y can be derived in a similar way as its possibilistic counterpart (Eq. (2.34)). It is given by the formula

$$\hat{T}_S(X, Y) = \min\{\log_2 |X|,\ \log_2 |Y|\}. \tag{3.47}$$

Then,

$$NT_S(X, Y) = \frac{T_S(X, Y)}{\hat{T}_S(X, Y)}. \tag{3.48}$$
3.2.4. Examples
The purpose of this section is to illustrate the various properties and applications of the Shannon entropy by simple examples, some of which are probabilistic counterparts of examples in Chapter 2.
Table 3.1. Illustration to Example 3.2

(a) Joint probabilities p(x, y):

  x  y   p(x, y)
  0  0   0.7
  0  1   0.2
  1  0   0.0
  1  1   0.1

(b) Marginal probabilities:

  x   pX(x)        y   pY(y)
  0   0.9          0   0.7
  1   0.1          1   0.3

(c) Product of the marginals pX(x) · pY(y):

  x  y   pX(x) · pY(y)
  0  0   0.63
  0  1   0.27
  1  0   0.07
  1  1   0.03
EXAMPLE 3.2. Consider two variables, X and Y, whose states are 0 or 1 and whose joint probabilities p(x, y) on X × Y = {0, 1}² are specified in Table 3.1a. The uncertainty associated with these joint probabilities is determined by the Shannon entropy

$$S(X \times Y) = -0.7\log_2 0.7 - 0.2\log_2 0.2 - 0.1\log_2 0.1 = 1.16.$$

The marginal probabilities pX(x) and pY(y), calculated by Eqs. (3.5) and (3.6), are shown in Table 3.1b. Their uncertainties are

$$S(X) = -0.9\log_2 0.9 - 0.1\log_2 0.1 = 0.47,$$
$$S(Y) = -0.7\log_2 0.7 - 0.3\log_2 0.3 = 0.88.$$

The conditional uncertainties can now be calculated by Eqs. (3.35) and (3.36):

$$S(X \mid Y) = S(X \times Y) - S(Y) = 0.28,$$
$$S(Y \mid X) = S(X \times Y) - S(X) = 0.69.$$

Moreover, the information transmission, which expresses the strength of the relationship between the variables, can be calculated by Eq. (3.34):

$$T_S(X, Y) = S(X) + S(Y) - S(X \times Y) = 0.19.$$
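The calculations in this example are easy to reproduce. The following Python sketch (an illustration added here, with hypothetical helper names) recomputes the entropies and the information transmission from the data of Table 3.1a.

```python
import math

def S(dist):
    """Shannon entropy (bits) of a distribution given as a dict or list of probabilities."""
    values = dist.values() if isinstance(dist, dict) else dist
    return -sum(p * math.log2(p) for p in values if p > 0)

# Joint probabilities from Table 3.1a
joint = {(0, 0): 0.7, (0, 1): 0.2, (1, 0): 0.0, (1, 1): 0.1}

# Marginal distributions (Table 3.1b)
pX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

SXY, SX, SY = S(joint), S(pX), S(pY)
print(round(SXY, 2), round(SX, 2), round(SY, 2))   # 1.16 0.47 0.88
print(round(SXY - SY, 2), round(SXY - SX, 2))      # S(X|Y) = 0.28, S(Y|X) = 0.69
print(round(SX + SY - SXY, 2))                     # T_S(X, Y) = 0.19
```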
EXAMPLE 3.3. Consider the same variables as in Example 3.2. However, only their marginal probabilities, given in Table 3.1b, are known. Assume in this example that the variables are independent. Since probabilistic independence is equivalent to probabilistic noninteraction, as shown in Section 3.1.1, we can calculate their joint probability distribution based on this assumption by Eq. (3.7). This joint distribution is shown in Table 3.1c. The uncertainty, S_ind, based on the assumption of independence is thus readily calculated as

$$S_{ind}(X \times Y) = -0.63\log_2 0.63 - 0.27\log_2 0.27 - 0.07\log_2 0.07 - 0.03\log_2 0.03 = 1.35.$$

Observe that S_ind(X × Y) − S(X × Y) = 0.19. This means that 0.19 bits of information are gained when we know the actual joint probability distribution in Table 3.1a.
EXAMPLE 3.4. Consider three variables, X, Y, Z, whose states are in the sets X = Y = {0, 1} and Z = {0, 1, 2}, respectively. The joint probabilities on X × Y × Z are given in Table 3.2a. In this case, there are six distinct conditional uncertainties and four distinct information transmissions. To calculate them, we need
Table 3.2. Illustration to Example 3.4

(a) Joint probabilities p(x, y, z):

  x  y  z   p(x, y, z)
  0  0  0   0.05
  0  1  0   0.10
  0  0  2   0.22
  1  0  0   0.05
  1  0  1   0.20
  1  0  2   0.10
  1  1  1   0.08
  1  1  2   0.20

(b) Two-variable marginal probabilities:

  x  y   pXY(x, y)     x  z   pXZ(x, z)     y  z   pYZ(y, z)
  0  0   0.27          0  0   0.15          0  0   0.10
  0  1   0.10          0  2   0.22          0  1   0.20
  1  0   0.35          1  0   0.05          0  2   0.32
  1  1   0.28          1  1   0.28          1  0   0.10
                       1  2   0.30          1  1   0.08
                                            1  2   0.20

(c) One-variable marginal probabilities:

  x   pX(x)       y   pY(y)       z   pZ(z)
  0   0.37        0   0.62        0   0.20
  1   0.63        1   0.38        1   0.28
                                  2   0.52

(d) Shannon entropies:

  S(X × Y × Z) = 2.80     S(X) = 0.95
  S(X × Y) = 1.89         S(Y) = 0.96
  S(X × Z) = 2.14         S(Z) = 1.47
  S(Y × Z) = 2.41
to determine all two-variable marginal probability distributions (shown in Table 3.2b) and all one-variable marginal probability distributions (shown in Table 3.2c). Values of the Shannon entropy for all probability distributions in Table 3.2(a)–(c) are shown in Table 3.2d. These values form the basis from which all the conditional uncertainties and information transmissions are calculated as follows:

$$S(X \mid Y \times Z) = S(X \times Y \times Z) - S(Y \times Z) = 0.39$$
$$S(Y \mid X \times Z) = S(X \times Y \times Z) - S(X \times Z) = 0.66$$
$$S(Z \mid X \times Y) = S(X \times Y \times Z) - S(X \times Y) = 0.91$$
$$S(X \times Y \mid Z) = S(X \times Y \times Z) - S(Z) = 1.33$$
$$S(X \times Z \mid Y) = S(X \times Y \times Z) - S(Y) = 1.84$$
$$S(Y \times Z \mid X) = S(X \times Y \times Z) - S(X) = 1.85$$
$$T(X \times Y, Z) = S(X \times Y) + S(Z) - S(X \times Y \times Z) = 0.56$$
$$T(X \times Z, Y) = S(X \times Z) + S(Y) - S(X \times Y \times Z) = 0.30$$
$$T(Y \times Z, X) = S(Y \times Z) + S(X) - S(X \times Y \times Z) = 0.56$$
$$T(X, Y, Z) = S(X) + S(Y) + S(Z) - S(X \times Y \times Z) = 0.58.$$
EXAMPLE 3.5. This example is in some sense a probabilistic counterpart of
the simple nondeterministic dynamic system discussed in possibilistic terms
in Examples 2.1 and 2.2. The subject here is a simple probabilistic dynamic
system with state set X = {x1, x2, x3}. State transitions of the system occur only
at specified discrete times and are fully determined for each initial probability distribution on X by the conditional probabilities specified by the matrix
or the diagram in Figure 3.3a and 3.3b, respectively.
$$M = \begin{bmatrix} 0.0 & 0.9 & 0.1\\ 0.2 & 0.0 & 0.8\\ 0.0 & 0.6 & 0.4 \end{bmatrix}$$

[Figure 3.3. Simple probabilistic system discussed in Example 3.5: (a) the matrix M = [m_ij] of conditional probabilities of the next state given the present state, with rows corresponding to the present states x1, x2, x3 and columns to the next states; (b) the equivalent state-transition diagram, with an arrow marking the initial state x1.]
To describe how the system behaves, let t = 1, 2, . . . denote the discrete times at which state transitions occur, let p(ᵗxᵢ) denote the probability of state xᵢ at time t, and let

$${}^{t}p = \langle p({}^{t}x_i) \mid x_i \in X\rangle$$

denote the probability distribution of all states of the system at time t. Furthermore, let M = [m_ij] denote the matrix of conditional probabilities p(ᵗ⁺¹xⱼ | ᵗxᵢ) for all pairs ⟨xᵢ, xⱼ⟩ ∈ X², which are independent of t. That is,

$$m_{ij} = p({}^{t+1}x_j \mid {}^{t}x_i)$$

for all i, j ∈ ℕ₃ and all t ∈ ℕ.
Given the probability distribution ᵗp at some time t, the system is capable of predicting probability distributions at time t + k (k = 1, 2, . . .) or probability distributions of sequences of future states of some lengths. The Shannon entropy of each of these distributions measures the amount of uncertainty in the respective prediction. We can also measure the amount of information contained in each prediction made by the system (the predictive informativeness of the system). For each prediction type, this is the difference between the maximum predictive uncertainty allowed by the framework of the system and the actual predictive uncertainty. The maximum predictive uncertainty is obtained for the state-transition matrix, M̂ = [m̂_ij], in which each row is a uniform probability distribution. In our case m̂_ij = 1/3 for all i, j ∈ ℕ₃.
To illustrate the calculations of predictive uncertainty and predictive informativeness for the various prediction types, let us assume that the system is
in state x1 at time t (as indicated in Figure 3.3 by the arrow pointed at x1).
This is formally expressed as ᵗp = ⟨1, 0, 0⟩. Maximum and actual uncertainties
for some predictions are given in Figure 3.4. The diagram, which contains all
sequences of states with nonzero probabilities of length 4 or less, also shows
probabilities of individual states at each of the considered times. Each of the
arrows under the diagram indicates the time at which the prediction is made
and the time span of the prediction. Each of the first four arrows is a prediction about the next-time probability distributions made at different times. The
next three arrows indicate predictions made at time t about sequences of states
of lengths 2, 3, and 4. The last three arrows indicate predictions made at time
t about probability distributions at time t + 2, t + 3, and t + 4. The two numbers
on top of each arrow indicate the two uncertainties needed for calculating the
informativeness of the prediction, the maximum one and the actual one. Let
us follow in detail the calculation of some of these uncertainties.
Using Figure 3.4 as a guide, the next state prediction made at t + 2 is calculated by the formula
$${}^{t+3}p = {}^{t+2}p \times M.$$
[Figure 3.4. Predictive uncertainties of the system defined in Example 3.5. The diagram shows all sequences of states of length 4 or less with nonzero probabilities, together with the probabilities of the individual states at times t, t + 1, . . . , t + 4. Each arrow under the diagram indicates the time at which a prediction is made and its time span, labeled with the corresponding pair of maximum and actual uncertainties (e.g., 1.585 and 0.469 for the next-state prediction made at time t; 4.755 and 2.082 for the prediction, made at time t, of the sequence of the next three states).]
Substituting for ᵗ⁺²p and M, we obtain

$${}^{t+3}p = [0.18,\ 0.06,\ 0.76] \times \begin{bmatrix} 0.0 & 0.9 & 0.1\\ 0.2 & 0.0 & 0.8\\ 0.0 & 0.6 & 0.4 \end{bmatrix} = [0.012,\ 0.618,\ 0.370].$$

Its uncertainty is measured by the conditional Shannon entropy

$$S\bigl[p({}^{t+3}x_j \mid {}^{t+2}x_i) \mid i, j \in \mathbb{N}_3\bigr] = 0.18\cdot S(0.0, 0.9, 0.1) + 0.06\cdot S(0.2, 0.0, 0.8) + 0.76\cdot S(0.0, 0.6, 0.4) = 0.866,$$
[Figure 3.5. Probabilities of sequences of states of length 3 in Example 3.5. The eight sequences ⟨ᵗ⁺¹x, ᵗ⁺²x, ᵗ⁺³x⟩ with nonzero probabilities are: (x2, x1, x2): 0.162; (x2, x1, x3): 0.018; (x2, x3, x2): 0.432; (x2, x3, x3): 0.288; (x3, x2, x1): 0.012; (x3, x2, x3): 0.048; (x3, x3, x2): 0.024; (x3, x3, x3): 0.016.]
which is calculated here by using Eq. (3.32). Its maximum counterpart, Ŝ, is
$$\hat{S}\bigl[\hat{p}({}^{t+3}x_j \mid {}^{t+2}x_i) \mid i, j \in \mathbb{N}_3\bigr] = 0.18\cdot S(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}) + 0.06\cdot S(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}) + 0.76\cdot S(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}) = \log_2 3 = 1.585.$$
Now consider the prediction made at time t of sequences of states of length 3. There are, of course, 3³ = 27 such sequences, but only 8 of them have nonzero probabilities; these are shown in Figure 3.5. Probabilities of these sequences are calculated by the formula

$$p({}^{t+1}x_i, {}^{t+2}x_j, {}^{t+3}x_k) = p({}^{t}x_1)\times p({}^{t+1}x_i \mid {}^{t}x_1)\times p({}^{t+2}x_j \mid {}^{t+1}x_i)\times p({}^{t+3}x_k \mid {}^{t+2}x_j)$$

for all i, j, k ∈ ℕ₃. For example,

$$p({}^{t+1}x_2, {}^{t+2}x_3, {}^{t+3}x_2) = 1 \times 0.9 \times 0.8 \times 0.6 = 0.432.$$

The amount of uncertainty in predicting at time t the sequence of states at times t + 1, t + 2, t + 3 is measured by the Shannon entropy of the probability distribution obtained for the sequences and shown in Figure 3.5. Its value is 2.082. Since there are 27 possible sequences of states of length 3, the associated maximum uncertainty is clearly equal to log₂ 27 = 4.755.
Predicting at time t the sequence of states at times t + 1, t + 2, and t + 3 is, of course, very different from predicting the state at time t + 3. The latter prediction is based on the probabilities

$$p({}^{t}x_i)\cdot p({}^{t+3}x_k \mid {}^{t}x_i)$$

for all i, k ∈ ℕ₃. Since p(ᵗx₁) = 1 in our case, the only relevant probabilities are

$$p({}^{t+3}x_1 \mid {}^{t}x_1) = 0.012,\qquad p({}^{t+3}x_2 \mid {}^{t}x_1) = 0.618,\qquad p({}^{t+3}x_3 \mid {}^{t}x_1) = 0.370.$$

The predictive uncertainty is thus equal to S(0.012, 0.618, 0.370) = 1.036, and its maximum counterpart is equal to log₂ 3 = 1.585.
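The predictive uncertainties discussed in this example can be reproduced with a short script. The following Python sketch (an added illustration; the matrix M is the one in Figure 3.3) propagates the state distribution and evaluates the entropies of state and sequence predictions.

```python
import math

def S(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

M = [[0.0, 0.9, 0.1],   # rows: present state x1, x2, x3
     [0.2, 0.0, 0.8],   # columns: next state x1, x2, x3
     [0.0, 0.6, 0.4]]

def step(p, M):
    """One state transition: next distribution = p x M."""
    return [sum(p[i] * M[i][j] for i in range(3)) for j in range(3)]

p = [1.0, 0.0, 0.0]           # the system is in state x1 at time t
dists = [p]
for _ in range(4):            # distributions at t+1, ..., t+4
    p = step(p, M)
    dists.append(p)

# Uncertainty of predicting the state k steps ahead (prediction made at time t)
for k, d in enumerate(dists[1:], start=1):
    print(k, [round(x, 3) for x in d], round(S(d), 3))
# k=2 gives (0.18, 0.06, 0.76) with S=0.99; k=3 gives (0.012, 0.618, 0.370) with S=1.036

# Uncertainty of predicting the whole sequence of the next 3 states
seq_probs = []
def walk(state, prob, depth):
    if depth == 3:
        seq_probs.append(prob)
        return
    for nxt, m in enumerate(M[state]):
        if m > 0:
            walk(nxt, prob * m, depth + 1)
walk(0, 1.0, 0)
print(round(S(seq_probs), 3))   # 2.082, versus the maximum log2 27 = 4.755
```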
EXAMPLE 3.6. Let the set X = {x1, x2, x3, x4} with the probability distribution

$$p = \langle p_1 = 0.25,\ p_2 = 0.5,\ p_3 = 0.125,\ p_4 = 0.125\rangle$$

be given, where pᵢ denotes the probability of xᵢ for all i ∈ ℕ₄. Consider the four branching schemes specified in Figure 3.6 for calculating the uncertainty of this probability distribution. By the branching property of the Shannon entropy, the resulting uncertainty should be the same regardless of which of the branching schemes we use. Let us perform and compare the four schemes of calculating the uncertainty.
Scheme I. According to this scheme, we calculate the uncertainty directly:

$$S(p) = -0.25\log_2 0.25 - 0.5\log_2 0.5 - 2\times 0.125\log_2 0.125 = 0.5 + 0.5 + 0.75 = 1.75.$$

Scheme II

$$S(p) = S(p_A, p_B) + p_A\cdot S(p_1/p_A,\ p_2/p_A) + p_B\cdot S(p_3/p_B,\ p_4/p_B) = S\bigl(\tfrac{3}{4}, \tfrac{1}{4}\bigr) + 0.75\cdot S\bigl(\tfrac{1}{3}, \tfrac{2}{3}\bigr) + 0.25\cdot S\bigl(\tfrac{1}{2}, \tfrac{1}{2}\bigr) = 0.811 + 0.689 + 0.25 = 1.75.$$

Scheme III

$$S(p) = S(p_1, p_A) + p_A\cdot S(p_2/p_A,\ p_3/p_A,\ p_4/p_A) = S\bigl(\tfrac{1}{4}, \tfrac{3}{4}\bigr) + 0.75\cdot S\bigl(\tfrac{2}{3}, \tfrac{1}{6}, \tfrac{1}{6}\bigr) = 0.811 + 0.939 = 1.75.$$
[Figure 3.6. Application of the branching property of Shannon entropy. Scheme I uses the distribution (p1, p2, p3, p4) directly. Scheme II groups pA = p1 + p2 = 3/4 and pB = p3 + p4 = 1/4. Scheme III groups pA = p2 + p3 + p4 = 3/4 alongside p1. Scheme IV groups pA = p2 + p3 + p4 = 3/4 and, within it, pB = p3 + p4.]
Scheme IV

$$S(p) = S(p_1, p_A) + p_A\cdot S(p_2/p_A,\ p_B/p_A) + p_B\cdot S(p_3/p_B,\ p_4/p_B) = S\bigl(\tfrac{1}{4}, \tfrac{3}{4}\bigr) + 0.75\cdot S\bigl(\tfrac{2}{3}, \tfrac{1}{3}\bigr) + 0.25\cdot S\bigl(\tfrac{1}{2}, \tfrac{1}{2}\bigr) = 0.811 + 0.689 + 0.25 = 1.75.$$
These results thus demonstrate that the uncertainty can be calculated in terms
of any branching scheme. There are, of course, many additional branching
schemes in this example, each of which can be employed for calculating the
uncertainty and each of which must lead to the same result.
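As an illustration (not from the original text), the following Python sketch evaluates the four branching schemes of Figure 3.6 and confirms that they all yield 1.75 bits.

```python
import math

def S(*p):
    """Shannon entropy (bits) of the probabilities passed as arguments."""
    return -sum(x * math.log2(x) for x in p if x > 0)

p1, p2, p3, p4 = 0.25, 0.5, 0.125, 0.125

scheme_I = S(p1, p2, p3, p4)

pA, pB = p1 + p2, p3 + p4
scheme_II = S(pA, pB) + pA * S(p1/pA, p2/pA) + pB * S(p3/pB, p4/pB)

pA = p2 + p3 + p4
scheme_III = S(p1, pA) + pA * S(p2/pA, p3/pA, p4/pA)

pA, pB = p2 + p3 + p4, p3 + p4
scheme_IV = S(p1, pA) + pA * S(p2/pA, pB/pA) + pB * S(p3/pB, p4/pB)

print(scheme_I, scheme_II, scheme_III, scheme_IV)   # all equal 1.75
```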
3.3. SHANNON-LIKE MEASURE OF UNCERTAINTY FOR
INFINITE SETS
One important aspect of the Shannon entropy remains to be discussed. This
aspect concerns its restriction to finite sets. Is this restriction necessary? It is
suggestive that the function
$$B\bigl(q(x) \mid x \in [a, b]\bigr) = -\int_a^b q(x)\log_2 q(x)\,dx, \tag{3.49}$$
where q denotes a probability density function on the interval [a, b] of real
numbers, could be viewed as the counterpart of the Shannon entropy in the
domain of real numbers. Indeed, the form of this functional, usually referred
to as a Boltzmann entropy or a differential entropy, is analogous to the form
of the Shannon entropy. The former is obtained from the latter by replacing
summation with integration and a probability distribution function with a
probability density function. Notwithstanding this analogy, the following question cannot be avoided: Is the Boltzmann entropy a genuine counterpart of the Shannon entropy? To answer this nontrivial question, we must establish a connection between the two functionals.
Let q be a probability density function on the interval [a, b] of real numbers. That is, q(x) ≥ 0 for all x ∈ [a, b] and

$$\int_a^b q(x)\,dx = 1. \tag{3.50}$$
Consider a sequence of probability distributions ⁿp = ⟨ⁿp₁, ⁿp₂, . . . , ⁿpₙ⟩ such that

$${}^{n}p_i = \int_{x_{i-1}}^{x_i} q(x)\,dx \tag{3.51}$$

for every i ∈ ℕₙ (and every n ∈ ℕ), where

$$x_i = a + i\,\frac{b-a}{n}$$

for each i ∈ ℕₙ, and x₀ = a by convention. For convenience, let

$$\Delta_n = \frac{b-a}{n},$$

so that xᵢ = a + iΔₙ.
For each probability distribution ⁿp = ⟨ⁿp₁, ⁿp₂, . . . , ⁿpₙ⟩, let ⁿd(x) denote a probability density function on [a, b] such that

$${}^{n}d(x) = {}^{n}d_i(x) \quad\text{for } i \in \mathbb{N}_n,$$

where

$${}^{n}d_i(x) = \frac{{}^{n}p_i}{\Delta_n} \quad\text{for } x \in [x_{i-1}, x_i) \tag{3.52}$$

for all i ∈ ℕₙ. Then, due to the continuity of q(x), the sequence ⟨ⁿd(x) | n ∈ ℕ⟩ converges to q(x) uniformly on [a, b].
Given the probability distribution ⁿp for some n ∈ ℕ, its Shannon entropy is

$$S({}^{n}p) = -\sum_{i=1}^{n} {}^{n}p_i\log_2 {}^{n}p_i$$

or, using the introduced probability density function ⁿd,

$$S({}^{n}p) = -\sum_{i=1}^{n} {}^{n}d_i(x)\Delta_n\log_2\bigl[{}^{n}d_i(x)\Delta_n\bigr].$$
This equation can be modified as follows:

$$S({}^{n}p) = -\sum_{i=1}^{n} {}^{n}d_i(x)\Delta_n\log_2 {}^{n}d_i(x) - \sum_{i=1}^{n} {}^{n}d_i(x)\Delta_n\log_2\Delta_n = -\sum_{i=1}^{n}\bigl[{}^{n}d_i(x)\log_2 {}^{n}d_i(x)\bigr]\Delta_n - \log_2\Delta_n\sum_{i=1}^{n} {}^{n}p_i.$$

Since the probabilities ⁿpᵢ of the distribution ⁿp must add to one, and by the definition of Δₙ, we obtain

$$S({}^{n}p) = -\sum_{i=1}^{n}\bigl[{}^{n}d_i(x)\log_2 {}^{n}d_i(x)\bigr]\Delta_n + \log_2\frac{n}{b-a}. \tag{3.53}$$
When n → ∞ (or Δₙ → 0), we have

$$\lim_{n\to\infty}\;-\sum_{i=1}^{n}\bigl[{}^{n}d_i(x)\log_2 {}^{n}d_i(x)\bigr]\Delta_n = -\int_a^b q(x)\log_2 q(x)\,dx$$

according to the introduced relation among ⁿp, q(x), and ⁿdᵢ(x), in particular Eqs. (3.51) and (3.52). Equation (3.53) can thus be written for n → ∞ as

$$\lim_{n\to\infty} S({}^{n}p) = B(q(x)) + \lim_{n\to\infty}\log_2\frac{n}{b-a}. \tag{3.54}$$

The last term in this equation clearly diverges. This means that the Boltzmann entropy is not a limit of the Shannon entropy for n → ∞ and, consequently, it is not a measure of uncertainty and information.
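This divergence is easy to observe numerically. The following Python sketch (an added illustration, assuming the particular density q(x) = 2x on [0, 1], which is not from the text) shows that S(ⁿp) grows without bound while S(ⁿp) − log₂(n/(b − a)) converges to the Boltzmann entropy B(q) ≈ −0.279.

```python
import math

a, b = 0.0, 1.0
q = lambda x: 2.0 * x        # probability density on [0, 1]
Q = lambda x: x * x          # its cumulative distribution function

def discretized_entropy(n):
    """Shannon entropy of the distribution np obtained by Eq. (3.51)."""
    probs = [Q(a + (i + 1) * (b - a) / n) - Q(a + i * (b - a) / n) for i in range(n)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

for n in (10, 100, 1000, 10000):
    Sn = discretized_entropy(n)
    print(n, round(Sn, 4), round(Sn - math.log2(n / (b - a)), 4))
# Sn grows without bound; Sn - log2 n approaches B(q) = 0.5/ln 2 - 1 ~ -0.2787
```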
The discrepancy between the Shannon and Boltzmann entropies can be reconciled in a modified form of the Boltzmann entropy,

$$\hat{B}\bigl[q(x), r(x) \mid x \in [a, b]\bigr] = \int_a^b q(x)\log_2\frac{q(x)}{r(x)}\,dx, \tag{3.55}$$
which involves two probability density functions, q(x) and r(x), defined on [a, b]. If only q is given, it is convenient to use

$$r(x) = \frac{1}{b-a},$$

which is the probability density function corresponding to the uniform probability distribution on [a, b]. The finite counterpart of B̂ is the functional

$$\hat{S}\bigl[p(x), p'(x) \mid x \in X\bigr] = \sum_{x \in X} p(x)\log_2\frac{p(x)}{p'(x)}. \tag{3.56}$$
This functional, which is known in classical information theory as a cross-entropy or a directed divergence, measures uncertainty in relative rather than absolute terms. When p′ is the uniform probability distribution function (p′(x) = 1/|X| for all x ∈ X), then

$$\hat{S}\bigl[p(x), p'(x) \mid x \in X\bigr] = \log_2 |X| - S\bigl[p(x) \mid x \in X\bigr].$$

In this special form, Ŝ clearly measures the amount of information carried by the function p with respect to total ignorance (expressed in probabilistic terms).
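For a concrete check (added here as an illustration, with a small hypothetical three-element distribution), the following Python sketch evaluates the directed divergence of Eq. (3.56) against the uniform distribution and confirms that it equals log₂|X| − S(p).

```python
import math

def shannon(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def directed_divergence(p, p_prime):
    """Cross-entropy (directed divergence) of Eq. (3.56)."""
    return sum(v * math.log2(v / p_prime[x]) for x, v in p.items() if v > 0)

p = {'a': 0.7, 'b': 0.2, 'c': 0.1}
uniform = {x: 1 / 3 for x in p}

lhs = directed_divergence(p, uniform)
rhs = math.log2(3) - shannon(p)
print(round(lhs, 4), round(rhs, 4))   # both ~ 0.4282
```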
When q(x) in Eq. (3.55) is replaced with a density function, q(x, y), of a joint probability distribution on X × Y, and r(x) is replaced with the product of the density functions of the marginal distributions on X and Y, q_X(x)·q_Y(y), B̂ becomes the continuous counterpart of the information transmission given by Eq. (3.34). This means that the continuous counterpart, T_B, of the information transmission can be expressed as

$$T_B\bigl[q(x, y),\ q_X(x)\cdot q_Y(y) \mid x \in [a, b],\ y \in [c, d]\bigr] = \int_a^b\!\!\int_c^d q(x, y)\log_2\frac{q(x, y)}{q_X(x)\cdot q_Y(y)}\,dy\,dx. \tag{3.57}$$

It is well established that this functional is finite when the functions q, q_X, and q_Y are continuous; it is always nonnegative, and it is invariant under linear transformations.
NOTES
3.1. As is well described by Hacking [1975], the concept of numerical probability
emerged in the mid-17th century. However, its adequate formalization was
achieved only in the 20th century by Kolmogorov [1950]. This formalization is
based on the classical measure theory [Halmos, 1950]. The literature dealing with
probability theory and its applications is copious. Perhaps the most comprehensive study of foundations of probability was made by Fine [1973]. Among the enormous number of other books published on the subject, it makes sense to mention
just a few that seem to be significant in various respects: Billingsley [1986], De
Finetti [1974, 1975], Feller [1950, 1966], Gnedenko [1962], Jaynes [2003], Jeffreys
[1939], Reichenbach [1949], Rényi [1970a, b], Savage [1972].
3.2. A justified way of measuring uncertainty and uncertainty-based information in
probability theory was established in a series of papers by Shannon [1948]. These
papers, which are also reprinted in the small book by Shannon and Weaver [1949],
opened a way for developing the classical probability-based information theory.
Among the many books providing general coverage of the theory, particularly
notable are classical books by Ash [1965], Billingsley [1965], Csiszár and Körner
[1981], Feinstein [1958], Goldman [1953], Guiasu [1977], Jelinek [1968], Jones
[1979], Khinchin [1957], Kullback [1959], Martin and England [1981], Reza [1961],
and Yaglom and Yaglom [1983], as well as more recent books by Blahut [1987],
Cover and Thomas [1991], Gray [1990], Ihara [1993], Kåhre [2002], Mansuripur
[1987], and Yeung [2002]. The role of information theory in science is well
described in books by Brillouin [1956, 1964] and Watanabe [1969]. Other books
focus on more specific areas, such as economics [Batten, 1983; Georgescu-Roegen,
1971; Theil, 1967], engineering [Bell, 1953; Reza, 1961], chemistry [Eckschlager,
1979], biology [Gatlin, 1972], psychology [Attneave, 1959; Garner, 1962; Quastler,
1955; Weltner, 1973], geography [Webber, 1979], and other areas [Hyvärinen,
1968; Kogan, 1988; Moles, 1966; Yu, 1976]. Useful resources to major papers on
classical information theory that were published in the 20th century are the
books edited by Slepian [1974] and Verdú and McLaughlin [2000]. Claude
Shannon’s contributions to classical information theory are well documented in
[Sloane and Wymer, 1993]. Most current contributions to classical information
theory are published in the IEEE Transactions on Information Theory. Some
additional books on classical information theory, not listed here, are included in
Bibliography.
3.3. Various subsets of the axioms for a probabilistic measure of uncertainty that are
presented in Section 3.2.2. were shown to be sufficient for providing the uniqueness of Shannon entropy by Feinstein [1958], Forte [1975], Khinchin [1957], Rényi
[1970b], and others. The uniqueness proof presented as Theorem 3.1 is adopted
from a book by Ash [1965]. Excellent overviews of the various axiomatic treatments of Shannon entropy can be found in books by Aczél and Daróczy [1975],
Ebanks et al. [1997], and Mathai and Rathie [1975]. All these books are based
heavily on the use of functional equations. An excellent and comprehensive monograph on functional equations was prepared by Aczél [1966].
3.4. Several classes of functionals that subsume the Shannon entropy as a special case
have been proposed and studied. They include:
1. Rényi entropies (also called entropies of degree α), which are defined for all real numbers α ≠ 1 by the formula

$$H_\alpha(p_1, p_2, \ldots, p_n) = \frac{1}{1-\alpha}\log_2\sum_{i=1}^{n} p_i^{\alpha}. \tag{3.58}$$

It is well known that the limit of H_α for α → 1 is the Shannon entropy. For α = 0, we obtain

$$H_0(p_1, p_2, \ldots, p_n) = \log_2\sum_{i=1}^{n} p_i^{0}.$$

This functional represents one of the probabilistic interpretations of Hartley information as a measure that is insensitive to the actual values of the given probabilities and distinguishes only between zero and nonzero probabilities. As the name suggests, Rényi entropies were proposed and investigated by Rényi [1970b].
2. Entropies of order β, introduced by Daróczy [1970], which have the form

$$H^{\beta}(p_1, p_2, \ldots, p_n) = \frac{1}{2^{1-\beta}-1}\Bigl(\sum_{i=1}^{n} p_i^{\beta} - 1\Bigr) \tag{3.59}$$

for all β ≠ 1. As in the case of Rényi entropies, the limit of H^β for β → 1 results in the Shannon entropy.
3. R-norm entropies, which are defined for all R ≠ 1 by the functional

$$H_R(p_1, p_2, \ldots, p_n) = \frac{R}{R-1}\Bigl[1 - \Bigl(\sum_{i=1}^{n} p_i^{R}\Bigr)^{1/R}\Bigr]. \tag{3.60}$$

As in the other two classes of functionals, the limit of H_R for R → 1 is the Shannon entropy. This class of functionals was proposed by Boekee and Van der Lubbe [1980] and was further investigated by Van der Lubbe [1984].
Formulas converting entropies from one class to other classes are well known. Conversion formulas between H_α and H^β were derived by Aczél and Daróczy [1975, p. 185]; formulas for converting H_α and H^β into H_R were derived by Boekee and Van der Lubbe [1980, p. 144].
Except for the Shannon entropy, these classes of functionals are not adequate measures of uncertainty, since each of them violates some essential requirement for such a measure. For example, when α, β, R > 0, Rényi entropies violate subadditivity, entropies of order β violate additivity, and R-norm entropies violate both subadditivity and additivity. The significance of these functionals in the context of information theory is thus primarily theoretical, as they help us better understand the Shannon entropy as a limiting case in these classes of functionals. Strong arguments supporting this claim can be found in papers by Aczél, Forte, and Ng [1974] and Forte [1975].
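The limiting behavior mentioned above can be checked numerically. The following Python sketch (an added illustration with a sample distribution) evaluates the three classes of functionals; note that, under the normalization used in Eq. (3.60), the R-norm entropy approaches the Shannon entropy expressed in natural units (1.75 · ln 2 ≈ 1.213) rather than in bits.

```python
import math

def shannon_bits(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def renyi(p, alpha):        # Eq. (3.58)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)

def daroczy(p, beta):       # Eq. (3.59)
    return (sum(x ** beta for x in p if x > 0) - 1) / (2 ** (1 - beta) - 1)

def r_norm(p, R):           # Eq. (3.60)
    return (R / (R - 1)) * (1 - sum(x ** R for x in p) ** (1 / R))

p = [0.5, 0.25, 0.125, 0.125]
print(shannon_bits(p))                        # 1.75 bits
print(renyi(p, 1.0001), daroczy(p, 1.0001))   # both close to 1.75
print(r_norm(p, 1.0001))                      # close to 1.75 * ln 2 ~ 1.213 (nats)
print(renyi(p, 0))                            # log2 4 = 2.0, the Hartley measure
```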
3.5. Using Theorems 3.3 through 3.5 as a base, numerous theorems regarding the relationship among the information transmission and basic, conditional, and joint
Shannon entropies can be derived by simple algebraic manipulations and by mathematical induction to obtain generalizations. Conant [1981] offers some useful
ideas in this regard. A good summary of practical theorems for Shannon entropy
was prepared by Ashby [1969]. Ashby [1965, 1970, 1972] and Conant [1976, 1988]
also demonstrated the utility of these theorems for analyzing complex systems.
3.6. An excellent examination of the difference between Shannon and Boltzmann
entropy is made by Reza [1961]. This issue is also discussed by Ash [1965], Guiasu
[1977], Jones [1979], and Ihara [1993]. The origin of the concept of entropy in
physics is discussed in detail by Fast [1962]; see also an important early paper by
Elsasser [1937].
3.7. Guiasu [1977, Chapter 4] introduced and studied a generalization of the Shannon entropy to the weighted Shannon entropy,

$$WS\bigl(p(x), w(x) \mid x \in X\bigr) = -\sum_{x \in X} w(x)\,p(x)\log_2 p(x), \tag{3.61}$$

where w(x) are nonnegative numbers called weights. For each alternative x ∈ X, the weight w(x) characterizes the importance (or utility) of the alternative in a given application context. Guiasu showed that the functional WS possesses the following properties:
• WS(p(x), w(x) | x ∈ X) ≥ 0, where the equality is obtained iff p(x) = 1 for one particular x ∈ X.
• WS is subadditive and additive.
• The maximum of WS,

$$WS_{\max} = \lambda + \sum_{x \in X} w(x)\,e^{(1-\lambda)/w(x)}, \tag{3.62}$$

is reached when p(x) = e^{(1−λ)/w(x)} for all x ∈ X, where λ is determined by the equation

$$\sum_{x \in X} e^{(1-\lambda)/w(x)} = 1. \tag{3.63}$$
EXERCISES
3.1. For each of the probability distributions in Table 3.3, calculate the
following:
(a) All conditional uncertainties;
(b) Information transmissions for all partitions of the set of variables;
(c) The reduction of uncertainty (information) with respect to the
uniform probability distribution;
(d) Normalized counterparts of results obtained in parts (a), (b), (c).
Table 3.3. Probability Distributions in Exercise 3.1

  x  y  z   p1(x, y, z)   p2(x, y, z)   p3(x, y, z)   p4(x, y, z)
  0  0  0   0.30          0.00          0.10          0.10
  0  0  1   0.00          0.00          0.00          0.00
  0  1  0   0.00          0.25          0.00          0.20
  0  1  1   0.20          0.25          0.30          0.00
  1  0  0   0.00          0.25          0.40          0.30
  1  0  1   0.10          0.25          0.00          0.00
  1  1  0   0.30          0.00          0.00          0.40
  1  1  1   0.10          0.00          0.20          0.00
Table 3.4. State-Transition Matrix in Exercise 3.3
(rows: state at time t; columns: state at time t + 1)

         x1     x2     x3     x4
  x1     0.2    0.0    0.8    0.0
  x2     0.0    0.0    0.0    1.0
  x3     0.0    0.9    0.0    0.1
  x4     0.5    0.3    0.2    0.0
3.2. Repeat Example 3.5 using the following assumptions:
(a) The state at time t is x2;
(b) The state at time t is x3;
(c) The probability distribution at time t is ᵗp(x1) = 0, ᵗp(x2) = 0.6, ᵗp(x3) = 0.4.
3.3. Consider a system with state set X = {x1, x2, x3, x4} whose transitions from
present states to next states are characterized by the state-transition
matrix of conditional probabilities specified in Table 3.4. Assuming that
the initial probability distribution on the states (at time t) is ᵗp(x1) = 0.5, ᵗp(x2) = 0.3, ᵗp(x3) = 0.2, ᵗp(x4) = 0, determine the following:
(a) Uncertainties in predicting states at time t = 2, 3, 4;
(b) Uncertainty in predicting sequences of states of lengths 2, 3, 4;
(c) Associated normalized uncertainties for parts (a) and (b);
(d) Associated amounts of information contained in the system with
respect to questions regarding predictions in parts (a) and (b).
3.4. Repeat Exercise 3.3 for some other probability distributions on the
states at time t.
3.5. Use some additional branching in Example 3.6 to calculate the value of
the Shannon entropy for the given probability distributions, for example:
(a) pA = p1 + p4, pB = p2 + p3
(b) pA = p1 + p2 + p4, pB = p3
(c) Scheme IV in which p1 is exchanged with p4 and p2 is exchanged with
p3.
[Figure 3.7. Probability density functions in Exercise 3.9: (left) a constant density q(x) = c on the interval [a, b]; (right) a quadratic density q(x) = cx² on the interval [0, b] (with a = 0), reaching the value q(b) at x = b.]
3.6. Derive the generalized form Eq. (3.37) of Eq. (3.35) in Theorem 3.3.
3.7. Derive the generalized form Eq. (3.39) of Eq. (3.38) in Theorem 3.4.
3.8. The so-called Q-factor, which is defined by the equation

$$Q(X \times Y \times Z) = S(X) + S(Y) + S(Z) - S(X \times Y) - S(X \times Z) - S(Y \times Z) + S(X \times Y \times Z),$$

is often used in classical information theory. Express Q(X × Y × Z) solely in terms of the various information transmissions.
3.9. For each of the probability density functions q(x) shown in Figure 3.7,
calculate the Boltzmann entropy and demonstrate that it is negative,
zero, or positive, depending on the values of a, b, c. Note: Remember
that the condition
$$\int_a^b q(x)\,dx = 1$$

must be satisfied.
3.10. Let the graphs in Figure 3.8a and 3.8b represent, respectively, a probability density function q and the associated probability distribution function p. Determine mathematical definitions of both these functions.
3.11. Consider variables X, Y whose states are in sets X, Y, respectively, and that are characterized by joint probabilities p(x, y) for all ⟨x, y⟩ ∈ X × Y. Show that:
(a) X is independent of Y iff Y is independent of X;
(b) X and Y are noninteractive iff X and Y are independent of one another.
3.12. Calculate the posterior probability Pro(A | B) in Example 3.1 under the
following assumptions:
[Figure 3.8. Illustration to Exercise 3.10: (a) the graph of a probability density function q(x) and (b) the graph of the associated probability distribution function p(x), both plotted on the interval [0, 2].]
(a) The outcome of the TST test is negative;
(b) The reliability of the TST test is higher: Pro(B | A) = 0.999 and Pro(B | Ā) = 0.02;
(c) The reliability of the TST test is lower: Pro(B | A) = 0.8 and Pro(B | Ā) = 0.15.
4 GENERALIZED MEASURES AND IMPRECISE PROBABILITIES
An educated mind is satisfied with the degree of precision that the nature of
the subject admits and does not seek exactness where only approximation is
possible.
—Aristotle
4.1. MONOTONE MEASURE
The term “classical information theory” is used in the literature, by and large,
to refer to the theory based on the notion of probability (Chapter 3). Uncertainty functions in this theory are expressed in terms of classical measure
theory, which in turn, is formalized in terms of classical set theory. Generalizing the concept of a classical measure is thus one way of enlarging the framework for a broader treatment of the concept of uncertainty and the associated
concept of uncertainty-based information. The purpose of this chapter is to
discuss this generalization. Further enlargement of the framework, which is
discussed in Chapter 8, is obtained by fuzzifications of classical as well as generalized measures. Basic characteristics of both these generalizations are
depicted in Figure 4.1.
Given a universal set X and a nonempty family C of subsets of X with an appropriate algebraic structure (e.g., a σ-algebra), a classical measure, μ, is a set function of the form

$$\mu : \mathcal{C} \to [0, \infty]$$

that satisfies the following requirements:

(cm1) If ∅ ∈ C (family C is usually assumed to contain ∅), then μ(∅) = 0;
(cm2) For every sequence A₁, A₂, . . . of pairwise disjoint sets of C,

$$\text{if } \bigcup_{i=1}^{\infty} A_i \in \mathcal{C}, \text{ then } \mu\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty}\mu(A_i).$$

[Figure 4.1. Classical information theory and its generalizations. Classical information theory rests on Boolean algebras (classical sets or propositions) and classical measures (additive set functions); its generalizations are obtained by admitting weaker algebras (fuzzy sets or propositions of special types) and generalized measures (monotone set functions with special properties).]
Observe that probability is a classical measure such that C is a σ-algebra and μ(X) = 1.
Property (cm2), which is the distinguishing feature of classical measures, is called countable additivity. A variant of this property, called finite additivity, is defined as follows:
(cm2′) For every finite sequence A₁, A₂, . . . , Aₙ of pairwise disjoint sets of C,

$$\text{if } \bigcup_{i=1}^{n} A_i \in \mathcal{C}, \text{ then } \mu\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \sum_{i=1}^{n}\mu(A_i).$$
It is well known that any countably additive measure is also finitely additive, but not the other way around.
The requirement of additivity (countable or finite) of classical measures is based on the assumption that disjoint sets are noninteractive with respect to the measured property. This assumption is too restrictive in some application contexts. Consider, for example, a set of workers in a workshop whose purpose is to manufacture products of a specific type. Assume that the set is partitioned into subsets (working groups) A₁, A₂, . . . , Aₙ, and let μ(Aᵢ) denote the number of products made by group Aᵢ (i ∈ ℕₙ) within a given unit of time. Then, clearly, any of the following can happen for any two groups Aᵢ, Aⱼ:

• μ(Aᵢ ∪ Aⱼ) = μ(Aᵢ) + μ(Aⱼ) when the groups work separately.
• μ(Aᵢ ∪ Aⱼ) > μ(Aᵢ) + μ(Aⱼ) when the groups work together and their cooperation is efficient.
• μ(Aᵢ ∪ Aⱼ) < μ(Aᵢ) + μ(Aⱼ) when the groups work together and their cooperation is inefficient.
Numerous other examples could be presented to illustrate that the additivity requirement of classical measures severely limits their applicability.
Some examples, relevant to the various issues of uncertainty formalization, are
discussed later in this chapter.
After recognizing that classical measures are too restrictive, it is not obvious
how to generalize them. One possibility is to eliminate the additivity requirement and define generalized measures solely by the requirement (cm1).
Although this sweeping generalization seems too radical, it has been found
useful in some applications. However, its utility for dealing with uncertainty is
questionable. Another possibility is to replace the additivity requirement with
an appropriate weaker requirement. It is generally recognized that the highest
generalization of classical measures that is meaningful for formalizing uncertainty functions is the one that replaces the additivity requirement with a
weaker requirement of monotonicity with respect to the subsethood ordering.
Generalized measures of this kind are called monotone measures. The following is their formal definition.
Given a universal set X and a nonempty family C of subsets of X (usually with an appropriate algebraic structure), a monotone measure, μ, on ⟨X, C⟩ is a function of the type

$$\mu : \mathcal{C} \to [0, \infty]$$

that satisfies the following requirements:

(m1) μ(∅) = 0 (vanishing at the empty set).
(m2) For all A, B ∈ C, if A ⊆ B, then μ(A) ≤ μ(B) (monotonicity).
(m3) For any increasing sequence A₁ ⊆ A₂ ⊆ . . . of sets in C,

$$\text{if } \bigcup_{i=1}^{\infty} A_i \in \mathcal{C}, \text{ then } \lim_{i\to\infty}\mu(A_i) = \mu\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) \quad(\text{continuity from below}).$$

(m4) For any decreasing sequence A₁ ⊇ A₂ ⊇ . . . of sets in C,

$$\text{if } \bigcap_{i=1}^{\infty} A_i \in \mathcal{C}, \text{ then } \lim_{i\to\infty}\mu(A_i) = \mu\Bigl(\bigcap_{i=1}^{\infty} A_i\Bigr) \quad(\text{continuity from above}).$$
Observe that the same symbol, μ, is used for both monotone and additive measures. This does not create any notational confusion, since additive measures are contained in the class of monotone measures. It is just required that the meaning of the symbol be stated explicitly when it stands for some special type of monotone measure, such as an additive measure.
Functions that satisfy requirements (m1), (m2), and either (m3) or (m4) are equally important in the theory of monotone measures. In fact, they are essential for formalizing imprecise probabilities (Section 4.3). These functions are called semicontinuous from below or above, respectively. When the universal set X is finite, requirements (m3) and (m4) are trivially satisfied and may thus be disregarded. If X ∈ C and μ(X) = 1, μ is called a regular monotone measure (or regular semicontinuous monotone measure). Uncertainty functions of any type are always regular monotone measures.
Observe that requirement (m2) defines measures that are actually monotone increasing. By changing the inequality μ(A) ≤ μ(B) in (m2) to μ(A) ≥ μ(B), we can define measures that are monotone decreasing. Both types of monotone measures are useful, even though monotone increasing measures are more common in dealing with uncertainty. Unless specified otherwise, the term "monotone measure" is used in this book to refer to monotone increasing measures that are regular. The utility of monotone decreasing measures is discussed later in the book.
The following inequalities hold for every monotone measure μ: if A, B, A ∪ B ∈ C, then

$$\mu(A \cap B) \le \min\{\mu(A), \mu(B)\}, \tag{4.1}$$
$$\mu(A \cup B) \ge \max\{\mu(A), \mu(B)\}. \tag{4.2}$$

These inequalities follow from the monotonicity of μ and from the facts that A ∩ B ⊆ A and A ∩ B ⊆ B and, similarly, A ∪ B ⊇ A and A ∪ B ⊇ B. If, in addition, either the inequality

$$\mu(A \cup B) \ge \mu(A) + \mu(B) \tag{4.3}$$

or the inequality

$$\mu(A \cup B) \le \mu(A) + \mu(B) \tag{4.4}$$
holds for all A, B, A ∪ B ∈ C such that A ∩ B = ∅, the monotone measure is called superadditive or subadditive, respectively.
It is easy to see that additivity implies monotonicity, but not the other way around. For all A, B, A ∪ B ∈ C such that A ∩ B = ∅, a monotone measure μ is capable of capturing any of the following situations:

(a) μ(A ∪ B) > μ(A) + μ(B), which expresses a cooperative action or synergy between A and B in terms of the measured property.
(b) μ(A ∪ B) = μ(A) + μ(B), which expresses the fact that A and B are noninteractive with respect to the measured property.
(c) μ(A ∪ B) < μ(A) + μ(B), which expresses some sort of inhibitory effect or incompatibility between A and B as far as the measured property is concerned.

Observe that probability theory, which is based on classical measure theory, is capable of capturing only situation (b). This demonstrates that the theory of monotone measures provides us with a considerably broader framework than probability theory for formalizing uncertainty. As a consequence, it allows us to capture types of uncertainty that are beyond the scope of probability theory.
The need for monotone measures arises in many problem areas. One
example is the area of ordinary measurement in physics. While additivity characterizes well many types of measurement under idealized, error-free conditions, it is not fully adequate to characterize most measurements under real,
physical conditions, when measurement errors are unavoidable. To illustrate
this claim by an example, consider two disjoint events, A and B, defined in
terms of adjoining intervals of real numbers, as shown in Figure 4.2a. Observations in close neighborhoods (within a measurement error) of the end points
[Figure 4.2. An example illustrating the violation of the additivity axiom of probability theory: (a) discount rate functions for two disjoint events A and B defined by adjoining intervals of real numbers; (b) the discount rate function for the single event A ∪ B.]
of each event are unreliable and should be properly discounted, for example,
according to the discount rate functions shown in Figure 4.2a. That is, observations in the neighborhoods of the end points should carry less evidence than
those outside them. The closer they are to the end points, the less evidence
they should carry. When measurements are taken for the union of the two
events, as shown in Figure 4.2b, one of the discount rate functions is not applicable. Hence, the same observations produce more evidence for the single event A ∪ B than for the two disjoint events A and B. This implies that the probability of A ∪ B should be greater than the sum of the probabilities of A and B. The additivity requirement is thus violated. To properly formalize this situation, we need to use an appropriate monotone measure that is superadditive.
For some historical reasons of little significance, monotone measures are
often referred to in literature as fuzzy measures. This name is somewhat confusing, since no fuzzy sets are involved in the definition of monotone measures.
To avoid this confusion, the term “fuzzy measures” should be reserved to measures (additive or nonadditive) that are defined on families of fuzzy sets.
Since all monotone measures discussed in the rest of this book are regular,
it is reasonable to omit the adjective “regular.” Therefore, by convention, the
term “monotone measure” refers in the rest of this book to regular monotone
measures. Moreover, it is assumed, unless it is stated otherwise, that the universal set, X, is finite and that C = P(X). That is, it is normally assumed that
the monotone measures of concern are set functions
m : P (X ) Æ [ 0, 1],
where X is a finite set, that satisfy the following requirements:
(m1¢) m(∆) = 0 and m(X) = 1.
(m2¢) For all A, B ŒP(X), if A 債 B, then m(A) £ m(B).
4.2. CHOQUET CAPACITIES
The general notion of a monotone measure provides us with a broad framework, within which various special types of monotone measures can be
defined. Among these special types are the classical, additive measures, the
classical (crisp) possibility measures and necessity measures, and a great
variety of other nonadditive measures.
Each special type of monotone measures has a potential for formalizing a
certain type of uncertainty. In this section, an important family of special types
of nonadditive measures is introduced. Measures in this family are called
Choquet capacities. Other types of nonadditive measures, which have been utilized for formalizing imprecise probabilities, are introduced in Chapter 5.
Given a particular integer k ≥ 2, a Choquet capacity of order k (or k-monotone Choquet capacity) is a monotone measure μ that satisfies the inequalities

$$\mu\Bigl(\bigcup_{j=1}^{k} A_j\Bigr) \ge \sum_{\substack{K \subseteq \mathbb{N}_k\\ K \ne \emptyset}} (-1)^{|K|+1}\,\mu\Bigl(\bigcap_{j \in K} A_j\Bigr) \tag{4.5}$$

for all families of k subsets of X. For convenience, monotone measures that are not required to satisfy Eq. (4.5) or any other special property are often referred to as Choquet capacities of order 1 (1-monotone). That is, any general monotone measure is also viewed as a Choquet capacity of order 1.
Since the sets Aⱼ in Eq. (4.5) are not necessarily distinct, every Choquet capacity of order k > 2 is also of order k′ = 2, 3, . . . , k. However, a capacity of order k is not necessarily a capacity of any higher order (k + 1, k + 2, etc.). Hence, capacities of order 2, which satisfy the simple inequalities

$$\mu(A_1 \cup A_2) \ge \mu(A_1) + \mu(A_2) - \mu(A_1 \cap A_2) \tag{4.6}$$

for all pairs of subsets of X, are the most general capacities. The least general ones are those that are of order k (or k-monotone) for all k ≥ 2. These are called Choquet capacities of infinite order (or ∞-monotone). They satisfy the inequalities

$$\mu(A_1 \cup A_2 \cup \cdots \cup A_k) \ge \sum_{i}\mu(A_i) - \sum_{i<j}\mu(A_i \cap A_j) + \cdots + (-1)^{k+1}\mu(A_1 \cap A_2 \cap \cdots \cap A_k) \tag{4.7}$$

for every k ≥ 2 and every family of k subsets of X. Observe that probability measures are special ∞-monotone Choquet capacities for which all the inequalities in Eq. (4.7) collapse to equalities.
4.2.1. Möbius Representation
It is well known (see Note 4.4) that every set function

$$\mu : \mathcal{P}(X) \to \mathbb{R},$$

where X is a finite set, can be uniquely represented by another set function

$${}^{m}\mu : \mathcal{P}(X) \to \mathbb{R},$$

via the formula

$${}^{m}\mu(A) = \sum_{B \mid B \subseteq A} (-1)^{|A - B|}\,\mu(B) \tag{4.8}$$
for all A ∈ P(X). This formula is called a Möbius transform, and the function ᵐμ is called a Möbius representation of μ (or a Möbius function).
The Möbius transform is a one-to-one function and is thus invertible. Its inverse is defined for all A ∈ P(X) by the formula

$$\mu(A) = \sum_{B \mid B \subseteq A} {}^{m}\mu(B). \tag{4.9}$$
EXAMPLE 4.1. A set function μ : P(X) → [0, 1], where X = {x1, x2, x3}, and its Möbius representation ᵐμ are shown in Table 4.1. Subsets A of X are defined in the table by their characteristic functions. Given the values μ(A) for all A ∈ P(X), we can calculate the value of ᵐμ(A) for each A by the Möbius transform, Eq. (4.8). For example,

$${}^{m}\mu(\{x_2, x_3\}) = \mu(\{x_2, x_3\}) - \mu(\{x_2\}) - \mu(\{x_3\}) = 0.5 - 0 - 0.3 = 0.2,$$
$${}^{m}\mu(\{x_1, x_2, x_3\}) = \mu(\{x_1, x_2, x_3\}) - \mu(\{x_1, x_2\}) - \mu(\{x_1, x_3\}) - \mu(\{x_2, x_3\}) + \mu(\{x_1\}) + \mu(\{x_2\}) + \mu(\{x_3\}) = 1 - 0.4 - 0.3 - 0.5 + 0 + 0 + 0.3 = 0.1.$$

Function μ can be uniquely reconstructed from its Möbius representation ᵐμ by the inverse transform defined by Eq. (4.9). For example,

$$\mu(\{x_2, x_3\}) = {}^{m}\mu(\{x_2, x_3\}) + {}^{m}\mu(\{x_2\}) + {}^{m}\mu(\{x_3\}) = 0.2 + 0 + 0.3 = 0.5,$$
$$\mu(\{x_1, x_2, x_3\}) = \sum_{B \in \mathcal{P}(X)} {}^{m}\mu(B) = 1.$$
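The transform pair of Eqs. (4.8) and (4.9) is straightforward to implement. The following Python sketch (an added illustration using the data of Table 4.1; function names are hypothetical) computes the Möbius representation of μ and verifies that the inverse transform recovers μ.

```python
from itertools import chain, combinations

X = ('x1', 'x2', 'x3')

def subsets(s):
    s = tuple(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# The monotone measure of Table 4.1, indexed by subsets of X
mu = {frozenset(): 0.0, frozenset({'x1'}): 0.0, frozenset({'x2'}): 0.0, frozenset({'x3'}): 0.3,
      frozenset({'x1', 'x2'}): 0.4, frozenset({'x1', 'x3'}): 0.3, frozenset({'x2', 'x3'}): 0.5,
      frozenset(X): 1.0}

def moebius(mu):
    """Moebius representation, Eq. (4.8)."""
    return {A: sum((-1) ** (len(A) - len(B)) * mu[B] for B in subsets(A)) for A in mu}

def inverse_moebius(m):
    """Inverse transform, Eq. (4.9)."""
    return {A: sum(m[B] for B in subsets(A)) for A in m}

m = moebius(mu)
print(round(m[frozenset({'x2', 'x3'})], 3), round(m[frozenset(X)], 3))   # 0.2 and 0.1
assert all(abs(inverse_moebius(m)[A] - mu[A]) < 1e-12 for A in mu)
```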
It can easily be shown that a set function μ is a (regular) monotone measure if and only if its Möbius representation ᵐμ has the following properties:
Table 4.1. Set Function μ and Its Möbius Representation ᵐμ

  A            ⟨x1 x2 x3⟩   μ(A)   ᵐμ(A)
  ∅            0  0  0      0.0    0.0
  {x1}         1  0  0      0.0    0.0
  {x2}         0  1  0      0.0    0.0
  {x3}         0  0  1      0.3    0.3
  {x1, x2}     1  1  0      0.4    0.4
  {x1, x3}     1  0  1      0.3    0.0
  {x2, x3}     0  1  1      0.5    0.2
  X            1  1  1      1.0    0.1
(m1) ᵐμ(∅) = 0.
(m2) $\sum_{A \in \mathcal{P}(X)} {}^{m}\mu(A) = 1.$
(m3) $\sum_{\{x\} \subseteq B \subseteq A} {}^{m}\mu(B) \ge 0$ for all A ∈ P(X) and all x ∈ A.

Property (m1) follows directly from Eq. (4.8) and the requirement μ(∅) = 0 of monotone measures. Property (m2) follows from Eq. (4.9) and the requirement μ(X) = 1 of (regular) monotone measures. Property (m3) follows from Eq. (4.8) and the fact that the monotonicity of μ holds if and only if it holds for all pairs A, A − {x} of sets (x ∈ X). Observe that property (m3) implies that ᵐμ({x}) ≥ 0 for all x ∈ X.
Additional properties of the Möbius representation have been recognized for Choquet capacities of various orders k ≥ 2. Some of these properties, which are utilized later in this book, are:

(m4) If μ is a Choquet capacity of order k and 2 ≤ |A| ≤ k, then ᵐμ(A) ≥ 0.
(m5) μ is a Choquet capacity of order ∞ if and only if ᵐμ(A) ≥ 0 for all A ∈ P(X).
(m6) μ is a probability measure if and only if ᵐμ(A) > 0 when |A| = 1 and ᵐμ(A) = 0 otherwise.
(m7) μ is a Choquet capacity of order k (k ≥ 2) if and only if

$$\sum_{C \subseteq B \subseteq A} {}^{m}\mu(B) \ge 0$$

for all A ∈ P(X) and all C ∈ P(X) such that 2 ≤ |C| ≤ k.

For further information regarding the Möbius representation, including these properties, see Note 4.4.
EXAMPLE 4.2. Four monotone measures defined on P({x1, x2, x3}) and their Möbius representations are specified in Table 4.2. Measure μ1 is a monotone measure, but it is not a Choquet capacity of any order k ≥ 2. This follows, for example, from the inequalities

$$\mu_1(\{x_1\}\cup\{x_2\}) < \mu_1(\{x_1\}) + \mu_1(\{x_2\}),$$
$$\mu_1(\{x_2\}\cup\{x_3\}) < \mu_1(\{x_2\}) + \mu_1(\{x_3\}),$$

which violate the required inequalities (4.6) for Choquet capacities of order 2. It also follows from the negative values of ᵐμ1({x1, x2}) and ᵐμ1({x2, x3}), which violate property (m4) for k = 2. Measure μ2 is not a Choquet capacity of any order k ≥ 2 either; for example,
Table 4.2. Examples of Monotone Measures Discussed in Example 4.2

  A            μ1(A)  ᵐμ1(A)  μ2(A)  ᵐμ2(A)  μ3(A)  ᵐμ3(A)  μ4(A)  ᵐμ4(A)
  ∅            0.0    0.0     0.0    0.0     0.0    0.0     0.0    0.0
  {x1}         0.5    0.5     0.4    0.4     0.4    0.4     0.3    0.3
  {x2}         0.2    0.2     0.1    0.1     0.1    0.1     0.0    0.0
  {x3}         0.4    0.4     0.2    0.2     0.2    0.2     0.2    0.2
  {x1, x2}     0.5   -0.2     0.5    0.0     0.6    0.1     0.3    0.0
  {x1, x3}     0.9    0.0     0.8    0.2     0.7    0.1     0.9    0.4
  {x2, x3}     0.5   -0.1     0.5    0.2     0.5    0.2     0.3    0.1
  X            1.0    0.2     1.0   -0.1     1.0   -0.1     1.0    0.0
$$\mu_2(\{x_1, x_3\}\cup\{x_2, x_3\}) < \mu_2(\{x_1, x_3\}) + \mu_2(\{x_2, x_3\}) - \mu_2(\{x_3\}),$$

which violates the required inequalities (4.6) for 2-monotone measures. However, this cannot be determined by property (m4), since the values of ᵐμ2 on all two-element sets are nonnegative. Measure μ3 is a Choquet capacity of order 2, as can easily be verified by checking the required inequalities (4.6). However, it is not a Choquet capacity of order 3, since

$$\mu_3(\{x_1, x_2\}\cup\{x_1, x_3\}\cup\{x_2, x_3\}) < \mu_3(\{x_1, x_2\}) + \mu_3(\{x_1, x_3\}) + \mu_3(\{x_2, x_3\}) - \mu_3(\{x_1\}) - \mu_3(\{x_2\}) - \mu_3(\{x_3\}).$$

Observe that this measure also violates property (m7) for k = 3. Measure μ4 is clearly a Choquet capacity of order ∞, since ᵐμ4(A) ≥ 0 for all A ⊆ {x1, x2, x3}.
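The order-2 condition is easy to test mechanically. The following Python sketch (an added illustration using the values of Table 4.2; function names are hypothetical) checks the inequalities (4.6) for μ2 and μ3 over all pairs of subsets.

```python
from itertools import chain, combinations

X = ('x1', 'x2', 'x3')
powerset = [frozenset(c) for c in chain.from_iterable(combinations(X, r) for r in range(4))]

def is_2_monotone(mu):
    """Check the inequalities (4.6) for all pairs of subsets."""
    return all(mu[A | B] + mu[A & B] >= mu[A] + mu[B] - 1e-12
               for A in powerset for B in powerset)

def measure(values):
    # subsets in the order {}, {x1}, {x2}, {x3}, {x1,x2}, {x1,x3}, {x2,x3}, X
    return dict(zip(powerset, values))

mu2 = measure([0.0, 0.4, 0.1, 0.2, 0.5, 0.8, 0.5, 1.0])
mu3 = measure([0.0, 0.4, 0.1, 0.2, 0.6, 0.7, 0.5, 1.0])

print(is_2_monotone(mu2))   # False: fails for A = {x1, x3}, B = {x2, x3}
print(is_2_monotone(mu3))   # True
```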
4.3. IMPRECISE PROBABILITIES: GENERAL PRINCIPLES
Classical probability theory requires that probabilities of all recognized alternatives (elementary events) be precise real numbers. Given these numbers,
probabilities of the various sets of alternatives are then uniquely determined
by the additivity property of probability measures. These, again, are precise
real numbers. This requirement of precision is overly restrictive since there are
many problem situations in which more than one probability distribution is
compatible with given evidence. One such situation is illustrated by the following example.
EXAMPLE 4.3. Consider a universal set X × Y, where X = {x1, x2} and Y = {y1, y2} are state sets of random variables X and Y, respectively. Assume that we know the marginal probabilities pX(x1), pX(x2) = 1 − pX(x1), pY(y1), pY(y2) = 1 − pY(y1), and we want to use this information to determine the unknown joint probabilities pij = p(xi, yj) (i, j ∈ {1, 2}). According to the calculus of probability theory, the joint probabilities are related to the marginal probabilities by the following equations:
(a) p11 + p12 = pX(x1).
(b) p21 + p22 = 1 − pX(x1).
(c) p11 + p21 = pY(y1).
(d) p12 + p22 = 1 − pY(y1).
Only three of these equations are linearly independent. For example, Eq. (d)
is a linear combination of the other equations: (d) = (a) + (b) - (c). By excluding it, the remaining three equations are linearly independent. Since they
contain four unknowns, one of them must be chosen as a free variable. Choosing, for example, p11 as the free variable, we obtain the following solution:
p12 = pX (x1 ) - p11 ,
p21 = pY ( y1 ) - p11 ,
p22 = 1 - pX (x1 ) - pY ( y1 ) + p11 .
Since p12, p21, and p22 are required to be nonnegative numbers, the free variable p11 is constrained by the inequalities
$$\max\{0,\ p_X(x_1) + p_Y(y_1) - 1\} \le p_{11} \le \min\{p_X(x_1),\ p_Y(y_1)\}.$$
When a particular value of p11 that satisfies these inequalities is chosen, the values of p12, p21, and p22 are uniquely determined. The resulting joint probability distribution is consistent with the given marginal distributions. Since the values of p11 range over a closed interval of real numbers, there is a closed and convex set of joint probability distributions that are consistent with the marginal distributions. According to the given evidence (the known marginal distributions), these are the only possible joint distributions, and the actual one is among them. All other joint distributions on X × Y are not possible.
Consider, for example, that pX(x1) = 0.8 and pY(y1) = 0.6. Then, the set of all possible joint probability distribution functions is specified by the following four statements:

p11 ∈ [0.4, 0.6],
p12 = 0.8 − p11,
p21 = 0.6 − p11,
p22 = p11 − 0.4.
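The bounds on the free variable p11 can be computed directly. The following Python sketch (an added illustration) reproduces the interval [0.4, 0.6] and verifies that every choice within it yields a valid joint distribution.

```python
def joint_interval(px1, py1):
    """Range of the free joint probability p11 consistent with the given marginals."""
    low = max(0.0, px1 + py1 - 1.0)
    high = min(px1, py1)
    return low, high

px1, py1 = 0.8, 0.6
low, high = joint_interval(px1, py1)
print(low, high)                                     # 0.4 0.6

# Any p11 in [0.4, 0.6] yields a valid joint distribution:
for p11 in (low, 0.5, high):
    p12, p21, p22 = px1 - p11, py1 - p11, 1 - px1 - py1 + p11
    print(p11, round(p12, 3), round(p21, 3), round(p22, 3))   # all nonnegative, sum to 1
```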
Other types of incomplete information regarding a probability distribution
(e.g., knowing only the expected value of a random variable, analyzing statistical data with observation gaps, etc.) result in sets of possible probability distributions as well. To deal with each given incomplete information correctly,
we need to recognize and work with the whole set of possible probability distributions, and not to choose only one of them, as required by probability
theory. This means, in turn, that we need to work with imprecise probabilities,
as shown in the rest of this section.
Sets of probability distribution functions defined on some universal set X
are often referred to in the literature as credal sets on X. This shorter term is
convenient and it is occasionally used in this book.
4.3.1. Lower and Upper Probabilities
Let X denote a finite universal set of concern (a set of elementary events) and
let D denote a given set of probability distribution functions (a credal set), p,
on X. Then, the associated lower probability function, ᴰm̲, is defined for all sets
A ∈ P(X) by the formula

    ᴰm̲(A) = inf_{p∈D} Σ_{x∈A} p(x).    (4.11)

Similarly, the associated upper probability function, ᴰm̄, is defined for all
A ∈ P(X) by the formula

    ᴰm̄(A) = sup_{p∈D} Σ_{x∈A} p(x).    (4.12)
It follows directly from Eqs. (4.11) and (4.12) that lower and upper probabilities are monotone measures.
Given a particular set A ∈ P(X), let ṗ denote one of the probability distribution functions in D for which the infimum in Eq. (4.11) is obtained. Since

    Σ_{x∈A} ṗ(x) + Σ_{x∉A} ṗ(x) = 1

is required by probability theory for each set A ∈ P(X), ṗ must also be a probability distribution function for which the supremum in Eq. (4.12) is obtained for the complement Ā. Hence, the equation

    ᴰm̲(A) = 1 − ᴰm̄(Ā)    (4.13)

holds for all A ∈ P(X). Due to this property, the functions ᴰm̲ and ᴰm̄ are called
dual (or conjugate). One of them is sufficient for capturing the information in D;
the other one is uniquely determined by Eq. (4.13).
It follows directly from Eqs. (4.11) and (4.12) that

    ᴰm̲(A) ≤ ᴰm̄(A)    (4.14)

for all A ∈ P(X) and, in addition,

    ᴰm̲(∅) = ᴰm̄(∅) = 0,    (4.15)
    ᴰm̲(X) = ᴰm̄(X) = 1.    (4.16)
Moreover, any lower probability function is superadditive. That is,

    ᴰm̲(A ∪ B) ≥ ᴰm̲(A) + ᴰm̲(B)    (4.17)

for all A, B ∈ P(X) such that A ∩ B = ∅. This follows from the fact that the
infima for disjoint sets A and B may be obtained in Eq. (4.11) for distinct probability distribution functions in D, whereas the infimum for A ∪ B must be obtained
for a single probability distribution function in D. By the same argument
applied to the suprema for A, B, and A ∪ B in Eq. (4.12), it follows that any
upper probability function is subadditive. That is,

    ᴰm̄(A ∪ B) ≤ ᴰm̄(A) + ᴰm̄(B)    (4.18)

for all A, B ∈ P(X).
Lower probabilities also satisfy the inequality

    Σ_{x∈X} ᴰm̲({x}) ≤ 1.    (4.19)

This can be shown by using Eqs. (4.16) and (4.17) and the associativity of the
operation of set union:

    1 = ᴰm̲(X) = ᴰm̲(∪_{x∈X} {x}) ≥ Σ_{x∈X} ᴰm̲({x}).

In a similar way, it can be shown that

    Σ_{x∈X} ᴰm̄({x}) ≥ 1.    (4.20)
Due to the inequality (4.14), the lower and upper probabilities form for
each set A ∈ P(X) a closed interval

    [ᴰm̲(A), ᴰm̄(A)]

of possible probabilities of that set. When ᴰm̲(A) = ᴰm̄(A) for all A ∈ P(X),
classical (precise) probabilities are obtained. It is important to realize that a
lower probability function (or, alternatively, an upper probability function) can
be derived from more than one set of probability distribution functions on a
given set X. This possibility is illustrated by the following example.
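For a finite collection of distributions, Eqs. (4.11) and (4.12) reduce to elementwise minima and maxima. The following minimal Python sketch (names are illustrative; it assumes the credal set is represented by finitely many distributions, for instance the extreme points of a convex set, which suffices here because the sums in Eqs. (4.11) and (4.12) are linear in p) computes the induced lower and upper probability functions:

```python
from itertools import chain, combinations

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def lower_upper(distributions, universe):
    """Lower and upper probability functions (Eqs. (4.11), (4.12)) induced by a
    finite set of probability distributions, each given as a dict x -> p(x)."""
    lower, upper = {}, {}
    for A in powerset(universe):
        sums = [sum(p[x] for x in A) for p in distributions]
        lower[A] = min(sums)
        upper[A] = max(sums)
    return lower, upper

# Extreme points of the credal set D2 of Example 4.4 below (a, b in {0, 0.5}):
X = ("x1", "x2", "x3")
extremes = [
    {"x1": 0.0, "x2": 0.0, "x3": 1.0},
    {"x1": 0.5, "x2": 0.0, "x3": 0.5},
    {"x1": 0.0, "x2": 0.5, "x3": 0.5},
    {"x1": 0.5, "x2": 0.5, "x3": 0.0},
]
low, up = lower_upper(extremes, X)
# e.g. low[("x1", "x3")] == 0.5 and up[("x1",)] == 0.5, matching Figure 4.3b.
```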
EXAMPLE 4.4. Consider X = {x1, x2, x3} and the following two sets of probability distributions on X:

    D1 = {⟨p(x1), p(x2), p(x3)⟩ | p(x1) = p(x2) = a, p(x3) = 1 − 2a, a ∈ [0, 0.5]},
    D2 = {⟨p(x1), p(x2), p(x3)⟩ | p(x1) = a, p(x2) = b, p(x3) = 1 − a − b, a ∈ [0, 0.5], b ∈ [0, 0.5]}.
A geometrical interpretation of these sets is shown in Figure 4.3a. It is easy to
see that applying Eqs. (4.11) and (4.12) to these sets results in the same lower
and upper probability functions, given in Figure 4.3b. Moreover, any set D such
that D1 ⊆ D ⊆ D2 is also associated with these functions.
Associated with any given lower probability function m̲ on P(X) is the
unique set, m̲D, of all probability distribution functions p on X that are
consistent with m̲ (or dominate m̲). That is,

    m̲D = {p | m̲(A) ≤ Σ_{x∈A} p(x) for all A ∈ P(X)}.    (4.21)

Clearly, m̲D is the largest among those sets of probability distribution
functions on X that are associated with m̲. Given the lower probability function m̲ in Example 4.4, the unique set m̲D defined by Eq. (4.21) is clearly the
set D2.
Alternatively, the set of all probability distributions p on X that are consistent with a given upper probability function m̄ on P(X) (or are dominated
by m̄) is defined as follows:

    m̄D = {p | m̄(A) ≥ Σ_{x∈A} p(x) for all A ∈ P(X)}.    (4.22)

However, m̄D = m̲D due to the duality of m̲ and m̄. Introducing the symbol
m = ⟨m̲, m̄⟩ for any dual pair of lower and upper probability functions, the symbols m̄D and m̲D may conveniently be replaced with one common symbol, ᵐD.
It is important to realize that ᵐD is always a closed convex set of probability distribution functions on X. It is the intersection of the closed convex
sets of probability distribution functions characterized by the individual
inequalities in Eqs. (4.21) and (4.22). When m is derived by Eqs. (4.11) and (4.12)
from a given set D that is not convex, then D is properly contained in ᵐD.
All lower probability functions are superadditive and so are all Choquet
capacities of any order k ≥ 2. These two classes of functions are thus compatible. This means that special types of Choquet capacities (of the various orders
k ≥ 2) represent in a natural way special types of lower probability functions.
[Figure 4.3. Illustration to Example 4.4. (a) Geometrical interpretation of D1 and D2 in the probability simplex over ⟨p(x1), p(x2), p(x3)⟩: D2 is the region with extreme points ⟨0, 0, 1⟩, ⟨0.5, 0, 0.5⟩, ⟨0, 0.5, 0.5⟩, and ⟨0.5, 0.5, 0⟩, and D1 is the segment joining ⟨0, 0, 1⟩ and ⟨0.5, 0.5, 0⟩. (b) The common lower and upper probability functions:

    A:        ∅     {x1}   {x2}   {x3}   {x1,x2}   {x1,x3}   {x2,x3}    X
    m̲(A):    0.0    0.0    0.0    0.0     0.0       0.5       0.5      1.0
    m̄(A):    0.0    0.5    0.5    1.0     1.0       1.0       1.0      1.0  ]
That is, Choquet capacities of each order form a basis for formalizing a particular theory of imprecise probabilities.
4.3.2. Alternating Choquet Capacities
Due to the duality of lower and upper probabilities expressed by Eq. (4.13),
each theory of imprecise probabilities may also be formalized in terms of the
upper probabilities. In that case, Choquet capacities that are dual to those previously introduced must be used. They are called alternating Choquet capacities of order k (or k-alternating Choquet capacities) and are defined for all families of k subsets of X (k ≥ 2) by the inequalities

    m(∩_{j=1}^{k} Aj) ≤ Σ_{K ⊆ ℕk, K ≠ ∅} (−1)^{|K|+1} m(∪_{j∈K} Aj).    (4.23)

Moreover, alternating Choquet capacities of order ∞ (or ∞-alternating capacities) are
defined by the inequalities

    m(A1 ∩ A2 ∩ . . . ∩ Ak) ≤ Σ_i m(Ai) − Σ_{i<j} m(Ai ∪ Aj) + − . . . + (−1)^{k+1} m(A1 ∪ A2 ∪ . . . ∪ Ak)    (4.24)

for every k ≥ 2 and every family of k subsets of X.
4.3.3. Interaction Representation
Given a finite universal set X = {xi | i ∈ ℕn}, let X̃ denote the set of all n! permutations of X. Denoting, for convenience, the set of all permutations of ℕn
by Πn, we have

    X̃ = {⟨xπ(i) | i ∈ ℕn⟩ | π ∈ Πn}.

For each π ∈ Πn and each k ∈ ℕn, let

    Aπ,k = {xπ(1), xπ(2), . . . , xπ(k)},

and let Aπ,0 = ∅ for any π ∈ Πn by convention. Clearly, the sequence of sets Aπ,k
for all k ∈ ℕ0,n is one particular maximal chain of nested subsets of X in the
Boolean lattice ⟨P(X), ⊆⟩. This chain is uniquely characterized by the chosen
permutation π. Given a regular monotone measure m on P(X), a sequence of
values m(Aπ,k) for k ∈ ℕ0,n is associated with the subsets in the chain. Clearly, m(Aπ,0)
= 0, m(Aπ,n) = 1,

    m(Aπ,k) − m(Aπ,k−1) ∈ [0, 1]

for all k ∈ ℕn, and

    Σ_{k=1}^{n} [m(Aπ,k) − m(Aπ,k−1)] = 1.

Hence, probability distribution functions, p^{m,π}, defined for each particular
π ∈ Πn and all k ∈ ℕn by the formula

    p^{m,π}(xπ(k)) = m(Aπ,k) − m(Aπ,k−1)    (4.25)

are induced by m in the Boolean lattice ⟨P(X), ⊆⟩. Clearly, the functions defined
for different permutations are not necessarily distinct. Let ᵐB denote the set
of all distinct probability distribution functions defined by Eq. (4.25). Then, clearly, 1 ≤ |ᵐB| ≤ n!.
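Eq. (4.25) is straightforward to implement for a finite X by walking the maximal chain determined by each permutation. A minimal Python sketch (illustrative names), applied to the 2-monotone measure used in Example 4.5 that follows:

```python
from itertools import permutations

def interaction_representation(measure, universe):
    """Distinct probability distributions induced by a monotone measure via
    Eq. (4.25); `measure` maps frozensets of elements of `universe` to values,
    with measure[frozenset()] == 0 and measure[frozenset(universe)] == 1."""
    distributions = set()
    for perm in permutations(universe):
        p, chain_set, prev = {}, frozenset(), 0.0
        for x in perm:
            chain_set = chain_set | {x}
            value = measure[chain_set]
            p[x] = value - prev
            prev = value
        distributions.add(tuple(round(p[x], 10) for x in universe))
    return distributions

# Measure m2 of Figure 4.5a:
X = ("x1", "x2", "x3")
m2 = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.2, frozenset({"x2"}): 0.1, frozenset({"x3"}): 0.4,
    frozenset({"x1", "x2"}): 0.5, frozenset({"x1", "x3"}): 0.7,
    frozenset({"x2", "x3"}): 0.6, frozenset(X): 1.0,
}
print(interaction_representation(m2, X))
# Four distinct distributions over (x1, x2, x3):
# (0.2, 0.3, 0.5), (0.4, 0.1, 0.5), (0.3, 0.3, 0.4), (0.4, 0.2, 0.4)
```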
EXAMPLE 4.5. Two distinct monotone measures, m1 and m2, are defined in
Figures 4.4a and 4.5a. In Figures 4.4b and 4.5b, respectively, these measures
are expressed in terms of the Hasse diagrams of the underlying Boolean
lattice. Measure m1 is additive and, hence, it is completely characterized by a
single probability distribution function p whose values are equal to the values
of m1 for the singletons. It can easily be verified that the functions p^{m1,π} for all π ∈ Π3
are equal to p. Measure m2 is a Choquet capacity of order 2. The functions p^{m2,π} for
all π ∈ Π3 are shown in Figure 4.5c. Clearly, the interaction representation ᵐB of m2 consists in this case of four
probability distributions, p1, p2, p3, p4, as is obvious from Figure 4.5c.
This representation of a given monotone measure m by the associated set ᵐB of probability distribution functions is usually called an interaction representation in the literature. The term "interaction" refers to the
capability of monotone measures to express positive or negative interactions
among disjoint sets with respect to the measured property. It turns out that
the interactions intrinsic to a monotone measure m are expressed more explicitly
by the associated set ᵐB.
The following are some basic properties of the interaction representation
of monotone measures that pertain to imprecise probabilities (see Note 4.4):
[Figure 4.4. Interaction representation of additive measure m1 (Example 4.5). (a) Definition of m1; (b) Hasse diagram of the Boolean lattice ⟨P(X), ⊆⟩ labeled with the values of m1:

    A:        ∅     {x1}   {x2}   {x3}   {x1,x2}   {x1,x3}   {x2,x3}    X
    m1(A):   0.0    0.5    0.3    0.2     0.8       0.7       0.5      1.0  ]
[Figure 4.5. Interaction representation of the 2-monotone measure m2 (Example 4.5). (a) Definition of m2; (b) Hasse diagram of ⟨P(X), ⊆⟩ labeled with the values of m2; (c) the probability distributions p^{m2,π} for all six permutations π ∈ Π3; (d) the extreme points p1, . . . , p4 of the credal set associated with m2 in the probability simplex.

    A:        ∅     {x1}   {x2}   {x3}   {x1,x2}   {x1,x3}   {x2,x3}    X
    m2(A):   0.0    0.2    0.1    0.4     0.5       0.7       0.6      1.0

    π:             ⟨1,2,3⟩   ⟨1,3,2⟩   ⟨2,1,3⟩   ⟨2,3,1⟩   ⟨3,1,2⟩   ⟨3,2,1⟩
    p^{m2,π}(x1):    0.2       0.2       0.4       0.4       0.3       0.4
    p^{m2,π}(x2):    0.3       0.3       0.1       0.1       0.3       0.2
    p^{m2,π}(x3):    0.5       0.5       0.5       0.5       0.4       0.4

The four distinct distributions are p1 = ⟨0.2, 0.3, 0.5⟩, p2 = ⟨0.4, 0.1, 0.5⟩, p3 = ⟨0.3, 0.3, 0.4⟩, and p4 = ⟨0.4, 0.2, 0.4⟩.]
(i1) For any pair of dual monotone measures, m = ⟨m̲, m̄⟩, m̲B = m̄B, but the
same probability distribution functions in m̲B and m̄B are obtained for different permutations.
(i2) A monotone measure m is additive iff |ᵐB| = 1.
(i3) Let m = ⟨m̲, m̄⟩ be a pair of dual monotone measures, so that m̲B = m̄B = ᵐB. Then, m̲ is a Choquet capacity of order 2 (and m̄ is the associated
alternating capacity of order 2) iff

    m̲(A) = min_{pj ∈ ᵐB} Σ_{xi ∈ A} pj(xi),    (4.26)
    m̄(A) = max_{pj ∈ ᵐB} Σ_{xi ∈ A} pj(xi)    (4.27)

for all A ∈ P(X). When m̲ and m̄ are more general monotone measures,
Eqs. (4.26) and (4.27) are not applicable. In that case, m̲ and m̄ are determined from ᵐB if the permutations corresponding to each probability
distribution function in ᵐB are known.
(i4) A given monotone measure m̲ is a Choquet capacity of order 2 iff ᵐB ⊆ ᵐD.
(i5) If a given monotone measure m̲ is a Choquet capacity of order 2, then
ᵐB is the set of extreme points of ᵐD, which is commonly referred to
as the profile of ᵐD.

Property (i5) is particularly important. It allows us to determine ᵐD directly
from the interaction representation ᵐB, provided that m̲ is a Choquet capacity
of order 2.
EXAMPLE 4.6. Consider the lower and upper probability functions m̲ and m̄
defined in Figure 4.3b. Hasse diagrams of the Boolean lattice with the values m̲(A)
and m̄(A) are shown in Figures 4.6a and 4.6b, respectively. The probability distribution functions p^{m̲,π} and p^{m̄,π} for all permutations π ∈ Π3 are shown in Figures
4.6c and 4.6d, respectively. We can see that m̲B = m̄B = ᵐB is the set of the
extreme points of ᵐD, which are shown in Figure 4.3a.

EXAMPLE 4.7. The lower probability function m2 defined in Figure 4.5a is a
Choquet capacity of order 2. According to property (i5) of the interaction representation, the set ᵐB = {p1, p2, p3, p4} (given in Figure 4.5c) is the set of extreme
points of the associated credal set ᵐD. Hence, ᵐD is characterized by the convex combinations of these
points. The locations of the extreme points in the probabilistic simplex and the set
of all points in ᵐD are shown in Figure 4.5d.
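Property (i3) can be checked numerically: for a 2-monotone measure, minimizing and maximizing the sums in Eqs. (4.26) and (4.27) over the profile reproduces the lower and upper probabilities. A small Python sketch for m2 (illustrative names; the profile values are those of Figure 4.5c):

```python
from itertools import chain, combinations

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Profile p1..p4 of the credal set associated with m2 (values over x1, x2, x3):
profile = [(0.2, 0.3, 0.5), (0.4, 0.1, 0.5), (0.3, 0.3, 0.4), (0.4, 0.2, 0.4)]
X = ("x1", "x2", "x3")

for A in powerset(range(3)):
    sums = [sum(p[i] for i in A) for p in profile]
    lower, upper = min(sums), max(sums)
    print({X[i] for i in A}, round(lower, 3), round(upper, 3))
# The minima reproduce m2 (e.g., {x1, x2} -> 0.5) and the maxima its dual,
# as Eqs. (4.26) and (4.27) require for a Choquet capacity of order 2.
```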
4.3.4. Möbius Representation
When the Möbius transform in Eq. (4.8) is applied to the lower and upper probability functions, m̲ and m̄, distinct functions, denoted here by m̲ᴹ and m̄ᴹ, are obtained, respectively. By applying the inverse transform in Eq. (4.9) to m̲ᴹ and m̄ᴹ, we obtain
m̲ and m̄, respectively. Since the functions m̲ and m̄ are dual, the corresponding
functions m̲ᴹ and m̄ᴹ are dual as well. It is established that the duality of the
latter functions is expressed for all A ∈ P(X) by the equation

    m̄ᴹ(A) = (−1)^{|A|+1} Σ_{B ⊇ A} m̲ᴹ(B).    (4.28)

For more information, see Note 4.4.
EXAMPLE 4.8. Lower and upper probability functions, m̲ and m̄, and their
Möbius representations, m̲ᴹ and m̄ᴹ, are given in Table 4.3.
[Figure 4.6. Illustration to Example 4.6. (a) Hasse diagram with the values of m̲: ∅: 0.0; {x1}: 0.0; {x2}: 0.0; {x3}: 0.0; {x1,x2}: 0.0; {x1,x3}: 0.5; {x2,x3}: 0.5; X: 1.0. (b) Hasse diagram with the values of m̄: ∅: 0.0; {x1}: 0.5; {x2}: 0.5; {x3}: 1.0; {x1,x2}: 1.0; {x1,x3}: 1.0; {x2,x3}: 1.0; X: 1.0. (c) The distributions p^{m̲,π} and (d) the distributions p^{m̄,π} for all π ∈ Π3:

    π:           ⟨1,2,3⟩      ⟨1,3,2⟩        ⟨2,1,3⟩      ⟨2,3,1⟩        ⟨3,1,2⟩      ⟨3,2,1⟩
    p^{m̲,π}:   ⟨0, 0, 1⟩   ⟨0, 0.5, 0.5⟩   ⟨0, 0, 1⟩   ⟨0.5, 0, 0.5⟩   ⟨0.5, 0.5, 0⟩  ⟨0.5, 0.5, 0⟩
    p^{m̄,π}:   ⟨0.5, 0.5, 0⟩ ⟨0.5, 0, 0.5⟩ ⟨0.5, 0.5, 0⟩ ⟨0, 0.5, 0.5⟩  ⟨0, 0, 1⟩     ⟨0, 0, 1⟩

In both cases the set of distinct distributions is {⟨0, 0, 1⟩, ⟨0, 0.5, 0.5⟩, ⟨0.5, 0, 0.5⟩, ⟨0.5, 0.5, 0⟩}, labeled p1, . . . , p4.]
Table 4.3. Duality of Möbius Representations of Lower and Upper Probability
Functions (Example 4.8)

    A:         ∅     {x1}   {x2}   {x3}   {x1,x2}   {x1,x3}   {x2,x3}    X
    m̲(A):    0.0    0.1    0.3    0.2     0.5       0.3       0.6      1.0
    m̲ᴹ(A):   0.0    0.1    0.3    0.2     0.1       0.0       0.1      0.2
    m̄(A):    0.0    0.4    0.7    0.5     0.8       0.7       0.9      1.0
    m̄ᴹ(A):   0.0    0.4    0.7    0.5    −0.3      −0.2      −0.3      0.2
Since the functions m̲ and m̄ are dual, m̲ᴹ and m̄ᴹ are dual as well. This means that they uniquely determine
each other via Eq. (4.28). For example,

    m̄ᴹ({x1, x2}) = −[m̲ᴹ({x1, x2}) + m̲ᴹ({x1, x2, x3})] = −[0.1 + 0.2] = −0.3,
    m̄ᴹ({x3}) = m̲ᴹ({x3}) + m̲ᴹ({x1, x3}) + m̲ᴹ({x2, x3}) + m̲ᴹ({x1, x2, x3}) = 0.2 + 0.0 + 0.1 + 0.2 = 0.5,
    m̄ᴹ({x2, x3}) = −[m̲ᴹ({x2, x3}) + m̲ᴹ({x1, x2, x3})] = −[0.1 + 0.2] = −0.3.
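The Möbius transform, the duality (4.13), and the relation (4.28) can all be verified mechanically for small examples. The following Python sketch (illustrative names) does so for the functions of Table 4.3, computing the Möbius transform from the standard formula referred to as Eq. (4.8):

```python
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def moebius(measure, universe):
    """Moebius representation (Eq. (4.8)): m_M(A) = sum_{B subset of A} (-1)^|A - B| m(B)."""
    return {
        A: sum((-1) ** len(A - B) * measure[B] for B in subsets(A))
        for A in subsets(universe)
    }

# Lower probability function of Table 4.3:
X = {"x1", "x2", "x3"}
m_low = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.1, frozenset({"x2"}): 0.3, frozenset({"x3"}): 0.2,
    frozenset({"x1", "x2"}): 0.5, frozenset({"x1", "x3"}): 0.3,
    frozenset({"x2", "x3"}): 0.6, frozenset(X): 1.0,
}
m_up = {A: 1.0 - m_low[frozenset(X) - A] for A in subsets(X)}   # duality (4.13)

low_M, up_M = moebius(m_low, X), moebius(m_up, X)
# Duality of the Moebius representations (Eq. (4.28)), checked for nonempty A:
for A in subsets(X):
    if A:
        rhs = (-1) ** (len(A) + 1) * sum(low_M[B] for B in subsets(X) if A <= B)
        assert abs(up_M[A] - rhs) < 1e-9
```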
4.3.5. Joint and Marginal Imprecise Probabilities
Let D, m̲, m̄, and the Möbius representation mᴹ denote the four basic representations of joint imprecise
probabilities on a Cartesian product X × Y, and let DX, DY, m̲X, m̲Y, m̄X, m̄Y, mᴹX,
and mᴹY denote their marginal counterparts. For each of the four basic representations of the joint imprecise probabilities, its marginal counterparts are
determined by appropriate rules of projection, which are defined as follows.

• Marginal sets of probability distributions:

    DX = {pX | pX(x) = Σ_{y∈Y} p(x, y) for some p ∈ D},    (4.29)
    DY = {pY | pY(y) = Σ_{x∈X} p(x, y) for some p ∈ D}.    (4.30)

• Marginal lower probabilities:

    m̲X(A) = m̲(A × Y) for all A ∈ P(X),    (4.31)
    m̲Y(B) = m̲(X × B) for all B ∈ P(Y).    (4.32)

• Marginal upper probabilities:

    m̄X(A) = m̄(A × Y) for all A ∈ P(X),    (4.33)
    m̄Y(B) = m̄(X × B) for all B ∈ P(Y).    (4.34)

• Marginal Möbius functions:

    mᴹX(A) = Σ_{R : A = RX} mᴹ(R) for all A ∈ P(X),    (4.35)
    mᴹY(B) = Σ_{R : B = RY} mᴹ(R) for all B ∈ P(Y),    (4.36)

where

    RX = {x ∈ X | ⟨x, y⟩ ∈ R for some y ∈ Y},
    RY = {y ∈ Y | ⟨x, y⟩ ∈ R for some x ∈ X}.

It is well established that marginal lower and upper probability functions
calculated by these formulas are measures of the same type as the given joint
lower and upper probability functions.
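The projection rules (4.31)–(4.34) amount to evaluating the joint functions on cylinder sets. A minimal Python sketch (illustrative names and values, with the joint lower probability specified only on the few cylinder sets that are involved):

```python
# Projections (4.31)-(4.34): marginals are obtained by evaluating the joint
# lower/upper probability on cylinder sets A x Y and X x B.
X, Y = ("x1", "x2"), ("y1", "y2")

def cylinder_X(A, Y):
    return frozenset((x, y) for x in A for y in Y)

def cylinder_Y(B, X):
    return frozenset((x, y) for x in X for y in B)

# Illustrative joint lower probability values on the cylinder sets that matter here:
joint_lower = {
    frozenset(): 0.0,
    cylinder_X(("x1",), Y): 0.60, cylinder_X(("x2",), Y): 0.20,
    cylinder_Y(("y1",), X): 0.50, cylinder_Y(("y2",), X): 0.10,
    cylinder_X(X, Y): 1.0,
}

m_low_X = {A: joint_lower[cylinder_X(A, Y)] for A in [(), ("x1",), ("x2",), X]}
m_low_Y = {B: joint_lower[cylinder_Y(B, X)] for B in [(), ("y1",), ("y2",), Y]}
# e.g. m_low_X[("x1",)] == 0.6 and m_low_Y[("y1",)] == 0.5.
```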
4.3.6. Conditional Imprecise Probabilities
Given a lower probability function m̲ or an upper probability function m̄ on
P(X), a natural way of defining conditional lower probabilities m̲(A | B) for
any subsets A, B of X is to employ the associated convex set of probability
distribution functions D (defined by Eq. (4.21) or Eq. (4.22), respectively). For
each p ∈ D, the conditional probability is defined in the classical way,

    Pro(A | B) = Pro(A ∩ B) / Pro(B),

and Eqs. (4.11) and (4.12) are modified for the conditional probabilities to
calculate the lower and upper conditional probabilities. That is,

    m̲(A | B) = inf_{p∈D} [Σ_{x∈A∩B} p(x) / Σ_{x∈B} p(x)],    (4.37)
    m̄(A | B) = sup_{p∈D} [Σ_{x∈A∩B} p(x) / Σ_{x∈B} p(x)]    (4.38)

for all A, B ∈ P(X).
Consider now joint lower and upper probability functions m̲ and m̄ on P(X × Y)
and the associated set D of joint probability distribution functions p that
dominate m̲ and are dominated by m̄. Then, for all A ⊆ X, B ⊆ Y, and p ∈ D,

    Pro(A | B) = Pro(A × B) / Pro(X × B) = Σ_{⟨x,y⟩∈A×B} p(x, y) / Σ_{⟨x,y⟩∈X×B} p(x, y),

and m̲(A | B) or m̄(A | B) are obtained, respectively, by taking the infimum or
supremum of Pro(A | B) over all p ∈ D. Similarly,

    Pro(B | A) = Pro(A × B) / Pro(A × Y) = Σ_{⟨x,y⟩∈A×B} p(x, y) / Σ_{⟨x,y⟩∈A×Y} p(x, y),

and m̲(B | A) or m̄(B | A) are obtained, respectively, by taking the infimum or
supremum of Pro(B | A).
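Because the conditional probability in Eqs. (4.37) and (4.38) is a ratio of sums that are linear in p, its extrema over a convex credal set are attained at extreme points, so for small finite problems it can be approximated by scanning a profile. A rough Python sketch (illustrative names; it simply skips distributions that assign the conditioning event probability zero, which is a simplification of the general definition):

```python
def conditional_bounds(distributions, A, B):
    """Approximate lower and upper conditional probabilities (Eqs. (4.37), (4.38))
    over a finite profile of a credal set; each distribution is a dict x -> p(x)."""
    ratios = []
    for p in distributions:
        pb = sum(p[x] for x in B)
        if pb > 0:                      # ignore distributions with Pro(B) = 0
            ratios.append(sum(p[x] for x in A & B) / pb)
    return min(ratios), max(ratios)

# Profile of the credal set from Example 4.4 / Figure 4.3:
profile = [
    {"x1": 0.0, "x2": 0.0, "x3": 1.0},
    {"x1": 0.5, "x2": 0.0, "x3": 0.5},
    {"x1": 0.0, "x2": 0.5, "x3": 0.5},
    {"x1": 0.5, "x2": 0.5, "x3": 0.0},
]
print(conditional_bounds(profile, {"x1"}, {"x1", "x3"}))   # bounds for m(x1 | {x1, x3})
```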
4.3.7. Noninteraction of Imprecise Probabilities
It remains to address the following question: given marginal imprecise probabilities on X and Y, how do we define the associated joint imprecise probabilities under the assumption that the marginal ones are noninteractive? The
answer depends somewhat on the type of monotone measures by which the
marginal imprecise probabilities are formalized, as is discussed in Chapters 5
and 8. However, when operating at the general level of Choquet capacities of
order 2, the question is adequately answered via the convex sets of probability distributions associated with the given marginal lower and upper probabilities, as follows:

(a) Given marginal lower and upper probability functions, mX = ⟨m̲X, m̄X⟩ and
mY = ⟨m̲Y, m̄Y⟩, on P(X) and P(Y), respectively, determine the associated
convex sets of marginal probability distributions, DX and DY, on X and Y.
(b) Assuming that mX and mY are noninteractive, apply the notion of noninteraction in classical probability theory, expressed by Eq. (3.7), to the
probability distributions in the sets DX and DY to define a unique set of
joint probability distributions, D, on the set X × Y.
(c) Apply Eqs. (4.11) and (4.12) to D to determine the lower and upper
probability functions m = ⟨ᴰm̲, ᴰm̄⟩. These are, by definition, the unique
joint lower and upper probability functions that correspond to the noninteractive marginal lower and upper probability functions mX and mY.
It is known (see Note 4.5) that the following properties hold under this
definition for all A ∈ P(X) and all B ∈ P(Y):

    m̲(A × B) = m̲X(A) · m̲Y(B),    (4.39)
    m̄(A × B) = m̄X(A) · m̄Y(B),    (4.40)
    m̲[(A × Y) ∪ (X × B)] = m̲X(A) + m̲Y(B) − m̲X(A) · m̲Y(B),    (4.41)
    m̄[(A × Y) ∪ (X × B)] = m̄X(A) + m̄Y(B) − m̄X(A) · m̄Y(B).    (4.42)

Moreover, it is guaranteed that the marginal measures of m = ⟨m̲, m̄⟩ are again
the given marginal measures mX = ⟨m̲X, m̄X⟩ and mY = ⟨m̲Y, m̄Y⟩.
EXAMPLE 4.9. Consider the marginal lower and upper probability functions
mX = ⟨m̲X, m̄X⟩ and mY = ⟨m̲Y, m̄Y⟩ on the sets X = {x1, x2} and Y = {y1, y2}, respectively,
which are given in Table 4.4a. Assuming that these marginal probability functions are noninteractive, the corresponding joint lower and upper probabilities m = ⟨m̲, m̄⟩ are uniquely determined by the introduced definition of
noninteraction. One way to compute them is to follow the definition of noninteraction directly. Another, more convenient way is to use Eqs. (4.39)–(4.42)
for all subsets of X × Y for which the equations are applicable and use the
direct method only for the remaining subsets.
Following the definition of noninteraction, we need to determine first the
convex sets DX and DY of marginal probability distributions. From Table 4.4a,

    pX(x1) ∈ [0.6, 0.8],  pX(x2) ∈ [0.2, 0.4],
    pY(y1) ∈ [0.5, 0.9],  pY(y2) ∈ [0.1, 0.5].

Hence, DX is the set of convex combinations of the extreme distributions

    ⟨pX(x1), pX(x2)⟩ = ⟨0.6, 0.4⟩  and  ⟨pX(x1), pX(x2)⟩ = ⟨0.8, 0.2⟩,

which dominate the lower probabilities (Figure 4.7a) or, alternatively, are dominated by the upper probabilities (Figure 4.7c). That is,

    pX(x1) ∈ {0.6λX + 0.8(1 − λX) | λX ∈ [0, 1]} = {0.8 − 0.2λX | λX ∈ [0, 1]},
    pX(x2) ∈ {0.4λX + 0.2(1 − λX) | λX ∈ [0, 1]} = {0.2 + 0.2λX | λX ∈ [0, 1]},

and

    DX = {⟨0.8 − 0.2λX, 0.2 + 0.2λX⟩ | λX ∈ [0, 1]}.

Similarly,

    pY(y1) ∈ {0.5λY + 0.9(1 − λY) | λY ∈ [0, 1]} = {0.9 − 0.4λY | λY ∈ [0, 1]},
    pY(y2) ∈ {0.5λY + 0.1(1 − λY) | λY ∈ [0, 1]} = {0.1 + 0.4λY | λY ∈ [0, 1]},
Table 4.4. Joint Lower and Upper Probability Functions Based on Noninteractive
Marginal Lower and Upper Probability Functions (Example 4.9)

(a) Marginal functions:

    A:         ∅    {x1}   {x2}    X          B:         ∅    {y1}   {y2}    Y
    m̲X(A):   0.0    0.6    0.2   1.0         m̲Y(B):   0.0    0.5    0.1   1.0
    m̄X(A):   0.0    0.8    0.4   1.0         m̄Y(B):   0.0    0.9    0.5   1.0

(b) Joint functions (writing zij = ⟨xi, yj⟩):

    C                        m̲(C)    m̄(C)    m̲ᴹ(C)
    ∅                        0.00    0.00     0.00
    {z11}                    0.30    0.72     0.30
    {z12}                    0.06    0.40     0.06
    {z21}                    0.10    0.36     0.10
    {z22}                    0.02    0.20     0.02
    {z11, z12}               0.60    0.80     0.24
    {z11, z21}               0.50    0.90     0.10
    {z11, z22}               0.50    0.74     0.18
    {z12, z21}               0.26    0.50     0.10
    {z12, z22}               0.10    0.50     0.02
    {z21, z22}               0.20    0.40     0.08
    {z11, z12, z21}          0.80    0.98    −0.10
    {z11, z12, z22}          0.64    0.90    −0.18
    {z11, z21, z22}          0.60    0.94    −0.18
    {z12, z21, z22}          0.28    0.70    −0.10
    X × Y                    1.00    1.00     0.36
and

    DY = {⟨0.9 − 0.4λY, 0.1 + 0.4λY⟩ | λY ∈ [0, 1]},

as can be derived with the help of Figure 4.7b (or, alternatively, Figure 4.7d).
The values of λX and λY are independent of each other.
Now, applying the definition of noninteraction, we determine the set D of
joint probability distributions p by taking pairwise products of the components of
the marginal probability distributions. That is,
[Figure 4.7. Illustration to Example 4.9. (a) DX determined by the lower probability m̲X: the segment of distributions ⟨pX(x1), pX(x2)⟩ between ⟨0.6, 0.4⟩ and ⟨0.8, 0.2⟩. (b) DY determined by the lower probability m̲Y: the segment of distributions ⟨pY(y1), pY(y2)⟩ between ⟨0.5, 0.5⟩ and ⟨0.9, 0.1⟩.]
    D = {p | p(xi, yj) = pX(xi) · pY(yj) for all xi ∈ X and yj ∈ Y, where pX ∈ DX and pY ∈ DY}.
For example,

    p(x1, y1) ∈ {(0.8 − 0.2λX)(0.9 − 0.4λY) | λX, λY ∈ [0, 1]}.

Clearly, the minimum of p(x1, y1) is 0.3, and it is obtained for λX = λY = 1; the
maximum is 0.72, and it is obtained for λX = λY = 0. Hence,

    p(x1, y1) ∈ [0.3, 0.72],
[Figure 4.7 (continued). (c) DX determined, alternatively, by the upper probability m̄X. (d) DY determined by the upper probability m̄Y.]
and, consequently,

    m̲({⟨x1, y1⟩}) = 0.3  and  m̄({⟨x1, y1⟩}) = 0.72.

Similarly, we can calculate the ranges of the joint probabilities p(xi, yj) for the other
pairs xi ∈ X and yj ∈ Y; their minima and maxima are the joint lower and
upper probabilities of the singleton sets, which are given in Table 4.4b:
    p(x1, y2) ∈ {(0.8 − 0.2λX)(0.1 + 0.4λY) | λX, λY ∈ [0, 1]} = [0.06, 0.40],
    p(x2, y1) ∈ {(0.2 + 0.2λX)(0.9 − 0.4λY) | λX, λY ∈ [0, 1]} = [0.10, 0.36],
    p(x2, y2) ∈ {(0.2 + 0.2λX)(0.1 + 0.4λY) | λX, λY ∈ [0, 1]} = [0.02, 0.20].
We can also calculate the ranges of the joint probability measures, Pro, for the other
subsets of X × Y by adding the respective products of marginal probabilities.
The minima and maxima of these ranges are, respectively, the lower and upper
probabilities of these subsets (shown in Table 4.4b). For example,
    Pro({⟨x1, y1⟩, ⟨x1, y2⟩}) ∈ {(0.8 − 0.2λX)(0.9 − 0.4λY) + (0.8 − 0.2λX)(0.1 + 0.4λY) | λX, λY ∈ [0, 1]}
        = {0.8 − 0.2λX | λX ∈ [0, 1]} = [0.6, 0.8].

    Pro({⟨x1, y2⟩, ⟨x2, y1⟩}) ∈ {(0.8 − 0.2λX)(0.1 + 0.4λY) + (0.2 + 0.2λX)(0.9 − 0.4λY) | λX, λY ∈ [0, 1]}
        = {0.26 + 0.16λX + 0.24λY − 0.16λXλY | λX, λY ∈ [0, 1]} = [0.26, 0.50].

    Pro({⟨x1, y1⟩, ⟨x2, y2⟩}) ∈ {(0.8 − 0.2λX)(0.9 − 0.4λY) + (0.2 + 0.2λX)(0.1 + 0.4λY) | λX, λY ∈ [0, 1]}
        = {0.74 − 0.16λX − 0.24λY + 0.16λXλY | λX, λY ∈ [0, 1]} = [0.50, 0.74].

    Pro({⟨x1, y1⟩, ⟨x1, y2⟩, ⟨x2, y1⟩}) ∈ {(0.8 − 0.2λX)(0.9 − 0.4λY) + (0.8 − 0.2λX)(0.1 + 0.4λY)
        + (0.2 + 0.2λX)(0.9 − 0.4λY) | λX, λY ∈ [0, 1]}
        = {0.98 − 0.02λX − 0.08λY − 0.08λXλY | λX, λY ∈ [0, 1]} = [0.80, 0.98].
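Each of the ranges above is the image of a function that is bilinear in λX and λY, so its minimum and maximum over [0, 1] × [0, 1] are attained at the four corners. A small Python sketch (illustrative names) reproduces the bounds by scanning those corners:

```python
from itertools import product

def joint_bounds(pairs):
    """Range of Pro(C) for a set C of index pairs (i, j), under the noninteraction
    assumption of Example 4.9, by scanning the corners of the (lam_x, lam_y) square."""
    def px(i, lam_x):                     # marginal p_X(x_i) as a function of lam_x
        p1 = 0.8 - 0.2 * lam_x
        return p1 if i == 1 else 1.0 - p1
    def py(j, lam_y):                     # marginal p_Y(y_j) as a function of lam_y
        p1 = 0.9 - 0.4 * lam_y
        return p1 if j == 1 else 1.0 - p1
    values = [
        sum(px(i, lx) * py(j, ly) for (i, j) in pairs)
        for lx, ly in product((0.0, 1.0), repeat=2)
    ]
    return min(values), max(values)

print(joint_bounds({(1, 1)}))             # about (0.30, 0.72)
print(joint_bounds({(1, 2), (2, 1)}))     # about (0.26, 0.50)
print(joint_bounds({(1, 1), (2, 2)}))     # about (0.50, 0.74)
```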
Using Eqs. (4.39)–(4.42), the joint lower and upper probabilities can be calculated more conveniently for all subsets of X × Y except the subsets {⟨x1, y1⟩,
⟨x2, y2⟩} and {⟨x1, y2⟩, ⟨x2, y1⟩}. For example,

    m̲({⟨x1, y1⟩}) = m̲X({x1}) · m̲Y({y1}) = 0.6 · 0.5 = 0.3,
    m̄({⟨x1, y1⟩}) = m̄X({x1}) · m̄Y({y1}) = 0.8 · 0.9 = 0.72,
    m̲({⟨x1, y1⟩, ⟨x1, y2⟩}) = m̲X({x1}) · m̲Y(Y) = 0.6 · 1 = 0.6,
    m̄({⟨x1, y1⟩, ⟨x1, y2⟩}) = m̄X({x1}) · m̄Y(Y) = 0.8 · 1 = 0.8,
    m̲({⟨x1, y1⟩, ⟨x1, y2⟩, ⟨x2, y1⟩}) = m̲[({x1} × Y) ∪ (X × {y1})]
        = m̲X({x1}) + m̲Y({y1}) − m̲X({x1}) · m̲Y({y1}) = 0.6 + 0.5 − 0.6 · 0.5 = 0.8,
    m̄({⟨x1, y1⟩, ⟨x1, y2⟩, ⟨x2, y1⟩}) = m̄[({x1} × Y) ∪ (X × {y1})]
        = m̄X({x1}) + m̄Y({y1}) − m̄X({x1}) · m̄Y({y1}) = 0.8 + 0.9 − 0.8 · 0.9 = 0.98.
4.4. ARGUMENTS FOR IMPRECISE PROBABILITIES
The need for enlarging the framework of classical probability theory by allowing imprecision in probabilities has been discussed quite extensively in the
literature, and many arguments for imprecise probabilities have been put
forward. The following are some of the most common of these arguments.
1. Situations in which given information implies a set of probability distributions on a given set X are quite common. One type of such situations is illustrated by Example 4.3. All these situations can be properly formalized by
imprecise probabilities, as is shown in Section 4.3, but not by classical, precise
probabilities. Choosing any particular probability distribution from the given
set, as is required by classical probability theory, is utterly arbitrary. Regardless of how this choice is justified within classical probability theory, it is not supported by the available information. The extreme case is reached in a situation of
total ignorance—a situation in which no information about probability distributions is available. This situation, in which all probability distributions on X
are possible, is properly formalized by special lower and upper probability
functions that are defined for all A ∈ P(X) by the formulas

    m̲(A) = 0 when A ≠ X  and  m̲(A) = 1 when A = X,    (4.43)
    m̄(A) = 0 when A = ∅  and  m̄(A) = 1 when A ≠ ∅.    (4.44)
These lower and upper probabilities define, in turn, the following ranges of
probabilities, Pro, that are maximally imprecise:

    Pro(A) ∈ [0, 1] for all A ∈ P(X) − {∅, X},

and, of course, Pro(∅) = 0 and Pro(X) = 1. These maximally imprecise probabilities are usually called vacuous probabilities. It is obvious that they are
associated with the set of all probability distributions on X.
2. Imprecision of probabilities is needed to reflect the amount of statistical information on which they are based. The precision should increase with the
amount of statistical information. Imprecise probabilities allow us to utilize
this sensible principle methodologically. As a simple example, let X denote a
finite set of states of a variable and let the variable be observed at discrete
times. Assume that in a sequence of N observations of the variable, each state
x ∈ X was observed n(x) times. According to classical probability theory, probabilities of individual states, p(x), are estimated by the ratios n(x)/N for all
x ∈ X. While these estimates are usually acceptable when N is sufficiently large
relative to the number of all possible states, they are questionable when N is
small. An alternative is to estimate lower and upper probabilities, m̲(x) and
m̄(x), in such a way that we start with the maximum imprecision (m̲(x) = 0 and
m̄(x) = 1 for all x ∈ X) when N = 0 (total ignorance) and let the imprecision (expressed by the differences m̄(x) − m̲(x)) be given by a function that is monotone decreasing with N. This can be done, for example, by using for each
x ∈ X the functions (estimators)

    m̲(x) = n(x) / (N + c),    (4.45)
    m̄(x) = (n(x) + c) / (N + c),    (4.46)

where c ≥ 1 is a coefficient that expresses how quickly the imprecision in the estimated probabilities decreases with the amount of statistical information,
which is expressed by the value of N. The chosen value of c expresses the
caution in estimating the probabilities: the larger the value, the more cautious
the estimators are. As a simple example, let X = {0, 1} be the set of states of a
single variable v, whose observations, v(t), at discrete times t ∈ ℕ50 are given
in Figure 4.8a. These observations were actually randomly generated with
probabilities p(0) = 0.39 and p(1) = 0.61. Figure 4.8b shows the lower and upper
probabilities of x = 0 estimated for each N ∈ ℕ50 by Eqs. (4.45) and (4.46),
respectively, with c = 4. Figure 4.8c shows the same for x = 1.
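A minimal Python sketch of the estimators (4.45) and (4.46) (illustrative names; the simulated data are generated here rather than taken from Figure 4.8):

```python
import random

def lower_upper_estimates(observations, state, c=4):
    """Running lower/upper probability estimates of a given state (Eqs. (4.45), (4.46))."""
    n, history = 0, []
    for N, value in enumerate(observations, start=1):
        n += (value == state)
        history.append((n / (N + c), (n + c) / (N + c)))
    return history

random.seed(0)
obs = [1 if random.random() < 0.61 else 0 for _ in range(50)]   # p(1) = 0.61, as in Figure 4.8
for N in (1, 10, 50):
    low, up = lower_upper_estimates(obs, state=1)[N - 1]
    print(N, round(low, 3), round(up, 3))   # the interval [low, up] narrows as N grows
```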
3. Classical probability theory requires that Pro(A) + Pro(Ā) = 1. This
means, for example, that a little evidence supporting A implies a large amount
of evidence supporting Ā. However, in many real-life situations, we have a little
evidence supporting A and a little evidence supporting Ā as well. Suppose, for
[Figure 4.8. Example of the decrease of imprecision in estimated probabilities with the amount of statistical information. (a) The observed sequence v(t) = 0111001010100111010111010 1101110111011110101011011. (b) Lower and upper probability estimates of x = 0, and (c) of x = 1, plotted against N = 1, . . . , 50.]
example, that an old painting was discovered that somewhat resembles paintings of Raphael. Such a discovery is likely to generate various questions
regarding the status of the painting. The most obvious questions are:
(a) Is the discovered painting a genuine painting by Raphael?
(b) Is the discovered painting a counterfeit?
According to these questions, the discovered painting belongs either to the set
of Raphael’s paintings, R, or to its complement, R̄. Evidential support for R
and R̄, assessed by an expert after examining the painting, may be expressed
by numbers s(R) and s(R̄) in [0, 1]. Assume that the assessment is rather small
for both R and R̄, say s(R) = 0.1 and s(R̄) = 0.3. These numbers cannot be
directly interpreted as probabilities, but can be converted to probabilities p(R)
and p(R̄) by normalization. We obtain p(R) = 0.25 and p(R̄) = 0.75. Assuming
now that the assessments were ten times smaller, s′(R) = 0.01 and s′(R̄) = 0.03,
we would still obtain p(R) = 0.25 and p(R̄) = 0.75. Precise probabilities cannot
thus distinguish the two cases. However, they can be distinguished by imprecise probabilities, since values s(R), s(R̄), s¢(R), and s¢(R̄) in the two examples
are directly interpretable as lower probabilities. In both cases, we obtain
only imprecise assessments of the probabilities. In the first case, the assessments are
    p(R) ∈ [0.1, 0.7],  p(R̄) ∈ [0.3, 0.9],

while in the second case they are

    p′(R) ∈ [0.01, 0.97],  p′(R̄) ∈ [0.03, 0.99].
Clearly, the weaker evidence is expressed here by greater imprecision of the
estimated probabilities.
4. In general, imprecise probabilities are not only easier to elicit from
experts than precise ones, but their derivation is more meaningful. Allowing
experts to express their assessments in terms of lower and upper probabilities
does not deter them from making precise assessments, when they feel comfortable doing so, but gives them more flexibility. Imprecise probabilities, contrary to precise ones, can also be constructed from qualitative judgments of
various types that are easy to understand and justify in some applications. Moreover, we may not be able to elicit precise probabilities in practice, even if that
is possible in principle. This may be due to lack of time or the computational
resources needed for a thorough analysis of a complex body of evidence.
5. Precise probabilistic assessments regarding some decision-making situation that are obtained from several sources (sensors, experts, individuals of a
group in a group decision) are often inconsistent. In order to be able to
combine information from the individual sources, their inconsistencies must
be resolved. A natural way to do that is to use imprecise probabilities. The
degree of imprecision in the combined probabilities is a manifestation of the
extent to which the sources are inconsistent. This interesting methodological
issue is discussed in Chapter 9.
6. Assume that a certain monetary unit, u, is available to bet on the occurrence of the various events A ∈ P(X). If we know the probability, Pro(A), of
event A, the highest acceptable rate (a fraction of u) for betting on the occurrence of A is Pro(A). This means that we are willing to bet no more than u ·
Pro(A) to receive u when A occurs and to receive nothing when A does not
occur. We are also willing to bet no more than u(1 - Pro(A)) against the occurrence of A. Imprecise probabilities are obviously needed in any betting situation in which relevant information is not sufficient to determine a unique
probability measure on recognized events. In such situations, the lower probability of any event A is the highest acceptable rate for betting on the occurrence of A, while the upper probability of A is the highest acceptable rate for
betting against the occurrence of A. If a utility function f defined on X is also
involved, the betting behavior is guided by the lower and upper expected
values of f (Section 4.5).
4.5. CHOQUET INTEGRAL
When working with imprecise probabilities, the classical notion of expected
value of a given function is generalized. Instead of one expected value, we have
now a range of expected values. This range is captured by lower and upper
expected values, which are based on lower and upper probability functions,
respectively. Since lower and upper probability functions are monotone measures, which in general are nonadditive, the classical Lebesgue integral, which
is applicable only to additive measures, must be appropriately generalized to
calculate an expected value of a function with respect to a monotone measure.
It is generally recognized in the literature that a proper generalization of the
Lebesgue integral to monotone measures is a functional that is called a
Choquet integral.
Given a real-valued function f (a utility function) on a set X, a monotone
measure m on an appropriate family C of subsets of X that contains ∅ and X,
and a particular set A ∈ C, the Choquet integral of f with respect to m on A,
(C)∫_A f dm, is a functional defined by the equation

    (C)∫_A f dm = ∫₀^∞ m(A ∩ αF) dα,    (4.47)

where αF = {x | f(x) ≥ α}. Observe that the Choquet integral is defined via a
special Lebesgue integral (the one on the right-hand side of this equation).
The Choquet integral possesses some properties that make it a meaningful
generalization of the Lebesgue integral for monotone measures. They include
(see Note 4.6):
• When m is an additive measure, the Choquet integral coincides with the
Lebesgue integral:

    (C)∫_A f dm = ∫_A f dm.

• If f1(x) ≤ f2(x) for all x ∈ A, then

    (C)∫_A f1 dm ≤ (C)∫_A f2 dm.

• If m1(B) ≤ m2(B) for all B ∈ P(A), then

    (C)∫_A f dm1 ≤ (C)∫_A f dm2.

• For any nonnegative constant c,

    (C)∫_A (c · f) dm = c (C)∫_A f dm,
    (C)∫_A (c + f) dm = c · m(A) + (C)∫_A f dm.

• When f is a nonnegative function, the Choquet integral (C)∫_A f dm produces a measure that preserves the following properties of the given
monotone measure m: monotonicity, superadditivity, subadditivity, continuity from above, and continuity from below.
Applying the Choquet integral of a function f on X to the relevant lower and
upper probability functions, m̲ and m̄, we obtain the following lower and upper
expected values of f:

    a̲(f, m̲) = (C)∫_X f dm̲ = ∫₀^∞ m̲(X ∩ αF) dα,    (4.48)
    ā(f, m̄) = (C)∫_X f dm̄ = ∫₀^∞ m̄(X ∩ αF) dα.    (4.49)

Hence, the expected value is expressed imprecisely by the interval [a̲(f, m̲), ā(f, m̄)].
When X is a finite set and f is a nonnegative function, it is convenient to
introduce the following special notation to simplify the computation of the
Choquet integral. Let X = {x1, x2, . . . , xn} and f(xk) = fk. Assume that the elements of X are permuted in such a way that f1 ≥ f2 ≥ . . . ≥ fn ≥ 0, and let fn+1 = 0 by convention.
For each xk ∈ X, let Ak = {x1, x2, . . . , xk}. Then, given a monotone measure m on
X, the Choquet integral of f on X with respect to m can be expressed by the
simple formula

    (C)∫_X f dm = Σ_{k=1}^{n} (fk − fk+1) m(Ak).    (4.50)
EXAMPLE 4.10. To illustrate the use of Eq. (4.50), consider the joint lower
and upper probability functions on X × Y in Table 4.4b, and assume that the
following function is defined on the same set X × Y: f(⟨x1, y1⟩) = 5, f(⟨x1, y2⟩) =
3, f(⟨x2, y1⟩) = 2, f(⟨x2, y2⟩) = 3. Let Z = {z1, z2, z3, z4} = X × Y, where z1 = ⟨x1, y1⟩,
z2 = ⟨x1, y2⟩, z3 = ⟨x2, y2⟩, z4 = ⟨x2, y1⟩, and let f(zk) = fk. Then, f1 ≥ f2 ≥ f3 ≥ f4 and
Eq. (4.50) is applicable for calculating the lower and upper expected values
of f:

    a̲(f, m̲) = Σ_{k=1}^{4} (fk − fk+1) m̲({z1, z2, . . . , zk})
             = 2 · 0.30 + 0 · 0.60 + 1 · 0.64 + 2 · 1 = 3.24,

    ā(f, m̄) = Σ_{k=1}^{4} (fk − fk+1) m̄({z1, z2, . . . , zk})
             = 2 · 0.72 + 0 · 0.80 + 1 · 0.90 + 2 · 1 = 4.34.

Hence, the expected value of f is in the interval [3.24, 4.34].
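Eq. (4.50) translates directly into code: sort the elements by decreasing function value and accumulate the weighted differences along the resulting chain. A minimal Python sketch (illustrative names), applied to the lower probability data of Example 4.10:

```python
def choquet_integral(f, measure, universe):
    """Choquet integral of a nonnegative function via Eq. (4.50);
    `measure` maps frozensets of elements of `universe` to values."""
    order = sorted(universe, key=lambda x: f[x], reverse=True)   # f1 >= f2 >= ... >= fn
    total, A = 0.0, set()
    for k, x in enumerate(order):
        A.add(x)
        f_next = f[order[k + 1]] if k + 1 < len(order) else 0.0
        total += (f[x] - f_next) * measure[frozenset(A)]
    return total

# Example 4.10: f on X x Y and the joint lower probabilities from Table 4.4b.
Z = ("z11", "z12", "z21", "z22")            # z_ij stands for <x_i, y_j>
f = {"z11": 5, "z12": 3, "z21": 2, "z22": 3}
m_low = {
    # only the chain sets needed for this ordering of Z are listed
    frozenset({"z11"}): 0.30,
    frozenset({"z11", "z12"}): 0.60,
    frozenset({"z11", "z12", "z22"}): 0.64,
    frozenset(Z): 1.00,
}
print(choquet_integral(f, m_low, Z))        # lower expected value of f (3.24)
```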
4.6. UNIFYING FEATURES OF IMPRECISE PROBABILITIES
Imprecise probabilities can be formalized in terms of classical set theory in
many different ways, each based on using a monotone measure of some particular type. An important class of types of monotone measures is the
Choquet capacities of orders 2, 3, . . . , which are introduced in Section 4.2. This
class plays an important role in the area of imprecise probabilities, primarily
due to its large scope and the natural ordering of the described types of monotone measures by their generality. Notwithstanding the significance of imprecise probabilities based on the Choquet capacities of various orders, other
types of monotone measures have increasingly been recognized as useful for
formalizing imprecise probabilities. Some of the most prominent among them
are examined in Chapter 5.
Formalizing imprecise probabilities is thus associated with considerable
diversity and this, in turn, results in high methodological diversity. Fortunately,
this ever increasing diversification of the area of imprecise probabilities can
be countered, at least to some extent, by the various common, and thus unifying, features of imprecise probabilities that are discussed in the preceding
[Figure 4.9. Unifying features of imprecise probabilities: a diagram of the conversions among the credal set D, the lower and upper probability functions m̲ and m̄, and their Möbius representations, labeled by the equations that define the individual conversions.]
sections of this chapter. Before embarking on the examination of diverse theories of imprecise probabilities in Chapter 5, their unifying features are summarized as follows. For convenience, the summary is facilitated via Figure 4.9.
Perhaps the principal feature that unifies all imprecise probabilities is that
each is invariably associated with a set of classical probability measures. This
set may often be the initial representation of imprecise probabilities. It consists of all probability measures pertaining to a given situation that are consistent with available information. Clearly, its structure reflects the nature of
the given information.
The set of probability measures derived from given information is usually,
but not always, convex. If it is convex, then it can be converted to other,
methodologically more convenient representations. These representations
include lower or upper probability functions and their Möbius as well as interaction representations, all introduced in Section 4.3. Given any of these representations, it can be converted to the other ones as needed. Using symbols
introduced in Section 4.3, the conversions are summarized in Figure 4.9. When
applicable, references are made in the figure to equations that define the individual conversions.
If the set of probability measures derived from given information is not
convex, the conversions are not applicable. This means that the imprecisions
in probabilities have to be dealt with directly in terms of the given set. Alternatively, the nonconvex set can be approximated by the smallest convex set
that contains it (its convex hull). However, the methodology for dealing with
nonconvex sets of probability measures has not been adequately developed
as yet.
A set of probability measures is not always the initial representation.
Depending on the application context, any of the other representations may
be a natural initial representation. In many applications, for example, we begin
with lower or upper probabilities assessed by human experts and the set of
probability measures consistent with them can be derived by the conversion
equation when needed. In other applications, we begin with the Möbius representation or the interaction representation and convert them to the other
representations as needed.
NOTES
4.1. Classical measure theory has been recognized as an important area of mathematics since the late 19th century. For a self-study, the classic text by Halmos
[1950] is recommended. Among many other books, only a few are mentioned
here, which are significant in various respects. The book by Caratheodory [1963],
whose original German version was published in 1956, is one of the earliest and
most influential books on classical measure theory. Books by Temple [1971] and
Weir [1973] provide pedagogically excellent introductions to classical measure
theory and require only basic knowledge of calculus and algebra as prerequisites.
The books by Billingsley [1986] and Kingman and Taylor [1966] focus on the connections between classical measure theory and probability theory. The history of
classical measure theory and Lebesgue integral is carefully traced in a fascinating book by Hawkins [1975]. He describes how modern mathematical concepts
regarding these theories (involving concepts such as a function, continuity, convergence, measure, integration, and the like) developed (primarily in the 19th
century and the early 20th century) through the work of many mathematicians,
including Cauchy, Fourier, Borel, Riemann, Cantor, Dirichlet, Hankel, Jordan,
Weierstrass, Volterra, Peano, Lebesgue, and Radon.
4.2. The idea of nonadditive measures is due to Gustave Choquet, a distinguished
French mathematician. He conceived of this idea in 1953, during his one-year residence at the University of Kansas at Lawrence, and used it for developing his
theory of capacities. The theory was published initially as a large research report
at the University of Kansas and shortly afterwards in the open literature
[Choquet, 1953–54]. In spite of this important publication, the idea of nonadditive measures was almost completely ignored by the scientific community for
many years. It seems that it was recognized only in the early 1970s by Huber in
his effort to develop a robust statistics (see [Huber, 1981]). It began to attract
more attention only in the 1980s, primarily under the influence of work by Michio
Sugeno on monotone measures, which he introduced in his doctoral dissertation
[Sugeno, 1977] under the name "fuzzy measures." Although this name is ill-conceived (there is no fuzziness in these measures), it is unfortunately still widely
used in the literature. Since the late 1970s, research on the theory of monotone
(fuzzy) measures has been steadily growing. The outcome of this research, which
is covered in too many papers to be cited here, is adequately represented by the
following books: a textbook by Wang and Klir [1992], monographs by Denneberg
[1994] and Pap [1995], and a book edited by Grabisch et al. [2000]. Additional
useful sources include: a survey article by Sims and Wang [1990], a Special Issue
of Fuzzy Sets and Systems on Fuzzy Measures and Integrals [92(2), 1997, pp.
137–264] and, above all, an extensive handbook edited by Pap [2002].
4.3. According to an interesting historical study made by Shafer [1978], some aspects
of nonadditive (and, hence, imprecise) probabilities are recognizable in the work
of Bernoulli and Lambert in the 17th and 18th centuries. However, these traces
of nonadditivity were lost in the course of history and were rediscovered only in
the second half of the 20th century. Perhaps the first thorough investigation of
imprecise probabilities was made by Dempster [1967a,b], even though it was
preceded by a few earlier, but narrower investigations. Since the publication of
Dempster’s papers, the number of publications on imprecise probabilities has
been rapidly growing. However, most of these publications deal with various
special types of imprecise probabilities. Some notable early exceptions are
papers by Walley and Fine [1979, 1982] and Kyburg [1987]. Since the early 1990s,
a greater emphasis can be observed in the literature on studying imprecise probabilities from various highly general perspectives. It is likely that this trend was
influenced by the publication of an important book by Walley [1991]. Employing
simple, but very fundamental principles of avoiding sure loss, coherence, and
natural extension, Walley develops a general theory of imprecise probabilities in
this book. In addition to its profound contribution to the area of imprecise probabilities, the book is also a comprehensive guide to the literature and history of
this area. Short versions of the material in the book and some new ideas are presented in [Walley, 1996, 2000]. An important resource for researchers in the area
of imprecise probabilities is a Web site dedicated to the “Imprecise Probabilities
Project.” The purpose of the project is “to help advance the theory and applications of imprecise probabilities, mainly by the dissemination of relevant information.” The Web site, whose address is http://ippserv.rug.ac.be, contains a
bibliography, information about people working in the field, abstracts of recent
papers, and other relevant information. Associated with the Imprecise Probability Project are biennial International Symposia on Imprecise Probabilities and
Their Applications (ISIPTA), which were initiated in 1999.
4.4. While the Möbius transform is well established in combinatorics, the interaction
representation of monotone measures has its roots in the theory of cooperative
games. These connections are discussed in several papers by Grabisch [1997a–c,
2000]. Mathematical properties of the Möbius representation and the interaction
representation as well as conversions between these representations are thoroughly investigated in [Grabisch, 1997a]. In particular, this paper contains a
derivation of Eq. (4.28). Properties (m1)–(m7) of the Möbius representation of
Choquet capacities, which are listed in Section 4.2.1, are proven in [Chateauneuf
and Jaffray, 1989]. Properties (i1)–(i5) of the interaction representation, which
are listed in Section 4.3.3, are proven in [De Campos and Bolaños, 1989] as well
as in [Chateauneuf and Jaffray, 1989]. Miranda et al. [2003] investigates a generalization of the interaction representation to infinite sets.
4.5. The concept of noninteraction for lower and upper probabilities is investigated
in [De Campos and Huete, 1993]. This paper contains proofs of Eqs. (4.39)–(4.42).
The concept of conditional monotone measure is investigated in [De Campos et
al., 1990]. Various algorithms for dealing with imprecise probabilities represented
by convex sets of probability measures are presented in [Cano and Moral, 2000].
4.6. The Choquet integral, as well as other integrals based on monotone measures,
are covered quite extensively in the literature. An excellent tutorial on this
subject was written by Murofushi and Sugeno [2000]. Some other notable references dealing with various aspects of the Choquet integral include [Weber, 1984;
Murofushi and Sugeno, 1989, 1993; De Campos and Bolaños, 1992; Wang et al.,
1996; Benvenuti and Mesiar, 2000; Denneberg, 1994; and Pap, 1995, 2002].
Murofushi et al. [1994] show that the Choquet integral is also applicable to set functions that are not monotone.
4.7. The various issues of constructing monotone measures from available evidence
are discussed by Klir et al. [1997]. Reche and Salmerón [2000] develop a procedure for constructing lower and upper monotone measures that are compatible
with coherent partial information. Coletti and Scozzafava [2002] deal with
probability (imprecise in general) in terms of coherent partial assessments and
coherent extensions.
4.8. It seems that the term "credal sets," which is fairly routinely used in the literature for closed and convex sets of probability distributions, emerged from some
writings by Levi [1980, 1984, 1986, 1996, 1997].
4.9. The interaction representation is studied in papers by De Campos and Bolaños
[1987] and Grabisch [1997a,b, 2000]. Grabisch shows that this representation is
closely connected with the interaction between players, as represented in the
mathematical theory of non-atomic games studied by Shapley [1971] and Aumann
and Shapley [1974].
4.10. Imprecise probabilities can also be studied within the mathematics of general
interval structures. Relevant references in this regard are [Aubin and
Frankowska, 1990] and [Wong et al., 1995]. Another way of studying imprecise
probabilities is to look at them as deviations from precise (additive) probabilities [Ban and Gal, 2002].
EXERCISES
4.1. Determine which of the following set functions m is a monotone measure
or a semicontinuous monotone measure:
(a) Given ⟨X, C⟩, m is defined for all A ∈ C by the formula m(A) = 1 when x0 ∈ A and m(A) = 0 when x0 ∉ A, where x0 is a fixed element of X.
(b) Let X = {x1, x2, x3}, C = P(X), and m(A) = 1 when A = X, m(A) = 0 when A = ∅, and m(A) = 1/2 otherwise.
(c) Let X = {1, 2, . . . , n}, C = P(X), and m(A) = |A|/n for all A ∈ C.
(d) Let X be the set of all positive integers, C = P(X), and m(A) = Σ_{i∈A} i for all A ∈ C.
(e) Let X = [0, 1], C = P(X), and m(A) = inf_{x∈A} f(x) for all A ∈ C, where f is a real-valued function on X.
(f) Repeat part (e) for m(A) = sup_{x∈A} f(x).
4.2. Determine the Möbius representations of the set functions defined in
Exercise 4.1b and c.
4.3. Show that every additive measure is also monotone, but not the other
way around.
4.4. Let m be a regular monotone measure on ⟨X, C⟩ that is continuous from
above, where C is a Boolean algebra of subsets of X. Show that the set
function ν defined on ⟨X, C⟩ by the equation

    ν(E) = 1 − m(Ē)

is a regular monotone measure that is continuous from below.
4.5. Construct some monotone measures on ⟨X = {x1, x2, x3}, P(X)⟩ that are:
(a) Superadditive
(b) Subadditive
(c) Neither superadditive nor subadditive
4.6. Determine for each monotone measure constructed in Exercise 4.5 its
dual measure and the Möbius representations of the resulting pair of
dual measures. Then, answer the question: Is one of the dual measures
a Choquet capacity of order 2?
4.7. Show that every Choquet capacity of order k, where k > 2, is also a
Choquet capacity of order k - 1, but not the other way around.
4.8. Consider the set of joint probability distribution functions defined at the
end of Example 4.3. Determine the corresponding lower and upper
probabilities, their Möbius representations, and whether or not the
measure representing the lower probabilities is:
(a) Superadditive
(b) A Choquet capacity of order 2
Repeat the exercise symbolically for any values of the marginal probabilities (assume for convenience that pX(x1) ≥ pY(y1)).
4.9. Determine the interaction representation of the following monotone
measures:
(a) Measures m3 and m4 in Table 4.2;
(b) The lower and upper probability measures in Table 4.3
(c) The monotone measures defined in Table 4.5.
Table 4.5. Monotone Measures in Exercises 4.9–4.14 (writing zij = ⟨xi, yj⟩)

    C                        m1(C)   m2(C)   m3(C)
    ∅                        0.00    0.00    0.0
    {z11}                    0.26    0.08    0.3
    {z12}                    0.26    0.00    0.4
    {z21}                    0.26    0.00    0.4
    {z22}                    0.00    0.00    0.2
    {z11, z12}               0.59    0.20    0.7
    {z11, z21}               0.53    0.40    0.7
    {z11, z22}               0.27    0.08    0.4
    {z12, z21}               0.53    0.00    0.8
    {z12, z22}               0.27    0.00    0.6
    {z21, z22}               0.27    0.00    0.6
    {z11, z12, z21}          0.87    0.52    1.0
    {z11, z12, z22}          0.61    0.20    0.7
    {z11, z21, z22}          0.55    0.40    0.7
    {z12, z21, z22}          0.55    0.00    0.9
    {x1, x2} × {y1, y2}      1.00    1.00    1.0
4.10. For each of the measures defined in Table 4.5, determine the following:
(a) The dual measure;
(b) Möbius representations of the measure and its dual;
(c) The interaction representation.
4.11. For each of the measures defined in Table 4.5, determine the highest
order k(k ≥ 1) for which the measure is a Choquet capacity.
4.12. For each of the measures defined in Table 4.5, determine the
corresponding marginal measures.
4.13. For each of the marginal measures determined in Exercise 4.12,
construct the joint measure based on the assumption of noninteraction.
4.14. For each of the measures defined in Table 4.5 and their duals, calculate
the Choquet integral of the following functions on {x1, x2} × {y1, y2} (for
convenience, let f(xi, yj) = fij):
(a) f11 = 0.5, f12 = 1.0, f21 = 0.7, f22 = 0.3
(b) f11 = 250, f12 = 120, f21 = 500, f22 = 750
(c) f11 = 1, f12 = 3, f21 = 3, f22 = 1
4.15. Perform computer experiments by generating two values, 0 and 1, of a
single random variable with probabilities p(0) and p(1) = 1 − p(0). Apply
Eqs. (4.45) and (4.46) to the generated sequences of length N ≥ 0 and
observe the convergence to the given probabilities as a function of N
and c.
4.16. Generalize Eqs. (4.45) and (4.46) to more values and to more variables.
5
SPECIAL THEORIES OF IMPRECISE PROBABILITIES
I think it wiser to avoid the use of a probability model when we do not have the
necessary data than to fill the gaps arbitrarily; arbitrary assumptions yield arbitrary conclusions.
—Terrence L. Fine
5.1. AN OVERVIEW
This chapter is in some sense complementary to Chapter 4. While the focus of
Chapter 4 is on the examination of the unifying features of theories of imprecise probabilities, the purpose of this chapter is to explore the great diversity
of these theories. Clearly, this diversity results from the many ways in which
monotone measures can be constrained by special requirements.
The theories of imprecise probabilities that are well developed and have
been proved useful in some application contexts are covered in detail. Other
theories, which have not been sufficiently developed or tested in applications
as yet, are introduced only as themes for future research.
Among the theories of imprecise probabilities that are examined in this
chapter are those based on the Choquet capacities of various orders, which
are already known from Chapter 4. In particular, the one based on capacities
of order • is covered in detail. This theory, which is usually referred to in the
literature as the Dempster–Shafer theory, is already quite well developed and
has been utilized in many applications. Two more special theories, both sub-
sumed under the Dempster–Shafer theory, are also examined in detail: a
theory based on graded possibilities and a theory based on special monotone
measures that are called Sugeno λ-measures (or just λ-measures). One additional theory, which is based on monotone measures derived from interval-valued probability distributions, is covered in this chapter in detail. This theory
is not comparable with the Dempster–Shafer theory, but it is subsumed under
the theory based on Choquet capacities of order 2.
Ordering of the mentioned theories by levels of their generality is shown
in Figure 5.1. Each arrow T → T′ in the figure means that theory T′ is more
general than theory T. The presentation in this chapter follows these arrows,
starting with the two least general theories of imprecise probabilities shown
in the figure. One of them is a simple generalization of classical possibility
theory, in which possibilities are graded. The other one is a simple generalization of classical probability theory, which is based on l-measures. The presentation then proceedes to the Dempster–Shafer theory, and the theory based
on interval-valued probability distributions. The chapter concludes with a
survey of other types of monotone measures that can be used for formalizing
imprecise probabilities.
5.2. GRADED POSSIBILITIES
The theory examined in this section is a generalization of the classical possibility theory, which is reviewed in Chapter 2. Instead of distinguishing only
between possibility and impossibility, as in the classical possibility theory, the
generalized possibility theory is designed to distinguish grades (or degrees) of
possibility. It is thus appropriate to view it as a theory of graded possibilities.
In analogy with the classical possibility theory, its generalized counterpart
is based on two dual monotone measures: a possibility measure and a necessity measure. Contrary to the classical possibility and necessity measures,
whose values are in the set {0, 1}, the values of their generalized counterparts
cover the whole unit interval [0, 1].
As in the classical case, it is convenient to formalize the generalized possibility theory in terms of generalized possibility measures, which are appropriate monotone measures that characterize graded possibilities. For each given
generalized possibility measure, Pos, its dual generalized possibility measure,
Nec, is then defined for each recognized set A by the duality equation
Nec( A) = 1 - Pos( A ),
(5.1)
which is a generalization of Eq. (2.4).
Family C on which generalized possibility measures are defined is required
to be an ample field. This is a family of subsets of X that are closed under arbitrary unions and intersections, and under complementation in X. When X is
finite, C is usually the whole power set of X.
5.2. GRADED POSSIBILITIES
145
1-Monotone
measures
2-Monotone
measures
Decomposable
measures
Interval-valued
probability
distributions
k-Monotone
measures
•-Monotone
measures
Graded
possibilities
l-Measures
k-Additive
measures
Crisp
possibilities
Additive
measures
Classical
uncertainty
theories
Dirac
measures
Figure 5.1. Ordering of monotone measures used for representing imprecise probabilities by
their levels of generality. (Dirac measures are defined in Note 5.12.)
Since the generalized possibility theory subsumes the classical one as a
special case and classical possibility, and necessity measures are special cases
of their graded counterparts, it is sensible to omit the adjectives “generalized”
and “graded” from now on.
For the sake of clarity, the following formalization of possibility measures
is based on the assumption that the set of all considered alternatives, X, is
146
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
finite. A few remarks regarding the case when X is infinite are made later in
this section.
Given a finite universal set X and assuming that C = P(X), a possibility
measure, Pos, is a function
Pos :P (X ) Æ [ 0, 1]
that satisfies the following axiomatic requirements:
Axiom (Pos1). Pos(⭋) = 0.
Axiom (Pos2). Pos(X) = 1.
Axiom (Pos3). For any sets A, B Œ P(X),
Pos( A » B) = max{Pos( A), Pos(B)}.
(5.2)
Observe that axioms (Pos1) and (Pos2) are shared by all monotone measures. It is axiom (Pos3) that distinguishes possibility measures from other
monotone measures. Observe also that Eq. (5.2) is the limiting case of inequality (4.2) that holds for all monotone measures.
Recall that each probability measure is uniquely determined by its values
on singletons, expressed by a probability distribution function. It turns our that
each possibility measure is also uniquely expressed by its values on singletons,
expressed by a basic possibility function, as formally stated in the following
theorem.
Theorem 5.1. Every possibility measure, Pos, defined on subsets of a finite set
X is uniquely determined for each A Œ P(X) by a basic possibility function
r : X Æ [ 0, 1]
via the formula
Pos( A) = max{r ( x)}.
(5.3)
x ŒA
Proof. The theorem is proved by induction on the cardinality of set A. Let |A|
= 1. Then, A = {x} for some x Œ X, and Eq. (5.3) is trivially satisfied. Assume
now that Eq. (5.3) is satisfied for |A| = n - 1 and let A = {x1, x1, . . . , xn}. Then,
by Eq. (5.2),
Pos( A) = max{Pos({x1 , x 2 ,... x n-1}), Pos({x n })}
= max max {Pos({x i})}, Pos({x n })
{
iŒ⺞ n -1
= max{Pos({x i})}
}
iŒ⺞ n
= max{r (x i )}.
iŒ⺞ n
䊏
5.2. GRADED POSSIBILITIES
147
Although it can be easily shown that Eq. (5.2) implies monotonicity, this is
even more transparent from Eq. (5.3). Observe also that
max{r ( x)} = 1
x ŒX
(5.4)
due to Axiom (Pos2). This property is usually called a possibilistic
normalization.
In the literature function r is often called a possibility distribution function.
In spite of its common use, this term is avoided in this book since it is
misleading. The term “distribution function” makes perfectly good sense in
probability theory, where any probability distribution function p is required
to satisfy the equation
 p( x) = 1.
x ŒX
That is, each probability distribution function actually distributes the fixed
value of 1 among the individual elements of X. On the contrary, a basic possibility function r does not distribute any fixed value among the elements of X,
as is obvious from the following inequalities
1£
 r(x) £ X .
x ŒX
Basic possibility functions are thus non-distributive and it is misleading to call
them distribution functions.
The property that distinguishes necessity measures from other monotone
measures can be determined from the duality between possibility and necessity measures, as expressed by the following theorem.
Theorem 5.2. Let Pos denote a particular possibility measure defined on
subsets of a finite set X. If Nec is a particular necessity measure that is dual
to Pos via Eq. (5.1), then for any sets A, B ŒP(X)
Nec ( A « B) = min{Nec ( A), Nec (B)}.
(5.5)
Proof
Nec ( A « B) = 1 - Pos( A « B)
= 1 - Pos( A » B )
= 1 - max{Pos( A ), Pos(B )}
= 1 - max{1 - Nec ( A), 1 - Nec (B)}
= 1 - [1 - min{Nec ( A), Nec (B)}]
= min{Nec ( A), Nec (B)}.
䊏
148
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Observe that the distinguishing property Eq. (5.5) of necessity measures is
the limiting case of the inequality (4.1) that holds for all monotone measures.
Observe also that possibility theory may equally well be formalized via axioms
of necessity measures and possibility measures defined via the duality equation. The following is this alternative formalization.
Given a finite universal set X, a necessity measure, Nec, is a function
Nec :P (X ) Æ [ 0, 1]
that satisfies the following axiomatic requirements:
Axiom (Nec1). Nec(⭋) = 0.
Axiom (Nec2). Nec(X) = 1.
Axiom (Nec3). For any sets A, B Œ P(X),
Nec ( A « B) = min{Nec ( A), Nec (B)}.
Necessity measures do not possess a natural counterpart of the property in
Eq. (5.3), which allows any possibility measure to be fully determined by its
values on singletons. Therefore, the formalization of possibility theory in terms
of possibility measures is more transparent and offers some methodological
advantages.
In addition to their axiomatic properties, Eqs. (5.2) and (5.5), possibility and
necessity measures must also conform to the properties in Eqs. (4.1) and (4.2)
of general monotone measures. This means that
Pos( A « B) £ min{Pos( A), Pos(B)}
(5.6)
Nec ( A » B) ≥ max{Nec ( A), Nec (B)}
(5.7)
and
for all A, B Œ P(X). Moreover,
max{Pos( A), Pos( A )} = 1
(5.8)
min{Nec( A), Nec( A )} = 0.
(5.9)
and
These equations follow directly from the axioms of possibility and necessity
measures, respectively. A direct consequence of Eq. (5.8) is the inequality
Pos( A) + Pos( A ) ≥ 1,
(5.10)
149
5.2. GRADED POSSIBILITIES
and applying to it the duality equation, Eq. (5.1), we obtain the equation
Nec ( A) + Nec ( A ) £ 1.
(5.11)
It can also easily be shown that
Nec ( A) £ Pos( A)
(5.12)
for all A Œ P(X). Clearly, if Pos(A) < 1, then Pos(Ā) = 1 by Eq. (5.8) and
Nec(A) = 0 by Eq. (5.1). Hence, Nec(A) £ Pos(A). If Pos(A) = 1, then trivially
Nec(A) £ Pos(A).
Due to their properties, necessity and possibility measures can be interpreted as lower and upper probabilities, respectively. For any A Œ P(X), the
probability interval has the form
{
( )
[ Nec( A), Pos( A)] = [ 0, Pos A ]
[ Nec( A), 1]
when Pos( A) < 1
when Nec ( A) > 0.
5.2.1. Möbius Representation
In order to investigate the Möbius representation in possibility theory, it is
convenient to introduce the following special notation. It is usually assumed
that the Möbius representation employed is the one based on the lower probability function (i.e., the necessity measure in this case).
Given a basic possibility function on X = {x1, x1, . . . , xn}, where n ≥ 1, it is
assumed that the elements of X are reordered in such a way that
r (x i ) ≥ r (x i+1 )
for all i Œ ⺞n-1. Let r(xi) = ri and let the n-tuple
r = ri i Œ ⺞n
be called a possibility profile of length n. Clearly, r1 = 1 due to possibilistic
normalization. Let
Ai = {x1 , x2 , . . . , xi }
for all i Œ ⺞n, and let m denote the Möbius function based on the necessity
measure. Due to the ordered possibility profile, m(A) π 0 only when A = Ai
for some i Œ ⺞n. Let m(Ai) = mi for all i Œ ⺞n, and let the n-tuple
m = mi i Œ ⺞n
150
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
denote the Möbius representation of the possibility profile. Then,
n
ri = Â mk
(5.13)
k =i
for all i Œ ⺞n. These equations have following form:
r1 = m1 + m2 + m3 + . . . + mi + mi+1 + . . . + mn
r2 =
m2 + m3 + . . . + mi + mi+1 + . . . + mn
M
ri =
mi + mi+1 + . . . + mn
M
rn =
mn .
Solving them for mi(i Œ ⺞n), we obtain
mi = ri - ri+1 ,
(5.14)
where rn+1 = 0 by convention. Now,
i
Nec( Ai ) = Â mk .
(5.15)
k =1
Substituting mk from Eq. (5.14) and recognizing that r1 = 1, we obtain
Nec( Ai ) = 1 - ri +1
(5.16)
for all i Œ ⺞n (with rn+1 = 0). Clearly, Nec(A) = 0 when A π Ai for all i Œ ⺞n.
Moreover, Pos(Ai) = 1 and Pos(An - Ai-1) = ri (where A0 = ⭋ by convention
and An = X) for all i Œ ⺞n.
The special notation introduced in this section and the calculation of mi =
m(Ai), Nec(Ai), Pos(Ai), and Pos(An - Ai) for a given possibility profile on
X = {x1, x1, . . . , xn} is illustrated in Figure 5.2.
EXAMPLE 5.1. A specific example illustrating the special notation and the
various calculations is shown in Figure 5.3. Assuming that the possibility
profile ·r1, r2, r3, r4Ò = ·1.0, 0.8, 0.4, 0.3Ò is given, all values of m(A), Nec(A), and
Pos(A) shown in the figure can be calculated for all A 債 {x1, x2, x3, x4} in two
different ways:
(a) Calculate Pos(A) for all A by Eq. (5.3), calculate Nec(A) for all A by Eq.
(5.1), and calculate m(A) for all A by the Möbius transform Eq. (4.8).
(b) Calculate m(A) for all A by the formula
m( A) =
{0r - r
i
i +1
when A = Ai
otherwise.
5.2. GRADED POSSIBILITIES
r1 = 1 ≥
x1
(A0 = ∆)
≥
r3
x2
A1
mi = m(Ai): 1 – r2
≥
r2
A2
r4
≥
◊◊◊
≥
rn ≥ 0 ( = rn+1)
x3
x4
◊◊◊
xn
A3
A4
◊◊◊
An
◊◊◊
rn
r2 – r3
r3 – r4
r4 – r5
Nec(Ai):
1 – r2
1 – r3
1 – r4
1 – r5
Pos(Ai):
1
1
1
1
◊◊◊
◊◊◊
Pos(An – Ai–1):
1
r2
r3
r4
◊◊◊
151
rn
1
rn
Figure 5.2. Illustration of the special notation for possibility theory.
Calculate then Nec(A) for all A by the inverse Möbius transform,
Eq. (4.9), and calculate Pos(A) for all A by Eq. (5.1).
5.2.2. Ordering of Possibility Profiles
Possibility profiles of the same length can be partially ordered in the following way: given any two possibility profiles of length n,
j
r=
j
k
r=
k
r1 , jr2 , . . . , jrn ,
r1 , k r2 , . . . , k rn ,
we define
j
r £ k r iff
ri £ kri
j
for all i Œ ⺞n. This partial ordering forms a lattice, Rn, on the set of all possibility profiles of a particular length n. Its join, ⁄, and meet, Ÿ, are defined,
respectively, as
j
r ⁄ k r = max{ jri , kri} i Œ ⺞n ,
j
r Ÿ k r = min{ jri , kri} i Œ ⺞n ,
152
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
A:
x1
0
1
0
0
0
1
1
1
0
0
0
1
1
1
0
1
x2
0
0
1
0
0
1
0
0
1
1
0
1
1
0
1
1
x3
0
0
0
1
0
0
1
0
1
0
1
1
0
1
1
1
x4
0
0
0
0
1
0
0
1
0
1
1
0
1
1
1
1
mi = m ( A i )
0.0
0.2
0.0
0.0
0.0
0.4
0.0
0.0
0.0
0.0
0.0
0.1
0.0
0.0
0.0
0.3
Ai
A1
A2
A3
A4
Nec(A)
0.0
0.2
0.0
0.0
0.0
0.6
0.2
0.2
0.0
0.0
0.0
0.7
0.6
0.2
0.0
1.0
Pos(A)
r1
r2
r3
r4
0.0
= 1.0
= 0.8
= 0.4
= 0.3
1.0
1.0
1.0
0.8
0.8
0.4
1.0
1.0
1.0
0.8
1.0
(a)
r1 = 1.0
x1
A1
r2 = 0.8
r3 = 0.4
x2
A2
r4 = 0.3
x3
x4
A3
A4
(b)
Figure 5.3. Illustration to Example 5.1.
for all pairs of possibility profiles of the same length. The smallest possibility
profile, which expresses no uncertainty, is
1, 0, . . . , 0 ;
the greatest one, which expresses total ignorance, is
1, 1, . . ., 1 .
For any jr, kr Œ Rn, if jr £ kr, then kr represents greater uncertainty than jr or,
in other words, jr contains more information then kr.
5.2. GRADED POSSIBILITIES
153
Observe that there exists a one-to-one correspondence between possibility
profiles
r = ri i Œ ⺞n
and their Möbius representations
m = mi i Œ ⺞n
that is expressed by Eqs. (5.13) and (5.14). The natural ordering of possibility
profiles thus induces an ordering of the associated Möbius representations.
In this sense, the smallest and the largest Möbius representations are,
respectively,
1, 0, 0, . . . , 0
and
0, 0, . . . , 0, 1
5.2.3. Joint and Marginal Possibilities
When a basic possibility function r is defined on a Cartesian product X ¥ Y,
its values r(x, y) for all x Œ X and y Œ Y are called joint possibilities. The
associated marginal possibility functions, rX and rY, are defined by the
formulas
rX ( x) = max{r ( x, y)}
(5.17)
rY ( y) = max{r (x, y)}
(5.18)
y ŒY
for all x Œ X, and
x ŒX
for all y Œ Y. These formulas follow directly from Eq. (5.3). To see this, note
that
Pos X ({x}) = Pos({x} ¥ Y )
holds for each x Œ X. Using Eq. (5.3), this equation can be rewritten as Eq.
(5.17). Equation (5.18) can be derived in a similar way.
Alternatives in sets X and Y may be viewed as states of associated variables X and Y, respectively. The marginal possibility functions rX and rY, then,
describe information regarding the individual variables within the theory of
graded possibilities. Similarly, the joint possibility function r describes (within
this theory) information regarding a relation between the variables.
Now assume that the marginal possibility functions rX and rY are given.
These functions are clearly not sufficient to determine the joint possibility
154
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
function r. However, when the variables do not interact, the joint possibility
function r is defined for all x Œ X and y Œ Y by the equation
r ( x, y) = min{rX ( x), rY ( y)}.
(5.19)
This definition is justified in the following way:
(a) For each x Œ X,
Pos({x} ¥ Y ) = PosX ({x}),
and for each y Œ Y
Pos( X ¥ {y}) = PosY ({y}).
(b) For each pair ·x, yÒ Œ X ¥ Y,
Pos({ x, y }) = Pos(({x} ¥ Y ) « (X ¥ { y}))
£ min{Pos({x} ¥ Y ), Pos(X ¥ { y})}
= min{PosX ({x}), PosY ({ y})}.
Hence,
r (x, y) £ min{rX (x), rY ( y)}.
(c) Under the assumption of noninteraction, the joint possibilities must
express the minimal constraint, which means that for each x Œ X and
each y Œ Y the values r(x, y) must be the largest acceptable values. This
immediately results in Eq. (5.19).
EXAMPLE 5.2. Figure 5.4a illustrates the situation in which the joint possibilities r(x, y) are given, and we want to determine the marginal possibilities
rX(x) and rY(y). Using Eq. (5.17), for example,
rX (1) = max{r (1, 0), r (1, 1)}
= max{0.6, 0.4} = 0.6.
Figure 5.4b illustrates a different situation, in which the marginal possibilities
rX(x) and rY(y) are given, and we want to determine the joint possibilities
r¢(x, y) under the assumption of noninteraction of the marginals. Using
Eq. (5.19), we get
r (1, 1) = min{rX (1), rY (1)}
= min{0.6, 0.8} = 0.6,
155
5.2. GRADED POSSIBILITIES
r(x, y)
X
0
1
0
1.0
0.8
1
0.6
0.4
X
rX (x)
0
1.0
1
0.6
Y
Y
rY (y)
0
1
1.0
0.8
(a)
r'(x, y)
X
0
1
0
1.0
0.8
1
0.6
0.6
X
rX (x)
0
1.0
1
0.6
Y
Y
rY ( y )
0
1
1.0
0.8
(b)
Figure 5.4. Illustration to Example 5.2.
for example. Observe that the marginal possibilities in the two cases in Figure
5.4 are the same, but the joint possibilities are different. In fact, among all joint
possibility profiles that are consistent with the marginal possibilities in Figure
5.4, the one based on the assumption of noninteractive marginals, ·r¢(x, y) |
x Œ X, y Œ YÒ, is the largest one.
5.2.4. Conditional Possibilities
Again consider marginal possibility functions rX and rY that describe information regarding variables X and Y. As before, state sets of the variables are
X and Y, respectively. When the variables do not interact, their joint possibility function r is defined by Eq. (5.19), as is explained in Section 5.2.3. When
they do interact, we need to employ appropriate conditional possibility functions, rX|Y and rY|X, to account for the interaction. In general, the joint possibility function can be expressed via either of the following equations:
r (x, y) = min{rY ( y), rX Y (x y)},
(5.20)
r (x, y) = min{rX (x), rY X ( y x)},
(5.21)
The conditional possibility functions are implicitly defined by these functional
equations. By solving the equations for rX|Y and rY|X, we obtain for each
·x, yÒ Œ X ¥ Y the following formulas:
rX Y (x y) =
{[rr((xx,,yy)), 1]
when rY ( y) > r (x, y)
when rY ( y) = r (x, y)
(5.22)
rY X ( y x) =
{[rr((xx,,yy)), 1]
when rX (x) > r (x, y)
when rX (x) = r (x, y).
(5.23)
156
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Conditional possibilities are thus interval-valued. However, to satisfy the possibilistic normalization, at least one of the intervals must be replaced with its
maximum value of 1. Since there is no rationale for choosing any particular
interval for this replacement, the usual way is to replace each of them with 1.
This unique choice from all the intervals in Eqs. (5.22) and (5.23) is justified
by the principle of maximum uncertainty, which is introduced and discussed
in Chapter 9. Using this principle, we obtain the following definitions of the
conditional possibility functions:
rX Y (x y) =
{r1(x, y)
when rY ( y) > r (x, y)
when rY ( y) = r (x, y)
(5.24)
rY X ( y x) =
{r1(x, y)
when rX (x) > r (x, y)
when rX (x) = r (x, y).
(5.25)
EXAMPLE 5.3. An example of joint and marginal possibility functions on
X ¥ Y is given in Figure 5.5. The associated conditional possibility functions
rX|Y and rY|X are shown in Figure 5.5b and 5.5c, respectively. For the sake of
completeness, intervals determined by Eqs. (5.22) and (5.23) are shown in
these figures. However, according to Eqs. (5.24) and (5.25), each of these
intervals is normally replaced with 1, as required by the possibilistic normalization and the principle of maximum uncertainty.
Conditional possibilities can be used for defining possibilistic independence. Variable X is said to be independent of variable Y (in the possibilistic
sense) iff
rX Y (x y) = rX (x)
(5.26)
for all ·x, yÒ Œ X ¥ Y. Similarly, variable Y is said to be independent of variable
X iff
rY X ( y x) = rY ( y)
(5.27)
for all ·x, yÒ Œ X ¥ Y.
Assume now that Eq. (5.26) holds. Then, by replacing rX|Y in Eq. (5.20) with
rX, we obtain Eq. (5.19). Similarly, assuming that Eq. (5.27) holds and replacing rY|X in Eq. (5.21) with rY, we obtain Eq. (5.19). Hence, possibilistic independence implies possibilistic noninteraction. The converse, however, is not
true. To show this, assume that Eq. (5.19) holds. Then, Eq. (5.20) can be written
as
min{rX (x), rY ( y)} = min{rY ( y), rX Y (x y)}.
5.2. GRADED POSSIBILITIES
r ( xi , y j )
y1
y2
y3
y4
Y
X
rX ( xi )
X
x1
1.0
0.7
0.2
0.0
x2
0.8
0.5
0.6
0.0
x3
0.5
0.4
0.5
0.0
x4
0.4
0.4
0.4
0.2
x5
0.3
0.3
0.0
0.2
x6
0.0
0.0
0.0
0.1
x1
1.0
x2
0.8
x3
0.5
x4
0.4
x5
0.3
x6
0.1
(a)
rX Y ( xi | y j )
Y
y1
y2
y3
y4
X
x1
x2
1.0
0.8
[0.7, 1]
0.5
0.2
[0.6, 1]
0.0
0.0
x3
0.5
0.4
0.5
0.0
x4
x5
0.4
0.3
0.4
0.3
0.4
0.0
[0.2, 1] [0.2, 1]
x6
0.0
0.0
0.0
0.1
(b)
rY X ( y j | xi )
Y
y1
y2
y3
y4
x1
1.0
0.7
0.2
0.0
X
x2
x3
x4
x5
[0.8, 1] [0.5, 1] [0.4, 1] [0.3, 1]
0.5
0.4
[0.4, 1] [0.3, 1]
0.6
[0.5, 1] [0.4, 1]
0.0
0.0
0.0
0.2
0.2
x6
0.0
0.0
0.0
[0.1, 1]
(c)
Figure 5.5. Illustration to Example 5.3.
Solving this equation for rX|Y(x | y), we obtain
rX Y (x y) =
{[rr ((xy)), 1]
X
Y
when rX (x) < rY ( y)
when rY ( y) £ rX (x),
and not Eq. (5.26). Similarly, Eq. (5.21) can be written as
min{rX (x), rY ( y)} = min{rX (x), rY X ( y x)}
and, by solving it for rY|X(y | x), we obtain
rY X ( y x) =
{[rr ((yx)), 1]
Y
X
when rY ( y) < rX (x)
when rX (x) £ rY ( y),
157
Y
rY ( y j )
y1
y2
y3
y4
1.0
0.7
0.6
0.2
158
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
and not Eq. (5.27).
Observing that both probability measures and possibility measures are
uniquely represented by a function on X, it is interesting to compare the
various properties of probability theory and possibility theory. A summary of
this comparison is in Table 5.1.
5.2.5. Possibilities on Infinite Sets
When X is an infinite set and C is an ample field of subsets of X, a possibility
measure, Pos, is a function
Pos :C Æ [ 0, 1]
that satisfies the following axiomatic requirements
Axiom (Pos1). Pos(⭋) = 0.
Axiom (Pos2). Pos(X) = 1.
Axiom (Pos3). For any family C = {Ai | i Œ I}, where I is an arbitrary index set,
PosÊ U Ai ˆ = sup Pos( Ai ).
Ë iŒI ¯
iŒI
In analogy with the finite case, every possibility measure is uniquely determined by a basic possibility function, r, via the formula
Pos( A) = sup r (x)
x ŒA
for each A Œ C. Clearly, r is required to satisfy the possibilistic normalization
sup r (x) = 1.
x ŒX
Alternatively, the theory can be formalized by the following axiomatic
requirements of the dual necessity measure, Nec:
Axiom (Nec1). Nec(⭋) = 0.
Axiom (Nec2). Nec(X) = 1.
Axiom (Nec3). For any family C = {Ai | i ŒI}, where I is an arbitrary index set,
Nec Ê I Ai ˆ = inf Nec ( Ai ).
Ë iŒI ¯ iŒI
It is well established that the possibility measures and necessity measures
are semicontinuous from below and from above, respectively.
Table 5.1. Probability Theory Versus Possibility Theory: Comparison of
Mathematical Properties for Finite Sets
Probability Theory
Possibility Theory
Based on measures of one type:
probability measures, Pro
Based on measures of two types: possibility
measures, Pos, and necessity measures, Nec
Body of evidence consists of
singletons
Body of evidence consists of a family of
nested subsets
Unique representation of Pro by
a probability distribution
function
p : X Æ [0, 1]
via the formula
Pro ( A) =
p( x)
Unique representation of Pro by a basic
possibility function
Â
r : X Æ [0, 1]
via the formula
Pos( A) = max r( x)
x ŒA
x ŒA
Normalization:
p( x) = 1
Normalization:
max r( x) = 1
Â
x ŒX
x ŒX
Additivity:
Pro(A » B) = Pro(A) +
Pro(B) - Pro(A « B)
Max/Min rules:
Pos(A » B) = max {Pos(A), Pos(B)}
Pos(A « B) £ min {Pos(A), Pos(B)}
Nec(A « B) = min {Nec(A), Nec(B)}
Nec(A » B) ≥ max {Nec(A), Nec(B)}
Not applicable
Duality:
Nec(A) = 1 - Pos(Ā)
Pos(A) < 1 fi Nec(A) = 0
Nec(A) > 0 fi Pos(A) = 1
Pro(A) + Pro(Ā) = 1
Pos(A) + Pos(Ā) ≥ 1
Nec(A) + Nec(Ā) £ 1
max {Pos(A), Pos(Ā)} = 1
min {Nec(A), Nec(Ā)} = 0
Total ignorance:
p(x) = 1/|X| for all x Œ X
Total ignorance:
r(x) = 1 for all x Œ X
Conditional probabilities:
p( x, y)
p X Y ( x y) =
pY ( y)
Conditional possibilities:
r( x, y)
when rY ( y) > r( x, y)
rX Y ( X Y ) = ( )
[r x, y ,1]
when rY ( y) = r( x, y)
p X Y ( y x) =
{
p( x, y)
pX ( x)
rY X (Y X ) =
{[rr((xx,,yy)),1]
when rX ( x) > r( x, y)
when rX ( y) = r( x, y)
Probabilistic noninteraction:
p(x, y) = pX (x) · pY (y)
(a)
Possibilistic noninteraction:
r(x, y) = min {rX(x), rY(y)}
Probabilistic independence:
pX|Y(x|y) = pX(x)
Possibilistic independence:
rX|Y(x|y) = rX(x)
pY|X(y|x) = pY(y)
(a) fi (b)
and
(b)
(b) fi (a)
rY|X(y|x) = rY(y)
(b) fi (a)
but
(a) fi
/ (b)
(a)
(b)
160
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
5.2.6. Some Interpretations of Graded Possibilities
Viewing dual pairs of necessity and possibility measures as imprecise probabilities is just one interpretation of possibility theory. Other interpretations,
totally devoid of any connection to probability theory, appear to be even more
fundamental.
Perhaps the most important interpretation of possibility theory is based on
defining possibility grades in terms of grades of membership of relevant fuzzy
sets. This fuzzy-set interpretation of possibility theory is examined in Chapter
8.
Another important interpretation of possibility theory is based on the
concept of similarity. In this interpretation, the possibility r(x) reflects the
degree of similarity between x and an ideal prototype, xi, for which the possibility degree is 1. That is, r(x) is expressed by a suitable distance between x
and xi defined in terms of relevant attributes of the elements involved. The
closer x is to xi according to the chosen distance, the more possible we consider it in this interpretation of possibility theory. In some cases, the closeness
may be determined objectively by a defined measurement procedure. In other
cases, it may be based on a subjective judgment of a person (e.g., an expert in
the application area involved).
A quite common interpretation of possibility theory is founded on special
orderings, £Pos, defined on the power set P(X). For any A, B Œ P(X), A £Pos B
means that B is at least as possible as A. This phrase “at least as possible as”
may, of course, have various special interpretations, such as, for example, “at
least as easy to achieve” or “at most constrained as.” When £Pos satisfies the
requirement
A£
Pos
B fi A » C £
Pos
B»C
for all A, B, C, ŒP(X), it is called a comparative possibility relation. It is known
that the only measures that conform to comparative possibility orderings are
possibility measures. It is also known that for each ordering £Pos there exists a
dual ordering, £Nec, defined by the equivalence
A£
Pos
B¤ A £
Nec
B.
These dual orderings are called comparative necessity relations; the only measures that conform to them are necessity measures.
5.3. SUGENO l-MEASURES
Sugeno l-measures (or just l-measures) are special monotone measures,
m, that are characterized by the following axiomatic requirement: for all A,
B Œ P(X), if A « B = ⭋, then
l
5.3. SUGENO l-MEASURES
l
m ( A » B) = l m ( A) + l m (B) + ll m ( A)l m (B),
161
(5.28)
where l > -1 is a parameter by which different l-measures are distinguished.
Equation (5.28) is usually called a l-rule.
When X is a finite set and values lm({x}) are given for all x Œ X, then it is
obvious that the value lm(A) for any A Œ P(X) can be determined from these
values on singletons by a repeated application of the l-rule. This value can be
expressed succinctly as
l
m ( A) =
1È
’ (1 + ll m ({x}) - 1˘˙˚.
l ÍÎ x ŒA
(5.29)
Observe that, given values lm({x}) for all x Œ X, the value of l can be determined by the requirement that lm(X) = 1. Applying this requirement to Eq.
(5.29) results in the equation
1 + l = ’ (1 + ll m ({x}))
(5.30)
x ŒX
for l. This equation determines the parameter uniquely under the conditions
stated in the following theorem.
Theorem 5.3. Let lm({x}) < 1 for all x Œ X and let lm({x}) > 0 for at least two
elements of X. Then, Eq. (5.30) determines the parameter l uniquely as
follows:
(a) If
Â
l
m ({x}) < 1 , then l is equal to the unique root of the equation in
x ŒX
the interval (0, •).
(b) If
Â
l
m ({x}) = 1 , then l = 0, which is the only root of the equation.
Â
l
m ({x}) > 1 , then l is equal to the unique root of the equation in
x ŒX
(c) If
x ŒX
the interval (-1, 0).
Proof. [Wang and Klir, 1992, pp. 46–47].
Any l-measure is thus completely determined by its values on all singletons, lm({x}), for all x Œ X. Given values lm({x}) for all x Œ X, the value of l
is determined via Eq. (5.30), and values lm(A) for all subsets of X are then
determined by Eq. (5.29). According to Theorem 5.3, three situations are
distinguished:
(a) When
Â
x ŒX
l
m ({x}) < 1 , which means that lm qualifies as a lower
probability, l > 0.
162
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
(b) When
Â
l
m ({x}) = 1 , which means that lm is a classical probability
x ŒX
measure, l = 0.
(c) When  l m ({x}) > 1 , which means that lm qualifies as an upper
x ŒX
probability, l < 0.
䊏
EXAMPLE 5.4. Let X = {x1, x2, x3} and let lm({x1}) = 0.2, lm({x2}) = 0, and
m({x3}) = 0.5. The equation for l has the form
l
1 + l = (1 + 0.2l )(1 + 0.5l ).
Its only solution is l = 3. The remaining values now can be readily determined
by the l-rule: lm({x1, x2}) = 0.2, lm({x1, x3}) = 1, lm({x2, x3}) = 0.5, lm(X) = 1.
When lm is a joint l-measure defined on subsets of X ¥ Y and lmX and lmY
are the associated marginal l-measures, the general equations
l
m X ( A) = l m X ( A ¥ Y ),
(5.31)
l
mY (B) = l m X (X ¥ B)
(5.32)
hold for any A 債 X and any B 債 Y. However, using Eq. (5.29), the relationship between the joint and marginal l-measures can also be expressed for all
x Œ X and all y ŒY by the formulas
l
m X ({x}) =
1È
˘
’ (1 + ll m ({x, y}) - 1˙˚
l ÍÎ y ŒY
(5.33)
l
mY ({ y}) =
1È
(1 + ll m ({x, y}) - 1˘˙.
’
Í
l Î x ŒX
˚
(5.34)
Let lm and l̄ m denote l-measures that represent, respectively, a lower probability and the corresponding upper probability defined for each A Œ P(X) by
the duality equation
l
m ( A) = 1 - l- m ( A ).
(5.35)
To investigate the relationship between l and l̄, the meaning of lm(Ā) for any
l-measure must be determined first. This can be done by applying the l-rule
to sets A and Ā:
l
m ( A » A ) = l m ( A) + l m ( A ) + ll m ( A) l m ( A ).
Recognizing that lm(A » Ā) = lm(X) = 1 and solving this equation for lm(Ā),
we readily obtain
163
5.3. SUGENO l-MEASURES
l
m (A) =
1 - l m ( A)
1 - lm ( A)
(5.36)
and
l
1 - l m (A) =
(1 + l ) m ( A)
.
1 + ll m ( A)
(5.37)
The last equation makes it possible to prove the following theorem.
Theorem 5.4. For any pair of dual l-measures lm and l̄ m
l =-
l
.
1+ l
(5.38)
Proof. For any sets A, B Œ P(X) such that A « B = ⭋, we have
l
l l
l
m ( A) m (B)
1+ l
l
= 1 - l m ( A ) + 1 - l m (B ) [(1 - l m ( A))(1 - l m (B ))]
1+ l
l
l
l
l
(1 + l ) m ( A) (1 + l ) m (B)
(1 + l ) m ( A) m (B)
=
+
l
l
l
l
l
1 + l m ( A) 1 + l m (B)
1 + l m (B)
1 + l m ( A)
m ( A » B) = l m ( A) + l m (B) -
=
[
[1 + l
l
l
l
][
l
]
]
][
]
m ( A) 1 + l m (B)
l
=
[
l
(1 + l ) m ( A) + m (B) + l m ( A) m (B)
l
(1 + l ) m ( A » B)
1 + ll m ( A » B)
= 1 - l m ( A » B)
= l m ( A » B).
䊏
EXAMPLE 5.5. The l-measure in Example 5.4 represents clearly a lower
probability function and, therefore, it should be denoted as lm. Its values (calculated in Example 5.4) are shown in Table 5.2. Also shown in the table are
values of the dual l-measure l̄ m, which represents an upper probability function. The value of l̄ can be calculated either by Eq. (5.30) or by Eq. (5.38).
Equation (5.30) has the form
1 + l = (1 + 0.5l )(1 + 0.8l ),
and its unique solution is l̄ = -0.75. Since l = 3, we obtain
l =-
l
= 0.75
1+ l
by Eq. (5.38). As expected, these results are the same.
164
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Table 5.2. Illustration to Examples 5.4 and 5.5
A:
l
x2
x3
0
1
0
0
1
1
0
1
0
0
1
0
1
0
1
1
0
0
0
1
0
1
1
1
X
m ({x, y})
Y
0
0.2
0.1
1
0.2
0.0
X
0
1
m X ({x})
0.4
0.2
0
1
Y
l
x1
0
1
l
mY ({y})
0.6
0.1
(a)
00
A: 0
1
0
0
0
1
1
1
0
0
0
1
1
1
0
1
l̄
l
¯
m(A)
0.0
0.2
0.0
0.5
0.2
1.0
0.5
1.0
x, y
01 10 11
0 0 0
0 0 0
1 0 0
0 1 0
0 0 1
1 0 0
0 1 0
0 0 1
1 1 0
1 0 1
0 1 1
1 1 0
1 0 1
0 1 1
1 1 1
1 1 1
m(A)
0.0
0.5
0.0
0.8
0.5
1.0
0.8
1.0
l
m (A)
0.0
0.2
0.2
0.1
0.0
0.6
0.4
0.2
0.4
0.2
0.1
1.0
0.6
0.4
0.4
1.0
l
m (A)
0.0
0.6
0.6
0.4
0.0
0.9
0.8
0.6
0.8
0.6
0.4
1.0
0.9
0.8
0.8
1.0
(b)
Figure 5.6. Illustration to Example 5.6.
EXAMPLE 5.6. Consider the values of the joint l-measure on singletons that
are given in Figure 5.6a. Since
Â
l
m ({ x, y }) < 1,
x ,y ŒX ¥ Y
the l-measure represents a lower probability, which is characterized by a parameter l > 0. The value of l is determined by the unique positive root of the
equation
1 + l = (1 + 0.2l )(1 + 0.2l )(1 + 0.1l ),
165
5.3. SUGENO l-MEASURES
which is l = 5. The marginal lower probability functions lmX and lmY, which are
also shown in Figure 5.6a, are readily determined from lm by the l-rule. For
example,
l
l
m X ({0}) = l m ({ 0, 0 }) + l m ({ 0, 1 }) + ll m ({ 0, 0 }) m ({ 0 , 1 }) = 0.4.
The joint lower probabilities lm(A) are determined for all A 債 X ¥ Y from the
values lm({·x, yÒ}) by the l-rule or by Eq. (5.29). The joint upper probabilities
l̄
m(A) are determined from the lower probabilities by the duality equation,
Eq. (5.35). The results are shown in Figure 5.6.
5.3.1. Möbius Representation
The Möbius representation of l-measure lm (l > 0) can be expressed by a
special formula, which is a consequence of the l-rule. This formula is a subject
of the following theorem.
Theorem 5.5. The Möbius representation, lm, of a l-measure lm (l > 0) defined
on subsets of a finite set X can be expressed for all A Œ P(X) by the formula
l
ÔÏ l
m( A) = Ì
ÔÓ 0
A -1
’
l
x ŒA
m ({x})
when A π ∆
when A = ∆.
(5.39)
Proof. Using Eq. (5.29), we have
l
1
l
1
=
l
l
È
˘
ÍÎ ’ 1 + l m ({x}) - 1˙˚
x ŒA
È
È B
˘˘
l
Í Â ÎÍ l ’ m ({x})˚˙ ˙
Î ∆ÃB Õ A
˚
x ŒB
È B -1
˘
l
m ({x})˙
= Â Íl
’
˚
x ŒB
∆ÃB Õ A Î
= Â l m(B).
m ( A) =
(
)
BÕ A
Since
l
m (X ) = 1,
Â
BÕX
l
representation of m.
l
m(B) = 1. Hence, lm, defined by Eq. (5.29) is the Möbius
䊏
It is obvious from Eq. (5.39) that lm(A) ≥ 0 for all A Œ P(X). Hence, lm is
a Choquet capacity of order infinity and its dual, l̄ m, is an alternate capacity
of order infinity. This means that imprecise probabilities formalized in terms
of l-measures are also subject to the rules of those formalized in terms of
capacities of order •, which are examined in Section 5.4.
166
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
5.4. BELIEF AND PLAUSIBILITY MEASURES
According to a generally accepted terminology in the literature, belief measures are Choquet capacities of order • and plausibility measures are alternating capacities of order •. A theory based on dual pairs of these measures
is usually referred to as evidence theory or Dempster–Shafer theory (DST). The
latter name, which seems more common in recent publications, is adopted in
this book.
Given a universal set X, assumed here to be finite, a belief measure, Bel, is
a function
Bel :P (X ) Æ [ 0, 1]
such that Bel(⭋) = 0, Bel(X) = 1, and
Bel ( A1 » A2 » ... » An ) ≥ Â Bel ( A j ) - Â Bel ( A j « Ak )
j< k
j
+ ... + (-1)
n+1
Bel ( A1 « A2 « ... « An ) (5.40)
for all possible families of subsets of X. When X is infinite, the domain of Bel
is a family C of subsets of X with an appropriate algebraic structure (s-algebra,
etc.) and it is required that Bel be semicontinuous from above.
For each A Œ P(X), Bel(A) is interpreted as the degree of belief (based on
available evidence) that the true alternative of X (prediction, diagnosis, etc.)
belongs to the set A. We may also view the various subsets of X as answers to
a particular question. We assume that some of the answers are correct, but we
do not know with full certainty which ones they are. In DST, X is often called
a frame of discernment. When the sets A1, A2, . . . , An in Eq. (5.40) are pairwise disjoint, the inequality requires that the degree of belief associated with
the union of the sets is not smaller than the sum of the degrees of belief pertaining to the individual sets. This basic property of belief measures is thus a
weaker version of the additivity property of probability measures. This implies
that probability measures are special cases of belief measures for which the
equality in Eq. (5.40) is always satisfied.
Let A1 = A and A2 = Ā in Eq. (5.40) for n = 2. Since A » Ā = X, A « Ā =
⭋, and it is required that Bel(X) = 1 and Bel(⭋) = 0, we can immediately derive
from Eq. (5.40) the following fundamental property of belief measures:
Bel ( A) + Bel ( A ) £1.
A plausibility measure is a function
Pl :P ( X ) Æ [0, 1]
(5.41)
5.4. BELIEF AND PLAUSIBILITY MEASURES
167
such that Pl(⭋) = 0, Pl(X) = 1, and
Pl ( A1 « A2 « ... « An ) £
 Pl ( A ) -  Pl ( A
j
j
» Ak )
j< k
j
+ ... + (-1)
n+1
Pl ( A1 » A2 » ... » An )
(5.42)
for all possible families of subsets of X. When X is infinite, function Pl is also
required to be semicontinuous from below.
Let n = 2, A1 = A and A2 = Ā in Eq. (5.42). Since A » Ā = X, A « Ā = ⭋,
and it is required that Pl(X) = 1 and Pl(⭋) = 0, we immediately obtain the following basic inequality of plausibility measures from Eq. (5.42):
Pl ( A) + Pl ( A ) ≥ 1.
(5.43)
Belief measures and plausibility measures are dual in the usual sense. That
is,
Pl ( A) = 1 - Bel ( A )
(5.44)
for all A Œ P(X). Pairs of these dual measures form the basis of DST.
Any belief measure, Bel, can of course be represented by its Möbius representation, m, which is obtained for each A Œ P(X) by the usual formula
m( A) =
Â
(-1)
A- B
Bel (B).
(5.45)
B BÕ A
In DST, it is guaranteed that m(A) ≥ 0. Due to this special property, function m is usually called a basic probability assignment in DST. For each set
A Œ P(X), the value m(A) expresses the proportion to which all available and
relevant evidence supports the claim that a particular element of X, whose
characterization in terms of relevant attributes is deficient, belongs to the set
A. This value, m(A), pertains solely to one set, set A; it does not imply any
additional claims regarding subsets of A. If there is some additional evidence
supporting the claim that the element belongs to a subset of A, say B Ã A, it
must be expressed by another value m(B).
Since values m(A) are positive and add to 1 for all A Œ P(X), function m
resembles a probability distribution function. However, there is a fundamental difference between probability distribution functions in probability theory
and basic probability assignments in DST: the former are defined on X, while
the latter are defined on P(X). Observe also that none of the properties of
monotone measures are required for function m. It is thus not a measure. To
obtain a monotone measure, elementary pieces of evidence expressed by
values m(A) must be properly aggregated. Two obvious aggregations, which
result in a belief measure and a plausibility measure, are expressed for all
A Œ P(X) by the formulas
168
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Bel ( A) =
Â
m(B),
(5.46)
B BÕ A
Pl ( A) =
Â
m(B).
(5.47)
B A « B π∆
Equation (5.46) is, of course, the inverse of the Möbius transform in Eq.
(5.45) [compare with Eqs. (4.9) and (4.8)], and Eq. (5.47) follows from the
duality equation, Eq. (5.44). Clearly,
Pl ( A) = 1 - Bel ( A )
= 1 - Â m(B)
B BÕ A
=
=
Â
Â
–
B B債 A
m(B)
m(B).
B A « B π∆
The relationship between m(A) and Bel(A), expressed by Eq. (5.46), has
the following meaning.The value of m(A) characterizes the degree of evidence
that the true alternative is in set A, but is does not take into account any
additional evidence for the various subsets of A. The associated value Bel(A)
represents the total evidence (or belief) that the true alternative is in set A,
which is obtained by adding degrees of evidence for the set itself, as well as
for any of its subsets. The value Pl(A), as expressed by Eq. (5.47), has a different meaning. It represents not only the total evidence that the true alternative is in set A, but also partial (or plausible) evidence for the set that is
associated with any set that overlaps with A. From Eqs. (5.46) and (5.47),
clearly,
Pl ( A) £ Bel ( A)
(5.48)
for all A Œ P(X).
Given a basic probability assignment m on P(X), every set A for which
m(A) > 0 is called a focal set (or focal element). The pair ·F, mÒ, where
F denotes the family of all focal sets induced by m, is called a body of
evidence.
Total ignorance is expressed in terms of the basic probability assignment by
m(X) = 1 and m(A) = 0 for all A π X. That is, we know that the element is in
the universal set, but we have no evidence about its location in any subset of
X. It follows from Eq. (5.46) that the expression of total ignorance in terms of
the corresponding belief measure is exactly the same: Bel(X) = 1 and Bel(A)
= 0 for all A π X. However, the expression of total ignorance in terms of the
associated plausibility measure (obtained from the belief measure by Eq.
(5.44) is quite different: Pl(⭋) = 0 and Pl(A) = 1 for all A π 0. This expression
follows directly from Eq. (5.47).
169
5.4. BELIEF AND PLAUSIBILITY MEASURES
Table 5.3. Conversion Formulas in the Dempster–Shafer Theory
m
m(A) =
Bel(A) =
m(A)
 m( B)
Bel
 ( -1)
A- B
Pl
 (-1)
Bel( B)
A -B
BÕ A
BÕ A
Bel(A)
1 - Pl(Ā )
[1 - Pl( B)]
BÕ A
Pl(A) =
 m( B)
 m( B)
AÕ B
 (-1)
 (1)
Q( B)
Q( B)
AÕB
B
BÕ A
1 - Bel(Ā )
 (-1)
Pl(A)
B « A π∆
Q(A) =
Q
A -B
B+1
Q( B)
∆π BÃA
 ( -1)
B
 (-1)
Bel( B )
B +1
Pl( B)
Q(A)
∆π BÃA
BÃA
In addition to the basic probability assignment, belief measure, and plausibility measure, it is also sometimes convenient to use the function
Q( A) =
Â
m(B),
(5.49)
B AÕ B
which is called a commonality function. For each A Œ P(X), the value Q(A)
represents the total portion of belief that can move freely to every point of A.
Function Q is an example of a monotone decreasing measure. It can be used
for representing given evidence, since it is uniquely convertible to functions
m, Bel, and Pl. All conversion formulas between the four functions are given
in Table 5.3.
5.4.1. Joint and Marginal Bodies of Evidence
Given joint belief and plausibility measures, Bel and Pl, defined on subsets of
X ¥ Y, the associated marginal measures are determined for each A Œ X and
B Œ Y by the formulas
Bel X ( A) = Bel ( A ¥ Y ),
(5.50)
BelY (B) = Bel ( X ¥ B),
(5.51)
Pl X ( A) = Pl ( A ¥ Y ),
(5.52)
PlY (B) = Pl ( X ¥ B).
(5.53)
However, the relationship between the joint and marginal bodies of evidence
can also be expressed in terms of the basic probability assignment. To do that,
we need to define its projections for each subset C of X ¥ Y,
170
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
C X = {x Œ X
x, y ŒC for some y ŒY },
CY = { y ŒY
x, y ŒC for some x Œ X },
on X and Y, respectively. Then,
mX ( A) =
Â
m(C ) for all A ŒP (X ),
(5.54)
Â
m(C ) for all B ŒP (Y ).
(5.55)
C A =C X
and
mY (B) =
C A =C Y
Given marginal bodies of evidence, ·FX, mXÒ and ·FY, mYÒ, are said to be
noninteractive if and only if for all A Œ FX, B Œ FY, and C Œ P(X ¥ Y)
m(C ) =
{0m (A)◊ m (B)
X
Y
when C = A ¥ B
otherwise.
(5.56)
That is, if marginal bodies of evidence are noninteractive, then the only focal
sets of the joint body of evidence are Cartesian products of the marginal focal
sets.
EXAMPLE 5.7. Given the marginal bodies of evidence in Table 5.4a, the joint
body of evidence in Table 5.4b has been calculated under the assumption that
the marginal ones are noninteractive. Using Eq. (5.56), we obtain, for example,
for the first and seventh joint focal sets in Table 5.4b:
m({x 2 , x3 } ¥ { y2 , y3 }) = mX ({x 2 , x3 }) ◊ mY ({ y2 , y3 }) = 0.25 ¥ 0.25 = 0.0625
m({x1 , x3 } ¥ { y1}) = mX ({x1 , x3 }) ◊ mY ({ y1}) = 0.15 ¥ 0.25 = 0.0375.
5.4.2. Rules of Combination
Evidence obtained in the same context from two independent sources (for
example, from two experts in the field of inquiry or from two sensors) and
expressed by two basic probability assignments m1 and m2 on some power set
P(X) must be appropriately aggregated to obtain the combined basic probability assignment m1,2. In general, evidence can be combined in various ways,
some of which can take into consideration the reliability of the sources and
other relevant aspects. The standard way of combing evidence is expressed by
the formula
m1,2 ( A) =
Â
m1 (B) ◊ m2 (C )
B «C = A
1-c
(5.57)
171
5.4. BELIEF AND PLAUSIBILITY MEASURES
Table 5.4. Illustration of Noninteractive Marginal Bodies of Evidence (Example 5.7)
(a) Given Marginal Bodies of Evidence
mX(A)
X
A:
x1
x2
x3
0
1
1
1
1
0
1
1
1
1
0
1
mY(B)
Y
0.25
0.15
0.30
0.30
B:
y1
y2
y3
0
1
1
1
0
1
1
0
1
0.25
0.25
0.50
(b) Joint Body of Evidence Under the Assumption That the Given Marginal Bodies
of Evidence Are Noninteractive
X¥Y
C:
m(C)
x1
y1
x1
y2
x1
y3
x2
y1
x2
y2
x2
y3
x3
y1
x3
y2
x3
y3
0
0
0
0
0
0
1
1
1
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
1
1
0
0
0
0
1
1
0
1
1
1
0
1
0
1
1
0
0
0
0
1
1
1
0
1
0
1
1
0
0
0
0
1
1
0
1
1
0
0
0
1
0
1
1
0
1
1
0
1
1
0
1
0
0
0
1
0
1
1
0
1
1
0
1
0
0
0
1
0
1
0.0625
0.0625
0.1250
0.0375
0.0750
0.0750
0.0375
0.0750
0.0750
0.0750
0.1500
0.1500
for all A π ⭋, and m1,2 (⭋) = 0, where
c=
Â
m1 (B) ◊ m2 (C ).
(5.58)
B «C =∆
Equation (5.57) is referred to as Dempster’s rule of combination. According to this rule, the degree of evidence m1(B) from the first source that focuses
on set B Œ P(X) and the degree of evidence m2(C) from the second source
that focuses on set C Œ P(X) are combined by taking the product m1(B) ·
m2(C), which focuses on the intersection B « C. This is exactly the same way
in which the joint probability distribution is calculated from two independent
marginal distributions; consequently, it is justified on the same grounds.
However, since some intersections of focal sets from the first and second
source may result in the same set A, we must add the corresponding products
172
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
to obtain m1,2(A). Moreover, some of the intersections may be empty. Since it
is required that m1,2(⭋) = 0, the value c expressed by Eq. (5.58) is not included
in the definition of the joint basic probability assignment m1,2. This means that
the sum of products m1(B) · m2(C) for all focal sets B of m1 and all focal sets
C of m2 such that B « C π ⭋ is equal to 1 - c. To obtain a normalized basic
probability assignment m1,2, as required, we must divide each of these products by this factor 1 - c, as indicated in Eq. (5.57).
EXAMPLE 5.8. Assume that an old painting was discovered that strongly
resembles paintings of Raphael. Such a discovery is likely to generate various
questions regarding the status of the painting. Assume the following three
questions:
1. Is the discovered painting a genuine painting by Raphael?
2. Is the discovered painting a product of one of Raphael’s many disciples?
3. Is the discovered painting a counterfeit?
Let R, D, and C denote subsets of our universal set X—the set of all paintings—which contain the set of all paintings by Raphael, the set of all paintings by disciples of Raphael, and the set of all counterfeits of Raphael’s
paintings, respectively.
Assume now that two experts performed careful examinations of the painting independently of each other and subsequently provided us with basic probability assignments m1 and m2, specified in Table 5.5. These are the degrees of
evidence that each expert obtained by the examination and that support the
various claims that the painting belongs to one of the sets of our concern. For
example, m1(R » D) = 0.15 is the degree of evidence obtained by the first
expert that the painting was done by Raphael himself or that the painting was
done by one of this disciples. Using Eq. (5.46), we can easily calculate the total
evidence, Bel1 and Bel2, in each set, as shown in Table 5.5.
Table 5.5. Illustration of the Dempster Rule of Combination (Example 5.8)
Focal Sets
R
D
C
R»D
R»C
D»C
R»D»C
Expert 1
Expert 2
Combined
evidence
m1
Bel1
m2
Bel2
m1,2
Bel1,2
0.05
0.00
0.05
0.15
0.10
0.05
0.60
0.05
0.00
0.05
0.20
0.20
0.10
1.00
0.15
0.00
0.05
0.05
0.20
0.05
0.50
0.15
0.00
0.05
0.20
0.40
0.10
1.00
0.21
0.01
0.09
0.12
0.20
0.06
0.31
0.21
0.01
0.09
0.34
0.50
0.16
1.00
173
5.4. BELIEF AND PLAUSIBILITY MEASURES
Applying Dempster’s rule of combination to m1 and m2, we obtain the joint
basic assignment m1,2, which is also shown in Table 5.5. To determine the values
of m1,2, we calculate the normalization factor 1 - c first. Applying Eq. (5.58),
we obtain
c = m1 (R) ◊ m2 (D) + m1 (R) ◊ m2 (C ) + m1 (R) ◊ m2 (D » C ) + m1 (D) ◊ m2 (R)
+ m1 (D) ◊ m2 (C ) + m1 (D) ◊ m2 (R » C ) + m1 (C ) ◊ m2 (R) + m1 (C ) ◊ m2 (D)
+ m1 (C ) ◊ m2 (R » D) + m1 (R » D) ◊ m2 (C ) + m1 (R » C ) ◊ m2 (D)
+ m1 (D » C ) ◊ m2 (R)
= 0.03.
The normalization factor is then 1 - c = 0.97. Values of m1,2 are calculated by
Eq. (5.57). For example,
m1,2 (R) = [ m1 (R) ◊ m2 (R) + m1 (R) ◊ m2 (R » D) + m1 (R) ◊ m2 (R » C )
+ m1 (R) ◊ m2 (R » D » C ) + m1 (R » D) ◊ m2 (R)
+ m1 (R » D) ◊ m2 (D » C ) + m1 (R » C ) ◊ m2 (R)
+ m1 (R » C ) ◊m
m2 (R » D) + m1 (R » D » C ) ◊ m2 (R)] / 0.97
= 0.21,
m1,2 (D) = [ m1 (D) ◊ m2 (D) + m1 (D) ◊ m2 (R » D) + m1 (D) ◊ m2 (D » C )
+ m1 (D) ◊ m2 (R » D » C ) + m1 (R » D) ◊ m2 (D)
+ m1 (R » D) ◊ m2 (D » C ) + m1 (D » C ) ◊ m2 (D)
+ m1 (D » C ) ◊ m2 (R » D) + m1 (R » D » C ) ◊ m2 (D)] / 0.97
= 0.01,
m1,2 (R » C ) = [ m1 (R » C ) ◊ m2 (R » C ) + m1 (R » C ) ◊ m2 (R » D » C )
+ m1 (R » D » C ) ◊ m2 (R » C ) / 0.97
= 0.2,
m1,2 (R » C » C ) = [ m1 (R » D » C ) ◊ m2 (R » D » C ) / 0.97
= 0.31,
and similarly for the remaining focal sets, C, R » D, and D » C. The joint basic
assignment can now be used to calculate the joint belief Bel1,2 (Table 5.5) and
Pl1,2.
The Dempster rule of combination is well justified when c = 0 in Eq. (5.57).
However, it is controversial when c π 0 (see Note 5.6). This happens when evidence obtained from distinct sources is conflicting. In fact, the value of c can
be viewed as the degree of this conflict. An alternative rule of combination,
which is epistemologically sound and, hence, eliminates the controversies of
the Dempster rule, works as follows:
when A π ∆ and
Ï Â m1 (B) ◊ m2 (C )
ÔÔ B «C = A
m1,2 ( A) = Ì m1 (X ) ◊ m2 (X ) + Â m1 (B) ◊ m2 (C ) when A = X
B «C =∆
Ô
ÔÓ 0
when A = ∆.
Aπ X
(5.59)
174
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Table 5.6. Comparison of the Introduced Rules of Combination (Example 5.9)
Focal Sets
A
Ā
X
Source 1
m1
Source 2
m2
Eqs. (5.57) and (5.58)
m1,2
Eq. (5.59)
m1,2
0.01
0.80
0.19
0.90
0.00
0.10
0.646
0.286
0.068
0.181
0.080
0.739
According to this alternative rule of combination, m1,2 is normalized by moving
c to m1,2(X). This means that the conflict between the two sources of evidence
is not hidden, but it is explicitly recognized as a contributor to our ignorance.
EXAMPLE 5.9. Given some universal set X, assume that we are only interested in finding whether the true alternative is in set A or its complement Ā.
Assume further that we have evidence from two independent sources shown
in Table 5.6. Also shown in the table is the combined evidence obtained by the
Dempster rule of combination, expressed by Eqs. (5.57) and (5.58), and by the
alternative rule of combination, expressed by Eq. (5.59). In both cases, we first
calculate
c = m1 ( A) ◊ m2 ( A ) + m1 ( A ) ◊ m2 ( A) = 0.72.
Using the alternative rule, we then calculate
m1,2 ( A) = m1 ( A) ◊ m2 ( A) + m1 ( A) ◊ m2 (X ) + m1 (X ) ◊ m2 ( A) = 0.181,
m1,2 ( A ) = m1 ( A ) ◊ m2 ( A ) + m1 ( A ) ◊ m2 (X ) + m1 (X ) ◊ m2 ( A ) = 0.180.
Finally, m1,2(X) = 1 - m1,2(A) - m1,2(Ā) = 0.739. This value can be calculated
directly as
m1,2 ( X ) = m1 ( X ) ◊ m2 ( X ) + c = 0.739.
Using the Dempster rule, we normalize the previously obtained values of
m1,2(A) and m1,2(Ā) by dividing them by 1 - c = 0.28. This results in the values
given in Table 5.6.
5.4.3. Special Classes of Bodies of Evidence
Consider a body of evidence ·F, mÒ in the sense of DST. If the associated belief
measure is also additive, then it is a classical probability measure. The following theorem establishes necessary and sufficient conditions for probabilistic
bodies of evidence.
5.4. BELIEF AND PLAUSIBILITY MEASURES
175
Theorem 5.6. A belief measure Bel on a finite power set P(X) is a probability measure if and only if its basic assignment m is given by m({x}) = Bel({x})
and m(A) = 0 for all subsets A or X that are not singletons.
Proof. Assume that Bel is a probability measure. For the empty set ⭋, the
theorem trivially holds, since m(⭋) = 0 by the definition of m. Let A π ⭋ and
assume A = {x1, x2, . . . , xn}. Then by repeated application of additivity, we
obtain
Bel ( A) = Bel ({x1}) + Bel ({x 2 , x3 , . . . , x n })
= Bel ({x1}) + Bel ({x 2 }) + Bel ({x3 , x 4 , . . . , x n })
M
= Bel ({x1}) + Bel ({x 2 }) + . . . = Bel ({x n }).
Since Bel({x}) = m({x}) for any x ŒX, by Eq. (5.46), we have
n
Bel ( A) = Â m({xi }).
i =1
Hence, Bel is defined in terms of a basic probability assignment that focuses
only on singletons.
Assume now that a basic probability assignment m is given such that
 m({x}) = 1.
x ŒX
Then for any sets A, B ŒP(X) such that A « B = ⭋, we have
 m({x}) +  m({x})
= Â m({x}) = Bel ( A » B)
Bel ( A) + Bel (B) =
x ŒA
x ŒB
x ŒA » B
and, consequently, Bel is a probability measure.
䊏
Given a body of evidence ·F, mÒ, let F = {A1, A2, . . . , An}. When
Ap (1) Ã Ap ( 2 ) Ã . . . Ã Ap ( n )
for some permutation p of ⺞n (i.e., when F is a nested family of focal sets),
the body of evidence is called consonant. This name appropriately reflects the
fact that the degrees of evidence allocated to focal sets that are nested do conflict with one another in a minimal way. Belief and plausibility measures associated with consonant bodies of evidence have special properties that are
characterized by the following theorem.
176
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Theorem 5.7. Given a consonant body of evidence ·F, mÒ, the associated consonant belief and plausibility measures possess the following properties:
(i) Bel(A « B) = min{Bel(A), Bel(B)} for all A, B Œ P(X);
(ii) Pl(A » B) = max{Pl(A), Pl(B)} for all A, B Œ P(X).
Proof. (i) Since the focal elements in F are nested, they may be linearly
ordered by the subset relation. Let F = {A1, A2, . . . , An} and assume that Ai Ã
Aj whenever i < j. Now consider arbitrary subsets A and B of X. Let i1 be the
largest integer i such that Ai à A, and let i2 be the largest integer i such that
Ai à B. Then Ai à A and Ai à B if and only if i £ i1 and i £ i2, respectively.
Moreover, Ai à A « B if and only if i £ min{i1, i2}. Hence,
Bel ( A « B) =
min { i 1, i2}
Â
m( Ai )
i=1
i
i
2
Ï 1
¸
= minÌ Â m( Ai ), Â m( Ai )˝
˛
Ó i=1
i=1
= min{Bel ( A), Bel (B)}.
(ii) Assume that (i) holds. Then by Eq. (5.44),
(
)
Pl ( A » B) = 1 - Bel A » B
= 1 - Bel ( A « B )
= 1 - min{Bel ( A ), Bel (B )}
= max{1 - Bel ( A ), 1 - Bel (B )}
= max{Pl ( A), Pl (B)}.
for all A, B ŒP(X).
䊏
It follows from this theorem that consonant belief and plausibility
measures are the same functions as necessity and possibility measures,
respectively.
Necessity and possibility measures are thus special belief and plausibility
measures, respectively, which are characterized by nested bodies of evidence.
Given any nested bodies of evidence, they can be manipulated either by
the calculus of possibility theory or by calculus of DST. In the former case, the
resulting bodies of evidence are again nested, and hence, we remain within the
domain of possibility theory. In the latter case, they may not be nested, which
means that we leave the domain of possibility theory. To operate within possibility theory thus requires the use of rules of possibilistic calculus, which in
general are different from the rules of the calculus of DST. Possibility theory
is thus a special branch of DST only at the level of representation, but not at
the calculus level. This is illustrated by the following example.
5.4. BELIEF AND PLAUSIBILITY MEASURES
177
Table 5.7. Noninteraction in Possibility Theory Versus Noninteraction in DST
(Example 5.10)
X¥Y
A:
Eq. (5.19)
Eq. (5.56)
x1
y1
x1
y2
x2
y1
x2
y2
m(A)
Bel(A)
Pl(A)
m(A)
Bel(A)
Pl(A)
0
1
0
0
0
1
1
1
0
0
0
1
1
1
0
1
0
0
1
0
0
1
0
0
1
1
0
1
1
0
1
1
0
0
0
1
0
0
1
0
1
0
1
1
0
1
1
1
0
0
0
0
1
0
0
1
0
1
1
0
1
1
1
1
0.0
0.2
0.0
0.0
0.0
0.0
0.2
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.6
0.0
0.2
0.0
0.0
0.0
0.2
0.4
0.2
0.0
0.0
0.0
0.4
0.2
0.4
0.0
1.0
0.0
1.0
0.6
0.8
0.6
1.0
1.0
1.0
0.8
0.6
0.8
1.0
1.0
1.0
0.8
1.0
0.00
0.08
0.00
0.00
0.00
0.12
0.32
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.48
0.00
0.08
0.00
0.00
0.00
0.20
0.40
0.08
0.00
0.00
0.00
0.52
0.20
0.40
0.00
1.00
0.00
1.00
0.60
0.80
0.48
1.00
1.00
1.00
0.92
0.60
0.80
1.00
1.00
1.00
0.92
1.00
EXAMPLE 5.10. Consider the following marginal basic probability assignments, mX and mY, on X = {x1, x2} and Y = {y1, y2}:
mX ({x1}) = 0.2,
mX (X ) = 0.8,
mY ({ y1}) = 0.4,
mY (Y ) = 0.6.
Clearly, both bodies of evidence induced by mX and mY are nested. Assuming
their noninteraction, we can calculate the associated joint body of evidence by
applying either the rule of possibility theory, expressed by Eq. (5.19), or the
rule of DST, expressed by Eq. (5.56). The two results, which are very different, are shown in Table 5.7. While using Eq. (5.19) results in the nested family
of focal sets,
{{ x1 , y1 }, { x1 , y1 , x2 , y1 }, X },
the family of focal sets obtained by Eq. (5.56) is not nested, as it contains two
focal sets,
{ x1 , y1 , x1 , y2 }
and
{ x1 , y1 , x 2 , y1 },
neither of which is a subset of the other one.
178
5. SPECIAL THEORIES OF IMPRECISE PROBABILITIES
Another special class of bodies of evidence in DST consists of those that
are based on l-measures. Again, the rules of l-measures must be followed in
order to remain within this special class.
5.5. REACHABLE INTERVAL-VALUED
PROBABILITY DISTRIBUTIONS
In the theory of imprecise probabilities based on l-measures (Section 5.3), the
dual lower and upper probability measures are fully characterized by either
the lower probability function or the upper probability function on singletons.
This means that |X| values of one of the dual measures are sufficient to determine all other values of both measures. In the theory of imprecise probabilities discussed in this section, the lower and upper probability functions are
determined by both the lower and upper probability functions on singletons.
That is, 2 ¥ |X| values (all for singletons) are needed to determine all other
values of both measures. In other words, the lower and upper probability measures are determined in this case by probability distribution functions that are
interval-valued.
Given a finite set X = {x1, x2, . . . , xn} of considered alternatives, let
I = ⟨[li, ui] | i ∈ ℕn⟩
denote an n-tuple of probability intervals defined on individual alternatives
xi Œ X. This n-tuple may be viewed as an interval-valued probability distribution on X provided that the inequalities
0 £ l i £ ui £ 1
are satisfied for all i Œ ⺞n. Associated with I is a convex set
D = { p | li ≤ p(xi) ≤ ui, i ∈ ℕn, Σ_{xi∈X} p(xi) = 1 }  (5.60)
of probability distribution functions p on X whose values are bounded for each
xi ∈ X by values li and ui. Clearly, D ≠ ∅ if and only if
Σ_{i=1}^{n} li ≤ 1,  (5.61)
and
Σ_{i=1}^{n} ui ≥ 1.  (5.62)
Tuples of probability intervals that satisfy these inequalities are called proper
(or reasonable). These are obviously the only meaningful interval-valued probability distributions to represent imprecise probabilities.
The set D defined by Eq. (5.60) always forms an (n − 1)-dimensional polyhedron. Any point (a probability distribution) in this polyhedron can
be expressed as a linear combination of its extreme points (vertices, corners).
It is known that the number c of these extreme points is bounded by the
inequalities
n £ c £ n(n - 1).
(5.63)
From the probability distributions in set D, the lower and upper probability measures, l and u, are defined for all A ∈ P(X) in the usual way (recall Eqs. (4.11) and (4.12)):
l(A) = inf_{p∈D} Σ_{xi∈A} p(xi),  (5.64)
u(A) = sup_{p∈D} Σ_{xi∈A} p(xi).  (5.65)
Due to the definition of set D,
l({xi}) ≥ li  and  u({xi}) ≤ ui  (5.66)
for all i Œ ⺞n. For consistency of the two representations of imprecise probabilities, one based on I and one based on D, it is desirable, if possible, to modify
the lower and upper bounds li and ui to obtain the equalities in Eq. (5.66)
without changing the set D. It is easy to see that the equalities in Eq. (5.66)
are obtained if and only if the n-tuple I satisfies the inequalities
Σ_{j≠i} lj + ui ≤ 1,  (5.67)
Σ_{j≠i} uj + li ≥ 1,  (5.68)
for all i Œ ⺞n. These conditions guarantee for each i Œ ⺞n the existence of probability distribution functions ip and iq in D such that
ip(xi) = ui  and  lj ≤ ip(xj) ≤ uj  for all j ≠ i,
iq(xi) = li  and  lj ≤ iq(xj) ≤ uj  for all j ≠ i.
These functions guarantee, in turn, that the equalities in Eq. (5.66) are reached.
Tuples I for which the inequalities (5.67) and (5.68) are satisfied are thus called
reachable (or feasible).
It is well established in the literature (see Note 5.8) that any given tuple of
probability intervals I can be converted to its reachable counterpart,
I′ = ⟨[li′, ui′] | i ∈ ℕn⟩,
via the formulas
li′ = max{ li , 1 − Σ_{j≠i} uj },  (5.69)
ui′ = min{ ui , 1 − Σ_{j≠i} lj },  (5.70)
for all i Œ ⺞n. Moreover, the sets of probability distributions associated with I
and I¢ are the same.
Observe that l¢i defined by Eq. (5.69) is either equal to or greater than li.
Similarly, u¢i defined by Eq. (5.70) is either equal to or smaller than ui. This
means that the probability intervals in I¢ are equal to or narrower than the
corresponding intervals in I. The reachable tuples of probability intervals thus
provide us with a more accurate representation of imprecise probabilities than
those that are not reachable. It is thus reasonable to limit ourselves to only
reachable probability intervals. If the given probability intervals are not reachable, they always can be converted to the reachable ones by Eqs. (5.69) and
(5.70).
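As a minimal sketch of this procedure, the following Python fragment checks inequalities (5.61), (5.62), (5.67), and (5.68) and applies Eqs. (5.69) and (5.70). Tuples are represented as lists of (l, u) pairs; the sample triple is the non-reachable one used in Example 5.12 below, and the function names are illustrative only.

```python
def is_proper(I):
    """Inequalities (5.61)-(5.62): the interval tuple bounds a nonempty set D."""
    lows, ups = zip(*I)
    return sum(lows) <= 1.0 and sum(ups) >= 1.0

def is_reachable(I):
    """Inequalities (5.67)-(5.68) must hold for every index i."""
    lows, ups = zip(*I)
    return all(sum(lows) - lows[i] + ups[i] <= 1.0 + 1e-9 and
               sum(ups) - ups[i] + lows[i] >= 1.0 - 1e-9
               for i in range(len(I)))

def make_reachable(I):
    """Tighten the bounds by Eqs. (5.69)-(5.70) without changing the set D."""
    lows, ups = zip(*I)
    return [(max(l, 1.0 - (sum(ups) - u)), min(u, 1.0 - (sum(lows) - l)))
            for l, u in I]

I = [(0.2, 0.5), (0.3, 0.4), (0.1, 0.2)]       # the triple of Example 5.12
print(is_proper(I), is_reachable(I))           # True False
print(make_reachable(I))                       # [(0.4, 0.5), (0.3, 0.4), (0.1, 0.2)]
```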
Assuming now that a given n-tuple I of probability intervals is reachable,
the lower and upper probability measures, l and u, are calculated for all
A Œ P(X) by the formulas
l(A) = max{ Σ_{xi∈A} li , 1 − Σ_{xi∉A} ui },  (5.71)
u(A) = min{ Σ_{xi∈A} ui , 1 − Σ_{xi∉A} li }.  (5.72)
Values l(A) and u(A) for all A Œ P(X) are thus determined by 2n values li, ui,
i Œ ⺞n.
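A direct implementation of Eqs. (5.71) and (5.72) is straightforward. The sketch below (illustrative names, assuming a reachable tuple as input) enumerates all nonempty proper subsets of X and reproduces, for instance, the values l({x2, x3}) = 0.4 and u({x2, x3}) = 0.7 obtained in Example 5.11 below.

```python
from itertools import combinations

def lower_upper(I, A):
    """Eqs. (5.71)-(5.72) for a reachable tuple I and a set A of element indices."""
    in_l = sum(I[i][0] for i in A)
    in_u = sum(I[i][1] for i in A)
    out_l = sum(I[i][0] for i in range(len(I)) if i not in A)
    out_u = sum(I[i][1] for i in range(len(I)) if i not in A)
    return max(in_l, 1.0 - out_u), min(in_u, 1.0 - out_l)

I = [(0.3, 0.6), (0.2, 0.5), (0.1, 0.4)]   # the reachable triple of Example 5.11
for k in range(1, len(I)):
    for A in combinations(range(len(I)), k):
        print(A, lower_upper(I, set(A)))
# e.g. A = (1, 2), i.e. {x2, x3}, yields (0.4, 0.7)
```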
Imprecise probabilities based on reachable probability intervals are known
to belong to the class of imprecise probabilities that are based on Choquet
capacities of order 2. They are also known to be incomparable with imprecise
probabilities represented by Choquet capacities of order • (i.e., belief and
plausibility measures of DST).
EXAMPLE 5.11. Let X = {x1, x2, x3} and assume that the interval-valued probability distribution
I = ⟨[0.3, 0.6], [0.2, 0.5], [0.1, 0.4]⟩
on X is given (elicited from an expert or obtained in some other way). This
distribution is proper, since it satisfies the inequalities (5.61) and (5.62). To
show that it is also reachable, the inequalities (5.67) and (5.68) must be
checked for i = 1, 2, 3:
l1 + l 2 + u3 = 0.3 + 0.2 + 0.4 = 0.9 £ 1,
l1 + l3 + u2 = 0.3 + 0.1 + 0.5 = 0.9 £ 1,
l 2 + l3 + u1 = 0.2 + 0.1 + 0.6 = 0.9 £ 1,
u1 + u2 + l3 = 0.6 + 0.5 + 0.1 = 1.2 ≥ 1,
u1 + u3 + l 2 = 0.6 + 0.4 + 0.2 = 1.2 ≥ 1,
u2 + u3 + l1 = 0.5 + 0.4 + 0.3 = 1.2 ≥ 1.
All the required inequalities are satisfied, and hence, the given I is reachable.
Now, we can calculate l(A) and u(A) for subsets of X that are not singletons
by Eqs. (5.71) and (5.72). The results are shown in Figure 5.7a. For
example:
l ({x 2 , x3 }) = max{l 2 + l3 , 1 - u1} = max{0.3, 0.4} = 0.4
u({x 2 , x3 }) = min{u2 + u3 , 1 - l1} = min{0.9, 0.7} = 0.7.
One of the two measures may, of course, be calculated from the other one by
the duality equation as well.
A geometrical interpretation of the three probability intervals in this
example and the associated set D of probability distributions is shown in
Figure 5.7c. We can see that this set is convex and any of its elements can be
expressed as a linear combination of the six extreme points shown in the figure.
According to Eq. (5.63), this is the maximum number of extreme points for
n = 3. Since l is a Choquet capacity of order 2, these extreme points are also
the probability distributions in the interaction representation of l (introduced
in Section 4.3.3). These latter probability distributions are readily obtained
from the Hasse diagram of the underlying Boolean lattice shown in Figure
5.7b. We can see that the probability distributions associated with the individual maximal chains in the lattice are exactly the same as the extreme points
in Figure 5.7c.
Figure 5.7a lists the resulting lower and upper probabilities:

A:        | l(A)  u(A)
∅         | 0.0   0.0
{x1}      | 0.3   0.6
{x2}      | 0.2   0.5
{x3}      | 0.1   0.4
{x1, x2}  | 0.6   0.9
{x1, x3}  | 0.5   0.8
{x2, x3}  | 0.4   0.7
X         | 1.0   1.0

Figure 5.7b shows the Hasse diagram of the Boolean lattice ⟨P(X), ⊆⟩ with the value of l attached to each node, and Figure 5.7c shows the convex set D of probability distributions, with its six extreme points, within the simplex of all probability distributions on X.

Figure 5.7. Illustration to Example 5.11.

EXAMPLE 5.12. Let X = {x1, x2, x3} and consider the interval-valued probability distribution
I = ⟨[0.2, 0.5], [0.3, 0.4], [0.1, 0.2]⟩
on X. This distribution is clearly proper, but it is not reachable. For example,
u2 + u3 + l1 = 0.8 < 1
violates Eq. (5.68). The distribution can be converted to its reachable counterpart, I¢, by Eqs. (5.69) and (5.70). For example,
l1¢ = max{l1 , 1 - u2 - u3 } = max{0.2, 0.4} = 0.4.
By calculating values l¢i and u¢i for all i = 1, 2, 3, we obtain
I′ = ⟨[0.4, 0.5], [0.3, 0.4], [0.1, 0.2]⟩.
5.5.1. Joint and Marginal Interval-Valued Probability Distributions
Consider sets X = {x1, x2, . . . , xn} and Y = {y1, y2, . . . , ym} of states of two random
variables and let a joint interval-valued probability distribution
I = ⟨[lij, uij] | i ∈ ℕn, j ∈ ℕm⟩
on P(X ¥ Y) be given, which is assumed to be reachable. The joint lower and
upper probability measures l and u, are determined from I via Eqs. (5.71) and
(5.72). The associated marginal measures are defined in the usual way: for
every A ⊆ X and every B ⊆ Y,
l X ( A) = l ( A ¥ Y ),
u X ( A) = u( A ¥ Y ),
(5.73)
lY (B) = l (X ¥ B),
uY (B) = u(X ¥ B).
(5.74)
It is well known that the marginal measures obtained by these equations are
exactly the same as those obtained by marginalization of the convex set of
probability distributions associated with I. Moreover, the marginal tuples of
probability intervals,
IX = ⟨[Xli, Xui] | i ∈ ℕn⟩,
IY = ⟨[Ylj, Yuj] | j ∈ ℕm⟩,
are determined from I by the following equations (i Œ ⺞n, j Œ ⺞m):
Xli = max{ Σ_{j=1}^{m} lij , 1 − Σ_{k≠i} Σ_{j=1}^{m} ukj },  (5.75)
Xui = min{ Σ_{j=1}^{m} uij , 1 − Σ_{k≠i} Σ_{j=1}^{m} lkj },  (5.76)
Ylj = max{ Σ_{i=1}^{n} lij , 1 − Σ_{k≠j} Σ_{i=1}^{n} uik },  (5.77)
Yuj = min{ Σ_{i=1}^{n} uij , 1 − Σ_{k≠j} Σ_{i=1}^{n} lik }.  (5.78)
By comparing the forms of these equations with the forms of Eqs. (5.69) and
(5.70), we can conclude that IX and IY are reachable.
For calculating conditional lower and upper probabilities, the general
concept of conditional monotone measures introduced in Section 4.3.6 can be
applied. See also Note 5.8.
EXAMPLE 5.13. Joint imprecise probabilities on X ¥ Y = {x1, x2} ¥ {y1, y2} are
defined in this example by the interval-valued probability distribution shown
by the bold numbers in Table 5.8a. Since the given intervals are reachable, the
remaining values of lower and upper probabilities in the table, l(C) and u(C),
are calculated from the four intervals by Eqs. (5.71) and (5.72). The associated
marginal lower and upper probabilities are shown in Table 5.8b, 5.8c. They can
be calculated either by Eqs. (5.73) and (5.74) or by Eqs. (5.75)–(5.78). For
example,
lX({x1}) = l({x1} × Y) = l({⟨x1, y1⟩, ⟨x1, y2⟩}) = 0.3,
uY({y2}) = u(X × {y2}) = u({⟨x1, y2⟩, ⟨x2, y2⟩}) = 0.8
Table 5.8. Illustration to Example 5.13

(a) Joint lower and upper probabilities on X × Y; the four singleton rows carry the given (bold) interval-valued probability distribution.

C: ⟨x1,y1⟩ ⟨x1,y2⟩ ⟨x2,y1⟩ ⟨x2,y2⟩ | l(C)  u(C)
      0       0       0       0   |  0.0   0.0
      1       0       0       0   |  0.1   0.3
      0       1       0       0   |  0.2   0.4
      0       0       1       0   |  0.1   0.2
      0       0       0       1   |  0.3   0.5
      1       1       0       0   |  0.3   0.6
      1       0       1       0   |  0.2   0.5
      1       0       0       1   |  0.4   0.7
      0       1       1       0   |  0.3   0.6
      0       1       0       1   |  0.5   0.8
      0       0       1       1   |  0.4   0.7
      1       1       1       0   |  0.5   0.7
      1       1       0       1   |  0.8   0.9
      1       0       1       1   |  0.6   0.8
      0       1       1       1   |  0.7   0.9
      1       1       1       1   |  1.0   1.0

(b) Marginal lower and upper probabilities on X:

A: x1 x2 | lX(A)  uX(A)
   0  0  |  0.0    0.0
   1  0  |  0.3    0.6
   0  1  |  0.4    0.7
   1  1  |  1.0    1.0

(c) Marginal lower and upper probabilities on Y:

B: y1 y2 | lY(B)  uY(B)
   0  0  |  0.0    0.0
   1  0  |  0.2    0.5
   0  1  |  0.5    0.8
   1  1  |  1.0    1.0
by using Eqs. (5.73) and (5.74), respectively. Similarly,
Xl1 = lX({x1}) = max{l11 + l12, 1 − u21 − u22} = 0.3
by Eq. (5.75), and
Yu2 = uY({y2}) = min{u12 + u22, 1 − l11 − l21} = 0.8
by Eq. (5.78).
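The marginalization formulas (5.75)–(5.78) can be checked mechanically. The sketch below assumes the four bold singleton intervals of Table 5.8a as input (indexed so that L[i][j] and U[i][j] bound p(⟨xi, yj⟩)) and returns the marginal tuples of Tables 5.8b and 5.8c, up to floating-point rounding; the function name is illustrative.

```python
def marginal_intervals(L, U):
    """Eqs. (5.75)-(5.78): marginal interval tuples from a joint one."""
    n, m = len(L), len(L[0])
    row_l = [sum(L[i]) for i in range(n)]
    row_u = [sum(U[i]) for i in range(n)]
    col_l = [sum(L[i][j] for i in range(n)) for j in range(m)]
    col_u = [sum(U[i][j] for i in range(n)) for j in range(m)]
    IX = [(max(row_l[i], 1 - (sum(row_u) - row_u[i])),
           min(row_u[i], 1 - (sum(row_l) - row_l[i]))) for i in range(n)]
    IY = [(max(col_l[j], 1 - (sum(col_u) - col_u[j])),
           min(col_u[j], 1 - (sum(col_l) - col_l[j]))) for j in range(m)]
    return IX, IY

# Bold singleton intervals of Table 5.8a (Example 5.13).
L = [[0.1, 0.2], [0.1, 0.3]]
U = [[0.3, 0.4], [0.2, 0.5]]
print(marginal_intervals(L, U))
# approximately ([(0.3, 0.6), (0.4, 0.7)], [(0.2, 0.5), (0.5, 0.8)])
```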
5.6. OTHER TYPES OF MONOTONE MEASURES
The Choquet capacities of various orders introduced in Chapter 4 and the
special types of monotone measures introduced so far in this chapter are the
main types of set-valued functions that have been utilized for representing
imprecise probabilities. The purpose of this section is to survey some
additional types of monotone measures that have been discussed in the
literature and are potentially useful for representing imprecise probabilities
as well.
A large class of monotone measures, referred to as decomposable measures,
is based on the mathematical concepts of triangular norms (or t-norms) and
triangular conorms (or t-conorms). These are pairs of dual binary operations
on the unit interval that are commutative, associative, monotone, and subject
to appropriate boundary conditions. They play an important role in fuzzy set
theory as intersection and union operations on standard fuzzy sets. In this
book, they are introduced in Section 7.3.2.
A monotone measure, um, defined on subsets of a finite set X, is called
decomposable with respect to a given t-conorm u if and only if
ᵘm(A ∪ B) = u[ᵘm(A), ᵘm(B)]  (5.79)
for each pair of disjoint sets A, B Œ P(X). In decomposable measures, the
degree of uncertainty of the union of any pair of disjoint sets is thus dependent solely on the degrees of uncertainty of the individual sets.
An important consequence of Eq. (5.79) and associativity of t-conorms is
that any decomposable measure is uniquely determined by its values on all
singletons. That is,
ᵘm(A) = ᵘm( ∪_{x∈A} {x} ) = u_{x∈A} [ᵘm({x})]  (5.80)
for all A Œ P(X). This is similar to possibility measures and l-measures, which
are in fact special decomposable measures. For possibility measures, the
t-conorm is the maximum operation; for l-measures, the t-conorm for each
particular value of l is the function
min{1, a + b + λab}.
There are, of course, many other classes of t-conorms or classes of t-norms,
some of which are introduced in Section 7.2.2. Thus, decomposable measures
form a very broad class of monotone measures. Their characteristic feature is
that each measure in this class is fully determined by its values on singletons.
Decomposable measures are thus attractive from the standpoint of computational complexity.
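The defining property (5.80) translates directly into code: once a t-conorm and the singleton values are fixed, every other value follows. The sketch below uses illustrative singleton values; the possibility case uses the maximum, and the λ-measure case uses min{1, a + b + λab} with λ = 5 chosen so that the value on the whole set equals 1.

```python
from functools import reduce
from itertools import combinations

def decomposable_measure(singletons, conorm):
    """Values on all nonempty subsets from singleton values via Eq. (5.80)."""
    elems = sorted(singletons)
    return {A: reduce(conorm, (singletons[x] for x in A))
            for k in range(1, len(elems) + 1)
            for A in combinations(elems, k)}

# Possibility measure: the t-conorm is the maximum.
poss = decomposable_measure({"x1": 1.0, "x2": 0.4, "x3": 0.7}, max)

# lambda-measure with lambda = 5 (so that the value on the full set is exactly 1).
lam = 5.0
s_lam = lambda a, b: min(1.0, a + b + lam * a * b)
lam_measure = decomposable_measure({"x1": 0.6, "x2": 0.1}, s_lam)

print(poss[("x1", "x2", "x3")], lam_measure[("x1", "x2")])   # 1.0 1.0
```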
Another important class of monotone measures, which are called k-additive measures, has been discussed in the literature. A monotone measure,
m, is said to be k-additive (k ≥ 1) if its Möbius representation, m, satisfies the
following requirement: m(A) = 0 for all sets A Œ P(X) such that |A| > k and
there exists at least one set A with k elements for which m(A) π 0.
Similar to decomposable measures, k-additive measures were introduced in
an attempt to reduce the computational complexity in dealing with monotone
measures. In general, to define a monotone measure m on P(X) requires that
its values be specified for all subsets of X except ⭋ and X (since m(⭋) = 0 and
m(X) = 1 by axioms of monotone measures). Hence, 2|X| - 2 values must be
specified in general to characterize a monotone measure. For k-additive measures, the number of required values is reduced to Σ_{i=1}^{k} (|X| choose i), and for decomposable measures it is reduced to |X|. Decomposable measures and k-additive
measures are thus suitable for approximating monotone measures that are difficult to handle. General principles for dealing with these and other approximation problems are discussed in Chapter 9.
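The savings in the number of values that must be specified can be made concrete. The small sketch below compares the general case (2^|X| − 2), the k-additive case (the sum of binomial coefficients), and the decomposable case (|X|) for an illustrative six-element set.

```python
from math import comb

def values_needed(n, k=None):
    """Values required to pin down a monotone measure on an n-element set:
    general case, k-additive case, and decomposable case."""
    general = 2 ** n - 2
    k_additive = sum(comb(n, i) for i in range(1, (k or n) + 1))
    decomposable = n
    return general, k_additive, decomposable

print(values_needed(6, k=2))   # (62, 21, 6)
```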
NOTES
5.1. The idea of graded possibilities was originally introduced in the context of fuzzy
sets by Zadeh [1978a]. However, the need for a theory of graded possibilities in
economics was perceived by the British economist George Shackle already in the
1940s [Shackle, 1949]. He formalized his ideas in terms of a monotone decreasing measure called a potential surprise [Shackle, 1955, 1961, 1979]. However, this
formalization can be reformulated in terms of standard possibility theory, surveyed in Section 5.2 [Klir, 2002a]. The notion of potential surprise is also examined in [Prade and Yager, 1994] and in some writings by Levi [1984, 1986, 1996].
5.2. The literature on possibility theory is now quite extensive. An early book by
Dubois and Prade [1988a] is a classic in this area. More recent developments in
possibility are covered in a text by Kruse et al. [1994] and in monographs by
Wolkenhauer [1998] and Borgelt and Kruse [2002]. Important sources are also
edited books by De Cooman et al. [1995] and Yager [1982a]. A sequence of three
papers by De Cooman [1997] is perhaps the most comprehensive and general
treatment of possibility theory. Thorough surveys of possibility theory with exten-
sive bibliographies were written by Dubois et al. [1998, 2000]. A sequence of two
papers by De Campos and Huete [1999] and a paper by Vejnarová [2000] deal
in a comprehensive way with the various issues of independence in possibility
theory. Shafer [1985] discusses possibility measures in the broader context of
Dempster–Shafer theory. An interesting idea of using second-order possibility
distributions in statistical reasoning is explored by Walley [1997]. Constructing
possibility profiles from interval data is addressed by Joslyn [1994, 1997].
5.3. As is explained in Section 5.2, possibility theory is based on pairs of dual monotone measures, each consisting of a necessity measure, Nec, and a possibility
measure, Pos, that are connected via Eq. (5.1). These two measures, whose range
is [0, 1], can be converted by a one-to-one transformation to a single combined
function, C, whose range is [-1, 1]. For each A Œ P(X),
C(A) = Nec(A) + Pos(A) − 1,  (5.81)
and, conversely,
Nec(A) = 0 when C(A) ≤ 0, and Nec(A) = C(A) when C(A) > 0;  (5.82)
Pos(A) = C(A) + 1 when C(A) ≤ 0, and Pos(A) = 1 when C(A) > 0.  (5.83)
Positive values of C(A) indicate the degree of confirmation of A by the available
evidence, while its negative values express the degree of disconfirmation of A by
the evidence.
5.4. Sugeno l-measures were introduced by Sugeno [1974, 1977] and were further
investigated by Banon [1981], Wierzchoñ [1982, 1983], Kruse [1982a,b], and Wang
and Klir [1992]. Due to the small number of parameters that characterize any
l-measure (values of the measure on singletons), l-measures are relatively
easy (compared to more general measures) to construct from data.They are often
constructed with the help of neural networks [Wang and Wang, 1997] or genetic
algorithms [Wang et al., 1998].
5.5. Mathematical theory based on belief and plausibility measures, usually known as
evidence theory or Dempster–Shafer theory, was originated and developed by
Glenn Shafer [1976a]. It was motivated by previous work on lower and upper
probabilities by Dempster [1967a,b, 1968a,b], as well as by Shafer’s historical
reflection upon the concept of probability [1978] and his critical examination of
the Bayesian approach to evidence [Shafer, 1981]. Although the book by Shafer
[1976a] is still the best introduction to the theory, other books devoted to the
theory, which are more up-to-date, were written by Guan and Bell [1991–92],
Kohlas and Monney [1995], Kramosil [2001], and edited by Yager et al. [1994].
There are too many articles on the theory to be listed here, but some are included
in Bibliography. Most of them can be found in reference lists of the mentioned
books or in the Special Issue of the International Journal of Intelligent Systems on
the Dempster–Shafer Theory of Evidence (18(1), 2003, pp. 1–148). A broad discussion of the theory is in papers by Shafer [1981, 1982, 1990], Dubois and Prade
[1986a], Smets [1988, 1998], and Smets and Kennes [1994]. Papers by Delmotte
[2001] and Wilson [2000] deal with algorithmic aspects of the theory. Although
most literature on the Dempster–Shafer theory is restricted to finite sets, this
restriction is not necessary, as is shown by Shafer [1979] and Kramosil [2001].
5.6. The Dempster rule of combination is an important ingredient of DST. It is in
some sense a generalization of the Bayes rule in classical probability theory. Both
rules allow us, within their respective domains, to change any prior expression of
uncertainty in the light of new evidence. As the name suggests, the Dempster rule
was first proposed by Dempster [1967a], and it has played a prominent role in
the development of DST by Shafer [1976a]. The rule was critically examined by
Zadeh [1986], and formally investigated by Dubois and Prade [1986b], Harmanec
[1997], Norton [1988], Klawonn and Schwecke [1992], and Hájek [1993]. The
alternative rule of combination expressed by Eq. (5.59) was proposed by Yager
[1987b].
5.7. An interesting connection between modal logic and the various uncertainty theories is suggested and examined in papers by Resconi et al. [1992, 1993]. Modal
logic interpretations of belief and plausibility measures on finite sets is studied
in detail by Harmanec et al. [1994] and Tsiporkova et al. [1999], and on infinite
sets by Harmanec et al. [1996]. A modal logic interpretation of possibility theory
is established in a paper by Klir and Harmanec [1994].
5.8. Key references in the area of reachable interval-valued probability distributions
are papers by De Campos et al. [1994], Tanaka et al. [2004], and Weichselberger
[2000], and a book by Weichselberger and Pöhlman [1990]. These references
contain proofs of all propositions in Section 5.5. Conditional interval-valued
probability distributions are examined in detail by De Campos et al. [1994].
Bayesian inference based on interval-valued prior probability distributions and
likelihoods is developed in paper by Pan and Klir [1997]. Sgarro [1997] demonstrates that the theory based on reachable interval-valued probability distributions is not comparable with DST; neither of these two theories is more general
than the other one.
5.9. Decomposable measures were introduced and studied by Dubois and Prade
[1982b]. They have been further investigated by various authors, among them
Weber [1984] and Pap [1997]. The basis of these measures are triangular norms,
whose most comprehensive coverage is in the monograph by Klement et al.
[2000].
5.10. k-Additive measures were introduced by Grabisch [1997a]. In a survey
paper [Grabisch, 2000], which contains additional references to this subject, he
also discusses the applicability of k-additivity to possibility measures (called
k-possibility measures) and the issue of approximating monotone measures by
k-additive measures.
5.11. Pairs of nondecreasing functions, p̲ and p̄, from ℝ to [0, 1] that represent, respectively, lower and upper bounds on the unknown probability distribution functions
of random variables on ⺢ were introduced by Williamson and Downs [1990], and
further developed under the name “probability boxes” or “p-boxes” by Ferson
[2002]. Different methods for constructing p-boxes and their discrete approximations in terms of DST are discussed in [Ferson et al., 2003]. Among other references dealing with p-boxes, the most notable are [Ferson and Hajagos, 2004]
and [Regan et al., 2004]. An example of a probability box is shown in Figure 5.8.
Figure 5.8. Example of a probability box.
5.12. A Dirac measure (see Figure 5.1) is a function m : P(X) Æ {0, 1} defined for each
A Œ P(X) by the formula
m(A) = 1 when x0 ∈ A, and m(A) = 0 otherwise,
where x0 is a particular (given) element of X. In any situation with no uncertainty,
possibilistic and probabilistic representations collapse to a Dirac measure:
r(x0) = p(x0) = m({x0}) = 1 and r(x) = p(x) = m({x}) = 0 for all x π x0.
5.13. Walley and De Cooman [1999] investigate conditional possibilities in terms of
the behavioral interpretation of possibility measures. They show that the largest
(or the least informative) conditional possibilities within this interpretation are
defined by the formulas
rX|Y(x | y) = r(x, y) / [r(x, y) + 1 − max{r(x, y), a(y)}],
rY|X(y | x) = r(x, y) / [r(x, y) + 1 − max{r(x, y), b(x)}],
where
a(y) = max{rY(z) | z ∈ Y, z ≠ y},
b(x) = max{rX(z) | z ∈ X, z ≠ x}.
This paper also reviews literature dealing with conditional possibility measures
and contains all relevant references.
5.14. A mathematical theory that is closely connected with imprecise probabilities, but
which is beyond the scope of this book, is the theory of random sets. For finite
sets, the theory is fairly easy to understand. Given a finite universal set X, a
random set on X is defined, in general, by a probability distribution function on
nonempty subsets of X. In DST (or in any uncertainty theory subsumed under
DST), this probability distribution function is clearly equivalent to the Möbius
representation (or the basic probability assignment function). When X is an
n-dimensional Euclidean space for some n ≥ 1, the definition of a random set is
considerably more complicated, but it is primarily this domain where the theory
of random sets has its greatest utility.
Random sets were originally conceived in connection with stochastic geometry. They were proposed in the 1970s, independently by two authors, Kendall
[1973, 1974] and Matheron [1975]. Their connection with belief functions of DST
is examined by Nguyen [1978b] and Smets [1992a], and it is also the subject of
several papers in a book edited by Goutsias et al. [1997]. The most comprehensive and up-to-date coverage of the theory of random sets is a recent book by
Molchanov [2005].
EXERCISES
5.1. Determine which of the Möbius representations in Table 5.9 characterize special monotone measures such as necessity measures, probability
measures, l-measures, or belief measures.
5.2. Assume that the Möbius representations in Table 5.9 characterize joint
bodies of evidence on X ¥ Y, where X = {x1, x2} and Y = {y1, y2}, and let
a = ·x1, y1Ò, b = ·x1, y2Ò, c = ·x2, y1Ò, and d = ·x2, y2Ò. Determine the associated marginal bodies of evidence.
5.3. Given noninteractive marginal possibility profiles rX = ·1, 0.8, 0.5Ò and
rY = ·1, 0.7Ò on sets X = {x1, x2, x3} and Y = {y1, y2}, determine the joint
body of evidence in two different ways:
(a) By the rules of possibility theory;
(b) By the rules of DST.
Show visually (as in Figure 5.2) the marginal and joint bodies of
evidence.
5.4. Repeat Exercise 5.3 for the following noninteractive marginal possibility profiles:
(a) rX = ·1, 0.7, 0.2Ò on X = {x1, x2, x3} and rY ·1, 1, 0, 0.4Ò on Y = {y1, y2,
y3, y4}
(b) rX = ·1, 0.7, 0.6, 0.2Ò on X = {x1, x2, x3, x4} and rY = ·1, 0.6Ò on Y = {y1,
y2}
Table 5.9. Möbius Representations Employed in Exercises

Each row is a subset A of {a, b, c, d}, indicated by the characteristic values of a, b, c, d.

A: a b c d |   m1    m2    m3    m4    m5    m6   m7   m8   m9   m10
   0 0 0 0 |  0.0   0.0   0.00  0.00  0.00  0.0  0.0  0.0  0.0  0.0
   1 0 0 0 |  0.0   0.0   0.00  0.40  0.40  0.0  0.1  0.0  0.0  0.0
   0 1 0 0 |  0.4   0.2   0.10  0.07  0.20  0.0  0.1  0.0  0.0  0.0
   0 0 1 0 |  0.1   0.0   0.20  0.00  0.00  0.0  0.1  0.0  0.0  0.0
   0 0 0 1 |  0.5   0.0   0.48  0.10  0.00  0.0  0.1  0.6  0.2  0.0
   1 1 0 0 |  0.0   0.3   0.08  0.13  0.20  0.0  0.1  0.0  0.1  0.5
   1 0 1 0 |  0.0   0.0   0.00  0.00  0.20  0.0  0.1  0.0  0.0  0.5
   1 0 0 1 |  0.0   0.0   0.00  0.20  0.00  0.4  0.1  0.0  0.3  0.0
   0 1 1 0 |  0.0   0.0   0.08  0.00  0.00  0.0  0.1  0.0  0.0  0.0
   0 1 0 1 |  0.0   0.0   0.02  0.03  0.20  0.0  0.1  0.0  0.0  0.0
   0 0 1 1 |  0.0   0.0   0.00  0.00  0.20  0.0  0.1  0.1  0.0  0.0
   1 1 1 0 |  0.0   0.0   0.02  0.00 -0.20  0.0  0.0  0.0  0.1  0.0
   1 1 0 1 |  0.0   0.4   0.02  0.07 -0.20  0.0  0.0  0.0  0.0  0.0
   1 0 1 1 |  0.0   0.0   0.02  0.00 -0.20  0.5  0.0  0.0  0.3  0.0
   0 1 1 1 |  0.0   0.0   0.02  0.00 -0.20  0.0  0.0  0.2  0.0  0.0
   1 1 1 1 |  0.0   0.1  -0.04  0.00  0.40  0.1  0.0  0.1  0.0  0.0
5.5. Determine the Möbius representation, possibility measure, and necessity measure for each of the following possibility profiles defined on
X = {xi | i Œ ⺞n} for appropriate values of n:
(a) r = ·1, 0.8, 0.5, 0.2Ò
(b) r = ·1, 1, 1, 0.7, 0.7, 0.7Ò
(c) r = ·1, 0.9, 0.8, 0.6, 0.3, 0.3Ò
(d) r = ·1, 0.5, 0.4, 0.3, 0.2, 0.1Ò
(e) r = ·1, 1, 0.8, 0.8, 0.5, 0.5, 0.2Ò
Assume in each case that ri = r(xi), i Œ ⺞n.
5.6. Determine the possibility profile, possibility measure, and necessity
measure for each of the following Möbius representations defined on
X = {xi | i Œ ⺞n} for appropriate values of n:
(a) m = ⟨0, 0, 0.3, 0.2, 0, 0.4, 0.1⟩
(b) m = ·0.1, 0.1, 0.1, 0, 0.1, 0.2, 0.2, 0.2Ò
(c) m = ·0, 0, 0, 0, 0.5, 0.5Ò
(d) m = ·0, 0.2, 0.2, 0, 0.3, 0, 0.3Ò
(e) m = ·0.1, 0.2, 0.3, 0.4Ò
(f) m = ·0.4, 0.3, 0.2, 0.1Ò
Assume in each case that mi = m({x1, x2, . . . , xi}), i Œ ⺞n.
5.7. Convert some of the possibility and necessity measures in Exercises 5.5
and 5.6 to the combined function C defined by Eq. (5.81) and convert
them back by Eqs. (5.82) and (5.83).
5.8. Let X = {a, b, c, d, e} and Y = ⺞8. Using a joint possibility profile on
X ¥ Y given in terms of the matrix
⎡ 1    0    0   0.3  0.5  0.2  0.4  0.1 ⎤
⎢ 0   0.7   0   0.6   1    0   0.4  0.3 ⎥
⎢ 0   0.5   0    0    1    0    1   0.5 ⎥
⎢ 1    1    1   0.5   0    0    1   0.4 ⎥
⎣ 0.8  0   0.9   0    1   0.7   1   0.2 ⎦
where rows are assigned to elements a, b, c, d, e and columns are assigned
to numbers 1, 2, . . . , 8, determine:
(a) Marginal possibility profiles;
(b) Joint and marginal Möbius representations;
(c) Both conditional possibilities given by Eqs. (5.24) and (5.25);
(d) Hypothetical joint possibly profile based on the assumption of
noninteraction.
5.9. Determine the l-measure ml on X = {x1, x2} from each of the following
values of ml on singletons:
(a) ml ({x1}) = 0.6, ml({x2}) = 0.1
(b) ml ({x1}) = 0.4, ml({x2}) = 0.2
(c) ml ({x1}) = 0.6, ml({x2}) = 0.5
(d) ml ({x1}) = 0.7, ml({x2}) = 0.4
(e) ml ({x1}) = 0.5, ml({x2}) = 0.1
(f ) ml ({x1}) = 0.8, ml({x2}) = 0.7
5.10. Determine the joint l-measure ml on X ¥ Y = {x1, x2} ¥ {y1, y2} from each
of the following values of ml on singletons, which are given in the order
ml({·x1, y1Ò}), ml({·x1, y2Ò}), ml({·x2, y1Ò}), ml({·x2, y2Ò}):
(a) 0.4; 0.66̄; 0.0; 0.1
(b) 0.2; 0.2; 0.1; 0.0
(c) 0.1153; 0.2; 0.1; 0.0
(d) 0.4; 0.5696; 0.7; 0.0
(e) 0.5; 0.0; 0.0; 0.4
(f ) 0.5; 0.0343; 0.0; 0.3
5.11. For some of the l-measures in Exercises 5.9 and 5.10, calculate their dual
measures by:
(a) The duality equation, Eq. (5.35)
(b) Equation (5.38)
5.12. In Exercise 5.10 determine the associated marginal l-measures for each
of the joint l-measures.
5.13. In Exercise 5.12 determine for each pair of marginal l-measures the set
of joint probability measures that dominate (if the marginal l-measures
are lower probability measures) or are dominated (if they are upper
probability measures) by the marginal l-measures.
5.14. In Table 5.9 determine which of the Möbius representations characterize belief measures and convert each of these belief measures to the corresponding plausibility measures and commonality functions.
5.15. Let a, b, c, d in Table 5.9 denote, respectively, elements ·x1, y1Ò, ·x1, y2Ò,
·x2, y1Ò, ·x2, y2Ò of the Cartesian product {x1, x2} ¥ {y1, y2}. For each of the
Möbius representations in the Table 5.9, determine the associated lower
probability function and its marginal lower probability functions.
5.16. For each pair imX, imY of marginal bodies of evidence defined in Table
5.10, determine the joint body of evidence under the assumption that the
marginal bodies are noninteractive.
Table 5.10. Bodies of Evidence in Exercises 5.16 and 5.17

A: x1 x2 x3 | 1mX  2mX  3mX        B: y1 y2 y3 | 1mY  2mY  3mY
   0  0  0  | 0.0  0.0  0.0           0  0  0  | 0.0  0.0  0.0
   1  0  0  | 0.0  0.3  0.1           1  0  0  | 0.3  0.0  0.1
   0  1  0  | 0.0  0.0  0.2           0  1  0  | 0.0  0.5  0.1
   0  0  1  | 0.0  0.1  0.3           0  0  1  | 0.0  0.0  0.1
   1  1  0  | 0.4  0.2  0.0           1  1  0  | 0.1  0.0  0.2
   1  0  1  | 0.1  0.2  0.2           1  0  1  | 0.1  0.4  0.2
   0  1  1  | 0.5  0.2  0.0           0  1  1  | 0.1  0.0  0.2
   1  1  1  | 0.0  0.0  0.2           1  1  1  | 0.4  0.1  0.1
5.17. Assume that the basic probability assignments 1mX and 2mY in Table 5.10
represent evidence obtained from two independent sources. Determine
the combined (aggregated) evidence by using:
(a) The Dempster rule of combination;
(b) The alternative rule of combination expressed by Eq. (5.59).
5.18. Repeat Exercise 5.17 for other pairs of basic probability assignments in
Table 5.10.
5.19. Assume that 1mX, 2mX, 3mX, and alternatively, 1mY, 2mY, 3mY in Table 5.10
represent evidence obtained from three independent sources. Determine
the combined evidence by using:
(a) The Dempster rule of combination;
(b) The alternative rule.
5.20. Show that the Dempster rule of combination is associative so that the
combined evidence does not depend on the order in which the sources
are used.
5.21. Show that the alternative rule of combination expressed by Eq. (5.59) is
not associative so that the combined result depends on the order in
which the sources are used. Can the alternative rule be generalized to
more than two sources to be independent of the order in which the
sources are used?
5.22. Consider the following triples of probability intervals on X = {x1, x2, x3}:
I1 = ·[0.3, 0.4], [0.3, 0.5], [0.3, 0.5]Ò
I2 = ·[0.2, 0.4], [0.4, 0.6], [0.1, 0.2]Ò
I3 = ·[0.5, 0.7], [0.2, 0.4], [0.1, 0.2]Ò
I4 = ·[0.4, 0.6], [0.5, 0.7], [0.2, 0.4]Ò
I5 = ·[0.1, 0.2], [0.3, 0.4], [0.0, 0.3]Ò
I6 = ·[0.2, 0.6], [0.2, 0.6], [0.2, 0.6]Ò
I7 = ·[0.4, 0.5], [0.3, 0.4], [0.2, 0.5]Ò
I8 = ·[0.1, 0.4], [0.1, 0.6], [0.2, 0.5]Ò
For each of these triples, do the following:
(a) Check if the triple is proper to qualify as an interval-valued probability distribution;
(b) Check if the triple is reachable; if it is not reachable, convert it to its
reachable counterpart.
(c) Determine the lower and upper probabilities for each A Œ P(X).
(d) Determine the interaction representation of lower and upper
probabilities and the convex set of probability distributions that are
consistent with them.
5.23. Repeat Exercise 5.22 for the following 4-tuples of probability intervals
on X = {x1, x2, x3, x4}:
I1 = ·[0, 0.3], [0.1, 0.2], [0.3, 0.4], [0.1, 0.4]Ò
I2 = ·[0, 0.1], [0.1, 0.3], [0.2, 0.3], [0.48, 0.52]Ò
I3 = ·[0, 0.3], [0.1, 0.4], [0.2, 0.5], [0.3, 0.6]Ò
I4 = ·[0.2, 0.6], [0.2, 0.5], [0.2, 0.4], [0.2, 0.3]Ò
5.24. Repeat Exercise 5.8c for the conditional possibilities defined in Note
5.13.
6  MEASURES OF UNCERTAINTY AND INFORMATION
The mathematical theory of information had come into being when it was
realized that the flow of information can be expressed numerically in the same
way as distance, mass, temperature, etc.
—Alfréd Rényi
6.1. GENERAL DISCUSSION
Each particular formal theory of uncertainty is based on uncertainty functions,
u, that satisfy appropriate axiomatic requirements. A common property of all
uncertainty functions is that they are monotone measures. Their additional
properties are specific to each theory. Uncertainty functions of each theory
share the same special properties. It is useful to represent uncertainty functions pertaining to an uncertainty theory by several distinct forms. These representations are required to be equivalent in the sense that they are uniquely
convertible to one another. The most common representations are the various
lower and upper probability functions and their Möbius representations, which
are extensively discussed in Chapters 4 and 5.
A measure of uncertainty of some type in a particular uncertainty theory
is a functional, U, which for each function u in the theory measures the amount
of uncertainty of the considered type that is embedded in u. Two distinct types
of uncertainty, which emerge from the two classical uncertainty theories,
are nonspecificity and conflict. In the classical theories, they are measured,
respectively, by the Hartley and Shannon functionals. In the various generalizations of the classical uncertainty theories, these two types of uncertainty
coexist and, therefore, we need to determine justifiable generalizations of both
the Hartley and Shannon functionals in each of these generalized uncertainty
theories.
Each functional for measuring uncertainty in a given uncertainty theory
must satisfy some essential requirements expressed in terms of the calculus of
the theory involved. However, due to conceptual similarities between these
various formal expressions, the requirements can be described informally in a
generic form, independent of the various uncertainty calculi.
The following requirements, each expressed here in a generic form, are
commonly those from which axioms for characterizing measures of uncertainty in the various uncertainty theories are drawn:
1. Subadditivity: The amount of uncertainty in a joint representation of
evidence (defined on a Cartesian product) cannot be greater than the
sum of the amounts of uncertainty in the associated marginal representations of evidence.
2. Additivity: The amount of uncertainty in a joint representation of
evidence is equal to the sum of the amounts of uncertainty in the
associated marginal representations of evidence if and only if the
marginal representations are noninteractive according to the rules of
the uncertainty calculus involved.
3. Monotonicity: When evidence can be ordered in the uncertainty theory
employed (as in possibility theory), the relevant uncertainty measure
must preserve this ordering.
4. Continuity: Any measure of uncertainty must be a continuous
functional.
5. Expansibility: Expanding the universal set by alternatives that are not
supported by evidence must not affect the amount of uncertainty.
6. Symmetry: The amount of uncertainty does not change when elements
of the universal set are rearranged.
7. Range: The range of uncertainty is [0, M], where 0 must be assigned to
the unique uncertainty function that describes full certainty and M
depends on the size of the universal set involved and on the chosen unit
of measurement (normalization).
8. Branching/Consistency: When uncertainty can be computed in multiple
ways, all acceptable within the calculus of the uncertainty theory
involved, the results must be the same (consistent).
9. Normalization: A measurement unit defined by specifying what the
amount of uncertainty should be for a particular (and usually very
simple) uncertainty function.
10. Coordinate Invariance: When evidence is described within the
n-dimensional Euclidean space (n ≥ 1), the relevant uncertainty
measure must not change under isometric transformations of the
coordinate system.
These common names of the requirements are used consistently in this book
for specific forms of requirements in the various uncertainty theories. When
distinct types of uncertainty coexist in a given uncertainty theory, it is not necessary that these requirements be satisfied by each uncertainty type. However,
they must be satisfied by an overall uncertainty measure, which appropriately
aggregates measures of the individual uncertainty types.
The strongest justification of a functional as a meaningful measure of the
amount of uncertainty of a considered type in a given uncertainty theory is
obtained when we can prove that it is the only functional that satisfies the
relevant axiomatic requirements and measures the amount of uncertainty in
some specific measurement unit. Only some of the listed requirements are
usually needed to uniquely characterize the functional. For example, only
three of the listed requirements (branching, monotonicity, and normalization)
are needed to prove the uniqueness of the Hartley measure (recall
Theorem 2.1).
The purpose of this chapter is to examine generalizations of the Hartley
and Shannon measures of uncertainty in the various theories of imprecise
probabilities introduced in Chapters 4 and 5. The presentation follows, by and
large, chronologically the emergence of ideas and results pertaining to this
area since the early 1980s. Various unifying features of uncertainty measures
are then emphasized in Section 6.11. The presentation is not interrupted by
the many relevant historical and bibliographical remarks, which are covered
in the Notes Section to this chapter.
6.2. GENERALIZED HARTLEY MEASURE FOR
GRADED POSSIBILITIES
It is not surprising that the Hartley measure was first generalized from
classical (crisp) possibilities to graded possibilities. The generalized Hartley
measure for graded possibilities is usually denoted in the literature by U and
it is called U-uncertainty.
The U-uncertainty can be expressed in various forms. A simple form, easy
to understand, is based on the special notation introduced for graded possibilities in Section 5.2.1. In this notation, X = {x1, x2, . . . , xn} and ri denotes for
each i Œ ⺞n the possibility of xi. Elements of X are assumed to be appropriately rearranged so that the possibility profile
r = ⟨r1, r2, . . . , rn⟩
is ordered in such a way that
1 = r1 ≥ r2 ≥ . . . ≥ rn
and rn+1 = 0 by convention. Moreover, set Ai = {x1, x2, . . . , xi} is defined for each
i Œ ⺞n.
Using this simple notation, the U-uncertainty is expressed for each given
possibility profile r by the formula
U(r) = Σ_{i=1}^{n} (ri − ri+1) log₂|Ai|.  (6.1)
Since, clearly,
Σ_{i=1}^{n} (ri − ri+1) = 1,
the U-uncertainty is a weighted average of the Hartley measure for sets Ai,
i Œ ⺞n, where the weights are the associated differences ri - ri+1 in the given
possibility profile. These differences are, of course, values of the basic
probability assignment function for sets Ai.
Since |Ai| = i and log2|A1| = log2 1 = 0, Eq. (6.1) can be rewritten in the simpler
form
U(r) = Σ_{i=2}^{n} (ri − ri+1) log₂ i  (6.2)
or, alternatively, in the form
U(r) = Σ_{i=2}^{n} ri log₂ (i / (i − 1)).  (6.3)
It follows directly from Eq. (6.3) that U(1r) £ U(2r) for any pair of possibility
profiles defined on the same set and such that 1r £ 2r. The U-uncertainty
thus preserves the ordering of possibility profiles defined on the same set.
Moreover,
0 ≤ U(r) ≤ log₂|X|  (6.4)
for any possibility profile r on X. The lower and upper bounds are obtained,
respectively, for the smallest and the largest possibility profiles, ·1, 0, . . . , 0Ò
and ·1, 1, . . . , 1Ò.
Although Eqs. (6.1)–(6.3) are convenient for intuitive understanding of the
U-uncertainty, they are based on the assumption that the given possibility
profile is ordered. This assumption makes it difficult to describe the relation-
ship between joint, marginal, and conditional U-uncertainties. Fortunately, the
U-uncertainty can also be defined in the following alternative way that does
not require that the given possibility profile be ordered.
Let r = ⟨r(x) | x ∈ X⟩ denote a possibility profile on X that is not required
to be ordered, and let
ᵅr = {x ∈ X | r(x) ≥ α}  (6.5)
for each α ∈ [0, 1]. Then,
U(r) = ∫₀¹ log₂|ᵅr| dα.  (6.6)
EXAMPLE 6.1. Consider the possibility profile on ⺞15 that is defined by the
dots in the diagram in Figure 6.1. Applying Eq. (6.6) to this possibility profile,
we obtain
∫₀¹ log₂|ᵅr| dα = ∫_{0}^{0.1} log₂ 15 dα + ∫_{0.1}^{0.3} log₂ 12 dα + ∫_{0.3}^{0.4} log₂ 11 dα
  + ∫_{0.4}^{0.6} log₂ 9 dα + ∫_{0.6}^{0.7} log₂ 7 dα + ∫_{0.7}^{0.9} log₂ 5 dα + ∫_{0.9}^{1} log₂ 3 dα
  = 0.1 log₂ 15 + 0.2 log₂ 12 + 0.1 log₂ 11 + 0.2 log₂ 9 + 0.1 log₂ 7 + 0.2 log₂ 5 + 0.1 log₂ 3
  = 0.39 + 0.72 + 0.35 + 0.63 + 0.28 + 0.46 + 0.16 = 2.99.
Figure 6.1. Possibility profile in Example 6.1.
The same result can be obtained by ordering the given possibility profile and
applying Eq. (6.2). The ordered possibility profile is
r = ⟨1, 1, 1, 0.9, 0.9, 0.7, 0.7, 0.6, 0.6, 0.4, 0.4, 0.3, 0.1, 0.1, 0.1⟩.
When these values are applied to Eq. (6.2), we obtain the following nonzero
terms:
U (r) = 0.1 log 2 3 + 0.2 log 2 5 + 0.1 log 2 7 + 0.2 log 2 9
+ 0.1 log 2 11 + 0.2 log 2 12 + 0.1 log 2 15
= 2.99.
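The calculation in Example 6.1 is easy to automate. The following sketch implements Eq. (6.2), sorting the profile first (which is harmless because of symmetry), and reproduces the value 2.99 for the profile of Figure 6.1; the names are illustrative.

```python
from math import log2

def u_uncertainty(profile):
    """U-uncertainty of a possibility profile via Eq. (6.2)."""
    r = sorted(profile, reverse=True) + [0.0]
    return sum((r[i - 1] - r[i]) * log2(i) for i in range(2, len(r)))

# The possibility profile of Example 6.1 (values read from Figure 6.1).
r = [1, 1, 1, 0.9, 0.9, 0.7, 0.7, 0.6, 0.6, 0.4, 0.4, 0.3, 0.1, 0.1, 0.1]
print(round(u_uncertainty(r), 2))   # 2.99
```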
6.2.1. Joint and Marginal U-Uncertainties
Assume now that a possibility profile r is defined on a Cartesian product
X ¥ Y, and let
ᵅr = {⟨x, y⟩ ∈ X × Y | r(x, y) ≥ α}.  (6.7)
Then,
U(r) = ∫₀¹ log₂|ᵅr| dα,  (6.8)
and the U-uncertainties of the marginal possibility profiles, rX and rY, are
U(rX) = ∫₀¹ log₂|ᵅrX| dα,  (6.9)
and
U(rY) = ∫₀¹ log₂|ᵅrY| dα,  (6.10)
where
ᵅrX = {x ∈ X | rX(x) ≥ α}  (6.11)
and
ᵅrY = {y ∈ Y | rY(y) ≥ α}  (6.12)
for all a Œ [0, 1]. It is now fairly easy to show that the U-uncertainty is both
subadditive and additive.
Theorem 6.1. Let r be a possibility profile on X ¥ Y, and let rX and rY be the
associated marginal possibility profiles on X and Y, respectively. Then,
U(r) ≤ U(rX) + U(rY),  (6.13)
where the equality is obtained if and only if rX and rY are noninteractive.
Proof
U(r) = ∫₀¹ log₂|ᵅr| dα
     ≤ ∫₀¹ log₂|ᵅrX × ᵅrY| dα
     = ∫₀¹ log₂(|ᵅrX| · |ᵅrY|) dα
     = ∫₀¹ log₂|ᵅrX| dα + ∫₀¹ log₂|ᵅrY| dα
     = U(rX) + U(rY).
It is clear that the equality is obtained if and only if ar = arX ¥ arY for all
a Œ [0, 1], which is the case when marginal possibility profiles are
noninteractive.
䊏
EXAMPLE 6.2. In Figure 6.2a, a joint possibility profile r on X ¥ Y = {x1, x2,
x3} ¥ {y1, y2, y3} is given and the associated marginal possibility profiles rX and
rY are derived from it. In Figure 6.2b, a joint possibility profile, r¢, is derived
from rX and rY under the assumption that they are noninteractive.

(a) Given joint profile r and its marginals:

r     y1   y2   y3  |  rX
x1   1.0  0.7  1.0  | 1.0
x2   0.7  0.7  0.8  | 0.8
x3   0.2  0.0  0.9  | 0.9
rY   1.0  0.7  1.0  |

(b) Joint profile r′ derived from the noninteractive marginals rX and rY:

r′    y1   y2   y3  |  rX
x1   1.0  0.7  1.0  | 1.0
x2   0.8  0.7  0.8  | 0.8
x3   0.9  0.7  0.9  | 0.9
rY   1.0  0.7  1.0  |

Figure 6.2. Joint and marginal possibility profiles (Example 6.2).

Ordering
the individual profiles and applying Eq. (6.2), we obtain
U (r) = 0.1 log 2 2 + 0.1 log 2 3 + 0.1 log 2 4 + 0.5 log 2 7 + 0.2 log 2 8 = 2.46,
U (r ¢) = 0.1 log 2 2 + 0.1 log 2 4 + 0.1 log 2 6 + 0.7 log 2 9 = 2.78,
U (rX ) = 0.1 log 2 2 + 0.8 log 2 3 = 1.37,
U (rY ) = 0.3 log 2 2 + 0.7 log 2 3 = 1.41.
We can see that U(r) < U(r′) = U(rX) + U(rY), which verifies the subadditivity
and additivity of the U-uncertainty.
6.2.2. Conditional U-Uncertainty
Applying the definitions of conditional Hartley measures, expressed by Eqs.
(2.19) and (2.20), to sets ar, arX, and arY, we readily obtain their generalized
counterparts for the U-uncertainty:
U(rX | rY) = ∫₀¹ log₂ (|ᵅr| / |ᵅrY|) dα,  (6.14)
U(rY | rX) = ∫₀¹ log₂ (|ᵅr| / |ᵅrX|) dα.  (6.15)
It is obvious that these equations can also be rewritten as
U (rX rY ) = U (r) - U (rY ),
(6.16)
U (rY rX ) = U (r) - U (rX ).
(6.17)
Let the generic symbols U(X), U(Y), U(X ¥ Y), U(X | Y), and U(Y | X) now
be used instead of their specific counterparts U(rX), U(rY), U(r), U(rX | rY), and
U(rY | rX), respectively. This is a common practice in the literature, and is used
in Chapter 2 for the marginal, joint, and conditional Hartley measures and
then again in Chapter 3 for the Shannon entropies. This notation makes it
easier to recognize that marginal, joint, and conditional uncertainties based on
distinct measures of uncertainty are related in analogous ways.
Using the generic symbols, Eqs. (6.16) and (6.17) are rewritten as
U(X | Y) = U(X × Y) − U(Y),  (6.18)
U(Y | X) = U(Y × X) − U(X).  (6.19)
Observe that these equations are analogous to Eqs. (2.23) and (2.24) for the
Hartley measure, as well as to Eqs. (3.35) and (3.36) for the Shannon entropy.
Observe also that the conditional U-uncertainties can be calculated without
using conditional possibilities.
U-uncertainty analogies of other equations that hold for the Hartley (and
for the Shannon entropy) can be easily derived as well. For example, when the
marginal possibility profiles are noninteractive, we have the equation
U ( X ¥ Y ) = U ( X ) + U (Y )
(6.20)
by Theorem 6.1, which is analogous to Eq. (2.27) for the Hartley measure.
When we combine this equation with Eqs. (6.18) and (6.19), we readily obtain
the equations
U(X | Y) = U(X),  (6.21a)
U(Y | X) = U(Y),  (6.21b)
which are analogous to Eqs. (2.25) and (2.26) for the Hartley measure. Similarly, the inequality
U ( X ¥ Y ) £ U ( X ) + U (Y ),
(6.22)
which is a generic form of Eq. (6.13), is analogous to Eq. (2.30) for
Hartley measure and to Eq. (3.38) for the Shannon entropy. We can also
define information transmission, TU(X, Y), for the U-uncertainty by the
equation
TU ( X , Y ) = U ( X ) + U (Y ) - U ( X ¥ Y ),
(6.23)
in analogy with Eq. (2.31) for the Hartley measure and Eq. (3.34) for the
Shannon entropy.
EXAMPLE 6.3. Consider the joint and marginal possibility profiles in Figure
6.2a. Applying Eqs. (6.18) and (6.19) and utilizing the joint and marginal Uuncertainties calculated in Example 6.2, we have
U(X | Y) = 2.46 − 1.41 = 1.05,
U(Y | X) = 2.46 − 1.37 = 1.09,
TU(X, Y) = 1.37 + 1.41 − 2.46 = 0.32.
Similarly, for the possibility profiles in Figure 6.2b, we obtain
U(X | Y) = 2.78 − 1.41 = 1.37 [= U(X)],
U(Y | X) = 2.78 − 1.37 = 1.41 [= U(Y)],
TU(X, Y) = 2.78 − 1.37 − 1.41 = 0,
as is expected under the assumption of noninteraction.
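The numbers in Examples 6.2 and 6.3 can be recomputed from the joint possibility profile of Figure 6.2a. The sketch below assumes that profile as a 3 × 3 matrix, obtains the marginals by maximization, and evaluates the joint, marginal, conditional, and transmission values via Eqs. (6.2), (6.18), (6.19), and (6.23); names are illustrative.

```python
from math import log2

def u_uncertainty(values):
    """U-uncertainty (Eq. (6.2)) of a collection of possibility values."""
    r = sorted(values, reverse=True) + [0.0]
    return sum((r[i - 1] - r[i]) * log2(i) for i in range(2, len(r)))

# Joint possibility profile of Figure 6.2a (rows x1..x3, columns y1..y3).
r = [[1.0, 0.7, 1.0],
     [0.7, 0.7, 0.8],
     [0.2, 0.0, 0.9]]

rX = [max(row) for row in r]                              # (1.0, 0.8, 0.9)
rY = [max(col) for col in zip(*r)]                        # (1.0, 0.7, 1.0)

U_joint = u_uncertainty([v for row in r for v in row])    # about 2.46
U_X, U_Y = u_uncertainty(rX), u_uncertainty(rY)           # about 1.37 and 1.41

print(round(U_joint - U_Y, 2))            # U(X|Y)   = 1.05
print(round(U_joint - U_X, 2))            # U(Y|X)   = 1.09
print(round(U_X + U_Y - U_joint, 2))      # T_U(X,Y) = 0.32
```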
6.2.3. Axiomatic Requirements for the U-Uncertainty
As any measure of uncertainty, the U-uncertainty must satisfy the axiomatic
requirements listed in Section 6.1. In this case, the requirements must be
expressed in terms of the calculus of graded possibilities. If we consider only
finite sets, then the requirement of coordinate invariance is not relevant.
Hence, we have the following nine specific requirements:
Axiom (U1) Subadditivity. For any joint possibility profile r on X ¥ Y, U(r)
£ U(rX) + U(rY).
Axiom (U2) Additivity. When a joint possibility profile r on X ¥ Y is derived
from noninteractive marginal possibility profiles rX and rY, then U(r) = U(rX)
+ U(rY).
Axiom (U3) Monotonicity. For any pair 1r, 2r of possibility profiles of the
same length, if 1r £ 2r, then U(1r) £ U(2r).
Axiom (U4) Continuity. U is a continuous functional.
Axiom (U5) Expansibility. When components of zero are added to a given
possibility profile, the value of U does not change.
Axiom (U6) Symmetry. If r is a possibility profile and r¢ is a permutation of
r, then U(r) = U(r¢).
Axiom (U7) Range. U(r) Œ [0, M], where the minimum and maximum are
obtained for rmin = ·1, 0, . . . , 0Ò and rmax = ·1, 1, . . . , 1Ò, respectively, and the
value M depends on the cardinality of the universal set and the chosen unit
of measurement (normalization).
Axiom (U8) Branching/Consistency. For every possibility profile r = ·r1, r2,
. . . , rnÒ of any length n,
U(r1, r2, . . . , rn) = U(r1, r2, . . . , rk−2, rk, rk, rk+1, . . . , rn)
  + (rk−2 − rk) U(1k−2, (rk−1 − rk)/(rk−2 − rk), 0n−k+1)
  − (rk−2 − rk) U(1k−2, 0n−k+2)  (6.24)
for each k = 3, 4, . . . , n, where, for any given integer i, 1i denotes a sequence
of i ones and 0i denotes a sequence of i zeroes.
Axiom (U9) Normalization. To define bits as measurement units, it is
required that U(1, 1) = 1.
The branching requirement, which can also be formulated in other forms,
needs some explanation. This requirement basically states that the U-uncertainty must have the capability of measuring possibilistic nonspecificity
in two ways. It can be measured either directly for the given possibility profile
or indirectly by adding U-uncertainties associated with a combination of possibility profiles that reflect a two-stage measuring process. In the first stage of
measurement, the distinction between two neighboring components, rk-1 and
rk, is ignored (rk-1 is replaced with rk) and the U-uncertainty of the resulting,
less refined possibility profile is calculated. In the second stage, the U-uncertainty is calculated in a local frame of reference, which is defined by a possibility profile that distinguishes only between the two neighboring possibility
values that are not distinguished in the first stage of measurement. The U-uncertainty calculated in the local frame must be scaled back to the original
frame by a suitable weighting factor. The sum of the two U-uncertainties
obtained by the two stages of measurement is equal to the total U-uncertainty
of the given possibility profile.
The first term on the right-hand side of Eq. (6.24) represents the U-uncertainty obtained in the first stage of measurement. The remaining two
terms represent the second stage, associated with the local U-uncertainty. The
first of these two terms expresses the loss of uncertainty caused by ignoring
the component rk-1 in the given possibility profile, but it introduces some additional U-uncertainty equivalent to the uncertainty of a crisp possibility profile
with k - 2 components. This additional U-uncertainty is excluded by the last
term in Eq. (6.24). That is, the local uncertainty is expressed by the last two
terms in Eq. (6.24).
The meaning of the branching property of the U-uncertainty is illustrated
by Figure 6.3. Four hypothetical possibility profiles involved in Eq. (6.24) are
shown in the figure, and in each of them the local frame involved in the branching is indicated.
It turns out that five of the nine axiomatic requirements are sufficient to
prove the uniqueness of the U-uncertainty: additivity, monotonicity, expansibility, branching, and normalization. The proof, which is quite intricate, is
covered in Appendix A.
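The branching requirement can also be verified numerically for particular profiles. The sketch below encodes both sides of Eq. (6.24) as reconstructed above (with k given 1-based, and assuming r_{k−2} > r_k so that the weighting factor is nonzero) and shows that they agree for an illustrative profile; the names are illustrative.

```python
from math import log2

def U(profile):
    """U-uncertainty via Eq. (6.2); the profile is ordered internally."""
    r = sorted(profile, reverse=True) + [0.0]
    return sum((r[i - 1] - r[i]) * log2(i) for i in range(2, len(r)))

def branched_U(r, k):
    """Right-hand side of Eq. (6.24) for an ordered profile r and 3 <= k <= len(r)."""
    n = len(r)
    coarse = r[:k - 2] + [r[k - 1], r[k - 1]] + r[k:]      # r_{k-1} replaced with r_k
    w = r[k - 3] - r[k - 1]                                 # r_{k-2} - r_k, assumed > 0
    local = [1.0] * (k - 2) + [(r[k - 2] - r[k - 1]) / w] + [0.0] * (n - k + 1)
    crisp = [1.0] * (k - 2) + [0.0] * (n - k + 2)
    return U(coarse) + w * U(local) - w * U(crisp)

r = [1.0, 0.8, 0.5, 0.2]
print(round(U(r), 4))                 # 1.1755
print(round(branched_U(r, 3), 4))     # 1.1755
print(round(branched_U(r, 4), 4))     # 1.1755
```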
6.2.4. U-Uncertainty for Infinite Sets
U-uncertainty for infinite sets is a generalization of the Hartley-like measure
HL (introduced in Section 2.3) for graded possibilities. It is thus reasonable
to denote it by UL. This generalization is obtained directly from Eq. (6.6) by
replacing log2 |ar| with HL(ar). Hence,
UL(r) = ∫₀¹ HL(ᵅr) dα  (6.25)
or, more specifically,
UL(r) = ∫₀¹ min_{t∈T} { log₂ [ Π_{i=1}^{n} (1 + μ(ᵅrit)) + μ(ᵅr) − Π_{i=1}^{n} μ(ᵅrit) ] } dα,  (6.26)
where ᵅr is defined in terms of r by Eq. (6.5).
Figure 6.3. Possibility profiles involved in the branching property. (The four panels show, for each profile appearing in Eq. (6.24), the local frame formed by components k − 2, k − 1, and k.)

Figure 6.4. Possibility profile in Example 6.4.
EXAMPLE 6.4. Let the range of a real-valued variable x be [0, 10]. Assume
that we are able to predict the value of x only approximately in terms of the
possibility profile
r(x) = max{0, 2(x − 3) − (x − 3)²}
for each x ∈ [0, 10]. The graph of this possibility profile is shown in Figure 6.4.
For each a Œ [0, 1], ar is in this case a closed interval [x(a), x̄(a)] of real
numbers. The endpoints of these intervals for all a Œ [0, 1] are functions of a,
which can be obtained by solving the equation
2(x − 3) − (x − 3)² = α
for x. The solution is
x1,2 = 4 ± √(1 − α).
Clearly, x̲(α) = 4 − √(1 − α) and x̄(α) = 4 + √(1 − α). Since we deal with a one-dimensional variable, Eq. (6.26) assumes the form
UL(r) = ∫₀¹ log₂ [1 + μ(ᵅr)] dα,
where μ(ᵅr) = x̄(α) − x̲(α) = 2√(1 − α). Hence,
UL(r) = ∫₀¹ log₂ [1 + 2√(1 − α)] dα = 1.1887.
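The integral in Example 6.4 can be checked numerically; a simple midpoint rule suffices, and the result agrees with the closed form (3/4) log₂ 3 ≈ 1.1887 to which the integral can be reduced by substitution.

```python
from math import log2, sqrt

# Midpoint rule for UL(r) = integral over [0, 1] of log2(1 + 2*sqrt(1 - a)) da.
N = 100000
UL = sum(log2(1 + 2 * sqrt(1 - (i + 0.5) / N)) for i in range(N)) / N
print(round(UL, 4))   # 1.1887
```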
6.3. GENERALIZED HARTLEY MEASURE IN
DEMPSTER–SHAFER THEORY
Once the U-uncertainty was well established as a generalized Hartley measure
for graded possibilities, its further generalization to Dempster–Shafer theory
(DST) became conceptually fairly straightforward. It emerged quite naturally
from two simple facts: (1) the U-uncertainty is a weighted average of the
Hartley measure for all focal subsets; and (2) the weights in this average, which
are expressed by the differences ri - ri+1 in ordered possibility profiles, are
values of the basic probability assignment function. Although the focal subsets
are always nested in the theory of graded possibilities, the concept of the
weighted average of the Hartley measure is applicable to any family of focal
subsets. The generalized Hartley measure, GH, in DST is thus defined by the
functional
GH(m) = Σ_{A∈F} m(A) log₂|A|,  (6.27)
where ·F, mÒ is any arbitrary body of evidence in the sense of DST.
Observe that the functional GH is defined in terms of m while the functional U is defined in terms of r. However, the difference ri - ri+1, in Eq. (6.1)
is clearly equal to m(Ai), which means that the U-uncertainty can be defined
in terms of the basic probability assignment as well.
It is obvious that the GH measure defined by Eq. (6.27) is a continuous
functional, which satisfies the expansibility requirement and whose range is
0 £ GH (m) £ log 2 X .
(6.28)
The lower bound is obtained when all focal subsets are singletons, which
means that m is actually a probability distribution on X. This implies that
GH(m) = 0 for all probability measures. That is, probability measures are fully
specific. We can also see that GH(m) = 1 when m(A) = 1 and |A| = 2, which
means that the units in which the functional GH measures nonspecificity are
bits. For characterizing the functional, this property must be required as a
normalization.
The GH measure is also invariant with respect to permutations of values
of the basic probability assignment function within each group of subsets of
X that have equal cardinalities. This invariance is, in fact, the meaning of the
requirement of symmetry in DST.
The issue of the uniqueness of the GH measure in DST is addressed in
Appendix B.
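Eq. (6.27) is a one-line computation once a body of evidence is represented as a mapping from focal sets to their masses. The sketch below applies it to the two joint bodies of evidence of Example 5.10 (as tabulated in Table 5.7); both yield 1.4 bits, which equals GH(mX) + GH(mY) = 0.8 + 0.6, as expected from additivity under the respective noninteraction rules. Names are illustrative.

```python
from math import log2

def gh(m):
    """Generalized Hartley measure, Eq. (6.27): sum of m(A) * log2|A| over focal sets."""
    return sum(v * log2(len(A)) for A, v in m.items() if v > 0)

# The two joint bodies of evidence of Example 5.10 (Table 5.7).
m_poss = {("x1y1",): 0.2, ("x1y1", "x2y1"): 0.2,
          ("x1y1", "x1y2", "x2y1", "x2y2"): 0.6}
m_dst = {("x1y1",): 0.08, ("x1y1", "x1y2"): 0.12,
         ("x1y1", "x2y1"): 0.32, ("x1y1", "x1y2", "x2y1", "x2y2"): 0.48}

print(round(gh(m_poss), 2), round(gh(m_dst), 2))   # 1.4 1.4
```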
6.3.1. Joint and Marginal Generalized Hartley Measures
The functional GH defined by Eq. (6.27) is also subadditive and additive, as
is established by the following two theorems.
Theorem 6.2. For any joint basic probability assignment function m on X × Y and its associated marginal functions mX and mY,

    GH(m) ≤ GH(mX) + GH(mY).        (6.29)

Proof. Recalling Eqs. (5.54) and (5.55), we have

    GH(mX) = Σ_{A⊆X} mX(A) log₂ |A|
           = Σ_{A⊆X} Σ_{C: C_X = A} m(C) log₂ |A|
           = Σ_{C⊆X×Y} m(C) log₂ |C_X|.

Similarly,

    GH(mY) = Σ_{C⊆X×Y} m(C) log₂ |C_Y|.

Hence,

    GH(mX) + GH(mY) = Σ_{C⊆X×Y} m(C) (log₂ |C_X| + log₂ |C_Y|)
                    = Σ_{C⊆X×Y} m(C) log₂ (|C_X| · |C_Y|)
                    ≥ Σ_{C⊆X×Y} m(C) log₂ |C|
                    = GH(m).        ■
Theorem 6.3. Let mX and mY be noninteractive basic probability assignment functions on subsets of X and Y, respectively, and let m be the associated joint basic probability assignment function. Then,

    GH(m) = GH(mX) + GH(mY).        (6.30)

Proof. Recalling Eq. (5.56), we have

    GH(m) = Σ_{C⊆X×Y} m(C) log₂ |C|
          = Σ_{A⊆X} Σ_{B⊆Y} mX(A) · mY(B) log₂ |A × B|
          = Σ_{A⊆X} Σ_{B⊆Y} mX(A) · mY(B) log₂ (|A| · |B|)
          = Σ_{A⊆X} Σ_{B⊆Y} mX(A) · mY(B) (log₂ |A| + log₂ |B|)
          = Σ_{B⊆Y} mY(B) Σ_{A⊆X} mX(A) log₂ |A| + Σ_{A⊆X} mX(A) Σ_{B⊆Y} mY(B) log₂ |B|
          = Σ_{A⊆X} mX(A) log₂ |A| + Σ_{B⊆Y} mY(B) log₂ |B|
          = GH(mX) + GH(mY).        ■
According to the common notational convention in the literature, it is convenient to replace the specific symbols GH(m), GH(mX), and GH(mY) with their generic counterparts GH(X × Y), GH(X), and GH(Y). The conditional GH measures, addressed in Section 6.3.3, are then denoted as GH(X | Y) and GH(Y | X).
6.3.2. Monotonicity of the Generalized Hartley Measure
Bodies of evidence in DST can be partially ordered on the basis of set inclusion. This ordering is then used for defining the requirement of monotonicity for the GH measure. For any pair ⟨F1, m1⟩ and ⟨F2, m2⟩ of bodies of evidence on X, ⟨F1, m1⟩ is said to be smaller than or equal to ⟨F2, m2⟩, which is written as

    ⟨F1, m1⟩ ≤ ⟨F2, m2⟩,

if and only if the following requirements are satisfied:

(a) For each A ∈ F1, there exists some B ∈ F2 such that A ⊆ B;
(b) For each B ∈ F2, there exists some A ∈ F1 such that A ⊆ B;
(c) There exists a function f: P(X) × P(X) → [0, 1] such that f(A, B) > 0 implies A ⊆ B and

    m1(A) = Σ_{B: A⊆B} f(A, B)   for each A ⊆ X,        (6.31)
    m2(B) = Σ_{A: A⊆B} f(A, B)   for each B ⊆ X.        (6.32)
EXAMPLE 6.5. To illustrate the definition of ordering of bodies of evidence, especially requirement (c) in the definition, consider the following two bodies of evidence on X = {x1, x2, x3}:

    ⟨F1, m1⟩: F1 = {{x2}, {x1, x2}, {x1, x3}},
              m1({x2}) = 0.2,  m1({x1, x2}) = 0.2,  m1({x1, x3}) = 0.6;
    ⟨F2, m2⟩: F2 = {{x1, x3}, {x2, x3}, X},
              m2({x1, x3}) = 0.5,  m2({x2, x3}) = 0.1,  m2(X) = 0.4.
[Figure 6.5a,b depicts the two bodies of evidence ⟨F1, m1⟩ and ⟨F2, m2⟩ graphically. Figure 6.5c tabulates a function f with the required properties; its nonzero values are

    f({x2}, {x2, x3}) = 0.1,   f({x2}, X) = 0.1,   f({x1, x2}, X) = 0.2,
    f({x1, x3}, {x1, x3}) = 0.5,   f({x1, x3}, X) = 0.1.]

Figure 6.5. Illustrations of ordered bodies of evidence in DST (Example 6.5).
These bodies of evidence are also shown graphically in Figures 6.5a and 6.5b, respectively. To determine whether ⟨F1, m1⟩ ≤ ⟨F2, m2⟩, the three requirements, (a)–(c), for ordering bodies of evidence in DST must be checked. Requirement (a) is trivially satisfied, since X ∈ F2. Requirement (b) is also satisfied: for {x1, x3} ∈ F2 take {x1, x3} ∈ F1; for {x2, x3} ∈ F2 take {x2} ∈ F1; and for X ∈ F2, each set in F1 satisfies the condition. Requirement (c) is satisfied by constructing a function f that satisfies Eqs. (6.31) and (6.32). For the given bodies of evidence, such a function is shown in the table in Figure 6.5c. For clarity, all zero entries in the table are left blank.
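Requirement (c) can also be checked mechanically. The following sketch (illustrative Python, not from the original text; the nonzero values of f are those read from Figure 6.5c) verifies Eqs. (6.31) and (6.32) and the implication f(A, B) > 0 ⇒ A ⊆ B for the two bodies of evidence above.

    m1 = {frozenset({'x2'}): 0.2,
          frozenset({'x1', 'x2'}): 0.2,
          frozenset({'x1', 'x3'}): 0.6}
    m2 = {frozenset({'x1', 'x3'}): 0.5,
          frozenset({'x2', 'x3'}): 0.1,
          frozenset({'x1', 'x2', 'x3'}): 0.4}

    # Nonzero values of the joint function f(A, B) from Figure 6.5c
    f = {(frozenset({'x2'}), frozenset({'x2', 'x3'})): 0.1,
         (frozenset({'x2'}), frozenset({'x1', 'x2', 'x3'})): 0.1,
         (frozenset({'x1', 'x2'}), frozenset({'x1', 'x2', 'x3'})): 0.2,
         (frozenset({'x1', 'x3'}), frozenset({'x1', 'x3'})): 0.5,
         (frozenset({'x1', 'x3'}), frozenset({'x1', 'x2', 'x3'})): 0.1}

    # f(A, B) > 0 must imply A is a subset of B
    assert all(A <= B for (A, B) in f)
    # Eq. (6.31): m1(A) equals the sum of f(A, B) over all B
    assert all(abs(m1[A] - sum(v for (a, _), v in f.items() if a == A)) < 1e-9 for A in m1)
    # Eq. (6.32): m2(B) equals the sum of f(A, B) over all A
    assert all(abs(m2[B] - sum(v for (_, b), v in f.items() if b == B)) < 1e-9 for B in m2)
    print("requirements (a)-(c) hold, so <F1,m1> <= <F2,m2>")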
Using the introduced ordering of bodies of evidence in DST, the monotonicity requirement for the GH measure can be stated as follows:
    if ⟨F1, m1⟩ ≤ ⟨F2, m2⟩, then GH(m1) ≤ GH(m2).

As is stated by the following theorem, the functional GH defined by Eq. (6.27) satisfies this monotonicity.

Theorem 6.4. For any pair of bodies of evidence in DST, ⟨F1, m1⟩ and ⟨F2, m2⟩, and for the functional GH defined by Eq. (6.27),

    ⟨F1, m1⟩ ≤ ⟨F2, m2⟩  ⇒  GH(m1) ≤ GH(m2).        (6.33)
Proof

    Σ_{A⊆X} m1(A) log₂ |A| = Σ_{A⊆X} Σ_{B: A⊆B} f(A, B) log₂ |A|
                           ≤ Σ_{A⊆X} Σ_{B: A⊆B} f(A, B) log₂ |B|
                           = Σ_{B⊆X} Σ_{A: A⊆B} f(A, B) log₂ |B|
                           = Σ_{B⊆X} m2(B) log₂ |B|.        ■
6.3.3. Conditional Generalized Hartley Measures
The GH measure defined by Eq. (6.27) is basically a weighted average of the Hartley measure. Its conditional counterparts should thus be weighted averages of the respective conditional Hartley measures. It is thus meaningful to define GH(X | Y) and GH(Y | X) by the formulas

    GH(X | Y) = Σ_{C⊆X×Y} m(C) log₂ (|C| / |B|),        (6.34)
    GH(Y | X) = Σ_{C⊆X×Y} m(C) log₂ (|C| / |A|),        (6.35)

where B ⊆ Y and A ⊆ X denote the projections of C on Y and X, respectively.
Now, Eq. (6.34) can be rewritten as

    GH(X | Y) = Σ_{C⊆X×Y} m(C) log₂ |C| − Σ_{C⊆X×Y} m(C) log₂ |B|
              = GH(X × Y) − Σ_{B⊆Y} Σ_{C: C_Y = B} m(C) log₂ |B|
              = GH(X × Y) − Σ_{B⊆Y} mY(B) log₂ |B|.
Hence,

    GH(X | Y) = GH(X × Y) − GH(Y),        (6.36)

and by a similar derivation,

    GH(Y | X) = GH(X × Y) − GH(X).        (6.37)
These equations verify once more that the relationship among joint, marginal,
and conditional uncertainties is an invariant property of the uncertainty
theories.
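A small numerical check of Eq. (6.36) may also be helpful. The sketch below (illustrative Python, not from the original text; the joint body of evidence is a hypothetical two-focal-element example, and pairs are used to encode elements of X × Y) computes GH(X | Y) from Eq. (6.34) and compares it with GH(X × Y) − GH(Y), the marginal being formed by summing masses over equal projections (cf. Eqs. (5.54)–(5.55)).

    import math

    def GH(m):
        return sum(v * math.log2(len(A)) for A, v in m.items() if v > 0)

    def project(C, axis):
        """Projection of a subset C of X x Y (pairs) onto X (axis=0) or Y (axis=1)."""
        return frozenset(z[axis] for z in C)

    # A small joint body of evidence on X x Y with X = {x1, x2}, Y = {y1, y2}
    m = {frozenset({('x1', 'y1'), ('x2', 'y1')}): 0.5,
         frozenset({('x1', 'y1'), ('x1', 'y2'),
                    ('x2', 'y1'), ('x2', 'y2')}): 0.5}

    # Marginal basic assignment on Y: add masses of focal sets with equal projection
    mY = {}
    for C, v in m.items():
        B = project(C, 1)
        mY[B] = mY.get(B, 0.0) + v

    # Conditional measure, Eq. (6.34)
    GH_X_given_Y = sum(v * math.log2(len(C) / len(project(C, 1))) for C, v in m.items())

    print(GH_X_given_Y, GH(m) - GH(mY))   # both equal 1.0, confirming Eq. (6.36)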
6.4. GENERALIZED HARTLEY MEASURE FOR CONVEX SETS OF
PROBABILITY DISTRIBUTIONS
It is desirable to try to generalize the Hartley functional in DST to arbitrary
convex sets of probability distributions (credal sets). This generalized version
of the functional would be defined by the formula
    GH(D) = Σ_{A⊆X} mD(A) log₂ |A|,        (6.38)
where D is a given set of probability distributions on X. Function mD in this
formula is obtained by the Möbius transformation applied to the lower probability function that is derived from D via Eq. (4.11).
It is easy to show that this functional GH is additive. That is,

    GH(D) = GHX(DX) + GHY(DY),        (6.39)

when D is derived from DX and DY that are noninteractive, which means that for all A ∈ P(X) and all B ∈ P(Y),

    mD(A × B) = mDX(A) · mDY(B)        (6.40)

and mD(C) = 0 for all C that are not of the form A × B. The proof of additivity of functional GH in this generalized case is virtually the same as the proof of additivity of GH in DST (the proof of Theorem 6.3). Although values mD(A) in Eq. (6.38) may be negative for some subsets of X, the equation

    Σ_{A⊆X} mD(A) = 1,
upon which the proof is based, still holds.
It is clear that the GH functional has the proper range, [0, log2|X|],
when the units of measurement are bits: 0 is obtained when D contains a single
probability; log2|X| is obtained when D contains all possible probability
distributions on X, and thus represents total ignorance. The functional GH is
also continuous, symmetric (invariant with respect to permutations of the
probability distributions in D), and expansible (it does not change when components with zero probabilities are added to the probability distributions
in D).
One additional property of the generalized Hartley functional, which is significant when we deal with credal sets, is its monotonicity with respect to the subsethood relationship between credal sets. This means that for every pair of credal sets on X, iD and jD, if iD ⊆ jD then GH(iD) ≤ GH(jD). This property, whose proof is available in the literature (Note 6.6), is illustrated by the following example.
EXAMPLE 6.6. Consider six convex sets of probability distributions on X = {x1, x2, x3}, which are denoted by iD (i ∈ ℕ6) and are defined geometrically in Figure 6.6. Clearly, 1D ⊇ 2D ⊇ 3D ⊇ 4D and also 3D ⊇ 5D. However, 6D is neither a subset nor a superset of any of the other sets. For each set iD (i ∈ ℕ6), the associated lower probability function, im̲, its Möbius representation, im, and the value GH(iD) of the GH measure are shown in Table 6.1. In conformity with the monotonicity of GH,

    GH(1D) ≥ GH(2D) ≥ GH(3D) ≥ GH(4D)

and also GH(3D) ≥ GH(5D). Moreover, GH(6D) ≥ GH(iD) for all i ∈ ℕ5, which illustrates that nonspecificity GH(D) does not express the size of D.
While subadditivity of the generalized Hartley functional has been proven
for all uncertainty theories that are subsumed under DST (Theorem 6.2), the
following example demonstrates that the functional is not subadditive for arbitrary convex sets of probability distributions.
EXAMPLE 6.7. Let X = {x1, x2} and Y = {y1, y2}, and let zij = ⟨xi, yj⟩ for all i, j ∈ {1, 2}. Furthermore, let p = ⟨p11, p12, p21, p22⟩ denote joint probability distributions on X × Y, where pij = p(zij). Given the set D of all convex combinations of probability distributions pA = ⟨0.4, 0.4, 0.2, 0⟩ and pB = ⟨0.6, 0.2, 0, 0.2⟩, we obtain the associated sets DX = {⟨0.8, 0.2⟩} and DY = {⟨0.6, 0.4⟩} of marginal probability distributions. Clearly, GH(DX) + GH(DY) = 0. The lower probability function, m̲D, associated with D and its Möbius representation, mD, are shown in Table 6.2. Clearly, GH(D) = 0.332. Hence, GH is not subadditive in this example.
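The value GH(D) = 0.332 can be reproduced from the two extreme points pA and pB. The sketch below (illustrative Python, not from the original text) computes the lower probability function via Eq. (4.11), using the fact that the minimum of a linear function over the convex hull of two distributions is attained at one of the two vertices, applies the Möbius transformation, and evaluates Eq. (6.38).

    import math
    from itertools import chain, combinations

    Z = ['z11', 'z12', 'z21', 'z22']
    pA = {'z11': 0.4, 'z12': 0.4, 'z21': 0.2, 'z22': 0.0}
    pB = {'z11': 0.6, 'z12': 0.2, 'z21': 0.0, 'z22': 0.2}

    def subsets(xs):
        return [frozenset(s) for s in chain.from_iterable(
            combinations(list(xs), k) for k in range(len(xs) + 1))]

    # Lower probability of A over the convex hull of {pA, pB}
    def lower(A):
        return min(sum(p[z] for z in A) for p in (pA, pB))

    # Moebius transformation: m(A) = sum over B subset of A of (-1)^|A - B| * lower(B)
    def moebius(A):
        return sum((-1) ** len(A - B) * lower(B) for B in subsets(A))

    GH_D = sum(moebius(A) * math.log2(len(A)) for A in subsets(Z) if len(A) > 0)
    print(round(GH_D, 3))   # 0.332, while GH(D_X) + GH(D_Y) = 0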
It is easy to determine that the lower probability function m̲D in Example 6.7 is not 2-monotone. It contains eight violations of 2-monotonicity. One of them is the violation of the inequality

    m̲D({z11, z12, z21}) ≥ m̲D({z11, z12}) + m̲D({z11, z21}) − m̲D({z11}).
[Figure 6.6a–d shows the six sets 1D through 6D as regions of the probability simplex over X = {x1, x2, x3}, whose vertices are the distributions ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, and ⟨0, 0, 1⟩.]
Figure 6.6. Closed convex sets of probability distributions on X = {x1, x2, x3} discussed in Example 6.6.
Indeed, 0.8 is not greater than or equal to 0.8 + 0.6 − 0.4 = 1.0. Whether GH is subadditive or not for some less general credal sets is an open question. However, this question loses its significance within the context discussed in Section 6.8.
6.5. GENERALIZED SHANNON MEASURE IN
DEMPSTER–SHAFER THEORY
The issue of how to generalize the Shannon entropy from probability theory
to DST has been discussed in the literature since the early 1980s. By and large,
Table 6.1. Lower Probability Functions and Their Möbius Representations for the Convex Sets of Probability Distributions on X = {x1, x2, x3} Defined in Figure 6.6 and the Associated Values of the Generalized Hartley Measure GH (Example 6.6)

[For each set iD (i ∈ ℕ6), the table lists the lower probability im̲(A) and its Möbius representation im(A) for all subsets A of X. The values of the generalized Hartley measure recoverable from the table are

    GH(1D) = 1,  GH(2D) = 1,  GH(3D) = 0.805,  GH(4D) = 2/3,  GH(5D) = 1/3,  GH(6D) = 1.]
Table 6.2. Lower Probability Function m̲D and Its Möbius Representation mD in Examples 6.7 and 6.13

    A                      m̲D(A)    mD(A)
    ∅                       0.0      0.0
    {z11}                   0.4      0.4
    {z12}                   0.2      0.2
    {z21}                   0.0      0.0
    {z22}                   0.0      0.0
    {z11, z12}              0.8      0.2
    {z11, z21}              0.6      0.2
    {z11, z22}              0.4      0.0
    {z12, z21}              0.2      0.0
    {z12, z22}              0.4      0.2
    {z21, z22}              0.2      0.2
    {z11, z12, z21}         0.8     −0.2
    {z11, z12, z22}         0.8     −0.2
    {z11, z21, z22}         0.6     −0.2
    {z12, z21, z22}         0.4     −0.2
    Z                       1.0      0.4
the prospective generalized Shannon entropy was viewed in these discussions
as a measure of conflict among evidential claims in each given body of evidence in DST. This view was inspired by the recognition that the Shannon
entropy itself is a measure of conflict associated with each given probability
distribution (recall the discussion of Eq. (3.25) in Chapter 3).
Although several intuitively promising functionals have been proposed as
candidates for the generalized Shannon entropy in DST, each of them was
found upon closer scrutiny to violate some of the essential requirements for
uncertainty measures. In most cases, it was the requirement of subadditivity,
perhaps the most fundamental requirement, that was violated. The efforts to
determine a justifiable generalization of the Shannon entropy in DST have
thus been unsuccessful. The reasons for this failure, which are now understood,
are explained later in this section.
Although none of the functionals proposed as a prospective generalization
of the Shannon entropy in DST is acceptable on mathematical grounds, it
seems useful to present their overview for at least two reasons: (1) ideas are
always better understood when we are familiar with the history of their development; and (2) knowing which of the promising candidates have failed will
avoid their “reinventing.”
In the following overview of the unsuccessful attempts to generalize the
Shannon entropy to DST, no references are made. The relevant references are
all given in Note 6.7.
Two of the candidates for the generalized Shannon entropy in DST were
proposed in the early 1980s. One of them is functional E defined by the
formula
    E(m) = −Σ_{A∈F} m(A) log₂ Pl(A),        (6.41)
which is usually called a measure of dissonance. The other one is functional C
defined by the formula
    C(m) = −Σ_{A∈F} m(A) log₂ Bel(A),        (6.42)
which is referred to as a measure of confusion. It is obvious that both of these
functionals collapse into the Shannon entropy when m defines a probability
measure.
To decide if either of the two functionals is an appropriate generalization
of the Shannon entropy in DST, we have to determine what these functionals
actually measure. From Eq. (5.47) and the general property of basic probability assignments (satisfied for every A ∈ P(X)),

    Σ_{B: A∩B=∅} m(B) + Σ_{B: A∩B≠∅} m(B) = 1,        (6.43)

we obtain

    E(m) = −Σ_{A∈F} m(A) log₂ (1 − Σ_{B: A∩B=∅} m(B)).        (6.44)
The term

    K(A) = Σ_{B: A∩B=∅} m(B)

in Eq. (6.44) represents the total evidential claim pertaining to focal elements that are disjoint with the set A. That is, K(A) expresses the sum of all evidential claims that fully conflict with the one focusing on the set A. Clearly, K(A) ∈ [0, 1]. The function

    −log₂ [1 − K(A)],

which is employed in Eq. (6.44), is monotone increasing with K(A), and extends its range from [0, 1] to [0, ∞). The choice of the logarithmic function is motivated in the same way as in the classical case of the Shannon entropy. It follows from these facts and the form of Eq. (6.44) that E(m) is the mean (expected) value of the conflict among evidential claims within a given body of evidence ⟨F, m⟩; it measures the conflict in bits and its range is [0, log₂|X|].
Functional E is not fully satisfactory since we feel intuitively that m(B) may conflict with m(A) not only when B ∩ A = ∅. This broader view of conflict is
expressed by the measure of confusion C given by Eq. (6.42). Let us demonstrate this fact.
From Eq. (5.46) and the general property of basic assignments (satisfied for every A ∈ P(X)),

    Σ_{B: B⊆A} m(B) + Σ_{B: B⊄A} m(B) = 1,        (6.45)

we get

    C(m) = −Σ_{A∈F} m(A) log₂ (1 − Σ_{B: B⊄A} m(B)).        (6.46)

The term

    L(A) = Σ_{B: B⊄A} m(B)

in Eq. (6.46) expresses the sum of all evidential claims that conflict with the one focusing on the set A according to the following broader view of conflict: m(B) conflicts with m(A) whenever B ⊄ A. The reason for using the function

    −log₂ [1 − L(A)]

instead of L(A) in Eq. (6.46) is the same as already explained in the context of functional E. The conclusion is that C(m) is the mean (expected) value of the conflict, viewed in the broader sense, among evidential claims within a given body of evidence ⟨F, m⟩.
Functional C is also not fully satisfactory as a measure of conflicting evidential claims within a body of evidence, but for a different reason than functional E. Although it employs the broader, and more satisfactory, view of conflict, it does not properly scale each particular conflict of m(B) with respect to m(A) according to the degree of violation of the subsethood relation B ⊆ A. It is clear that the more this subsethood relation is violated, the greater the conflict. In addition, neither E nor C satisfies the essential axiomatic requirement of subadditivity.
To overcome the deficiencies of functionals E and C as adequate measures of conflict in evidence theory, a new functional, D, was proposed in the early 1990s:

    D(m) = −Σ_{A∈F} m(A) log₂ (1 − Σ_{B∈F} m(B) |B − A| / |B|).        (6.47)

Observe that the term
    Con(A) = Σ_{B∈F} m(B) |B − A| / |B|        (6.48)

in Eq. (6.47) expresses the sum of individual conflicts of evidential claims with respect to a particular set A, each of which is properly scaled by the degree to which the subsethood B ⊆ A is violated. This conforms to the intuitive idea of conflict that emerged from the critical reexamination of functionals E and C. Clearly, Con(A) ∈ [0, 1] and, furthermore,

    K(A) ≤ Con(A) ≤ L(A).        (6.49)
The reason for using the function

    −log₂ [1 − Con(A)]

instead of Con(A) in Eq. (6.47) is exactly the same as previously explained in the context of functional E. This monotone transformation extends the range of Con(A) from [0, 1] to [0, ∞).
Functional D, which is called a measure of discord, is clearly a measure of the mean conflict (expressed by the logarithmic transformation of function Con) among evidential claims within each given body of evidence. It follows immediately from Eq. (6.49) that

    E(m) ≤ D(m) ≤ C(m).        (6.50)
Observe that |B − A| = |B| − |A ∩ B| and, consequently, Eq. (6.47) can be rewritten as

    D(m) = −Σ_{A∈F} m(A) log₂ Σ_{B∈F} m(B) |A ∩ B| / |B|.        (6.51)

It is obvious that

    Bel(A) ≤ Σ_{B∈F} m(B) |A ∩ B| / |B| ≤ Pl(A).        (6.52)
Although functional D is intuitively more appealing than functionals E and C, further examination revealed that it has a conceptual defect. To explain the defect, let sets A and B in Eq. (6.47) be such that A ⊂ B. Then, according to function Con, the claim m(B) is taken to be in conflict with the claim m(A) to the degree |B − A|/|B|. This, however, should not be the case: the claim focusing on B is implied by the claim focusing on A (since A ⊂ B) and, hence, m(B) should not be viewed in this case as contributing to the conflict with m(A).
Consider, as an example, incomplete information regarding the age of a
person, say Joe. Assume that the information is expressed by two evidential
claims pertaining to the age of Joe: “Joe is between 15 and 17 years old” with
degree m(A), where A = [15, 17], and “Joe is a teenager” with degree m(B),
where B = [13, 19]. Clearly, the weaker second claim does not conflict with the
stronger first claim.
Now assume that A ⊃ B. In this case, the situation is reversed: the claim focusing on B is not implied by the claim focusing on A and, consequently, m(B) does conflict with m(A) to a degree proportional to the number of elements in A that are not covered by B. This conflict is not captured by function Con, since |B − A| = 0 in this case.
It follows from these observations that the total conflict of evidential claims within a body of evidence ⟨F, m⟩ with respect to a particular claim m(A) should be expressed by the functional

    CON(A) = Σ_{B∈F} m(B) |A − B| / |A|        (6.53)

rather than the functional Con given by Eq. (6.48). Replacing Con(A) in Eq. (6.47) with CON(A), we obtain a new functional, which is better justified as a measure of conflict in DST than the functional D. This new functional, which is called strife and is denoted by ST, is defined by the formula

    ST(m) = −Σ_{A∈F} m(A) log₂ (1 − Σ_{B∈F} m(B) |A − B| / |A|).        (6.54)
It is trivial to convert this form into a simpler one,

    ST(m) = −Σ_{A∈F} m(A) log₂ Σ_{B∈F} m(B) |A ∩ B| / |A|,        (6.55)

where the term |A ∩ B|/|A| expresses the degree of subsethood of set A in set B. Equation (6.55) can also be rewritten as

    ST(m) = GH(m) − Σ_{A∈F} m(A) log₂ Σ_{B∈F} m(B) |A ∩ B|,        (6.56)

where GH is the generalized Hartley measure defined by Eq. (6.27). Furthermore, introducing

    Z(m) = Σ_{A∈F} m(A) log₂ Σ_{B∈F} m(B) |A ∩ B|,        (6.57)

we have

    ST(m) = GH(m) − Z(m).        (6.58)
It was later shown that the distinction between strife and discord reflects
the distinction between disjunctive and conjunctive set–valued statements,
respectively.
To describe this distinction, consider statements of the form “x is A,” where
A is a subset of a given universal set X and x Œ X. We assume the framework
of DST, in which the evidence supporting this proposition is expressed by the
value m(A) of the basic probability assignment. The statement may be interpreted either as a disjunctive set–valued statement or a conjunctive set–valued
statement.
A statement “x is A” is disjunctive if it means that x conforms to one of the
elements in A. For example “Mary is a teenager” is disjunctive because it
means that the real age of Mary conforms to one of the values in the set
{13, 14, 15, 16, 17, 18, 19} . Similarly “John arrived between 10:00 and 11:00
a.m.” is disjunctive because it means that John’s real arrival time was one value
in the time interval between 10:00 and 11:00 a.m.
A statement “x is A” is conjunctive if it means that x conforms to all of the
elements in A. For example, the statement “The compound consists of iron,
copper, and aluminium” is conjunctive because it means that the compound
in question conforms to all the elements in the set {iron, copper, aluminium}.
Similarly, “John was in the doctor’s office from 10:00 a.m. to 11:00 a.m.” is
conjunctive because it means that John was in the doctor’s office not only at
one time during the time interval, but all the time instances during the time
interval between 10:00 and 11:00 a.m.
Let SA and SB denote, respectively, the statements “x is A” and “x is B.”
Assume that A ⊂ B and the statements are disjunctive. Then, clearly, SA implies
SB and, consequently, SB does not conflict with SA while SA does conflict
with SB. For example, the statement SB: “Mary is a teenager” does not
conflict with the statement SA: “Mary is fifteen or sixteen,” while SA conflicts
with SB.
Let SA and SB be conjunctive and assume again that A ⊂ B. Then, clearly,
SB implies SA and, consequently, SA does not conflict with SB, while SB does
conflict with SA. For example, the statement SB: “Steel is a compound of iron,
carbon, and nickel” does conflict with the statement SA: “Steel is a compound
of iron and carbon,” while SA does not conflict with SB in this case.
This examination clearly shows that the measure of strife ST (Eq. (6.55))
expresses the conflict among disjunctive statements, while the measure of
discord D (Eq. (6.51)) expresses the conflict among conjunctive statements.
Since DST and possibility theory usually deal with disjunctive statements, the
measure of strife is a better justified measure of conflict (or entropy-like
measure) in DST and possibility theory.
It is reasonable to conclude that functional ST is well justified on intuitive
grounds as a measure of conflict among evidential claims in DST when disjunctive statements are employed. Similarly, functional D is a well justified
measure of conflict in evidence theory when conjunctive statements are
employed. Unfortunately, neither of these functionals is subadditive.
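For experimentation, the four functionals E, C, D, and ST are easy to implement directly from Eqs. (6.41), (6.42), (6.51), and (6.55). The sketch below (illustrative Python, not from the original text; the body of evidence at the end is a hypothetical nested example) also illustrates the ordering E(m) ≤ D(m) ≤ C(m) of Eq. (6.50).

    import math

    def bel(m, A):
        return sum(v for B, v in m.items() if B <= A)

    def pl(m, A):
        return sum(v for B, v in m.items() if A & B)

    def E(m):   # measure of dissonance, Eq. (6.41)
        return -sum(v * math.log2(pl(m, A)) for A, v in m.items())

    def C(m):   # measure of confusion, Eq. (6.42)
        return -sum(v * math.log2(bel(m, A)) for A, v in m.items())

    def D(m):   # measure of discord, Eq. (6.51)
        return -sum(v * math.log2(sum(w * len(A & B) / len(B) for B, w in m.items()))
                    for A, v in m.items())

    def ST(m):  # measure of strife, Eq. (6.55)
        return -sum(v * math.log2(sum(w * len(A & B) / len(A) for B, w in m.items()))
                    for A, v in m.items())

    m = {frozenset({1}): 0.3, frozenset({1, 2}): 0.4, frozenset({1, 2, 3}): 0.3}
    print(E(m), D(m), ST(m), C(m))   # E(m) <= D(m) <= C(m), as in Eq. (6.50)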
After the functionals ST and D were rejected as generalizations of the
Shannon entropy due to their violation of subadditivity, the next idea was to
explore the sums of ST + GH and D + GH. Unfortunately, these sums were
found to violate the requirement of subadditivity as well. Specific counterexamples demonstrating these violations are examined in the following example.
EXAMPLE 6.8. Let Z = X × Y, where X = {x1, x2} and Y = {y1, y2}, and let zij = ⟨xi, yj⟩ (i, j = 1, 2). The joint body of evidence specified visually in Figure
6.7a is an example for which the functional ST + GH is not subadditive. All
numbers in the figures are values of the basic probability assignment functions
for the indicated focal elements. For this joint body of evidence and its associated marginal bodies of evidence (also shown in Figure 6.7a), we can immediately see that
GH (mX ) + GH (mY ) - GH (m) = 0.
For ST, we obtain:

    ST(mX) = ST(mY) = −(1/3) log₂ (1/3 + 2/3) − (2/3) log₂ ((1/3)(1/2) + 2/3)
                    = −(2/3) log₂ (5/6),

    ST(m) = −2 · (1/3) log₂ (1/3 + (1/3)(1/2) + 1/3) − (1/3) log₂ (2 · (1/3)(1/2) + 1/3)
          = −(2/3) log₂ (5/6) − (1/3) log₂ (2/3).

That is,

    ST(mX) + ST(mY) − ST(m) = (1/3) log₂ (24/25),
which is a negative number. This means that the functional ST + GH violates
the requirement of subadditivity. Similarly, the joint body of evidence in Figure
6.7b is an example of where the functional D + GH violates subadditivity. We
can easily see that
    GH(mX) + GH(mY) − GH(m) = 0

and

    D(mX) + D(mY) − D(m) = (1/3) log₂ 0.96,

which again is a negative number.
6.5. GENERALIZED SHANNON MEASURE IN DEMPSTER–SHAFER THEORY
1/3
1/3
z11
z 21
1/3
1/3
1/3
z12
x1
z 22
x2
1/3
z 21
y2
y1
2/3
2/3
x1
z 22
x2
1/3
y2
1/3
(b)
1–a
1
z12
x1
z 21
z 22
x2
y1
y2
z11
z12
1/3
(a)
a
2/3
z11
2/3
y1
1/3
225
1
(c)
Figure 6.7. Counterexamples demonstrating the violation of subadditivity: (a) for ST + GH;
(b) for D + GH; (c) for AS + GH (Examples 6.8 and 6.9).
One additional prospective functional for generalized Shannon entropy in
DST, quite different from the other ones, was also suggested in the literature.
This functional, denoted here by AS, is defined as the average Shannon
entropy for all probability distributions in the interaction representation (discussed in Section 4.3.3). That is,
    AS(m) = (1/|X|!) Σ_{p∈P} S(p),        (6.59)
where P denotes the set of all probability distributions in the interaction representation of m, and S denotes the Shannon entropy.
Although initially quite promising, functional AS was found deficient in the
same way as the other candidates: neither AS nor AS + GH is subadditive, as
demonstrated by the following example.
EXAMPLE 6.9. Considering the family of bodies of evidence specified in Figure 6.7c, where a ∈ [0, 1], we have:

    AS(mX) = AS(mY) = 0,
    GH(mX) = GH(mY) = 1,
    AS(m) = [−a log₂ a − (1 − a) log₂ (1 − a)] / 4,
    GH(m) = a log₂ 3 + 2 − 2a.

For the subadditivity of GH + AS, the difference

    Δ = (GHX + GHY + ASX + ASY) − (GH + AS)
      = [a log₂ a + (1 − a) log₂ (1 − a)] / 4 + 2a − a log₂ 3

is required to be nonnegative for all values a ∈ [0, 1]. However, Δ is negative in this case for any value a ∈ (0, 0.58] and it reaches its minimum, Δ ≈ −0.1, at a = 0.225.
The long, unsuccessful, and often frustrating search for the generalized
Shannon measure in DST was replaced in the early 1990s with the search for
a justifiable aggregate measure of uncertainty, capturing both nonspecificity
and conflict. An aggregated measure that possesses all the required mathematical properties was eventually found, but not as a composite of measures
of uncertainty of the two types. This measure is examined in the next section.
6.6. AGGREGATE UNCERTAINTY IN DEMPSTER–SHAFER THEORY
Let AU denote a functional by which the aggregate uncertainty ingrained in
any given body of evidence (expressed in terms of DST) can be measured.
This functional is supposed to capture, in an aggregated fashion, both nonspecificity and conflict—the two types of uncertainty that coexist in DST. The
functional AU may be expressed in terms of belief measures, plausibility measures, or basic probability assignments. Choosing, for example, belief measures,
the functional has the following form

    AU: B → [0, ∞),
where B is the set of all belief measures. Since there are one-to-one mappings
between corresponding belief measures, plausibility measures, and basic probability assignments, the domain of this functional AU expressed in this form
may be reinterpreted in terms of the corresponding plausibility measures or
basic probability assignments.
To qualify as a meaningful measure of aggregate uncertainty in DST, the functional AU must satisfy the following requirements:

(AU1) Probability Consistency. Whenever Bel defines a probability measure (i.e., all focal subsets are singletons), AU assumes the form of the Shannon entropy

    AU(Bel) = −Σ_{x∈X} Bel({x}) log₂ Bel({x}).

(AU2) Set Consistency. Whenever Bel focuses on a single set (i.e., m(A) = 1 for some A ⊆ X), AU assumes the form of the Hartley measure

    AU(Bel) = log₂ |A|.

(AU3) Range. The range of AU is [0, log₂ |X|] when Bel is defined on P(X) and AU is measured in bits.

(AU4) Subadditivity. If Bel is an arbitrary joint belief function on X × Y and the associated marginal belief functions are BelX and BelY, then

    AU(Bel) ≤ AU(BelX) + AU(BelY).

(AU5) Additivity. If Bel is a joint belief function on X × Y, and the marginal belief functions BelX and BelY are noninteractive, then

    AU(Bel) = AU(BelX) + AU(BelY).
A measure of aggregate uncertainty in DST that satisfies all these requirements was conceived by several authors in the early 1990s (see Note 6.8). The
measure is defined as follows.
Given a belief measure Bel on the power set of a finite set X, the aggregate uncertainty associated with Bel is measured by the functional

    AU(Bel) = max_{P_Bel} { −Σ_{x∈X} px log₂ px },        (6.60)

where the maximum is taken over all probability distributions that dominate the given belief measure. Thus, P_Bel in Eq. (6.60) consists of all probability distributions ⟨px | x ∈ X⟩ that satisfy the constraints
(a) px ∈ [0, 1] for all x ∈ X and Σ_{x∈X} px = 1;
(b) Bel(A) ≤ Σ_{x∈A} px for all A ⊆ X.
The following five theorems establish that the functional AU defined by Eq.
(6.60) satisfies the requirements (AU1)–(AU5). It is thus a well-justified
measure of aggregate uncertainty in DST. Although its uniqueness is still an
open problem, it is known that the function AU is the smallest measure of
aggregate uncertainty in DST among all other measures (if they exist).
Theorem 6.5. The measure AU is probability consistent.
Proof. When Bel is a probability measure, all focal elements are singletons
and this implies that Bel({x}) = px = Pl({x}) for all x Œ X. Hence, the maximum
is taken over a single probability distribution ·px | x Œ XÒ and AU(Bel) is equal
to the Shannon entropy of this distribution.
䊏
Theorem 6.6. The measure AU is set consistent.
Proof. Let m denote the basic probability assignment corresponding to
function Bel in Eq. (6.60). By assumption of set consistency, m(A) = 1 for some
A 債 X, and this implies that m(B) = 0 for all B π A (including B Ã A). This
means that every probability distribution that sums to one for elements x in
A and is zero for all x not in A is consistent with Bel. It is well known that the
uniform probability distribution maximizes the entropy function and, hence,
the uniform probability distribution on A will maximize AU. That is,
    AU(Bel) = −Σ_{x∈A} (1/|A|) log₂ (1/|A|) = log₂ |A|.        ■
Theorem 6.7. The measure AU has a range [0, log2|X|].
Proof. Since [0, log2|X|] is the range of the Shannon entropy for any
probability distribution on X the measure AU cannot be outside these
bounds.
䊏
Theorem 6.8. The measure AU is subadditive.
Proof. Let Bel be a joint belief function on X × Y and let ⟨p̂xy | ⟨x, y⟩ ∈ X × Y⟩ denote a probability distribution for which

    AU(Bel) = −Σ_{x∈X} Σ_{y∈Y} p̂xy log₂ p̂xy

and

    Bel(A) ≤ Σ_{⟨x,y⟩∈A} p̂xy

for all A ⊆ X × Y (this must be true for ⟨p̂xy⟩ to dominate Bel). Furthermore, let

    p̂x = Σ_{y∈Y} p̂xy   and   p̂y = Σ_{x∈X} p̂xy.

Using Gibbs' inequality in Eq. (3.28), we have

    −Σ_{x∈X} Σ_{y∈Y} p̂xy log₂ p̂xy ≤ −Σ_{x∈X} Σ_{y∈Y} p̂xy log₂ (p̂x · p̂y)
                                   = −Σ_{x∈X} p̂x log₂ p̂x − Σ_{y∈Y} p̂y log₂ p̂y.

Observe that, for all A ⊆ X,

    BelX(A) = Bel(A × Y) ≤ Σ_{x∈A} Σ_{y∈Y} p̂xy = Σ_{x∈A} p̂x

and, analogously, for all B ⊆ Y,

    BelY(B) ≤ Σ_{y∈B} p̂y.

Considering all these facts, we conclude

    AU(Bel) = −Σ_{x∈X} Σ_{y∈Y} p̂xy log₂ p̂xy
            ≤ −Σ_{x∈X} p̂x log₂ p̂x − Σ_{y∈Y} p̂y log₂ p̂y
            ≤ AU(BelX) + AU(BelY).        ■
Theorem 6.9. The measure AU is additive.
Proof. By subadditivity we know that AU(Bel) ≤ AU(BelX) + AU(BelY). When Bel is noninteractive we need to prove the reverse inequality, AU(Bel) ≥ AU(BelX) + AU(BelY), to conclude that the two quantities must be equal and that AU is additive.
Let Bel be a noninteractive belief function. Let ⟨p̂x | x ∈ X⟩ be the probability distribution for which

    AU(BelX) = −Σ_{x∈X} p̂x log₂ p̂x   and   BelX(A) ≤ Σ_{x∈A} p̂x

for all A ⊆ X; similarly, let ⟨p̂y | y ∈ Y⟩ be the probability distribution for which

    AU(BelY) = −Σ_{y∈Y} p̂y log₂ p̂y   and   BelY(B) ≤ Σ_{y∈B} p̂y

for all B ⊆ Y. Define p̂xy = p̂x · p̂y for all ⟨x, y⟩ ∈ X × Y. Clearly, ⟨p̂xy⟩ is a probability distribution on X × Y. Moreover, for all C ⊆ X × Y,

    Σ_{⟨x,y⟩∈C} p̂xy = Σ_{⟨x,y⟩∈C} p̂x · p̂y
                    = Σ_{x∈C_X} p̂x Σ_{y: ⟨x,y⟩∈C} p̂y
                    ≥ Σ_{x∈C_X} p̂x Σ_{B: {x}×B⊆C} mY(B)
                    = Σ_{x∈C_X} Σ_{B: {x}×B⊆C} p̂x · mY(B)
                    = Σ_{B⊆C_Y} mY(B) Σ_{x: {x}×B⊆C} p̂x
                    ≥ Σ_{B⊆C_Y} mY(B) Σ_{A: A×B⊆C} mX(A)
                    = Σ_{A×B⊆C} mX(A) · mY(B) = Bel(C),

so ⟨p̂xy⟩ dominates Bel. This implies that

    AU(Bel) ≥ −Σ_{x∈X} Σ_{y∈Y} p̂xy log₂ p̂xy
            = −Σ_{x∈X} p̂x log₂ p̂x − Σ_{y∈Y} p̂y log₂ p̂y
            = AU(BelX) + AU(BelY).        ■
6.6.1. General Algorithm for Computing the Aggregated Uncertainty
Since functional AU is defined in terms of the solution to a nonlinear optimization problem, its practical utility was initially questioned. Fortunately, a
relatively simple and fully general algorithm for computing AU was developed. The algorithm is formulated as follows.
Algorithm 6.1. Calculating AU from a belief function.
Input. A frame of discernment X and a belief measure Bel on subsets of X.
Output. AU(Bel) and ⟨px | x ∈ X⟩ such that AU(Bel) = −Σ_{x∈X} px log₂ px, px ≥ 0, Σ_{x∈X} px = 1, and Bel(A) ≤ Σ_{x∈A} px for all ∅ ≠ A ⊆ X.

Step 1. Find a nonempty set A ⊆ X such that Bel(A)/|A| is maximal. If there are more such sets A than one, take the one with maximal cardinality.
Step 2. For each x ∈ A, put px = Bel(A)/|A|.
Step 3. For each B ⊆ X − A, put Bel(B) = Bel(B ∪ A) − Bel(A).
Step 4. Put X = X − A.
Step 5. If X ≠ ∅ and Bel(X) > 0, then go to Step 1.
Step 6. If Bel(X) = 0 and X ≠ ∅, then put px = 0 for all x ∈ X.
Step 7. Calculate AU(Bel) = −Σ_{x∈X} px log₂ px.
A proof that Algorithm 6.1 terminates after a finite number of steps, and
that it produces the correct result (the maximum of the Shannon entropy
within the set of probability distributions that dominate a given belief function) is covered in Appendix C.
EXAMPLE 6.10. Given the frame of discernment X = {a, b, c, d}, let a belief function Bel be defined by the associated basic probability assignment m:

    m({a}) = 0.26,      m({b}) = 0.26,      m({c}) = 0.26,
    m({a, b}) = 0.07,   m({a, c}) = 0.01,   m({a, d}) = 0.01,
    m({b, c}) = 0.01,   m({b, d}) = 0.01,   m({c, d}) = 0.01,
    m({a, b, c, d}) = 0.1

(only values for focal elements are listed). By Remark (c) in Appendix C, we do not need to consider the value of Bel for {d} and ∅. The values of Bel(A) and also the values of Bel(A)/|A| for all other subsets A of X are listed in Table 6.3a. We can see from this table that the highest value of Bel(A)/|A| is obtained for A = {a, b}. Therefore, we set pa = pb = 0.295. Now, we have to update our (now generalized) belief function Bel. For example,

    Bel({c}) = Bel({a, b, c}) − Bel({a, b}) = 0.87 − 0.59 = 0.28.
Table 6.3. Calculation of AU by Algorithm 6.1 in Example 6.10: (a) First Iteration; (b) Second Iteration

    (a)  A              Bel(A)    Bel(A)/|A|
         {a}            0.26      0.26
         {b}            0.26      0.26
         {c}            0.26      0.26
         {a, b}         0.59      0.295
         {a, c}         0.53      0.265
         {a, d}         0.27      0.135
         {b, c}         0.53      0.265
         {b, d}         0.27      0.135
         {c, d}         0.27      0.135
         {a, b, c}      0.87      0.29
         {a, b, d}      0.61      0.2033…
         {a, c, d}      0.55      0.1833…
         {b, c, d}      0.55      0.1833…
         {a, b, c, d}   1         0.25

    (b)  A              Bel(A)    Bel(A)/|A|
         {c}            0.28      0.28
         {d}            0.02      0.02
         {c, d}         0.41      0.205
All the values are listed in Table 6.3b. Our new frame of discernment X is now {c, d}. Since X ≠ ∅ and Bel(X) > 0, we repeat the process. The maximum of Bel(A)/|A| is now reached at {c}. We put pc = 0.28, change Bel({d}) to 0.13, and X to {d}. In the last pass through the loop we get pd = 0.13. We can conclude that

    AU(Bel) = −Σ_{i∈{a,b,c,d}} pi log₂ pi
            = −2 × 0.295 log₂ 0.295 − 0.28 log₂ 0.28 − 0.13 log₂ 0.13
            ≅ 1.9.
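A direct implementation of Algorithm 6.1 reproduces this result. The sketch below (illustrative Python, not from the original text; the brute-force enumeration of all nonempty subsets is my own simplification and is practical only for small frames) follows Steps 1–7 and recomputes Example 6.10.

    import math
    from itertools import chain, combinations

    def nonempty_subsets(xs):
        xs = sorted(xs)
        return [frozenset(s) for s in chain.from_iterable(
            combinations(xs, k) for k in range(1, len(xs) + 1))]

    def belief_from_m(m):
        return lambda A: sum(v for B, v in m.items() if B <= A)

    def AU(bel, X):
        """Algorithm 6.1: maximum Shannon entropy over the probability
        distributions dominating the belief function `bel` (a callable on frozensets)."""
        X = frozenset(X)
        Bel = {A: bel(A) for A in nonempty_subsets(X)}    # working copy of Bel
        p = {x: 0.0 for x in X}
        while X and Bel[X] > 0:
            # Step 1: nonempty A maximizing Bel(A)/|A|, ties broken by larger |A|
            A = max(nonempty_subsets(X), key=lambda S: (Bel[S] / len(S), len(S)))
            # Step 2: spread Bel(A) uniformly over the elements of A
            for x in A:
                p[x] = Bel[A] / len(A)
            # Steps 3-4: restrict to X - A and update the (generalized) belief
            rest = X - A
            Bel = {B: Bel[B | A] - Bel[A] for B in nonempty_subsets(rest)}
            X = rest
        # Steps 6-7: remaining px stay 0; the convention 0*log(0) = 0 is used
        return -sum(q * math.log2(q) for q in p.values() if q > 0), p

    m = {frozenset({'a'}): 0.26, frozenset({'b'}): 0.26, frozenset({'c'}): 0.26,
         frozenset({'a', 'b'}): 0.07, frozenset({'a', 'c'}): 0.01,
         frozenset({'a', 'd'}): 0.01, frozenset({'b', 'c'}): 0.01,
         frozenset({'b', 'd'}): 0.01, frozenset({'c', 'd'}): 0.01,
         frozenset({'a', 'b', 'c', 'd'}): 0.10}
    au, p = AU(belief_from_m(m), {'a', 'b', 'c', 'd'})
    print(round(au, 2), p)   # about 1.94 (rounded to 1.9 above); p = {a: 0.295, b: 0.295, c: 0.28, d: 0.13}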
6.6.2. Computing the Aggregated Uncertainty in Possibility Theory
Due to the nested structure of possibilistic bodies of evidence, the computation of functional AU can be substantially simplified. It is thus useful to reformulate Algorithm 6.1 in terms of possibility profiles for applications in
possibility theory. The following is the reformulated algorithm.
Algorithm 6.2. Calculating AU from a given possibility profile.
Input. n ∈ ℕ, r = ⟨r1, r2, . . . , rn⟩.
Output. AU(Pos) and ⟨pi | i ∈ ℕn⟩ such that AU(Pos) = −Σ_{i=1}^{n} pi log₂ pi, with pi ≥ 0 for i ∈ ℕn, Σ_{i=1}^{n} pi = 1, and Σ_{xi∈A} pi ≤ Pos(A) for all ∅ ≠ A ⊆ X.

Step 1. Let j = 1 and rn+1 = 0.
Step 2. Find the maximal i ∈ {j, j + 1, . . . , n} such that (rj − ri+1)/(i + 1 − j) is maximal.
Step 3. For k ∈ {j, j + 1, . . . , i}, put pk = (rj − ri+1)/(i + 1 − j).
Step 4. Put j = i + 1.
Step 5. If i < n, then go to Step 2.
Step 6. Calculate AU(Pos) = −Σ_{i=1}^{n} pi log₂ pi.
As already mentioned, it is sufficient to consider only all unions of focal
elements of a given belief function, in this case a necessity measure. Since all
focal elements of a necessity measure are nested, the union of a set of focal
elements is the largest focal element (in the sense of inclusion) in the set.
Therefore, we have to examine the values of Nec(A)/|A| only for A being a
focal element.
We show by induction on the number of passes through the loop of Steps
1–5 of Algorithm 6.1 that the following properties hold in a given pass:
(a) The “current” frame of discernment X is the set {xj, xj+1, . . . , xn}, where
the value of j is taken in the corresponding pass through the loop of
Steps 2–5 of Algorithm 6.2.
(b) All focal elements are of the form Ai = {xj, xj+1, . . . , xi} for some i Œ {j,
j + 1, . . . , n}, where j has the same meaning as in (a).
(c) Nec(Ai) = rj - ri+1, where j is again as described in (a).
This implies that Algorithm 6.2 is a correct modification of Algorithm 6.1 for
the case of possibility theory.
In the first pass, j = 1 and X = {x1, x2, . . . , xn}; (b) holds due to our ordering convention regarding possibility profiles. Since r1 = 1, we have

    Nec(Ai) = 1 − Pos(Āi) = 1 − max{rk | k = i + 1, . . . , n} = 1 − ri+1.

So (c) is true.
Let us now assume that (a)–(c) were true in some fixed pass. We want to show that (a)–(c) hold in the next pass. Let l denote the value of i maximizing Nec(Ai)/|Ai| = Nec(Ai)/(i + 1 − j). Now X becomes X − Al = {xl+1, xl+2, . . . , xn}. Therefore (a) holds, since j = l + 1, and Nec(Ai) becomes

    Nec(Ai) = Nec(Al ∪ Ai) − Nec(Al)
            = [1 − Pos(X − (Al ∪ Ai))] − [1 − Pos(X − Al)]
            = [1 − max{rk | k = i + 1, . . . , n}] − [1 − max{rk | k = l + 1, . . . , n}]
            = rl+1 − ri+1 = rj − ri+1

for i ∈ {j, j + 1, . . . , n}. This implies that (b) and (c) hold.
Table 6.4. Illustration of the Use of Algorithm 6.2 in Example 6.11

    Values of (rj − ri+1)/(i + 1 − j):

    Pass (j)     i = 1   2       3        4      5      6        7       8
    1 (j = 1)    0.1     0.075   0.1666…  0.125  0.13   0.1166…  0.1286  0.125
    2 (j = 4)                             0      0.075  0.0666…  0.1     0.1
EXAMPLE 6.11. Consider X = {1, 2, . . . , 8} and the possibility profile

    r = ⟨1, 0.9, 0.85, 0.5, 0.5, 0.35, 0.3, 0.1⟩.

The relevant values of (rj − ri+1)/(i + 1 − j) are listed in Table 6.4. We can see there that in the first pass the maximum is reached at i = 3, and p1 = p2 = p3 = 1/6. In the second pass, the maximum is reached at both i = 7 and i = 8. We take the bigger one and put p4 = p5 = p6 = p7 = p8 = 0.1. We finish by computing

    AU(Pos) = −Σ_{i=1}^{8} pi log₂ pi = (3/6) log₂ 6 + (5/10) log₂ 10 = 2.95.
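Algorithm 6.2 is even simpler to implement, since it works directly on the ordered possibility profile. The following sketch (illustrative Python, not from the original text) reproduces Example 6.11.

    import math

    def AU_from_possibility_profile(r):
        """Algorithm 6.2: AU for an ordered possibility profile
        r = (r1 >= r2 >= ... >= rn), with r1 = 1."""
        n = len(r)
        r = list(r) + [0.0]          # r_{n+1} = 0 (indices shifted to 0-based)
        p = [0.0] * n
        j = 0
        while j < n:
            # Step 2: maximal i for which (rj - r_{i+1}) / (i + 1 - j) is maximal
            best = max(range(j, n), key=lambda i: ((r[j] - r[i + 1]) / (i + 1 - j), i))
            value = (r[j] - r[best + 1]) / (best + 1 - j)
            # Step 3: assign this value to p_j, ..., p_i
            for k in range(j, best + 1):
                p[k] = value
            j = best + 1
        return -sum(q * math.log2(q) for q in p if q > 0), p

    profile = (1, 0.9, 0.85, 0.5, 0.5, 0.35, 0.3, 0.1)
    au, p = AU_from_possibility_profile(profile)
    print(round(au, 2), p)   # about 2.95; p = (1/6, 1/6, 1/6, 0.1, 0.1, 0.1, 0.1, 0.1)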
6.7. AGGREGATE UNCERTAINTY FOR CONVEX SETS OF
PROBABILITY DISTRIBUTIONS
Once the aggregated measure of uncertainty AU has been established in DST,
it is fairly easy to generalize it for convex sets of probability distributions. Let
this generalized version of AU be denoted by S̄ to emphasize its definition in
terms of the maximum Shannon entropy. Clearly, S̄ is a functional defined on
the family of convex sets of probability distribution functions. Given a convex
set D of probability distribution functions p, S̄ is defined by the formula
    S̄(D) = max_{p∈D} { −Σ_{x∈X} p(x) log₂ p(x) }.        (6.61)
It is essential to show that S̄ satisfies the following requirements, which are appropriate generalizations of their counterparts for AU:

(S̄1) Probability Consistency. When D contains only one probability distribution, S̄ assumes the form of the Shannon entropy.
(S̄2) Set Consistency. When D consists of the set of all possible probability distributions on A ⊆ X, then S̄(D) = log₂ |A|.
(S̄3) Range. The range of S̄ is [0, log₂|X|] provided that uncertainty is measured in bits.
(S̄4) Subadditivity. If D is an arbitrary convex set of probability distributions on X × Y and DX and DY are the associated sets of marginal probability distributions on X and Y, respectively, then S̄(D) ≤ S̄(DX) + S̄(DY).
(S̄5) Additivity. If D is the set of joint probability distributions on X × Y that is associated with independent marginal sets of probability distributions, DX and DY, which means that D is the convex hull of the set

    DX ⊗ DY = {p(x, y) = pX(x) · pY(y) | x ∈ X, y ∈ Y, pX ∈ DX, pY ∈ DY},

then S̄(D) = S̄(DX) + S̄(DY).
It is obvious that the functional S̄ defined by Eq. (6.61) satisfies requirements (S̄1)–(S̄3). The remaining two properties, subadditivity and additivity,
are addressed by the following two theorems.
Theorem 6.10. Functional S̄ defined by Eq. (6.61) satisfies the requirement (S̄4); that is, it is subadditive.

Proof. Given a convex set D of joint probability distributions on X × Y, let ṗ denote the joint probability distribution in D for which the maximum in Eq. (6.61) is obtained, and let ṗX and ṗY denote the marginal probability distributions of ṗ. Then, by Gibbs' inequality (Eq. (3.28)),

    S(ṗ) = −Σ_{y∈Y} Σ_{x∈X} ṗ(x, y) log₂ ṗ(x, y)
         ≤ −Σ_{y∈Y} Σ_{x∈X} ṗ(x, y) log₂ [ṗX(x) · ṗY(y)]
         = −Σ_{y∈Y} Σ_{x∈X} ṗ(x, y) log₂ ṗX(x) − Σ_{y∈Y} Σ_{x∈X} ṗ(x, y) log₂ ṗY(y)
         = −Σ_{x∈X} ṗX(x) log₂ ṗX(x) − Σ_{y∈Y} ṗY(y) log₂ ṗY(y)
         = S(ṗX) + S(ṗY) ≤ S̄(DX) + S̄(DY).        ■
Theorem 6.11. Functional S̄ defined by Eq. (6.61) satisfies the requirement (S̄5); that is, it is additive.

Proof. From Theorem 6.10,

    S̄(D) ≤ S̄(DX) + S̄(DY).

It is thus sufficient to show that

    S̄(D) ≥ S̄(DX) + S̄(DY)

under the assumption of independence. Let ṗX and ṗY denote the marginal probability distributions on X and Y for which the maxima in Eq. (6.61) are obtained. Then

    S̄(DX) + S̄(DY) = −Σ_{x∈X} ṗX(x) log₂ ṗX(x) − Σ_{y∈Y} ṗY(y) log₂ ṗY(y)
                   = −Σ_{y∈Y} Σ_{x∈X} ṗX(x) · ṗY(y) log₂ [ṗX(x) · ṗY(y)]
                   ≤ S̄(D).        ■
EXAMPLE 6.12. Consider the following convex sets of marginal probability distributions on X = {x1, x2} and Y = {y1, y2}:

    DX = {pX(x1) = 0.2 + 0.2λX, pX(x2) = 0.8 − 0.2λX | λX ∈ [0, 1]},
    DY = {pY(y1) = 0.1 + 0.1λY, pY(y2) = 0.9 − 0.1λY | λY ∈ [0, 1]}.

The maximum entropy within DX is obtained for λX = 1 (i.e., for pX(x1) = 0.4 and pX(x2) = 0.6). The maximum entropy within DY is obtained for λY = 1 (i.e., for pY(y1) = 0.2 and pY(y2) = 0.8). Hence,

    S̄(DX) = S(0.4, 0.6) = 0.971,
    S̄(DY) = S(0.2, 0.8) = 0.722,
    S̄(DX) + S̄(DY) = 1.693.

Assuming the independence of the marginal probabilities, the set D of the associated joint probabilities on Z = X × Y is the convex hull of the set

    DX ⊗ DY = {p(z11) = (0.2 + 0.2λX)(0.1 + 0.1λY),
               p(z12) = (0.2 + 0.2λX)(0.9 − 0.1λY),
               p(z21) = (0.8 − 0.2λX)(0.1 + 0.1λY),
               p(z22) = (0.8 − 0.2λX)(0.9 − 0.1λY) | λX, λY ∈ [0, 1]},

where zij = ⟨xi, yj⟩ for all i, j = 1, 2. The maximum entropy within DX ⊗ DY is obtained for λX = λY = 1 and, hence,
    S̄(D) = S̄(DX) + S̄(DY) = 1.693.
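The maxima in this example can be confirmed by a simple grid search over λX and λY. The sketch below (illustrative Python, not from the original text) searches the product set DX ⊗ DY; by Theorem 6.11 the maximum over the convex hull D coincides with the value found there.

    import math

    def shannon(p):
        return -sum(q * math.log2(q) for q in p if q > 0)

    def S_max(distributions):
        return max(shannon(p) for p in distributions)

    steps = [k / 100 for k in range(101)]
    DX = [(0.2 + 0.2 * l, 0.8 - 0.2 * l) for l in steps]
    DY = [(0.1 + 0.1 * l, 0.9 - 0.1 * l) for l in steps]
    DXY = [(px1 * py1, px1 * py2, px2 * py1, px2 * py2)
           for (px1, px2) in DX for (py1, py2) in DY]

    print(round(S_max(DX), 3), round(S_max(DY), 3), round(S_max(DXY), 3))
    # 0.971  0.722  1.693, confirming the additivity in Example 6.12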
Functional S̄ is defined in this section for convex sets of probability distributions. However, it can as well be defined for the lower and upper probability functions or the Möbius representations obtained uniquely from convex sets of probability distributions. We can thus write not only S̄(D), but also S̄(m̲), S̄(m̄), or S̄(m).
Algorithm 6.1 (introduced in Section 6.6.1) is formulated and proved
correct for belief functions. It is not necessarily applicable to lower probabilities that are more general or incomparable with belief functions. A lower
probability for which the algorithm does not work is examined in the following example.
EXAMPLE 6.13. Consider the convex set D of probability distributions on Z = {z11, z12, z21, z22} defined in Example 6.7. The lower probability function, m̲D, associated with this set, which is derived by Eq. (4.11), is shown in Table 6.2. Recall that m̲D is not 2-monotone in this example.
Applying Algorithm 6.1 results in the probability distribution

    p(z11) = p(z12) = 0.4,   p(z21) = p(z22) = 0.1,

for which S(0.4, 0.4, 0.1, 0.1) = 1.722. However, this probability distribution is not a member of the given set of probability distributions. The plot of the Shannon entropy for λ ∈ [0, 1] is shown in Figure 6.8. Its maximum is 1.69288, obtained for p = ⟨0.48, 0.32, 0.12, 0.08⟩. Algorithm 6.1 thus does not produce the correct value of S̄(m̲) in this example.
[Figure 6.8 plots the Shannon entropy S (roughly between 1.4 and 1.7) against λ ∈ [0, 1].]
Figure 6.8. Values of the Shannon entropy for λ ∈ [0, 1] in Example 6.13.
[Figure 6.9a depicts the body of evidence m({x1}) = m1, m({x2}) = m2, m(X) = 1 − m1 − m2; Figure 6.9b shows the (m1, m2) region m1 + m2 ≤ 1 partitioned into the subregions where S̄(m) = 1, S̄(m) = S(m1, 1 − m1), and S̄(m) = S(m2, 1 − m2).]
Figure 6.9. Illustration of the severe insensitivity of the aggregate uncertainty measure (Example 6.14).
6.8. DISAGGREGATED TOTAL UNCERTAINTY
Functional S̄ defined by Eq. (6.60) is certainly well justified on mathematical
grounds as an aggregate measure of uncertainty in the theory based on arbitrary convex sets of probability distributions and, consequently, to the various
special theories of uncertainty as well. However, it has a severe practical shortcoming: it is highly insensitive to changes in evidence. To illustrate this undesirable feature of S̄, let us examine the following example.
EXAMPLE 6.14. Consider the following class of simple bodies of evidence in DST: X = {x1, x2}, m({x1}) = m1, m({x2}) = m2, m(X) = 1 − m1 − m2, where m1, m2 ∈ [0, 1] and m1 + m2 ≤ 1 (Figure 6.9a). Clearly, Bel({x1}) = m1, Bel({x2}) = m2, and Bel(X) = 1. Then, S̄(Bel) = 1 (or AU(Bel) = 1 when using the special notation in DST) for all m1 ∈ [0, 0.5] and m2 ∈ [0, 0.5]. Moreover, when m1 > 0.5, S̄(Bel) = S(m1, 1 − m1), where S denotes the Shannon entropy. Hence, S̄(Bel) is independent of m2. Similarly, when m2 > 0.5, S̄(Bel) = S(m2, 1 − m2) and, hence, S̄(Bel) is independent of m1. These values of S̄(Bel) are shown in Figure 6.9b. The functional S̄ in this example can also be expressed by the formula

    S̄(Bel) = S(max{m1, m2, 0.5}, 1 − max{m1, m2, 0.5})        (6.62)

for all m1, m2 ∈ [0, 1] such that m1 + m2 ≤ 1, where S denotes the Shannon entropy.
This example surely illustrates that the insensitivity of the functional S̄ to
changes in evidence is very severe. This feature makes the functional ill-suited
for measuring uncertainty on conceptual and pragmatic grounds. However,
recognizing that S̄ is an aggregate of the two types of uncertainty (nonspecificity and conflict), it must be that

    S̄(m̲) = GH(m̲) + GS(m̲),        (6.63)

where m̲ is a lower probability function, which can be derived from any of the other representations of the same uncertainty (the associated convex set of probability distributions, Möbius representation, or upper probability function) or may be converted to any of these representations as needed. In Eq. (6.63) GH is the well-justified functional for measuring nonspecificity (defined at the most general level by Eq. (6.38) and referred to as the generalized Hartley measure) and GS is an unknown functional for measuring conflict (referred to as the generalized Shannon entropy).
Since the functionals S̄ and GH in Eq. (6.63) are well justified and GS is an unknown functional, it is suggestive to define GS from the equation as

    GS(m̲) = S̄(m̲) − GH(m̲).        (6.64)

Here, GS is defined indirectly, in terms of two well-justified functionals, thus overcoming the unsuccessful attempts to define it directly (as discussed in Section 6.5).
Functional S̄, which is well justified but practically useless due to its insensitivity, can now be disaggregated into two components, GH and GS, which measure the two types of uncertainty that coexist in all uncertainty theories except the classical ones. A disaggregated total uncertainty, TU, is thus defined as the pair

    TU = ⟨GH, GS⟩,        (6.65)

where GH and GS are defined by Eqs. (6.38) and (6.64), respectively. Since the sum of the two components of TU is S̄, TU is as well justified as S̄. One advantage of the disaggregated total uncertainty, TU, in comparison with its aggregated counterpart S̄, is that it expresses the amounts of both types of uncertainty (nonspecificity and conflict) explicitly and, consequently, it is highly sensitive to changes in evidence.
Another advantage of TU is that its components, GH and GS, need not satisfy all the mathematical requirements for measures of uncertainty. It only matters that their aggregate measure, S̄, satisfies them. The lack of subadditivity of GH for arbitrary convex sets of probability distributions, established in Example 6.7, is thus of no consequence when GH is employed as one component of TU.
EXAMPLE 6.15. To appreciate the difference between S̄ and TU, let us
consider the following three bodies of evidence on X and let |X| = n for
convenience:
(a) In the case of total ignorance (when m(X) = 1), S̄(m) = log₂ n and TU(m) = ⟨log₂ n, 0⟩.
(b) When evidence is expressed by the uniform probability distribution on X (i.e., m({x}) = 1/n for all x ∈ X), then again S̄(m) = log₂ n, but TU(m) = ⟨0, log₂ n⟩.
(c) When evidence is expressed by m({x}) = a for all x ∈ X, where a < 1/n, and m(X) = 1 − na, then again S̄(m) = log₂ n, while TU(m) = ⟨(1 − na) log₂ n, na log₂ n⟩.
EXAMPLE 6.16. To illustrate the disaggregated total uncertainty TU = ⟨GH, GS⟩, let us consider again the class of simple bodies of evidence introduced in Example 6.14. Plots of the dependences of S̄, GH, and GS on m1 and m2 are shown in the first column of Figure 6.10. We can see that both components of TU (GH and GS) are sensitive to changes of evidence. Plots of S̄, GH, and GS for the extreme cases when either m2 = 0 or m1 = 0 (nested or possibilistic bodies of evidence) are shown in the second column of Figure 6.10. It is easy to determine that the maximum of GS is 0.585, and it is attained in these cases for m1 = 2/3 and m2 = 0 (or m1 = 0 and m2 = 2/3).
The plots in Figure 6.10 are based on the lower probability function, when m1 + m2 ≤ 1. Similar plots can be made for the upper probability function, when m1 + m2 ≥ 1. In Figure 6.11, values of S̄(m1, m2), GH(m1, m2), and GS(m1, m2) are shown for all values of m1, m2 ∈ [0, 1]. Clearly,
    S̄(m1, m2) = S(max{m1, m2, 0.5}, 1 − max{m1, m2, 0.5})                          when m1 + m2 ≤ 1,
    S̄(m1, m2) = S(max{1 − m1, 1 − m2, 0.5}, 1 − max{1 − m1, 1 − m2, 0.5})          when m1 + m2 ≥ 1,

where S denotes the Shannon entropy, and

    GH(m1, m2) = 1 − m1 − m2   when m1 + m2 ≤ 1,
    GH(m1, m2) = m1 + m2 − 1   when m1 + m2 ≥ 1,

    GS(m1, m2) = S̄(m1, m2) − GH(m1, m2).
To fully justify the disaggregated total uncertainty TU defined by Eq. (6.65), it remains to prove that its second component, the generalized Shannon entropy GS defined by Eq. (6.64), is always nonnegative. That is, we need to prove that the inequality

    S̄(m̲) − GH(m̲) ≥ 0

holds for any lower probability function m̲ (or any of its equivalent representations). This proof is presented for belief functions in Appendix D. A proof for lower probability functions outside DST is still needed.
[Figure 6.10 consists of six surface and curve plots of S̄, GH, and GS over the range m1, m2 ∈ [0, 1].]
Figure 6.10. Uncertainties for the two-element bodies of evidence defined in Example 6.14. First column: S̄, GH, and GS for all m1 and m2 such that m1 + m2 ≤ 1; second column: S̄, GH, and GS for either m1 = 0 or m2 = 0.
6.9. GENERALIZED SHANNON ENTROPY
The generalized Shannon entropy defined by Eq. (6.64) emerged fairly
recently and has not been sufficiently investigated as yet. Nevertheless, some
of its properties in Dempster–Shafer theory are derived in this section on the
basis of Algorithm 6.1.
[Figure 6.11 consists of three surface plots over the unit square m1, m2 ∈ [0, 1].]
Figure 6.11. For all m1, m2 ∈ [0, 1]: (a) S̄(m1, m2); (b) GH(m1, m2); (c) GS(m1, m2).
To facilitate the following derivations, let

    F = {Ai | Ai ∈ P(X), i ∈ ℕq}

denote the family of all focal sets of a given body of evidence in Dempster–Shafer theory. Furthermore, let

    E = {Bk | Bk ∈ P(X), k ∈ ℕr}

denote the partition of ∪_{i∈ℕq} Ai that is produced by Algorithm 6.1. Clearly, r ≤ q and

    ∪_{i∈ℕq} Ai = ∪_{k∈ℕr} Bk.

For convenience, assume that block Bk is produced by the kth iteration of the algorithm, and let Pro(Bk) denote the probability of Bk. Clearly,

    Σ_{k∈ℕr} Pro(Bk) = 1.        (6.66)
According to Algorithm 6.1,

    S̄(Bel) = −Σ_{k∈ℕr} Pro(Bk) log₂ [Pro(Bk) / |Bk|],

where Bel denotes the belief measure associated with the given body of evidence. This equation can be rewritten as

    S̄(Bel) = −Σ_{k∈ℕr} Pro(Bk) log₂ Pro(Bk) + Σ_{k∈ℕr} Pro(Bk) log₂ |Bk|.

Due to Eq. (6.66), the first term on the right-hand side of this equation is the Shannon entropy of E and the second term is the generalized Hartley measure of E. We can thus rewrite this equation as

    S̄(Bel) = S(Pro(Bk) | k ∈ ℕr) + GH(Pro(Bk) | k ∈ ℕr).

Using now Eq. (6.64), we obtain

    GS(Bel) = S(Pro(Bk) | k ∈ ℕr) + GH(Pro(Bk) | k ∈ ℕr) − GH(m(Ai) | i ∈ ℕq).        (6.67)

This equation indicates that the generalized Shannon entropy is expressed by the Shannon entropy of E and the difference between the nonspecificity of E and the nonspecificity of the given body of evidence ⟨F, m⟩.
In the rest of this section, some properties of the generalized Shannon entropy are derived for two special types of bodies of evidence, those in which focal elements are either disjoint or nested.

Theorem 6.12. Consider a given body of evidence ⟨F, m⟩ in which F is a family of pairwise disjoint sets. Then,

    GS(Bel) = −Σ_{i=1}^{q} m(Ai) log₂ m(Ai),

where Bel denotes the belief measure based on m.
Proof. For convenience, let mi = m(Ai) and ai = |Ai|. Assume (without any loss of generality) that

    m1/a1 ≥ m2/a2 ≥ . . . ≥ mq/aq.

Then, m1a2 ≥ m2a1. Now comparing m1/a1 with (m1 + m2)/(a1 + a2), we obtain

    m1/a1 − (m1 + m2)/(a1 + a2) = (m1a2 − m2a1) / (a1(a1 + a2)) ≥ 0.

It is obvious that the same result is obtained when m1/a1 is compared with

    (m1 + Σ_{i∈I} mi) / (a1 + Σ_{i∈I} ai)

for any I ⊆ {2, 3, . . . , q}. Hence, Algorithm 6.1 assigns probabilities m1/a1 to all elements in A1 in the first iteration (observe that Bel(Ai) = m(Ai) for all i ∈ ℕq when focal elements are disjoint). By repeating the same argument for m2/a2, we can readily show that Algorithm 6.1 assigns probabilities m2/a2 to all elements of A2 in the second iteration, and so forth. Hence,

    S̄(Bel) = −Σ_{i=1}^{q} ai (mi/ai) log₂ (mi/ai)
           = −Σ_{i=1}^{q} mi log₂ mi + Σ_{i=1}^{q} mi log₂ ai
           = −Σ_{i=1}^{q} mi log₂ mi + GH(Bel).

Now,

    GS(Bel) = S̄(Bel) − GH(Bel) = −Σ_{i=1}^{q} mi log₂ mi.        ■
Now let us consider some properties of the generalized Shannon entropy for nested bodies of evidence. Let X = {xi | i ∈ ℕn} and assume that the elements of X are ordered in such a way that the family
A = {Ai = {x1, x2, . . . , xi} | i ∈ ℕn}
contains all focal sets. According to this special notation (introduced in Section 5.2.1), F ⊆ A. For convenience, let mi = m(Ai) for all i ∈ ℕn.
Given a particular nested body of evidence ⟨F, m⟩,
GS(m) = S̄(m) − GH(m).
While GH(m) in this equation is expressed by the simple formula
GH(m) = Σ_{i=2}^{n} mi log2 i,    (6.68)
the expression of S̄(m) is not obvious at all. To investigate some properties of GS for nested bodies of evidence, let the following three cases be distinguished:
(a) mi ≥ mi+1 for all i ∈ ℕn−1;
(b) mi ≤ mi+1 for all i ∈ ℕn−1;
(c) Neither (a) nor (b).
Following the algorithm for computing S̄, we obtain the formula
GSa(m) = −Σ_{i=1}^{n} mi log2 (mi i)    (6.69)
for any function m that conforms to Case (a). By applying the method of Lagrange multipliers (Appendix E), we can readily find out that the maximum, GS*a(n), of this functional for each n ∈ ℕ is obtained for
mi = (1/i) 2^−(α + 1/ln 2)    (i ∈ ℕn),
where the value of α is determined by solving the equation
2^−(α + 1/ln 2) Σ_{i=1}^{n} (1/i) = 1.
Let sn = Σ_{i=1}^{n} (1/i). Then,
α = −log2 (1/sn) − (1/ln 2)
and, hence,
mi = (1/i) 2^{log2 (1/sn)} = 1/(sn i).
Substituting this expression for mi in Eq. (6.69), we obtain
GS*a(n) = log2 sn.    (6.70)
The maximum, GS*b(n), of this functional for each n ∈ ℕ, subject to the inequalities that are assumed in Case (b), is obtained for mi = 1/n for all i ∈ ℕn. Hence,
GS*b(n) = log2 [n / (n!)^{1/n}].    (6.71)
For large n, n! in this equation can be approximated by the Stirling formula
n! ≈ (2π)^{1/2} n^{n+1/2} e^{−n}.
Then,
(n!)^{1/n} ≈ (2π)^{1/(2n)} n^{1+1/(2n)} e^{−1}
and
lim_{n→∞} n/(n!)^{1/n} = lim_{n→∞} ne / [(2π)^{1/(2n)} n^{1+1/(2n)}] = e.
Hence, GS*b(n) approaches log2 e ≈ 1.442695 as n increases. That is, GS*b is bounded, contrary to GS*a(n). Moreover, GS*b(n) < GS*a(n) for all n ∈ ℕ. Plots of GS*a(n) and GS*b(n) for n ∈ ℕ1000 are shown in Figure 6.12.
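The two quantities plotted in Figure 6.12 are easy to reproduce numerically; the short Python sketch below evaluates Eqs. (6.70) and (6.71) directly (the helper names are assumptions, not notation from the text) and shows GS*b(n) approaching log2 e.

```python
from math import e, lgamma, log, log2

def gs_a_star(n):
    # Eq. (6.70): GS*_a(n) = log2(s_n), where s_n is the n-th harmonic number
    return log2(sum(1.0 / i for i in range(1, n + 1)))

def gs_b_star(n):
    # Eq. (6.71): GS*_b(n) = log2(n / (n!)**(1/n)); lgamma(n + 1) = ln(n!)
    return log2(n) - lgamma(n + 1) / (n * log(2))

for n in (10, 100, 1000):
    print(n, round(gs_a_star(n), 3), round(gs_b_star(n), 3))
print(round(log2(e), 6))   # the bound approached by GS*_b(n): about 1.442695
```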
Case (c) is more complicated for a general analytic treatment since it covers
a greater variety of bodies of evidence with respect to the computation of GS.
This follows from the algorithm for computing S̄. For each given body of evidence, the algorithm partitions the universal set in some way, and distributes
the value of the lower probability in each block of the partition uniformly. For
Figure 6.12. Plots of GS*a(n) and GS*b(n).
Figure 6.13. Illustration of the growing difference [GH*(n) − GS*(n)] with increasing n.
nested bodies of evidence, the partitions preserve the induced order of elements of X. There are 2^{n−1} order-preserving partitions. The most refined partition and the least refined one are represented by Cases (a) and (b), respectively. All the remaining 2^{n−1} − 2 partitions are represented by Case (c). A conjecture, based on a complete analysis for n = 3 and extensive simulation experiments for n > 3, is that the maxima of GS for all these partitions are, for all n ∈ ℕ, smaller than the maximum GS*a(n) for Case (a). According to this plausible conjecture, whose proof is an open problem, the difference between the maximum nonspecificity, GH*(n), and the maximum conflict, GS*a(n), grows rapidly with n. This is illustrated by the plots of GH*(n) and GS*(n) for n ∈ ℕ1000 in Figure 6.13.
In addition to defining the generalized Shannon entropy by Eq. (6.64), it seems
also reasonable to express it by the interval [S(D), S̄(D)], where S(D) and
S̄(D) are, respectively, the minimum and maximum values of the Shannon
entropy within a given convex set D of probability distributions. Then, an
alternative total uncertainty, TU′, is defined as the pair
TU′ = ⟨GH, [S, S̄]⟩.    (6.72)
While this measure is quite expressive, as it captures all possible values of the
Shannon entropy within each given set D, its properties and utility have yet
to be investigated (see Note 6.10).
6.10. ALTERNATIVE VIEW OF DISAGGREGATED
TOTAL UNCERTAINTY
The disaggregated measure of total uncertainty introduced in Section 6.8 is
based on accepting the generalized Hartley functional as a measure of nonspecificity and using its numerical complement with respect to the aggregated
measure S̄ as a generalization of the Shannon entropy. This approach to disaggregating S̄ is reasonable since the generalized Hartley functional is well
justified (on both intuitive and mathematical grounds) as a measure of
nonspecificity in at least all the uncertainty theories that are subsumed under
DST. No functional with a similar justification has been found to generalize
the Shannon entropy, as is discussed in Section 6.5.
Although the full justification of the generalized Hartley functional does
not extend to all theories of uncertainty, due to the lack of subadditivity (as
shown in Example 6.7), this does not hinder its role in the disaggregated
measure. The two components in the disaggregated measure are defined in
such a way that whenever one of them violates any of the required properties,
the other one compensates for these violations.
Recall that the measure S̄, which is well justified in all uncertainty theories,
aggregates two types of uncertainty: nonspecificity and conflict. In classical
uncertainty theories, these types of uncertainty are measured by the Hartley
functional and the Shannon functional, respectively. In the various generalizations of the classical theories, appropriate counterparts of these classical
measures of nonspecificity and conflict are needed. This suggests looking for
justifiable generalizations of the Hartley and Shannon functionals in the
various nonclassical uncertainty theories. However, all attempts to generalize
these functionals independently of each other have failed. This eventually led
to the idea of disaggregating the aggregated measure S̄. According to this idea,
the generalization of the classical functionals should be constrained by the
requirement that their sum always be equal to S̄. One way of satisfying this
requirement is to choose one of the components of the disaggregated measure
and define the other one as its numerical complement with respect to S̄.
Clearly, there are two ways to pursue this approach to the disaggregation of
S̄: the chosen component is either a measure of nonspecificity or a measure of
conflict.
The disaggregated total uncertainty introduced in Section 6.8 is based on
choosing the generalized Hartley functional as a measure of nonspecificity. The
aim of this section is to explore the other possibility: to choose a particular
measure of conflict, and to define the measure of nonspecificity as its numerical complement with respect to S̄. Since, in this context, the chosen measure
of conflict does not have to satisfy all the required properties, all the proposed
generalizations of the Shannon entropy, which are discussed in Section 6.5, are
in principle applicable. However, a measure that seems to capture best the
generalized Shannon entropy is the functional S defined for any arbitrary
closed and convex set D of probability distributions by the formula
S(D) = min_{p∈D} {−Σ_{x∈X} p(x) log2 p(x)}.    (6.73)
This measure has been neglected due to its massive violations of the
subadditivity requirement. However, this deficiency does not matter when S
is used as one component of the disaggregated measure of total uncertainty.
Functional S is intuitively a good choice for a generalized Shannon entropy,
since it measures (by the Shannon entropy itself) the essential (irreducible)
amount of conflict embedded in any given credal set D. When it is accepted
as a measure of conflict, then the measure of nonspecificity, N, is defined for
any given credal set D by the formula
N(D) = S̄(D) − S(D).    (6.74)
This means that, according to this view of disaggregating S̄, the measure of
nonspecificity is not a generalized Hartley measure anymore.
Functionals S and N have been proved to possess the following properties
(Note 6.10):
1. The range of both S and N is [0, log2 |X|], provided that the units of measurement are bits. The lower and upper bounds are reached under the following conditions:
   • S(D) = 0 iff sup_{p∈D} {p(x)} = 1 for some x ∈ X.
   • S(D) = log2 |X| iff D contains only the uniform probability distribution on X.
   • N(D) = 0 iff |D| = 1.
   • N(D) = log2 |X| iff sup_{p∈D} {p(x)} = 1 for some x ∈ X and D contains the uniform probability distribution on X.
2. S is monotone decreasing and N is monotone increasing with respect to the subsethood relation between credal sets: if D ⊆ D′, then S(D) ≥ S(D′) and N(D) ≤ N(D′).
3. Both S and N are additive.
4. Both S and N are continuous.
Taking into account all these properties, it makes sense to define for any given credal set D an alternative measure of disaggregated total uncertainty, aTU, by the formula
aTU(D) = ⟨S̄(D) − S(D), S(D)⟩.    (6.75)
In Figure 6.14, plots of the functionals involved in aTU (i.e., S̄, N = S̄ - S, and
S) are shown for the class of simple bodies of evidence introduced in Example
6.14. These are counterparts of the plots shown in Figure 6.11 of the functionals
involved in TU. In this case,
S̄(m) = { S(a, 1 − a)  when m1 + m2 ≤ 1
       { S(b, 1 − b)  when m1 + m2 ≥ 1,
where a = max(m1, m2, 0.5) and b = max(1 − m1, 1 − m2, 0.5), and
S(m) = { S(c, 1 − c)  when m1 + m2 ≤ 1
      { S(d, 1 − d)  when m1 + m2 ≥ 1,
where c = min(m1, m2) and d = min(1 − m1, 1 − m2), and N(m) = S̄(m) − S(m); S denotes the Shannon entropy.
The practical utility of the alternative measure aTU of disaggregated total
uncertainty is contingent on an efficient algorithm for computing S. While it
has been established that, due to the concavity of the Shannon entropy, S(D)
is always obtained at some extreme point of D, the computational complexity of searching through the extreme points is still very high. Since the upper bound on the number of extreme points is |X|!, some way of avoiding an exhaustive search is essential. Some results in this direction have been obtained
already (Note 6.10).
EXAMPLE 6.17. Consider the credal set D defined in Example 6.7. Clearly, S(D) is obtained for the extreme point pB, as can be seen from the plot of S(λpA + (1 − λ)pB) in Figure 6.8, where pA = ⟨0.4, 0.4, 0.2, 0⟩ and pB = ⟨0.6, 0.2, 0, 0.2⟩. We have S(D) = S(pB) = 1.371. Since S̄(D) = 1.693 in this case (see Example 6.13), we have
aTU(D) = ⟨0.322, 1.371⟩.
Figure 6.14. Counterparts of plots in Figure 6.11 for aTU defined by Eq. (6.75).
EXAMPLE 6.18. Consider the credal set m2D in Example 4.5, which is defined by convex combinations of four extreme points:
p1 = ⟨0.2, 0.3, 0.5⟩,  p2 = ⟨0.4, 0.1, 0.5⟩,  p3 = ⟨0.3, 0.3, 0.4⟩,  p4 = ⟨0.4, 0.2, 0.4⟩.
These points are obtained via the interaction representation of the given lower probability function m2, as is shown in Figure 4.5. Clearly, S(p1) = 1.485, S(p2) = 1.361, S(p3) = 1.571, and S(p4) = 1.522. Hence, S(m2D) = S(p2) = 1.361. Using Algorithm 6.1, it is easy to compute S̄(m2D) = S(0.3, 0.3, 0.4) = S(p3) = 1.571. Then,
aTU(m2D) = ⟨0.21, 1.361⟩.
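The numbers in Example 6.18 can be checked with a few lines of Python. This is only a sketch: the extreme points are taken from the example, the value S̄ = 1.571 is the value stated there (Algorithm 6.1 is not reimplemented here), and the minimum is taken over extreme points on the strength of the concavity argument above.

```python
from math import log2

def shannon(p):
    return -sum(x * log2(x) for x in p if x > 0)

# Extreme points of the credal set from Example 6.18.
extreme_points = [(0.2, 0.3, 0.5), (0.4, 0.1, 0.5), (0.3, 0.3, 0.4), (0.4, 0.2, 0.4)]

s_min = min(shannon(p) for p in extreme_points)   # S(D) = 1.361, attained at p2
s_bar = 1.571                                     # S_bar(D), as stated in the example
print(round(s_bar - s_min, 3), round(s_min, 3))   # aTU components: (0.21, 1.361)
```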
A pictorial overview of justified measures of uncertainty in GIT is given in Figure 6.15. Although other disaggregations of S̄ are possible, at least in principle, the two shown in the figure, TU and aTU, seem to be the best choices on both intuitive and mathematical grounds. Needless to say, the issue of how to disaggregate S̄ has not been sufficiently investigated as yet.
[Figure 6.15 diagram: the Hartley measure H of classical possibility theory and the Shannon entropy S of classical probability theory (the classical uncertainty theories) are aggregated in the generalized uncertainty theories into S̄, whose disaggregations are TU = ⟨GH, S̄ − GH⟩ with GS = S̄ − GH, and aTU = ⟨S̄ − S, S⟩ with N = S̄ − S.]
Figure 6.15. An overview of justified measures of uncertainty in GIT.
6.11. UNIFYING FEATURES OF UNCERTAINTY MEASURES
The formalization of uncertainty functions involves a considerable diversity,
as is demonstrated in Chapter 5. However, it also involves some unifying features, which are examined in Chapter 4. Functionals that for each type of
uncertainty function measure the amount of uncertainty are equally diverse
while, at the same time, they also exhibit some unifying features. The following are some of these unifying features:
1. The aggregate uncertainty S̄ is defined by one formula, expressed by Eq.
(6.61), regardless of the uncertainty theory involved.
2. The definition of the generalized Hartley functional GH, when expressed by Eq. (6.38), does not change when we move from one uncertainty theory to another.
3. The measure of conflict, S, is defined in the same way, expressed by Eq. (6.73), regardless of the uncertainty theory involved.
4. It follows from features 1 to 3 that the two versions of the disaggregated total uncertainty, TU and aTU, are defined in the same way in all theories of uncertainty.
5. In all the generalized theories of uncertainty that have been adequately
developed, we observe that the various equations established in classical uncertainty theories, which express how joint, marginal, and conditional uncertainty measures (and also the various information
transmissions) are interrelated, hold as well.
These unifying features of uncertainty measures allow us to work with any set
of recognized and well-developed theories of uncertainty as a whole. Some
aspects of this issue are discussed further in Chapter 9.
NOTES
6.1. The U-uncertainty was proposed by Higashi and Klir [1983a]. They also formulated the axiomatic requirements for U-uncertainty given in Section 6.2.3 (except
the branching axiom) and proved that the functional defined by Eq. (6.2) or, alternatively, Eq. (6.6), satisfies them. The possibilistic branching requirement was formulated by Klir and Mariano [1987], and they used it for proving the uniqueness
of the U-uncertainty. It was shown by Ramer and Lander [1987] that the branching axiom is essential for obtaining a unique generalization of the Hartley
measure for the theory of graded possibilities.
6.2. The generalization of the U-uncertainty to Dempster–Shafer theory was suggested by Dubois and Prade [1985c]. The uniqueness of this generalized Hartley
measure, defined by Eq. (6.27), was proved by Ramer [1986, 1987]. The measure
was also investigated and compared with other measures by Dubois and Prade
[1987d].
6.3. An alternative measure of nonspecificity for graded possibilities, which is not a generalization of the Hartley measure, was proposed by Yager [1982c, 1983]. It is a functional a defined by the formula
a(m) = 1 − Σ_{A∈F} m(A)/|A|,
where ⟨F, m⟩ is a given body of evidence in Dempster–Shafer theory. This functional does not satisfy some of the axiomatic requirements for a measure of uncertainty. In particular, it does not satisfy the requirements of subadditivity and additivity.
6.4. The ordering of bodies of evidence in DST was introduced by Dubois and Prade
[1986a, 1987d]. A broader discussion of the concept of ordering bodies of evidence in DST is in [Dubois and Prade, 1986a].
6.5. The formulation of the symmetry and branching axioms and the proof of uniqueness of the generalized Hartley measure in DST, which is presented in Section
6.3 and in Appendix B, are due to Ramer [1986, 1987]. Ramer [1990a] also studied
the relationship among the various axiomatic requirements for the generalized
Hartley measure in possibility theory and DST.
6.6. The generalization of the Hartley measure for convex sets of probability
distributions was proposed by Abellán and Moral [2000]. Relevant proofs
regarding properties of this generalized measure are covered in the paper.
Among them, the proof of monotonicity of the measure is the most complex one.
Contrary to the claims in the paper, the generalized Hartley functional is not subadditive for arbitrary credal sets, as is demonstrated in Example 6.7. However,
this is no deficiency in the context of the disaggregated measure of total uncertainty TU.
6.7. The following is a chronological list of main publications regarding the various
unsuccessful attempts to generalize the Shannon entropy to DST that are surveyed in Section 6.5:
• Höhle [1982] proposed the measure of confusion C;
• Yager [1983] proposed the measure of dissonance E;
• Lamata and Moral [1988] suggested the sum E + GH;
• Klir and Ramer [1990] proposed the measure of discord D and further investigated its properties as well as the properties of D + GH in Ramer and Klir
[1993];
• Properties of discord in possibility theory were investigated by Geer and Klir
[1991];
• Further examination of the measure of discord by Klir and Parviz [1992]
revealed that it had a conceptual defect, and to correct it they suggested the
measure of strife ST;
• Klir and Yuan [1993] showed that the distinction between strife and discord
reflects the distinction between disjunctive and conjunctive set-valued statements. The latter distinction is discussed, for example, in [Yager, 1987a];
• The violation of subadditivity was demonstrated for D + GH by Vejnarová
[1991], and for ST + GH by Vejnarová and Klir [1993];
• Functional AS based on the interaction representation was suggested by Yager
[2000] and Marichal and Roubens [2000] and its violation of subadditivity is
shown in [Klir, 2003].
6.8. The idea of the aggregated measure of uncertainty in DST, which is discussed in
Section 6.6, emerged in the early 1990s. It seems that it was conceived independently and almost simultaneously by several authors. The following are relevant
publications: [Chokr and Kreinovich, 1991, 1994], [Maeda and Ichihashi, 1993],
[Chau, et al., 1993], [Harmanec and Klir, 1994], and [Abellán and Moral, 1999].
Algorithms for computing the functional AU were investigated by Maeda et al.
[1993], Meyerowitz et al. [1994], and Harmanec et al. [1996]; the algorithm and
its proof of correctness that are presented in Sections 6.6.1 and 6.6.2 and Appendix C are adopted from the last reference. Harmanec [1995, 1996] made progress toward the proof of uniqueness of AU by proving that AU is the smallest among all functionals, if any exist besides AU, that satisfy the requirements (AU1)–(AU5). Abellán and Moral [2003a] generalized the aggregate measure
of uncertainty to arbitrary credal sets and developed a more efficient algorithm
for computing it within the theory based on reachable interval-valued probability distributions. They also illustrated a practical utility of the aggregated measure
[Abellán and Moral, 2003b].
6.9. The disaggregated total uncertainty introduced in Section 6.8 was proposed and
investigated by Smith [2000]. A part of his investigations is the proof of the proper
range of the generalized Shannon entropy that is presented in Appendix D.
6.10. The importance of both S and S̄ is discussed in Kapur et al. [1995] and in Kapur
[1994, Chapter 23]. Also discussed in these references is the role of the difference
S̄ - S. More recently, Abellán and Moral [2004, 2005] investigated further properties of this difference and described an algorithm for calculating the value of
S for any lower probability that is 2-monotone. They also suggested using the difference as a measure of nonspecificity. This suggestion led to the alternative view
of disaggregated total uncertainty, which is discussed in Section 6.10.
EXERCISES
6.1. Derive Eq. (6.3) from Eq. (6.2).
6.2. For each of the possibility profiles on X in Table 6.5a, calculate:
(a) The U-uncertainty by Eq. (6.2)
(b) The U-uncertainty by Eq. (6.6)
(c) The aggregate measure of uncertainty AU
6.3. For each pair of marginal possibility profiles on X and Y in Table 6.5b,
calculate
(a) The marginal U-uncertainties;
(b) The joint U-uncertainty under the assumption of noninteraction of
the marginal possibility profiles.
6.4. Express each of the marginal possibility profiles in Table 6.5b as a body
of evidence in DST and calculate its nonspecificity. Then, determine by
Table 6.5. Supporting Information for Various Exercises
X
1
r
r
3
r
4
r
2
x1
x2
x3
x4
x5
x6
x7
x8
1.0
0.4
0.6
0.7
0.0
0.5
0.6
0.9
0.3
0.5
0.5
1.0
0.7
1.0
0.4
1.0
0.9
0.8
0.3
1.0
1.0
0.8
1.0
0.9
0.5
0.6
1.0
0.7
0.5
0.5
1.0
0.5
(a)
X
1
rX
rX
3
rX
2
x1
x2
x3
0.6
1.0
1.0
1.0
0.5
0.3
0.8
0.0
0.2
1
2
rY
1.0
1.0
0.5
3
rY
0.4
0.7
1.0
rY
0.9
1.0
0.8
y1
y2
y3
Y
(b)
X
Y
X
1
r
x1
x2
x3
y1
y2
y3
y4
0.0
0.8
0.6
0.4
0.5
1.0
0.7
0.6
0.7
0.9
0.8
0.8
x4
2
r
x1
x2
x3
x4
0.0
0.7
1.0
1.0
y1
y2
y3
y4
1.0
0.7
0.7
1.0
0.5
0.0
0.4
0.8
0.6
0.0
0.5
0.9
1.0
0.8
0.7
1.0
Y
X
Y
3
r
x1
x2
x3
x4
y1
y2
y3
y4
0.3
0.5
0.5
0.3
0.6
0.7
0.8
0.9
0.9
0.8
0.7
0.6
1.0
0.9
0.7
0.4
(c)
the calculus of DST, for each pair of these marginal bodies of evidence on X and Y, the corresponding joint body of evidence under the assumption of their noninteraction, and calculate:
(a) Its nonspecificity (compare these values with their counterparts in
Exercise 6.3b);
(b) Its aggregate measure of uncertainty AU.
6.5. For each of the joint possibility profiles in Table 6.5c, calculate
(a) The joint U-uncertainty and its associated marginal U-uncertainties;
(b) Both conditional U-uncertainties;
(c) The information transmission.
6.6. For each of the joint possibility profiles in Table 6.5c, calculate:
(a) The aggregate measure of uncertainty AU and its marginal
counterparts;
(b) The total disaggregated uncertainty TU and its marginal
counterparts.
6.7. Repeat Exercise 6.5 for the joint possibility profile derived from
the respective marginal possibility profiles under the assumption of
noninteraction.
6.8. For each of the possibility profiles in Table 6.5a and some k ≥ 3, apply
the branching axiom in Eq. (6.24) for calculating the U-uncertainty in
two stages.
6.9. Let the range of real-valued variable x be [0, 100]. Assume that we are
able to assess the value of the variable only approximately in terms of
the possibility profile
r(x) = { 1 − |x − 5|/3   when x ∈ [2, 8]
       { 0               otherwise.
Plot the possibility profile and calculate:
(a) The nonspecificity of this assessment;
(b) The amount of information obtained by this assessment.
6.10. Repeat Exercise 6.9 for the following possibility profiles and ranges of the variable:
(a) x ∈ [−10, 10] and r(x) = 1 − x² when x ∈ [−1, 1], and r(x) = 0 otherwise;
(b) x ∈ [0, 20] and r(x) = x² when x ∈ [0, 1], r(x) = (2 − x)² when x ∈ [1, 2], and r(x) = 0 otherwise;
(c) x ∈ [0, 10] and r(x) = 1/(1 + 10(x − 2)²).
6.11. For each of the Möbius representations of the joint lower probability functions on subsets of X × Y = {x1, x2} × {y1, y2} that are defined in Table 6.6, where zij = ⟨xi, yj⟩ for all i, j = 1, 2, calculate:
(a) The generalized Hartley measure GH(X ¥ Y) defined by Eq. (6.38);
(b) The generalized Shannon measure GS(X ¥ Y) defined by Eq. (6.64).
Table 6.6. Möbius Representations Employed in Exercises 6.11–6.13

C                        m1(C)   m2(C)   m3(C)   m4(C)   m5(C)   m6(C)
∅                         0.0     0.0     0.0     0.0     0.0     0.00
{z11}                     0.1     0.0     0.0     0.1     0.0     0.15
{z12}                     0.1     0.1     0.1     0.1     0.0     0.03
{z21}                     0.2     0.1     0.0     0.0     0.0     0.00
{z22}                     0.4     0.2     0.2     0.3     0.0     0.10
{z11, z12}                0.0     0.0     0.3     0.0     0.3     0.11
{z11, z21}                0.0     0.2     0.0     0.0     0.0     0.00
{z11, z22}                0.0     0.2    −0.2     0.0     0.0     0.30
{z12, z21}                0.0     0.1    −0.1     0.2     0.0     0.00
{z12, z22}                0.0     0.1     0.0     0.0     0.0     0.07
{z21, z22}                0.0     0.3     0.3     0.1     0.5     0.00
{z11, z12, z21}           0.1     0.0     0.0     0.1     0.2     0.00
{z11, z12, z22}           0.1     0.0     0.2     0.1     0.2     0.22
{z11, z21, z22}           0.0    −0.2     0.1     0.1     0.1     0.00
{z12, z21, z22}           0.1    −0.1     0.1     0.1     0.1     0.00
{z11, z12, z21, z22}     −0.1     0.0     0.0    −0.2    −0.4     0.02
6.12. Repeat Exercise 6.11 for:
(a) The generalized marginal Hartley and Shannon measures;
(b) The generalized conditional Hartley and Shannon measures;
(c) The information transmissions based on the generalized Hartley and
Shannon measures.
6.13. Repeat Exercise 6.11 by calculating the following:
(a) The measure of dissonance defined by Eq. (6.41);
(b) The measure of confusion defined by Eq. (6.42);
(c) The measure of discord defined by Eq. (6.51);
(d) The measure of strife defined by Eq. (6.55);
(e) The measure of average Shannon entropy defined by Eq. (6.59).
6.14. Compare S̄(m), TU(m), and aTU(m) for the simple body of evidence
illustrated in Figure 6.9.
6.15. For each of the convex sets of probability distributions discussed in
Example 6.6 (see also Table 6.1), compare S̄(iD), TU(iD), and aTU(iD).
6.16. Assume that a given body of evidence ⟨F, m⟩ consists of pairwise disjoint focal sets. Show that under this assumption
GS(m) = −Σ_{A∈F} m(A) log2 m(A),
where GS is the generalized Shannon entropy defined by S̄.
6.17. Consider the convex sets of probability distributions on X = {x1, x2, x3} defined by convex hulls of the following extreme points:
(a) 1p = ⟨0.4, 0.5, 0.1⟩; 2p = ⟨0.6, 0.3, 0.1⟩; 3p = ⟨0.6, 0.2, 0.2⟩; 4p = ⟨0.4, 0.2, 0.4⟩; 5p = ⟨0.1, 0.5, 0.4⟩
(b) 1p = ⟨0.4, 0.5, 0.1⟩; 2p = ⟨0.4, 0.2, 0.4⟩; 3p = ⟨0.1, 0.5, 0.4⟩
(c) 1p = ⟨0.4, 0.5, 0.1⟩; 2p = ⟨0.6, 0.2, 0.2⟩; 3p = ⟨0.4, 0.2, 0.4⟩
(d) 1p = ⟨0.4, 0.5, 0.1⟩; 2p = ⟨0.4, 0.2, 0.4⟩
(e) 1p = ⟨0, 0.2, 0.8⟩; 2p = ⟨0.2, 0.2, 0.8⟩; 3p = ⟨0.5, 0.1, 0.5⟩; 4p = ⟨0, 0.5, 0.5⟩
(f) 1p = ⟨0, 0.5, 0.5⟩; 2p = ⟨0.5, 0, 0.5⟩; 3p = ⟨0.5, 0.5, 0.5⟩
(g) 1p = ⟨0, 1, 0⟩; 2p = ⟨0, 0.5, 0.5⟩; 3p = ⟨0.5, 0.5, 0⟩
(h) 1p = ⟨0, 1, 0⟩; 2p = ⟨1, 0, 0⟩; 3p = ⟨0.4, 0.4, 0.4⟩
For each of the convex sets of probability distribution functions, calculate TU and aTU.
6.18. Calculate TU and aTU for the interval-valued probability distribution I = ⟨[0, 0.2], [0, 0.6], [0.3, 0.5], [0.2, 0.4]⟩ on X = {x1, x2, x3, x4}.
7
FUZZY SET THEORY
Vagueness is no more to be done away with in the world of logic than friction in
mechanics.
—Charles Sanders Peirce
7.1. AN OVERVIEW
The notion of the most common type of fuzzy sets, referred to as standard
fuzzy sets, is introduced in Section 1.4. Recall that each standard fuzzy set is
uniquely defined by a membership function of the form
A : X → [0, 1],
where X is the universal set of concern. For each x Œ X, the value A(x)
expresses the degree (or grade) of membership of the element x of X in standard fuzzy set A. For the sake of notational simplicity, the symbol of a given
membership function, A, is also employed as a label of the standard fuzzy set
defined by this function. It is obvious that no ambiguity is introduced by this
double use of the same symbol.
Recall also from Chapter 1 that classical sets, when viewed as special fuzzy
sets, are usually referred to as crisp sets. It is common in the literature to
describe a standard fuzzy set A on a finite universal set X by the special form
A = a1/x1 + a2/x2 + . . . + an/xn,
where xi ∈ X for all i ∈ ℕn and each ai denotes the degree of membership of
element xi in A. Each slash is employed in this form to link an element of X
with its degree of membership in A, and the plus signs indicate that the listed
pairs collectively form the definition of the set A. The pairs with zero membership degrees are usually not listed.
The purpose of this chapter is to introduce the fundamentals of fuzzy set
theory. This background is needed for understanding how the two classical
uncertainty theories (surveyed in Chapters 2 and 3) and their generalizations
based on the various types of monotone measures (discussed in Chapters 4–6)
can be fuzzified. In other words, this background is needed to understand how
these various uncertainty theories, all described in terms of the formalized
language of classical set theory, can be further generalized via the more expressive formalized languages of fuzzy set theory. This generalization is manifested in the 2-dimensional array in Figure 1.3 by the horizontal expansion of
column 1.
The common feature of all fuzzy sets is that the membership of any relevant object in any fuzzy set is a matter of degree. However, there are distinct
ways of expressing membership degrees, which result in distinct categories of
fuzzy sets. In standard fuzzy sets, membership degrees are expressed by real
numbers in the unit interval. In other, nonstandard fuzzy sets, they may
be expressed by intervals of real numbers, partially ordered qualitative
descriptors of membership degrees, and in numerous other ways. Given any
particular category of fuzzy sets, operations of set intersection, union, and complementation are not unique. Further distinctions within the category can thus
be made by choosing various specific operations from the class of possible
operations. Each choice induces an algebraic structure of some type on the
given category of fuzzy sets. These algebraic structures are always weaker than
Boolean algebra, that is, they are non-Boolean. The term “fuzzy set theory”
thus stands for a collection of theories, each dealing with fuzzy sets in a particular category by specific operations and, consequently, based on a nonBoolean algebraic structure of some type. A fuzzified uncertainty theory is
then obtained by formalizing a monotone measure of some type in terms of
this algebraic structure, as illustrated in Figure 7.1.
Standard fuzzy sets have been predominant in the literature and, moreover,
virtually all fuzzifications of the various uncertainty theories that are currently
described in the literature are based on standard fuzzy sets. In this chapter it
is thus natural to examine standard fuzzy sets in more detail than other types
of fuzzy sets. However, the amount of research work on the various nonstandard categories of fuzzy sets has visibly increased during the last few years,
primarily in response to emerging applications needs. Since the aim of generalized information theory (GIT) is to expand the development of uncertainty
theories in both the dimensions depicted in Figure 1.3, it is essential to survey
in this chapter those nonstandard types of fuzzy sets that have been proposed
in the literature. However, this survey, which is the subject of Section 7.8, is
only relevant to future research in GIT. It is not needed for understanding the
Non-boolean
algebra of
some type
Monotone
measure of
some type
Fuzzified
uncertainty
theory
Figure 7.1. Components of a fuzzified uncertainty theory.
fuzzified theories of uncertainty examined in Chapter 8, which are, by and
large, based on standard fuzzy sets.
7.2. BASIC CONCEPTS OF STANDARD FUZZY SETS
Since this section (and the subsequent Sections 7.3–7.7) deal only with standard fuzzy sets, the adjective “standard” is omitted for the sake of linguistic
simplicity.
As in classical set theory, the concept of subsethood is one of the most fundamental concepts in fuzzy set theory. Contrary to classical set theory,
however, this concept has two distinct meanings in fuzzy set theory. One of
them is a crisp subsethood, the other one is a fuzzy subsethood. These two concepts of subsethood are introduced as follows.
Given two fuzzy sets A, B defined on the same universal set X, A is said to
be a subset of B if and only if
A(x) ≤ B(x)
for all x ∈ X. The usual notation, A ⊆ B, is used to signify the subsethood relation. According to this definition, clearly, set A is either a subset of set B or it
is not a subset of B (and, similarly, the other way around). Therefore, this definition of subsethood is called a crisp subsethood. The set of all fuzzy subsets
of X (defined in this way) is called the fuzzy power set of X and is denoted by
F(X). Observe that this set is crisp, even though its members are fuzzy sets.
Moreover, this set is always infinite, even if X is finite.
When X is a finite universal set, a fuzzy subsethood relation, Sub, is defined
by specifying for each pair of fuzzy sets on X, A and B, a degree of subsethood, Sub(A, B), by the formula
Sub(A, B) = [Σ_{x∈X} A(x) − Σ_{x∈X} max{0, A(x) − B(x)}] / Σ_{x∈X} A(x).    (7.1)
The negative term in the numerator describes the sum of the degrees to which
the subset inequality A(x) ≤ B(x) is violated, the positive term describes the
largest possible violation of the inequality, the difference in the numerator
describes the sum of the degrees to which the inequality is not violated, and
the term in the denominator is a normalizing factor to obtain the range
0 ≤ Sub(A, B) ≤ 1.
When sets A and B are defined on a bounded subset of real numbers (i.e., X
is a closed interval of real numbers), the three Σ terms in Eq. (7.1) are replaced
with integrals over X.
For any fuzzy set A defined on a finite universal set X, its scalar cardinality, |A|, is defined by the formula
|A| = Σ_{x∈X} A(x).    (7.2)
Scalar cardinality is sometimes referred to in the literature as a sigma count.
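Both quantities are trivial to compute over a finite universal set. The following Python sketch implements Eqs. (7.1) and (7.2); the fuzzy sets used as input are hypothetical examples, not sets defined in the text.

```python
def scalar_cardinality(A):
    # Eq. (7.2): |A| is the sum of the membership degrees (the "sigma count")
    return sum(A.values())

def subsethood(A, B):
    # Eq. (7.1): degree to which A(x) <= B(x) holds over a finite universal set
    total = sum(A.values())
    violation = sum(max(0.0, A[x] - B.get(x, 0.0)) for x in A)
    return (total - violation) / total

# Illustrative fuzzy sets (hypothetical membership degrees).
A = {'x1': 0.3, 'x2': 0.8, 'x3': 1.0}
B = {'x1': 0.5, 'x2': 0.6, 'x3': 1.0}
print(scalar_cardinality(A), round(subsethood(A, B), 3))   # 2.1 and (2.1 - 0.2)/2.1
```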
Among the most important concepts of standard fuzzy sets are the concepts
of an a-cut and a strong a-cut. Given a fuzzy set A defined on X and a particular number a in the unit interval [0, 1], the a-cut of A, denoted by aA, is a
crisp set that consists of all elements of X whose membership degrees in A
are greater than or equal to a. This can formally be written as
aA = {x | A(x) ≥ a}.
The strong a-cut, a+A, has a similar meaning, but the condition "greater than or equal to" is replaced with the stronger condition "greater than." Formally,
a+A = {x | A(x) > a}.
The set 0+A is called the support of A and the set 1A is called the core of A.
When the core A is not empty, A is called normal; otherwise, it is called subnormal. The largest value of A is called the height of A and it is denoted by
hA. The set of distinct values A(x) for all x ŒX is called the level set of A and
it is denoted by LA.
All the introduced concepts are illustrated in Figure 7.2. We can see that
a1A ⊆ a2A  and  a1+A ⊆ a2+A
when a1 ≥ a2. This implies that the set of all distinct a-cuts (as well as strong a-cuts) is always a nested family of crisp sets. When a is increased, the new a-cut (strong a-cut) is always a subset of the previous one. Clearly, 0A = X and 1+A = ∅.
Figure 7.2. Illustration of some basic characteristics of fuzzy sets.
It is well established that each fuzzy set is uniquely represented by the associated family of its a-cuts via the formula
A(x) = sup{a · aA(x) | a ∈ [0, 1]},    (7.3)
or by the associated family of its strong a-cuts via the formula
A(x) = sup{a · a+A(x) | a ∈ [0, 1]},    (7.4)
where sup denotes the supremum of the respective set and aA (or a+A) denotes for each a ∈ [0, 1] the special membership function (characteristic function) of the a-cut (or strong a-cut, respectively).
EXAMPLE 7.1. To illustrate the meaning of Eq. (7.3), consider a fuzzy set A defined on ℝ by the simple stepwise membership function shown in Figure 7.3a. Since LA = {0, 0.3, 0.6, 1} and 0 · 0A(x) = 0 for all x ∈ ℝ, A is fully represented by the three special fuzzy sets a · aA (a = 0.3, 0.6, 1), whose membership functions are shown in Figure 7.3b. The membership function A is uniquely reconstructed from these three membership functions by taking for each x ∈ ℝ their supremum. The same can clearly be accomplished by the strong a-cuts and Eq. (7.4).
Figure 7.3. Illustration of the a-cut representation of Eq. (7.3): (a) given fuzzy set A; (b) decomposition of A into special fuzzy sets.
EXAMPLE 7.2. An alternative way to illustrate the meaning of Eq. (7.3) is
to use a fuzzy set defined on a finite universal set. Let
A = 0.3/x1 + 0.5/x2 + 1.0/x3 + 1.0/x4 + 0.8/x5 + 0.5/x6.
This fuzzy set is fully represented by the following four special fuzzy sets:
0.3 · 0.3A = 0.3/x1 + 0.3/x2 + 0.3/x3 + 0.3/x4 + 0.3/x5 + 0.3/x6
0.5 · 0.5A = 0.0/x1 + 0.5/x2 + 0.5/x3 + 0.5/x4 + 0.5/x5 + 0.5/x6
0.8 · 0.8A = 0.0/x1 + 0.0/x2 + 0.8/x3 + 0.8/x4 + 0.8/x5 + 0.0/x6
1 · 1A = 0.0/x1 + 0.0/x2 + 1.0/x3 + 1.0/x4 + 0.0/x5 + 0.0/x6.
Now taking for each xi the maximum value in these sets, we readily obtain the
original fuzzy set A.
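The decomposition used in Example 7.2 can be verified mechanically. The following Python sketch builds the a-cuts of the fuzzy set in the example and reconstructs the membership function by the supremum in Eq. (7.3); the dictionary representation of the fuzzy set is an implementation choice, not notation from the text.

```python
A = {'x1': 0.3, 'x2': 0.5, 'x3': 1.0, 'x4': 1.0, 'x5': 0.8, 'x6': 0.5}

levels = sorted({v for v in A.values() if v > 0})                 # nonzero part of the level set
cuts = {a: {x for x, v in A.items() if v >= a} for a in levels}   # a-cuts

# Eq. (7.3): A(x) is the largest a for which x belongs to the a-cut.
reconstructed = {x: max([a for a in levels if x in cuts[a]], default=0.0) for x in A}
print(reconstructed == A)   # True
```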
The significance of the a-cut (or strong a-cut) representation of fuzzy sets
is that it connects fuzzy sets with crisp sets. While each crisp set is a collection
of objects that are conceived as a whole, each fuzzy set is a collection of nested
crisp sets that are also conceived as a whole. Fuzzy sets are thus wholes of a
higher category.
The a-cut representation of fuzzy sets allows us to extend the various properties of crisp sets, established in classical set theory, into their fuzzy counterparts. This is accomplished by requiring that the classical property be satisfied
by all a-cuts of the fuzzy set concerned. Any property that is extended in this
way from classical set theory into the domain of fuzzy set theory is called a
cutworthy property. For example, when convexity of fuzzy sets is defined by
the requirement that all a-cuts of a fuzzy convex set be convex sets in the classical sense, this conception of fuzzy convexity is cutworthy. Other important
examples are the concepts of a fuzzy partition, fuzzy equivalence, fuzzy compatibility, and various kinds of fuzzy orderings that are cutworthy.
It is important to realize that many (perhaps most) properties of fuzzy sets,
perfectly meaningful and useful, are not cutworthy. These properties cannot
be derived from classical set theory via the a-cut representation.
7.3. OPERATIONS ON STANDARD FUZZY SETS
As is well known, each of the three basic operations on sets—complementation, intersection, and union—is unique in classical set theory. However, their
counterparts in fuzzy set theory are not unique. Each of them consists of a
class of functions that satisfy certain requirements. In this section, only some
main features of these classes of functions are described.
7.3.1. Complementation Operations
Complementation operations on fuzzy sets are functions of the form
c : [0, 1] → [0, 1]
that are order reversing and such that c(0) = 1 and c(1) = 0. Moreover, they
are usually required to possess the property
c(c(a)) = a    (7.5)
for all a ∈ [0, 1]; this property is called an involution. Given a fuzzy set A and a particular complementation function c, the complement of A with respect to c, cA, is defined for all x ∈ X by the formula
cA(x) = c(A(x)).    (7.6)
An example of a practical class of involutive complementation functions, cλ, is defined for each a ∈ [0, 1] by the formula
cλ(a) = (1 − a^λ)^{1/λ},  λ > 0,    (7.7)
where λ is a parameter whose values specify individual complements in this class. When λ = 1, the resulting complement is usually called a standard complement.
Which of the possible complements to choose is basically an experimental
question. The choice is determined in the context of each particular application by eliciting the meaning of negating a given concept. This can be done
by employing a suitable parametrized class of complementation functions,
such as the one defined by Eq. (7.7), and determining a proper value of the
parameter.
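As a small illustration of Eq. (7.7), the sketch below evaluates cλ for a few parameter values and confirms the involution property numerically; the values of a and λ are arbitrary choices.

```python
def c(a, lam=1.0):
    # Eq. (7.7): involutive complement c_lambda(a) = (1 - a**lambda)**(1/lambda), lambda > 0
    return (1.0 - a ** lam) ** (1.0 / lam)

a = 0.3
for lam in (0.5, 1.0, 2.0):
    # The last printed value returns to 0.3, illustrating Eq. (7.5): c(c(a)) = a.
    print(lam, round(c(a, lam), 3), round(c(c(a, lam), lam), 3))
```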
7.3.2. Intersection and Union Operations
Intersection and union operations on fuzzy sets are defined in terms of functions i and u, respectively, which assign to each pair of numbers in the unit
interval [0, 1] a single number in [0, 1]. These functions, i and u, are commutative, associative, monotone nondecreasing, and consistent with the characteristic functions of classical intersections and unions, respectively. They are
often referred to in the literature as triangular norms (or t-norms) and triangular conorms (or t-conorms). It is well known that the inequalities
imin(a, b) ≤ i(a, b) ≤ min(a, b),    (7.8)
max(a, b) ≤ u(a, b) ≤ umax(a, b)    (7.9)
are satisfied for all a, b ∈ [0, 1], where
imin(a, b) = min(a, b) when max(a, b) = 1, and imin(a, b) = 0 otherwise;    (7.10)
umax(a, b) = max(a, b) when min(a, b) = 0, and umax(a, b) = 1 otherwise.    (7.11)
Operations expressed by min and max are usually called standard operations; those expressed by imin and umax are called drastic operations.
Given fuzzy sets A and B, defined on the same universal set X, their intersections and unions with respect to i and u, A ∩i B and A ∪u B, are defined for each x ∈ X by the formulas
(A ∩i B)(x) = i[A(x), B(x)],    (7.12)
(A ∪u B)(x) = u[A(x), B(x)].    (7.13)
It is useful to capture the full range of intersections and unions, as expressed by the inequalities (7.8) and (7.9), by suitable classes of functions. For example, all functions in the classes
iλ(a, b) = 1 − min{1, [(1 − a)^λ + (1 − b)^λ]^{1/λ}},    (7.14)
uλ(a, b) = min{1, (a^λ + b^λ)^{1/λ}},    (7.15)
where λ > 0, qualify as intersections and unions, respectively, and cover the whole range by varying the parameter λ. The standard operations are obtained in the limit for λ → ∞, while the drastic operations are obtained in the limit for λ → 0. In each particular application, a fitting operation is determined by selecting appropriate values of the parameter λ by knowledge acquisition techniques.
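The convergence toward the standard operations for large λ is easy to observe numerically. The short Python sketch below implements Eqs. (7.14) and (7.15); the argument values and the chosen λ sequence are arbitrary illustrations.

```python
def i_yager(a, b, lam):
    # Eq. (7.14): 1 - min(1, ((1-a)**lam + (1-b)**lam)**(1/lam))
    return 1.0 - min(1.0, ((1.0 - a) ** lam + (1.0 - b) ** lam) ** (1.0 / lam))

def u_yager(a, b, lam):
    # Eq. (7.15): min(1, (a**lam + b**lam)**(1/lam))
    return min(1.0, (a ** lam + b ** lam) ** (1.0 / lam))

a, b = 0.6, 0.7
for lam in (1.0, 2.0, 10.0, 100.0):
    print(lam, round(i_yager(a, b, lam), 3), round(u_yager(a, b, lam), 3))
# As lam grows, the values approach min(a, b) = 0.6 and max(a, b) = 0.7.
```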
7.3.3. Combinations of Basic Operations
In classical set theory, the operations of complementation, intersection, and
union possess the properties listed in Table 1.1. For each universal set X, these
are properties of the Boolean algebra defined on the power set of X. The operations of intersection and union are dual with respect to the operation of complementation in the sense that they satisfy for any pair of subsets of X, A, and
B, the De Morgan laws
the complement of A ∩ B equals Ā ∪ B̄, and the complement of A ∪ B equals Ā ∩ B̄.
It is desirable that this duality be satisfied for fuzzy sets as well. It is obvious
that only some combinations of t-norms, t-conorms, and complementation
operations on fuzzy sets can satisfy this duality. We say that a t-norm i and a
t-conorm u are dual with respect to a complementation operation c if and only
if for all a, b ∈ [0, 1]
c[i(a, b)] = u[c(a), c(b)]    (7.16)
and
c[u(a, b)] = i[c(a), c(b)].    (7.17)
These equations describe the De Morgan laws for fuzzy sets. Triples ⟨i, u, c⟩ that satisfy these equations are called De Morgan triples.
None of the De Morgan triples satisfies all properties of the Boolean
algebra on P(X). Depending on the operations used, various weaker algebraic
structures are obtained. In some of them, for example, the laws of contradiction and excluded middle are violated. In others, these laws are preserved, but
the laws of distributivity are violated.
7.3.4. Other Operations
Two additional types of operations are applicable to fuzzy sets, but they have
no counterparts in classical set theory. They are called modifiers and averaging operations.
Modifiers are unary operations that are order preserving. Their purpose is
to modify fuzzy sets that represent linguistic terms to account for linguistic
hedges such as very, fairly, extremely, more or less, and the like. The most
common modifiers either increase or decrease all values of a given membership function. A convenient class of functions, mλ, that qualify as increasing or decreasing modifiers is defined for each a ∈ [0, 1] by the simple formula
mλ(a) = a^λ,    (7.18)
where λ > 0 is a parameter whose value determines which way and how strongly mλ modifies a given membership function. Clearly, mλ(a) > a when λ ∈ (0, 1), mλ(a) < a when λ ∈ (1, ∞), and mλ(a) = a when λ = 1. The farther the value of λ from 1, the stronger the modifier mλ.
Averaging operations are monotone nondecreasing and idempotent, but are
not associative. Due to the lack of associativity, they must be defined as functions of n arguments for any n ≥ 2. It is well known that any averaging operation, h, satisfies the inequalities
min(a1 , a 2 , . . . , a n ) £ h(a1 , a 2 , . . . , a n ) £ max(a1 , a 2 , . . . , a n )
(7.19)
for any n-tuple (a1, a2, . . . , an) Œ [0, 1]n. This means that the averaging operations fill the gap between intersections (t-norms) and unions (t-conorms).
One class of averaging operations, hl, which covers the entire interval
between min and max operations, is defined for each n-tuple (a1, a2, . . . , an) in
[0, 1]n by the formula
l
l
l
Ê a1 + a 2 + . . . + a n ˆ
hl (a1 , a 2 , . . . , a n ) =
Ë
¯
n
1 l
,
(7.20)
where λ is a parameter whose range is the set of all real numbers except 0. For λ = 0, function hλ is defined by the limit
lim_{λ→0} hλ(a1, a2, . . . , an) = (a1 a2 . . . an)^{1/n},    (7.21)
which is the well-known geometric mean. Moreover,
lim_{λ→−∞} hλ(a1, a2, . . . , an) = min(a1, a2, . . . , an),    (7.22)
lim_{λ→∞} hλ(a1, a2, . . . , an) = max(a1, a2, . . . , an).    (7.23)
This indicates that the standard operations of intersection and union also may
be viewed as extreme opposites in the range of averaging operations.
Other classes of averaging operations are now available, some of which use
weighting factors to express the relative importance of the individual fuzzy
sets involved. For example, the function
h(ai, wi | i = 1, 2, . . . , n) = Σ_{i=1}^{n} wi ai,
where the weighting factors wi usually take values in the unit interval [0, 1] and
Σ_{i=1}^{n} wi = 1,
expresses for each choice of values wi the corresponding weighted average of values ai (i = 1, 2, . . . , n). Again, the choice is an experimental issue.
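The behavior of the class hλ and of the weighted average can be sketched in a few lines of Python; the input values and weights below are arbitrary, and the λ = 0 branch uses the geometric-mean limit of Eq. (7.21).

```python
def h(values, lam):
    # Eq. (7.20), generalized mean; for lam == 0 use the geometric-mean limit, Eq. (7.21)
    n = len(values)
    if lam == 0:
        prod = 1.0
        for a in values:
            prod *= a
        return prod ** (1.0 / n)
    return (sum(a ** lam for a in values) / n) ** (1.0 / lam)

def weighted_average(values, weights):
    # Weighted averaging operation; the weights are assumed to sum to 1
    return sum(w * a for w, a in zip(weights, values))

vals = [0.2, 0.5, 0.9]
print(round(h(vals, -50), 3), round(h(vals, 0), 3), round(h(vals, 50), 3))  # near min, geometric mean, near max
print(weighted_average(vals, [0.5, 0.3, 0.2]))
```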
7.4. FUZZY NUMBERS AND INTERVALS
Fuzzy sets that are defined on the set of real numbers, ℝ, have a special significance in fuzzy set theory. Among these sets, the most important are cutworthy fuzzy intervals. They are defined by requiring that each a-cut be a single closed and bounded interval of real numbers for all a ∈ (0, 1]. A fuzzy interval, A, may conveniently be represented for each x ∈ ℝ by the canonical form
A(x) = { fA(x)   when x ∈ [a, b)
       { 1       when x ∈ [b, c]
       { gA(x)   when x ∈ (c, d]
       { 0       otherwise,    (7.24)
where a, b, c, d are specific real numbers such that a ≤ b ≤ c ≤ d, fA is a real-valued function that is increasing, and gA is a real-valued function that is decreasing. In most applications, functions fA and gA are continuous, but, in general, they may be only semicontinuous from the right and left, respectively. When A(x) = 1 for exactly one x ∈ ℝ (i.e., b = c in the canonical representation), A is called a fuzzy number.
Some common shapes of membership functions of fuzzy numbers or intervals are shown in Figure 7.4. Each of them represents, in a particular way, a
fuzzy set of numbers described in natural language as “close to 1” (or “around
1”). Whether a particular membership function is appropriate for representing this linguistic description can be determined only in the context of each
given application of the linguistic expression. Usually, however, the membership function that is supposed to represent a given linguistic expression in the
context of a given application is constructed from the way in which the linguistic expression is interpreted in this application. The issue of constructing
membership functions is addressed in Section 7.9.
In practical applications, the most common shapes of membership functions
of fuzzy intervals are the trapezoidal ones, illustrated by the membership function in Figure 7.2 and also function B in Figure 7.4. They are easy to represent
and manipulate. Each trapezoidal-shaped membership function is uniquely
defined by the four real numbers a, b, c, d, in Eq. (7.24). Defining a trapezoidal
fuzzy interval A by the quadruple
A = ⟨a, b, c, d⟩
means that
fA(x) = (x − a)/(b − a)  and  gA(x) = (d − x)/(d − c)
in Eq. (7.24). Clearly, triangular-shaped fuzzy numbers are special cases of the trapezoidal-shaped fuzzy intervals in which b = c.
For any fuzzy interval A expressed in the canonical form, the a-cuts of A are expressed for all a ∈ (0, 1] by the formula
aA = { [fA^{-1}(a), gA^{-1}(a)]   when a ∈ (0, 1)
     { [b, c]                    when a = 1,    (7.25)
where fA^{-1} and gA^{-1} are the inverse functions of fA and gA, respectively. When a membership function of a fuzzy interval has a trapezoidal shape, such as the function A in Figure 7.2, the a-cuts can readily be expressed in terms of the four real numbers a, b, c, d by the formula
aA = [a + (b − a)a, d − (d − c)a].    (7.26)
Figure 7.4. Examples of fuzzy sets of numbers that are "close to 1" (or "around 1").
EXAMPLE 7.3. Determine the a-cut representation of the fuzzy number F shown in Figure 7.4, whose membership function is defined for each x ∈ ℝ by the formula
F(x) = { x²         when x ∈ [0, 1)
       { (2 − x)²   when x ∈ [1, 2]
       { 0          otherwise.
In this case, fF(x) = x² and gF(x) = (2 − x)². Moreover, fF(1) = gF(1) = 1. For each a ∈ [0, 1], the left-end point of aF is determined by the positive root of the equation a = x², and the right-end point of aF is determined by the root of the equation a = (2 − x)² that lies in [1, 2]. Hence,
aF = [√a, 2 − √a]  for all a ∈ (0, 1].
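Both Eq. (7.26) and the a-cut formula derived in Example 7.3 translate directly into code; the following Python sketch (helper names are assumptions) evaluates them for sample values.

```python
from math import sqrt

def trapezoid_cut(a, b, c, d, alpha):
    # Eq. (7.26): a-cut of the trapezoidal fuzzy interval <a, b, c, d> at level alpha
    return (a + (b - a) * alpha, d - (d - c) * alpha)

def F_cut(alpha):
    # a-cut of the fuzzy number F from Example 7.3: [sqrt(alpha), 2 - sqrt(alpha)]
    return (sqrt(alpha), 2.0 - sqrt(alpha))

print(trapezoid_cut(1, 2, 3, 4, 0.5))   # (1.5, 3.5)
print(F_cut(0.25))                      # (0.5, 1.5)
```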
7.4.1. Standard Fuzzy Arithmetic
Consider fuzzy intervals A and B whose a-cut representations are
aA = [a, ā],  aB = [b, b̄],
where a, ā, b, b̄ are functions of a. Then, the individual arithmetic operations on these fuzzy intervals are defined for all a ∈ (0, 1] in terms of the well-established arithmetic operations on the closed intervals of real numbers by the formula
a(A * B) = aA * aB,
where * denotes any of the four basic arithmetic operations (addition, subtraction, multiplication, and division). That is, arithmetic operations on fuzzy intervals are cutworthy. According to interval arithmetic,
a(A * B) = {a * b | a ∈ aA, b ∈ aB}    (7.27)
with the requirement that 0 ∉ aB for all a ∈ (0, 1] when * stands for division. The sets a(A * B) (closed intervals of real numbers) defined by Eq. (7.27) can be obtained for the individual operations from the end points of aA and aB in the following ways:
a(A + B) = [a + b, ā + b̄],    (7.28)
a(A − B) = [a − b̄, ā − b],    (7.29)
a(A · B) = [min(ab, ab̄, āb, āb̄), max(ab, ab̄, āb, āb̄)],    (7.30)
a(A / B) = [min(a/b, a/b̄, ā/b, ā/b̄), max(a/b, a/b̄, ā/b, ā/b̄)].    (7.31)
The operation of division, a(A/B), requires that 0 ∉ [b, b̄] for all a ∈ (0, 1].
Fuzzy intervals together with these operations are usually referred to as
standard fuzzy arithmetic. Algebraic expressions in which values of variables
are fuzzy intervals are evaluated by standard fuzzy arithmetic in the same
order as classical algebraic expressions are evaluated by classical arithmetic
on real numbers.
EXAMPLE 7.4. Consider the following two triangular-shaped fuzzy numbers, A and B, defined by the quadruples ⟨a, b, c, d⟩:
A = ⟨1, 2, 2, 3⟩  and  B = ⟨2, 4, 4, 5⟩.
To perform arithmetic operations with these fuzzy numbers, we need to determine their a-cuts. For triangular fuzzy numbers, this can be done via Eq. (7.26):
aA = [1 + a, 3 − a],  aB = [2 + 2a, 5 − a].
Then, by using Eqs. (7.28)–(7.31), we obtain, for example:
a(A + B) = [3 + 3a, 8 − 2a],
a(A − B) = [2a − 4, 1 − 3a],
a(B − A) = [3a − 1, 4 − 2a],
a(A · B) = [(1 + a)(2 + 2a), (3 − a)(5 − a)],
a(A / B) = [(1 + a)/(5 − a), (3 − a)/(2 + 2a)],
a(B / A) = [(2 + 2a)/(3 − a), (5 − a)/(1 + a)].
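The cutworthy character of these operations makes them easy to mechanize: each a-cut is an ordinary closed interval, so standard interval arithmetic applies level by level. The Python sketch below encodes the a-cuts of A and B from Example 7.4 and a few of the interval operations of Eqs. (7.28)–(7.30); it is an illustration only.

```python
def cut_A(alpha):                 # aA = [1 + a, 3 - a], Example 7.4
    return (1 + alpha, 3 - alpha)

def cut_B(alpha):                 # aB = [2 + 2a, 5 - a]
    return (2 + 2 * alpha, 5 - alpha)

def add(x, y):                    # Eq. (7.28)
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):                    # Eq. (7.29)
    return (x[0] - y[1], x[1] - y[0])

def mul(x, y):                    # Eq. (7.30)
    prods = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(prods), max(prods))

alpha = 0.5
print(add(cut_A(alpha), cut_B(alpha)))   # (4.5, 7.0)  = [3 + 3a, 8 - 2a] at a = 0.5
print(sub(cut_A(alpha), cut_B(alpha)))   # (-3.0, -0.5) = [2a - 4, 1 - 3a] at a = 0.5
print(mul(cut_A(alpha), cut_B(alpha)))   # (4.5, 11.25)
```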
7.4.2. Constrained Fuzzy Arithmetic
It turns out that standard fuzzy arithmetic does not take into account constraints among fuzzy intervals employed in algebraic expressions. For example, it does not distinguish between a(A * B), where aB happens to be equal to aA, and a(A * A). In the latter case, both fuzzy intervals represent a value of the same variable, and, consequently, they are strongly constrained: whatever value we select in the first interval, the value in the second interval must be the same. Taking this equality constraint into account, Eq. (7.27) must be replaced with
a(A * A) = {a * a | a ∈ aA}.
Thus, for example, aA − aA = [0, 0] = 0 under the equality constraint, while
a(A − A) = [a − ā, ā − a]
under standard fuzzy arithmetic.
EXAMPLE 7.5. To compare standard fuzzy arithmetic with constrained fuzzy arithmetic, consider the simple equation a = b/(b + 1), where a and b are real variables and b ≥ 0. Assume that, due to information deficiency, the actual value of variable b in a given situation is known only imprecisely, by the triangular fuzzy number B in Example 7.4. Given this information, we want to determine as precisely as possible the value of variable a. Clearly, this value is expressed by another fuzzy number, A, such that A = B/(B + 1). To determine A, we simply need to evaluate the expression on the right-hand side of this equation. Consider first the standard fuzzy arithmetic. Then,
a[B + 1] = [2 + 2a, 5 − a] + [1, 1] = [3 + 2a, 6 − a].
Now, we take this result and use it to calculate the whole expression:
a[B/(B + 1)] = [(2 + 2a)/(6 − a), (5 − a)/(3 + 2a)] = aA.
As follows from Eq. (7.25), the inverses of the left-end and right-end functions of aA are the functions fA and gA of the canonical form Eq. (7.24) of A. The inverses are obtained by solving the equations
(2 + 2fA(x))/(6 − fA(x)) = x  and  (5 − gA(x))/(3 + 2gA(x)) = x
for fA(x) and gA(x), respectively. The solutions are:
fA(x) = (6x − 2)/(2 + x)  and  gA(x) = (5 − 3x)/(2x + 1).
Figure 7.5. Illustration to Example 7.5.
We can easily determine from aA that the support of A is the open interval (1/3, 5/3) and its core is [0.8, 0.8]. The wide graph in Figure 7.5 shows the whole membership function A obtained by standard fuzzy arithmetic. Its analytic form is
A(x) = { (6x − 2)/(2 + x)    when x ∈ [1/3, 0.8]
       { (5 − 3x)/(2x + 1)   when x ∈ [0.8, 5/3]
       { 0                   otherwise.
Now let us repeat the procedure by using constrained fuzzy arithmetic. This
means to evaluate the expression B/(B + 1) globally and to recognize that the
two appearances of fuzzy number B in the expression represent the same variable b. This means, in turn, that whatever choice we make from any a-cut of
B, we must make the same choice for both the B in the numerator and the
one in the denominator. Since the function b/(b + 1) is clearly increasing for
all values of b ≥ 0, the minimum and maximum values for each a-cut of the
expression are obtained, respectively, for the left-end point and the right-end
point of aB. Hence,
a[B/(B + 1)] = [(2 + 2a)/(3 + 2a), (5 − a)/(6 − a)] = aA,
and the inverses of the left-hand and right-hand functions of the a-cut representation of A are
fA(x) = (2 − 3x)/(2x − 2)  and  gA(x) = (6x − 5)/(x − 1).
In this case, the support of A is the open interval (2/3, 5/6) and, again, the core is [0.8, 0.8]. The whole function A is depicted in Figure 7.5 by the narrow graph. Its analytic form is
A(x) = { (2 − 3x)/(2x − 2)   when x ∈ [2/3, 0.8]
       { (6x − 5)/(x − 1)    when x ∈ [0.8, 5/6]
       { 0                   otherwise.
We can clearly see that the result obtained by standard fuzzy arithmetic is considerably less precise than the one obtained by constrained fuzzy arithmetic.
This is caused by the fact that standard fuzzy arithmetic ignores the equality
constraint, which, in this case, contains a lot of information.
Observe that the given equation can be written as
a = 1 − 1/(b + 1).
In this form, the equality constraint is not applicable and, as can be easily verified, standard fuzzy arithmetic produces the same result we obtained by constrained fuzzy arithmetic. This demonstrates that results obtained by standard
fuzzy arithmetic are dependent on the formula by which a given function is
expressed.
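The difference between the two evaluations in Example 7.5 is easy to reproduce level by level. The Python sketch below compares the standard and the constrained evaluation of B/(B + 1) on the a-cuts of B; the helper names are assumptions, and the constrained branch relies on the monotonicity argument given in the example.

```python
def cut_B(alpha):                       # aB = [2 + 2a, 5 - a], Examples 7.4 and 7.5
    return (2 + 2 * alpha, 5 - alpha)

def standard(alpha):
    # Standard fuzzy arithmetic: the two occurrences of B are treated independently
    lo, hi = cut_B(alpha)
    return (lo / (hi + 1), hi / (lo + 1))

def constrained(alpha):
    # Equality constraint: the same b is used in numerator and denominator;
    # since b/(b + 1) is increasing, the endpoints of aB give the endpoints of the result
    lo, hi = cut_B(alpha)
    return (lo / (lo + 1), hi / (hi + 1))

for alpha in (0.0, 0.5, 1.0):
    print(alpha, standard(alpha), constrained(alpha))
# At alpha = 0 the supports are (1/3, 5/3) versus (2/3, 5/6), as derived in Example 7.5.
```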
For evaluating any arithmetic expression, Exp(A1, A2, . . . , An), where
Ai(i Œ ⺞) are symbols of fuzzy numbers or, more generally, fuzzy intervals
(approximating the associated real numbers ai), the equality constraint is utilized in the following way. First, all symbols that appear in the expression are
grouped by the variables they represent. That is, all symbols that represent the
same variable are placed in the same group. Each group thus consists of equal
fuzzy numbers or intervals that represent the same variable. They are distinguished only by their distinct roles in the expression. Assume that the expression represents a function of m variables vk whose values range over sets Vk(k
Œ ⺞m, m £ n). Then, there are m groups of symbols appearing in the expression, each of which is associated with a particular variable vk and the associated set of values Vk. The groups can be formally defined by a function
g : ℕn → ℕm

such that a_g(i) = vk, where symbol ai represents variable vk. Using this function, let Bk be a common symbol for all the equal fuzzy numbers or intervals in group k (k ∈ ℕm). Then,

αExp(A1, A2, . . . , An) = {Exp(a_g(1), a_g(2), . . . , a_g(n)) | ⟨v1, v2, . . . , vm⟩ ∈ αB1 × αB2 × . . . × αBm}.   (7.32)
This equation is a formal description of the following equality rule: for each group of fuzzy intervals denoted in an arithmetic expression by the same symbol and each α ∈ (0, 1], selections from these intervals are equal. If the function represented by the expression is monotone increasing or decreasing with respect to each variable vk (k ∈ ℕm), then the left-end and right-end points of αExp can be easily expressed in terms of the left-end and right-end points of the intervals αB1, αB2, . . . , αBm. Otherwise, the minimum and maximum in the set defined by Eq. (7.32) must be determined by some other means.
Although the equality constraints are perhaps the most common, any other
constraints regarding fuzzy numbers or intervals in arithmetic expressions
must be taken into account. Each constraint represents some information.
Ignoring it inevitably results in the inflated imprecision of the resulting fuzzy
number or interval.
A particularly important constraint, bearing upon the fuzzifications of
uncertainty theories, is a probabilistic constraint. To illustrate this type of constraint, consider a sample space with two elementary events, A and B, whose
probabilities are pA and pB. Assume that we can only estimate these probabilities in terms of fuzzy numbers PA and PB defined on [0, 1]. Since it is
required that pA + pB = 1, arithmetic operations on PA and PB must take this
constraint into account. That is, we must require that PA + PB = 1.
EXAMPLE 7.6. As a specific example, let PA = ⟨0.2, 0.3, 0.3, 0.4⟩. By Eq. (7.26),

αPA = [0.2 + 0.1α, 0.4 - 0.1α]

for all α ∈ (0, 1]. Then,

αPB = 1 - αPA = [0.6 + 0.1α, 0.8 - 0.1α].

Graphs of PA and PB are shown in Figure 7.6a. Now, using standard fuzzy arithmetic, we obtain, by Eq. (7.28),

αPA + αPB = [0.8 + 0.2α, 1.2 - 0.2α].
[Figure 7.6. Fuzzy arithmetic with the probabilistic constraint (graphs not reproduced): (a) the fuzzy numbers PA, PB, their standard sum PA + PB, and the constrained sum (PA + PB)C; (b) the probabilistic constraint C: pA + pB = 1 within the unit square of pairs ⟨pA, pB⟩.]
The graph of PA + PB is also shown in Figure 7.6a. This result is obtained by ignoring the probabilistic constraint

C: pA + pB = 1,

which is illustrated in Figure 7.6b. Under this constraint,

(PA + PB)C = 1,

as illustrated for the Cartesian product of the supports of PA and PB in Figure 7.6b, and this equality obviously holds for the Cartesian product αPA × αPB for any α ∈ (0, 1] as well.
The probabilistic constraint not only affects the operation of addition, but
other arithmetic operations as well. Let

αPA = [a(α), ā(α)],
αPB = [b(α), b̄(α)],
α(PA * PB)C = [s(α), s̄(α)],

for all α ∈ (0, 1], where * denotes any of the four arithmetic operations and the subscript C indicates that the operation is performed under the probability constraint. Then

• s(α) = min{a * b | a ∈ [a(α), ā(α)], b ∈ [b(α), b̄(α)], and a + b = 1};
• s̄(α) = max{a * b | a ∈ [a(α), ā(α)], b ∈ [b(α), b̄(α)], and a + b = 1}.
These optimization problems can be generalized in a fairly straightforward way to more than two elementary events and to arbitrary expressions. In these more general cases, however, both probabilistic and equality constraints are often involved.
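For a small illustration of these optimization problems, the following Python sketch approximates s(α) and s̄(α) for Example 7.6 by a simple grid search over the feasible selections; the grid search is only an illustrative device and is not part of the formulation above.

# A minimal sketch of the constrained optimization defining s(alpha) and
# s_bar(alpha) for an operation * performed under the probabilistic
# constraint a + b = 1, using the alpha-cuts of Example 7.6:
# aPA = [0.2 + 0.1a, 0.4 - 0.1a], aPB = [0.6 + 0.1a, 0.8 - 0.1a].

import operator

def cut_PA(a): return 0.2 + 0.1 * a, 0.4 - 0.1 * a
def cut_PB(a): return 0.6 + 0.1 * a, 0.8 - 0.1 * a

def constrained_cut(alpha, op, steps=1000):
    al, ar = cut_PA(alpha)
    bl, br = cut_PB(alpha)
    values = []
    for i in range(steps + 1):
        a = al + (ar - al) * i / steps
        b = 1.0 - a                      # the probabilistic constraint
        if bl <= b <= br:                # b must also lie in its alpha-cut
            values.append(op(a, b))
    return min(values), max(values)

print(constrained_cut(0.5, operator.add))   # (1.0, 1.0): (PA + PB)C = 1
print(constrained_cut(0.5, operator.mul))   # constrained product alpha-cut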
When dealing with lower and upper probabilities, the probabilistic constraints are expressed by appropriate inequalities, such as the inequalities in
Eqs. (5.61) and (5.62) when dealing with reachable interval-valued probability distributions. In the situation depicted in Figure 7.6b, for example, lower
probabilities are constrained to points that are under the diagonal C, while
upper probabilities are constrained to points that are over the diagonal.
7.5. FUZZY RELATIONS
When fuzzy sets are defined on universal sets that are Cartesian products of
two or more sets, we refer to them as fuzzy relations. Individual sets in the
Cartesian product of a fuzzy relation are called dimensions of the relation.
When n sets are involved in the Cartesian product, we call the relation n-dimensional (n ≥ 2). Fuzzy sets may be viewed as degenerate, one-dimensional relations.
All concepts and operations applicable to fuzzy sets are applicable to fuzzy
relations as well. However, fuzzy relations involve additional concepts and
operations due to their multidimensionality. These additional operations
include projections, cylindric extensions, compositions, joins, and inverses of
fuzzy relations. Projections and cylindric extensions are applicable to any fuzzy
relations, whereas compositions, joins, and inverses are applicable only to
binary relations.
7.5.1. Projections and Cylindric Extensions
The operations of projection and cylindric extension are applicable to any n-dimensional fuzzy relation (n ≥ 2). However, for the sake of simplicity, they
are discussed here in terms of 3-dimensional relations. A generalization to
higher dimensions is quite obvious.
Let R denote a 3-dimensional (ternary) fuzzy relation on X × Y × Z. A projection of R is an operation that converts R into a lower-dimensional fuzzy relation, which in this case is either a 2-dimensional or 1-dimensional (degenerate) relation. In each projection, some dimensions are suppressed (not recognized) and the remaining dimensions are consistent with R in the sense that each α-cut of the projection is a projection of the α-cut of R in the sense of classical set theory. Formally, the three 2-dimensional projections of R on X × Y, X × Z, and Y × Z, denoted RXY, RXZ, and RYZ, are defined for all x ∈ X, y ∈ Y, z ∈ Z by the following formulas:

RXY(x, y) = max{R(x, y, z) | z ∈ Z},
RXZ(x, z) = max{R(x, y, z) | y ∈ Y},
RYZ(y, z) = max{R(x, y, z) | x ∈ X}.

Moreover, the three 1-dimensional projections of R on X, Y, and Z, denoted RX, RY, and RZ, can then be obtained by similar formulas from the 2-dimensional projections:

RX(x) = max{RXY(x, y) | y ∈ Y} = max{RXZ(x, z) | z ∈ Z},
RY(y) = max{RXY(x, y) | x ∈ X} = max{RYZ(y, z) | z ∈ Z},
RZ(z) = max{RXZ(x, z) | x ∈ X} = max{RYZ(y, z) | y ∈ Y}.
Any relation on X × Y × Z that is consistent with a given projection is called an extension of the projection. The largest among the extensions is called a cylindric extension. Let REXY and REX denote the cylindric extensions of projections RXY and RX, respectively. Then, REXY and REX are defined for all triples ⟨x, y, z⟩ ∈ X × Y × Z by the formulas

REXY(x, y, z) = RXY(x, y),
REX(x, y, z) = RX(x).
Cylindric extensions of the other 2-dimensional and 1-dimensional projections
are defined in a similar way. This definition of cylindric extension for fuzzy
relations is a cutworthy generalization of the classical concept of cylindric
extension.
Given any set of projections of a given relation R, the standard fuzzy intersection of their cylindric extensions (expressed by the minimum operator) is
called a cylindric closure of the projections. This is again a cutworthy concept.
Regardless of the given projections, it is guaranteed that their cylindric closure
contains the fuzzy relation R.
EXAMPLE 7.7. To illustrate the concepts introduced in this section, let us
consider a binary relation R on X × Y, where X = {x1, x2, . . . , x6} and Y = {y1, y2, . . . , y10}, which is defined by the matrix

R =
        y1   y2   y3   y4   y5   y6   y7   y8   y9   y10
   x1  0.2  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
   x2  1.0  0.0  0.0  0.3  0.0  0.4  0.0  0.0  1.0  0.0
   x3  0.0  0.0  0.8  0.0  0.4  0.0  0.1  0.0  0.0  0.0
   x4  0.0  0.1  0.0  0.0  0.0  0.0  0.0  0.9  0.7  0.5
   x5  0.1  0.0  0.5  0.0  0.0  0.6  0.0  0.0  0.0  0.0
   x6  0.0  1.0  0.0  0.0  0.2  0.0  1.0  0.0  0.0  0.5
The projections of R into X and Y are obtained for all xi ∈ X and for all yj ∈ Y by the equations

RX(xi) = max{R(xi, yj) | yj ∈ Y},
RY(yj) = max{R(xi, yj) | xi ∈ X},
respectively. It is convenient to express them as vectors

         x1   x2   x3   x4   x5   x6
RX = [  1.0  1.0  0.8  0.9  0.6  1.0 ],

         y1   y2   y3   y4   y5   y6   y7   y8   y9   y10
RY = [  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5 ].
Now, the cylindric extensions REX and REY are obtained for all pairs ⟨xi, yj⟩ ∈ X × Y by the equations

REX(xi, yj) = RX(xi),
REY(xi, yj) = RY(yj),

and their matrix representations are
REX =
        y1   y2   y3   y4   y5   y6   y7   y8   y9   y10
   x1  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
   x2  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
   x3  0.8  0.8  0.8  0.8  0.8  0.8  0.8  0.8  0.8  0.8
   x4  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9  0.9
   x5  0.6  0.6  0.6  0.6  0.6  0.6  0.6  0.6  0.6  0.6
   x6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

REY =
        y1   y2   y3   y4   y5   y6   y7   y8   y9   y10
   x1  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x2  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x3  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x4  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x5  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x6  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
The cylindric closure of projections RX and RY, Cyl{RX, RY}, is then obtained by taking the standard intersection of these cylindric extensions. That is,

Cyl{RX, RY}(xi, yj) = min{REX(xi, yj), REY(xi, yj)}

for all pairs ⟨xi, yj⟩ ∈ X × Y. We obtain
Cyl{RX, RY} =
        y1   y2   y3   y4   y5   y6   y7   y8   y9   y10
   x1  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x2  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
   x3  0.8  0.8  0.8  0.3  0.4  0.6  0.8  0.8  0.8  0.5
   x4  0.9  0.9  0.9  0.3  0.4  0.6  0.9  0.9  0.9  0.5
   x5  0.6  0.6  0.6  0.3  0.4  0.6  0.6  0.6  0.6  0.5
   x6  1.0  1.0  1.0  0.3  0.4  0.6  1.0  0.9  1.0  0.5
We can see that R ⊆ Cyl{RX, RY}, which is always the case, but the relation reconstructed from the projections RX and RY is in this case much larger than the actual relation R. This means that the information about R that is contained in its projections is in this case highly deficient.
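The computations of this example are easy to reproduce. The following Python sketch, given only as an illustration and using NumPy, recomputes the projections, their cylindric extensions, and the cylindric closure, and verifies that R is contained in the closure.

# A minimal sketch that recomputes the projections, cylindric extensions,
# and cylindric closure of Example 7.7 with NumPy.

import numpy as np

R = np.array([
    [0.2, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.3, 0.0, 0.4, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.8, 0.0, 0.4, 0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.7, 0.5],
    [0.1, 0.0, 0.5, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.2, 0.0, 1.0, 0.0, 0.0, 0.5],
])

RX = R.max(axis=1)            # projection on X: maximize over y
RY = R.max(axis=0)            # projection on Y: maximize over x

# Cylindric extensions repeat the projection over the suppressed dimension.
REX = np.tile(RX[:, None], (1, R.shape[1]))
REY = np.tile(RY[None, :], (R.shape[0], 1))

# Cylindric closure: standard fuzzy intersection (minimum) of the extensions.
closure = np.minimum(REX, REY)

print(RX)                          # [1.  1.  0.8 0.9 0.6 1. ]
print(RY)                          # [1.  1.  1.  0.3 0.4 0.6 1.  0.9 1.  0.5]
print(bool(np.all(R <= closure)))  # True: R is contained in the closure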
7.5.2. Compositions, Joins, and Inverses
Consider two binary fuzzy relations P and Q that are defined on sets X × Y and Y × Z, respectively. Any such relations, which are connected via the common set Y, can be composed to yield a relation on X × Z. The standard composition of these relations, which is denoted by P ∘ Q, produces a relation R on X × Z defined by the formula

R(x, z) = (P ∘ Q)(x, z) = max{min{P(x, y), Q(y, z)} | y ∈ Y}   (7.33)

for all pairs ⟨x, z⟩ ∈ X × Z.
Other definitions of a composition of fuzzy relations, in which the min and max operations are replaced with other t-norms and t-conorms, respectively, are possible and useful in some applications. All compositions are associative:

(P ∘ Q) ∘ S = P ∘ (Q ∘ S),

where S is any binary fuzzy relation on Z × W for some set W. However, the standard fuzzy composition is the only one that is cutworthy.
A similar operation on two connected binary relations, which differs from the composition in that it yields a 3-dimensional relation instead of a binary one, is known as the relational join. For the same fuzzy relations P and Q, the standard relational join, P * Q, is a 3-dimensional relation on X × Y × Z defined by the formula

R(x, y, z) = (P * Q)(x, y, z) = min{P(x, y), Q(y, z)}   (7.34)

for all triples ⟨x, y, z⟩ ∈ X × Y × Z. Again, the min operation in this definition may be replaced with another t-norm. However, the relational join defined by Eq. (7.34) is the only one that is cutworthy.
The inverse of a binary fuzzy relation R on X × Y, denoted by R⁻¹, is a relation on Y × X such that

R⁻¹(y, x) = R(x, y)

for all pairs ⟨y, x⟩ ∈ Y × X. When R is represented by a matrix, R⁻¹ is represented by the transpose of this matrix. This means that rows are replaced with columns and vice versa. Clearly,

(R⁻¹)⁻¹ = R

holds for any binary relation.
It is convenient to perform compositions of binary fuzzy relations in terms of their matrix representations. Let

P = [pij],   Q = [qjk],   and   R = [rik]

be matrix representations of fuzzy relations P, Q, R in Eq. (7.33). Then, using this matrix notation, we can write

[rik] = [pij] ∘ [qjk],

where

rik = max{min{pij, qjk} | yj ∈ Y}.   (7.35)

Observe that the same elements of matrices P and Q are used in the calculation of matrix R as would be used in the regular matrix multiplication, but the product and sum operations are replaced, respectively, with the min and max operations.
EXAMPLE 7.8. To illustrate the standard composition of fuzzy relations by their matrix representations, let symbols P, Q, R have the same meaning as in Eq. (7.33), where X = {x1, x2, x3, x4}, Y = {y1, y2, y3}, and Z = {z1, z2, z3, z4, z5}, and consider the following example of the matrix representation of the equation P ∘ Q = R:

P =
        y1   y2   y3
   x1  0.0  0.3  0.4
   x2  0.2  0.5  0.3
   x3  0.8  0.0  0.0
   x4  0.7  0.7  1.0

Q =
        z1   z2   z3   z4   z5
   y1  0.7  0.0  0.0  0.3  0.6
   y2  0.5  0.5  1.0  0.4  0.0
   y3  0.0  0.7  0.2  0.9  0.0

R = P ∘ Q =
        z1   z2   z3   z4   z5
   x1  0.3  0.4  0.3  0.4  0.0
   x2  0.5  0.5  0.5  0.4  0.2
   x3  0.7  0.0  0.0  0.3  0.6
   x4  0.7  0.7  0.7  0.9  0.6
Following Eq. (7.35), we have, for example,

0.3 (= r11) = max{min{p11, q11}, min{p12, q21}, min{p13, q31}}
            = max{min{0, 0.7}, min{0.3, 0.5}, min{0.4, 0}},
0.4 (= r24) = max{min{p21, q14}, min{p22, q24}, min{p23, q34}}
            = max{min{0.2, 0.3}, min{0.5, 0.4}, min{0.3, 0.9}}.
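The max-min composition of Eq. (7.35) is also easy to express in a few lines of code. The following Python sketch, given here only as an illustration, reproduces the matrix R of this example from P and Q.

# A minimal sketch of the max-min composition of Eq. (7.35), checked
# against the matrices P and Q of Example 7.8.

import numpy as np

P = np.array([[0.0, 0.3, 0.4],
              [0.2, 0.5, 0.3],
              [0.8, 0.0, 0.0],
              [0.7, 0.7, 1.0]])

Q = np.array([[0.7, 0.0, 0.0, 0.3, 0.6],
              [0.5, 0.5, 1.0, 0.4, 0.0],
              [0.0, 0.7, 0.2, 0.9, 0.0]])

def max_min_composition(P, Q):
    # r_ik = max_j min(p_ij, q_jk): like matrix multiplication with the
    # product replaced by min and the sum replaced by max
    return np.minimum(P[:, :, None], Q[None, :, :]).max(axis=1)

R = max_min_composition(P, Q)
print(R)
# [[0.3 0.4 0.3 0.4 0. ]
#  [0.5 0.5 0.5 0.4 0.2]
#  [0.7 0.  0.  0.3 0.6]
#  [0.7 0.7 0.7 0.9 0.6]]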
When the join operation defined by Eq. (7.34) is applied to the same fuzzy
relations, P and Q, we obtain a 3-dimensional relation. Equation (7.34) can be
expressed in terms of the associated arrays as
[rijk ] = [ pij ] * [ q jk ],
where
rijk = min{ pij , q jk }.
The resulting 3-dimensional array R may conveniently be represented by the
following four matrices, each associated with one element of set X:
For x1:
        y1   y2   y3
   z1  0.0  0.3  0.0
   z2  0.0  0.3  0.4
   z3  0.0  0.3  0.2
   z4  0.0  0.3  0.4
   z5  0.0  0.0  0.0

For x2:
        y1   y2   y3
   z1  0.2  0.5  0.0
   z2  0.0  0.5  0.3
   z3  0.0  0.5  0.2
   z4  0.2  0.4  0.3
   z5  0.2  0.0  0.0

For x3:
        y1   y2   y3
   z1  0.7  0.0  0.0
   z2  0.0  0.0  0.0
   z3  0.0  0.0  0.0
   z4  0.3  0.0  0.0
   z5  0.6  0.0  0.0

For x4:
        y1   y2   y3
   z1  0.7  0.5  0.0
   z2  0.0  0.5  0.7
   z3  0.0  0.7  0.2
   z4  0.3  0.4  0.9
   z5  0.6  0.0  0.0
Equations (7.33), which describe R = P ∘ Q, are called fuzzy relation equations. Normally, it is assumed that P and Q are given and R is determined by Eq. (7.33). However, two inverse problems play important roles in many applications. In one of them, R and P are given and Q is to be determined; in the other one, R and Q are given and P is to be determined. Various methods for solving these problems exactly as well as approximately have been developed.
It should also be mentioned that cutworthy fuzzy counterparts of the various classical binary relations on X × X, such as equivalence relations and the various ordering relations, have been extensively investigated. However, many types of fuzzy relations on X × X that are not cutworthy have been investigated as well and found useful in many applications.
7.6. FUZZY LOGIC
The term “fuzzy logic,” as currently used in the literature, has two distinct
meanings. It is viewed either in a narrow sense or in a broad sense. Fuzzy logic
in the narrow sense is a generalization of the various many-valued logics, which
have been investigated in the area of symbolic logic since the beginning of the
20th century. It is concerned with development of syntactic aspects (based on
the notion of proof ) and semantic aspects (based on the notion of truth) of a
relevant logic calculus. In order to be acceptable, the calculus must be sound
(provability implies truth) and complete (truth implies provability).
Fuzzy logic in the narrow sense is important since it provides theoretical
foundations for fuzzy logic in the broad sense. The latter is viewed as a system
of concepts, principles, and methods for reasoning that is approximate rather
than exact. It utilizes the apparatus of fuzzy set theory for formulating various
forms of sound approximate reasoning with fuzzy propositions. It is fuzzy logic
in this broad sense that is primarily involved in dealing with fuzzified uncertainty theories.
To establish a connection between fuzzy set theory and fuzzy logic, it is
essential to connect degrees of membership in fuzzy sets with degrees of truth
of fuzzy propositions. This can only be done when the degrees of membership
and the degrees of truth refer to the same objects. Let us consider first the simplest connection, in which only one fuzzy set is involved.
Given a fuzzy set A, its membership degree A(x) for any x in the underlying universal set X can be interpreted as the degree of truth of the associated
fuzzy proposition “x is a member of A.” Conversely, given an arbitrary proposition of the simple form “x is A,” where x is from X and A is a fuzzy set that
represents an inherently vague linguistic term (such as low, high, near, fast),
its degree of truth may be interpreted as the membership degree of x in A.
That is, the degree of truth of the proposition is equal to the degree with which
x belongs to A.
This simple correspondence between membership degrees and degrees of
truth, which conforms well to our intuition, forms a basis for determining
degrees of truth of more complex propositions. Moreover, negations, conjunctions, and disjunctions of fuzzy propositions are defined under this correspondence in exactly the same way as complements, intersections, and unions
of fuzzy sets, respectively.
7.6.1. Fuzzy Propositions
Now let us examine basic propositional forms of fuzzy propositions. To do that,
we need a convenient notation. Let X denote a variable whose states (values)
are in set X, and let A denote a fuzzy set defined on X that represents an
approximate description of the state of variable X by a linguistic term such as
low, medium, high, slow, fast.
Using this notation, the simplest fuzzy propositions (unconditional and unqualified) are expressed in the canonical propositional form

fA : X is A,

in which the fuzzy set A is called a fuzzy predicate. Given this propositional form, a fuzzy proposition, fA(x), is obtained when a particular state (value) x from X is substituted for variable X in the propositional form. That is,

fA(x) : X = x is A,
where x ∈ X, is a particular fuzzy proposition of propositional form fA. For
simplicity, let fA(x) also denote the degree of truth of the proposition “x is A.”
This means that the symbol fA denotes a propositional form as well as a function by which degrees of truth are assigned to fuzzy propositions based on the
form. This double use of the symbol fA does not create any ambiguity, since
there is only one function for each propositional form that assigns degrees of
truth to individual propositions subsumed under the form. In this case, the
function is defined for all x Œ X by the simple equation
f A (x) = A(x).
That is, fA(x) is true to the same degree to which x in a member of fuzzy
set A.
The propositional form fA can be modified by qualifying the claims for the
degree of truth of the associated fuzzy propositions. Two types of qualified
propositional forms are recognized: a truth-qualified propositional form
and a probability-qualified propositional form. These two forms also can be
combined.
The truth-qualified propositional form, fT(A), is obtained by adding a truth qualifier, T, to the basic form fA. That is,

fT(A) : X is A is T,
where T is a fuzzy number or interval on [0, 1] that represents a linguistic term
by which the meaning of truth in fuzzy propositions associated with this propositional form is qualified. Linguistic terms such as very true, fairly true, barely
true, false, fairly false, or virtually false are typical examples of linguistic truth
qualifiers.
To obtain the degree of truth of a truth-qualified proposition fT(A), we need
to compose the membership function A with the membership function of the
truth qualifier T. That is,
fT(A)(x) = T(A(x))   (7.36)

for all x ∈ X.
EXAMPLE 7.9. Consider truth-qualified fuzzy propositions of the form

fT(A) : The rate of inflation is low is fairly true.

In this example, X is the variable "the rate of inflation" (in %), A is a fuzzy set representing the predicate "low" (L), and T is a fuzzy set representing the truth qualifier "fairly true" (FT). The relevant membership functions of L and FT are shown in Figure 7.7.

[Figure 7.7. Illustration of truth-qualified unconditional fuzzy propositions (graphs not reproduced): (a) the membership functions of "low" (L) and "medium" (M) as functions of the inflation rate r, with L(2.5) = 0.75 and M(2.5) = 0.25; (b) the membership functions of the truth qualifiers absolutely false, very false, false, fairly false, fairly true, true, very true, and absolutely true, defined on the membership degree m ∈ [0, 1].]

Now assume that the actual rate of inflation is 2.5%. Then, L(2.5) = 0.75 and FT(0.75) = 0.86. Hence,

fT(L)(2.5) = FT(L(2.5)) = FT(0.75) = 0.86.
When we change the predicate or the truth qualifier or both, the degree of
truth of the proposition is affected. For example, when only the truth qualifier
is changed to “false” (F) the degree of truth becomes F(0.75) = 0.25.
The probability-qualified propositional form, fP(A), is obtained by adding
a probability qualifier, P, to the basic form fA and replacing “X is A” with
“Probability of X is A.” That is,
fP( A ) : Probability of X is A is P,
where P is a fuzzy number or interval defined on [0, 1], called a probability
qualifier, which represents an approximate description of probability by a linguistic term, such as likely, very likely, extremely likely, around 0.5. Given this
propositional form, a proposition is obtained for each particular probability
measure, Pro, defined on subsets of X. That is,
fP(A)(Pro(A)) : Pro(A) is P.
When X is finite,

Pro(A) = Σx∈X A(x) p(x),   (7.37)

where p is a given (known) probability distribution function on X. When X = [x, x̄] ⊆ ℝ,

Pro(A) = ∫X A(x) q(x) dx,   (7.38)
where q is a given probability density function on X. Equations (7.37) and
(7.38) are standard formulas for computing probabilities of fuzzy events
(Section 8.4).
EXAMPLE 7.10. Consider probability-qualified fuzzy propositions of the form

fP(A) : Probability of temperature (at a given place and time) is A is P,

where A is a fuzzy number shown in Figure 7.8a that represents the fuzzy predicate "around 80°F," as understood in a given context, and P is a fuzzy number shown in Figure 7.8b that represents the linguistic term "unlikely."

[Figure 7.8. Illustration of probability-qualified unconditional fuzzy propositions (graphs not reproduced): (a) the fuzzy number A(t) representing "around 80°F"; (b) the probability qualifiers very unlikely, unlikely, likely, and very likely defined on [0, 1].]

Assume now that the probabilities p(t) for the individual temperatures, obtained from relevant statistical data over many years, are given in Table 7.1a. To determine the degree of truth of the proposition, we calculate first Pro(A) by Eq. (7.37):

Pro(A) = 0.25 × 0.14 + 0.5 × 0.11 + 0.75 × 0.04 + 1 × 0.02 + 0.75 × 0.01 + 0.5 × 0.006 + 0.25 × 0.004 = 0.15.
Now, applying this probability value to the probability qualifier “unlikely” in
Figure 7.8b, we obtain 0.7. Hence,
fP(A)(p) = P(Pro(A)) = P(0.15) = 0.7.
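The following Python sketch reproduces the computation of this example. The membership values of A are those used in the worked sum above; the piecewise-linear shape assumed for the qualifier "unlikely" is only a guess that is consistent with the value P(0.15) = 0.7 read from Figure 7.8b.

# A minimal sketch of Example 7.10: the probability of the fuzzy event
# "around 80 F" by Eq. (7.37), followed by the probability qualifier.

# p(t) from Table 7.1a and A(t) for the fuzzy predicate "around 80 F"
p = {71: 0.01, 72: 0.03, 73: 0.11, 74: 0.15, 75: 0.21, 76: 0.16, 77: 0.14,
     78: 0.11, 79: 0.04, 80: 0.02, 81: 0.01, 82: 0.006, 83: 0.004}
A = {77: 0.25, 78: 0.5, 79: 0.75, 80: 1.0, 81: 0.75, 82: 0.5, 83: 0.25}

pro_A = sum(A.get(t, 0.0) * p[t] for t in p)        # Eq. (7.37)

def unlikely(prob):
    # assumed shape: 1 at prob = 0, decreasing linearly to 0 at prob = 0.5
    return max(0.0, 1.0 - 2.0 * prob)

print(round(pro_A, 2))            # 0.15
print(round(unlikely(pro_A), 2))  # 0.7, the degree of truth of the proposition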
Fuzzy propositions that are probability-qualified as well as truth-qualified
have the form
fT[P(A)] : Probability of X is A is P is T.

To obtain the degree of truth of any proposition of this form, we need to perform two compositions of functions:

fT[P(A)](p) = T{P[Pro(A)]},

where Pro(A) is calculated either by Eq. (7.37) or by Eq. (7.38).
An important type of fuzzy proposition, which is essential for knowledge-based fuzzy systems, is the conditional fuzzy proposition. This type of proposition is based on the propositional form

fB|A : If X is A, then Y is B,

where X and Y are variables whose states are in sets X and Y, respectively. These propositions may also be expressed in an alternative, but equivalent, form
Table 7.1. Probabilities in (a) Example 7.10; (b) Exercise 7.29

(a)  t      p(t)          (b)  r      p(r)
     71     0.01               18     0.005
     72     0.03               19     0.005
     73     0.11               20     0.01
     74     0.15               21     0.02
     75     0.21               22     0.05
     76     0.16               23     0.05
     77     0.14               24     0.04
     78     0.11               25     0.05
     79     0.04               26     0.05
     80     0.02               27     0.05
     81     0.01               28     0.1
     82     0.006              29     0.3
     83     0.004              30     0.2
                               31     0.04
                               32     0.02
fB|A : ⟨X, Y⟩ is R,

where R is a fuzzy relation on X × Y. It is assumed here that R is determined for each x ∈ X and each y ∈ Y by the formula

R(x, y) = J(A(x), B(y)),

where the symbol J stands for a binary operation on [0, 1] that represents an appropriate fuzzy implication in the given application context. Clearly,

fB|A(x, y) = R(x, y)

for all ⟨x, y⟩ ∈ X × Y. Moreover, if a truth qualification or a probability qualification is employed, R must be composed with the respective qualifiers to obtain for each ⟨x, y⟩ ∈ X × Y the degree of truth of the conditional and qualified proposition.
As is well known, operations that qualify as fuzzy implications form a class of binary operations on [0, 1], similar to fuzzy intersections and fuzzy unions. An important class of fuzzy implications, referred to as Lukasiewicz implications, is defined for each a ∈ [0, 1] and each b ∈ [0, 1] by the formula

J(a, b) = min[1, 1 - a^λ + b^λ]^(1/λ),   (7.39)

where λ > 0 is a parameter by which individual implications are distinguished from one another.
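A small illustration of this parametrized family, given here only as a sketch, is the following:

# A minimal sketch of the parametrized Lukasiewicz implications of
# Eq. (7.39); lam = 1 gives the classical Lukasiewicz implication
# min(1, 1 - a + b).

def lukasiewicz_implication(a, b, lam=1.0):
    return min(1.0, 1.0 - a ** lam + b ** lam) ** (1.0 / lam)

for lam in (0.5, 1.0, 2.0):
    print(lam, round(lukasiewicz_implication(0.8, 0.3, lam), 3))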
Fuzzy propositions of any of the introduced types may also be quantified.
In general, fuzzy quantifiers are fuzzy intervals. This subject is beyond the
scope of this overview.
7.6.2. Approximate Reasoning
Reasoning based on fuzzy propositions of the various types is usually referred
to as approximate reasoning. The most fundamental components of approximate reasoning are conditional fuzzy propositions, which may also be truth
qualified, probability qualified, quantified, or any combination of these. Special
procedures are needed for each of these types of fuzzy propositions. This large
variety of fuzzy propositions makes approximate reasoning methodologically
rather intricate. This reflects the richness of natural language and the many
intricacies of common-sense reasoning, which approximate reasoning based
upon fuzzy set theory attempts to emulate.
To illustrate the essence of approximate reasoning, let us characterize the
fuzzy-logic generalization of one of the most common inference rules of classical logic: modus ponens. The generalized modus ponens is expressed by the
following schema:
Fuzzy rule:        If X is A, then Y is B
Fuzzy fact:        X is F
Fuzzy conclusion:  Y is C.
Clearly, in this schema A and F are fuzzy sets defined on X, while B and C are
fuzzy sets defined on Y. Assuming that the fuzzy rule is already converted to
the alternative form
⟨X, Y⟩ is R,

where R represents the fuzzy implication employed, the fuzzy conclusion C is obtained by composing F with R. That is,

C = F ∘ R

or, more specifically,

C(y) = max{min[F(x), R(x, y)] | x ∈ X}   (7.40)

for all y ∈ Y. This way of obtaining the conclusion according to the generalized modus ponens schema is called a compositional rule of inference.
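The compositional rule of inference is illustrated by the following Python sketch. The membership values of A, B, and F are arbitrary illustrative assumptions, and the relation R is built here with the Lukasiewicz implication for λ = 1; other implications can be substituted.

# A minimal sketch of the compositional rule of inference, Eq. (7.40).

import numpy as np

A = np.array([1.0, 0.8, 0.4, 0.0])   # fuzzy set A on X (assumed values)
B = np.array([0.2, 0.7, 1.0])        # fuzzy set B on Y (assumed values)
F = np.array([0.6, 1.0, 0.5, 0.1])   # fuzzy fact "X is F" (assumed values)

# R(x, y) = J(A(x), B(y)) with the Lukasiewicz implication min(1, 1 - a + b)
R = np.minimum(1.0, 1.0 - A[:, None] + B[None, :])

# Compositional rule of inference: C(y) = max_x min(F(x), R(x, y))
C = np.minimum(F[:, None], R).max(axis=0)
print(C)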
To use the compositional rule of inference, we need to choose a fitting fuzzy
implication in each application context and express it in terms of a fuzzy relation R. There are several ways in which this can be done. One way is to derive
from the application context (by observing or by expert’s judgements) pairs
F, C of fuzzy sets that are supposed to be inferentially connected (facts and
conclusions). Relation R, which represents a fuzzy implication, is then determined by solving the inverse problem of fuzzy relation equations. This and
other issues regarding fuzzy implications in approximate reasoning are discussed fairly thoroughly in the literature (see Note 7.9).
7.7. FUZZY SYSTEMS
In general, each classical system is ultimately a set of variables together with
a relation among states (or values) of the variables. When the states of variables are fuzzy sets, the system is called a fuzzy system. In most typical fuzzy
systems, the states are fuzzy intervals that represent linguistic terms such as
very small, small, medium, large, very large, as interpreted in the context
of each particular application. When the states are expressed by such linguistic terms, the variables are called linguistic variables.
Each linguistic variable is defined in terms of a base variable, whose values
are usually real numbers within a specific range. A base variable is a variable
in the classical sense, as exemplified by any physical variable (temperature,
pressure, tidal range, grain size, etc.). Linguistic terms involved in a linguistic
7.7. FUZZY SYSTEMS
295
variable are used for approximating the actual values of the associated base
variable. Their meanings are captured, in the context of each particular application, by appropriate fuzzy intervals. That is, each linguistic variable consists
of:
• A name, which should reflect the meaning of the base variable involved.
• A base variable with its range of values (usually a closed interval of real numbers).
• A set of linguistic terms that refer to values of the base variable.
• A set of semantic rules, which assign to each linguistic term its meaning in terms of an appropriate fuzzy interval (or some other fuzzy set) defined on the range of the base variable.
An example of a linguistic variable is shown in Figure 7.9. Its name, "interest rate," captures the meaning of the base variable, a real variable whose range is [0, 20]. Five linguistic states are distinguished by the linguistic terms very small, small, medium, large, and very large. Each of these terms is represented by a trapezoidal-shaped fuzzy interval, as shown in the figure.
Consider a linguistic variable whose base variable has states (values) in set
X. Fuzzy sets representing a finite set of linguistic states of the linguistic variable are often defined in such a way that the sum of membership degrees in
these fuzzy sets is equal to 1 for each x ∈ X. Such a family of fuzzy subsets of X is called a fuzzy partition of X. Formally, a finite family {Ai | Ai ∈ F(X), i ∈ ℕn, n ≥ 1} of n fuzzy subsets of X is a fuzzy partition if and only if Ai ≠ ∅ for all i ∈ ℕn and

Σi∈ℕn Ai(x) = 1   for each x ∈ X.

An example of a fuzzy partition of X = [0, 20] is the family of fuzzy intervals in Figure 7.9.
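A fuzzy partition is easy to construct and verify numerically. The following Python sketch builds a partition of [0, 20] from triangular fuzzy numbers with peaks at 0, 5, 10, 15, and 20; these shapes are illustrative assumptions, not the trapezoidal intervals of Figure 7.9, but they satisfy the defining condition.

# A minimal sketch of a fuzzy partition of X = [0, 20] by triangular
# fuzzy numbers; adjacent triangles overlap so the grades sum to 1.

def triangular(x, left, peak, right):
    if left < x < peak:
        return (x - left) / (peak - left)
    if peak <= x < right:
        return (right - x) / (right - peak)
    return 1.0 if x == peak else 0.0

peaks = [0, 5, 10, 15, 20]

def partition_memberships(x):
    grades = []
    for i, p in enumerate(peaks):
        left = peaks[i - 1] if i > 0 else p - 5
        right = peaks[i + 1] if i < len(peaks) - 1 else p + 5
        grades.append(triangular(x, left, p, right))
    return grades

for x in (0.0, 2.5, 7.0, 13.0, 20.0):
    grades = partition_memberships(x)
    print(x, [round(g, 2) for g in grades], "sum =", round(sum(grades), 2))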
7.7.1. Granulation
Representing states of variables by fuzzy sets (usually fuzzy numbers or intervals) is called a granulation. It is a fuzzy counterpart of classical quantization,
which is any meaningful grouping of states of variables into quanta (or aggregates). An example of quantization of a real variable whose range is [0, 1] into
eleven quanta (semiopen or closed intervals of real numbers) is shown in
Figure 7.10a. One of many possible fuzzy counterparts of this quantization, a
particular granulation of the variable by triangular fuzzy numbers, is shown in
Figure 7.10b.
[Figure 7.9. An example of a linguistic variable (diagram not reproduced): the linguistic variable "interest rate," with base variable v (interest rate in %) ranging over [0, 20]; its linguistic values very small, small, medium, large, and very large are assigned trapezoidal fuzzy intervals by the semantic rule.]

[Figure 7.10. Quantization versus granulation (diagram not reproduced): (a) quantization of [0, 1] into eleven crisp intervals with boundaries at 0.05, 0.15, . . . , 0.95, labeled 0, 0.1, . . . , 1; (b) granulation of the same range by eleven triangular fuzzy numbers "around 0," "around 0.1," . . . , "around 1."]

While viewing physical variables, such as temperature, pressure, and electric current, as real variables is mathematically convenient, this view introduces a fundamental inconsistency between the infinite precision required to distinguish real numbers and the finite precision of any measuring instrument. Appropriate quantization, whose coarseness reflects the precision of a given measuring instrument, is thus inevitable to resolve this
inconsistency. Consider, for example, a real variable that represents electric
current whose values range from 0 to 1 ampere. Assume, for the sake of simplicity, that measurements of the variable can be made to an accuracy of 0.1
ampere. Then, according to the usual quantization, the interval [0, 1] is partitioned into 10 semiopen intervals and one closed interval (quanta, aggregates), [0, 0.05), [0.05, 0.15), [0.15, 0.25), . . . , [0.85, 0.95), [0.95, 1], which are labeled by their ideal representatives 0, 0.1, 0.2, . . . , 0.9, 1, respectively. This is exactly the quantization shown in Figure 7.10a.
Although the usual quantization of real variables is capable of capturing
the limited resolutions of measuring instruments employed, it completely
ignores the issue of measurement errors. When the infinite number of values is represented by a finite number of quanta (disjoint intervals of real numbers and their ideal representatives), the unavoidable measurement errors make the sharp boundaries between the quanta highly unrealistic. The representation can be made more realistic by a finite number of granules (fuzzy
numbers or intervals), as illustrated for our example in Figure 7.10b. The fundamental difference is that transitions from each granule to its adjacent granules are smooth rather than abrupt. Moreover, any available knowledge
regarding measurement errors in each particular application context can be
utilized in molding the granules.
7.7.2. Types of Fuzzy Systems
In principle, fuzzy systems can be knowledge-based, model-based, or hybrid.
In knowledge-based fuzzy systems, relationships between variables are
described by collections of fuzzy inference rules (conditional fuzzy propositional forms). These rules attempt to capture the knowledge of a human
expert, expressed often in natural language. Model-based fuzzy systems are
based on traditional systems modeling, but they employ appropriate areas of
fuzzy mathematics (fuzzy analysis, fuzzy differential equations, etc.). These
mathematical areas, based on the notion of fuzzy numbers or intervals, allow
us to approximate classical mathematical systems of various types via appropriate granulation to achieve tractability, robustness, and low computational
cost. Hybrid fuzzy systems are combinations of knowledge-based and model-based fuzzy systems. At this time, knowledge-based fuzzy systems are more
developed than model-based or hybrid fuzzy systems.
In knowledge-based fuzzy systems, the relation between input and output
linguistic variables is expressed in terms of a set of fuzzy inference rules (conditional propositional forms). From these rules and any information describing actual states of input variables, the actual states of output variables are
derived by an appropriate compositional rule of inference. Assuming that the
input variables are X1, X2, . . . , and the output variables are Y1, Y2, . . . , we
have the following general scheme of inference to represent the input–output
relation of the system:
Rule 1:      If X1 is A11 and X2 is A21 and . . . , then Y1 is B11 and Y2 is B21 and . . .
Rule 2:      If X1 is A12 and X2 is A22 and . . . , then Y1 is B12 and Y2 is B22 and . . .
             . . .
Rule n:      If X1 is A1n and X2 is A2n and . . . , then Y1 is B1n and Y2 is B2n and . . .
Fact:        X1 is C1 and X2 is C2 and . . .
Conclusion:  Y1 is D1 and Y2 is D2 and . . .
This general formulation of knowledge-based fuzzy systems involves many
complex issues regarding the construction of the inference rules as well as procedures for using them to make inferences. These issues have been extensively
addressed in the literature, but they are beyond the scope of this overview.
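As a small illustration of this scheme, the following Python sketch implements one common interpretation of the rules (the min/max, Mamdani-style interpretation) for a single input and a single output variable; all membership functions and rules in it are illustrative assumptions.

# A minimal sketch of rule-based fuzzy inference for one input variable X
# and one output variable Y, using a min/max (Mamdani-style) interpretation.

import numpy as np

x_grid = np.linspace(0.0, 10.0, 101)      # base variable of X
y_grid = np.linspace(0.0, 1.0, 101)       # base variable of Y

def tri(grid, a, b, c):
    # triangular membership function with support (a, c) and peak at b
    return np.maximum(0.0, np.minimum((grid - a) / (b - a), (c - grid) / (c - b)))

# linguistic states of the input and output variables (assumed shapes)
X_terms = {"small": tri(x_grid, -5, 0, 5), "large": tri(x_grid, 5, 10, 15)}
Y_terms = {"low": tri(y_grid, -0.5, 0.0, 0.5), "high": tri(y_grid, 0.5, 1.0, 1.5)}

rules = [("small", "low"), ("large", "high")]   # If X is small, then Y is low; ...

fact = tri(x_grid, 2, 4, 6)               # fuzzy fact "X is about 4"

conclusion = np.zeros_like(y_grid)
for x_term, y_term in rules:
    firing = np.max(np.minimum(fact, X_terms[x_term]))    # degree of match
    conclusion = np.maximum(conclusion, np.minimum(firing, Y_terms[y_term]))

print(conclusion.max())   # height of the inferred fuzzy set "Y is D"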
Although fuzzy systems that approximate relationships among numerical base variables have special significance, due to their extensive applicability, they are not the only fuzzy systems. Any classical (crisp) system whose variables are not numerical (e.g., ordinal-scale or nominal-scale variables) can be fuzzified as well.
7.7.3. Defuzzification
The result of each fuzzy inference based on a system of input and output linguistic variables that approximate numerical base variables is, in general, a set
of fuzzy intervals, one for each output variable. Some applications (such as
control or decision making) require that each of these fuzzy intervals be converted to a single real number that, in the context of a given application, best represents the fuzzy interval. This conversion of a given fuzzy interval A to its real-number representation d(A) is called a defuzzification of A.
The most common defuzzification method, which is called a centroid method, is defined by the formula

d(A) = ∫ℝ x · A(x) dx / ∫ℝ A(x) dx   (7.41)

or, when A is defined on a finite universal set of numbers X, by the formula

d(A) = Σx∈X x · A(x) / Σx∈X A(x).   (7.42)
This method may be viewed as unbiased since it treats all values A(x) equally.
However, it has been observed that other methods of defuzzification are
preferable in some applications. In these methods, differences in values A(x)
are either upgraded or downgraded to various degrees. Such methods are
members of the general class of defuzzification methods defined either by the
formula
dλ(A) = ∫ℝ x · A^λ(x) dx / ∫ℝ A^λ(x) dx,   (7.43)

when A is defined on ℝ, or by the formula

dλ(A) = Σx∈X x · A^λ(x) / Σx∈X A^λ(x),   (7.44)

when A is defined on a finite universal set X. Individual methods in this class are distinguished by the value of parameter λ ∈ (0, ∞). When λ = 1, we obtain the centroid method. When λ > 1, differences in values A(x) are upgraded; when λ < 1, they are downgraded.
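The following Python sketch, given only as an illustration, implements Eqs. (7.42) and (7.44) for a fuzzy interval sampled on a finite set; the sample membership values are assumed.

# A minimal sketch of centroid defuzzification, Eq. (7.42), and its
# parametrized generalization, Eq. (7.44), on a finite universal set.

def defuzzify(A, lam=1.0):
    # A is a dict {x: A(x)}; lam = 1 gives the centroid method d(A)
    num = sum(x * (mu ** lam) for x, mu in A.items())
    den = sum(mu ** lam for mu in A.values())
    return num / den

A = {1: 0.2, 2: 0.6, 3: 1.0, 4: 0.6, 5: 0.1}   # assumed fuzzy interval samples

print(round(defuzzify(A), 3))           # centroid (lam = 1)
print(round(defuzzify(A, lam=3.0), 3))  # differences in A(x) upgraded
print(round(defuzzify(A, lam=0.3), 3))  # differences in A(x) downgraded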
An alternative kind of defuzzification is to replace fuzzy intervals with representative crisp intervals rather than with individual real numbers. This kind
of defuzzification can be formalized rationally in information–theoretic terms,
as is shown in Chapter 9. Another problem, somewhat connected with defuzzification, is the problem of converting a given fuzzy set to a linguistic expression. This problem, which is usually referred to as linguistic approximation,
can also be dealt with in information-theoretic terms.
7.8. NONSTANDARD FUZZY SETS
In addition to standard fuzzy sets, several other types of fuzzy sets have been
introduced in the literature. Each of them is a nucleus of a particular formalized language, which may be viewed as a branch of the overall fuzzy set theory.
Distinct types of fuzzy sets are distinguished from one another by the domain
and range of their membership functions.
The following are definitions of the most visible of nonstandard fuzzy sets.
In each of them, symbols X and A denote, respectively, the universal set of
concern and the fuzzy set defined.
Interval-Valued Fuzzy Sets. Membership functions have the form

A : X → CI([0, 1]),

where CI([0, 1]) denotes the set of all closed intervals contained in [0, 1]. For each x ∈ X, A(x) is a closed interval of real numbers in [0, 1]. An alternative formulation:

A = ⟨A̲, Ā⟩,

where A̲ and Ā are standard fuzzy sets such that A̲(x) ≤ Ā(x) for all x ∈ X. Fuzzy sets defined in this way are sometimes called gray fuzzy sets. Clearly, for each x ∈ X,

A(x) = [A̲(x), Ā(x)] ∈ CI([0, 1]).

An example of an interval-valued fuzzy set is shown in Figure 7.11.
Fuzzy Sets of Type 2. Membership functions have the form

A : X → FI([0, 1]),

where FI([0, 1]) denotes the set of all fuzzy intervals defined on [0, 1]. For each x ∈ X, A(x) = Ix, where Ix is a fuzzy interval defined on [0, 1]. This fuzzy interval defines (imprecisely) the membership degree of x in A. An example is shown in Figure 7.12, where the fuzzy intervals are assumed to be of a trapezoidal shape. For each x ∈ X, the closed interval defined by the shaded area is the core of Ix, and the closed interval defined by the uppermost and lowermost curves is the support of Ix.
[Figure 7.11. Example of an interval-valued fuzzy set (or a gray set); graph not reproduced.]
[Figure 7.12. Example of a fuzzy set of type 2; graph not reproduced.]
Fuzzy Sets of Type t (t > 2). For each t > 2, membership functions are defined recursively by the form

A : X → FI^(t-1)([0, 1]),

where FI^(t-1)([0, 1]) denotes the set of all fuzzy intervals of type t - 1. These types of fuzzy sets were introduced in the literature as theoretically possible generalizations of fuzzy sets of type 2. However, they have not been seriously investigated so far, and their practical utility remains to be seen.
Fuzzy Sets of Level 2. Membership functions have the form
A : F(X) → [0, 1],
where F(X) denotes a family of fuzzy sets defined on X. That is, a fuzzy set of
level 2 is defined on a family of fuzzy sets, each of which is defined, in turn, on
a given universal set X. This mathematical structure allows us to represent
a higher-level concept by lower-level concepts, all expressed in imprecise
linguistic terms of natural language.
Fuzzy Sets of Level l (l > 2). For each l > 2, membership functions are defined
recursively by the form
A : F ( l-1) (X ) Æ [ 0, 1],
where F(l-1)(X) denotes a family of fuzzy sets of level l - 1. Sets of these types
are natural generalizations of fuzzy sets of level 2. They are sufficiently expressive to facilitate representation of high-level concepts embedded in natural
language. Notwithstanding their importance for linguistics, cognitive science,
and knowledge representation, their theory has not been adequately developed as yet.
L-Fuzzy Sets. Membership functions have the form
A : X → L,
where L denotes a recognized set of membership grades, which are not
required to be numerical. However, the membership grades recognized in L
are required to be at least partially ordered. Usually, L is assumed to be a complete lattice. These fuzzy sets are very general, and some of the other types of
fuzzy sets may be viewed as special L-fuzzy sets.
Intuitionistic Fuzzy Sets. Fuzzy sets of this type are defined by pairs of standard fuzzy sets,

A = ⟨AM, AN⟩,

where AM and AN denote standard fuzzy sets on X such that

0 ≤ AM(x) + AN(x) ≤ 1

for each x ∈ X. The values AM(x) and AN(x) are interpreted for each x ∈ X as, respectively, the degree of membership and the degree of nonmembership of x in A.
Rough Fuzzy Sets. These are fuzzy sets whose α-cuts are approximated by rough sets. That is, AR is a rough approximation of a fuzzy set A based on an equivalence relation R on X. Symbols A̲R and ĀR denote, respectively, the lower and upper approximations of A in which the set of equivalence classes X/R is employed instead of the universal set X; for each α ∈ [0, 1], the α-cuts of A̲R and ĀR are defined by the formulas

αA̲R = ∪{[x]R | [x]R ⊆ αA, x ∈ X},
αĀR = ∪{[x]R | [x]R ∩ αA ≠ ∅, x ∈ X},
where [x]R denotes the equivalence class in X/R that contains x. This combination of fuzzy sets with rough sets must be distinguished from another combination, in which a fuzzy equivalence relation is employed in the definition
of a rough set. It is appropriate to refer to the sets that are based on the latter
combination as fuzzy rough sets. These combinations, which have been discussed in the literature since the early 1990s, seem to have great utility in some
application areas.
Observe that the introduced types of fuzzy sets are interrelated in numerous ways. For example, a fuzzy set of any type that employs the unit interval
[0, 1] can be generalized by replacing [0, 1] with a complete lattice L; some of
the types (e.g., standard, interval-valued, or type 2 fuzzy sets) can be viewed
as special cases of L-fuzzy sets; or rough fuzzy sets can be viewed as special
interval-valued sets. The overall fuzzy set theory is thus a broad formalized
language based on an appreciable inventory of interrelated types of fuzzy sets,
each associated with its own variety of concepts, operations, methods of computation, interpretations, and applications.
7.9. CONSTRUCTING FUZZY SETS AND OPERATIONS
Fuzzy set theory provides us with a broad spectrum of tools for representing
and manipulating linguistic concepts, most of which are intrinsically vague and
strongly dependent on the context in which they are used. Some linguistic concepts are represented by fuzzy sets, others are represented by operations on
fuzzy sets. A prerequisite to each application of fuzzy set theory is to construct
appropriate fuzzy sets and operations on fuzzy sets by which the intended
meanings of relevant linguistic terms are adequately captured.
The problem of constructing fuzzy sets (i.e., their membership functions)
or operations on fuzzy sets in the contexts of various applications is not a
problem of fuzzy set theory per se. It is a problem of knowledge acquisition,
which is a subject of a relatively new field referred to as knowledge engineering. The process of knowledge acquisition involves one or more experts
in a specific domain of interest and a knowledge engineer. The role of the
knowledge engineer is to elicit the knowledge of interest from experts, and
express it in some operational form of a required type.
In applications of fuzzy set theory, knowledge acquisition involves basically
two stages. In the first stage, the knowledge engineer attempts to elicit relevant knowledge in terms of propositions expressed in natural language. In the
second stage, the knowledge engineer attempts to determine the meaning of
each linguistic term employed in these propositions. It is during this second
stage of knowledge acquisition that membership functions of fuzzy sets as well
as appropriate operations on these fuzzy sets are constructed.
Many methods for constructing membership functions are described in the
literature. It is useful to classify them into direct methods and indirect methods.
In direct methods, the expert is expected to define a membership function
either completely or exemplify it for some selected individuals in the universal set. To request a complete definition from the expert, usually in terms of a
justifiable mathematical formula, is feasible only for a concept that is perfectly
represented by some objects of the universal set, called ideal prototypes of the
concept, and the compatibility of other objects in the universal set with these
ideal prototypes can be expressed mathematically by a meaningful similarity
function. For example, in pattern recognition of handwritten characters, any
given straight line s can be defined as a member of a fuzzy set H of horizontal straight lines with the membership degree

H(s) = 1 - θ/45   when θ ∈ [0, 45],
H(s) = 0          otherwise,

where θ is the angle (measured in degrees) between s and an ideal horizontal straight line (an ideal prototype) that crosses it; it is assumed that θ ∈ [-90, 90], which means that the angle is required to be measured either in the first quadrant or the fourth quadrant, as relevant.
If it is not feasible to define the membership function in question completely, the expert should at least be able to exemplify it for some representative objects of the universal set. The exemplification can be facilitated by
asking the expert questions regarding the compatibility of individual objects
x with the linguistic term that is to be represented by fuzzy set A. These questions, regardless of their form, result in a set of pairs ⟨x, A(x)⟩ that exemplify
the membership function under construction. This set is then used for constructing the full membership function. One way to do that is to select an
appropriate class of functions (triangular, trapezoidal, S-shaped, bell-shaped,
etc.) and employ some relevant curve-fitting method to determine the function that best fits the given samples. Another way is to use an appropriate
neural network to construct the membership function by learning from the
given samples. This approach has been so successful that neural networks are
now viewed as a standard tool for constructing membership functions.
When a direct method is extended from one expert to multiple experts, the
opinions of individual experts must be properly combined. Any averaging
operation, including those introduced in Section 7.3.4, can be used for this
purpose. The most common operation is the simple weighted average
A(x) = c1 A1(x) + c2 A2(x) + . . . + cn An(x),

where Ai(x) denotes the valuation of the proposition "x belongs to A" by expert i, n denotes the number of experts involved, and ci denote weights by which the relative significance of the individual experts can be expressed; it is assumed that

c1 + c2 + . . . + cn = 1.
Experts are instructed either to value each proposition by a number in [0, 1]
or to value it as true or false.
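This aggregation is illustrated by the following Python sketch; the valuations and weights in it are assumed.

# A minimal sketch of the weighted-average aggregation of expert valuations.

def aggregate(valuations, weights):
    # valuations[i] is expert i's degree A_i(x); weights must sum to 1
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(c * a for c, a in zip(weights, valuations))

# three experts valuing the proposition "x belongs to A"
print(aggregate([0.7, 0.9, 0.5], [0.5, 0.3, 0.2]))   # 0.72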
Direct methods based on exemplification have one fundamental disadvantage. They require the expert (or experts) to give answers that are overly
precise and, hence, unrealistic as expressions of their qualitative subjective
judgments. As a consequence, the answers are always somewhat arbitrary.
Indirect methods attempt to reduce this arbitrariness by replacing the
requested direct estimates of degrees of membership with simpler tasks.
In indirect methods, experts are usually asked to compare elements of the
universal set in pairs according to their relative standing with respect to their
membership in the fuzzy set to be constructed. The pairwise comparisons are
often easier to estimate than the direct values, but they have to be somehow
connected to the direct values. Numerous methods have been developed for
dealing with this problem. They have to take into account possible inconsistencies in the pairwise estimates. Most of these methods deal with pairwise
comparisons obtained from one expert, but a few methods are described in
the literature that aggregate pairwise estimates from multiple experts. The
latter methods are particularly powerful since they allow the knowledge engineer to determine the degrees of competence of the participating experts,
which are then utilized, together with the expert’s judgments for calculating
the degrees of membership in question.
Methods for constructing membership functions of fuzzy sets and relevant
operations on these fuzzy sets in the context of individual applications are
essential for utilizing the enormous expressive power of fuzzy set theory for
representing knowledge. Although many powerful methods are now available
for this purpose, research in this area is still very active. To cover the various
construction methods in detail and the ongoing research in this rather large
problem area is far beyond the scope of this overview.
NOTES
7.1. Standard fuzzy sets and their basic properties were introduced in a seminal paper
by Lotfi A. Zadeh [1965], even though the ideas of fuzzy sets and fuzzy logic were
envisioned some 30 years earlier by the American philosopher Max Black in his
penetrating discussion of the concept of vagueness [Black, 1937]. Zadeh is not
only the founder of fuzzy set theory, but he has also been a key contributor to
its development in various directions. His principal writings on fuzzy set theory,
fuzzy logic, fuzzy systems, and other related areas during the period 1965–1995
are available in two books, edited by Yager et al. [1987] and Klir and Yuan [1996].
These books are perhaps the most important sources of information about the
development of ideas in these areas. Since 1995, Zadeh has published several
significant papers in which he examines the important role of fuzzy logic in
computing with perceptions [Zadeh, 1996, 1997, 1999, 2002, 2005]. The idea of
computing with perceptions introduces many new challenges, one of which is the
translation from statements in natural language that approximate perceptions to
their counterparts in the formalized language of fuzzy logic and the reverse translation (or retranslation) from statements in fuzzy logic (obtained by approximate
reasoning) to their counterparts in natural language. Some initial work in this
area has been done by Dvořák [1999], Klir and Sentz [2005], and Yager [2004].
A historical overview of fuzzy set theory and fuzzy logic is presented in [Klir, 2001].
7.2. The literature on fuzzy set theory and related areas is abundant and rapidly
growing. Two important handbooks, edited by Ruspini et al. [1998] and Dubois
and Prade [2000], are recommended as convenient sources of information on virtually any aspect of fuzzy set theory and related areas; the latter one is the first
volume in a multivolume series of handbooks on fuzzy set theory. From among
the growing number of textbooks on fuzzy set theory, any of the following general
textbooks is recommended for further study: Klir and Yuan [1995a], Lin and Lee
[1996], Nguyen and Walker [1997], Pedrycz and Gomide [1998], Zimmermann
[1996].
7.3. Research and education in fuzzy logic is now supported by numerous professional organizations. Many of them cooperate in a federation-like manner via the
International Fuzzy Systems Association (IFSA), which publishes the prime
journal in the field—Fuzzy Sets and Systems, and has organized the biennial
World IFSA Congress since 1985. The oldest professional organization supporting fuzzy logic is the North American Fuzzy Information Processing Society
(NAFIPS). Founded in 1981, NAFIPS publishes the Journal of Approximate
Reasoning and organizes annual meetings.
7.4. Two valuable books were written in a popular genre by McNeil and Freiberger
[1993] and Kosko [1993a]. Although both books characterize the relatively
short but dramatic history of fuzzy set theory and discuss the significance of
the theory, they have different foci. While the former book focuses on the
impact of fuzzy set theory on high technology, the latter is concerned more with
philosophical and cultural aspects; these issues are further explored in a
more recent book by Kosko [1999]. Another book for the popular audience,
which is worth reading, was written by De Bono [1991]. He argues that fuzzy logic
(called in the book “water logic”) is important in virtually all aspects of human
affairs.
7.5. Operations called t-norms and t-conorms were originally introduced by Menger
[1942] for the study of statistical metric spaces [Schweizer and Sklar, 1983]. The
most comprehensive treatment of these important operations for fuzzy set theory
are two publications by Klement et al. [2000, 2004]. Procedures are now available
by which various parameterized classes of operations of fuzzy intersections,
unions, and complementations can be generated [Klir and Yuan, 1995a].An excellent overview of the whole spectrum of aggregation operations on fuzzy sets was
prepared by Dubois and Prade [1985b]. Linguistic modifiers are surveyed in
[Kerre and DeCock, 1999].
7.6. The concept of a fuzzy number and the associated fuzzy arithmetic were introduced by Dubois and Prade [1978, 1979, 1987a]. They also developed basic ideas
of fuzzy differential calculus [Dubois and Prade, 1982a]. Fuzzy differential equations were studied by Kaleva [1987]. Interval arithmetic, which is a basis for fuzzy
arithmetic, is thoroughly covered in books by Alefeld and Herzeberger [1983],
Hansen [1992], Moore [1966, 1979], and Neumaier [1990]. A specialized introductory book on fuzzy arithmetic was written by Kaufmann and Gupta [1985],
and a more advanced book on using fuzzy arithmetic was written by Mareš
[1994]. Basic ideas of constrained fuzzy arithmetic are developed in [Klir, 1997a,
b] and [Klir and Pan, 1998]. It is worth mentioning that the concept of a complex fuzzy number was also introduced and its utility explored in [Nguyen et al., 1998].
7.7. Basic ideas of fuzzy relations and the concepts of cutworthy fuzzy equivalence,
compatibility, and partial ordering were introduced by Zadeh [1971] and were
further investigated by many researchers. This subject is thoroughly covered in
the book by Bělohlávek [2002], which also contains a large bibliography and
extensive bibliographical comments. Another excellent book on fuzzy relations,
focusing more on software tools and applications, was written by Peeva and
Kyosev [2004]. The notion of fuzzy relation equations was first proposed by
Sanchez [1976]. This important area of fuzzy set theory has been studied extensively, and many results emerging from these studies are covered in a dedicated
monograph by Di Nola et al. [1989]. More recent results can be found in Chapter
6 (written by De Baets) in Dubois and Prade [2000], as well as in Gottwald [1993],
De Baets and Kerre [1994], and Klir and Yuan [1995a].
An excellent and comprehensive survey of multivalued logics, which are the basis
for fuzzy logic in the narrow sense, was prepared by Rescher [1969]. Surveys of more recent developments were prepared by Wolf [1977], Bolc and Borowik [1992], and Malinowski [1993]. Fuzzy logic in the narrow sense is most comprehensively covered in [Hájek, 1998]. Other important publications in this area
include a series of papers by Pavelka [1979], and books by Novák et al. [1999],
Gottwald [2001], and Gerla [2001].
Literature dealing with approximate reasoning (or fuzzy logic in the broad sense)
is very extensive. Two major references seem to capture the literature quite well.
One of them is a pair of overview papers published together [Dubois and Prade,
1991], and the other one is a book edited by Bezdek et al. [1999], which is one of
the handbooks in [Dubois and Prade, 1998– ]. Two classical papers by Bandler
and Kohout [1980a, b] on fuzzy implication operators are useful to read.
The concept of a linguistic variable was introduced and thoroughly investigated
by Zadeh [1975–76]. It was also Zadeh who introduced the initial ideas of fuzzy
systems in several articles that are included in Yager et al. [1987] and Klir and
Yuan [1996]. The book by Negoita and Ralescu [1975] is an excellent early book
on fuzzy systems. There are many books dealing with fuzzy modeling, often in
the context of control, which is the most visible application of fuzzy systems. A
very small sample consists of the excellent books by Babuška [1998], Piegat
[2001], and Yager and Filev [1994]. Fuzzy systems are also established as universal approximators of a broad class of continuous functions, as is well discussed
by Kreinovich et al. [2000]. Special fuzzy systems—fuzzy automata and languages—are thoroughly covered in a large book by Mordeson and Malik [2002].
A good overview of defuzzification methods was prepared by Van Leekwijck and
Kerre [1999].
Interval-valued fuzzy sets and type 2 fuzzy sets have been investigated since the
1970s. The book by Mendel [2001] is by far the most comprehensive coverage of
these types of fuzzy sets, but a paper by John [1998] is a useful overview. Fuzzy
sets of type k were introduced by Zadeh [1975–76] as a natural generalization of
fuzzy sets of type 2, but their theory has not been developed as yet. Fuzzy sets
of level 2 and higher levels were recognized in the late 1970s [Gottwald, 1979],
but they have been rather neglected in the literature, in spite of their potential
utility for representing complex concepts expressed in natural language. L-fuzzy
sets were introduced by Goguen [1967] and they have been the source of various
generalizations in fuzzy set theory. Intuitionistic fuzzy sets were introduced by
Atanassov [1986] and are well developed in his more recent book [Atanassov,
2000]. Combinations of fuzzy sets with rough sets, which have already been
proved useful in some applications, were first examined by Dubois and Prade
[1987c, 1990b, 1992b]; rough sets were introduced by Pawlak [1982] and are
further developed in his book [Pawlak, 1991].
The problem of constructing membership functions of fuzzy sets and operations
on fuzzy sets has been addressed by many authors. Some representative publications dealing with this problem are [Bharathi-Devi and Sarma, 1985],
[Chameau and Santamarina, 1987], and [Sancho-Royo and Verdegay, 1999].
Increasingly, the use of neural networks or genetic algorithms has begun to dominate this area, as is exemplified by the following references: [Lin and Lee, 1996], [Nauck et al., 1997], [Cordón et al., 2001], [Rutkowska, 2002], and [Rutkowski, 2004].
Since the late 1980s, many surprisingly successful applications of fuzzy set theory
have been developed in virtually all areas of engineering as well as some other
professions. An overview of these applications with relevant references can be
found, for example, in Klir [2000] or Klir and Yuan [1995a]. A significant, more
recent development is the use of fuzzy set theory and fuzzy logic in some areas
of science, such as economics [Billot, 1992], chemistry [Rouvray, 1997], geology
[Demicco and Klir, 2003; Bárdossy and Fodor, 2004], and social sciences [Ragin,
2000].
It has been argued in the literature (see, for example, [Klir, 2000]) that the emergence of fuzzy set theory initiated a scientific revolution (or a paradigm shift)
according to the criteria introduced in the highly influential book by Thomas
Kuhn [1962].
Fuzzy sets may be viewed as wholes, each capturing a collection (potentially infinite) of classical sets—its α-cuts. The need for such wholes in mathematics is expressed very clearly in the classic book Holism and Evolution by Smuts [1926].
EXERCISES
7.1. Show that the fuzzy subsethood Sub can be expressed in terms of the
sigma count and standard operation of intersection as

Sub(A, B) = |A ∩ B| / |A|.   (7.45)
7.2. Determine the strong a-cut representation of fuzzy set A in Figure 7.3a.
7.3. Derive Eq. (7.26) from Eq. (7.24) under the assumption that fA and gA
are linear functions.
7.4. Membership function C in Figure 7.4 is defined for each x ∈ ℝ by the
formula

C(x) = max{0, 2x − x²}.

Determine the α-cut representation of C.
7.5. Derive general formulas for membership functions of fuzzy numbers
whose shapes are exemplified in Figure 7.4 by functions D and E, and
convert them to their respective a-cut representations.
7.6. Determine the membership functions for the a-cut representations in
Example 7.4.
7.7. Show that for each λ > 0 the complementation function c_λ defined by
Eq. (7.7) is involutive.

7.8. Show that for each λ > −1 the function

c_λ(a) = (1 − a) / (1 + λa)

is an involutive complementation function of fuzzy sets.
7.9. Show that the standard operations of intersection, union, and complementation of fuzzy sets possess the following properties:
(a) They satisfy the De Morgan laws expressed by Eqs. (7.16) and (7.17);
(b) They violate the law of excluded middle and the law of contradiction.
7.10. For the classes of intersection and union operations of fuzzy sets defined
by Eqs. (7.14) and (7.15), respectively, show the following:
(a) The standard operations are obtained in the limit for λ → ∞;
(b) The drastic operations are obtained in the limit for λ → 0.
7.11. Show that the following combinations of fuzzy operations on fuzzy sets
satisfy the law of excluded middle and the law of contradiction:
(a) Drastic intersection, drastic union, and standard complementation;
(b) i(a, b) = max{0, a + b - 1}, u(a, b) = min{1, a + b}, c(a) = 1 - a.
7.12. Show that for crisp sets the class of operations defined by Eq. (7.14)
(or, alternatively, by Eq. (7.15)) collapse to a single operation that
conforms to the classical set intersection (or, alternatively, the classical
set union).
7.13. Show that averaging operations of any kind are not applicable within
the domain of classical set theory.
310
7. FUZZY SET THEORY
7.14. Repeat Example 7.5 for the alternative form of the equation: a = 1 - 1/
(b + 1).
7.15. Consider a fuzzy number A defined by the formula

A(x) = 1 − (1.25 − x)²   when x ∈ [0.25, 2.25],
     = 0                 otherwise,
and the fuzzy number C in Exercise 7.4. Determine each of the
following fuzzy numbers by both standard and constrained fuzzy
arithmetic:
(a) A + C
(b) A - C; C - A; and C - C
(c) A · C
(d) C/A and A/A
(e) A/C - C/A
(f) A · C/(A + C)
7.16. Repeat Exercise 7.15 for the fuzzy number

A(x) = 1 − |x − 5|/3   when x ∈ [2, 8],
     = 0               otherwise,

and a triangular fuzzy number C = ⟨−1, 0, 0, 1⟩.
7.17. For PA and PB = 1 - PA in Example 7.6, determine the following operations with and without the probabilistic constraint:
(a) PA · PB
(b) PA/PB and PB/PA
(c) PA - PB and PB - PA
(d) PA + PB/PA and PA + PB/PB
7.18. Using the standard and constrained fuzzy arithmetic, determine A · A for
the following trapezoidal-shape fuzzy intervals:
(a) A = ⟨−1, 0, 1, 2⟩
(b) A = ⟨1, 2, 2, 4⟩
(c) A = ⟨−5, −3, −2, −1⟩
(d) A = ⟨−2, 0, 0, 2⟩
7.19. Given a fuzzy interval whose α-cut representation is ᵅA = [a(α), ā(α)],
show that under the equality constraint

ᵅ(A · A) = [ā²(α), a²(α)]            when ā(α) < 0,
         = [a²(α), ā²(α)]            when a(α) > 0,
         = [0, max{a²(α), ā²(α)}]    when 0 ∈ [a(α), ā(α)],

for all α ∈ (0, 1].
7.20. Given arbitrary intervals A, B, C show that:
(a) A(B + C) ⊆ AB + AC when standard fuzzy arithmetic is used;
(b) A(B + C) = AB + AC when constrained fuzzy arithmetic is used.
7.21. Given the equation A + X = B, where A and B are given fuzzy intervals,
show that X = B − A under constrained fuzzy arithmetic, but not under
standard fuzzy arithmetic.
7.22. Given the equation A · X = B, where A and B are given fuzzy intervals,
show that X = B/A (assuming that 0 ∉ A) under the constrained fuzzy
arithmetic, but not under the standard fuzzy arithmetic.
7.23. Show that the standard operations of intersection and union on fuzzy
sets are cutworthy.
7.24. Show that (P ° Q)-1 = Q-1 ° P-1 for the standard composition and the
inverse of any connected fuzzy binary relations P and Q.
7.25. For each of the three pairs of connected binary fuzzy relations defined
in Table 7.2 by their matrix representations, determine:
(a) The standard composition;
(b) The standard join.
7.26. For some of the compositions performed in Exercise 7.25, verify the
equation (P ° Q)-1 = Q-1 ° P-1.
7.27. For some of the fuzzy relations defined in Table 7.2, determine the
following:
(a) Projections to each dimension;
(b) Cylindric extensions from the projections;
(c) Cylindric closure based on the projections.
7.28. Let X be the relative humidity (measured in %) at some particular place
on the Earth, and let the property of high humidity be expressed by
the trapezoidal-shaped fuzzy interval H = ⟨60, 80, 100, 100⟩ defined on the
interval [0, 100]. Using Figure 7.7b, determine the degree of truth of the
truth-qualified fuzzy proposition
fT( H ) (x) : X = x is H is T
under the following specifications:
(a) x = 65% and T = True;
(b) x = 65% and T = False;
(c) x = 50% and T = True;
(d) x = 50% and T = False;
(e) x = 76% and T = Fairly true;
(f) x = 76% and T = Very false.

Table 7.2. Matrix Representations of Fuzzy Relations Employed in Exercises 7.25–7.27

P1 (rows x1–x3, columns y1–y3):
      y1    y2    y3
x1   1.0   0.0   0.7
x2   0.3   0.2   0.0
x3   0.0   0.5   1.0

P2 (rows x1–x4, columns y1–y4):
      y1    y2    y3    y4
x1   1.0   0.9   0.8   0.7
x2   0.6   1.0   0.5   0.4
x3   0.3   0.2   1.0   0.1
x4   0.2   0.3   0.4   1.0

P3 (rows x1–x6, columns y1–y5):
      y1    y2    y3    y4    y5
x1   0.9   1.0   1.0   1.0   0.9
x2   0.7   0.8   0.9   0.7   0.6
x3   0.6   0.5   0.8   0.6   0.5
x4   0.3   0.4   0.7   0.3   0.4
x5   0.2   0.5   0.6   0.5   0.2
x6   0.1   0.3   0.5   0.3   0.1

Q1 (rows y1–y3, columns z1–z4):
      z1    z2    z3    z4
y1   1.0   0.0   0.7   0.5
y2   0.0   1.0   0.0   1.0
y3   0.7   0.0   1.0   0.8

Q2 (rows y1–y4, columns z1–z6):
      z1    z2    z3    z4    z5    z6
y1   0.0   0.8   0.6   0.0   0.4   0.2
y2   0.0   0.0   0.9   0.9   0.7   0.7
y3   1.0   1.0   0.0   0.5   0.0   0.0
y4   1.0   1.0   0.5   0.0   1.0   1.0

Q3 (rows y1–y6, columns z1–z7):
      z1    z2    z3    z4    z5    z6    z7
y1   1.0   0.9   0.9   0.8   0.6   0.4   0.2
y2   1.0   0.9   1.0   0.8   0.7   0.7   0.5
y3   1.0   1.0   0.8   0.9   0.0   0.0   0.0
y4   0.0   0.0   0.0   0.7   0.8   0.0   0.0
y5   0.0   0.0   0.0   0.0   0.6   0.7   0.0
y6   0.0   0.0   0.0   0.5   0.5   0.5   0.6
7.29. Repeat Exercise 7.28 for the property of medium humidity, which
is expressed by the trapezoidal-shaped fuzzy interval M = ⟨20, 40, 60, 80⟩.
7.30. The probabilities, p(r), of daily receipts, r, of a shop (rounded to
the nearest hundred dollars) that have been obtained from statistical
data collected over many years are shown in Table 7.1b. Consider three
fuzzy events: low, medium, and high receipts and assume that they are
represented by trapezoidal-shaped fuzzy intervals L = ⟨18, 18, 21, 23⟩, M = ⟨21, 23, 27, 29⟩, and H = ⟨27, 29, 32, 32⟩, respectively. Determine the
degrees of truth of the following propositions:
(a) The daily receipts are low;
(b) The daily receipts are medium;
(c) The daily receipts are high.
7.31. Modify the granulation defined in Figure 7.10b by replacing the triangular-shaped granules with:
(a) Trapezoidal-shaped granules, with small plateaus around the ideal
values;
(b) Granules of the shape illustrated by membership function C in
Figure 7.4 and defined in Exercise 7.4;
(c) Granules of the shape illustrated by membership function F in
Figure 7.4 and defined in Example 7.3.
7.32. Show that the defuzzified value d(A) obtained by the centroid defuzzification method may be interpreted as the expected value of x based on
A.
7.33. Show that the defuzzified value d(A) defined by Eq. (7.41) is the value
of x for which the area under the graph of membership function A is
divided into two equal areas.
7.34. Using the centroid defuzzification method defuzzify the following fuzzy
sets:
(a) Fuzzy number A(x) = max{0, 4x − 4 − (2x − 2)²};
(b) Fuzzy interval L in Figure 7.7a;
(c) The two fuzzy numbers obtained in Example 7.5 and shown in
Figure 7.5;
(d) Fuzzy set A that is equal to the possibility profile r in Figure 6.1 (i.e.,
A(x) = r(x) for all x Œ ⺞15);
(e) Fuzzy interval A defined by the following α-cut representation:

ᵅA = [1.5α + 1.75, 4 − 0.5α]   when α ∈ (0, 0.5],
   = [α + 2.5, 4 − 0.5α]       when α ∈ (0.5, 1].
7.35. Repeat Exercise 7.34 for some other defuzzification methods defined by
Eqs. (7.43) and (7.44). Choose at least one value of λ < 1 and one value of λ > 1.
7.36. Every rectangle may be considered to some degree a member of a fuzzy
set of squares. Using common sense, define a possible membership function of such a fuzzy set.
8
FUZZIFICATION OF
UNCERTAINTY THEORIES
The limit of language is the limit of the world.
—Stephen A. Tyler
8.1. ASPECTS OF FUZZIFICATION
Perhaps the best description of the nature of fuzzification is contained in the
well-known classical paper by Goguen [1967]: “Fuzzification is a process of
imparting a fuzzy structure to a definition (concept), a theorem, or even a
whole theory.” The strength of this deceivingly simple description is its sweeping generality. The single sentence captures the essence of a wide variety of
issues that must be addressed when mathematical concepts, properties, or theories based on classical sets are generalized to their counterparts based on
fuzzy sets of some type.
One method of fuzzifying various properties (concepts, operations,
theorems) of classical set theory is to use the a-cut (or strong a-cut) representations of fuzzy sets. Since a-cuts (as well as strong a-cuts) are classical sets,
every property of classical set theory applies to them. A property of classical
sets is fuzzified (that is, it becomes a property of fuzzy sets) via the a-cut representation by requiring that it holds (in the classical sense) in all a-cuts of
the fuzzy sets involved. Any property of fuzzy sets of some specific type that
is derived from a property of classical sets in this way is called a cutworthy
property.
For standard fuzzy sets, there are many properties that are cutworthy. One
such property is convexity of fuzzy sets. A fuzzy set is said to be convex if and
only if all its a-cuts are convex sets in the classical sense. An important class
of special convex fuzzy sets consists of fuzzy intervals. A fuzzy set on ⺢ is said
to be a fuzzy interval if and only if all its a-cuts are closed intervals of real
numbers (i.e., classical convex subsets of ⺢). Arithmetic operations on fuzzy
intervals are also cutworthy. At each a-cut, they follow the rules of either standard or constrained arithmetic on closed intervals of real numbers.
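To make the cutwise treatment of fuzzy-interval arithmetic concrete, the following short Python sketch (an editorial illustration, not part of the original text) represents two hypothetical trapezoidal fuzzy intervals by a few of their α-cuts and adds them level by level using standard interval addition:

def alpha_cut(a, b, c, d, alpha):
    # alpha-cut [a + (b - a)*alpha, d - (d - c)*alpha] of the trapezoidal
    # fuzzy interval <a, b, c, d>
    return (a + (b - a) * alpha, d - (d - c) * alpha)

def add_cutwise(A, B, levels):
    # standard interval addition applied at each alpha level
    return {alpha: (alpha_cut(*A, alpha)[0] + alpha_cut(*B, alpha)[0],
                    alpha_cut(*A, alpha)[1] + alpha_cut(*B, alpha)[1])
            for alpha in levels}

A = (-1, 0, 1, 2)          # hypothetical trapezoidal fuzzy interval <-1, 0, 1, 2>
B = (1, 2, 2, 4)           # hypothetical trapezoidal fuzzy interval <1, 2, 2, 4>
print(add_cutwise(A, B, [0.0, 0.5, 1.0]))   # alpha = 1 gives the interval [2, 3]

Because each α-cut is an ordinary closed interval, the same pattern extends to constrained arithmetic by simply replacing the interval operation applied at each level.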
It is significant that the standard operations of intersection and union of
fuzzy sets (min and max operations) are cutworthy. As a consequence, other
operations that are based solely on them are cutworthy as well. They include,
for example, the max–min composition of binary fuzzy relations as well as the
relational join defined by the min operation. The concept of cylindric closure
of fuzzy relations is also cutworthy when the intersection of the cylindric
extensions of the given fuzzy relations is defined by the min operation.
Important examples of cutworthy properties are some properties of binary
fuzzy relations on X 2, such as equivalence, compatibility, or partial ordering.
Classical equivalence relations, for example, are defined by three properties:
(i) reflexivity; (ii) symmetry; and (iii) transitivity. Let these properties be
defined for fuzzy relations, R, as follows:
(i) R is reflexive iff R(x, x) = 1 for all x ∈ X.
(ii) R is symmetric iff R(x, y) = R(y, x) for all x, y ∈ X.
(iii) R is transitive (or, more specifically, max–min transitive) iff

R(x, z) ≥ max_{y∈X} min{R(x, y), R(y, z)}   (8.1)

for all pairs ⟨x, z⟩ ∈ X².
Then, any binary fuzzy relation on X 2 that possesses these properties is a fuzzy
equivalence relation in the cutworthy sense. That is, all a-cuts of any binary
fuzzy relation that possesses these properties are classical equivalence relations. This follows from the fact that each of the three properties is cutworthy.
In the case of reflexivity and symmetry, it is obvious. In the case of transitivity, it is cutworthy since it is defined in terms of the standard operations of
intersection and union, which are cutworthy.
EXAMPLE 8.1. Consider the binary fuzzy relation R on X², where X = {x_i | i ∈ ℕ₇}, which is defined by the matrix

R =
        x1    x2    x3    x4    x5    x6    x7
x1     1.0   0.8   0.0   0.4   0.0   0.0   0.0
x2     0.8   1.0   0.0   0.4   0.0   0.0   0.0
x3     0.0   0.0   1.0   0.0   1.0   0.9   0.5
x4     0.4   0.4   0.0   1.0   0.0   0.0   0.0
x5     0.0   0.0   1.0   0.0   1.0   0.9   0.5
x6     0.0   0.0   0.9   0.0   0.9   1.0   0.5
x7     0.0   0.0   0.5   0.0   0.5   0.5   1.0
It is easy to see that this relation is reflexive and symmetric. It is also max–min
transitive, but that is more difficult to verify. One convenient way to verify it
is to calculate (R ∘ R) ∪ R, where ∘ denotes the max–min composition and ∪ denotes the standard union operation. Then, R is transitive if and only if

R = (R ∘ R) ∪ R.
The given relation satisfies this equation. Hence, it is max–min transitive and,
due to its reflexivity and symmetry, it is a fuzzy equivalence relation. This can
be verified by examining all its a-cuts.
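As an editorial aid (not part of the original text), the transitivity test just described can be carried out mechanically; the following minimal Python sketch computes the max–min composition of R from Example 8.1 with itself, forms the standard union with R, and confirms that the result equals R:

R = [
    [1.0, 0.8, 0.0, 0.4, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0, 0.9, 0.5],
    [0.4, 0.4, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0, 0.9, 0.5],
    [0.0, 0.0, 0.9, 0.0, 0.9, 1.0, 0.5],
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.5, 1.0],
]
n = len(R)
# max-min composition R o R
comp = [[max(min(R[i][k], R[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)]
# standard union (R o R) with R, taken elementwise as the maximum
union = [[max(comp[i][j], R[i][j]) for j in range(n)] for i in range(n)]
print(union == R)   # True, so R is max-min transitive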
The level set of the given relation R is L_R = {0, 0.4, 0.5, 0.8, 0.9, 1}. Therefore, R represents six classical equivalence relations, one for each α ∈ L_R. Each of these equivalence relations, ᵅR, partitions the set X in some particular way, ᵅπ(X). Since ᵅ′R ⊆ ᵅR when α′ ≥ α, clearly

ᵅ′π(X) ≤ ᵅπ(X)   when α′ ≥ α.

The six partitions of the given relation are shown in the form of a partition tree in Figure 8.1. The partitions become increasingly more refined when values of α in L_R increase.
Other cutworthy types of binary fuzzy relations can be defined in a similar
way. Examples are fuzzy compatibility relations (reflexive and symmetric) and
fuzzy partial orderings (reflexive, antisymmetric, and transitive). For fuzzy
partial orderings, the property of fuzzy antisymmetry is defined as follows: for
all x, y Œ X, if R(x, y) > 0 and R(y, x) > 0, then x = y. This, clearly, is a cutworthy property.
It is important to realize that there are many fuzzy-set generalizations of
properties of classical sets that are not cutworthy. A fuzzy-set generalization
of some classical property is required to reduce to its classical counterpart
when membership grades are restricted to 0 and 1, but it is not required to be
cutworthy. There often are multiple generalizations of a classical property, but
only one or, in some cases, none of them is cutworthy. Examples of fuzzy-set
generalizations that are not cutworthy are all operations of intersection and
union of fuzzy sets (t-norms and t-conorms) except the standard ones (min
and max). Even more interesting examples are operations of complementation of fuzzy sets, none of which is cutworthy, even though all of them are, by
definition, generalizations of the classical complementation.
Another way of connecting classical set theory and fuzzy set theory is to
fuzzify functions. Given a function
f : X Æ Y,
where X and Y are crisp sets, we say that the function is fuzzified when it is
extended to act on fuzzy sets defined on X and Y. That is, the fuzzified function maps, in general, fuzzy sets defined on X to fuzzy sets defined on Y.
Formally, the fuzzified function, F, has a form
Figure 8.1. Partition tree of the fuzzy equivalence relation in Example 8.1.
F : F(X) → F(Y),
where F(X) and F(Y) denote the fuzzy power sets (sets of all fuzzy subsets)
of X and Y, respectively. To qualify as a fuzzified version of f, function F must
conform to f within the extended domain F(X) and F(Y). This is guaranteed
when a principle is employed that is called an extension principle. According
to this principle
B = F ( A)
is determined for any given fuzzy set A ∈ F(X) and all y ∈ Y via the formula

B(y) = sup{A(x) | x ∈ X, f(x) = y}   when f⁻¹(y) ≠ ∅,
     = 0                             otherwise.   (8.2)
The inverse function

F⁻¹ : F(Y) → F(X)

of F is defined, according to the extension principle, for any given B ∈ F(Y) and all x ∈ X, by the formula

[F⁻¹(B)](x) = B(y),   (8.3)

where y = f(x). Clearly,

F⁻¹[F(A)] ⊇ A

for all A ∈ F(X), where the equality is obtained when f is a one-to-one function.
The use of the extension principle is illustrated in Figure 8.2, which shows
how fuzzy set A is mapped to fuzzy set B via function F that is consistent with
the given function f. That is, B = F(A). For example, since
b = f (a1 ) = f (a 2 ) = f (a3 ),
we have
B(b) = max{ A(a1 ), A(a 2 ), A(a3 )}
Figure 8.2. Illustration of the extension principle for fuzzy sets.
by Eq. (8.2). Conversely,
[ F -1 (B)](a1 ) = [ F -1 (B)](a 2 ) = [ F -1 (B)](a3 ) = B(b)
by Eq. (8.3).
The introduced extension principle, by which functions are fuzzified, is basically described by Eqs. (8.2) and (8.3). These equations are direct generalizations of similar equations describing the extension principle of classical set
theory. In the latter, symbols A and B denote characteristic functions of crisp
sets.
To fuzzify a function of n variables of the form

f : X₁ × X₂ × ... × Xₙ → Y,

the formula in Eq. (8.2) has to be replaced with the more general formula

B(y) = sup{min_{i∈ℕn} A_i(x_i) | x_i ∈ X_i for each i ∈ ℕn, f(x₁, x₂, ..., xₙ) = y}   when f⁻¹(y) ≠ ∅,
     = 0                                                                              otherwise.   (8.4)

Similarly, Eq. (8.3) has to be replaced with

[F⁻¹(B)](x₁, x₂, ..., xₙ) = B(y).   (8.5)

Equation (8.4) can be further generalized by replacing the min operator with a t-norm.
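For readers who want to experiment, the extension principle of Eq. (8.4) is easy to implement for fuzzy sets on small finite universes. The following Python sketch (an editorial illustration; the fuzzy sets shown are hypothetical and not taken from the text) uses min for the conjunction and the maximum over all preimages in place of the supremum:

from itertools import product

def extend(f, *fuzzy_sets):
    # Image of the given fuzzy sets (dicts element -> membership grade) under f,
    # following Eq. (8.4) with the min operator.
    B = {}
    for xs in product(*[fs.keys() for fs in fuzzy_sets]):
        y = f(*xs)
        grade = min(fs[x] for fs, x in zip(fuzzy_sets, xs))
        B[y] = max(B.get(y, 0.0), grade)      # supremum over all preimages of y
    return B

A1 = {1: 0.5, 2: 1.0, 3: 0.5}                 # hypothetical fuzzy set "about 2"
A2 = {2: 0.4, 3: 1.0, 4: 0.4}                 # hypothetical fuzzy set "about 3"
print(extend(lambda x1, x2: x1 + x2, A1, A2)) # fuzzified addition, "about 5"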
EXAMPLE 8.2. Consider the function y = √x, where x ∈ [0, 5], and the triangular fuzzy number

A(x) = 2x − 1   when x ∈ [0.5, 1),
     = 3 − 2x   when x ∈ [1, 1.5],
     = 0        otherwise,

which represents an approximate assessment of value x. Using the extension principle, the value of y is approximately assessed by the fuzzy number

B(y) = 2y² − 1   when y ∈ [√0.5, 1),
     = 3 − 2y²   when y ∈ [1, √1.5],
     = 0         otherwise.
Fuzzification also can be attained via the mathematical theory of categories.
In this way, a fuzzy structure is imparted into various categories of mathematical objects by fuzzifying morphisms through which the categories are
defined. This is a powerful approach to fuzzification, which has played an
important role in fuzzifying many areas of mathematics, such as topology,
analysis, various algebraic theories, graphs, hypergraphs, geometry, and finite-state automata. Because the reader of this book is not expected to have sufficient background in category theory, this approach to fuzzification is not
employed in this chapter.
There are usually multiple ways in which a given classical mathematical
structure can be fuzzified. These ways of fuzzification are distinguished from
one another by choosing which components of the structure are going to be
fuzzified and how. These choices have to be made in each given application
context on pragmatic grounds.
8.2. MEASURES OF FUZZINESS
The pragmatic value of fuzzy logic in the broad sense consists of its capability to represent and deal with vague linguistic expressions. Such expressions
are typical of natural language. A linguistic expression is vague when its
meaning is not fixed by sharp boundaries. Such a linguistic expression is always
associated with uncertainty regarding its applicability. However, this type of
uncertainty does not result from any information deficiency, but from the lack
of linguistic precision. It is a linguistic uncertainty rather than information-based uncertainty. Consider, for example, a fuzzy set that was constructed in a
given application context to represent the linguistic term “high temperature.”
Assume now that a particular measurement of the temperature was taken (say
92°F). This measurement belongs to the fuzzy set with a particular membership degree (say, 0.7). Clearly, this degree does not express any lack of information (the actual value of the temperature is known—it is 92°F), but rather
the degree of compatibility of the known value with the imprecise (vague)
linguistic term.
Since this linguistic uncertainty is represented by fuzzy sets, it is usually
called fuzziness. Each fuzzy set is clearly associated with some amount of fuzziness, and it is desirable to be able to measure this amount in some meaningful way. Although the amount of fuzziness is not connected in any way to the
quantification of information, it is an important trait of information representation. Clearly, our ability to measure it allows us to characterize information
representation more completely, and thus enriches our methodological capabilities for dealing with information.
In general, a measure of fuzziness for some type of fuzzy sets is a functional
f : F(X) → ℝ⁺,
where F(X ) denotes the set of all fuzzy subsets of X of a given type. Thus far,
however, the issue of measuring fuzziness has been addressed only for standard fuzzy sets. This restriction is thus followed in this section as well.
Several ideas of how to measure the fuzziness of fuzzy sets (i.e., standard
fuzzy sets) have been pursued in the literature. One of them, which is currently
predominant in the literature and is followed in this section, is to measure the
fuzziness of a fuzzy set by the lack of distinction between the fuzzy set and its
complement. This is a sound idea. It is precisely the lack of distinction between
sets and their complements that distinguishes fuzzy sets from crisp sets. The
less a set differs from its complement, the fuzzier it is.
Measuring fuzziness in terms of distinctions between sets and their complements is dependent on the definition of the complementation operation.
The simplest way of expressing the local distinctions (one for each x ∈ X) of a given set A and its complement, c(A), is to calculate the difference

|A(x) − c[A(x)]|.

The maximum of this difference, obtained for crisp sets, is 1. The lack of distinction between A and its complement is thus expressed for each x ∈ X by the value

1 − |A(x) − c[A(x)]|.

When X is finite, a measure of fuzziness, f(A), of the set A is then obtained by adding these values for all x in the support of A, S(A). That is,

f(A) = Σ_{x∈S(A)} (1 − |A(x) − c[A(x)]|),   (8.6)

or, alternatively,

f(A) = |S(A)| − Σ_{x∈S(A)} |A(x) − c[A(x)]|.   (8.7)

Equations (8.6) and (8.7) are justified by the fact that 1 − |A(x) − c[A(x)]| = 0 for all x ∉ S(A). Clearly,

0 ≤ f(A) ≤ |X|,

where f(A) = 0 if and only if A is a crisp set, and f(A) = |X| if and only if

A(x) = c[A(x)]

for all x ∈ X, which means that the set is equal to its complement. When f(A) is divided by |X|, we obtain a normalized value, f̂(A), of fuzziness. Clearly,

0 ≤ f̂(A) ≤ 1.
When X = [x̲, x̄] and S(A) = [a, b] ⊆ X, the summation in Eq. (8.6) must be replaced with integration. That is,

f(A) = ∫ₐᵇ (1 − |A(x) − c[A(x)]|) dx,   (8.8)

or alternatively,

f(A) = b − a − ∫ₐᵇ |A(x) − c[A(x)]| dx.   (8.9)

Clearly,

0 ≤ f(A) ≤ x̄ − x̲,

and for the associated normalized version, f̂(A) = f(A)/(x̄ − x̲),

0 ≤ f̂(A) ≤ 1.

When c is the standard fuzzy complement, Eq. (8.7) becomes

f(A) = |S(A)| − Σ_{x∈S(A)} |2A(x) − 1|,   (8.10)

and Eq. (8.9) becomes

f(A) = b − a − ∫ₐᵇ |2A(x) − 1| dx.   (8.11)
EXAMPLE 8.3. To calculate the amount of fuzziness of the fuzzy relation R in Example 8.1 by Eq. (8.10) (i.e., assuming the use of the standard operation of complementation), it is convenient to first calculate the following matrix of the differences |2R(x) − 1| for all pairs x = ⟨x_i, x_j⟩ ∈ S(R):

        x1    x2    x3    x4    x5    x6    x7
x1     1.0   0.6    —    0.2    —     —     —
x2     0.6   1.0    —    0.2    —     —     —
x3      —     —    1.0    —    1.0   0.8   0.0
x4     0.2   0.2    —    1.0    —     —     —
x5      —     —    1.0    —    1.0   0.8   0.0
x6      —     —    0.8    —    0.8   1.0   0.0
x7      —     —    0.0    —    0.0   0.0   1.0

The sum of all these differences is 14.2, |X| = 49, and |S(R)| = 25. Hence, f(R) = 25 − 14.2 = 10.8 and f̂(R) = 10.8/49 = 0.22.
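The calculation in Example 8.3 is easy to reproduce by machine. A minimal Python sketch (an editorial addition, not part of the original text) applies Eq. (8.10) to the relation R of Example 8.1:

R = [
    [1.0, 0.8, 0.0, 0.4, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0, 0.9, 0.5],
    [0.4, 0.4, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0, 0.9, 0.5],
    [0.0, 0.0, 0.9, 0.0, 0.9, 1.0, 0.5],
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.5, 1.0],
]
support = [r for row in R for r in row if r > 0]          # S(R), 25 pairs
f = len(support) - sum(abs(2 * r - 1) for r in support)   # Eq. (8.10)
print(len(support), round(f, 2), round(f / 49, 2))        # 25, 10.8, 0.22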
EXAMPLE 8.4. To illustrate the use of Eq. (8.11), let

A(x) = max{0, 2x − x²}

for all x ∈ X = [0, 5]. It is obvious that the support of A is the interval [0, 2]. Hence,

f(A) = 2 − ∫₀² |4x − 2x² − 1| dx.

The expression 4x − 2x² − 1 is positive within the interval [x₁, x₂] and negative within the intervals [0, x₁] and [x₂, 2], where x₁ = (2 − √2)/2 and x₂ = (2 + √2)/2. Hence,

f(A) = 2 − ∫₀^{x₁} (1 + 2x² − 4x) dx − ∫_{x₁}^{x₂} (4x − 2x² − 1) dx − ∫_{x₂}^{2} (1 + 2x² − 4x) dx = 0.78,

and f̂(A) = 0.78/5 = 0.156.
Two alternative functionals for measuring the fuzziness of fuzzy sets are well known in the literature. Both of them are based on the idea that fuzziness is manifested by the lack of distinction between a fuzzy set and its complement. One of them, say functional f′, expresses this lack of distinction by the standard intersection of the set and its complement. That is,

f′(A) = Σ_{x∈S(A)} min{A(x), c[A(x)]}   (8.12)

when X is finite, or

f′(A) = ∫ₐᵇ min{A(x), c[A(x)]} dx   (8.13)

when S(A) = [a, b]. The minimum, f′(A) = 0, is clearly obtained only if A is a crisp set. The maximum is obtained when

A(x) = c[A(x)]

for all x ∈ X. The value for which this equation is satisfied is called an equilibrium of complement c. For example, the equilibrium of the standard complement is 0.5. Denoting the equilibrium of complement c by e_c, it is obvious that

0 ≤ f′(A) ≤ |X| · e_c

when X is finite, or

0 ≤ f′(A) ≤ (x̄ − x̲) · e_c

when X = [x̲, x̄]. For the standard fuzzy complement, it is easy to show that f(A) = 2f′(A) for any fuzzy set A, regardless of whether it is defined on a finite set X or on ℝ. In this case, f and f′ differ only in the measurement unit, which does not affect their normalized versions. Hence, f̂(A) = f̂′(A) for all fuzzy sets.
Another functional, f″, extensively covered in the literature, is applicable only to the standard fuzzy complement. It is defined by the formula

f″(A) = Σ_{x∈X} [−A(x) log₂ A(x) − (1 − A(x)) log₂(1 − A(x))]   (8.14)

when X is finite, or by the formula

f″(A) = ∫_X [−A(x) log₂ A(x) − (1 − A(x)) log₂(1 − A(x))] dx   (8.15)

when X = [x̲, x̄]. This functional is based on the recognition that for each x ∈ X, the values of A(x) and the standard fuzzy complement, 1 − A(x), add to 1. Hence, for each x ∈ X, the pair ⟨A(x), 1 − A(x)⟩ may be viewed as a probability distribution on two elements. The functional utilizes, for each x ∈ X, the Shannon entropy of this elementary distribution, S(A(x), 1 − A(x)), as a convenient measure of the lack of distinction between A(x) and 1 − A(x).
EXAMPLE 8.5. To compare functionals f and f″, let us repeat Example 8.3 for f″. First, for each entry of the matrix we calculate the Shannon entropy S(R(x), 1 − R(x)):

        x1    x2    x3    x4    x5    x6    x7
x1    0.00  0.72  0.00  0.97  0.00  0.00  0.00
x2    0.72  0.00  0.00  0.97  0.00  0.00  0.00
x3    0.00  0.00  0.00  0.00  0.00  0.47  1.00
x4    0.97  0.97  0.00  0.00  0.00  0.00  0.00
x5    0.00  0.00  0.00  0.00  0.00  0.47  1.00
x6    0.00  0.00  0.47  0.00  0.47  0.00  1.00
x7    0.00  0.00  1.00  0.00  1.00  1.00  0.00

After adding all entries in this matrix, we obtain f″(R) = 13.2 and f̂″(R) = 13.2/49 = 0.27.
EXAMPLE 8.6. To illustrate the use of Eq. (8.15), let us repeat Example 8.4 for f″. We have

f″(A) = −∫₀² [(2x − x²) log₂(2x − x²) + (1 − 2x + x²) log₂(1 − 2x + x²)] dx.

Evaluating this integral yields f″(A) = 1.18 and, therefore, f̂″(A) = 1.18/5 = 0.236.
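Both results can be checked numerically. The short Python sketch below (an editorial addition; the simple midpoint Riemann sum is only an approximation) evaluates Eq. (8.11) and Eq. (8.15) for A(x) = max{0, 2x − x²} over its support [0, 2]:

from math import log2

def A(x):
    return max(0.0, 2 * x - x * x)

def entropy(a):
    # Shannon entropy S(a, 1 - a); taken as 0 at a = 0 and a = 1
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * log2(a) - (1 - a) * log2(1 - a)

n = 20000
dx = 2.0 / n
xs = [(k + 0.5) * dx for k in range(n)]
f = sum(1 - abs(2 * A(x) - 1) for x in xs) * dx    # Eq. (8.11): about 0.78
f2 = sum(entropy(A(x)) for x in xs) * dx           # Eq. (8.15): about 1.18
print(round(f, 2), round(f2, 2), round(f / 5, 3), round(f2 / 5, 3))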
8.3. FUZZY-SET INTERPRETATION OF POSSIBILITY THEORY
The term “possibility theory” is used in this section for the theory of graded
possibilities introduced in Section 5.2. As a formal mathematical system, possibility theory has various interpretations, some of which are mentioned in
Section 5.2.6. Perhaps the most visible and useful interpretation of possibility
theory, which is the subject of this section, is its fuzzy-set interpretation. In this
interpretation, possibility profiles are derived from information expressed in
terms of fuzzy sets.
In order to explain this interpretation of possibility theory, let X denote a
variable that takes values on a universal set X, and assume that information
about the actual value of the variable is expressed by a proposition “X is F,”
where F is, in general, a fuzzy set on X. This clearly also covers the special case
when F is a crisp set or even a singleton. To express information captured
by this fuzzy proposition in measure-theoretic terms, it is natural to interpret the membership degree F(x) for each x ∈ X as the degree of possibility that X = x. This interpretation induces a unique possibility profile r_F on X that is defined for all x ∈ X by the equation

r_F(x) = F(x).   (8.16)

Given this possibility profile, the corresponding possibility measure, Pos_F, is then determined for all A ∈ P(X) by the formula

Pos_F(A) = sup_{x∈X} min{χ_A(x), r_F(x)},   (8.17)

where χ_A denotes the characteristic function of A. When A is a fuzzy set, the characteristic function in Eq. (8.17) is replaced with the membership function of A, which results in the more general formula
Pos_F(A) = sup_{x∈X} min{A(x), r_F(x)}.   (8.18)
The use of Eqs. (8.17) and (8.18) is illustrated in Figure 8.3a and 8.3b,
respectively.
Equation (8.18) is also applicable when X is a multidimensional variable
and F is then a fuzzy relation, generally n-dimensional (n ≥ 2); the induced
possibility distribution function rF is also n-dimensional in this case. When the
given fuzzy proposition is truth qualified, probability qualified, or modified in
some other way, Eq. (8.18) is still applicable, provided that the fuzzy set F in
Eq. (8.16) represents a relevant composition of functions involved in the modification, as is explained in Section 7.6.1.
The fuzzy-set interpretation of possibility theory emerges quite naturally
from the similarity between the mathematical structures of possibility measures (or, alternatively, necessity measures) and fuzzy sets. In both cases, the
Figure 8.3. Illustration of (a) Eq. (8.17); (b) Eq. (8.18).
underlying mathematical structures are families of nested sets. In possibility
theory, these families consist of focal elements; in fuzzy sets, they consist of
a-cuts.
Equation (8.16) is usually referred to as the standard fuzzy-set interpretation of possibility theory. This interpretation is simple and natural, but it is
applicable only to fuzzy sets that are normal. Recall that a fuzzy set F is normal
when its height, hF, is equal to 1. When a subnormal fuzzy set F is involved in
Eq. (8.16), for which hF < 1 by definition, then
sup_{x∈X} r_F(x) = h_F < 1

and, by Eq. (8.18),

Pos_F(X) = h_F < 1.
This violates one of the axioms of possibility theory, and hence, the theory loses its coherence. To illustrate this problem, let S_F and S̄_F denote the support of F and its complement, respectively. When h_F < 1 for the fuzzy set F in Eq. (8.16), we obtain

Pos_F(S_F) = h_F

and

Pos_F(S̄_F) = 0

by Eq. (8.18). Then, by the duality between Pos and Nec, expressed by Eq. (5.1), we have

Nec_F(S_F) = 1.

Hence,

Nec_F(S_F) > Pos_F(S_F),

which violates one of the key properties of possibility theory: whatever is necessary must be possible at least to the same degree, as expressed by the inequality in Eq. (5.12).
To assume that F in Eq. (8.16) is always normal is overly restrictive. For
example, if the given fuzzy proposition “X is F” represents the conjunction of
several fuzzy propositions that express information about X obtained from
disparate sources, it is quite likely that F is subnormal. In this case, the value
of 1 - h(F ) indicates the degree of inconsistency among the individual information sources. It is thus important that a sound fuzzy-set interpretation of
possibility theory be coherent for all fuzzy sets, regardless of whether they are
normal or not.
In order to formulate a genuine fuzzy-set interpretation of possibility
theory, we need to address the following question: How should we interpret
information given in the form “X is F,” where F is an arbitrary fuzzy subset of
X, in terms of a possibility profile? An associated, but more specific, question
is: How should we assign values of rF to the values of F(x) for all x ŒX?
The position taken here is that the definition of the possibility profile rF in
terms of a given fuzzy set F should be such that it does not change the evidence conveyed by F and, at the same time, preserves the required possibilistic normalization, which is expressed in general by the equation supxŒX{rF(x)}
= 1. Since the standard definition of rF given by Eq. (8.16) does not satisfy the
normalization for subnormal fuzzy sets, it must be appropriately modified.
To satisfy the possibilistic normalization, we must define rF(x) = 1 for at least
one x ŒX. Since there is no reason to treat distinct elements of X differently,
the only sensible way to achieve the required normalization is to increase the
values of rF equally for all x ŒX by the amount of 1 - hF. This means that the
revised fuzzy-set interpretation of possibility theory is expressed for all x ŒX
by the equation
rF (x) = F (x) + 1 - hF .
(8.19)
This is a generalized counterpart of the standard interpretation, Eq. (8.16); it
is applicable to all fuzzy sets, regardless whether they are normal or not. For
normal fuzzy sets, clearly, Eq. (8.19) collapses to Eq. (8.16).
The significance of the possibility profile rF defined by Eq. (8.19) is that it
is the only one that does not change the evidence conveyed by F. An easy way
to show this uniqueness is to use the Möbius representation of this possibility profile (see Section 5.2.1). As in Section 5.2.1, assume that elements of the universal set X = {x₁, x₂, . . . , xₙ} are ordered in such a way that

r_F(x_i) ≥ r_F(x_{i+1})

for all i ∈ ℕ_{n−1}, and let r_F(x_{n+1}) = 0 by convention. Moreover, let A_i = {x₁, x₂, . . . , x_i}. Then, the Möbius representation, m_F, of the possibility profile is given by the formula

m_F(A) = r_F(x_i) − r_F(x_{i+1})   when A = A_i for some i ∈ ℕn,
       = 0                         when A ≠ A_i for all i ∈ ℕn.   (8.20)

Now substituting for r_F from Eq. (8.19), we obtain

m_F(A) = F(x_i) − F(x_{i+1})            when A = A_i for some i ∈ ℕ_{n−1},
       = inf_{x_i∈X} F(x_i) + 1 − h_F    when A = X (= A_n),
       = 0                               when A ≠ A_i for all i ∈ ℕn.

We can see that the evidential support, m_F(A), for the various sets A ⊂ X is based on the original evidence expressed by the values F(x_i) for all i ∈ ℕn. The only change is in the value of m_F(X), which expresses the degree of our ignorance. Since inconsistency in evidence, which is expressed by the value 1 − h_F, is a form of ignorance, it is natural to place it in m_F(X).
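The construction of r_F and m_F is straightforward to mechanize. The following Python sketch (an editorial illustration; the fuzzy set used is a hypothetical subnormal example, not one from the text) builds the possibility profile of Eq. (8.19) and its Möbius representation according to Eq. (8.20):

def possibility_profile(F):
    # Eq. (8.19): raise all membership grades by 1 - h_F
    h = max(F.values())
    return {x: round(F[x] + 1 - h, 10) for x in F}

def moebius(r):
    # Eq. (8.20): masses on the nested sets A_i formed by decreasing possibility
    xs = sorted(r, key=r.get, reverse=True)
    vals = [r[x] for x in xs] + [0.0]          # r(x_{n+1}) = 0 by convention
    return {tuple(xs[:i + 1]): round(vals[i] - vals[i + 1], 10)
            for i in range(len(xs))}

F = {'x1': 0.4, 'x2': 0.7, 'x3': 0.2}          # hypothetical fuzzy set, h_F = 0.7
r = possibility_profile(F)                     # x2 -> 1.0, x1 -> 0.7, x3 -> 0.5
print(r)
print(moebius(r))                              # masses 0.3, 0.2, 0.5; m(X) = inf F + 1 - h_F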
The necessity function, Nec_F, which is based on the evidence expressed by F, is uniquely determined by m_F via the usual formula

Nec_F(A) = Σ_{k=1}^{i} m(A_k) = 1 − r_F(x_{i+1})   when A = A_i for some i ∈ ℕn,
         = 1 − max_{x_i∈Ā} r_F(x_i)                otherwise.   (8.21)

This function, again, does not modify in any way the evidence expressed by F. Moreover,

Pos_F(A_i) = Σ_{A_k∩A_i≠∅} m(A_k) = 1   for all i ∈ ℕn,   (8.22)

which is always the case for all possibilistic bodies of evidence.
It is easy to see that defining rF in any way different from Eq. (8.19) would
either violate the possibilistic normalization or the evidence conveyed by F.
For example, it has often been suggested in the literature to obtain a normalized possibility profile, r¢F, for a subnormal fuzzy set by the formula
r′_F(x) = F(x)/h_F   (8.23)

for all x ∈ X. However, this possibility profile does not preserve the evidence conveyed by F. Clearly,

m′_F(A_i) = r′_F(x_i) − r′_F(x_{i+1}) = (1/h_F)[F(x_i) − F(x_{i+1})]

for all i ∈ ℕn, where the symbols x_i, x_{i+1}, and A_i have the meaning introduced earlier in this section. We can see that the evidential support for sets A_i (focal elements), expressed by the values of m′_F(A_i), is inflated by the factor of 1/h_F.
Under the generalized fuzzy-set interpretation of possibility theory, Eq.
(8.18) can be written more explicitly as
Pos_F(A) = sup_{x∈X} min{A(x), F(x) + 1 − h_F},   (8.24)

where A and F are arbitrary fuzzy sets. Thus, for example,

Pos_F(S_F) = sup_{x∈X} min{S_F(x), F(x) + 1 − h_F}
           = sup_{x∈X} {F(x) + 1 − h_F}
           = 1,
Pos_F(S̄_F) = sup_{x∈X} min{S̄_F(x), F(x) + 1 − h_F}
           = 1 − h_F   when S̄_F ≠ ∅,
           = 0         when S̄_F = ∅,

Nec_F(S_F) = 1 − Pos_F(S̄_F)
           = h_F   when S_F ≠ X,
           = 1     when S_F = X,

Nec_F(S̄_F) = 1 − Pos_F(S_F) = 0.
Now consider the two extreme cases when F = X and F = ∅. In the former case, the proposition "X is X" does not carry any information; in the latter case, the proposition "X is ∅" represents evidence that is totally conflicting,
and hence, it does not carry any information either. Applying Eq. (8.19), we
obtain
r_X(x) = r_∅(x) = 1

for all x ∈ X. This means that both propositions are represented by the same possibility and necessity measures:

Pos_X(A) = Pos_∅(A) = 1   when A ≠ ∅,
                    = 0   when A = ∅,

Nec_X(A) = Nec_∅(A) = 0   when A ≠ X,
                    = 1   when A = X.
These results are exactly the same as we would expect on intuitive grounds.
EXAMPLE 8.7. To illustrate the generalized fuzzy-set interpretation of possibility theory, let evidence regarding the relation between two discrete variables X and Y with states in sets X = {x_a | a ∈ ℕ₄} and Y = {y_b | b ∈ ℕ₅}, respectively, be expressed in terms of a fuzzy relation R defined by the following matrix:

        y1    y2    y3    y4    y5
x1     0.0   0.2   0.0   0.4   0.5
x2     0.3   0.0   0.0   0.6   0.5
x3     0.0   0.7   0.6   0.5   0.4
x4     0.2   0.4   0.3   0.0   0.0

The possibility profile, r_R, based on this evidence is determined by Eq. (8.19); its values r_R(x_i, y_j) are

        y1    y2    y3    y4    y5
x1     0.3   0.5   0.3   0.7   0.8
x2     0.6   0.3   0.3   0.9   0.8
x3     0.3   1.0   0.9   0.8   0.7
x4     0.5   0.7   0.6   0.3   0.3
The nested body of evidence associated with this 2-dimensional possibility
profile is shown in Figure 8.4, where the pairs of integers denote the subscripts
of pairs ·xa, ybÒ. Calculating the various components of uncertainty associated
with this possibilistic body of evidence, we obtain: GH(mR) = 3.12, S̄(NecR) =
4.29, and S(NecR) = 0. Viewing rR as the normalized version, R̂, of the given
relation R, we readily obtain f(R̂) = 11.2 and fˆ (R̂) = 0.56, while for the given
fuzzy relation R, we get f(R) = 9.6 and fˆ (R) = 0.48.
Consider now a real-valued variable X that takes values in X = [x, x̄]. Information about the actual value of the variable is expressed again by the proposition “X is F,” where F is in this case a fuzzy interval defined on X. Differences
in Eq. (8.20) are now differentials. Denoting the focal elements corresponding to the α-cuts ᵅF by ᵅr_F, the necessity function for all focal elements is expressed by the simple formula

Nec_F(ᵅr_F) = ∫_α^1 dα = 1 − α.   (8.25)
Figure 8.4. Possibilistic body of evidence in Example 8.7.

For other crisp sets,

Nec_F(A) = 1 − sup_{x∈Ā} r_F(x),   (8.26)
which can be easily generalized to fuzzy set A via Eq. (8.18).
Equations (8.25) and (8.26) are counterparts of Eq. (8.21) for any convex and bounded universal set X ⊂ ℝ. A generalization to ℝⁿ (n ≥ 2) is straightforward, but it is not pursued here.
EXAMPLE 8.8. Let evidence regarding the value of a real-valued variable
X, whose values are in the interval X = [0, 4], be expressed by the proposition
“X is F,” where F is a triangular fuzzy number shown in Figure 8.5 and defined
for each x ∈ X by the formula

F(x) = (x − 1)/2   when x ∈ [1, 2),
     = (3 − x)/2   when x ∈ [2, 3],
     = 0           otherwise.

F is a subnormal fuzzy set; h_F = 0.5. According to Eq. (8.19), the associated possibility profile r_F is defined for each x ∈ X by

r_F(x) = F(x) + 0.5,

and its graph is shown in Figure 8.5.

Figure 8.5. Illustration to Example 8.8.

For each α ∈ (0, 1], the focal elements of r_F are

ᵅr_F = [0, 4]          when α ∈ (0, 0.5],
     = [2α, 4 − 2α]    when α ∈ (0.5, 1],

and their Lebesgue measures are

μ(ᵅr_F) = 4        when α ∈ (0, 0.5],
        = 4 − 4α   when α ∈ (0.5, 1].
The generalized Hartley-like measure of nonspecificity, GHL, is then calculated by the following integral:

GHL(r_F) = ∫₀¹ log₂[1 + μ(ᵅr_F)] dα
         = ∫₀^{0.5} log₂ 5 dα + ∫_{0.5}^{1} log₂(5 − 4α) dα
         = [α log₂ 5]₀^{0.5} − [¼(5 − 4α) log₂(5 − 4α) + α/ln 2]_{0.5}^{1}
         = 2.077.
Using Eq. (8.11), the degrees of fuzziness of F and of its normalized counterpart, F̂ = r_F, are f(F) = 1 and f(F̂) = 3, and their normalized versions are f̂(F) = 0.25 and f̂(F̂) = 0.75. These numbers can be obtained in this case by simple geometrical considerations in Fig. 8.5.
8.4. PROBABILITIES OF FUZZY EVENTS
A simple fuzzification of classical probability theory is obtained by extending
the classical concept of an event from classical sets to fuzzy sets. Notwithstanding its simplicity, this fuzzification makes probability theory more expressive and, consequently, enlarges the domain of its applicability.The use of fuzzy
events adds new capabilities to probability theory. Among them is the capability to capture the meanings of linguistic expressions of natural language,
and the capability to express borderline uncertainties in measurement.
Given a probability distribution function p on X (when X is finite) or a probability density function q on X (when X = ℝⁿ for some n ≥ 1), the classical probability measure, Pro(A), can be expressed for each set A in a given σ-algebra by the formulas

Pro(A) = Σ_{x∈X} χ_A(x) p(x),

Pro(A) = ∫_{ℝⁿ} χ_A(x) q(x) dx,

respectively, where χ_A denotes the characteristic function of set A. These formulas are readily generalized to fuzzy sets (events) A by replacing the characteristic function χ_A with the membership function A:

Pro(A) = Σ_{x∈X} A(x) p(x),   (8.27)

Pro(A) = ∫_{ℝⁿ} A(x) q(x) dx.   (8.28)
Observe that the classical case of crisp events is captured by Eqs. (8.27) and (8.28) as well. Observe also that probabilities of fuzzy events preserve some properties of classical (additive) probabilities. For example, using the standard operations on fuzzy sets and assuming that X is finite, we obtain for any A, B ∈ F(X)

Pro(A ∪ B) = Σ_{x∈X} max{A(x), B(x)} p(x)
           = Σ_{x∈X} A(x) p(x) + Σ_{x∈X} B(x) p(x) − Σ_{x∈X} min{A(x), B(x)} p(x)
           = Pro(A) + Pro(B) − Pro(A ∩ B).

This equation conforms to the calculus of classical probability theory, even though A and B are fuzzy events. However, some other properties of the calculus are not preserved for fuzzy events. For example, Pro(A ∪ Ā) ≤ 1 and Pro(A ∩ Ā) ≥ 0 when A is a fuzzy event; the equalities are obtained only in the special case when A becomes a crisp event.
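A small Python sketch (an editorial addition; the distribution and fuzzy events below are hypothetical) illustrates Eq. (8.27) and the additivity-like identity just derived:

def pro(event, p):
    # Eq. (8.27): probability of a fuzzy event under distribution p
    return sum(event[x] * p[x] for x in p)

p = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.3}           # hypothetical probability distribution
A = {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5}           # hypothetical fuzzy event "medium"
B = {1: 0.0, 2: 0.0, 3: 0.5, 4: 1.0}           # hypothetical fuzzy event "high"
union = {x: max(A[x], B[x]) for x in p}        # standard fuzzy union
inter = {x: min(A[x], B[x]) for x in p}        # standard fuzzy intersection
print(round(pro(A, p), 2), round(pro(B, p), 2))            # 0.65 and 0.5
print(round(pro(union, p) + pro(inter, p), 2))             # 1.15 = Pro(A) + Pro(B)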
EXAMPLE 8.9. Consider a random variable whose values are positive real
numbers and whose probability distribution function p (shown in Figure 8.6)
is characterized by the probability density function

q(x) = (x − 76)/40   when x ∈ [76, 84),
     = (86 − x)/10   when x ∈ [84, 86],
     = 0             otherwise.

This function is also shown in Figure 8.6, together with the membership function

A(x) = (x − 76)/4   when x ∈ [76, 80),
     = (84 − x)/4   when x ∈ [80, 84],
     = 0            otherwise,
which represents (in a given application context) a fuzzy event “around 80.”
Using Eq. (8.28), the probability that x is around 80, Pro(A), is calculated as
follows:
Figure 8.6. Probability distribution p, probability density function q, and fuzzy event A in Example 8.9.
Pro(A) = ∫_{76}^{80} [(x − 76)/4] · [(x − 76)/40] dx + ∫_{80}^{84} [(84 − x)/4] · [(x − 76)/40] dx = 0.4.
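The value 0.4 can be confirmed numerically; the following short Python sketch (an editorial addition) approximates the integral of Eq. (8.28) with a midpoint Riemann sum:

def q(x):
    if 76 <= x < 84:
        return (x - 76) / 40
    if 84 <= x <= 86:
        return (86 - x) / 10
    return 0.0

def A(x):
    if 76 <= x < 80:
        return (x - 76) / 4
    if 80 <= x <= 84:
        return (84 - x) / 4
    return 0.0

n, lo, hi = 100000, 76.0, 86.0
dx = (hi - lo) / n
total = sum(A(lo + (k + 0.5) * dx) * q(lo + (k + 0.5) * dx) for k in range(n)) * dx
print(round(total, 3))    # 0.4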
A random variable whose values are real numbers in an interval X = [x, x̄]
is often approximated by partitioning X into a finite set of disjoint intervals.
This approximation is usually referred to as quantization. Assuming that the
partition consists of n intervals (quanta) of equal length, Δ_n = (x̄ − x̲)/n, the partition, π_n(X), is defined in general terms as

π_n(X) = {[x̲ + (i − 1)Δ_n, x̲ + iΔ_n) | i ∈ ℕ_{n−1}} ∪ {[x̲ + (n − 1)Δ_n, x̲ + nΔ_n]}.

Observe that all intervals except the last one in this set are semiclosed intervals; the last interval is closed to ensure that the union of all the intervals is equal to X (as required for a partition). For convenience, let

I_i = [x̲ + (i − 1)Δ_n, x̲ + iΔ_n)   when i ∈ ℕ_{n−1},
    = [x̲ + (i − 1)Δ_n, x̲ + iΔ_n]   when i = n.

Recall that these intervals may be represented by their characteristic functions χ_{I_i}.
An example of partition pn(X), where X = [0, 100] and n = 5, is shown in
Figure 8.7a. The five intervals in this partition may be characterized as classes
of values of the variable that are small, medium, large, and so on, as is indicated in the figure. A more expressive representation of these classes is
obtained when the intervals Ii Œ pn(X) are replaced with appropriate fuzzy
Figure 8.7. Examples of partitions of the interval [0, 100]: (a) crisp partition; (b) fuzzy partition.
intervals Ai, as shown in Figure 8.7b. The main advantage of fuzzy intervals is
their capability to make transitions between adjacent classes gradual rather
than abrupt.
A nonempty family of fuzzy intervals A_i, i ∈ ℕn, for some n ≥ 1 (or, more generally, fuzzy sets) defined on X is called a fuzzy partition of X if and only if the family does not contain the empty set and

Σ_{i∈ℕn} A_i(x) = 1   for each x ∈ X.

It is left to the reader to show that fuzzy partitions are not cutworthy. The five trapezoidal-shaped fuzzy intervals in Figure 8.7b, whose definitions are

A₁ = ⟨0, 0, 15, 25⟩,
A₂ = ⟨15, 25, 35, 45⟩,
A₃ = ⟨35, 45, 55, 65⟩,
A₄ = ⟨55, 65, 75, 85⟩,
A₅ = ⟨75, 85, 100, 100⟩,

form a fuzzy partition on [0, 100].
Now assume that we have a set of m observations of the variable

{x_k ∈ X | k ∈ ℕm}.

Using these data, probabilities of the individual intervals, Pro(I_i), are calculated in the usual way. For each i ∈ ℕn,

Pro(I_i) = f_i / m,   (8.29)

where

f_i = Σ_{k=1}^{m} χ_{I_i}(x_k).   (8.30)

Counterparts of Eqs. (8.29) and (8.30) for fuzzy intervals A_i are, respectively,

Pro(A_i) = a_i / m,   (8.31)

where

a_i = Σ_{k=1}^{m} A_i(x_k).

If the family {A_i | i ∈ ℕn} forms a fuzzy partition of X, then

Σ_{i∈ℕn} a_i = m   and   Σ_{i∈ℕn} Pro(A_i) = 1.
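The estimation of Eq. (8.31) from data is equally direct. The Python sketch below (an editorial addition; the observations are hypothetical) uses the trapezoidal fuzzy partition of Figure 8.7b and verifies that the resulting probabilities sum to 1:

def trapezoid(a, b, c, d):
    # membership function of the trapezoidal fuzzy interval <a, b, c, d>
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

labels = {                                   # the fuzzy partition of Figure 8.7b
    "very small": trapezoid(0, 0, 15, 25),
    "small":      trapezoid(15, 25, 35, 45),
    "medium":     trapezoid(35, 45, 55, 65),
    "large":      trapezoid(55, 65, 75, 85),
    "very large": trapezoid(75, 85, 100, 100),
}
data = [12, 22, 30, 41, 47, 52, 58, 66, 71, 90]          # hypothetical observations
m = len(data)
probs = {name: sum(mu(x) for x in data) / m for name, mu in labels.items()}  # Eq. (8.31)
print(probs)
print(round(sum(probs.values()), 6))                     # 1.0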
8.5. FUZZIFICATION OF REACHABLE INTERVAL-VALUED
PROBABILITY DISTRIBUTIONS
A natural fuzzification of the theory of reachable interval-valued probability
distributions (introduced in Section 5.5) is obtained by extending relevant
probability intervals to fuzzy intervals via the a-cut representation. To discuss
basic issues of this fuzzified uncertainty theory, let the following notation be
used.
Assume, as in Section 5.5, that we deal with a finite set X = {xi | i Œ⺞n} of
considered alternatives. These alternatives may, in general, be viewed as states
of variable X. Assume further that each alternative xi Œ X is associated with
an imprecise probability expressed by a fuzzy interval, Fi, defined on [0, 1]. It
is assumed that Fi(p) is defined for all p Œ [0, 1] in terms of the canonical form
expressed by Eq. (7.25). As always, the fuzzy interval is uniquely represented
by the family of its α-cuts,

ᵅF_i = [l_i(α), u_i(α)]

for all α ∈ [0, 1]. In each particular application, evidence in this fuzzified uncertainty theory is expressed by an n-tuple, F, of fuzzy intervals F_i. That is,

F = ⟨F_i | i ∈ ℕn⟩.   (8.32)

Again, each F is uniquely represented by the family of its α-cuts,

ᵅF = ⟨ᵅF_i | i ∈ ℕn⟩

for all α ∈ [0, 1]. For convenience, let fuzzy intervals F_i in Eq. (8.32) be called
probability granules, and let each tuple F defined by Eq. (8.32) be called a tuple
of probability granules.
The various properties of tuples of probability intervals, which are introduced in Section 5.5, are extended to tuples of probability granules via the α-cut representations of the latter. Thus, for example, a tuple of probability
granules is called proper iff all its a-cuts are proper in the classical sense; it is
called reachable iff all its a-cuts are reachable in the classical sense; and so
forth.
To make computation with probability granules as efficient as possible, it is
desirable to represent them by trapezoidal membership functions. That is, it is
desirable to deal only with special tuples of probability granules,
T = ⟨T_i = ⟨a_i, b_i, c_i, d_i⟩ | i ∈ ℕn⟩,   (8.33)

where each probability granule T_i is a trapezoidal-shaped fuzzy interval with support [a_i, d_i] and core [b_i, c_i]. However, when the operations of multiplication or division of fuzzy arithmetic are applied to trapezoidal granules, the
resulting granules are not trapezoidal. This is unfortunate since we often need
to use the resulting granules as inputs for further processing. For efficient computation, it is thus desirable to approximate the membership functions of
resulting probability granules at each stage of computation by appropriate
trapezoidal granules.
A simple way of approximating an arbitrary granule, F, by a trapezoidal
one, T, is to keep the values a, b, c, d of the canonical form of F (see Eq. (7.25))
unchanged and replace its nonlinear functions in the intervals [a, b] and [c, d]
with their linear counterparts

(x − a)/(b − a)   and   (d − x)/(d − c),
respectively. The advantage of this approximation method is its simplicity.
However, when multiplication or division operations are used repeatedly, the
accumulated error can become sufficiently large to produce misleading results.
Other approximation methods have been proposed to reduce the accumulated
error. Although some recent results seem to indicate that this approximation
problem can be adequately solved, further research is needed to obtain more
conclusive results (see Note 8.11).
Assume, for the sake of simplicity, that we deal in this section only with
tuples of trapezoidal probability granules and that we specify them either in
the form of Eq. (8.33) or via the associated α-cut representation

ᵅT = ⟨ᵅT_i = [a_i + (b_i − a_i)α, d_i − (d_i − c_i)α] | i ∈ ℕn⟩.   (8.34)

To be proper, a tuple of trapezoidal probability granules must satisfy the inequalities

Σ_{i∈ℕn} [a_i + (b_i − a_i)α] ≤ 1,   Σ_{i∈ℕn} [d_i − (d_i − c_i)α] ≥ 1

for all α ∈ [0, 1]. Clearly, if the inequalities are satisfied for α = 1, then they are satisfied for all α ∈ [0, 1]. It is thus sufficient to check the inequalities

Σ_{i∈ℕn} b_i ≤ 1   and   Σ_{i∈ℕn} c_i ≥ 1.
Moreover, this convenient feature holds for all tuples of probability granules,
not only for the trapezoidal ones.
To be reachable, a tuple of trapezoidal probability granules must satisfy the inequalities

Σ_{j≠i} [a_j + (b_j − a_j)α] + d_i − (d_i − c_i)α ≤ 1,   (8.35)

Σ_{j≠i} [d_j − (d_j − c_j)α] + a_i + (b_i − a_i)α ≥ 1   (8.36)

for all i ∈ ℕn and all α ∈ [0, 1]. Since the left-hand sides of these inequalities are linear functions of α, it is sufficient to check the inequalities only for α = 0 and α = 1. That is, it is sufficient to check the inequalities

Σ_{j≠i} a_j + d_i ≤ 1,   (8.37)

Σ_{j≠i} d_j + a_i ≥ 1,   (8.38)

Σ_{j≠i} b_j + c_i ≤ 1,   (8.39)

Σ_{j≠i} c_j + b_i ≥ 1,   (8.40)
for all i Œ ⺞n. Clearly, this convenient feature does not hold for tuples of probability granules that are not trapezoidal.
EXAMPLE 8.10. Consider a tuple with four trapezoidal probability granules,
T = ·Ti | i Œ ⺞4Ò, where
T1 = 0, 0.1, 0.1, 0.2 ,
T2 = 0.1, 0.2, 0.2, 0.25 ,
T3 = 0.2, 0.3, 0.3, 0.4 ,
T4 = 0.3, 0.4, 0.4, 0.6 .
This tuple, which is shown in Figure 8.8, is proper since
Âb = Âc
i
iŒ⺞ 4
i
= 1.
iŒ⺞ 4
It is also reachable since the inequalities (8.37)–(8.40) are satisfied for all
i Œ⺞4, as can be easily verified.
When a given tuple of trapezoidal fuzzy granules,
T = Ti i Œ ⺞n ,
is proper but not reachable, it can be converted to its reachable counterpart,
F = Fi i Œ ⺞n ,
for all a Œ [0, 1] and all i Œ⺞n via the formulas
Ï
¸
l i (a ) = max Ì a i + (bi - a i )a , 1 - Â [ d j - (d j - c j )a )˝
˛
Ó
jπ i
(8.41)
Ï
¸
ui (a ) = minÌ di - (di - c i )a , 1 - Â [ a j + (b j - a j )a )˝
˛
Ó
jπ i
(8.42)
where [li(a), ui(a)] = aFi. Observe that, due to the max and min operations in
these formulas, granules Fi of the resulting tuple F may not have trapezoidal
shapes.
342
8. FUZZIFICATION OF UNCERTAINTY THEORIES
1
0.8
0.6
0.4
0.2
0.2
0.4
0.6
0.8
1
Figure 8.8. A reachable tuple of probability granules (Example 8.10).
EXAMPLE 8.11. Consider a tuple T = ·T1, T2, T3Ò of trapezoidal probability
granules
T1 = 0.0, 0.0, 0.2, 0.4
T2 = 0.1, 0.3, 0.4, 0.5
T3 = 0.4, 0.6, 0.8, 0.9
whose a-cuts are
a
T1 = [ 0, 0.4 - 0.2a ],
a
T2 = [ 0.1 + 0.2a , 0.5 - 0.1a ],
a
T3 = [ 0.4 + 0.2a , 0.9 - 0.1a ].
This tuple, which is shown in Figure 8.9a, is proper since b1 + b2 + b3 = 0.9 < 1
and c1 + c2 + c3 = 1.4 > 1. However, it is not reachable since, for example, the
inequality (8.39) is violated for i = 1 (b2 + b3 + c1 = 1.1 ⱕ 1) and i = 3 (b1 + b2
+ c3 = 1.1 ⱕ 1). To use the probability granules for further processing, for
example, as prior probabilities in Bayesian inference, we need to convert T to
its reachable counterpart F. By using Eqs. (8.41) and (8.42) for each i = 1, 2, 3
and a Œ [0, 1], we obtain the following a-cuts of the probability granules (not
necessarily trapezoidal) in F:
8.5. FUZZIFICATION OF REACHABLE INTERVAL-VALUED PROBABILITY DISTRIBUTIONS
343
i = 1: l1 (a ) = max{0, 0.2a - 0.4} = 0,
when a Œ[ 0, 0.5)
when a Œ[ 0.5, 1],
Ï 0.4 - 0.2a
u1 (a ) = min{0.4 - 0.2a , 0.5 - 0.4a } = Ì
Ó0.5 - 0.4a
i = 2 : l 2 (a ) = max{0.1 + 0.2a , 0.3a - 0.3} = 0.1 + 0.2a ,
u2 (a ) = min{0.5 - 0.1a , 0.6 - 0.2a } = 0.5 - 0.1a ,
i = 3 : l3 (a ) = max{0.4 + 0.2a , 0.1 + 0.3a } = 0.4 + 0.2a ,
u3 (a ) = min{0.9 - 0.1a , 0.9 - 0.2a } = 0.9 - 0.2a .
Graphs of F1, F2, F3 are shown in Figure 8.9b. We can see that F1 π T1 and
F3 π T3. Moreover, F1 is not of the trapezoidal shape.
Once it is verified that a given tuple F of probability granules on set X is
reachable, it can be used for calculating probability granules for any subset of
X. These calculations are facilitated by applying Eqs. (5.71) and (5.72) to the
a-cuts of F.
For each A ŒP(X) and each a Œ [0, 1], let
a
FA = [l A (a ), u A (a )]
denote the a-cut of the probability granule, FA, of set A. Then, using Eqs. (5.71)
and (5.72), we have
Ï
l A (a ) = max Ì Â l i (a ), 1 Ó x i ŒA
¸
 u (a )˝˛,
i
(8.43)
x i œA
Ï
u A (a ) = minÌ Â ui (a ), 1 Ó x i ŒA
¸
 l (a )˝˛.
i
(8.44)
x i œA
EXAMPLE 8.12. Consider the reachable tuple of four probability granules
on X = {x1, x2, x3, x4} that is discussed in Example 8.10. Graphs of the granules
are shown in Figure 8.8, and their a-cut representations are:
a
T1 = [ 0.1a , 0.2 - 0.1a ],
a
T2 = [ 0.1 + 0.1a , 0.25 - 0.05a ],
a
T3 = [ 0.2 + 0.1a , 0.4 - 0.1a ],
a
T4 = [ 0.3 + 0.1a , 0.6 - 0.2a ].
The a-cuts of probability granules aFA for all sets A ŒP(X), calculated by Eqs.
(8.43) and (8.44), are given in Table 8.1. For convenience, sets A are identified
344
8. FUZZIFICATION OF UNCERTAINTY THEORIES
1
0.8
0.6
0.4
0.2
0.2
0.4
0.6
0.8
1
0.6
0.8
1
(a)
1
0.8
0.6
0.4
0.2
0.2
0.4
(b)
Figure 8.9. Illustration to Example 8.11: (a) A proper tuple of probability granules that is not
reachable; (b) the reachable tuple of probability granules derived from the one in part (a).
by the index k defined in the table. For the singletons, clearly, k = i. For
example, for set A = {x1, x2}, identified by k = 5, we have
l5 (a ) = max{l1 (a ) + l 2 (a ), 1 - u3 (a ) - u4 (a )}
= max{0.1 + 0.2a , 0.3a }
= 0.1 + 0.2a ,
u5 (a ) = min{u1 (a ) + u2 (a ), 1 - l3 (a ) - l 4 (a )}
= min{0.45 - 0.15a , 0.5 - 0.3a }
= 0.45 - 0.15a ;
8.5. FUZZIFICATION OF REACHABLE INTERVAL-VALUED PROBABILITY DISTRIBUTIONS
345
for the set A = {x1, x3, x4}, identified by k = 13, we have
l13 (a ) = max{l1 (a ) + l3 (a ) + l 4 (a ), 1 - u2 (a )}
= max{0.5 + 0.3a , 0.75 + 0.05a }
= 0.75 + 0.05a ,
u13 (a ) = min{u1 (a ) + u3 (a ) + u4 (a ), 1 - l 2 (a )}
= min{1.2 - 0.4a , 0.9 - 0.11a }
= 0.9 - 0.1a ;
and, similarly, all the remaining values of lA(a) and uA(a) in Table 8.1 are calculated. Observe the following:
(a) For all A ŒP(X) and all a Œ [0, 1],
u A (a ) = 1 - l A (a );
(b) For all A ŒP(X), the probability granules FA have triangular shapes;
(c) Due to the triangular shapes, the a-cuts aFA are additive for a = 1 (lA(1)
= uA(1) for all A ŒP(X)).
Table 8.1. Lower and Upper Probability Granulas for the Reachable Tuple of
Probability Granules Discussed in Example 8.10 and Shown in Figure 8.8 (illustration
of Example 8.12)
X
A:
a
k
x1
x2
x3
x4
0
1
0
0
0
1
1
1
0
0
0
1
1
1
0
1
0
0
1
0
0
1
0
0
1
1
0
1
1
0
1
1
0
0
0
1
0
0
1
0
1
0
1
1
0
1
1
1
0
0
0
0
1
0
0
1
0
1
1
0
1
1
1
1
lA(a)
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
a
FA
0
0.1a
0.1 + 0.1a
0.2 + 0.1a
0.3 + 0.1a
0.1 + 0.2a
0.2 + 0.2a
0.35 + 0.15a
0.3 + 0.2a
0.4 + 0.2a
0.55 + 0.15a
0.4 + 0.2a
0.6 + 0.1a
0.75 + 0.05a
0.8 + 0.1a
1
m(a)
uA(a)
0
0.2 - 0.1a
0.25 - 0.05a
0.4 - 0.1a
0.6 - 0.2a
0.45 - 0.15a
0.6 - 0.2a
0.7 - 0.2a
0.65 - 1.5a
0.8 - 0.2a
0.9 - 0.2a
0.7 - 0.1a
0.8 - 0.1a
0.9 - 0.1a
1.0 - 0.1a
1
0
0.1a
0.1 + 0.1a
0.2 + 0.1a
0.3 + 0.1a
0
0
0.05 - 0.05a
0
0
0.05 - 0.05a
0.1 - 0.1a
0.15 - 0.15a
0.15 - 0.15a
0.15 - 0.15a
-0.25 + 0.25a
346
8. FUZZIFICATION OF UNCERTAINTY THEORIES
Once the a-cuts of probability granules FA are determined for all A ŒP(X),
the Möbius representation of these a-cuts, am, can be calculated by the usual
formula
a
m( A) =
Â
(-1)
A -B
l B (a )
(8.45)
B BÕ A
for all A ŒP(X). For the lower probability in Example 8.12, the Möbius representation of is shown in Table 8.1. For example,
a
m({x1 , x 2 , x3 }) = 0.4 + 0.2a - (0.1 + 0.2a ) - (0.2 + 0.2a ) - (0.3 + 0.2a )
+ 0.1a + 0.1 + 0.1a + 0.2 + 0.1a = 0.1 - 0.1a .
Observe that
Â
a
m( A) = 1
A ŒP ( X )
for each a Œ [0, 1].
Using the Möbius representation, we can compute the average nonspecificity embedded in the associated tuple of probability granules. First, we calculate the generalized Hartley measure as a function of a,
Â
GH ( a m) =
a
m( A) log 2 A ,
(8.46)
A ŒP ( X )
which is followed by computing the average value, GHa, via the integral
1
GH a = Ú GH ( a m) da.
(8.47)
0
Applying Eq. (8.46) to am in Example 8.12 (given in Table 8.1), we obtain
GH ( a m) = 0.47 - 0.47a .
Clearly, GH(0m) = 0.47 and GH(1m) = 0. Applying now Eq. (8.47), we obtain
GHa = 0.235.
Computing S̄ is more difficult. Algorithm 6.1 can be employed, but it is now
applicable to the ratios lA(a)/|A|, which are functions of a. First, we need to
express the dependence of S̄ on a, S̄(lA(a)). Then, the average value, S̄a, is computed by the integral
1
S a = Ú S (l A (a )) da.
0
(8.48)
8.5. FUZZIFICATION OF REACHABLE INTERVAL-VALUED PROBABILITY DISTRIBUTIONS
347
Applying Algorithm 6.1 to the a-cuts lA(a) in Table 8.1, we obtain the following probabilities of S̄(lA(a)) in the given order, one at each iteration of the
algorithm:
a
p(x 4 ) = 0.3 + 0.1a ,
a
p(x3 ) = 0.25 + 0.05a ,
a
p(x 2 ) = 0.25 - 0.05a ,
a
p(x1 ) = 0.2 - 0.1a .
Now, we have
S (l A (a )) = -(0.3 + 0.1a ) log 2 (0.3 + 0.1a )
-(0.25 + 0.05a ) log 2 (0.25 + 0.5a )
-(0.25 - 0.05a ) log 2 (0.25 - 0.05a )
-(0.2 - 0.1a ) log 2 (0.2 - 0.1a ).
The plot of S̄(lA(a)) for a Œ [0, 1] is shown in Figure 8.10. The extreme values
are S̄(lA(0)) = 1.985 and S̄(lA(1)) = 1.846. The average value, obtained by Eq.
(8.48), is S̄a = 1.93.
When tuples of probability granules are employed in any computation
involving arithmetic operations (such as computing expected values or posterior probabilities in Bayesian inference), it is essential that constrained fuzzy
arithmetic be used. Computing with fuzzy probabilities requires that not only
the requisite equality constraints be observed, but also the probabilistic constraints, as is explained in Section 7.4.2.
1.98
1.96
1.94
1.92
1.9
1.88
1.86
0
0.2
0.4
0.6
0.8
Figure 8.10. Dependence of S̄ on a in Example 8.12.
1
348
8. FUZZIFICATION OF UNCERTAINTY THEORIES
8.6. OTHER FUZZIFICATION EFFORTS
Efforts to fuzzify uncertainty theories have not been restricted to the three
fuzzifications described in Section 8.3–8.5. However, among the various efforts
to fuzzify uncertainty, which have been discussed in the literature (see Note
8.12), the three fuzzifications described in this chapter have some visible
advantages.They are well developed, conceptually and computationally sound,
and they have features that are suitable for applications.
To illustrate the point that some uncertainty theories are less practical
to fuzzify than others, let us examine the following very simple example
of fuzzification within the theory of uncertainty based on the Sugeno
l-measures.
EXAMPLE 8.13. Consider the set X = {x1, x2, x3} of alternatives and assume
that we want to deal with uncertainty regarding these alternatives in terms of
l-measures (Section 5.3). This requires that we are able to assess either the
lower probabilities or the upper probabilities on singletons. From these assessments, we compute the value of parameter l by Eq. (5.30), which makes it possible, in turn, to compute lower and upper probabilities for all subsets of X.
Once uncertainty regarding the alternatives is formalized in this way, we can
apply the rules of the calculus of l-measures to manipulate the uncertainty in
any desirable way within this theory.
Now assume that we are able to make only fuzzy assessments of lower (or
upper) probabilities on singletons, and that these assessments are represented
by appropriate fuzzy intervals or numbers. As a specific example, let the
triangular-shape fuzzy numbers
m ({x1}) = 0.1, 0.2, 0.2, 0.2 ,
m ({x 2 }) = 0.2, 0.3, 0.3, 0.3 ,
m ({x3 }) = 0.3, 0.4, 0.4, 0.4 ,
represent a given assessment of lower probabilities. These fuzzy numbers are
reasonable representations of linguistic assessments such as “the lower probability of xi is pi or a little less,” where p1 = 0.2, p2 = 0.3, and p2 = 0.4. The acuts of the numbers are
a
m ({x1}) = [ 0.1 + 0.1a , 0.2],
a
m ({x 2 }) = [ 0.2 + 0.1a , 0.3],
a
m ({x3 }) = [ 0.3 + 0.1a , 0.4].
Applying Eq. (5.30) to these a-cuts results in the equation
8.6. OTHER FUZZIFICATION EFFORTS
349
(0.001a 3 + 0.006a 2 + 0.11a + 0.006)l2 + (0.12a + 0.11)l + 0.3a - 0.4 = 0,
whose positive root defines the value l(a), for each a Œ [0, 1]. The plot of l(a)
is shown in Figure 8.11a. The analytic expression for l(a) is too complex to
be practical, even in this very simple example. It is more practical to calculate
l(a) for selected values of a, such as those shown in Figure 8.11b.
Once values of l(a) are determined, the a-cuts of lower probabilities,
a
m(A), can be determined via Eq. (5.28) for all subsets A of X, as is shown in
Table 8.2. Considering the fact that l(a) is determined by a rather complex
expression, we can see that some expressions in this table would be quite formidable when completed. Clearly, expressions for the Möbius representation
will be even more complex.
l(a)
a
0.0
3.1091
0.1
2.62262
0.2
2.20969
2.5
0.3
1.85613
2
0.4
1.55108
0.5
1.28614
0.6
1.05468
1
0.7
0.851409
0.5
0.8
0.672067
0.9
0.51317
1.0
0.371852
l (a)
3.5
3
1.5
0.2
0.4
0.6
0.8
1
a
(a)
(b)
Figure 8.11. Plot of l(a) and some numerical values of l(a) in Example 8.13.
Table 8.2. a-Cuts of Lower Probabilities, am (A), in Example 8.13
¯
A:
x1
x2
x3
0
1
0
0
1
1
0
1
0
0
1
0
1
0
1
1
0
0
0
1
0
1
1
1
m(A)
¯
a
0
[0.1 + 0.1a, 0.2]
[0.2 + 0.1a, 0.3]
[0.3 + 0.1a, 0.4]
[0.3 + 0.2a + l(a) (0.1 + 0.1a) (0.2 + 0.1a), 0.5 + l(a) · 0.06]
[0.4 + 0.2a + l(a) (0.1 + 0.1a) (0.3 + 0.1a), 0.6 + l(a) · 0.08]
[0.5 + 0.2a + l(a) (0.2 + 0.1a) (0.3 + 0.1a), 0.7 + l(a) · 1.2]
1
350
8. FUZZIFICATION OF UNCERTAINTY THEORIES
It is sensible to conclude from this strikingly simple example that the uncertainty theory based on l-measures is conceptually and computationally difficult to fuzzify. This contrasts with the theory based on reachable
interval-valued probability distributions, whose fuzzification (discussed in
Section 8.5) is much more transparent and computationally tractable. By comparing the fuzzifications of these two theories, it is reasonable to expect that
only the latter will survive as useful for various applications. It is likely that
among the many possible fuzzifications of uncertainty theories some will be
found useful and some will be discarded.
NOTES
8.1. The concept of a-cuts of fuzzy sets was already introduced in the seminal paper
by Zadeh [1965]. In another paper [Zadeh, 1971], he introduced the a-cut representation of fuzzy sets and showed how equivalence, compatibility, and ordering relations can be fuzzified via this representation. However, the term
“cutworthy” was coined much later by Bandler and Kohout [1993]. A highly
general and comprehensive investigation of cutworthy properties was made in
terms fuzzy predicate logic by Bĕlohlávek [2003].
8.2. The first formulation of the extension principle was introduced by Zadeh [1965],
even though it was described under the heading “fuzzy sets induced by mappings.” The term “extension principle” was introduced in [Zadeh, 1975–76], where
the principle and its utility are thoroughly examined.
8.3. Fuzzification via fuzzy morphisms and category theory was pioneered by Goguen
[1967, 1968–69, 1974]. Among other notable references are [Arbib and Manes,
1975], [Rodabaugh et al., 1992], [Rodabaugh and Klement, 2003], [Höhle and
Klement, 1995], [Höhle and Rodabaugh, 1999], and Walker [2003].
8.4. The idea that the degree of fuzziness of a fuzzy set can be most naturally
expressed in terms of the lack of distinction between the set and its complement
was proposed by Yager [1979, 1980b]. A general formulation based on this idea,
which is applicable to all possible fuzzy complements, was developed by Higashi
and Klir [1982]. They proved that every measure of fuzziness of this type can be
expressed in terms of a metric distance that is based on an appropriate aggregate
of the absolute values of the individual differences between membership grades
of the given fuzzy set and its complement (of a type chosen in a given application) for all elements of the universal set.
8.5. Observe that the summation term in Eq. (8.7) and the integral in Eq. (8.9) express
the Hamming distance between A and c(A). It is of course possible to use other
distance functions for this purpose. The whole range of measures of fuzziness
based on the Minkowski class of distance functions is examined by Higashi and
Klir [1982].
8.6. The measure of fuzziness based on local Shannon entropies (defined by Eq. (8.14)
or Eq. (8.15)) was introduced by De Luca and Termini [1972, 1974]. This measure,
which is usually called an entropy of fuzzy sets, has been investigated in the literature fairly extensively by numerous authors.
EXERCISES
351
8.7. Broader classes of measures of fuzziness were characterized on axiomatic
grounds by Knopfmacher [1975] and Loo [1977]. A good overview of the literature dealing with measures of fuzziness was prepared by Pal and Bezdek [1994].
8.8. The standard fuzzy set interpretation of possibility theory was introduced by
Zadeh [1978a] as a natural tool for dealing with uncertainty associated with information expressed in natural language and represented by fuzzy propositions
[Zadeh, 1978b, 1981]. The generalized fuzzy-set interpretation of possibility
theory defined by Eq. (8.19) was introduced in a paper by Klir [1999]. The
paper also surveys other, unsuccessful attempts to revise the standard fuzzy-set
interpretation.
8.9. Probabilities of fuzzy events were introduced by Zadeh [1968], just three years
after he introduced the concept of a fuzzy set [Zadeh, 1965].
8.10. Fuzzification of the uncertainty theory based on reachable interval-valued probability distributions was investigated by Pan [1997a] and Pan and Yuan [1997].
They also developed a fuzzified Bayesian inference within this uncertainty
theory, which is based on methods of linear programming.
8.11. Some methods for approximating arbitrary probability granules by trapezoidal
ones were examined by Giachetti and Young [1997] from the standpoint of the
accumulated error for repeated use of multiplication or division. A method proposed in the paper is shown to lead to acceptable results in most practical cases.
An interesting approximation method was proposed by Pan [1997a,b] in the
context of Bayesian inference with probability granules. It is based on keeping
the core unchanged and calculating new support in terms of the weighted leastsquare-error method in which the weights are monotone increasing with values
of a according to some rule. This method seems promising, but no error analysis
has been made for it as yet.
8.12. Perhaps the most fundamental approach to fuzzifying uncertainty theories, at
least from the theoretical point of view, is to fuzzify monotone measures. This is
explored in a paper by Qiao [1990], which is also reprinted in [Wang and Klir,
1992]. Interesting approaches to fuzzifying the Dempster–Shafer theory were
proposed by Yen [1990] and Yang et al. [2003]. The concept of a fuzzy random
variable is covered in the literature quite extensively. A broad introduction is in
[Zwick and Wallsten, 1989] and [Negoita and Ralescu, 1987]; a Special Issue
edited by Gil [2001] covers more recent developments, as well as valuable references to previous work. The following are some useful references that deal
with fuzzy probabilities, even though they do not specifically address the issue of
fuzzifying uncertainty theories: [Buckley, 2003; Cai, 1996; Fellin et al., 2005;
Manton et al., 1994; Möller and Beer, 2004; Ross et al., 2002; Viertl, 1996]. The
book by Mordeson and Nair [1998] is a good overview of other fuzzifications in
mathematics.
EXERCISES
8.1. Show that the standard complement of a fuzzy set is not a cutworthy
operation.
352
8. FUZZIFICATION OF UNCERTAINTY THEORIES
8.2. Show that the standard operations of intersection and union of fuzzy sets
are cutworthy.
8.3. Verify that the binary fuzzy relation R defined by the matrix
x1
x1 È1.0
x 2 ÍÍ 0.2
x3 Í1.0
R= Í
x 4 Í 0.6
x5 Í 0.2
Í
x 6 Î 0.6
x2
0.2
1.0
0.2
0.2
0.8
0.2
x3
1.0
0.2
1.0
0.6
0.2
0.6
x4
0.6
0.2
0.6
1.0
0.2
0.8
x5
0.2
0.8
0.2
0.2
1.0
0.2
x6
0.6 ˘
0.2 ˙˙
0.6 ˙
˙
0.8 ˙
0.2 ˙
˙
1.0 ˚
is a fuzzy equivalence relation that is cutworthy and determine its partition tree.
8.4. Repeat Example 8.2 for function y = 9 + 1/x, where x Œ [1, 10], and two
triangular fuzzy members by which the value x is approximately
assessed: A1 = ·1.5, 2, 2, 2.5Ò and A2 = ·2.5, 3, 3, 3.5Ò.
8.5. Assuming the standard complement and using the functional f defined
by Eq. (8.10), calculate the amount of fuzziness and its normalized value
for the following fuzzy sets:
(a) A = 0.5/x1 + 0.7/x2 + 1/x3 + 0.9/x4 + 0.6/x5 + 0.4/x6;
(b) Fuzzy relation R in Exercise 8.3.
8.6. Repeat Example 8.4 for the following fuzzy numbers:
(a) Triangular fuzzy number A = ·0, 1, 1, 2Ò;
(b) A fuzzy number B whose a-cut representation is a B = [ x , 2 - x ] ;
(c) C(x) = e-|5x-10| for all x Œ [1, 3] and C(x) = 0 for x œ [1, 3].
8.7. Repeat the calculations in Examples 8.3 and 8.4 and Exercises 8.5 and
8.6 for some nonstandard complements introduced in Section 7.3.1.
8.8. Let f and f ¢ be defined by Eqs. (8.7) and (8.12), respectively, and let c be
the standard complement of fuzzy sets. Show that for any fuzzy set A
defined on a finite set X:
(a) f(A) = 2f ¢(A)
(b) fˆ (A) = fˆ ¢(A)
8.9. Repeat Exercise 8.8 for f and f ¢ defined by Eqs. (8.9) and (8.13), respectively, and X = ⺢.
8.10. Investigate the relationship between the functional f defined by Eq.
(8.10) and the functional f ≤ defined by Eq. (8.14).
EXERCISES
353
8.11. Using the functional f≤ defined by Eq. (8.15), calculate the amount of
fuzziness and its normalized value for the following fuzzy intervals:
(a) A trapezoidal fuzzy interval A = ·1, 3, 4, 7Ò;
(b) A trapezoidal fuzzy interval B = ·0, 0, 1, 5Ò;
(c) C(x) = e-|5x-10| for all x Œ [1, 3] and C(x) = 0 for x œ [1, 3].
8.12. Verify the values of GH(mR), S̄(NecR), S(NecR), f(R), and f(R̂) in
Example 8.7.
8.13. Verify the values of f(R), and f(R̂) in Example 8.8 by using Eq. (8.11),
as well as by geometrical reasoning in Figure 8.5.
8.14. Consider a discrete variable X whose values are in the set X = ⺞200. Given
information that “X is F,” where F is a fuzzy set defined by
F = 0.1 52 + 0.2 53 + 0.4 54 + 0.6 55 + 0.8 56 + 0.5 57 + 0.3 58 + 0.1 50,
calculate the following:
(a) The nonspecificity of the possibilistic representation of the given
information;
(b) The degrees of possibility and necessity in the sets A = {55, 56, . . . ,
62}, B = {53, 54, 55}, and C = {54, 55, 56}.
8.15. Consider a variable X whose values are in the interval X = [0, 100].
Assume that the value of the variable is assessed in a given context in
terms of a fuzzy propositions “X is F,” in which F is a fuzzy interval
defined for all x Œ X as follows:
Ï1.4( x - 6.5)
ÔÔ0.7
F ( x) = Ì
Ô 3.5(10 - x)
ÓÔ 0
when x Œ[6.5, 7],
when x Œ(7, 8),
when x Œ[8, 10],
otherwise.
Calculate:
(a) The amount of nonspecificity and its normalized version for the possibilistic representation of the given information;
(b) The degrees of possibility and necessity in the fuzzy sets whose acut representations are
a
A = [6 + a , 9 - a ],
a
B = [7 + 2a , 10 - a ],
a
C = [6.5 + a , 8.5 - a ].
354
8. FUZZIFICATION OF UNCERTAINTY THEORIES
8.16. Show that fuzzy partitions are not cutworthy regardless of which operations are used for intersections and unions of fuzzy sets.
8.17. Calculate probabilities of crisp and fuzzy events defined in Figure 8.7a
and 8.7b, respectively, for the following probability density functions q
on ⺢:
Ï 0.0016x - 0.032
Ô
(a) q(x) = Ì 0.112 - 0.0016x
Ô0
Ó
Ï 4.6875 ◊ 10
(b) q(x) = Ì
Ó0
-5
◊ x2
when x Œ[20, 45)
when x Œ[ 45, 70]
otherwise
when x Œ[ 0, 40]
otherwise
8.18. Suggest some reasonable ways of approximating the probability granule
F1 in Figure 8.9b by a trapezoidal granule (see also Note 8.11).
8.19. Determine for each of the following tuples T = ·T1, T2, T3Ò of trapezoidal
probability granules on X = {x1, x2, x3} whether it is proper and
reachable:
(a) T1 = ·0, 0, 0, 0.1Ò, T2 = ·0.2, 0.3, 0.4, 0.5Ò, T3 = ·0.4, 0.6, 0.8, 0.9Ò
(b) T1 = ·0, 0, 0.3, 0.5Ò, T2 = ·0, 0.2, 0.5, 0.6Ò, T3 = ·0.2, 0.4, 0.4, 0.7Ò
(c) T1 = ·0.2, 0.3, 0.3, 0.4Ò, T2 = ·0, 0.2, 0.2, 0.5Ò, T3 = ·0.3, 0.5, 0.5, 0.8Ò
Convert each of the tuples that is proper but not reachable to its reachable counterpart.
8.20. In analogy with Example 8.12, determine for each of the reachable tuples
(or their reachable counterparts) in Exercise 8.19 the following functions
of a for all A ŒP(X):
(a) lA(a)
(b) uA(a)
(c) amA(A)
8.21. Using the functions determined in Exercise 8.20, calculate:
(a) GH(am) and GHa
(b) S̄(lA(a)) and S̄a
8.22. For Example 8.13, determine the following:
(a) l(a)
(b) The a-cuts of the lower probabilities;
(c) The Möbius representation of the lower probabilities in Table 8.2.
9
METHODOLOGICAL ISSUES
There is nothing better than to know that you don’t know.
Not knowing, yet thinking your know—
This is sickness.
Only when you are sick of being sick
Can you be cured.
—Lao Tsu
9.1. AN OVERVIEW
From the methodological point of view, two complementary features of generalized information theory (GIT) are significant. One of them is the great diversity of prospective uncertainty theories that are subsumed under GIT. This
diversity increases whenever new types of formalized languages or monotone
measures are recognized. At any time, of course, only some of the prospective
uncertainty theories are properly developed.
Complementary to the diversity of uncertainty theories subsumed under
GIT is their unity, which is manifested by common properties that the theories share. From the methodological point of view, particularly notable are the
common forms of functionals that generalize the Hartley and Shannon measures of uncertainty in the various uncertainty theories, and the invariance of
equations and inequalities that express the relationship among joint, marginal,
and conditional measures of uncertainty.
The diversity of GIT offers an extensive inventory of distinct uncertainty
theories, each characterized by specific assumptions embedded in its axioms.
Uncertainty and Information: Foundations of Generalized Information Theory, by George J. Klir
© 2006 by John Wiley & Sons, Inc.
355
356
9. METHODOLOGICAL ISSUES
This allows us to choose, in any given application context, a theory that is compatible with the application of concern. On the other hand, the unity of GIT
allows us to work within GIT as a whole. That is, it allows us to move from
one theory to another, as needed.
The primary aim of this chapter is to examine the following four methodological principles of uncertainty:
1.
2.
3.
4.
Principle of minimum uncertainty
Principle of maximum uncertainty
Principle of requisite generalization
Principle of uncertainty invariance
In general, these principles are epistemologically based prescriptive procedures that address methodological issues involving uncertainty that cannot be
resolved solely by using calculi of the individual uncertainty theories. Due to
the connection between uncertainty and uncertainty-based information, these
principles also can be interpreted as principles of information.
The principle of minimum uncertainty is basically an arbitration principle.
It facilitates the selection of meaningful alternatives from solution sets
obtained by solving problems in which some amount of the initial information
is inevitably lost. According to this principle, we should accept only those solutions for which the loss of the information is as small as possible. This means,
in turn, that we should accept only solutions with minimum uncertainty.
The second principle, the principle of maximum uncertainty, is essential for
any problem that involves ampliative reasoning. This is reasoning in which
conclusions are not entailed in the given premises. Using common sense, the
principle may be expressed as follows: in any ampliative inference, use all
information supported by available evidence, but make sure that no additional
information (unsupported by the given evidence) is unwittingly added.
Employing the connection between information and uncertainty, this definition can be reformulated in terms of uncertainty: any conclusion resulting from
ampliative inference should maximize the relevant uncertainty within constraints representing given premises. This principle guarantees that we fully
recognize our ignorance when we attempt to make inferences that are beyond
the information domain defined by the given premises and, at the same time,
that we utilize all information contained in the premises. In other words, the
principle guarantees that our inferences are maximally noncommittal with
respect to information that is not contained in the premises.
The principles of minimum and maximum uncertainty are well developed
within classical, probability-based information theory. They are referred to
as principles of minimum and maximum entropy. These classical principles of
uncertainty are extensively covered in the literature. A survey of this literature is presented in Notes 9.1–9.3. A few examples in Sections 9.2 and 9.3 illustrate the important role of these principles in classical information theory.
9.2. PRINCIPLE OF MINIMUM UNCERTAINTY
357
Optimization problems that emerge from the minimum and maximum
uncertainty principles outside classical information theory have yet to be properly investigated and tested in praxis. One complication is that two types of
uncertainty coexist in all the nonclassical uncertainty theories. Material presented in this chapter is thus by and large exploratory, oriented primarily to
examining the issues involved and stimulating further research.
Principles of minimum and maximum uncertainty are fundamentally different from the remaining principles: the principle of requisite generalization
and the principle of uncertainty invariance. While the former are applicable
within each particular uncertainty theory, the latter facilitate transitions from
one theory to another. Clearly, the latter principles have no counterparts in
classical information theory.
The principle of requisite generalization is based on the assumption that we
work within GIT as a whole. According to this principle, we should not a priori
commit to any particular uncertainty theory. Our choice should be determined
by the nature of the problem we deal with. The chosen theory should be
sufficiently general to allow us to capture fully our ignorance. Moreover, when
the chosen theory becomes incapable of expressing uncertainty resulting from
deficient information at some problem-solving stage, we should move to a more
general theory that has the capability of expressing the given uncertainty.
The last principle, the principle of uncertainty invariance (also called the
principle of information preservation), was introduced in GIT to facilitate
meaningful transformations between various uncertainty theories. According
to this principle, the amount of uncertainty (and the associated uncertaintybased information) should be preserved in each transformation from one
uncertainty theory to another. The primary use of this principle is to approximate in a meaningful way uncertainty formalized in a given theory by a formalization in a theory that is less general.
The roles of each of the four principles of uncertainty are illustrated in
Figure 9.1, where uncertainty theory T2 is assumed to be more general than
uncertainty theory T1. While the principles of minimum and maximum uncertainty (1 and 2) are applicable within a single theory (theory T1 in the figure),
the principles of requisite generalization (3) and uncertainty invariance (4)
deal with transitions from one theory to another.
9.2. PRINCIPLE OF MINIMUM UNCERTAINTY
The principle of minimum uncertainty is applicable to all problems that are
prone to lose information. The principle may be viewed as a safeguard against
losing more information than necessary to solve a given problem. It is thus
reasonable to also refer to it as the principle of minimum information loss. In
this section, the principle is illustrated by two classes of problems: simplification problems and conflict-resolution problems.
358
9. METHODOLOGICAL ISSUES
4
1
Deficient
information
Uncertainty
theory
T1
3
Uncertainty
theory
T2
2
Figure 9.1. Principles of uncertainty: an overview. Assumption: Theory T2 is more general than
theory T1. ➀ Principle of minimum uncertainty; ➁ principle of maximum uncertainty; ➂ principle
of requisite generalization; ➃ principle of uncertainty invariance.
9.2.1. Simplification Problems
A major class of problems for which the principle of minimum uncertainty is
applicable consists of simplification problems. When a system is simplified, it
is usually unavoidable to lose some information contained in the system. The
amount of information that is lost in this process results in the increase of an
equal amount of relevant uncertainty. Examples of relevant uncertainties are
predictive, retrodictive, or prescriptive uncertainty. A sound simplification of
a given system should minimize the loss of relevant information (or the
increase in relevant uncertainty) while achieving the required reduction of
complexity. That is, we should accept only such simplifications of a given
system at any desirable level of complexity for which the loss of relevant information (or the increase in relevant uncertainty) is minimal. When properly
applied, the principle of minimum uncertainty guarantees that no information
is wasted in the process of simplification.
There are many simplification strategies, all of which can perhaps be classified into three main classes:
•
•
•
Simplifications made by eliminating some entities from the system (variables, subsystems, etc.).
Simplifications made by aggregating some entities of the system (variables, states, etc.).
Simplifications made by breaking overall systems into appropriate
subsystems.
Regardless of the strategy employed, the principle of minimum uncertainty is
utilized in the same way. It is an arbiter that decides which simplifications to
9.2. PRINCIPLE OF MINIMUM UNCERTAINTY
359
choose at any given level of complexity or, alternatively, which simplifications
to choose at any given level of acceptable uncertainty. Let us describe this
important role of the principle of minimum uncertainty in simplification problems more formally.
Let Z denote a system of some type and let QZ denote the set of all simplifications of Z that are considered admissible in a given context. For example,
QZ may be the set of simplifications of Z that are obtained by a particular simplification method. Let £c and £u denote preference orderings on QZ that are
based upon complexity and uncertainty, respectively. In general, systems with
smaller complexity and smaller uncertainty are preferred. The two preference
orderings can be combined in a joint preference ordering, £, defined as follows:
for any pair of systems Zi, Zj Œ QZ, Zi £ Zj if and only if Zi £c Zj and Zi £u Zj.
The joint preference ordering is usually a partial ordering, even though it may
be only a weak ordering in some simplification problems (reflexive and transitive relation on QZ). The use of the uncertainty preference ordering, which,
in this case, exemplifies the principle of minimum uncertainty, enables us to
reduce all admissible simplifications to a small set of preferred simplifications.
The latter forms a solution set, SOLZ, of the simplification problem, which consists of those admissible simplifications in QZ that are either equivalent or
incomparable in terms of the joint preference ordering. Formally,
SOLZ = {Zi ŒQZ for all Z j ŒQZ , Z j £ Zi implies Zi £ Z j }.
Observe that the solution set SOLZ in this formulation, which may be called
an unconstrained simplification problem, contains simplifications at various
levels of complexity and with various degrees of uncertainty. The problem can
be constrained, for example, by considering simplifications admissible only
when their complexities are at some designated level or when they are below
a specified level, when their uncertainties do not exceed a certain maximum
acceptable level, and the like. The formulation of these various constrained
simplification problems differs from the preceding formulation only in the definition of the set of admissible simplifications QZ.
EXAMPLE 9.1. The aim of this example is to illustrate the role of the principle of minimum uncertainty in determining the solution set of preferable
simplifications of a given system. The example deals with a simple diagnostic
system, Z, with one input variable, x, and one output variable, y, whose states
are in sets X = {0, 1, 2, 3} and Y = {0, 1}, respectively. It is assumed that the
states are ordered in the natural way: 0 £ 1 £ 2 £ 3. The relationship between
the variables is expressed by the joint probability distribution function on X
¥ Y that is given in Table 9.1a. The diagnostic uncertainty of the system is
S(Y X ) = S( X , Y ) - S( X )
= 2.571 - 1.904
= 0.667.
360
9. METHODOLOGICAL ISSUES
Table 9.1. Preferred Simplifications Determined by the Principle of Minimum
Uncertainty (Example 9.1)
Z:
p(x, y)
Y
0
1
2
3
X
0
1
0.10
0.00
0.15
0.05
0.30
0.15
0.05
0.20
X
pX(x)
0
1
2
3
0.40
0.15
0.20
0.25
X1
pX1(x)
{0,1}
{2}
{3}
0.55
0.20
0.25
X2
pX2(x)
{0}
{1,2}
{3}
0.40
0.35
0.25
S(X, Y) = 2.571
S(X) = 1.904
S(Y | X) = 0.667
I(Y | X) = 0.333
(a)
Z1:
p1(x, y)
X1
Z2:
{0,1}
{2}
{3}
Y
0
1
0.10
0.15
0.05
0.45
0.05
0.20
p2(x, y)
X2
Z3:
{0}
{1,2}
{3}
Y
0
1
0.10
0.15
0.05
0.30
0.20
0.20
p3(x, y)
X3
{0}
{1}
{2,3}
Y
X3
0
1
0.10
0.00
0.20
0.30
0.15
0.25
{0}
{1}
{2,3}
S(X1,Y) = 2.158
S(X1) = 1.439
S(Y | X1) = 0.719
I(Y | X1) = 0.281
S(X2,Y) = 2.409
S(X2) = 1.559
S(Y | X2) = 0.850
I(Y | X2) = 0.150
pX3(x)
0.40
0.15
0.45
S(X3,Y) = 2.228
S(X3) = 1.458
S(Y | X3) = 0.770
I(Y | X3) = 0.230
(b)
The amount of diagnostic information contained in the system, I(Y | X), is the
difference between the maximum uncertainty allowed within the experimental frame, Smax(Y | X) = log28 - log24 = 1, and the actual uncertainty: I(Y | X) =
1 - 0.667 = 0.333. Now assume that we want to simplify the system by quantizing states of the input variable. Since the states of X are ordered, there are
six meaningful quantizations (with at least two states), which are expressed by
the following partitions on X:
9.2. PRINCIPLE OF MINIMUM UNCERTAINTY
361
Table 9.1. Continued
Z4:
p4(x, y)
X4
Z5:
{0,1}
{2,3}
Y
0
1
0.10
0.20
0.45
0.25
p5(x, y)
X5
Z6:
{0,1,2}
{3}
Y
0
1
0.25
0.05
0.50
0.20
p6(x, y)
X6
{0}
{1,2,3}
Y
0
1
0.10
0.20
0.30
0.40
X4
pX4(x)
{0,1}
{2,3}
0.55
0.45
X5
pX5(x)
{0,1,2}
{3}
0.75
0.25
X6
pX6(x)
{0}
{1,2,3}
0.40
0.60
S(X4,Y) = 1.815
S(X4) = 0.993
S(Y | X4) = 0.822
I(Y | X4) = 0.178
S(X5,Y) = 1.680
S(X5) = 0.811
S(Y | X5) = 0.869
I(Y | X5) = 0.131
S(X6,Y) = 1.846
S(X6) = 0.971
S(Y | X6) = 0.875
I(Y | X6) = 0.125
(c)
X 1 = {{0, 1}, {2}, {3}},
X 2 = {{0}, {1, 2}, {3}},
X 3 = {{0}, {1}, {2, 3}},
X 4 = {{0, 1}, {2, 3}},
X 5 = {{0, 1, 2}, {3}},
X 6 = {{0}, {1, 2, 3}}.
Simplifications Z1, Z2, Z3, which are based on partitions X1, X2, X3, respectively,
are shown in Table 9.1b. These simplifications have the same complexity, which
is expressed in this example by the number of states of the input variable.
Among them, simplification Z1 has the smallest diagnostic uncertainty,
S(Y | X1) = 0.719, and the smallest loss of information with respect to the given
system Z:
I (Y X ) - I (Y X 1 ) = 0.052.
Simplifications Z4, Z5, Z6, which are based on partitions X4, X5, X6, respectively,
are shown in Table 9.1c. They have the same complexity and, clearly, they are
362
9. METHODOLOGICAL ISSUES
less complex than simplifications Z1, Z2, Z3. Among them, simplification Z4 has
the smallest diagnostic uncertainty, S(Y | X4) = 0.822. The loss of information
with respect to Z is
I (Y X ) - I (Y X 4 ) = 0.155,
and with respect to Z1:
I (Y X 1 ) - I (Y X 4 ) = 0.103.
The solution set (including the given system Z) is:
SOLZ = {Z , Z1 , Z4 }.
EXAMPLE 9.2. A relation R among variables x1, x2, x3 is defined by the set
of possible triples listed in Table 9.2a. Observe that the state sets of the variables are, respectively, X1 = {0, 1} and X2 = X3 = {0, 1, 2}. The relation may be
utilized for providing us with useful information about the state of any one of
the variables, given a joint state of the remaining two variables. Relevant
amounts of uncertainty for this purpose, which are expressed in this case by
the Hartley functional, are shown in Table 9.2b under the heading R. Also
shown in the table are the associated amounts of information.
The aim of this example is to illustrate the role of the principle of minimum
uncertainty in choosing simplifications based on the quantization of variables
x2 and x3. Assuming that the states in X2 and X3 are ordered, each of the sets
can be quantized by using one of the following functions:
q1 : 0 Æ 0
1Æ 0
2Æ1
q2 : 0 Æ 0
1Æ1
2 Æ 1.
When we apply function q1 to both X2 and X3 in relation R, we obtain the simplified relation R11 shown in Table 9.2a, where duplicate states are crossed out.
By applying the other combinations of functions q1 and q2, we obtain the other
simplified relations in Table 9.2a (R12, R21, and R22), where each subscript indicates the combination employed. For each of the four simplified relations, relevant amounts of uncertainty and information are given in Table 9.2b. Observe
that the simplifications with the minimum amount of uncertainty (indicated
in the table by bold entries) are also correctly the ones with the maximum
amount of information (or with minimum loss of information with respect to
R). For variable x1, any of relations R12, R21, and R22 qualifies as the best simplification; for variable x2, R11 is the only minimum-uncertainty simplification;
for x3, both R21 and R22 are minimum-uncertainty simplifications. Observe that
R12 does not preserve any information of R with respect to variable x3.
Table 9.2. Preferred Simplification Determined by the Principle of Minimum Uncertainty (Example 9.2)
R
R11
R12
R21
R22
x1
x2
x3
x1
x2
x3
x1
x2
x3
x1
x2
x3
x1
x2
x3
0
0
0
0
0
0
1
1
1
1
0
0
1
1
2
2
0
1
1
1
1
2
0
1
0
1
0
0
1
2
0
0
0
0
0
0
1
1
1
1
0
0
0
0
1
1
0
0
0
0
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
1
1
1
0
0
0
0
1
1
0
0
0
0
1
1
0
1
0
1
0
0
1
1
0
0
0
0
0
0
1
1
1
1
0
0
1
1
1
1
0
1
1
1
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
1
1
1
0
0
1
1
1
1
0
1
1
1
1
1
0
1
0
1
0
0
1
1
(a)
R
R11
R12
R21
R22
log2 10 = 3.322
log2 5 = 2.321
log2 6 = 2.585
log2 8 = 3.000
log2 5 = 2.321
log2 3 = 1.585
log2 4 = 2.000
log2 3 = 1.585
log2 6 = 2.585
log2 3 = 1.585
log2 4 = 2.000
log2 4 = 2.000
log2 6 = 2.585
log2 4 = 2.000
log2 4 = 2.000
log2 4 = 2.000
log2 6 = 2.585
log2 4 = 2.000
log2 4 = 2.000
log2 4 = 2.000
H(X1|X2 ¥ X3)
H(X2|X1 ¥ X3)
H(X3|X1 ¥ X2)
0.322
0.737
1.000
0.736
0.321
0.736
0.585
0.585
1.000
0.585
0.585
0.585
0.585
0.585
0.585
I(X1|X2 ¥ X3)
I(X2|X1 ¥ X3)
I(X3|X1 ¥ X2)
0.678
0.848
0.585
0.264
0.679
0.264
0.415
0.415
0.000
0.415
0.415
0.415
0.415
0.415
0.415
H(X1 ¥ X2 ¥ X3)
H(X1 ¥ X2)
H(X1 ¥ X3)
H(X2 ¥ X3)
363
(b)
364
9. METHODOLOGICAL ISSUES
When departing from the two classical uncertainty theories, several options
open for applying the principle of minimum uncertainty. Using Figure 6.15 as
a guide, the following options can be identified:
1.
2.
3.
4.
To minimize the generalized Hartley functional GH.
To minimize the aggregated uncertainty S̄.
To minimize either of the total uncertainty components in TU or aTU.
To consider both components in TU or in aTU and replace a single uncertainty preference ordering with two preference orderings, one for each
component.
Among these alternatives, the first one seems conceptually the most fundamental. This alternative, which may be called a principle of minimum nonspecificity, guarantees that the imprecision in probabilities does not increase
more than necessary when we simplify a system to some given level of complexity. This alternative is also computationally attractive since the functional
to be minimized (the generalized Hartley measure) is a linear functional. The
aim of the following example is to illustrate some of the other alternatives.
EXAMPLE 9.3. Consider a system Z with one input variable, x1, and one
output variable, x2, in which the relationship between the variables is expressed
in terms of the joint interval-valued probability distribution given in Table
9.3a. To illustrate the principle of minimum uncertainty, let us consider two
simplifications of the systems by quantizing the input variable via either function q1 or function q2 introduced in Example 9.2. Interval-valued probability
distributions of the two simplifications, Z1 and Z2, are shown in Table 9.3b.They
are also shown with their marginals in Figure 9.2. Their complete formulations,
including the Möbius representations, are in Table 9.4. Subsets A of the Cartesian product {0, 1}2 are defined by their characteristic functions. Relevant conditional uncertainties (since x1 is an input variable and x2 is an output variable)
are given in Table 9.3c. They are calculated by the differences between the
uncertainties on X1 ¥ X2 (shown in Table 9.4) and the uncertainties on X1
(shown in Figure 9.2). We can conclude that (a) Z1 is preferred according to S̄
and generalized Shannon (GS); (b) Z2 is preferred according to GH; and (c)
both Z1 and Z2 are accepted in terms of TU = ·GH, GSÒ, since they are not
comparable in terms of the joint preference ordering.
9.2.2. Conflict-Resolution Problems
Another application of the principle of minimum uncertainty is the area of
conflict-resolution problems. For example, when we integrate several overlapping subsystems into one overall system, the subsystems may be locally inconsistent in the following sense. An overall system composed of subsystems is
said to be locally inconsistent if it contains at least one pair of subsystems that
share some variables and whose uncertainty functions project to distinct mar-
9.2. PRINCIPLE OF MINIMUM UNCERTAINTY
365
Table 9.3. Illustration to Example 9.3
(a) Given System Z
x1
x2
0
0
1
1
2
2
1
1
0
1
0
1
m(x1, x2)
¯
0.1
0.3
0.0
0.2
0.0
0.2
m̄(x1, x2)
0.3
0.4
0.0
0.4
0.2
0.4
(b) Simplifications Z1 and Z2 of System Z
Z1
x1
x2
0
0
1
1
1
1
0
1
m1(s)
¯
0.1
0.5
0.0
0.2
Z2
m̄ 1(s)
0.3
0.7
0.2
0.4
m2(s)
¯
0.1
0.3
0.0
0.4
m̄ 2(s)
0.3
0.4
0.2
0.6
(c) Uncertainties of Simplifications Z1 and Z2
Z1
S̄(X2 | X1) = 0.814
GH(X2 | X1) = 0.200
GS(X2 | X1) = 0.614
Z2
S̄(X2 | X1) = 0.871
GH(X2 | X1) = 0.158
GS(X2 | X1) = 0.713
ginal uncertainty functions based on the shared variables. An example of
locally inconsistent probabilistic subsystems is shown in Figure 9.3a.
Local inconsistency among subsystems that form an overall system is a kind
of conflict. If it is not resolved, the overall system is not meaningful. For
example, the overall system in Figure 9.3a is not meaningful because no joint
probability distribution function exists on X ¥ Y ¥ Z that is consistent with
both probability distribution functions 1p, 2p of the two given subsystems.
Two attitudes toward locally inconsistent collections of subsystems can be
recognized. According to one of them, such collections should be rejected on
the basis of the fact that they do not represent any overall systems. According
to the other attitude, the local inconsistencies should be resolved by modifying the given uncertainty functions of the subsystems to achieve their consistency. However, this can usually be done in numerous ways. The right way to
do that, on epistemological grounds, is to obtain the consistency with the smallest possible total loss of information contained in the given uncertainty functions. The total loss of information is expressed by the sum of information
losses for the individual subsystems. Thus, resolving local inconsistency can be
formulated, in generic terms, as the following optimization problem, where
366
9. METHODOLOGICAL ISSUES
Table 9.4. Complete Formulation of the Two Simplifications Discussed in Example 9.3
Z1
x1x2
A:
00
01
10
11
0
1
0
0
0
1
1
1
0
0
0
1
1
1
0
1
0
0
1
0
0
1
0
0
1
1
0
1
1
0
1
1
0
0
0
1
0
0
1
0
1
0
1
1
0
1
1
1
0
0
0
0
1
0
0
1
0
1
1
0
1
1
1
1
m1(A)
¯
0.0
0.1
0.5
0.0
0.2
0.6
0.1
0.3
0.5
0.7
0.2
0.6
0.8
0.3
0.7
1.0
Z2
m̄ 1(A)
m1(A)
0.0
0.3
0.7
0.2
0.4
0.8
0.3
0.5
0.7
0.9
0.4
0.8
1.0
0.5
0.9
1.0
0.0
0.1
0.5
0.0
0.2
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.2
GH(X1 ¥ X2) = 0.400
S̄(X1 ¥ X2) = 1.785
GS(X1 ¥ X2) = 1.385
TU1 = ·0.4, 1.385Ò
m2(A)
¯
0.0
0.1
0.3
0.0
0.4
0.4
0.1
0.5
0.3
0.7
0.4
0.4
0.8
0.6
0.7
1.0
m̄ 2(A)
m2(A)
0.0
0.3
0.4
0.2
0.6
0.6
0.3
0.7
0.5
0.9
0.6
0.6
1.0
0.7
0.9
1.0
0.0
0.1
0.3
0.0
0.4
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.1
0.0
0.1
GH(X1 ¥ X2) = 0.358
S̄(X1 ¥ X2) = 1.871
GS(X1 ¥ X2) = 1.513
TU2 = ·0.358, 1.513Ò
I(su, sû) denotes the loss of information when uncertainty function su is
replaced with uncertainty function sû.
Given a family of subsystems {sZ | s Œ⺞n} whose uncertainty functions, su (s
Œ ⺞n), formalized in uncertainty theory T, are locally inconsistent, determine
locally consistent counterparts of these uncertainty functions, sû (s Œ ⺞n), for
which the functional
n
 I ( u, uˆ )
s
s
s =1
reaches its minimum subject to the following constraints:
(a)
(b)
Axioms of theory T.
Conditions of local consistency of uncertainty functions sû(s Œ ⺞n).
When for example, this optimization problem is formulated within probability theory, then su and sû are probability distribution functions: (a) are
axioms of probability theory, (b) are linear algebraic equations, and
I ( s u, s uˆ ) =
Â
x ŒX s
s
u(x) log 2
s
s
u(x)
,
uˆ (x)
9.2. PRINCIPLE OF MINIMUM UNCERTAINTY
[0.1, 0.3]
367
[0.5, 0.7]
00
01
0
[0.6, 0.8]
S (X1) = S(0.6, 0.4)
= 0.971
Z1
[0.0, 0.2]
[0.2, 0.4]
10
11
[0.2, 0.4]
GH(X1) = 0.2
1
GS(X1) = 0.971 – 0.2
= 0.771
0
[0.1, 0.8]
[0.1, 0.3]
00
1
[0.7, 0.9]
[0.3, 0.4]
01
0
[0.4, 0.6]
S (X1) = 1
Z2
[0.0, 0.2]
[0.4, 0.6]
10
11
0
[0.1, 0.3]
[0.4, 0.6]
1
GH(X1) = 0.2
GS(X1) = 0.8
1
[0.7, 0.9]
Figure 9.2. Two simplifications discussed in Example 9.3.
where Xs is the state set of subsystem Zs, is the directed divergence introduced
in Section 3.3 (see Eq. (3.56)).
The appearance of local inconsistencies among subsystems of an overall
system is an indicator that the claims expressed by the uncertainty functions
of the subsystems are not fully warranted under the given evidence.Thus, modifying them in a minimal way (with minimum loss of information) to achieve
their consistency, as facilitated by the described optimization problems, is an
epistemologically sound conflict-resolution strategy.
EXAMPLE 9.4. To illustrate the described optimization problem by a specific example, let us consider the two locally inconsistent subsystems described
368
9. METHODOLOGICAL ISSUES
1
p(xi, yj)
X
= 1pij
x1 x2
Y y1 0.5 0.1
y2 0.2 0.2
Y
y1
y2
Subsystem Z1
1
pY(yj)
0.6
0.4
π
2
p(zk, yj)
= 2pkj
z1
z2
0.4 0.25 y1
Y
0.15 0.2 y2
Z
2
pY (yj) Y
0.65 y1
0.35 y2
Subsystem Z2
Inconsistent marginals
(a)
1
X
p̂ (xi, yj)
x1
Z
x2
Y
y1
y2
= 1 p̂ij
y
Y y1 0.5208 0.1042
2
0.1875 0.1875
1
p̂Y (yj)
0.625
0.375
Y
0.625 y1
0.375 y2
2
=
z1
pY (yj)
0.3846 0.2404 y1
0.1632 0.2118 y2
Consistent marginals
Subsystem Ẑ1
2
z2
pˆ (zk , yj )
= 2 p̂kj
Y
Subsystem Ẑ 2
(b)
Figure 9.3. Resolving local inconsistency of subsystems (Example 9.4). (a) Locally inconsistent subsystems Z1 and Z2; (b) consistent subsystem Ẑ 1 and Ẑ 2 obtained by minimizing the loss
of information in given subsystems Z1 and Z2.
in Figure 9.3a. Denoting, for convenience, 1p(xi, yj) = 1pij, 2p(zk, yj) = 2pkj, 1p̂(xi,
yj) = 1p̂ij, and 2p̂(zk, yj) = 2p̂kj for all i, j, k Œ {1, 2}, the optimization problem in
this case has the following form.
Given probabilities 1pij and 2pkj(i, j, k Œ {1, 2}), determine the values of 1p̂ij
and 2p̂kj(i, j, k Œ {1, 2}) for which the functional
2
1
2
ÂÂ
1
pij log 2
i =1 j =1
2
2
2
pij
pkj
2
+
p
log
kj
2
Â
Â
1ˆ
2ˆ
pij k =1 j =1
pkj
reaches its minimum under the following constraints:
(c1)
(c2)
(c3)
(c4)
(c5)
(c6)
1
p̂11 + 1p̂12 + 1p̂21 + 1p̂22 = 1
2
p̂11 + 2p̂12 + 2p̂21 + 2p̂22 = 1
1
p̂ij ≥ 0 for all i, j Œ {1, 2}
2
p̂kj ≥ 0 for all k, j Œ {1, 2}
1
p̂11 + 1p̂21 = 2p̂11 + 2p̂21
1
p̂12 + 1p̂22 = 2p̂12 + 2p̂22
Constraints (c1)–(c4) capture axioms of probability theory; constraints (c5)
and (c6) specify conditions for local consistency of the two subsystems.
9.3. PRINCIPLE OF MAXIMUM UNCERTAINTY
369
The objective functional in the optimization problem, which measures the total
loss of information for each modification of the probability distribution functions of subsystems Z1 and Z2, is within the given constraints positive (due to
the Gibbs inequality) and convex. Hence, the optimization problem has a
unique solution, which is given in Figure 9.3b.The loss of information in resolving the local inconsistency in this example is 0.0021.
9.3. PRINCIPLE OF MAXIMUM UNCERTAINTY
The principle of maximum uncertainty allows us to develop epistemologically
sound procedures for dealing with a wide variety of problems that involve
ampliative reasoning—any reasoning in which conclusions are not entailed in
the given premises.
Ampliative reasoning is indispensable to science in a variety of ways.
For example, whenever we utilize a given system for predictions, we employ
ampliative reasoning. Similarly, when we want to estimate microstates from
the knowledge of relevant macrostates and partial information regarding the
microstates (as in image processing and many other problems), we must resort
to ampliative reasoning. The problem of the identification of an overall system
from some of its subsystems is another example that involves ampliative
reasoning.
Ampliative reasoning is also common and important in our daily lives,
where, unfortunately, the principle of maximum uncertainty is not always
adhered to. Its violation leads almost invariably to conflicts in human communication, as was well expressed by Bertrand Russell in his Unpopular
Essays [1950]:
[W]henever you find yourself getting angry about a difference in opinion, be on
your guard; you will probably find, on examination, that your belief is getting
beyond what the evidence warrants.
9.3.1. Principle of Maximum Entropy
The principle of maximum uncertainty is well developed and broadly utilized
within classical information theory, where it is called the principle of maximum
entropy. It is formulated, in generic terms, as the following optimization
problem: determine a probability distribution ·p(x) | x ŒX Ò that maximizes
the Shannon entropy subject to given constraints c1, c2, . . . , cn, which express
partial information about the unknown probability distribution, as well as the
general constraints (axioms) of probability theory.The most typical constraints
employed in practical applications of the maximum entropy principle are
mean (expected) values of one or more random variables or various marginal
probability distributions of an unknown joint distribution.
370
9. METHODOLOGICAL ISSUES
As an example, consider a random variable x with possible (given) nonnegative real values x1, x2, . . . , xn. Assume that probabilities pi of values
xi (i Œ ⺞n) are not known, although we do know the mean (expected) value
E(x) of the variable. Employing the maximum entropy principle, we estimate
the unknown probabilities pi, i Œ ⺞n, by solving the following optimization
problem, in which values xi (i Œ ⺞n) and their expected value are given and
probabilities pi (i Œ ⺞n) are to be determined.
Maximize the functional
n
S( p1 , p2 , . . . , pn ) = -Â pi ln pi
i =1
subject to the constraints
E(x) =
n
 px
(9.1)
i i
i=1
and
pi ≥ 0 (i Œ ⺞n ),
n
Âp
i
= 1.
(9.2)
i=1
For the sake of simplicity, in this formulation we use the natural logarithm
in the definition of Shannon entropy. Clearly, the solution to this problem is
not affected by changing the base of the logarithm in the objective functional.
Equation (9.1) represents the available information; Eq. (9.2) represents the
standard constraints imposed on p by probability theory.
First, we form the Lagrange function
n
Ê n
ˆ
Ê n
ˆ
L = -Â pi ln pi - a Á Â pi - 1˜ - b Á Â pi xi - E ( x)˜ ,
Ë
¯
Ë
¯
i =1
i =1
i =1
where a and b are the Lagrange multipliers that correspond to the two constraints. Second, we form the partial derivatives of L with respect to pi (i Œ
⺞n), a, and b, and set them equal to zero; this results in the equations
∂L
= - ln pi - 1 - a - bxi = 0 for each i Œ⺞n
∂ pi
n
∂L
= 1 - Â pi = 0
∂a
i =1
n
∂L
= E ( x) - Â pi xi = 0.
∂b
i =1
9.3. PRINCIPLE OF MAXIMUM UNCERTAINTY
371
The last two equations are exactly the same as the constraints of the optimization problem. The first n equations can be written as
p1 = e -1-a -bx1 = e -(1+a ) e - bx1
p2 = e -1-a -bx 2 = e -(1+a ) e - bx 2
M
M
pn = e -1-a -bx n = e -(1+a ) e - bx n .
When we divide each of these equations by the sum of all of them (which must
be one), we obtain
pi =
e - bxi
Â
n
k =1
e - bxk
(9.3)
for each i Œ ⺞n. In order to determine the value of b, we multiply the ith
equation in Eq. (9.3) by xi and add all of the resulting equations, thus
obtaining
Â
E ( x) =
Â
n
i =1
n
xi e - bxi
i =1
e - bxi
and
n
Âx e
i
i =1
- bxi
n
- E ( x)Â e - bxi = 0.
i =1
Multiplying this equation by ebE(x) results in
n
 [x
i
- E(x)]e - b [ x i -E ( x )] = 0.
(9.4)
i=1
This equation must now be solved (numerically) for b and the solution
substituted for b in Eq. (9.3), which results in the estimated probabilities
pi (i Œ ⺞n).
EXAMPLE 9.5. To illustrate this application of the maximum entropy principle, first consider an “honest” (unbiased) die. Here xi = i for i Œ ⺞6 and
E(x) = 3.5. Equation (9.4) has the form
-2.5e 2.5 b - 1.5e1.5 b - 0.5e 0.5 b + 0.5e -0.5 b + 1.5e -1.5 b + 2.5e -2.5 b = 0.
372
9. METHODOLOGICAL ISSUES
The solution is clearly β = 0; when this value is substituted for β into Eq. (9.3), we obtain the uniform probability pi = 1/6 for all i ∈ ℕ6.

Now consider a biased die for which it is known that E(x) = 4.5. Equation (9.4) assumes a different form:
$$-3.5e^{3.5\beta} - 2.5e^{2.5\beta} - 1.5e^{1.5\beta} - 0.5e^{0.5\beta} + 0.5e^{-0.5\beta} + 1.5e^{-1.5\beta} = 0.$$
When solving this equation (by a suitable numerical method), we obtain β = −0.37105. Substitution of this value for β into Eq. (9.3) yields the maximum entropy probability distribution:
$$p_1 = \frac{1.45}{26.66} = 0.05, \quad p_2 = \frac{2.10}{26.66} = 0.08, \quad p_3 = \frac{3.04}{26.66} = 0.11,$$
$$p_4 = \frac{4.41}{26.66} = 0.17, \quad p_5 = \frac{6.39}{26.66} = 0.24, \quad p_6 = \frac{9.27}{26.66} = 0.35.$$
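The computation just described is easy to reproduce numerically. The following Python sketch, added here only as an illustration, solves Eq. (9.4) for β with a standard root finder and then applies Eq. (9.3); the use of SciPy's brentq is simply one convenient choice of numerical method.

```python
import numpy as np
from scipy.optimize import brentq

def max_entropy_given_mean(x, mean):
    """Estimate probabilities p_i by the maximum entropy principle,
    given values x_i and their expected value (Eqs. (9.3) and (9.4))."""
    x = np.asarray(x, dtype=float)
    def f(beta):
        # Eq. (9.4): sum_i (x_i - E(x)) * exp(-beta * (x_i - E(x))) = 0
        d = x - mean
        return np.sum(d * np.exp(-beta * d))
    beta = brentq(f, -50.0, 50.0)      # solve numerically for beta
    w = np.exp(-beta * x)
    return w / w.sum(), beta           # Eq. (9.3)

# Biased die of Example 9.5: values 1..6, known mean 4.5
p, beta = max_entropy_given_mean([1, 2, 3, 4, 5, 6], 4.5)
print(round(beta, 5))                  # approximately -0.37105
print(p.round(2))                      # approximately [0.05 0.08 0.11 0.17 0.24 0.35]
```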
Our only knowledge about the random variable x in the examples discussed
is the knowledge of its expected value E(x). It is expressed by Eq. (9.1) as a
constraint on the set of relevant probability distributions. If E(x) were not
known, we would be totally ignorant about x, and the maximum entropy principle would yield the uniform probability distribution (the only distribution
for which the entropy reaches its absolute maximum). The entropy of the probability distribution given by Eq. (9.3) is usually smaller than the entropy of the
uniform distribution, but it is the largest entropy from among all the entropies
of the probability distributions that conform to the given expected value E(x).
A generalization of the principle of maximum entropy is the principle of
minimum cross-entropy. It can be formulated as follows: given a prior probability distribution function p′ on a finite set X and some relevant new evidence,
determine a new probability distribution function p that minimizes the cross-entropy Ŝ given by Eq. (3.56) subject to constraints c1, c2, . . . , cn, which represent the new evidence, as well as to the standard constraints of probability
theory.
New evidence reduces uncertainty. Hence, uncertainty expressed by p is, in
general, smaller than uncertainty expressed by p′. The principle of minimum
cross-entropy helps us to determine how much smaller it should be. It allows
us to reduce the uncertainty of p′ by the smallest amount necessary to satisfy
the new evidence. That is, the posterior probability distribution function p estimated by the principle has the largest uncertainty among all probability distribution functions that conform to the evidence.
9.3.2. Principle of Maximum Nonspecificity
When the principle of maximum uncertainty is applied within the classical possibility theory, where the only recognized type of uncertainty is nonspecificity,
it is reasonable to describe this restricted application of the principle by a more
descriptive name—a principle of maximum nonspecificity. This specialized
principle is formulated as an optimization problem in which the objective functional is based on the Hartley measure (basic or conditional, Hartley-based
information transmission, etc.). Constraints in this optimization problem
consist of the axioms of classical possibility theory and any information pertaining to possibilities of the considered alternatives.
According to the principle of maximum nonspecificity in classical possibility theory, any of the considered alternatives that do not contradict given evidence should be considered possible. An important problem area, in which this
principle is crucial, is the identification of n-dimensional relations from the
knowledge of some of their projections. It turns out that the solution obtained
by the principle of maximum nonspecificity in each of these identification
problems is the cylindric closure of the given projections. Indeed, the cylindric
closure is the largest and, hence, the most nonspecific n-dimensional relation
that is consistent with the given projections. The significance of this solution
is that it always contains the true but unknown overall relation.
A particular method for computing cylindric closure is described and illustrated in Examples 2.3 and 2.4. A more efficient method is to simply join all
the given projections by the operation of relational join (introduced in Section
1.4) and, if relevant, eliminate inconsistent outcomes. This method is illustrated
in the following example.
EXAMPLE 9.6. Consider a possibilistic system with three 2-valued variables,
x1, x2, x3, that is discussed in Example 2.4. The aim is to identify the unknown
ternary relation among the variables (a subset of the set of overall states listed
in Figure 9.4a) solely from the knowledge of some of its projections. It is
assumed that we know two of the binary projections specified in Figure 9.4b
or all of them. As is shown in Example 2.4, the identification is not unique in
any of these cases. When applying the principle of maximum nonspecificity, we
obtain in each case the least specific ternary relation, which is the cylindric
closure of the given projections. The aim of this example is to show that an
efficient way of determining the cylindric closure is to apply the operation of
relational join.
Assume that projections R12 and R23 are given. Taking their relational join
R12 * R23, as illustrated in Figure 9.4c, we readily obtain their cylindric closure
(compare with the same result in Example 2.4).

[Figure 9.4. Computation of cylindric closures (Example 9.6): (a) the overall states s0–s7 of the variables x1, x2, x3; (b) the binary projections R12, R23, R13; (c)–(e) relational joins of pairs of projections (using the inverse of one projection where needed); (f) the join of all three projections, (R12 * R23) * R13⁻¹, with inconsistent quadruples excluded.]

In a similar way, we obtain the
cylindric closures for the other pairs of projections, as shown in Figure 9.4d
and 9.4e. Observe, however, that we need to use the inverse of one of the projections in these cases to be able to apply the relational join. The order in which
the join is performed is not significant. When the relational join is applied to
all three binary projections, as shown in Figure 9.4f, the outcomes are quadruples, in which one variable appears twice (variable x1 in our case). Any quadruple in which the two appearances of this variable have distinct values must be
excluded (they are shaded in the figure); such a quadruple indicates that the
triple obtained by the join R12 * R23 is inconsistent with projection R13.
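As an illustration of the join-and-exclude procedure, the following Python sketch computes the cylindric closure of three binary projections. It is not part of the original text; the specific tuples are read from Figure 9.4b and should be checked against that figure.

```python
# Crisp binary projections of Example 9.6, read from Figure 9.4b.
R12 = {(0, 0), (0, 1), (1, 0)}     # pairs (x1, x2)
R23 = {(0, 0), (0, 1), (1, 1)}     # pairs (x2, x3)
R13 = {(0, 0), (0, 1), (1, 0)}     # pairs (x1, x3)

def join(R12, R23):
    """Relational join R12 * R23 over the shared variable x2."""
    return {(x1, x2, x3) for (x1, x2) in R12 for (y2, x3) in R23 if x2 == y2}

def cylindric_closure(R12, R23, R13):
    """Join two projections, then drop triples inconsistent with the third."""
    return {(x1, x2, x3) for (x1, x2, x3) in join(R12, R23) if (x1, x3) in R13}

print(sorted(join(R12, R23)))               # states s0, s1, s3, s4, s5
print(sorted(cylindric_closure(R12, R23, R13)))   # states s0, s1, s3, s4
```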
The idea of cylindric closure as the least specific identification of an n-dimensional relation from some of its projections is applicable to convex
subsets of n-dimensional Euclidean space as well. The main difference is that
there is a continuum of distinct projections in the latter case.
9.3.3. Principle of Maximum Uncertainty in GIT
In all uncertainty theories, except the two classical ones, two types of uncertainty coexist, which are measured by the appropriately generalized Hartley
and Shannon functionals. To apply the principle of maximum uncertainty, we
can use one of these types of uncertainty or both of them. In addition, we can
also use the well-established aggregate of both types of uncertainty. This means
that we can formulate the principle of maximum uncertainty in terms of five
distinct optimization problems, which are distinguished from one another by
the following objective functionals:

(a) Generalized Hartley measure GH: Eq. (6.38).
(b) Generalized Shannon measure GS: Eq. (6.64).
(c) Aggregated measure of total uncertainty S̄: Eq. (6.61).
(d) Disaggregated measure of total uncertainty TU = ⟨GH, GS⟩: Eq. (6.65).
(e) Alternative disaggregated measure of total uncertainty aTU = ⟨S̄ − S, S⟩: Eq. (6.75).
Which of the five optimization problems to choose depends a great deal on
the context of each application. However, a few general remarks regarding the
five options can readily be made. First, options (d) and (e) are clearly the most
expressive ones, but their relationship is not properly understood yet. The
utility of option (c) is somewhat questionable since functional S̄ is known to
be highly insensitive to changes in evidence. Of the two remaining options, (a)
seems to be conceptually more fundamental than (b); it is a generalization of
the principle of maximum nonspecificity discussed in Section 9.3.2. Moreover,
option (a) is computationally attractive due to the linearity of the generalized
Hartley measure.
None of the five optimization problems has been properly developed so far
in any of the nonstandard theories of uncertainty. The rest of this section is
thus devoted to illustrating the optimization problems by simple examples.
EXAMPLE 9.7. To illustrate the principle of maximum nonspecificity in evidence theory, let us consider a finite universal set X and three of its nonempty
subsets that are of interest to us: A, B, and A ∩ B. Assume that the only evidence on hand is expressed in terms of two numbers, a and b, that represent
the total beliefs focusing on A and B, respectively (a, b ∈ [0, 1]). Our aim is
to estimate the degree of support for A ∩ B based on this evidence.

As a possible interpretation of this problem, let X be a set of diseases considered in an expert system designed for medical diagnosis in a special area
of medicine, and let A and B be sets of diseases that are supported for a particular patient by some diagnostic tests to degrees a and b, respectively. Using
this evidence, it is reasonable to estimate the degree of support for diseases in
A ∩ B by using the principle of maximum nonspecificity. This principle is a
safeguard that does not allow us to produce an answer (diagnosis) that is more
specific than warranted by the evidence.
The use of the principle of maximum nonspecificity in our example leads
to the following optimization problem: determine values m(X), m(A), m(B),
and m(A ∩ B) for which the functional
$$GH(m) = m(X)\log_2|X| + m(A)\log_2|A| + m(B)\log_2|B| + m(A \cap B)\log_2|A \cap B|$$
reaches its maximum subject to the constraints
$$m(A) + m(A \cap B) = a$$
$$m(B) + m(A \cap B) = b$$
$$m(X) + m(A) + m(B) + m(A \cap B) = 1$$
$$m(X),\, m(A),\, m(B),\, m(A \cap B) \ge 0,$$
where a, b ∈ [0, 1] are given numbers.
The constraints are represented in this case by three linear algebraic equations of four unknowns and, in addition, by the requirement that the unknowns
be nonnegative real numbers. The first two equations represent our evidence;
the third equation and the inequalities represent general constraints of evidence theory. The equations are consistent and independent. Hence, they
involve one degree of freedom. Selecting, for example, m(A ∩ B) as the free
variable, we readily obtain
$$m(A) = a - m(A \cap B)$$
$$m(B) = b - m(A \cap B) \qquad (9.5)$$
$$m(X) = 1 - a - b + m(A \cap B).$$
Since all the unknowns must be nonnegative, the first two equations set the
upper bound of m(A ∩ B), whereas the third equation specifies its lower
bound; the bounds are
$$\max\{0,\, a + b - 1\} \le m(A \cap B) \le \min\{a, b\}. \qquad (9.6)$$
Using Eqs. (9.5), the objective function now can be expressed solely in terms
of the free variable m(A ∩ B). After a simple rearrangement of terms, we
obtain
$$GH(m) = m(A \cap B)\big[\log_2|X| - \log_2|A| - \log_2|B| + \log_2|A \cap B|\big] + (1 - a - b)\log_2|X| + a\log_2|A| + b\log_2|B|.$$
Clearly, only the first term in this expression can influence its value, so that we
can rewrite the expression as
$$GH(m) = m(A \cap B)\log_2 K_1 + K_2, \qquad (9.7)$$
where
$$K_1 = \frac{|X| \cdot |A \cap B|}{|A| \cdot |B|}$$
and
$$K_2 = (1 - a - b)\log_2|X| + a\log_2|A| + b\log_2|B|$$
are constant coefficients. The solution to the optimization problem depends
only on the value of K1. Since A, B, and A ∩ B are assumed to be nonempty
subsets of X, K1 > 0. If K1 < 1, then log2 K1 < 0 and we must minimize m(A ∩ B)
to obtain the maximum of GH(m); hence, m(A ∩ B) = max{0, a + b − 1}
due to Eq. (9.6). If K1 > 1, then log2 K1 > 0, and we must maximize m(A ∩ B);
hence, m(A ∩ B) = min{a, b} as given by Eq. (9.6). When K1 = 1, log2 K1 = 0, and
GH(m) is independent of m(A ∩ B); this implies that the solution is not unique
or, more precisely, that any value of m(A ∩ B) in the range of Eq. (9.6) is a
solution to the optimization problem. The complete solution thus can be
expressed as
$$m(A \cap B) = \begin{cases} \max\{0,\, a + b - 1\} & \text{when } K_1 < 1 \\ [\max\{0,\, a + b - 1\},\, \min\{a, b\}] & \text{when } K_1 = 1 \\ \min\{a, b\} & \text{when } K_1 > 1. \end{cases}$$
The three types of solutions are illustrated visually in Figure 9.5 and given for
specific numerical values in Table 9.5.
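The complete solution is easily mechanized. The following Python sketch, added here as an illustration, implements Eqs. (9.6) and (9.7) and reproduces case (i) of Table 9.5.

```python
def max_nonspecificity_m(card_X, card_A, card_B, card_AB, a, b):
    """Maximum-nonspecificity estimate of m(A ∩ B) in Example 9.7,
    following Eqs. (9.6) and (9.7)."""
    lo, hi = max(0.0, a + b - 1.0), min(a, b)       # Eq. (9.6)
    K1 = card_X * card_AB / (card_A * card_B)       # coefficient in Eq. (9.7)
    if K1 < 1:
        return lo
    if K1 > 1:
        return hi
    return (lo, hi)        # K1 == 1: any value in [lo, hi] is optimal

# Case (i) of Table 9.5: |X|=10, |A|=5, |B|=5, |A ∩ B|=2, a=0.7, b=0.5
print(max_nonspecificity_m(10, 5, 5, 2, 0.7, 0.5))   # 0.2 (here K1 < 1)
```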
[Figure 9.5. Illustration of the three possible outcomes in Example 9.7: GH(m) plotted as a function of m(A ∩ B) over the range of Eq. (9.6), for the cases K1 < 1, K1 > 1, and K1 = 1.]

EXAMPLE 9.8. The aim of this example is to illustrate an application of the principle of maximum nonspecificity within the uncertainty theory based on λ-measures. Assume that marginal λ-measures, λmX and λmY, are given, where X = Y = {0, 1}, and we want to determine the unknown joint λ-measure, λm, by
using the principle of maximum nonspecificity. A simple notation to be used
in this example is introduced in Figure 9.6: values x1, x2, y1, y2 are given; values
a, b, c, d are to be determined. Observe that the given values must satisfy the
equations:
$$\frac{1 - x_1 - x_2}{x_1 x_2} = \lambda, \qquad \frac{1 - y_1 - y_2}{y_1 y_2} = \lambda.$$
Table 9.5. Examples of the Three Types of Possible Solutions Discussed in Example 9.7

(a) Three Particular Examples

Example   |X|   |A|   |B|   |A ∩ B|   a     b     GH[m(A ∩ B)]
(i)        10     5     5      2      0.7   0.5   −0.322 m(A ∩ B) + 2.122
(ii)       10     5     4      2      0.8   0.6   1.497
(iii)      20    10    12      7      0.4   0.5   0.222 m(A ∩ B) + 3.553

(b) Solutions for the Three Particular Examples Obtained by the Principle of Maximum Nonspecificity

Example   Type      m(A ∩ B)     m(A)         m(B)         m(X)
(i)       K1 < 1    0.2          0.5          0.3          0.0
(ii)      K1 = 1    [0.4, 0.6]   [0.2, 0.4]   [0.0, 0.2]   [0.0, 0.2]
(iii)     K1 > 1    0.4          0.0          0.1          0.5
[Figure 9.6. Notation employed in Example 9.8: the joint λ-measure on X × Y = {0, 1} × {0, 1} assigns the unknown values a, b, c, d to the joint states (0, 0), (0, 1), (1, 0), (1, 1), respectively; the given marginal values are λmX(0) = x1, λmX(1) = x2, λmY(0) = y1, λmY(1) = y2.]
The relationship between the joint and marginal measures is expressed by the following equations:
$$a + b + \lambda ab = x_1$$
$$c + d + \lambda cd = x_2$$
$$a + c + \lambda ac = y_1$$
$$b + d + \lambda bd = y_2.$$
To determine the solution set of nonnegative values for a, b, c, d, we need one
free variable. For example, choosing c as the free variable, we readily obtain
the following dependencies of the remaining variables on c:
$$a = \frac{y_1 - c}{1 + \lambda c} \quad \text{(from the third equation)},$$
$$d = \frac{x_2 - c}{1 + \lambda c} \quad \text{(from the second equation)},$$
$$b = \frac{x_1(1 + \lambda c) + c - y_1}{1 + \lambda y_1} \quad \text{(from the first equation)}.$$
Since it is required that a, b, c, d ≥ 0, we obtain the following range of acceptable values of c:
$$\max\left\{0,\ \frac{y_1 - x_1}{\lambda x_1 + 1}\right\} \le c \le \min\{x_2, y_1\}. \qquad (9.8)$$
Each value of c within this range defines a particular joint λ-measure, λmc, that
is consistent with the given marginals. One way of determining the least specific of these measures is to search through the range of c by small increments
and calculate, for each value of c, the measure λmc, its Möbius representation,
and the value of the generalized Hartley measure. The measure with the
maximum value of the Hartley measure is then chosen. In a similar way, joint
λ-measures that maximize the functionals S̄ or GS can be determined.
To illustrate the procedure for some numerical values, let x1 = 0.6, x2 = 0.1,
y1 = 0.4, and y2 = 0.2. Then, clearly, λ = 5, c ∈ [0, 0.1], and
$$a = \frac{0.4 - c}{1 + 5c}, \qquad b = 0.067 + 1.333c, \qquad d = \frac{0.1 - c}{1 + 5c}.$$
The maximum of the generalized Hartley measure is 0.573 and it is obtained
for c = 0.069. The least specific λ-measure (the one for c = 0.069) and its Möbius
representation are shown in Table 9.6. Also shown in the table are measures
that maximize functionals S̄ (obtained for c = 0.039) and GS (obtained for
c = 0.036). All these measures represent lower probabilities (since λ is positive). The corresponding upper probabilities can readily be calculated via the
duality relation.
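The incremental search described above can be sketched in Python as follows. This is an illustration only; the function and variable names are ours, not the book's.

```python
import math
from itertools import combinations

# Marginal data of Example 9.8: x1=0.6, x2=0.1, y1=0.4, y2=0.2, lambda=5.
lam = 5.0
x1, x2, y1, y2 = 0.6, 0.1, 0.4, 0.2
elements = [(0, 0), (0, 1), (1, 0), (1, 1)]    # singletons of X x Y

def lambda_measure(singletons, lam):
    """Value of the lambda-measure on every subset, built from singleton values
    by the lambda-rule m(A ∪ B) = m(A) + m(B) + lam * m(A) * m(B)."""
    measure = {}
    for r in range(len(elements) + 1):
        for A in combinations(elements, r):
            v = 0.0
            for e in A:
                v = v + singletons[e] + lam * v * singletons[e]
            measure[frozenset(A)] = v
    return measure

def moebius(measure):
    """Moebius representation m(A) = sum over B ⊆ A of (-1)^(|A|-|B|) measure(B)."""
    return {A: sum((-1) ** (len(A) - len(B)) * measure[frozenset(B)]
                   for r in range(len(A) + 1) for B in combinations(A, r))
            for A in measure}

def GH(mob):
    """Generalized Hartley measure: sum of m(A) * log2 |A| over nonempty A."""
    return sum(m * math.log2(len(A)) for A, m in mob.items() if len(A) > 0)

def gh_for_c(c):
    a = (y1 - c) / (1 + lam * c)
    b = (x1 * (1 + lam * c) + c - y1) / (1 + lam * y1)
    d = (x2 - c) / (1 + lam * c)
    singletons = {(0, 0): a, (0, 1): b, (1, 0): c, (1, 1): d}
    return GH(moebius(lambda_measure(singletons, lam)))

best_c = max([i / 1000 for i in range(0, 101)], key=gh_for_c)
print(best_c, round(gh_for_c(best_c), 3))   # expect about 0.069 and 0.573 (cf. Example 9.8)
```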
The problem of identifying an unknown n-dimensional relation from
some of its projections by using the principle of maximum nonspecificity is
illustrated in Example 9.6 for relations formalized in classical possibility
theory. Two common methods for dealing with the problem are: (1) constructing the intersection of cylindric extensions of the given projections;
and (2) combining the given projections by the operation of relational join
and, if relevant, excluding inconsistencies. Both methods can be easily generalized to the theory of graded possibilities, as is illustrated in the following
example.

[Table 9.6. Resulting λ-measures in Example 9.8: the joint λ-measures (λ = 5) that maximize GH, S̄, and GS, respectively, listed on all subsets of X × Y together with their Möbius representations.]
EXAMPLE 9.9. Consider a system with three variables, x1, x2, x3, that is similar
to the one in Example 9.6. The set of all considered overall states of the system,
specified in Table 9.7a, is the same as in Example 9.6. Again, we want to identify the unknown ternary relation from its projections by using the principle
of maximum nonspecificity. The difference in this example is that the projections are fuzzy relations, which are defined by their membership functions in
Table 9.7b. Since these projections provide us with partial information about
the overall relation, we can interpret them as basic possibility functions. As in
Example 9.6, the least specific overall relation that is consistent with the given
projections is the cylindric closure of the projections. The construction of cylindric closures for fuzzy relations is discussed in Section 7.6.1. Applying this
construction to the projections in this example, we first construct cylindric
extensions of the projections, ER12, ER23, ER13 (as shown in Table 9.7c). Then,
we determine the cylindric closure by taking the standard intersection (based
on the minimum operator) of the cylindric extensions (shown in Table 9.7d).
Recall that the standard operation of intersection of fuzzy sets is the only cutworthy one. This means that each α-cut of the cylindric closure of some fuzzy
projections is a cylindric closure in the classical sense. An alternative, more
efficient way of constructing cylindric closures of fuzzy projections is to use
the relational join introduced in Section 7.5.2 (Eq. (7.34)).
Table 9.7. Illustration to Example 9.9

(a) Overall states

States   x1   x2   x3
s0        0    0    0
s1        0    0    1
s2        0    1    0
s3        0    1    1
s4        1    0    0
s5        1    0    1
s6        1    1    0
s7        1    1    1

(b) Fuzzy projections (membership grades)

x1   x2   R12(x1, x2)      x2   x3   R23(x2, x3)      x1   x3   R13(x1, x3)
0    0    1.0              0    0    0.5              0    0    0.9
0    1    0.8              0    1    1.0              0    1    1.0
1    0    0.6              1    0    0.7              1    0    1.0
1    1    0.0              1    1    0.4              1    1    0.3

(c) Cylindric extensions of the projections (listed in the order s0–s7)

States   (x1, x2, x3)   ER12   ER23   ER13
s0       (0, 0, 0)      1.0    0.5    0.9
s1       (0, 0, 1)      1.0    1.0    1.0
s2       (0, 1, 0)      0.8    0.7    0.9
s3       (0, 1, 1)      0.8    0.4    1.0
s4       (1, 0, 0)      0.6    0.5    1.0
s5       (1, 0, 1)      0.6    1.0    0.3
s6       (1, 1, 0)      0.0    0.7    1.0
s7       (1, 1, 1)      0.0    0.4    0.3

(d) Cylindric closure (standard, minimum-based intersection of the extensions)

States   (x1, x2, x3)   Cyl
s0       (0, 0, 0)      0.5
s1       (0, 0, 1)      1.0
s2       (0, 1, 0)      0.7
s3       (0, 1, 1)      0.4
s4       (1, 0, 0)      0.5
s5       (1, 0, 1)      0.3
s6       (1, 1, 0)      0.0
s7       (1, 1, 1)      0.0
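A minimal Python sketch of the min-based construction of Table 9.7d is given below; it is added as an illustration, with the membership grades taken from Table 9.7b.

```python
from itertools import product

# Fuzzy projections of Example 9.9 (membership grades from Table 9.7b).
R12 = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.0}
R23 = {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.7, (1, 1): 0.4}
R13 = {(0, 0): 0.9, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.3}

def fuzzy_cylindric_closure(R12, R23, R13, values=(0, 1)):
    """Standard (min-based) intersection of the cylindric extensions:
    Cyl(x1, x2, x3) = min{R12(x1, x2), R23(x2, x3), R13(x1, x3)}."""
    return {(x1, x2, x3): min(R12[(x1, x2)], R23[(x2, x3)], R13[(x1, x3)])
            for x1, x2, x3 in product(values, repeat=3)}

for state, grade in sorted(fuzzy_cylindric_closure(R12, R23, R13).items()):
    print(state, grade)        # reproduces the Cyl column of Table 9.7d
```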
9.4. PRINCIPLE OF REQUISITE GENERALIZATION
GIT, as an ongoing research program, offers us a steadily growing inventory
of diverse uncertainty theories. Each of the theories is a formal mathematical
system, which means that it is subject to some specific assumptions that are
inherent in its axioms. If these assumptions are violated in the context of a
given application, the theory is ill-suited for the application.
The growing diversity of uncertainty theories within GIT makes it increasingly more realistic to find an uncertainty theory whose assumptions are in
harmony with each given application. However, other criteria for choosing a
suitable theory are often important as well, such as low computational complexity or high conceptual transparency of the theory.
Due to the common properties of uncertainty theories recognized within
GIT, emphasized especially in Chapters 4, 6, and 8, it is also feasible to work
within GIT as a whole. In this case, we would move from one theory to another
as needed when dealing with a given application. There are basically two
reasons for moving from one uncertainty theory to another:
1. The theory we use is not sufficiently general to capture uncertainty that
emerges at some stage of the given application. A more general theory
is needed.
2. The theory we use becomes inconvenient at some stage of the given
application (e.g., its computational complexity becomes excessive) and
it is desirable to replace it with a more convenient theory.
These two distinct reasons for replacing one uncertainty theory with
another lead to two distinct principles that facilitate these replacements: a
principle of requisite generalization, which is introduced in this section, and a
principle of uncertainty invariance, which is introduced in Section 9.5.
The following is one way of formulating the principle of requisite generalization: Whenever, at some stage of a problem-solving process involving uncertainty, a given uncertainty theory becomes incapable of representing emerging
uncertainty of some type, it should be replaced with another theory, sufficiently
more general, that is capable of representing this type of uncertainty. As
suggested by the name of this principle, the extent of generalization is not
optional, but requisite, determined by the nature of the emerging uncertainty.
It seems pertinent to compare the principle of requisite generalization with
the principle of maximum uncertainty. Both these principles clearly aim at
an epistemologically honest representation of uncertainty. The difference
between them, a fundamental one, is that the former principle applies to GIT
as a whole, while the latter applies to each individual uncertainty theory within
GIT.
The principle of requisite generalization is introduced here for the first time.
There is virtually no experience with its practical applicability. At this point,
the best way to illustrate it seems to be to describe some relevant examples.
Table 9.8. Three Ways of Handling Incomplete Data (Example 9.10)

(a) Labeling of the joint states

v1   v2   i
0    0    0
0    1    1
1    0    2
1    1    3

(b) Observed subsets A of joint states and the three representations

A           N(A)   Pro1(A)   Pro2(A)   m(A)    Bel(A)   Pl(A)
{0}          106   0.546     0.369     0.106   0.106    0.632
{1}           55   0.284     0.237     0.055   0.055    0.419
{2}           25   0.129     0.246     0.025   0.025    0.467
{3}            8   0.041     0.148     0.008   0.008    0.288
{0, 1}       212   0.830     0.606     0.212   0.373    0.839
{0, 2}       314   0.675     0.615     0.314   0.445    0.785
{1, 3}       152   0.325     0.385     0.152   0.215    0.555
{2, 3}       128   0.170     0.394     0.128   0.161    0.627
EXAMPLE 9.10. Consider two variables, v1, v2, each of which has two possible states, 0 and 1. For convenience, let the joint states of the variables be
labeled by an index i, as defined in Table 9.8a. Assume that we work within
classical probability theory. Assume further that we have a record of an appreciable number of observations of the variables. Some of the observations
contain a value of both variables, some of them contain a value of only one
variable due to some measurement or communication constraints (not essential for our discussion). When only one variable is observed, it is known that
this observation represents one of two joint states of the variable, but it is not
known which one it actually is. For example, observing that v1 = 1 (and not
knowing the state of v2) means that the joint state may be i = 2 or i = 3, but
we do not know which of them is the actual joint state. There are basically
three ways to handle these incomplete data:
(i) We can just ignore all the incomplete observations. This means,
however, that we ignore information that may be useful, or even
crucial, in some applications.
(ii) We can apply the principle of maximum entropy to fill any gaps in the
data. This results in a particular probability measure, which is obtained
by uniformly distributing the uncertain observations in each pair of
relevant states. Using this principle allows us to stay in probability
theory (and it is epistemologically the best way to deal with incomplete data within probability theory), but the use of the uniform distribution is totally arbitrary (imposed only by the axioms of probability theory) and does not represent the actual uncertainty in this case.
(iii) We can apply the principle of requisite generalization and move from
probability theory to Dempster–Shafer theory (DST), which is sufficiently general to represent the actual uncertainty in this case. Observing a state of one variable only is easily described in DST as observing
a set of two joint states. For example, observing that v1 = 1 (and not
knowing the state of v2) can be viewed as observing the subset {2, 3}
of the four joint states. In this way, uncertainty is represented fully and
accurately.
A numerical example illustrating the three options is shown in Table 9.8b.
It is assumed that we have a record of 1000 observations of the variables, some
containing values of both variables and some containing a value of only one
of the variables. Listed in the table are only subsets A of joint states that are
supported by observations (these are also all focal elements of the representation in DST). Symbols in the table have the following meanings:
N(A): The number of observations of the relevant subset A of joint states.
Pro1(A): Values of the probability measure based on ignoring incomplete
observations.
Pro2(A): Values of the probability measure derived by the maximum
entropy principle.
m(A): Values of the basic probability assignment function in DST, based on
the frequency interpretation.
Bel(A) and Pl(A): Values of the associated belief and plausibility measures,
respectively.
We can see that DST is the right (or requisite) generalization in this
example. It is capable of fully representing both types of uncertainty that are
inherent in the given evidence. No further generalization is needed. The probabilistic representation derived by the principle of maximum entropy is not
capable of capturing nonspecificity, and hence, it is not a full representation of
uncertainty in this case.
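The three options can be computed directly from the observation counts. The following Python sketch, added as an illustration, reproduces the Pro1, Pro2, Bel, and Pl values of Table 9.8b.

```python
# Observation counts from Table 9.8 (subsets of the joint states {0, 1, 2, 3}).
counts = {
    frozenset({0}): 106, frozenset({1}): 55, frozenset({2}): 25, frozenset({3}): 8,
    frozenset({0, 1}): 212, frozenset({0, 2}): 314,
    frozenset({1, 3}): 152, frozenset({2, 3}): 128,
}
total = sum(counts.values())                 # 1000 observations

# (i) Ignore incomplete observations.
complete = {A: n for A, n in counts.items() if len(A) == 1}
pro1 = {next(iter(A)): n / sum(complete.values()) for A, n in complete.items()}

# (ii) Maximum entropy: split each incomplete observation uniformly.
pro2 = {i: 0.0 for i in range(4)}
for A, n in counts.items():
    for i in A:
        pro2[i] += n / len(A) / total

# (iii) Dempster-Shafer: relative frequencies define the basic assignment m.
m = {A: n / total for A, n in counts.items()}
def bel(B): return sum(v for A, v in m.items() if A <= B)
def pl(B):  return sum(v for A, v in m.items() if A & B)

print(pro1)                                   # 0.546, 0.284, 0.129, 0.041
print(pro2)                                   # 0.369, 0.237, 0.246, 0.148
print(bel(frozenset({0, 1})), pl(frozenset({0, 1})))   # 0.373, 0.839
```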
EXAMPLE 9.11. Consider the simplest possible marginal problem in probability theory, which is depicted in Figure 9.7: given marginal probabilities on
the two-element sets, what are the associated joint probabilities? This question can be answered in two very different ways, depending on whether it is
required or not required to remain in probability theory. If it is required, then
we need to determine one particular joint probability distribution function.
The proper way to do that (as is explained in Section 9.3) is to use the prin-
386
9. METHODOLOGICAL ISSUES
?
Example:
p11
p12
•
•
p21
p22
•
•
y1
y2
•
•
p( y1 )
p( y2 )
x1
• p(x )
x2
• p(x )
1
2
Given
p( x1 ) = 0.8, p(x2 ) = 0.2
p( y1 ) = 0.6, p( y2 ) = 0.4
Figure 9.7. The marginal problem in probability theory (Example 9.11).
ciple of maximum entropy. From subadditivity and additivity of the Shannon
entropy, it follows immediately that the maximum-entropy joint probability
distribution function p is the product of the given marginal probability distribution functions. That is, using the notation introduced in Figure 9.7,
$$p_{ij} = p(x_i) \cdot p(y_j) \quad \text{for } i, j = 1, 2.$$
For the numerical values given in Figure 9.7, we get p11 = 0.48, p12 = 0.32,
p21 = 0.12, and p22 = 0.08.
If it is not required that we remain in probability theory, we should apply
the principle of requisite generalization and move to another theory of uncertainty that is sufficiently general to capture the full uncertainty in this problem.
The full uncertainty is expressed in terms of the set of all joint probability distributions that are consistent with the given marginal probability distributions.
Here we can utilize Example 4.3, in which this set is determined. The following is its formulation for the numerical values in Figure 9.7:
$$p_{11} \in [0.4, 0.6], \quad p_{12} = 0.8 - p_{11}, \quad p_{21} = 0.6 - p_{11}, \quad p_{22} = p_{11} - 0.4.$$
From this set of joint probability distributions, we can derive the associated
lower and upper probabilities, m̲ and m̄, and the Möbius representation,
which are shown in Table 9.9.

Table 9.9. Illustration of the Principle of Requisite Generalization (Example 9.11)

A                                        m̲(A)   m̄(A)   m(A)    Pro(A)
∅                                        0.0     0.0     0.0     0.00
{(x1, y1)}                               0.4     0.6     0.4     0.48
{(x1, y2)}                               0.2     0.4     0.2     0.32
{(x2, y1)}                               0.0     0.2     0.0     0.12
{(x2, y2)}                               0.0     0.2     0.0     0.08
{(x1, y1), (x1, y2)}                     0.8     0.8     0.2     0.80
{(x1, y1), (x2, y1)}                     0.6     0.6     0.2     0.60
{(x1, y1), (x2, y2)}                     0.4     0.8     0.0     0.56
{(x1, y2), (x2, y1)}                     0.2     0.6     0.0     0.44
{(x1, y2), (x2, y2)}                     0.4     0.4     0.2     0.40
{(x2, y1), (x2, y2)}                     0.2     0.2     0.2     0.20
{(x1, y1), (x1, y2), (x2, y1)}           0.8     1.0    −0.2     0.92
{(x1, y1), (x1, y2), (x2, y2)}           0.8     1.0    −0.2     0.88
{(x1, y1), (x2, y1), (x2, y2)}           0.6     0.8    −0.2     0.68
{(x1, y2), (x2, y1), (x2, y2)}           0.4     0.6    −0.2     0.52
X × Y                                    1.0     1.0     0.4     1.00

Here m(A) denotes the Möbius representation of the lower probability m̲, and Pro is the maximum-entropy (product) probability measure.

The monotone measure representing lower
probabilities is superadditive, but it is not 2-monotone (there are eight violations of 2-monotonicity, pertaining to subsets with three elements). Hence, to
fully represent uncertainty in this example, we need to move to the most
general uncertainty theory. This is requisite, not a matter of choice.
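The lower and upper probabilities of Table 9.9 can be obtained by optimizing over the set of admissible joint distributions. The sketch below, added as an illustration, does this by sampling the parameter p11 over [0.4, 0.6]; it is not the book's algorithm, only one straightforward way to reproduce the numbers.

```python
from itertools import combinations

# Joint distributions consistent with the marginals of Example 9.11 are
# parameterized by p11 in [0.4, 0.6].
elements = ["x1y1", "x1y2", "x2y1", "x2y2"]

def joint(p11):
    return {"x1y1": p11, "x1y2": 0.8 - p11, "x2y1": 0.6 - p11, "x2y2": p11 - 0.4}

grid = [joint(0.4 + k * 0.001) for k in range(201)]     # sample the interval

def prob(p, A):
    return sum(p[e] for e in A)

lower, upper = {}, {}
for r in range(len(elements) + 1):
    for A in map(frozenset, combinations(elements, r)):
        lower[A] = min(prob(p, A) for p in grid)         # lower probability
        upper[A] = max(prob(p, A) for p in grid)         # upper probability

# Moebius representation of the lower probability.
moebius = {A: sum((-1) ** (len(A) - len(B)) * lower[frozenset(B)]
                  for r in range(len(A) + 1) for B in combinations(A, r))
           for A in lower}

B = frozenset({"x1y1", "x1y2", "x2y1"})
print(round(lower[B], 3), round(upper[B], 3), round(moebius[B], 3))   # 0.8 1.0 -0.2
```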
9.5. PRINCIPLE OF UNCERTAINTY INVARIANCE
The multiplicity of uncertainty theories within GIT has opened a new area of
inquiry—the study of transformations between the various uncertainty theories. The primary motivation for studying these transformations stems from
questions regarding uncertainty approximations. How to approximate, in a
meaningful way, uncertainty represented in one theory by a representation in
another, less general theory? From the practical point of view, uncertainty
approximations are sought to reduce computational complexity or to make
the representation of uncertainty more transparent to the user. From the theoretical point of view, studying uncertainty approximations enhances our
understanding of the relationships between the various uncertainty theories.
In this section, all uncertainty approximation problems are expressed in
terms of one common principle, which is referred to as the principle of uncertainty invariance. To formulate this principle, assume that T1 and T2 denote two
uncertainty theories such that T2 is less general than or incomparable to T1.
Assume further that uncertainty pertaining to a given problem-solving situation is expressed by a particular uncertainty function u1, and that this function
is to be approximated by some uncertainty function u2 of theory T2. Then,
according to the principle of uncertainty invariance, it is required that the
amounts of uncertainty contained in u1 and u2 be the same. That is, the transformation from T1 to T2 is required to be invariant with respect to the amount
of uncertainty. This explains the name of the principle. Due to the unique connection between uncertainty and uncertainty-based information, the principle
may also be viewed as a principle of information invariance or information
preservation. It seems reasonable to compare this principle, in a metaphoric
way, with the principle of energy preservation in physics.
The following are examples of some obvious types of uncertainty approximations, some of which are examined in this section:

• Approximating belief measures by probability measures, measures of graded possibilities, λ-measures, or k-additive measures.
• Approximating probability measures by measures of graded possibilities and vice versa.
• Approximating measures of graded possibilities by crisp possibility measures.
• Approximating k-monotone measures (k finite) by belief measures.
• Approximating 2-monotone measures by measures based on reachable interval-valued distributions.
• Approximating fuzzified measures of some type by their crisp counterparts.
Observe that in virtually all these approximations, the principle of uncertainty invariance can be formulated in terms of the total aggregated uncertainty, S̄, or in terms of one or both of its components, GH and GS. These
variants of the principle lead, of course, to distinct approximations. Their
choice depends on what type of uncertainty we desire to preserve in each
application.
Preserving the amount of uncertainty of a certain type need not be the only
requirement employed in formulating the various problems of uncertainty
approximation. Other requirements may be added to the individual formulations in the context of each application. A common requirement is that uncertainty function u1 can be converted to its counterpart u2 by an appropriate
scale, at least ordinal.
In the rest of this section, only some of the approximation problems are
examined and illustrated by specific examples. This area is still not sufficiently
developed to warrant a more comprehensive coverage.
9.5.1. Computationally Simple Approximations
Assume that the measure of uncertainty employed in the principle of uncertainty invariance is the functional S̄. Then, all uncertainty approximations in
which a given uncertainty function u1 is to be approximated by a probability
measure u2 are conceptually trivial and computationally simple. Just recall
that when we calculate S̄(u1) by Algorithm 6.1 (or in any other way), we obtain,
as a by-product, a probability distribution function for which the Shannon
entropy is equal to S̄(u1). This probability distribution function is thus the probabilistic approximation of the uncertainty function u1 based on the principle
of uncertainty invariance and the use of functional S̄.

Table 9.10. Probabilistic Approximation of a Given λ-Measure (Example 9.12)

                     Given                               Approximated
A                    m̲(A)    m̄(A)    m(A)               Pro(A)   m(A)
∅                    0.000   0.000   0.000              0.000    0.000
{a}                  0.302   0.722   0.302              0.302    0.302
{b}                  0.119   0.447   0.119              0.298    0.298
{c}                  0.039   0.196   0.039              0.196    0.196
{d}                  0.051   0.244   0.051              0.204    0.204
{a, b}               0.600   0.900   0.179              0.600    0.000
{a, c}               0.400   0.800   0.059              0.498    0.000
{a, d}               0.430   0.819   0.077              0.506    0.000
{b, c}               0.181   0.570   0.023              0.494    0.000
{b, d}               0.200   0.600   0.030              0.502    0.000
{c, d}               0.100   0.400   0.010              0.400    0.000
{a, b, c}            0.756   0.949   0.035              0.796    0.000
{a, b, d}            0.804   0.961   0.046              0.804    0.000
{a, c, d}            0.553   0.881   0.015              0.702    0.000
{b, c, d}            0.278   0.698   0.006              0.698    0.000
X = {a, b, c, d}     1.000   1.000   0.009              1.000    0.000

The m(A) columns are the Möbius representations of m̲ and of Pro, respectively.
EXAMPLE 9.12. Consider the lower and upper probabilities m̲ and m̄ defined
in Table 9.10. It can be easily checked that m̲ is a λ-measure for λ = 5. Applying Algorithm 6.1 to m̲, we obtain S̄(m̲) = 1.971 and, as a by-product, the probability distribution
$$p = \langle 0.302, 0.298, 0.196, 0.204 \rangle.$$
Probability measure Pro based on this distribution (also shown in Table 9.10)
thus can be viewed as a probabilistic approximation of m̲ for which
$$\bar{S}(\underline{m}) = \bar{S}(\mathrm{Pro}) \ (= S(p)).$$
Although probabilistic approximations based on invariance of the aggregated
total uncertainty S̄ are computationally simple, their utility is questionable due
to the notorious insensitivity of this uncertainty measure. When either GH or
GS is employed instead of S̄, then, clearly, these approximations must be
handled differently, as is illustrated in the next section.
9.5.2. Probability–Possibility Transformations
Let the n-tuples p = ⟨p1, p2, . . . , pn⟩ and r = ⟨r1, r2, . . . , rn⟩ denote, respectively,
a probability distribution and a possibility profile on a finite set X with n or
more elements. Assume that these tuples are ordered in the same way and do
not contain zero components. Hence,

(a) pi ∈ (0, 1] and ri ∈ (0, 1] for all i ∈ ℕn.
(b) pi ≥ pi+1 and ri ≥ ri+1 for all i ∈ ℕn−1.
(c) ∑ᵢ₌₁ⁿ pi = 1 (probabilistic normalization).
(d) r1 = 1 (possibilistic normalization).
(e) If n < |X|, then pi = ri = 0 for all i = n + 1, n + 2, . . . , |X|.

Assume further that values ri correspond to values pi for all i ∈ ℕn by some
scale, at least ordinal.
Assuming that p and r are connected by a scale of some type means that
certain properties (such as ordering or proportionality of values) are preserved when p is transformed to r or vice versa. Transformations between
probabilities and possibilities are thus very different under different types
of scales. Let us examine these fundamental differences for the five most
common scale types: ratio, difference, interval, log-interval, and ordinal scales.
1. Ratio Scales. Ratio-scale transformations p → r have the form ri = αpi
for all i ∈ ℕn, where α is a positive constant. From the possibilistic normalization (d), we obtain 1 = αp1 and, consequently,
$$r_i = \frac{p_i}{p_1}$$
for all i ∈ ℕn. For the inverse transformation, r → p, we have pi = ri/α. From
the probabilistic normalization (c), we get 1 = (r1 + r2 + · · · + rn)/α and,
consequently,
$$p_i = \frac{r_i}{r_1 + r_2 + \cdots + r_n}$$
for all i ∈ ℕn.

These transformations are clearly too rigid to preserve any property other
than the defining property of ratio scales, ri/pi = α or pi/ri = 1/α, for all i ∈ ℕn.
Hence, the principle of uncertainty invariance cannot be formulated in terms
of ratio scales. Since there is no obvious reason why the ratios should be preserved, the utility of ratio-scale transformations between probabilities and
possibilities is questionable.
2. Difference Scales. Difference-scale transformations have the form ri = pi + β
for all i ∈ ℕn, where β is a positive constant. From the normalization
requirements (d) and (c), we readily obtain
$$r_i = 1 - (p_1 - p_i), \qquad p_i = r_i - \frac{r_2 + r_3 + \cdots + r_n}{n}$$
for all i ∈ ℕn, respectively. These transformations are as rigid as the ratio-scale
transformations and, consequently, their utility is severely limited for the same
reasons.
3. Interval Scales. Interval-scale transformations are of the form ri = αpi + β
for all i ∈ ℕn, where α and β are constants (α > 0). Determining the value of
β from the possibilistic normalization (d), we obtain
$$r_i = 1 - \alpha(p_1 - p_i) \qquad (9.9)$$
for all i ∈ ℕn, and determining its value from the probabilistic normalization,
we obtain
$$p_i = \frac{r_i - \bar{a}_r}{\alpha} + \frac{1}{n} \qquad (9.10)$$
for all i ∈ ℕn, where
$$\bar{a}_r = \frac{1}{n}\sum_{i=1}^{n} r_i$$
is the average value of the possibility profile r. To determine α, we can now apply
the principle of uncertainty invariance by requiring that the amounts of uncertainty associated with p and r be equal. While the uncertainty in p is uniquely
measured by the Shannon entropy S(p), there are, at least in principle, three
options for r: GH(r), GS(r), S̄(r). The most sensible choice seems to be GH,
which is supported at least by the following three arguments: (1) GH is a
natural generalization of the Hartley measure, which is the unique uncertainty
measure in classical possibility theory; (2) in possibilistic bodies of evidence,
nonspecificity (measured by GH) is more significant than conflict (measured
by GS) and it dominates the overall uncertainty, especially for large sets of
alternatives; and (3) the remaining option, S̄, is overly insensitive. It is thus
reasonable to express the requirement of uncertainty invariance by the
equation
$$S(p) = GH(r). \qquad (9.11)$$
This equation may help, in principle, to determine the value of α in Eqs. (9.9)
and (9.10) for which the probability–possibility transformation, p ↔ r, is uncertainty-invariant in the given sense. However, it is known (see Note 9.5) that
for some distributions no value of α exists that satisfies Eq. (9.11). It
is also known that we do not encounter this limitation under a different, but
closely connected type of scales: the log-interval scales. It is thus reasonable
to abandon interval scales and formulate probability–possibility transformations in terms of the more satisfactory log-interval scales.
4. Log-Interval Scales. Log-interval scale transformations have the form
ri = βpiᵅ for all i ∈ ℕn, where α and β are positive constants. Determining
the value of β from the possibilistic normalization (d), we obtain
$$r_i = \left(\frac{p_i}{p_1}\right)^{\alpha} \qquad (9.12)$$
for all i ∈ ℕn. The value of α is then determined by applying Eq. (9.11), which
expresses the requirement of uncertainty invariance. This equation assumes
the form
$$S(p) = \sum_{i=2}^{n} \left(\frac{p_i}{p_1}\right)^{\alpha} \log_2 \frac{i}{i-1}, \qquad (9.13)$$
where S(p) is the Shannon entropy of the given probability distribution p.
After solving the equation (numerically) and applying the resulting value of
α to Eq. (9.12), we obtain the desired possibility profile r, one that is connected
to p via a log-interval scale and contains the same amount of uncertainty as p
in the sense of Eq. (9.11).
For the inverse transformation, from r to p, we determine the value of β via
the probabilistic normalization (c), and obtain
$$p_i = \frac{r_i^{1/\alpha}}{s} \qquad (9.14)$$
for all i ∈ ℕn, where
$$s = \sum_{k=1}^{n} r_k^{1/\alpha}.$$
Equation (9.11) now assumes the form
$$GH(r) = -\sum_{i=1}^{n} \frac{r_i^{1/\alpha}}{s} \log_2 \frac{r_i^{1/\alpha}}{s}, \qquad (9.15)$$
where GH(r) is the generalized Hartley measure of the given possibility profile
r. Solving this equation for α and applying the solution to Eq. (9.14) results in
the sought probability distribution p.

Figure 9.8 is a convenient overview of the procedures involved in probability–possibility transformations that are uncertainty-invariant (in the sense of
Eq. (9.11)) and are based on log-interval scales.
EXAMPLE 9.13. Let p = ⟨0.7, 0.2, 0.075, 0.025⟩. The uncertainty-invariant log-interval scale transformation of p into r is determined as follows. First, we calculate S(p) = 1.2379. Then, we solve Eq. (9.13), which assumes the form
$$1.2379 = 0.2857^{\alpha} + 0.585 \times 0.1071^{\alpha} + 0.415 \times 0.0357^{\alpha}.$$
We obtain (by a numerical method) α = 0.2537. Using Eq. (9.12), we calculate
components of the desired possibility profile r = ⟨1, 0.728, 0.567, 0.429⟩.
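The transformation of Example 9.13 can be carried out numerically as follows. The sketch assumes SciPy's brentq as the root finder and is added here only for illustration.

```python
import math
from scipy.optimize import brentq

def shannon(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def prob_to_poss_log_interval(p):
    """Uncertainty-invariant p -> r transformation under a log-interval scale:
    solve Eq. (9.13) for alpha, then apply Eq. (9.12)."""
    p = sorted(p, reverse=True)
    target = shannon(p)
    def gh_minus_entropy(alpha):
        # GH(r) with r_i = (p_i / p_1) ** alpha, i.e. the right side of Eq. (9.13)
        gh = sum((p[i] / p[0]) ** alpha * math.log2((i + 1) / i)
                 for i in range(1, len(p)))
        return gh - target
    alpha = brentq(gh_minus_entropy, 1e-6, 10.0)
    return [(pi / p[0]) ** alpha for pi in p], alpha

r, alpha = prob_to_poss_log_interval([0.7, 0.2, 0.075, 0.025])
print(round(alpha, 4), [round(ri, 3) for ri in r])
# expect roughly 0.2537 and [1.0, 0.728, 0.567, 0.429], as in Example 9.13
```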
EXAMPLE 9.14. Let r = ⟨1, 0.7, 0.5⟩. To determine p connected to r via a log-interval scale and the principle of uncertainty invariance expressed by Eq.
(9.15), we need to calculate GH(r) = 0.992481 first. The equation then assumes
the form
$$0.992481 = -\frac{1}{s}\log_2\frac{1}{s} - \frac{0.7^{1/\alpha}}{s}\log_2\frac{0.7^{1/\alpha}}{s} - \frac{0.5^{1/\alpha}}{s}\log_2\frac{0.5^{1/\alpha}}{s},$$
where
$$s = 1 + 0.7^{1/\alpha} + 0.5^{1/\alpha}.$$
Solving this equation results in α = 0.262215. Applying this value to Eq. (9.14),
we readily obtain
$$p = \langle 0.753173, 0.193264, 0.0535633 \rangle.$$
Observe that S(p) = 0.992482, which is virtually the same as GH(r), except for
a tiny difference at the sixth decimal place due to rounding.
Let us now take the resulting probability distribution p as input and convert
it to the associated possibility profile by the same variation of the principle of
uncertainty invariance and log-interval scales. We now apply Eq. (9.13), which
has the form
$$0.992482 = \left(\frac{p_2}{p_1}\right)^{\alpha} + \left(\frac{p_3}{p_1}\right)^{\alpha} \log_2 1.5.$$
[Figure 9.8. Uncertainty-invariant probability–possibility transformations based on log-interval scales: from probabilities to possibilities, α is obtained by solving Eq. (9.13) with S(p) and applied in Eq. (9.12); from possibilities to probabilities, α is obtained by solving Eq. (9.15) with GH(r) and applied in Eq. (9.14).]
Its solution is α = 0.262215, which, as expected, is the same value as the one
obtained for the inverse transformation r → p. Applying this value to Eq.
(9.12), we readily obtain
$$r = \langle 1, 0.7, 0.5 \rangle,$$
which is the same as the initial possibility profile in this example. This again
conforms to our expectation.
5. Ordinal Scales. Scales of one additional type are applicable to uncertainty-invariant transformations between probabilities and possibilities. These
are known as ordinal scales. As the name suggests, ordinal scales are only
required to preserve the ordering of components in the n-tuples p and r. Their
form is ri = f(pi) for all i ∈ ℕn, where f is a strictly increasing function. Contrary to uncertainty-invariant transformations p ↔ r based on log-interval
scales, those based on ordinal scales are in general not unique. This is not necessarily a disadvantage, since the additional degrees of freedom offered by
ordinal scales allow us to satisfy various additional requirements.
EXAMPLE 9.15. As a simple example to illustrate probability–possibility
transformations under ordinal scales, let the probability distribution
$$p = \langle 0.4, 0.15, 0.15, 0.1, 0.1, 0.1 \rangle$$
be given and let the aim be to determine the corresponding possibility profile
r by using the principle of uncertainty invariance under ordinal scales. Then,
$$r = \langle r_1 = 1, a, a, b, b, b \rangle,$$
where a = f(0.15) and b = f(0.1). Now applying the principle of uncertainty
invariance, we obtain the equation
$$S(p) = (a - b)\log_2 3 + b\log_2 6 = a\log_2 3 + b.$$
This means that we still have one remaining degree of freedom. Choosing, for
example, a as the free variable, we have
$$b = S(p) - a\log_2 3. \qquad (9.16)$$
Since it is required that b ≤ a, we obtain from the last equation the inequality
$$S(p) - a\log_2 3 \le a,$$
which can be rewritten as
$$a \ge \frac{S(p)}{1 + \log_2 3}.$$
Since it is also required that a ≤ 1, the range of a is defined by the inequalities
$$\frac{S(p)}{1 + \log_2 3} \le a \le 1. \qquad (9.17)$$
The set of all possibility profiles r for which GH(r) = S(p)
is thus characterized by the set of all pairs that satisfy Eq. (9.16) within the
restricted range of a defined by inequality (9.17). For the given probability distribution p, the set of possibility profiles r for which GH(r) = S(p) = 2.346 is
characterized by the equation
$$b = 2.346 - 1.585a,$$
where a ∈ [0.908, 1]. This set is shown in Figure 9.9. Also shown in the figure
is the possibility profile based on log-interval scales and the one for which the
probability–possibility consistency index c, defined for any p and r by the
formula
$$c = \sum_{i=1}^{n} p_i r_i, \qquad (9.18)$$
reaches its maximum. Observe that each value of a in the given range defines
an ordinal scale characterized by a function f_a. For example, when a = 0.95,
then f_a(0.4) = 1, f_a(0.15) = 0.95, and f_a(0.1) = 0.84; moreover GH(f_a(p)) = 2.346.

[Figure 9.9. The sets of possibility profiles derived in Example 9.15: the line b = 2.346 − 1.585a for a ∈ [0.908, 1], with the points corresponding to the log-interval scale transformation and to the maximum of the consistency index c marked.]
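The family of ordinal-scale solutions in Example 9.15 is easy to enumerate. The short Python sketch below, added as an illustration, evaluates Eq. (9.16) for a few values of a and confirms that GH(r) = S(p) in each case.

```python
import math

p = [0.4, 0.15, 0.15, 0.1, 0.1, 0.1]
S = -sum(pi * math.log2(pi) for pi in p)        # S(p) = 2.346

def GH(r):
    """U-uncertainty (nonspecificity) of an ordered possibility profile."""
    return sum(r[i] * math.log2((i + 1) / i) for i in range(1, len(r)))

a_min = S / (1 + math.log2(3))                  # lower bound of inequality (9.17)
for a in [round(a_min, 3), 0.95, 1.0]:
    b = S - a * math.log2(3)                    # Eq. (9.16)
    r = [1, a, a, b, b, b]
    print(a, round(b, 3), round(GH(r), 3))      # GH(r) equals S(p) = 2.346
```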
Now consider a generalization of Example 9.15, in which the given probability distribution contains d distinct values (d ≤ n). Then, clearly, there are
d − 2 free variables, since we have two equations for d distinct values in r:
r1 = 1 (possibilistic normalization) and GH(r) = S(p). In Example 9.15, d = 3,
and hence, there is only one free variable. The next example illustrates a generalization to d = 4.
EXAMPLE 9.16. Given the probability distribution
$$p = \langle 0.5, 0.1, 0.1, 0.1, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02 \rangle,$$
the corresponding possibility profiles,
$$r = \langle r_i \mid i \in \mathbb{N}_{11} \rangle,$$
based on ordinal scales can be expressed in the form
$$r = \langle b_1 = 1, b_2, b_2, b_2, b_3, b_3, b_4, b_4, b_4, b_4, b_4 \rangle,$$
where b1, b2, b3, b4 represent distinct values of possibilities. Now, the principle
of uncertainty invariance is expressed by the equation
$$S(p) = (b_2 - b_3)\log_2 4 + (b_3 - b_4)\log_2 6 + b_4 \log_2 11.$$
One of the three variables (say b4) can be expressed in terms of the other two
variables, chosen as free variables. After calculating S(p) = 2.493 and inserting
it into the equation, we obtain
$$b_4 = 2.852 - 2.288 b_2 - 0.669 b_3.$$
The ordinal-scale transformation is characterized by the inequalities
$$1 - b_2 \ge 0, \quad b_2 - b_3 \ge 0, \quad b_3 - b_4 \ge 0, \quad b_4 \ge 0.$$
After expressing b4 in the last two inequalities in terms of b2 and b3, we obtain
the following set of four inequalities regarding the two free variables b2 and
b3, which characterize the set of all possibility profiles whose nonspecificity is
equal to S(p):
$$1 - b_2 \ge 0 \qquad \text{(a)}$$
$$b_2 - b_3 \ge 0 \qquad \text{(b)}$$
$$2.288 b_2 + 1.669 b_3 - 2.852 \ge 0 \qquad \text{(c)}$$
$$-2.288 b_2 - 0.669 b_3 + 2.852 \ge 0. \qquad \text{(d)}$$
This set is illustrated visually by the shaded area in Figure 9.10. The probability–possibility index c defined by Eq. (9.18) reaches its maximum at b2 = b3 = 0.964. The log-interval scaling transformation is obtained for b2 = 0.778 and
b3 = 0.698 (α = 0.156122).

[Figure 9.10. Characterization of the set of possibility profiles derived in Example 9.16: the region of admissible pairs (b2, b3) bounded by inequalities (a)–(d), with the maximum-c point (0.964, 0.964), the log-interval scale point (0.778, 0.698), and the vertices (0.721, 0.721), (1, 0.843), and (1, 0.338) marked.]
The examined examples clearly demonstrate that uncertainty–invariant
transformations from probabilities to possibilities under ordinal scales can be
computed without any insurmountable difficulties. However, the complexity
of these computations grows extremely rapidly with the number of distinct
values in the given probability distributions. The opposite transformations
from possibilities to probabilities are more difficult. The main obstacle is
Eq. (9.11). In these transformations it is a nonlinear equation, which makes any
symbolic treatment virtually impossible. Hence, we need to determine numerically all combinations of values of d - 1 unknowns (one of the unknowns is
eliminated by the probabilistic normalization) that satisfy the equation and all
the inequalities characterizing ordinal scales. More research is needed to deal
with this particular problem.
9.5.3. Approximations of Belief Functions by Necessity Functions
In this section, the uncertainty-invariance principle is illustrated by the
problem of approximating given belief functions of DST by necessity functions of the theory of graded possibilities. The primary reason for pursuing
these approximations is to reduce computational complexity. It is assumed that
the uncertainty to be preserved is the total aggregated uncertainty S̄. That is,
the principle is expressed by the equation
$$\bar{S}(Bel) = \bar{S}(Nec), \qquad (9.19)$$
where Bel and Nec denote, respectively, a given belief function and a necessity function that is supposed to approximate Bel. It is assumed that Bel and
Nec are defined on P(X).
In addition to Eq. (9.19), it is also required that Bel and Nec satisfy the
inequality
$$Nec(A) \le Bel(A) \qquad (9.20)$$
for all A ∈ P(X). The motivation for using this requirement stems from the
fact that belief functions are more general than necessity functions. Therefore, they are capable of expressing given evidence more precisely. This, in turn,
means that the set containment
$$[Bel(A), Pl(A)] \subseteq [Nec(A), Pos(A)]$$
should hold for all A ∈ P(X) when Nec approximates Bel. Functions Bel and
Nec that are defined on the same universal set and satisfy inequality (9.20) are
said to be consistent.
said to be consistent.
The approximation problem addressed in this section is thus formulated as
follows: Given a belief function, Bel, determine a necessity function, Nec, that
satisfies the requirements (9.19) and (9.20). The class of necessity functions
that satisfy these requirements is characterized by the following theorem.
Theorem 9.1. Let Bel denote a given belief function on X (with |X| = n), and
let {A_i}_{i=1}^{s} (with s ∈ ℕn) denote the partition of X consisting of the sets Ai chosen
by Step 1 of Algorithm 6.1 in the ith pass through the loop of Steps 1 to 5
during the computation of S̄(Bel). A necessity function Nec satisfies the
requirements (9.19) and (9.20) if and only if
$$Nec\left(\bigcup_{j=1}^{i} A_j\right) = Bel\left(\bigcup_{j=1}^{i} A_j\right) \qquad (9.21)$$
for each i ∈ ℕs.

Proof. [Harmanec and Klir, 1997]. ∎
We see from the theorem that, unless s = n and |Ai| = 1 for all i ∈ ℕs, the
solution to the approximation problem is not unique. The question is what criteria should be used to choose one particular approximation. Two criteria seem
to be most natural. According to one of them, we choose the necessity function with the maximum nonspecificity; according to the other one, we choose
the necessity function that is in some sense closest to the given belief function. Choices based on the first criterion are addressed by the following
theorem.
Theorem 9.2. Among the necessity functions that satisfy Eq. (9.21) for a given
belief function Bel, the one that maximizes nonspecificity is given by the
formula
$$Nec(B) = Bel\left(\bigcup_{j=1}^{i} A_j\right) \qquad (9.22)$$
for all B ∈ P(X), where i is the largest integer such that ⋃_{j=1}^{i} A_j ⊆ B, and it is
assumed, by convention, that ⋃_{j=1}^{0} A_j = ∅.

Proof. [Harmanec and Klir, 1997]. ∎
EXAMPLE 9.17. To illustrate the meaning of Eq. (9.22), consider the belief
function Bel defined in Table 9.11. When applying Algorithm 6.1 to this belief
function, we obtain
$$Bel(A_1) = Bel(\{a\}) = 0.3,$$
$$Bel(A_2) = Bel(\{b, c\}) = 0.5,$$
$$Bel(A_3) = Bel(\{d\}) = 0.2.$$
Hence, the unique necessity function, Nec1, that approximates Bel (i.e., that
satisfies requirements (9.19) and (9.20)) and maximizes nonspecificity is the
one specified in Table 9.11, together with the associated functions Pos1 and m1.
This approximation may conveniently be represented by the possibility profile
$$r_1 = \langle 1, 0.7, 0.7, 0.2 \rangle.$$
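Equation (9.22) translates directly into code. The following Python sketch, added as an illustration, reproduces the Nec1 column of Table 9.11 from the partition and cumulative belief values of Example 9.17.

```python
from itertools import combinations

# Partition sets from Example 9.17 (chosen by Algorithm 6.1) and the belief
# values of their cumulative unions, taken from Table 9.11.
blocks = [frozenset("a"), frozenset("bc"), frozenset("d")]
bel_of_union = [0.3, 0.8, 1.0]     # Bel({a}), Bel({a,b,c}), Bel({a,b,c,d})

def nec_max_nonspecificity(B):
    """Eq. (9.22): Nec(B) = Bel of the largest cumulative union contained in B."""
    value, union = 0.0, frozenset()
    for block, bel in zip(blocks, bel_of_union):
        union = union | block
        if union <= B:
            value = bel
    return value

X = frozenset("abcd")
for r in range(len(X) + 1):
    for B in map(frozenset, combinations(sorted(X), r)):
        print("".join(sorted(B)) or "{}", nec_max_nonspecificity(B))
# e.g. Nec1({a}) = 0.3, Nec1({a,b,c}) = 0.8, Nec1({a,b,d}) = 0.3, Nec1(X) = 1.0
```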
In order to apply the second criterion operationally, we need a meaningful
way of measuring the closeness of a necessity function to a given belief function. For this purpose, it is reasonable to use the functional D_Bel defined by the
formula
$$D_{Bel}(Nec) = \sum_{A \in P(X)} [Bel(A) - Nec(A)] \qquad (9.23)$$
for each necessity function Nec that is consistent with a given belief function
Bel. We want to minimize D_Bel(Nec) over all necessity functions that are consistent with Bel and satisfy Eq. (9.19) or, according to Theorem 9.1, satisfy Eq.
(9.21).
Table 9.11. Necessity Approximations of a Belief Function (Examples 9.17 and 9.18)

                     Given                     Approximation 1               Approximation 2
A                    Bel(A)  Pl(A)  m(A)      Nec1(A)  Pos1(A)  m1(A)       Nec2(A)  Pos2(A)  m2(A)
∅                    0.0     0.0    0.0       0.0      0.0      0.0         0.0      0.0      0.0
{a}                  0.3     0.6    0.3       0.3      1.0      0.3         0.3      1.0      0.3
{b}                  0.0     0.4    0.0       0.0      0.7      0.0         0.0      0.7      0.0
{c}                  0.1     0.4    0.1       0.0      0.7      0.0         0.0      0.6      0.0
{d}                  0.0     0.2    0.0       0.0      0.2      0.0         0.0      0.2      0.0
{a, b}               0.4     0.9    0.1       0.3      1.0      0.0         0.4      1.0      0.1
{a, c}               0.4     1.0    0.0       0.3      1.0      0.0         0.3      1.0      0.0
{a, d}               0.5     0.6    0.2       0.3      1.0      0.0         0.3      1.0      0.0
{b, c}               0.4     0.5    0.3       0.0      0.7      0.0         0.0      0.7      0.0
{b, d}               0.0     0.6    0.0       0.0      0.7      0.0         0.0      0.7      0.0
{c, d}               0.1     0.6    0.0       0.0      0.7      0.0         0.0      0.6      0.0
{a, b, c}            0.8     1.0    0.0       0.8      1.0      0.5         0.8      1.0      0.4
{a, b, d}            0.6     0.9    0.0       0.3      1.0      0.0         0.4      1.0      0.0
{a, c, d}            0.6     1.0    0.0       0.3      1.0      0.0         0.3      1.0      0.0
{b, c, d}            0.4     0.7    0.0       0.0      0.7      0.0         0.0      0.7      0.0
X = {a, b, c, d}     1.0     1.0    0.0       1.0      1.0      0.2         1.0      1.0      0.2
Observe that Eq. (9.23) may be rewritten as
$$D_{Bel}(Nec) = \sum_{A \in P(X)} Bel(A) - \sum_{A \in P(X)} Nec(A).$$
Since $\sum_{A \in P(X)} Bel(A)$, Nec(∅), and Nec(X) are constants, minimizing D_Bel is
equivalent to maximizing the expression
$$\sum_{A \in P(X) - \{\emptyset, X\}} Nec(A) \qquad (9.24)$$
over all necessity functions Nec that satisfy Eq. (9.21).
EXAMPLE 9.18. Considering again the belief function defined in Table 9.11,
let Nec2 denote a necessity function that satisfies Eq. (9.21) and maximizes the
expression (9.24). To satisfy Eq. (9.21) in this case means that Nec2({a}) = 0.3,
Nec2({a, b, c}) = 0.8, and Nec2(X) = 1. Due to the nested structure of focal
subsets in possibility theory, Nec2({b, c}) = 0 and either Nec2({a, b}) = 0.3 and
Nec2({a, c}) ∈ [0.3, 0.4] or Nec2({a, b}) ∈ [0.3, 0.4] and Nec2({a, c}) = 0.3. The
maximum of expression (9.24) is clearly obtained for either Nec2({a, c}) = 0.4
and Nec2({a, b}) = 0.3 or Nec2({a, b}) = 0.4 and Nec2({a, c}) = 0.3. There are thus
two minima of D_Bel in this case; the second one is shown in Table 9.11.
9.5.4. Transformations Between λ-Measures and Possibility Measures

A common property of λ-measures and possibility measures is that they are
fully represented by their values on singletons. It is thus reasonable to consider uncertainty-invariant transformations between λ-measures and possibility measures by connecting the associated values on singletons via scales of
some type.

Considering a universal set X with n elements, let
$$r = \langle r_i \mid i \in \mathbb{N}_n \rangle$$
denote an ordered possibility profile on X with ri ≥ ri+1 for all i ∈ ℕn−1, and let
$$^{\lambda}m = \langle\, ^{\lambda}m_i \mid i \in \mathbb{N}_n \rangle$$
denote the associated profile of a λ-measure on X with λ < 0 (i.e., the
λ-measure represents an upper probability function) and λmi ≥ λmi+1 for all
i ∈ ℕn−1. Then, components of r and λm can be connected by scales of some
type.

As an illustration of these transformations, let us examine transformations
from λm to r under log-interval scales and under the requirement
$$GS(^{\lambda}m) = GH(r). \qquad (9.25)$$
In this framework,

r_i = (λm_i / λm_1)^a    (9.26)

for all i ∈ ℕ_n. These equations are derived in a way similar to Eq. (9.12) for transformations from probabilities to possibilities. Equation (9.25) assumes the form

GS(λm) = Σ_{i=2}^{n} (λm_i / λm_1)^a log₂ [i/(i − 1)],    (9.27)

where GS(λm) and λm_i for all i ∈ ℕ_n are given and the scaling parameter a is to be determined. Once the value of a is determined, the components of r are calculated by Eq. (9.26).
EXAMPLE 9.19. Let λm = ⟨0.576, 0.35, 0.32, 0.1⟩. Then λ = −0.625 and the λ-measure, which represents an upper probability function, is fully determined by the λ-rule. For this λ-measure, we obtain S̄(λm) = 1.890, GH(λm) = 0.329, and GS(λm) = 1.561 (Exercise 9.19). Equation (9.27) has the form

1.561 = (0.35/0.576)^a log₂ 2 + (0.32/0.576)^a log₂ (3/2) + (0.1/0.576)^a log₂ (4/3).

The solution is a = 0.333. Substituting this value into Eq. (9.26), we readily obtain

r = ⟨1, 0.847, 0.822, 0.558⟩.

Clearly, S̄(r) = 2, GH(r) = 1.560 (the small difference from GS(λm) at the third decimal place is due to rounding errors), and GS(r) = 0.440.
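Since the right-hand side of Eq. (9.27) decreases monotonically in the scaling parameter a, the equation can be solved by any standard root-finding method. The sketch below is our own illustration (the bisection routine and its names are not an algorithm given in the text); it recovers the transformation of Example 9.19 up to rounding.

```python
from math import log2

def uncertainty_preserving_alpha(lam_m, gs_target, lo=1e-6, hi=10.0, tol=1e-9):
    """Solve Eq. (9.27) for the scaling parameter a by bisection.

    lam_m is the ordered profile of the lambda-measure on singletons;
    gs_target is the required value GS(lam_m)."""
    ratios = [mi / lam_m[0] for mi in lam_m]

    def gh(a):
        # Right-hand side of Eq. (9.27): GH of the transformed possibility profile
        return sum(ratios[i] ** a * log2((i + 1) / i) for i in range(1, len(lam_m)))

    # gh(a) is monotonically decreasing in a, so bisect on that behaviour
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gh(mid) > gs_target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam_m = (0.576, 0.35, 0.32, 0.1)        # profile from Example 9.19
a = uncertainty_preserving_alpha(lam_m, 1.561)
r = [round((mi / lam_m[0]) ** a, 3) for mi in lam_m]
print(round(a, 3), r)   # approximately 0.333 and [1.0, 0.847, 0.822, 0.558]
```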
The associated pairs of λ-measures and possibility measures, whose profiles are connected via log-interval scales and satisfy Eq. (9.25), may be viewed as complementary representations of uncertainty. While the λ-measure in each pair represents uncertainty primarily in terms of GS, the associated possibility measure represents it primarily in terms of GH.
The inverse transformations, from possibility measures to λ-measures, are more difficult, primarily due to the difficulty of formulating Eq. (9.25). They require further research.
9.5.5. Approximations of Graded Possibilities by Crisp Possibilities
Interesting applications of the principle of uncertainty invariance are approximations of graded possibilities by crisp possibilities. These approximations are
especially useful when graded possibilities are interpreted in terms of fuzzy
sets. Approximating fuzzy sets that result from approximate reasoning by crisp
sets is often desirable, since the latter are easier to comprehend. A crisp
approximation of a fuzzy set also can be utilized as an intermediary step in its
defuzzification.
Recall that each basic function r of graded possibilities on X is associated with a family of basic functions of crisp possibilities, {ᵃr | a ∈ (0, 1]}, where

ᵃr(x) = 1 when r(x) ≥ a, and ᵃr(x) = 0 otherwise,

for all x ∈ X. In general, any function in this family may be taken as a crisp approximation of r. However, according to the principle of uncertainty invariance, we should take the one for which the amount of nonspecificity is the same as for r. When X is finite, this requirement is expressed by the equation

∫₀¹ log₂ |ᵃr| da = log₂ |ᵃr|,    (9.28)

where |ᵃr| denotes the number of possible alternatives according to function ᵃr. When X is infinite, the requirement is expressed by the alternative equation

∫₀¹ UL(ᵃr) da = UL(ᵃr),    (9.29)

where UL is defined by Eq. (6.26).
EXAMPLE 9.20. To illustrate the application of Eq. (9.28), consider the function r defined in Figure 6.1, whose nonspecificity is 2.99 (according to the calculation in Example 6.1). Hence, Eq. (9.28) has the form

2.99 = log₂ |ᵃr|

and we need to solve it for ᵃr. Values of log₂|ᵃr| for all values of a that are distinguished in this example are shown in Table 9.12. We can see that none of the values is exactly equal to 2.99 (due to the discreteness of function r) and there are two of them that are equally close to 2.99 (for a = 0.6 and 0.7), which are indicated in Table 9.12 in boldface.
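The finite case of Eq. (9.28) is easy to mechanize. The sketch below is our own illustration: it uses a possibility profile that is merely consistent with Table 9.12 (the actual function r of Figure 6.1 is not reproduced here), computes the nonspecificity of r as the α-cut integral, and then ranks the distinguished levels a by how closely log₂|ᵃr| matches it.

```python
from math import log2

# Multiset of possibility degrees consistent with Table 9.12
# (3 elements at 1.0, 2 at 0.9, 2 at 0.7, 2 at 0.6, 2 at 0.4, 1 at 0.3, 3 at 0.1)
r = [1.0]*3 + [0.9]*2 + [0.7]*2 + [0.6]*2 + [0.4]*2 + [0.3]*1 + [0.1]*3

def cut_size(r, a):
    # |a_r|: number of alternatives whose possibility is at least a
    return sum(1 for ri in r if ri >= a)

def u_uncertainty(r):
    # Left-hand side of Eq. (9.28): integral of log2|a_r| over a in (0, 1]
    levels = sorted(set(r) | {0.0})
    return sum((hi - lo) * log2(cut_size(r, hi))
               for lo, hi in zip(levels, levels[1:]))

target = u_uncertainty(r)   # about 2.99
# Rank the distinguished levels a by how closely log2|a_r| matches the target
best = sorted(set(r), key=lambda a: abs(log2(cut_size(r, a)) - target))
print(round(target, 2), best[:2])   # 2.99 and the levels 0.6 and 0.7
```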
EXAMPLE 9.21. To illustrate the application of Eq. (9.29), first consider the
case of X = ℝ. Let
Table 9.12. Illustration to Example 9.20

  a      |ᵃr|    log₂|ᵃr|
 0.1      15       3.91
 0.3      12       3.59
 0.4      11       3.46
 0.6       9       3.17
 0.7       7       2.81
 0.9       5       2.32
 1.0       3       1.58
r(x) = x²              when x ∈ [0, 1)
     = 1               when x ∈ [1, 2)
     = [2 − (x/2)]²    when x ∈ [2, 4)
     = 0               otherwise

be a given possibility profile whose graph is shown in Figure 9.11.

Figure 9.11. Possibility profile in Example 9.21. [Graph not reproduced.]

Then,

ᵃr = [√a, 4 − 2√a]

and

∫₀¹ UL(ᵃr) da = ∫₀¹ log₂(5 − 3√a) da = 1.546.
Hence, Eq. (9.29) has the form
1.546 = log₂(5 − 3√a),
and its solution is a = 0.481. The crisp approximation of r is thus the closed
interval
[√0.481, 4 − 2√0.481] = [0.694, 2.613].
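Eq. (9.29) can be checked numerically in the same spirit. The following sketch is our own illustration (a simple midpoint rule and bisection, not a method given in the text); it reproduces the integral value 1.546 and the solution a ≈ 0.481 of Example 9.21.

```python
from math import log2, sqrt

def ul(a):
    # UL(a_r) = log2(1 + length of the a-cut [sqrt(a), 4 - 2*sqrt(a)])
    return log2(1 + (4 - 2 * sqrt(a)) - sqrt(a))

# Integrate UL(a_r) over a in (0, 1] with the midpoint rule
n = 100000
target = sum(ul((i + 0.5) / n) for i in range(n)) / n   # about 1.546

# Solve Eq. (9.29), UL(a_r) = target, by bisection (UL decreases in a)
lo, hi = 0.0, 1.0
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if ul(mid) > target else (lo, mid)
a = (lo + hi) / 2
print(round(target, 3), round(a, 3), [round(sqrt(a), 3), round(4 - 2*sqrt(a), 3)])
# roughly 1.546, 0.481, and the interval [0.694, 2.613]
```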
EXAMPLE 9.22. This example illustrates the application of Eq. (9.29) when r is defined on ℝ². Let

r(x, y) = max{0, 1 − 2(x² + y²)}

be a given 2-dimensional possibility profile whose graph is shown in Figure 9.12a. Clearly, ᵃr defines points in a circle whose radius depends on a. Projections of r are the functions

rX(x) = max{0, 1 − 2x²}
rY(y) = max{0, 1 − 2y²}.

Their graphs, which are identical, are shown in Figure 9.12b. From rX (as well as rY), we can readily conclude that for each a ∈ (0, 1] the radius of the circle representing points of ᵃr is √((1 − a)/2). We can also infer that

ᵃrX = ᵃrY = [−√((1 − a)/2), √((1 − a)/2)].

Using all these facts, we have

UL(ᵃr) = log₂[1 + 4√((1 − a)/2) + π(1 − a)/2]

and

∫₀¹ UL(ᵃr) da = 1.796.

Hence, Eq. (9.29) assumes the form

1.796 = log₂[1 + 4√((1 − a)/2) + π(1 − a)/2],
Figure 9.12. Illustration to Example 9.22. [Graphs (a) and (b) not reproduced.]
and its solution is a = 0.585. The crisp approximation of r is thus obtained for a = 0.585. This is the set of all points in the circle whose radius is √((1 − 0.585)/2) = 0.456 and whose center is at the origin of the coordinate system.
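The two-dimensional computation can be verified in exactly the same way as in the previous example; only the expression for UL(ᵃr) changes. The numerical scheme below is again our own illustration, not part of the text.

```python
from math import log2, sqrt, pi

def ul(a):
    # a-cut of r(x, y) = max{0, 1 - 2(x^2 + y^2)} is a disc of radius sqrt((1 - a)/2);
    # UL adds 1, the total width of the two projections (4*rho), and the disc area
    rho = sqrt((1 - a) / 2)
    return log2(1 + 4 * rho + pi * rho ** 2)

n = 100000
target = sum(ul((i + 0.5) / n) for i in range(n)) / n   # about 1.796

lo, hi = 0.0, 1.0                                       # UL decreases in a
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if ul(mid) > target else (lo, mid)
a = (lo + hi) / 2
print(round(target, 3), round(a, 3), round(sqrt((1 - a) / 2), 3))
# roughly 1.796, 0.585, radius 0.456
```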
NOTES
9.1. Thus far, the principle of minimum uncertainty has been employed predominantly
within the domain of probability theory, where the function to be minimized is the
Shannon entropy (usually some of its conditional forms) or another function based
on it (information transmission, directed divergence, etc.). The great utility of this
principle in dealing with a broad spectrum of problems is perhaps best demonstrated by the work of Christensen [1980–81, 1985, 1986]. Another important user
of the principle of minimum entropy is Watanabe [1981, 1985], who has repeatedly
argued that entropy minimization is a fundamental methodological tool in the
problem area of pattern recognition. Outside probability theory, the principle of
minimum uncertainty has been explored in reconstructability analysis of possibilistic systems [Cavallo and Klir, 1982b; Klir, 1985, 1990b; Klir et al., 1986; Klir
and Way, 1985; Mariano, 1997]; the function that is minimized in these explorations
is the U-uncertainty or an appropriate function based on it [Higashi and Klir,
1983b]. The use of the principle of minimum uncertainty for resolving local inconsistencies in systems was investigated for probabilistic and possibilistic systems by
Mariano [1985, 1987].
9.2. The principle of maximum entropy was founded, presumably, by Jaynes in the early
1950s [Rosenkrantz, 1983]. Perhaps the greatest skill in using this principle in a
broad spectrum of applications has been demonstrated by Christensen [1980–81,
1985, 1986], Jaynes [1979, 2003], Kapur [1989, 1994, 1994/1996], and Tribus [1969].
The literature concerned with this principle is extensive. The following are a few
relevant books of special significance: Batten [1983], Buck and Macaulay [1991],
Kapur and Kesavan [1987, 1992], Karmeshu [2003], Levine and Tribus [1979], Theil
[1967], Theil and Fiebig [1984], Webber [1979], Wilson [1970]. A rich literature
resource regarding research on the principle of maximum entropy, both basic
and applied, are edited volumes Annual Workshops on Maximum Entropy and
Bayesian Methods that have been published since the 1980s by several publishers,
among them Kluwer, Cambridge University Press, and Reidel.
9.3. The principle of maximum entropy has been justified by at least three distinct
arguments:
1. The maximum entropy probability distribution is the only unbiased distribution, that is, the distribution that takes into account all available information
but no additional (unsupported) information (bias). This follows directly from
the facts that
a. All available information (but nothing else) is required to form the constraints of the optimization problem, and
b. The chosen probability distribution is required to be the one that represents
the maximum uncertainty (entropy) within the constrained set of probability distributions. Indeed, any reduction of uncertainty is an equal gain of
information. Hence, a reduction of uncertainty from its maximum value,
which would occur when any distribution other than the one with maximum
entropy were chosen, would mean that some information from outside the
available evidence was unwittingly added.
This argument of justifying the maximum entropy principle is covered in the
literature quite extensively. Perhaps its best and most thorough presentation
is given in a paper by Jaynes [1979], which also contains an excellent historical survey of related developments in probability theory, and in a book by
Christensen [1980–81, Vol. 1].
2. It was shown by Jaynes [1968], strictly on combinatorial grounds, that the
maximum entropy probability distribution is the most likely distribution in any given
situation.
3. It was demonstrated by Shore and Johnson [1980] that the principle of
maximum entropy can be deductively derived from the following consistency
axioms for inductive (or ampliative) reasoning:
Axiom (ME1) Uniqueness. The result should be unique.
Axiom (ME2) Invariance. The choice of coordinate system (permutation of variables) should not matter.
Axiom (ME3) System Independence. It should not matter whether one accounts
for independent systems separately in terms of marginal probabilities or together
in terms of joint probabilities.
Axiom (ME4) Subset Independence. It should not matter whether one treats an
independent subset of system states in terms of separate conditional probabilities
or together in terms of joint probabilities.
The rationale for choosing these axioms is expressed by Shore and Johnson
[1980] as follows: Any acceptable method of inference must be such that different
ways of using it to take the same information into account lead to consistent
results. Using the axioms, they derive the following proposition: Given some information in terms of constraints regarding the probabilities to be estimated, there
is only one probability distribution satisfying the constraints that can be chosen
by a method that satisfies the consistency axioms; this unique distribution can be
attained by maximizing the Shannon entropy (or any other function that has
exactly the same maxima as the entropy) subject to the given constraints. Alternative derivations of the principle of maximum entropy were demonstrated by
Smith [1974], Avgers [1983], and Paris and Vencovská [1990].
The principle of minimum cross-entropy can be justified by similar arguments.
In fact, Shore and Johnson [1981] derived both principles and showed that the
principle of maximum entropy is a special case of the principle of minimum crossentropy. The latter principle is further examined by Williams [1980], who shows
that it generalizes the well-known Bayesian rule of conditionalization. A broader
range of applications of the principle of minimum cross-entropy was developed
by Rubinstein and Kroese [2004].
In general, the principles of maximum entropy and minimum cross-entropy as
well as the various principles of uncertainty that extend beyond probability theory
are tools for dealing with a broad class of problems referred to as inverse problems or ill-posed problems [McLaughlin, 1984; Tarantola, 1987]. A common characteristic of these problems is that they are underdetermined and, consequently,
do not have unique solutions. The various maximum uncertainty principles allow
us to obtain unique solutions to underdetermined problems by injecting uncertainty of some type into each solution to reflect the lack of information in the formulation of the problem.
As argued by Bordley [1983], the assumption that “the behavior of nature can
be described in such a way as to be consistent” (so-called consistency premise) is
essential for science to be possible. This assumption, when formulated in a particular context in terms of appropriate consistency axioms, leads to an optimization
principle. According to Bordley, the various optimization principles, which are
exemplified by the principles of maximum entropy and minimum cross-entropy,
are central to science.
9.4. The principle of maximum nonspecificity was first conceived by Dubois and Prade
[1987b], who also demonstrated its significance in approximate reasoning based
on possibility theory [Dubois and Prade, 1991]. Applications of the principle were
developed in the area of image processing by Wierman [1994] and in the area of
fuzzy control by Padet [1996].
9.5. The principle of uncertainty invariance was introduced in the context of probability–possibility transformations by Klir [1989b, 1990a]. These transformations
were mathematically investigated by Geer and Klir [1992] and Harmanec and
Klir [1997] for the discrete case, and by Wonneberger [1994] for the continuous
case of n dimensions. It follows from the work of Geer and Klir [1992] that the
existence of transformations based on the principle of uncertainty invariance
expressed by Eq. (9.11) is not guaranteed under interval scales, but it is guaranteed under log-interval scales. Probability–possibility transformations were also
investigated experimentally (by a series of simulation experiments) and compared
with other probability–possibility transformations in terms of several criteria [Klir
and Parviz, 1992]. Approximations of belief measures by necessity measures by
preserving S̄ were studied by Harmanec and Klir [1997].
9.6. The material presented in Section 9.5.3 is based on a paper by Harmanec and Klir
[1997]. Many details presented in the paper are omitted in Section 9.5.3, in particular proofs of theorems and an efficient algorithm to facilitate relevant
computations.
9.7. One methodological area that exemplifies the utility of the principles of uncertainty discussed in this chapter is known in the literature as reconstructability
analysis (RA). The purpose of RA is to deal in information–theoretic terms with
two broad classes of problems that are associated with the relationship between
overall systems and their various subsystems: identification problems and reconstruction problems. In identification problems, a set of subsystems is given and the
aim is to make meaningful inference about the associated overall system from
information in the subsystems and, possibly, some additional background information. This may involve the use of the principle of minimum uncertainty to
resolve local inconsistencies among the given subsystems, the use of the principle
of maximum uncertainty to identify one particular overall system that is formalized within the same theory as the subsystems, or the use of the principle of requisite generalization by identifying the family of all overall systems that are
consistent with the given information, and moving thus to a more general theory
that is capable of representing this family. In reconstruction problems, an overall
system is given and the aim is to investigate, in a systematic way, which sets of sub-
systems at each complexity level preserve most of the information contained in
the overall system. This requires determining how accurately the overall system
can be reconstructed from each considered set of subsystems.
Reconstruction problems and identification problems of RA were first recognized in terms of n-dimensional relations (i.e., within the classical possibility
theory) by Ashby [1964] and Madden and Ashby [1972], respectively. They were
further investigated within probability theory by Klir [1976, 1985, 1986a,b, 1990b],
Broekstra [1976 –77], Cavallo and Klir [1979, 1981, 1982a], Jones [1982, 1985, 1986],
and Conant [1988], and within the theory of graded possibilities by Cavallo and
Klir [1982b], Higashi et al. [1984], and Klir et al. [1986].
The few references on RA given in the previous paragraph cover only some of
the early, historically significant contributions to RA. The literature on RA is now
too extensive to be covered here more completely. However, RA is a broad and
important application area of GIT and, therefore, it is appropriate to refer at least
to key sources of further information on RA:
• Overview articles of RA: [Klir and Way, 1985; Pittarelli, 1990; Zwick, 2004].
• Special Issues on RA:
—International Journal of General Systems, 7(1), 1981, pp. 1–107;
—International Journal of General Systems, 29(3), 2000, pp. 357–495;
—Kybernetes, 33(5–6), 2004, pp. 873–1062.
• Bibliographies of RA: International Journal of General Systems, 7(1), 1981;
24(1–2), 1996.
EXERCISES
9.1. Consider the relation defined in Table 2.2 and assume that variables x1,
x2, x3 are input variables and variable x4 is an output variable. Using the
principle of minimum uncertainty, determine which of the three input
variables can be excluded to lose the least amount of information.
9.2. Consider the probabilistic systems in Table 3.3 and assume in each case
that x and y are input variables and z is an output variable. Using the
principle of minimum uncertainty, determine which of the two input
variables is preferable to exclude in each case.
9.3. Consider the joint possibility profile in Figure 5.5a and assume that X
and Y are states of an input variable and an output variable, respectively.
Using the principle of minimum uncertainty, determine the best quantization of the input variable with:
(a) Four quantized states;
(b) Three quantized states.
9.4. Let a finite set of states contain n states, where n ≥ 2. How many meaningful quantizations exist for each n = 2, 3, . . . , 10, provided that:
(a) The states are totally ordered;
(b) The states are not ordered.
9.5. Let sets X and Y in Table 9.13 denote, respectively, state sets of an input
variable x and an output variable y. The dependence of the output variable on the input variable is expressed in Table 9.13a, b, c in terms of
crisp possibilities, graded possibilities, and probabilities, respectively.
Using the principle of minimum uncertainty, determine in each case the
set of all admissible simplifications obtained by quantizing states of the
input variable, provided that
(a) The states are totally ordered;
(b) The states are not ordered.
9.6. Using the principle of minimum uncertainty, resolve the local inconsistency of the two probabilistic subsystems defined in Table 9.14 that form
an overall system with variables x, y, z.
9.7. Consider a finite random variable x that takes values in the set X. It is
known that the mean (expected) value of the variable is E(x). Estimate,
using the maximum entropy principle, the probability distribution on X
provided that:
(a) X = ℕ₂ and E(x) = 1.7 (a biased coin if 1 and 2 represent, for example,
the head and the tail, respectively);
(b) X = ℕ₆ and E(x) = 3 (a biased die);
(c) X = ℕ₁₀ and E(x) = 4;
(d) X = ℕ₁₀ and E(x) = 6.
Table 9.13. Illustration to Exercise 9.5

(a) Crisp possibilities r(x, y):
          y = 0   y = 1   y = 2
 x = 0      1       0       1
 x = 1      1       0       0
 x = 2      1       1       0
 x = 3      0       0       1

(b) Graded possibilities r(x, y):
          y = 0   y = 1   y = 2
 x = 0     0.8     0.2     1.0
 x = 1     1.0     0.4     0.2
 x = 2     1.0     0.8     0.3
 x = 3     0.0     0.5     1.0

(c) Probabilities p(x, y):
          y = 0   y = 1   y = 2
 x = 0     0.02    0.05    0.10
 x = 1     0.30    0.10    0.20
 x = 2     0.01    0.01    0.01
 x = 3     0.00    0.00    0.20
Table 9.14. Illustration to Exercise 9.6

 x   y   p(x, y)        y   z   p(y, z)
 0   0    0.4           0   0    0.3
 0   1    0.3           0   1    0.2
 1   0    0.2           1   0    0.4
 1   1    0.1           1   1    0.1
9.8. Use the maximum entropy principle to derive a formula for the joint
probability distribution function, p, under the assumption that only marginal probability distributions, pX and pY, on finite sets X and Y are
known.
9.9. Consider a universal set X, four subsets of which are of interest to us: A ∩ B, A ∩ C, B ∩ C, and A ∩ B ∩ C. The only evidence we have is expressed in terms of DST by the equations
m(A ∩ B) + m(A ∩ B ∩ C) = 0.2
m(A ∩ C) + m(A ∩ B ∩ C) = 0.5
m(B ∩ C) + m(A ∩ B ∩ C) = 0.1
Using the maximum nonspecificity principle, estimate the values of m(A ∩ B), m(A ∩ C), m(B ∩ C), m(A ∩ B ∩ C), and m(X).
9.10. Repeat Example 9.8 with the following numerical values:
(a) x1 = 0.5; x2 = 0.2; y1 = 0.6; y2 = 0.14
(b) x1 = 0.6; x2 = 0.7; y1 = 0.4; y2 = 0.84
(c) x1 = 0.9; x2 = 0.8; y1 = 0.6; y2 = 0.96
9.11. Construct the cylindric closure in Example 9.9 by using the operation of
relational join defined by Eq. (7.34).
9.12. Show that the function m defined in Table 9.9 is a monotone and superadditive measure, but it is not 2-monotone. Identify all its violations of
2-monotonicity.
9.13. For each of the following probability distributions, p, determine the corresponding possibility profile, r, by the transformations summarized in
Figure 9.8:
(a) p = ⟨0.5, 0.3, 0.2⟩
(b) p = ⟨0.3, 0.2, 0.2, 0.1, 0.1, 0.1⟩
(c) p = ⟨0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.03, 0.03, 0.02, 0.02⟩
(d) p = ⟨0.2, 0.2, 0.2, 0.2, 0.2⟩
9.14. For each of the following possibility profiles, r, determine the corresponding probability distribution, p, by the transformations summarized
in Figure 9.8:
(a) r = ⟨1, 0.8, 0.4⟩
(b) r = ⟨1, 1, 0.6, 0.6, 0.2, 0.2, 0.2⟩
(c) r = ⟨1, 0.9, 0.7, 0.65, 0.6, 0.5, 0.35, 0.2⟩
(d) r = ⟨1, 1, 1, 1, 1⟩
9.15. For some cases in Exercises 9.13 and 9.14, show the inverse transformation, which should result in the same tuple from which you started.
9.16. Show that the maximum of the probability–possibility consistency index
c defined by Eq. (9.18) is obtained for:
(a) a = 0.908 in Example 9.15
(b) b2 = b3 = 0.964 in Example 9.16
9.17. Determine which of the possibility profiles in Examples 9.15 and 9.16
are based on log-interval scale.
9.18. Apply the principle of uncertainty invariance expressed by Eq. (9.11) to
transform each of the following probability distributions to the corresponding possibility profile via ordinal scales:
(a) p = ⟨0.5, 0.3, 0.2⟩
(b) p = ⟨0.3, 0.2, 0.2, 0.1, 0.1, 0.1⟩
(c) p = ⟨0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05⟩
(d) p = ⟨0.4, 0.3, 0.2, 0.1⟩
9.19. Determine the λ-measure λm in Example 9.19 and calculate the values
of S̄(λm), GH(λm), and GS(λm).
9.20. Using the principle of uncertainty invariance, determine the crisp
approximation of the following basic possibility functions r defined on ℝ:
(a) r(x) = 1.25 max{0, 2^(−|3(x−3)|) − 0.2}
(b) r(x) = max{0, min{1, 2.3x − x²}}
(c) r(x) = max{0, 0.2(x − 50) − 0.1(x − 50)²}
(d) r(x) = 2x − x²
(e) r(x) = x/2 when x ∈ [0, 2); 3 − x when x ∈ [2, 2.5); 0.5 when x ∈ [2.5, 4); (5 − x)/2 when x ∈ [4, 5); 0 otherwise
9.21. Using the principle of uncertainty invariance, determine the crisp
approximation of the following basic possibility functions r defined on ℝ²:
(a) r(x, y) = e^(−x² − y²)
(b) r(x, y) = max{0, min{1, 1.5 − x² − y²}}
10
CONCLUSIONS
To be uncertain is uncomfortable,
but to be certain is ridiculous.
—Chinese Proverb
10.1. SUMMARY AND ASSESSMENT OF RESULTS IN GENERALIZED
INFORMATION THEORY
A turning point in our understanding of the concept of uncertainty was
reached when it became clear that there are several types of uncertainty. This
new insight was obtained by examining uncertainty emerging from mathematical theories more general than classical set theory and classical measure
theory. Recognizing more than one type of uncertainty raised numerous questions regarding the nature and scope of the very concept of uncertainty and
its connection with the concept of information. Studying these questions
evolved eventually into a new area of research, which became known in the
early 1990s as generalized information theory (GIT).
The principal aims of GIT have been threefold: (1) to liberate the notions
of uncertainty and uncertainty-based information from the narrow confines of
classical set theory and classical measure theory; (2) to conceptualize a broad
framework within which any conceivable type of uncertainty can be characterized; and (3) to develop theories for the various types of uncertainty that
emerge from the framework at each of the four levels: formalization, calculus,
measurement, and methodology. Undoubtedly, these are long-term aims, which
may perhaps never be fully achieved. Nevertheless, they serve well as a
blueprint for a challenging, large-scale research program. Significant results
emerging from this research program can readily be identified by scanning
material covered in this book, especially in Chapters 4–6, 8, and 9. They include:
(a) Calculi of several nonclassical theories of uncertainty are now well
developed. These are theories based on generalized (graded) possibility measures, Sugeno λ-measures, Choquet capacities of order ∞, and
reachable interval-valued probability distributions.
(b) The nonclassical theories listed in part (a) have also been viewed and
studied as special theories subsumed under a more general theory of
imprecise probabilities. From this point of view, some common properties of the theories are recognized and utilized. They all are based on
a pair of dual measures—lower and upper probabilities—but may also
be represented in terms of closed and convex sets of probability distributions or functions obtained by the Möbius transform of lower or
upper probabilities. Numerous global results, not necessarily restricted
to the special theories listed in part (a), have already been obtained for
imprecise probabilities.
(c) The Hartley and Shannon functionals for measuring the amount of
uncertainty in classical theories of uncertainty have been adequately
generalized not only to the special theories listed in part (a) but also
to other theories dealing with imprecise probabilities.
(d) Only some limited efforts have been made thus far to fuzzify the
various uncertainty theories. They include a fuzzification of classical
probabilities to fuzzy events, a fuzzification of the theory based on
reachable interval-valued probability distributions, several distinct
fuzzifications of the Dempster–Shafer theory, and the fuzzy-set interpretation of the theory of graded possibilities.
(e) Some limited results have been obtained for formulating and using the
principles of minimum and maximum uncertainty within the various
nonclassical uncertainty theories. Two new principles emerged from
GIT: the principle of requisite generalization and the principle of
uncertainty invariance. Some of their applications are examined in
Sections 9.4 and 9.5, respectively.
Among the results summarized in the previous paragraphs, those listed
under parts (a)–(c) are the most significant and satisfactory. That is, we have
now several well-developed nonclassical uncertainty theories, which, in addition, are integrated into a more general theory of imprecise probabilities. Furthermore, we have well-justified generalizations of the classical Hartley and
Shannon functionals for measuring amounts of uncertainty of the two types
that coexist in each of the nonclassical theories: nonspecificity and conflict.
On the other hand, the results listed under parts (d) and (e) are considerably less satisfactory. Thus far, fuzzifications of uncertainty theories have been
very limited. They were proposed in an ad hoc fashion in most cases and involved
only standard fuzzy sets. Similarly underdeveloped are the various principles
of uncertainty that are examined in Chapter 9. These principles are dependent
on the capability of measuring uncertainty. Unfortunately, this capability
became available only recently, when the well-justified generalizations of the
Hartley and Shannon functionals were established for the various theories of
imprecise probabilities. (Recall the difficulties in generalizing the Shannon
functional, which are discussed in Chapter 6.)
10.2. MAIN ISSUES OF CURRENT INTEREST
It seems reasonable to expect that research on the various principles of uncertainty, which are briefly examined in Chapter 9, will dominate the area of GIT
in the near future. This research will have to address a broad spectrum of questions. Some of them will be conceptual. For example, which of the measures,
from among GH, GS, S̄, TU = ⟨GH, GS⟩, or aTU = ⟨S̄ − S, S⟩, should be used
when one of the principles is applied to problems of a given type? Other questions will undoubtedly involve computational issues. For example, can an existing computational method be adapted for applications of a given principle to
problems of a given type? In the case that the answer is negative, a substantial research effort will be needed to develop a fitting algorithm. Since it is not
likely that many algorithms will be found that could be easily adapted for
problems in this area, it is reasonable to expect that work on designing
new algorithms will dominate research on uncertainty principles in the near
future.
Since there is growing interest in utilizing linguistic information, we can
expect some efforts in the near future to fuzzify the existing uncertainty theories in a more systematic way. It is not likely, however, that these efforts will
involve nonstandard fuzzy sets.
There are also some theoretical issues of current interest. One of them is
the issue of uniqueness of functional S̄ as an aggregate measure of the respective types of uncertainty in all uncertainty theories. Although the uniqueness
of S̄ is still an open question, some progress has been made toward establishing it, at least in the Dempster–Shafer theory (DST). It was proved that the
functional S̄ is the smallest one in the DST to measure aggregate uncertainty.
Its uniqueness can thus be proved by showing that it is also the largest functional to measure aggregate uncertainty in the DST.
An important area of theoretical research in GIT in the near future will
undoubtedly be a comprehensive study of possible disaggregations of S̄. One
particular issue in this area is clarifying the relationship of the two versions of
disaggregated total uncertainty introduced in Chapter 6, TU and aTU, and to
identify their application domains.
It also remains to study the range of applicability of Algorithm 6.1 for computing S̄. Thus far, the algorithm has been proved correct within DST. Its
applicability outside DST, while plausible, has yet to be formally established.
We also need to derive more algebraic properties of S̄ from the algorithm, in
addition to those employed in proving that S̄ ≥ GH in Appendix D.
Of course, there are many other issues regarding the development of efficient algorithms for GIT. We need, for example, efficient algorithms for computing S, for converting credal sets to lower or upper probabilities and vice
versa, and for computing Möbius representations of given lower and upper
probabilities.
10.3. LONG-TERM RESEARCH AREAS
A respectable number of nonclassical uncertainty theories have already been
developed, as surveyed in this book. However, this is only a tiny fraction of
prospective theories that are of interest in GIT. Each of these theories is based
on some generalization of classical measures, or some generalization of classical sets, or some generalization of both. It is clear that most of these theories are yet to be developed, and this undoubtedly will be a long-term area of
research in GIT.
Further explorations of the 2-dimensional GIT territory can be pursued by
focusing either on some previously unexplored types of generalized measures
or on some previously unexplored types of generalized sets. Examples of the
former are Choquet capacities of various finite orders, decomposable measures, and k-additive measures. The coherent family of Choquet capacities
seems to be especially important for facilitating the new principle of requisite
generalization introduced in Section 9.4. Decomposable measures and kadditive measures, on the other hand, are eminently suited to be approximators of more complex measures.
Focusing on unexplored types of generalized sets is a huge undertaking. It
involves all the nonstandard types of fuzzy sets. This direction is perhaps the
prime area of long-term research in GIT. Among the many challenges of this
research area is the use of fuzzified measures (i.e., measures defined on fuzzy
sets) for fuzzifying existing uncertainty theories or for developing new ones.
Research in this area will undoubtedly be crucial for developing a perceptionbased theory of probabilistic reasoning.
The route of exploring the area of GIT will undoubtedly be guided, at least
to some extent, by application needs. Some of the developed theories will
survive and some, with questionable utility, will likely sink into obscurity.
Finally, it should be emphasized that the broad framework introduced in
this book for formalizing uncertainty (and uncertainty-based information) still
may not be sufficiently general to capture all conceivable types of uncertainty.
It is not clear, for example, whether the information-gap conception of uncertainty, which has been developed by Ben-Haim [2001], can be formalized
within this framework. This is an open research question. If the answer turns
out to be negative, then, clearly, the framework will have to be appropriately
extended to conform to the goal of the research program of GIT.
10.4. SIGNIFICANCE OF GIT
GIT is an outcome of two generalizations in mathematics. In one of them, classical measures are generalized by abandoning the requirement of additivity;
in the other one, classical sets are generalized by abandoning the requirement
of sharp boundaries between sets. Generalizing mathematical theories has
been a visible trend in mathematics since about the middle of the 20th century,
and the two generalizations of interest in this book embody this trend well.
Other examples include generalizations from ordinary geometry (Euclidean
as well as non-Euclidean) to fractal geometry; from ordinary automata to cellular automata; from regular languages to developmental languages; from
precise analysis to interval analysis; from graphs to hypergraphs; and many
others.
Each generalization of a mathematical theory usually results in a conceptually simpler theory. This is a consequence of the fact that some properties
of the former theory are not required in the latter. At the same time, the more
general theory always has a greater expressive power, which, however, is
achieved only at the cost of greater computational demands. This explains why
these generalizations are closely connected with the emergence of computer
technology and steady increases in computing power. By generalizing mathematical theories, we not only enrich our insights but, together with computer
technology, also extend our capabilities for modeling the intricacies of the real
world.
Generalizing classical measures by abandoning the requirement of additivity broadens their applicability. Contrary to classical measures, generalized
measures are capable of formalizing, for example, synergetic or inhibitory
effects manifested by some properties measured on sets, data gathering in the
face of unavoidable measurement errors, or evidence expressed in terms of a
set of probability distributions.
Generalizing classical sets by abandoning sharp boundaries between sets is
an extremely radical idea, at least from the standpoint of contemporary
science. When accepted, one has to give up classical bivalent logic, generally
presumed to be the principal pillar of science. Instead, we obtain a logic in
which propositions are not required to be either true or false, but may be true
or false to different degrees. As a consequence, some laws of bivalent logic do
not hold any more, such as the law of excluded middle or the law of contradiction. At first sight, this seems to be at odds with the very purpose of science.
However, this is not the case. There are at least four reasons why allowing
membership degrees in sets and degrees of truth in propositions enhance scientific methodology considerably.
1. Fuzzy sets and fuzzy logic possess far greater capabilities than their classical counterparts to capture irreducible measurement uncertainties in their
various manifestations. As a consequence, their use considerably improves the
bridge between mathematical models and the associated physical reality. It is
paradoxical that, in the face of the inevitability of measurement errors, fuzzy
data are always more accurate than their crisp counterparts. This greater accuracy is gained by replacing quantization of variables involved with granulation, as is explained in Section 7.7.1.
2. Fuzzy sets and fuzzy logic are powerful tools for managing complexity
and controlling computational cost. This is primarily due to granulation of
systems variables, which is a natural way of making the imprecision of systems
models compatible with tasks for which they are constructed. Not only are
complexity and computational cost substantially reduced by appropriate granulation, but the resulting solutions tend also to be more useful.
3. An important feature of fuzzy set theory is its capability of capturing the
vagueness of linguistic terms in statements expressed in natural languages.
Vagueness of a symbol (a linguistic term) in a given language results from the
existence of objects for which it is intrinsically impossible to decide whether
the symbol does or does not apply to them according to linguistic habits of
some speech community using the language. That is, vagueness is a kind of
uncertainty that does not result from information deficiency, but rather from
imprecise meanings of linguistic terms, which particularly abound in natural
languages. Classical set theory and classical bivalent logic are not capable of
expressing the imprecision in meanings of vague terms. Hence, propositions in
natural language that contain vague terms were traditionally viewed as unscientific. This view is extremely restrictive. As we increasingly recognize, natural
language is often the only way in which meaningful knowledge can be
expressed. The applicability of science that shies away from natural language
is thus severely limited. This traditional limitation of science is now overcome
by fuzzy set theory, which has the capability of dealing in mathematical terms
with problems that require the use of natural language. Even though the
vagueness inherent in natural language can be expressed via fuzzy sets only
in a crude way, this capability is invaluable to modern science.
4. The apparatus of fuzzy set theory and fuzzy logic also enhances our capabilities of modeling human common-sense reasoning, decision making, and
other aspects of human cognition. These capabilities are essential for knowledge acquisition from human experts, for knowledge representation and
manipulation in expert systems in a human-like manner, and, generally, for
designing and building human-friendly machines with high intelligence.
The basic claim of GIT, that uncertainty is a broader concept than the
concept of classical probability, has been debated in the literature since the late
1980s (for an overview, see Note 10.4). As a result of this ongoing debate, as
well as convincing advances in GIT, limitations of classical probability theory
are now better understood. GIT is a continuation of classical, probability-based
information theory, but without the limitations of probability theory.
The role of information in human affairs has become so predominant that
it is now quite common to refer to our society as an “information society.” It
is thus increasingly important for us to develop a good understanding of the
broad concept of information. In the generalized information theory, the
concept of uncertainty is conceived in the broadest possible terms, and uncertainty-based information is viewed as a commodity whose value is its potential to reduce uncertainty pertaining to relevant situations. The theory does not
deal with the issues of how much uncertainty of relevant users (cognitive
agents) is actually reduced in the context of each given situation, and how
valuable this uncertainty reduction is to them. However, the theory, when
adequately developed (see Note 10.6), will be a solid base for developing a conceptual structure to capture semantic and pragmatic aspects
relevant to information users under various situations of information flow.
Only when this is adequately accomplished, will a genuine science of information be created.
NOTES
10.1. It was proved by Harmanec [1995] that the functional S̄ is the smallest one that
satisfies axioms for aggregate uncertainty in DST (stated in Section 6.6). This
result is, in some sense, relevant to the proof of the uniqueness of S̄. The uniqueness of S̄ can now be proved by showing that it is also the largest functional that
satisfies the axioms.
10.2. The prospective role of functional S, complementary to S̄, was raised by Kapur
et al. [1995]. In particular, it was suggested that the difference S̄ - S, referred to
as uncertainty gap, may be viewed as a measure of uncertainty about the probability distribution. Alternatively, it may be viewed as a measure of information
contained in constraints involved in applying the principle of maximum entropy.
10.3. Ben-Haim [2001] introduced an unorthodox theory of uncertainty, oriented primarily to decision making. The essence of the theory is well captured in a review
of Ben-Haim’s book written by Jim Hal [International Journal of General
Systems, 32(2), 2003, pp. 204–203], from which the following brief characterization of the theory is reproduced:
An info-gap analysis has three components: a system model, an info-gap
uncertainty model and performance requirements. The system model
describes the structure and behaviour of the system in question, using as
much information as is reasonably available. The system model may, for
example, be in the form of a set of partial differential equations, a network
model of a project or process, or indeed a probabilistic model such as a
Poisson process. The uncertainty in the system model is parameterized with
an uncertainty parameter α (a positive real number), which defines a family
of nested sets that bound regions or clusters of system behaviour. When α
= 0 the prediction from the system model converges to a point, which is
the anticipated system behaviour, given current available information.
However, it is recognised that the system model is incomplete so there will
be a range of variation around the nominal behaviour. Uncertainty, as
defined by the parameter α, is therefore a range of variation of the actual
around the nominal. No further commitment is made to the structure of
uncertainty. The α is not normalized and has no betting interpretation, so
is clearly distinct from a probability.
Uncertainty in Ben-Haim’s theory is not explicitly defined in measure–theoretic
terms. How to describe it within the GIT framework, if possible at all, is an open
question.
10.4. The following is a chronology of those major debates regarding uncertainty and
probability that are well documented in the literature:
• Probability theory versus evidence theory in artificial intelligence (AI)
—Shafer, Lindley, Spiegelhalter, Watson, Dempster, Kong;
—Statistical Science, 1987, 2(1), 1–44.
• Bayesian (probabilistic) approach to managing uncertainty in AI versus other
approaches;
—Peter Cheeseman and 23 correspondents;
—Computational Intelligence (Canadian), 1988, 4(1), 57–142.
• “An AI View of the treatment of uncertainty” by A. Saffioti;
—11 correspondents;
—The Knowledge Engineering Review, 1988, 2(1), 59–91.
• Cambridge debate: Bayesian approach to dealing with uncertainty versus other
approaches;
—Peter Cheeseman versus George Klir, Cambridge University, August 1988;
—International Journal of General Systems, 1989, 15(4), 347–378.
• Fuzziness versus probability;
—Michael Laviolette and John Seaman and 8 correspondents;
—IEEE Transactions on Fuzzy Systems, February 1994, 2(1), 1–42.
• “The Paradoxical Success of Fuzzy Logic”;
—Charles Elkan and 22 correspondents;
—IEEE Expert, August 1994, 9(4), 2–49.
• “Probabilistic and Statistical View of Fuzzy Methods” by M. Laviolette,
J.W. Seaman, J.D. Barrett, and W.H. Woodall;
—6 responses;
—Technometrics, 1995, 37(3), 249–292.
In many of these debates, the focus is on the relationship between probability
theory and the various novel uncertainty theories. The issues discussed and
the claims presented by defenders of probability theory vary from debate to
debate. The most extreme claims are expressed by Lindley in the first debate
[italics added]:
The only satisfactory description of uncertainty is probability. By this I
mean that every uncertainty statement must be in the form of a probability; that several uncertainties must be combined using the rules of probability; and that the calculus of probabilities is adequate to handle all
situations involving uncertainty. Probability is the only sensible description
of uncertainty and is adequate for all problems involving uncertainty. All
other methods are inadequate. . . . Anything that can be done with fuzzy
logic, belief functions, upper and lower probabilities, or any other alternative to probability, can better be done with probability.
These extreme claims are often justified by referring to a theorem by Cox [1946,
1961], which supposedly established that the only coherent way of representing
and dealing with uncertainty is to use rules of probability calculus. However, it
has been shown that either the theorem does not hold under assumptions stated
explicitly by Cox [Halpern, 1999], or it holds under additional, hidden assumptions that are too strong to capture all aspects of uncertainty [Colyvan, 2004] (see
Colyvan’s paper for additional references on this topic).
10.5. The distinction between the broad concept of uncertainty and the narrower
concept of probability has been obscured in the literature on classical, probability-based information theory. In this extensive literature, the Hartley measure is
routinely portrayed as a special case of the Shannon entropy that emerges from
the uniform probability distribution. This view is ill-conceived since the Hartley
measure is totally independent of any probabilistic assumptions, as correctly recognized by Kolmogorov [1965] and Rényi [1970b]. Strictly speaking, the Hartley
measure is based on one concept only: a finite set of possible alternatives, which
can be interpreted as experimental outcomes, states of a system, events, messages,
and the like, or as sequences of these. In order to use this measure, possible alternatives must be distinguished, within a given universal set, from those that are
not possible. It is thus the possibility of each relevant alternative that matters in
the Hartley measure. Hence, the Hartley measure can be meaningfully generalized only through broadening the notion of possibility. This avenue is now available in terms of the theory of graded possibilities and other nonclassical
uncertainty theories that are the subject of GIT.
10.6. As shown by Dretske [1981, 1983], a study of semantic aspects of information
requires a well-founded underlying theory of uncertainty-based information.
While in his study Dretske relies only on information expressed in terms of the
Shannon entropy, the broader view of GIT allows us now to approach the same
study in a more flexible and, consequently, more meaningful fashion. Studies
regarding pragmatic aspects of information, such as those by Whittemore and
Yovits [1973, 1974] and Yovits et al. [1981], should be affected likewise. Hence,
the use of the various novel uncertainty theories, uncertainty measures, and
uncertainty principles in the study of semantic and pragmatic aspects of information will likely be another main, long-term direction of research.
10.7. The aims of GIT are very similar to those of generalized theory of uncertainty
(GTU), which have recently been proposed by Lotfi Zadeh [2005]. However, the
two theories are formulated quite differently. In GTU, information is viewed in
terms of generalized constraints on values of given variables. These constraints
are expressed, in general, in terms of the granular structure of linguistic variables.
In the absence of any constraint regarding the variables of concern, we are totally
ignorant about their actual state. In this situation, our uncertainty about the
actual state of the variables is maximal. Any known constraint regarding the variables reduces this uncertainty, and may thus be viewed as a source of information. The concept of a generalized constraint, which is central to GTU, has many
distinct modalities. Choosing any of them reduces the level of generality. Making
further choices will eventually result in one of the classical theories of uncer-
tainty. This approach to dealing with uncertainty is clearly complementary to the
one employed in GIT. In the latter, we start with the two classical theories of
uncertainty and generalize each of them by relaxing its axiomatic requirements.
The GTU approach to uncertainty and information is thus top-down—from
general to specific. On the contrary, the GIT approach to uncertainty and information is bottom-up—from specific to general. Both approaches are certainly
meaningful and should be pursued. In the long run, results obtained by the two
complementary approaches will eventually meet. When this happens, our understanding of information-based uncertainty and uncertainty-based information
will be quite satisfactory.
APPENDIX A
UNIQUENESS OF THE U-UNCERTAINTY
In the following lemmas, functional U is assumed to satisfy requirements
(U2), (U3), (U5), (U8), and (U9) as axioms: additivity, monotonicity, expansibility, branching, and normalization, respectively (p. 205). The notation
1_m = (1, 1, . . . , 1) (m ones) and 0_m = (0, 0, . . . , 0) (m zeros) is carried over from the formulation of the branching requirement.
Lemma A.1. For all q, k ∈ ℕ, 2 ≤ q ≤ k ≤ n, with r = ⟨r_1, r_2, . . . , r_n⟩,

U(r) = U(r_1, r_2, . . . , r_{k−q}, r_k, r_k, . . . , r_k, r_{k+1}, . . . , r_n)   [r_k repeated q times]
     + (r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), (r_{k−q+2} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−1} − r_k)/(r_{k−q} − r_k), 0_{n−k+1})
     − (r_{k−q} − r_k) U(1_{k−q}, 0_{n−k+q}).    (A.1)
Proof. By the branching axiom we have

U(r) = U(r_1, r_2, . . . , r_{k−2}, r_k, r_k, r_{k+1}, . . . , r_n)
     + (r_{k−2} − r_k) U(1_{k−2}, (r_{k−1} − r_k)/(r_{k−2} − r_k), 0_{n−k+1})
     − (r_{k−2} − r_k) U(1_{k−2}, 0_{n−k+2}).    (A.2)
This also shows that the lemma is true for q = 2. We now proceed with a proof
by induction on q. Applying the induction hypothesis to the first term on the
right-hand side of Eq. (A.2), we get
U(r_1, r_2, . . . , r_{k−2}, r_k, r_k, r_{k+1}, . . . , r_n)
 = U(r_1, r_2, . . . , r_{k−q}, r_k, . . . , r_k, r_{k+1}, . . . , r_n)   [r_k repeated q times]
 + (r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), (r_{k−q+2} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−2} − r_k)/(r_{k−q} − r_k), 0_{n−k+2})
 − (r_{k−q} − r_k) U(1_{k−q}, 0_{n−k+q}).    (A.3)
If we substitute this back into Eq. (A.2), we deduce that

U(r) = U(r_1, r_2, . . . , r_{k−q}, r_k, . . . , r_k, r_{k+1}, . . . , r_n)   [r_k repeated q times]
     + (r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), (r_{k−q+2} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−2} − r_k)/(r_{k−q} − r_k), 0_{n−k+2})
     − (r_{k−q} − r_k) U(1_{k−q}, 0_{n−k+q})
     + (r_{k−2} − r_k) U(1_{k−2}, (r_{k−1} − r_k)/(r_{k−2} − r_k), 0_{n−k+1})
     − (r_{k−2} − r_k) U(1_{k−2}, 0_{n−k+2}).    (A.4)
Now apply the branching axiom to the quantity

(r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), (r_{k−q+2} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−1} − r_k)/(r_{k−q} − r_k), 0_{n−k+1})    (A.5)

to produce
(r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−1} − r_k)/(r_{k−q} − r_k), 0_{n−k+1})
 = (r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), (r_{k−q+2} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−2} − r_k)/(r_{k−q} − r_k), 0_{n−k+2})
 + (r_{k−2} − r_k) U(1_{k−2}, (r_{k−1} − r_k)/(r_{k−2} − r_k), 0_{n−k+1})
 − (r_{k−2} − r_k) U(1_{k−2}, 0_{n−k+2}).    (A.6)
All three terms on the right-hand side of the equality in Eq. (A.6) are present
in the right-hand side of Eq. (A.3), so that by a simple substitution we conclude that which was to be proved:
U(r) = U(r_1, r_2, . . . , r_{k−q}, r_k, . . . , r_k, r_{k+1}, . . . , r_n)   [r_k repeated q times]
     + (r_{k−q} − r_k) U(1_{k−q}, (r_{k−q+1} − r_k)/(r_{k−q} − r_k), (r_{k−q+2} − r_k)/(r_{k−q} − r_k), . . . , (r_{k−1} − r_k)/(r_{k−q} − r_k), 0_{n−k+1})
     − (r_{k−q} − r_k) U(1_{k−q}, 0_{n−k+q}).    ∎ (A.7)
Lemma A.2. For all k ∈ ℕ, k < n, define g(k) = U(1_k, 0_{n−k}); then g(k) = log₂ k.

Proof. By the expansibility axiom we know that U(1_k, 0_{n−k}) = U(1_k). By the additivity axiom g(k · l) = g(k) + g(l), so by a proof identical to that of Theorem 2.1 we can conclude that g(k) = log₂ k.    ∎
Lemma A.3. For k ∈ ℕ, k ≥ 2, and r ∈ ℝ, 0 ≤ r ≤ 1, we have

U(1_k, r_{n−k}) = (1 − r) log₂ k + r log₂ n,    (A.8)

where r_{n−k} represents r, r, . . . , r (n − k times).
Proof. Let r be the possibility profile 1_k, r_{n−k}. Form the joint possibility profile r², where both marginal possibility profiles are equivalent to r. It is simple to check that r² = 1_{k²}, r_{n²−k²} has this property. Now U(r²) = U(1_{k²}, r_{n²−k²}). First, we can deduce from the additivity axiom that

U(r²) = U(r) + U(r) = 2U(r).    (A.9)

Second, by applying Lemma A.1, we get

U(r²) = U(1_{k²}, r_{n²−k²})
      = U(1_k, r_{n²−k}) + (1 − r) U(1_{k²}, 0_{n²−k²}) − (1 − r) U(1_k, 0_{n²−k}).    (A.10)

We also know by the expansibility axiom that U(r) = U(r, 0) for any possibility profile r, and we can apply Lemma A.1 to any possibility profile and this will make the final q possibility values zero. Let us perform this operation upon the first term on the right-hand side of the equality in Eq. (A.10):

U(1_k, r_{n²−k}) = U(1_k, r_{n−k}, 0_{n²−n}) + r U(1_{n²}) − r U(1_n, 0_{n²−n}).
Substituting Eq. (A.7) into Eq. (A.10), we calculate that
U(1_{k²}, r_{n²−k²}) = U(1_k, r_{n²−k}) + (1 − r) U(1_{k²}, 0_{n²−k²}) − (1 − r) U(1_k, 0_{n²−k})
 = U(1_k, r_{n−k}, 0_{n²−n}) + r U(1_{n²}) − r U(1_n, 0_{n²−n}) + (1 − r) U(1_{k²}, 0_{n²−k²}) − (1 − r) U(1_k, 0_{n²−k}).    (A.11)

Now using the expansibility axiom to drop all final zeros and Lemma A.2 to replace each term of the form U(1_k) with log₂ k,

U(1_{k²}, r_{n²−k²}) = U(1_k, r_{n−k}) + r log₂ n² − r log₂ n + (1 − r) log₂ k² − (1 − r) log₂ k.    (A.12)
We now finish the proof by remembering that U(1_{k²}, r_{n²−k²}) = U(r²) = 2U(r), that U(1_k, r_{n−k}) = U(r), and that log₂ x² = 2 log₂ x:

2U(r) = U(r) + 2r log₂ n − r log₂ n + 2(1 − r) log₂ k − (1 − r) log₂ k
 U(r) = r log₂ n + (1 − r) log₂ k.    ∎ (A.13)
Theorem A.1. The U-uncertainty is the only functional that satisfies the
axioms of expansibility, monotonicity, additivity, branching, and normalization.
Proof. The proof is by induction on n, the length of the possibility profile. If n = 2, then we can apply Lemma A.3 to get the desired result immediately.
Assume now that the theorem is true for n − 1; then we can use the expansibility and branching axioms in the same way as was used in the proof of Lemma A.3 to replace r_n with zero:

U(r) = U(r_1, r_2, . . . , r_n)
     = U(r_1, r_2, . . . , r_n, 0)
     = U(r_1, r_2, . . . , r_{n−1}, 0, 0)
     + r_{n−1} U(1_{n−1}, r_n/r_{n−1}, 0)
     − r_{n−1} U(1_{n−1}, 0, 0).    (A.14)
We can now apply Lemma A.3 to the term U(1_{n−1}, r_n/r_{n−1}, 0) and drop terminal zeros,

U(r) = U(r_1, r_2, . . . , r_{n−1})
     + r_{n−1} [(1 − r_n/r_{n−1}) log₂ (n − 1) + (r_n/r_{n−1}) log₂ n]
     − r_{n−1} log₂ (n − 1)
   = U(r_1, r_2, . . . , r_{n−1}) + (r_{n−1} − r_n) log₂ (n − 1) + r_n log₂ n − r_{n−1} log₂ (n − 1)
   = U(r_1, r_2, . . . , r_{n−1}) − r_n log₂ (n − 1) + r_n log₂ n
   = U(r_1, r_2, . . . , r_{n−1}) + r_n log₂ [n/(n − 1)].    (A.15)
Applying the induction hypothesis, we conclude

U(r) = U(r_1, r_2, . . . , r_{n−1}) + r_n log₂ [n/(n − 1)]
     = Σ_{i=2}^{n−1} r_i log₂ [i/(i − 1)] + r_n log₂ [n/(n − 1)]
     = Σ_{i=2}^{n} r_i log₂ [i/(i − 1)].    ∎ (A.16)
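As a concrete companion to Theorem A.1 (our own numerical check, not part of the proof), the sketch below evaluates the U-uncertainty both by the summation formula of Eq. (A.16) and by its α-cut (integral) form, using the possibility profile obtained in Example 9.19; the two values agree. The helper names are our own, and the integral form assumes an ordered profile with r_1 = 1.

```python
from math import log2, isclose

def u_sum(r):
    # U(r) = sum_{i=2}^{n} r_i * log2(i / (i - 1)) for an ordered profile r_1 >= ... >= r_n
    return sum(ri * log2(i / (i - 1)) for i, ri in enumerate(r, start=1) if i > 1)

def u_integral(r):
    # Equivalent alpha-cut form: integral over (0, 1] of log2 |{i : r_i >= a}|
    levels = sorted(set(r) | {0.0})
    return sum((hi - lo) * log2(sum(1 for ri in r if ri >= hi))
               for lo, hi in zip(levels, levels[1:]))

r = (1.0, 0.847, 0.822, 0.558)   # possibility profile from Example 9.19
print(round(u_sum(r), 3), isclose(u_sum(r), u_integral(r), rel_tol=1e-9))
```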
APPENDIX B
UNIQUENESS OF THE GENERALIZED HARTLEY MEASURE IN THE DEMPSTER–SHAFER THEORY
A proof of uniqueness of the generalized Hartley measure in the
Dempster–Shafer theory (DST) that is presented here is based on the following six axiomatic requirements:
Axiom (GH1) Subadditivity. For any given joint body of evidence ⟨F, m⟩ on X × Y and the associated marginal bodies of evidence ⟨FX, mX⟩ and ⟨FY, mY⟩,

GH(m) ≤ GH(mX) + GH(mY).
Axiom (GH2) Additivity. For any noninteractive bodies of evidence ·FX, mXÒ and ·FY, mYÒ on X and Y, respectively, and the associated joint body of evidence ·F, mÒ, where

F = FX ¥ FY and m(A ¥ B) = mX(A) ◊ mY(B)

for all A Œ FX and all B Œ FY,

GH(m) = GH(mX) + GH(mY).
Axiom (GH3) Symmetry. GH is invariant with respect to permutations of
values of the basic probability assignment function within each group of
subsets of X with equal cardinalities.
Axiom (GH4) Branching. GH(m) = GH(m1) + GH(m2) for any three bodies
of evidence on X, ·F, mÒ, ·F1, m1Ò, and ·F2, m2Ò, such that
F = {A, B, C, . . .},
F1 = {A1, B, C, . . .},
F2 = {A, B1, C1, . . .},

where

A1 Õ A,  B1 Õ B,  C1 Õ C, . . . ,
|A1| = |B1| = |C1| = ◊ ◊ ◊ = 1,

and

m(A) = m1(A1) = m2(A),
m(B) = m1(B) = m2(B1),
m(C) = m1(C) = m2(C1), . . . .
Axiom (GH5) Continuity. GH is a continuous functional.
Axiom (GH6) Normalization. GH(m) = 1 when m(A) = 1 and |A| = 2.
The requirements of subadditivity, additivity, continuity, and normalization
are obvious generalizations of their counterparts for the U-uncertainty. The
requirement of symmetry states that the functional GH (measuring nonspecificity) should depend only on values of the basic probability assignment functions and the cardinalities of the sets to which these values are allocated and
not on the sets themselves.
The branching requirement states the following: Given a basic probability
assignment function on X, if we maintain its values, but one of its focal subsets
(e.g., subset A) is replaced with a singleton (thus obtaining basic probability
assignment function m1) and, separately, replace all remaining focal subsets of
m (i.e., all except A) with singletons (thus obtaining function m2), then the sum
of the nonspecificities of m1 and m2 should be the same as the nonspecificity
of the original basic probability assignment function m.
It can be now established, via the following theorem, that the generalized
Hartley measure defined by Eq. (6.27) is uniquely characterized by the
axiomatic requirements (GH1)–(GH6).
Theorem B.1. Let M denote the set of all basic probability assignment functions in DST and let GH denote a functional of the form

GH: M Æ [0, •).
If the functional GH satisfies the axiomatic requirements of subadditivity, additivity, symmetry, branching, continuity, and normalization (i.e., the axiomatic requirements (GH1)–(GH6)), then for any m Œ M,
GH(m) = Σ_{A Œ F} m(A) log_2 |A|,
where F denotes the set of focal elements associated with m.
Proof. To facilitate the proof, let us introduce a convenient notation. Let (αA, βB, γC, . . .) denote a basic probability assignment function such that m(A) = α, m(B) = β, m(C) = γ, . . . , where A, B, C, . . . are the focal elements of m. When we are concerned only with the cardinalities of the focal elements, the basic probability assignment function can be written in the form (αa, βb, γc, . . .), where a, b, c, . . . are positive integers that stand for the cardinalities of the focal elements; occasionally, when α = 1, we write (a) instead of (1a). For the sake of simplicity, let us refer to the value m(A) of a basic assignment m as the weight of A.
(i) First, we prove that GH(1a) = log_2 a. For convenience, let W(a) = GH(1a) (that is, W is a nonnegative real-valued function on ⺞). From additivity (GH2), we have

W(ab) = W(a) + W(b).     (B.1)

When a = b = 1, W(1) = W(1) + W(1) and, consequently, W(1) = 0. To show that W(a) £ W(a + 1), let us take a set A of a(a + 1) elements, which is a
subset of a Cartesian product X ¥ Y. By symmetry (GH3), GH(A) does not
depend on how the elements of A are arranged. Let us consider two possible
arrangements. In the first arrangement, A is viewed as a rectangle a ¥ (a + 1);
here ¥ denotes a Cartesian product of sets with cardinalities a and a + 1.
Then
GH ( A) = W (a) + W (a + 1)
(B.2)
by additivity. In the second arrangement, we view A as a subset of a square
(a + 1) ¥ (a + 1) such that at least one diagonal of the square is fully covered.
In this case, the projections onto both X and Y have the same cardinality
a + 1. Hence, by subadditivity (GH1), we have
GH ( A) £ W (a + 1) + W (a + 1).
It follows immediately from Eqs. (B.1) and (B.2) that
W (a) £ W (a + 1).
(B.3)
Function W is thus monotonic nondecreasing. Since it is also additive in the
sense of Eq. (B.1) and normalized by Axiom (GH6), it is equivalent to the
Hartley information H. Hence, by Theorem 2.1
W (a) = GH ( 1a) = log 2 a.
(B.4)
(ii) We prove that GH(α1, β1, γ1, . . .) = 0. First, we show that the equality holds in the case of only two focal elements, that is, we show that GH(α1, β1) = 0. Consider a set A of a^2 elements for some sufficiently large, temporarily fixed a. Let B = A ¥ (α1, β1). Then GH(A) = 2 log_2 a and

GH(B) = 2 log_2 a + GH(α1, β1)     (B.5)

by Eq. (B.4) and additivity. Set B can be considered as consisting of two squares a ¥ a with weights α and β. Let us now place both of these squares into a larger square (a + 1) ¥ (a + 1) in such a way that at least one of the diagonals of the larger square is totally covered by elements of the smaller squares
and, at the same time, the a ¥ a squares do not completely overlap. Clearly,
both projections of this arrangement of set B have the same cardinality a + 1.
Hence,
GH (B) £ GH (a + 1) + GH (a + 1)
by subadditivity. Substituting for GH(B) from Eq. (B.5) and applying Eq.
(B.4), we obtain
GH(α1, β1) + 2 log_2 a £ 2 log_2 (a + 1)

or, alternatively,

GH(α1, β1) £ 2[log_2 (a + 1) - log_2 a].
This inequality must be satisfied for all a Œ ⺞. When a goes to infinity, the left-hand side of the inequality remains constant, whereas its right-hand side
converges to zero. Hence,
GH(α1, β1) = 0.
By repeating the same argument for (α1, β1, γ1, . . .) with n focal elements, we readily obtain

GH(α1, β1, γ1, . . .) £ 2n[log_2 (a + 1) - log_2 a],
and, again, by allowing a to go to infinity, we obtain
GH(α1, β1, γ1, . . .) = 0.     (B.6)

Applying this result and the additivity of GH, we also have, for an arbitrary basic assignment m,

GH(m ◊ (α1, β1, γ1, . . .)) = GH(m) + GH(α1, β1, γ1, . . .) = GH(m).     (B.7)

This means that we can replicate all focal elements of m the same number of times, splitting their weights in a fixed proportion α, β, γ, . . . , and the value of GH does not change.
(iii) Next, we prove that GH(m1) = GH(m2), where m1 = (αa, (β+γ)b) and m2 = (αa, βb, γb). Since the actual weights are not essential in this proof, we omit them for the sake of simplicity; if desirable, they can be easily reconstructed. The proof is accomplished by showing that GH(m1) £ GH(m2) and, at the same time, GH(m2) £ GH(m1). To demonstrate the first inequality, let us view the focal elements of m1 and m2 as collections of intervals of lengths a, b and a, b, b, respectively. Furthermore, let us place both intervals of m1 side by side, the first and second interval of m2 side by side, and the third interval of m2 above the second interval of m2. According to this arrangement, the two projections of m2 consist of m1 and a pair of singletons with appropriate weights, say (x1, y1). It then follows from the subadditivity that

GH(m2) £ GH(m1) + GH(x1, y1).
The last term in this inequality is 0 by Eq. (B.6) and, consequently, GH(m2) £
GH(m1).
To prove the opposite inequality, let m3 = (β1, γ1) so that m1 · m3 assigns the same weights to the b's as does m2. We select a sufficiently large integer n, temporarily fixed. Let s and s¢ denote squares n ¥ n and (n + 1) ¥ (n + 1), respectively. We can view m1 · m3 · s as a collection of four parallelepipeds, two with edges a, n, n, and the other two with edges b, n, n. Similarly, m2 · s¢ can be viewed as a collection of three parallelepipeds, one with edges a, n + 1, n + 1, and two with edges b, n + 1, n + 1. We now place the two blocks with edges a, n, n of m1 · m3 · s inside the one block with edges a, n + 1, n + 1 of m2 · s¢ so that they cover the main diagonal. Furthermore, we place the two blocks with edges b, n, n of m1 · m3 · s inside the separate blocks with edges b, n + 1, n + 1 of m2 · s¢ so that they again cover the diagonals. Using additivity and Eq. (B.4), the construction results in the equation

GH(m1 ◊ m3 ◊ s) = GH(m1) + GH(β1, γ1) + 2 log_2 n = GH(m1) + 2 log_2 n.
Using subadditivity and the projections of the construction, we obtain
GH (m1 ◊ m3 ◊ s) £ GH (m2 ) + GH (n + 1) + GH (n + 1).
Hence,
GH(m1) + 2 log_2 n £ GH(m2) + 2 log_2 (n + 1)
and
GH (m2 ) - GH (m1 ) ≥ 2[log 2 n - log 2 (n + 1)].
For n going to infinity, the right-hand side of this inequality converges to 0
and, consequently,
GH (m2 ) - GH (m1 ) ≥ 0
or
GH (m2 ) ≥ GH (m1 ).
The proof can easily be extended to the case of a general basic probability
assignment function in which any given cardinality may repeat more than
twice and the number of different cardinalities is arbitrary.
(iv) Repeatedly applying the branching property, we obtain
GH(αa, βb, γc, . . .) = GH(αa, x1 1, x2 1, . . .) + GH(βb, y1 1, y2 1, . . .) + · · · .
According to property (iii), we can combine the singletons, so that the proof of the theorem reduces to the determination of GH(αa, (1 - α)1) for arbitrary α and a. Moreover,
GH(1a) = GH((1/2)a, (1/2)a)

and, by the branching axiom,

GH((1/2)a, (1/2)a) = 2GH((1/2)a, (1/2)1).

Hence,

GH(1a) = 2GH((1/2)a, (1/2)1).

Similarly,

GH(1a) = 3GH((1/3)a, (2/3)1) = 4GH((1/4)a, (3/4)1) = · · · .
Using Eq. (B.4) we obtain for each t = 1/n the equation

GH(ta, (1 - t)1) = t log_2 a.
This formula can easily be shown to hold for any rational t. Since an arbitrary
real number can be approximated by a monotonic sequence of rational
numbers, property (iii) implies that the formula holds also for any t Œ [0, 1].
This concludes the proof.
䊏
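Although no algorithm is needed to evaluate the functional characterized by Theorem B.1, a small computational sketch may be helpful. The following code is ours, not part of the original text; focal elements are represented as frozensets mapped to their weights, and the additivity requirement (GH2) is checked on a pair of noninteractive bodies of evidence.

```python
import math
from itertools import product

def gh(m):
    # Generalized Hartley measure: GH(m) = sum over focal elements A of m(A) * log2 |A|.
    # m: dict mapping frozenset (focal element) -> basic probability assignment value.
    return sum(w * math.log2(len(A)) for A, w in m.items() if w > 0)

# Two marginal bodies of evidence (weights sum to 1 in each).
mX = {frozenset({'x1'}): 0.5, frozenset({'x1', 'x2'}): 0.5}
mY = {frozenset({'y1', 'y2'}): 0.7, frozenset({'y1', 'y2', 'y3'}): 0.3}

# Noninteractive joint body of evidence: m(A x B) = mX(A) * mY(B).
joint = {frozenset(product(A, B)): wA * wB
         for A, wA in mX.items() for B, wB in mY.items()}

# Additivity (GH2): GH(joint) = GH(mX) + GH(mY).
assert abs(gh(joint) - (gh(mX) + gh(mY))) < 1e-12
```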
APPENDIX C

CORRECTNESS OF ALGORITHM 6.1
The correctness of the algorithm is stated in Theorem C.1. The proof employs
the following two lemmas.
Lemma C.1. Let x > 0 and c - x > 0. Denote
L( x) = -[(c - x) log 2 (c - x) + x log 2 x].
Then L(x) is strictly increasing in x when (c - x) > x.
Proof. L¢(x) = log2(c - x) - log2x so that L¢(x) > 0 whenever (c - x) > x.
䊏
Lemma C.2. [Dempster, 1967b] Let X be a frame of discernment, Bel a generalized belief function* on X, and m the corresponding generalized basic
probability assignment; then a tuple ·px | x Œ XÒ satisfies the constraints
0 £ px £ 1  for all x Œ X,

Σ_{xŒX} px = Bel(X),

and

Bel(A) £ Σ_{xŒA} px  for all A Õ X,
* A generalized belief function, Bel, is a function that satisfies all requirements of a belief measure
except the requirement that Bel(X) = 1. Similarly, values of a generalized basic probability assignment are not required to add to one. Bel in Algorithm 6.1 is clearly a generalized belief function
after the first iteration of the algorithm.
if and only if there exist nonnegative real numbers α_x^A for all nonempty sets A 債 X and for all x Œ A such that

px = Σ_{A: xŒA} α_x^A  for all x Œ X,  and  Σ_{xŒA} α_x^A = m(A)  for all nonempty A 債 X.
Using the results of Lemma C.1 and Lemma C.2, we can now address the
principal issue of this appendix, the correctness of Algorithm 6.1. This issue is
the subject of the following theorem.
Theorem C.1. Algorithm 6.1 stops after a finite number of steps and the
output is the correct value of function AU(Bel) since ·px | x ŒXÒ maximizes the
Shannon entropy within the constraints induced by Bel.
Proof. The frame of discernment X is a finite set and the set A chosen in Step
1 of the algorithm is nonempty (note also that the set A is determined
uniquely). Therefore, the “new” X has a smaller number of elements than the
“old” X and so the algorithm terminates after finitely many passes through
the loop of Steps 1–5.
To prove the correctness of the algorithm, we proceed in two stages. In the
first stage, we show that any distribution ·px | x Œ XÒ that maximizes the
Shannon entropy within the constraints induced by Bel has to satisfy the equality px = Bel(A)/|A| for all x Œ A. In the second stage we show that, for any
partial distribution ·px | x Œ X - AÒ that satisfies the constraints induced by the
“new” generalized belief function defined in Step 3 of the algorithm, the complete distribution ·qx | x Œ XÒ, defined by
qx = Bel(A)/|A|  for x Œ A  and  qx = px  for x Œ X - A,     (C.1)
satisfies the constraints induced by the original (generalized) belief function,
and vice versa. That is, for any distribution ·qx | x Œ XÒ satisfying the constraints
induced by Bel and such that qx = Bel(A)/|A| for all x Œ A, it holds that
·qx | x Œ X - AÒ satisfies the constraints induced by the “new” generalized belief
function defined in Step 3 of the algorithm. The theorem then follows by
induction.
First, assume that ·px | x Œ XÒ is such that
AU(Bel) = -Σ_{xŒX} px log_2 px,

Σ_{xŒX} px = Bel(X),

and

Bel(B) £ Σ_{xŒB} px
for all B à X, and there is y Œ A such that py π Bel(A)/|A|, where A is the set
chosen in Step 1 of the algorithm. That is, Bel(A)/|A| ≥ Bel(B)/|B| for all B 債
X, and if Bel(A)/|A| = Bel(B)/|B|, then |A| > |B|. Furthermore, we can assume
that py > Bel(A)/|A|. This is justified by the following argument: If py <
Bel(A)/|A|, then due to Bel(A) £ SxŒApx there exists y¢ Œ A such that py¢ >
Bel(A)/|A|, and we can take y¢ instead of y.
For a finite sequence {x_i}_{i=0}^{m}, where x_i Œ X and m is a positive integer, let F denote the set of all focal elements associated with Bel and let

F(x_i) = »{C Õ X | C Œ F, x_{i-1} Œ C, and α_{x_{i-1}}^C > 0}

for i = 1, . . . , m, where the α_x^C are the nonnegative real numbers whose existence is guaranteed by Lemma C.2 (we fix one such set of those numbers). Let

D = {x Œ X | there exist a nonnegative integer m and a sequence {x_i}_{i=0}^{m} such that x_0 = y, x_m = x, for all i = 1, 2, . . . , m, x_i Œ F(x_i), and for all i = 2, 3, . . . , m and all z Œ F(x_{i-1}), p_z ≥ p_{x_{i-2}}}.
There are now two possibilities. Either there is z Œ D such that in the sequence {x_i}_{i=0}^{m} from the definition of D, where x_m = z, it holds that p_z < p_{x_{m-1}}. This, however, leads to a contradiction with the maximality of ·px | x Œ XÒ since

-Σ_{xŒX} px log_2 px < -Σ_{xŒX} qx log_2 qx

by Lemma C.1, where qx = px for x Œ (X - {z, x_{m-1}}), q_z = p_z + ε, and q_{x_{m-1}} = p_{x_{m-1}} - ε; here

ε Œ (0, min{α_{x_{m-1}}^C, (p_{x_{m-1}} - p_z)/2}),

where C is the focal element of Bel from the definition of D containing both z and x_{m-1} and such that α_{x_{m-1}}^C > 0. The distribution ·qx | x Œ XÒ satisfies the constraints induced by Bel due to Lemma C.2. Or, the second possibility is that for all x Œ D and all focal elements C 債 X of Bel such that x Œ C, whenever α_x^C > 0, it holds that p_z ≥ p_x for all z Œ (C - {x}). It follows from Lemma C.2 that Bel(D) = Σ_{xŒD} px. However, this fact contradicts the choice of A,
since

Bel(D)/|D| = (Σ_{xŒD} px)/|D| ≥ py > Bel(A)/|A|.
We have shown that any ·px | x Œ XÒ maximizing the Shannon entropy
within the constraints induced by Bel, has to satisfy px = Bel(A)/|A| for all
x Œ A.
Let Bel¢ denote the generalized belief function on X - A defined in Step 3 of the algorithm; that is, Bel¢(B) = Bel(B » A) - Bel(A). It is really a generalized belief function. The reader can verify that its corresponding generalized basic probability assignment can be expressed by

m¢(B) = Σ_{C Õ X: C«(X-A) = B} m(C)
for all nonempty sets B 債 X - A, and m¢(Ø) = 0. Assume, then, that ·px | x Œ
X - AÒ is such that px Œ [0, 1],
Σ_{xŒX-A} px = Bel¢(X - A),

and

Σ_{xŒB} px ≥ Bel¢(B)
for all B à X - A. Let ·qx | x Œ XÒ denote the complete distribution defined by
Eq. (C.1). Clearly, qx Œ [0, 1]. Since
Bel ¢(X - A) = Bel (X ) - Bel ( A),
we have
Bel(X) = Σ_{xŒX-A} px + Bel(A)
       = Σ_{xŒX-A} px + Σ_{xŒA} Bel(A)/|A|
       = Σ_{xŒX} qx.
From Bel(A)/|A| ≥ Bel(B)/|B| for all B 債 X, it follows that Σ_{xŒC} qx ≥ Bel(C) for all C 債 A. Assume that B 債 X and B « (X - A) π Ø. We know that

Σ_{xŒB«(X-A)} px ≥ Bel¢(B « (X - A)) = Bel(A » B) - Bel(A).
From Eq. (5.40), we get
Bel ( A » B) - Bel ( A) ≥ Bel (B) - Bel ( A « B).
Since Bel(A)/|A| ≥ Bel(A « B)/|A « B|, we get
Σ_{xŒB} qx = Σ_{xŒB«(X-A)} px + Σ_{xŒA«B} Bel(A)/|A| ≥ Bel(B).
Conversely, assume ·qx | x Œ XÒ is such that qx Œ [0, 1], qx = Bel(A)/|A| for all
x Œ A, Σ_{xŒX} qx = Bel(X), and Σ_{xŒB} qx ≥ Bel(B) for all B Ã X. Clearly, we have

Bel¢(X - A) = Bel(X) - Bel(A) = Σ_{xŒX-A} qx.
Let C 債 X - A. We know that Σ_{xŒA»C} qx ≥ Bel(A » C), but it follows from this fact that

Σ_{xŒC} qx ≥ Bel(A » C) - Bel(A) = Bel¢(C).
This means that we reduced the size of our problem, and therefore the
theorem follows by induction.
䊏
Several remarks regarding Algorithm 6.1 can be made:
(a) Note that we have also proved that the distribution maximizing the
entropy within the constraints induced by a given belief function is
unique. It is possible to prove this fact directly by using the concavity
of the Shannon entropy.
(b) The condition that A has the maximal cardinality among sets B with
equal value of Bel(B)/|B| in Step 1 is not necessary for the correctness
of the algorithm, but it speeds it up. The same is true for the condition
Bel(X) > 0 in Step 5. Moreover, we could exclude the elements of X
outside the union of all focal elements of Bel (usually called core within
evidence theory) altogether.
(c) If A Ã B and Bel(A) = Bel(B), then Bel(A)/|A| > Bel(B)/|B|. This means that it is not necessary to work with the whole power set P(X); it is enough to consider C = {A 債 X | there exists {F_1, . . . , F_l} 債 P(X) such that m(F_i) > 0 and A = »_{i=1}^{l} F_i}.
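For completeness, here is a rough Python sketch of Algorithm 6.1 as it is described in this appendix and restated at the beginning of Appendix D (the representation of bodies of evidence and all names are ours): in each pass, a nonempty set A maximizing Bel(A)/|A| is chosen (the largest such set on ties), the probability Bel(A)/|A| is assigned to every element of A, and the computation continues on X - A with the generalized belief function Bel¢(B) = Bel(B » A) - Bel(A).

```python
import math
from itertools import combinations

def nonempty_subsets(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(1, len(xs) + 1) for c in combinations(xs, r)]

def aggregate_uncertainty(m):
    # Sketch of Algorithm 6.1: AU(Bel), the maximum Shannon entropy consistent with
    # Bel(A) = sum of m(B) over focal elements B contained in A.
    # m: dict mapping frozenset (focal element) -> weight.  All subsets of the core
    # are enumerated, so this is only practical for small frames of discernment.
    X = frozenset().union(*m)                     # the core of the evidence
    p = {}
    while X:
        bel = lambda A: sum(w for B, w in m.items() if B and B <= A)
        if bel(X) <= 0:
            break
        # Step 1: nonempty A maximizing Bel(A)/|A|, ties broken by largest cardinality.
        A = max(nonempty_subsets(X), key=lambda S: (bel(S) / len(S), len(S)))
        # Step 2: every element of A receives the probability Bel(A)/|A|.
        for x in A:
            p[x] = bel(A) / len(A)
        # Steps 3-5: restrict to X - A, with m'(B) = sum of m(C) over C with C & (X - A) = B.
        X = X - A
        reduced = {}
        for C, w in m.items():
            B = C & X
            reduced[B] = reduced.get(B, 0.0) + w
        m = reduced
    # Elements never reached get probability 0 and contribute nothing to the entropy.
    return -sum(q * math.log2(q) for q in p.values() if q > 0)
```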
APPENDIX D

PROPER RANGE OF GENERALIZED SHANNON ENTROPY
The purpose of this Appendix is to prove that S̄(Bel) - GH(Bel) ≥ 0 for any
given belief function (or any of its equivalent representations). To facilitate
the proof, which is adopted from Smith [2000], some relevant features of
Algorithm 6.1 are examined first.
Applying Algorithm 6.1 to a given belief function Bel defined on P(X)
results in the following sequence of sets Ai 債 X and associated probabilities
pi, each obtained in the ith pass through Steps 1–5 (or the ith iteration) of the
algorithm (i Œ ⺞n, assuming that the algorithm terminates after n passes):
•  A1 is equal to A 債 X such that Bel(A)/|A| is maximal. If there are more such sets than one, then the one with the largest cardinality is chosen. Each element of set A1 is assigned the probability

   p1 = Bel(A1)/|A1|.

•  A2 is equal to A 債 X such that

   [Bel(A » A1) - Bel(A1)] / |A - A1|

   is maximal. If there are more such sets than one, then the one with the largest cardinality |A - A1| is chosen. Each element of set A2 - A1 is assigned the probability

   p2 = [Bel(A2 » A1) - Bel(A1)] / |A2 - A1|.

•  A3 is equal to A 債 X such that

   [Bel(A » A2 » A1) - Bel(A2 » A1)] / |A - (A2 » A1)|

   is maximal. If there are more such sets than one, then the one with the largest cardinality |A - (A2 » A1)| is chosen. Each element of set A3 - (A1 » A2) is assigned the probability

   p3 = [Bel(A3 » A2 » A1) - Bel(A2 » A1)] / |A3 - (A2 » A1)|.

•  In general, for any i Œ ⺞n, Ai is equal to A 債 X such that

   [Bel(A » »_{j=1}^{i-1} Aj) - Bel(»_{j=1}^{i-1} Aj)] / |A - »_{j=1}^{i-1} Aj|     (D.1)

   is maximal. If there are more such sets than one, then the one with the largest cardinality |A - (»_{j=1}^{i-1} Aj)| is chosen. Each element of set Ai - (»_{j=1}^{i-1} Aj) is assigned the probability

   pi = [Bel(»_{j=1}^{i} Aj) - Bel(»_{j=1}^{i-1} Aj)] / |Ai - (»_{j=1}^{i-1} Aj)|.     (D.2)
In the following, let m denote the basic probability assignment function
associated with the considered belief function Bel. Observe that in each iteration of the algorithm, values of m for some focal elements are converted to
probabilities. In pass 1, these are focal elements B such that B 債 A1; in pass 2, they are focal elements B such that B 債 A1 » A2 but B ⊄ A1. In general, in the ith iteration they are focal elements in the family

Fi = {B Õ X | B Õ »_{j=1}^{i} Aj and B ⊄ »_{j=1}^{i-1} Aj}.     (D.3)
The regularities of Algorithm 6.1, expressed by Eqs. (D.1)–(D.3), make it
possible to decompose S̄ and GH into n components, S̄i and GHi, one for each
iteration of the algorithm, such that
S̄(Bel) = Σ_{i=1}^{n} S̄_i(Bel),     (D.4)

GH(Bel) = Σ_{i=1}^{n} GH_i(Bel).     (D.5)
The proof that S̄(Bel) - GH(Bel) ≥ 0 then can be accomplished by proving
that S̄i(Bel) ≥ GHi(Bel) for all i Œ ⺞n. To pursue the proof, it is convenient to
prove the following Lemma first.
Lemma D.1. In each iteration of Algorithm 6.1,

pi £ 1 / |»_{j=1}^{i} Aj|.     (D.6)
Proof. Assume that inequality (D.6) does not hold for some i Œ ⺞n. Then,
pi > 1 / |»_{j=1}^{i} Aj|.     (D.7)
Due to the maximization of the assigned probabilities in each iteration of the
algorithm, pj > pi for all j < i. Hence,
pj > 1 / |»_{j=1}^{i} Aj|     (D.8)
for all j Œ ⺞i under the assumption (D.7).
According to Eq. (D.2), |Ai - »_{j=1}^{i-1} Aj| elements of X are assigned the probability pi in the ith iteration of the algorithm. Hence, the inequality
Σ_{j=1}^{i} |Aj - »_{k=1}^{j-1} Ak| pj £ 1     (D.9)
must hold since all the probabilities generated by the algorithm must add to
1. However,
Σ_{j=1}^{i} |Aj - »_{k=1}^{j-1} Ak| pj = |A1| p1 + |A2 - A1| p2 + · · · + |Ai - »_{j=1}^{i-1} Aj| pi
  > (1 / |»_{j=1}^{i} Aj|)(|A1| + |A2 - A1| + · · · + |Ai - »_{j=1}^{i-1} Aj|)     (by Eq. (D.8))
  = (1 / |»_{j=1}^{i} Aj|)(|A1| + (|A2 » A1| - |A1|) + · · · + (|»_{j=1}^{i} Aj| - |»_{j=1}^{i-1} Aj|))
  = 1,

which contradicts inequality (D.9).
䊏
Theorem D.1. For any belief function Bel and the associated basic probability assignment function m, S̄(m) - GH(m) ≥ 0.
Proof. Assume that Algorithm 6.1 terminates after n iterations. Let
GH_i = Σ_{BŒFi} m(B) log_2 |B|,     (D.10)
where Fi is defined by Eq. (D.3). Then,
Σ_{i=1}^{n} GH_i = GH(m)     (D.11)

and, since B 債 »_{j=1}^{i} Aj for all B Œ Fi,

Σ_{BŒFi} m(B) log_2 |B| £ Σ_{BŒFi} m(B) log_2 |»_{j=1}^{i} Aj|.     (D.12)
Now define

S̄_i = -|Ai - »_{j=1}^{i-1} Aj| pi log_2 pi,     (D.13)

which represents the contribution to S̄(m) by the probabilities generated in iteration i of the algorithm. That is,

Σ_{i=1}^{n} S̄_i = S̄(m).     (D.14)
Now observe that
Bel(»_{j=1}^{i} Aj) - Bel(»_{j=1}^{i-1} Aj) = Σ_{BŒFi} m(B).     (D.15)
Applying this equation and Eq. (D.2) to Eq. (D.13), we obtain
S̄_i = -|Ai - »_{j=1}^{i-1} Aj| · [Σ_{BŒFi} m(B) / |Ai - »_{j=1}^{i-1} Aj|] · log_2 pi
    = -Σ_{BŒFi} m(B) log_2 pi.     (D.16)
The proof is now completed by showing that S̄_i ≥ GH_i for all i Œ ⺞n:

S̄_i = -Σ_{BŒFi} m(B) log_2 pi
    ≥ -Σ_{BŒFi} m(B) log_2 (1 / |»_{j=1}^{i} Aj|)     (by Lemma D.1)
    = Σ_{BŒFi} m(B) log_2 |»_{j=1}^{i} Aj|
    ≥ Σ_{BŒFi} m(B) log_2 |B|     (by Eq. (D.12))
    = GH_i.     䊏
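As a small numerical illustration of Theorem D.1 (ours, not part of the text), the gh and aggregate_uncertainty sketches given at the ends of Appendices B and C can be combined; for any basic probability assignment, the computed aggregate uncertainty is at least the computed generalized Hartley measure.

```python
# Illustrative check of Theorem D.1, using the hypothetical gh() and
# aggregate_uncertainty() sketches from Appendices B and C.
m = {frozenset({'a'}): 0.5, frozenset({'a', 'b'}): 0.3, frozenset({'b', 'c'}): 0.2}
assert aggregate_uncertainty(m) >= gh(m) - 1e-12
```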
APPENDIX E

MAXIMUM OF GS_a IN SECTION 6.9
It is stated in Section 6.9 that the maximum, GS_a*(n), of the functional

GS_a(n) = -Σ_{i=1}^{n} m_i log_2 (m_i · i),     (E.1)

where m_i Œ [0, 1] for all i Œ ⺞n and

Σ_{i=1}^{n} m_i = 1,     (E.2)

is obtained when

m_i = (1/i) 2^{-(a + 1/ln 2)}     (E.3)

for all i Œ ⺞n; the value of a is determined by solving the equation

2^{-(a + 1/ln 2)} Σ_{i=1}^{n} (1/i) = 1.     (E.4)
In order to derive this maximum, it is convenient to use the method of Lagrange multipliers. Using the constraint Eq. (E.2) on the values of m_i, we form the Lagrange function

L = -Σ_{i=1}^{n} m_i log_2 (m_i · i) - a(Σ_{i=1}^{n} m_i - 1),
where a is a Lagrange multiplier. Now, setting the partial derivatives of L with respect to the individual variables m_i to zero, we obtain

∂L/∂m_i = -log_2 (m_i · i) - 1/ln 2 - a = 0

for all i Œ ⺞n. Solving these equations for m_i, we obtain

m_i = (1/i) 2^{-(a + 1/ln 2)}     (E.5)

for all i Œ ⺞n. Adding all these equations results in

1 = Σ_{i=1}^{n} (1/i) 2^{-(a + 1/ln 2)},
due to Eq. (E.2). This equation can be rewritten as

1 = 2^{-(a + 1/ln 2)} Σ_{i=1}^{n} (1/i).     (E.6)

Introducing, for convenience,

s_n = Σ_{i=1}^{n} (1/i),

and solving Eq. (E.6) for a, we readily obtain

a = -log_2 (1/s_n) - 1/ln 2.     (E.7)

Substituting this expression for a into Eq. (E.5), we obtain

m_i = (1/i) 2^{log_2 (1/s_n)} = 1/(s_n · i).

Finally, substituting this expression for each m_i into Eq. (E.1), we obtain

GS_a*(n) = Σ_{i=1}^{n} (1/i)(1/s_n) log_2 s_n = [(1/s_n) log_2 s_n] Σ_{i=1}^{n} (1/i) = log_2 s_n.
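The closed form log_2 s_n is easy to confirm numerically. The following sketch (ours, not part of the text) evaluates the functional of Eq. (E.1) at the distribution m_i = 1/(s_n · i) and checks that randomly chosen feasible distributions do not exceed it.

```python
import math
import random

def gs(m):
    # The functional of Eq. (E.1): -sum_i m_i * log2(m_i * i), with indices starting at 1.
    return -sum(mi * math.log2(mi * i) for i, mi in enumerate(m, start=1) if mi > 0)

n = 6
s_n = sum(1.0 / i for i in range(1, n + 1))
m_star = [1.0 / (s_n * i) for i in range(1, n + 1)]        # Eq. (E.5) with a from Eq. (E.7)

assert abs(gs(m_star) - math.log2(s_n)) < 1e-12            # the claimed maximum, log2 s_n
for _ in range(1000):                                      # random feasible points stay below it
    w = [random.random() + 1e-9 for _ in range(n)]
    m = [wi / sum(w) for wi in w]
    assert gs(m) <= gs(m_star) + 1e-9
```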
APPENDIX F

GLOSSARY OF KEY CONCEPTS
Additive (classical) measure. A set function m: C Æ ⺢+ defined on a nonempty
family C of subsets of a given set X for which m(Ø) = 0 and m(A » B) =
m(A) + m(B) for all A, B, A » B Œ C such that A « B = Ø.
Aggregate uncertainty. For a given convex set D of probability distributions,
aggregate uncertainty (subsuming nonspecificity and conflict) is defined by
the maximal value of the Shannon entropy within D.
Alternating Choquet capacity of order k. A subadditive measure m on ·X, CÒ that satisfies the inequality

m(«_{j=1}^{k} Aj) £ Σ_{KÕ⺞k, KπØ} (-1)^{|K|+1} m(»_{jŒK} Aj)

for all families of k sets in C.
Basic possibility function. A possibility measure restricted to singletons.
Basic probability assignment. For any given nonempty and finite set X, a function m: P(X) Æ [0, 1] such that m(Ø) = 0, Σ_{AÕX} m(A) = 1, and m(A) ≥ 0 for all A 債 X.
Belief measure. Choquet capacity of infinite order.
Bit. A unit of uncertainty: one bit of uncertainty is equivalent to uncertainty
regarding the truth or falsity of one elementary proposition.
Characteristic function. Function cA: X Æ {0, 1} by which a subset A of a given universal set X is defined. For each x Œ X, cA(x) = 1 if x is a member of A, and cA(x) = 0 if x is not a member of A.
Choquet capacity of order k. A superadditive measure m on ·X, CÒ that satisfies the inequalities

m(»_{j=1}^{k} Aj) ≥ Σ_{KÕ⺞k, KπØ} (-1)^{|K|+1} m(«_{jŒK} Aj)

for all families of k sets in C.
Choquet integral. Given a nonnegative, finite, and measurable function f on set X and a monotone measure m on a family C of subsets of X that contains Ø and X, the Choquet integral of f with respect to m on a given set A Œ C, (C)∫_A f dm, is defined by the formula

(C)∫_A f dm = ∫ m(A « aF) da,

where aF = {x | f(x) ≥ a}, a Œ [0, •).
Compatibility relation. Binary relation on X2 that is reflexive and symmetric.
Conflict. The type of information-based uncertainty that is measured by the
Shannon entropy and its various generalizations.
Convex fuzzy subset A of Rn. Fuzzy set whose a-cuts are convex subsets of Rn in the classical sense for all a Œ (0, 1].
Convex subset A of Rn. For any pair of points r = ·ri | i Œ ⺞nÒ and s = ·si | i Œ
⺞nÒ in A and every real number l Œ [0, 1], the point t = ·lri + (1 - l) si | i Œ
⺞nÒ is also in A.
Cutworthy property. Any property of classical sets that is fuzzified via the a-cut representation by requiring that it holds in the classical sense in all a-cuts of the fuzzy sets involved.
Defuzzification. A replacement of a given fuzzy interval by a single real
number that, in the context of a given application, best represents the fuzzy
interval.
Dempster–Shafer theory. Theory of uncertainty based on belief measures and
plausibility measures.
Disaggregated total uncertainty. For a given set D of probability distributions, either the pair

TU = ·GH, S̄ - GHÒ

or, alternatively, the pair

aTU = ·S̄ - S, SÒ,

where GH, S̄, and S denote, respectively, the generalized Hartley measure, the aggregate uncertainty, and the minimal value of the Shannon entropy within D.
Equivalence relation. Binary relation on X2 that is reflexive, symmetric, and
transitive.
Fuzzification. A process of imparting fuzzy structure to a definition (concept),
a theorem, or even a whole theory.
Fuzziness. The type of uncertainty that is not based on information deficiency,
but rather on the linguistic imprecision (vagueness) of natural language.
Fuzzy partition of set X. A finite family {Ai | Ai Œ F(X), Ai π Ø, i Œ ⺞n, n ≥ 1} of fuzzy subsets Ai of X such that for each x Œ X,

Σ_{iŒ⺞n} Ai(x) = 1.
Fuzzy complement. A function c: [0, 1] Æ [0, 1] that is monotonic decreasing
and satisfies c(0) = 1 and c(1) = 0; also it is usually continuous and such that
c(c(a)) = a for any a Œ [0, 1].
Fuzzy implication. Function J of the form [0, 1]2 Æ [0, 1] that for any truth
values a, b of given fuzzy propositions p, q, respectively, defines the truth
value J(a, b), of the proposition “if p, then q.”
Fuzzy number. Normal fuzzy sets on ⺢ whose support is bounded and whose
a-cuts are closed intervals of real numbers for all a Œ (0, 1].
Fuzzy relation. Fuzzy subset of a Cartesian product of several crisp sets.
Fuzzy system. A system whose variables range over states that are fuzzy
numbers or fuzzy intervals (or some other relevant fuzzy sets).
Generalized Hartley measure. A functional GH defined by the formula
GH(D) = Σ_{AÕX} mD(A) log_2 |A|,
where D is a given convex set of probability distributions on a finite set X
and mD is the Möbius representation associated with D.
Generalized Shannon entropy. For a given convex set D of probability distributions, the difference between the aggregate uncertainty and the
generalized Hartley measure or, alternatively, the minimal value of the
Shannon entropy within D.
Hartley-like measure of uncertainty. The functional defined by Eq. (2.38) by
which the uncertainty associated with any bounded and convex subset of
⺢n is measured.
Hartley measure of uncertainty. The functional H(E) = log2 |E|, where E is a
finite set of possible alternatives and uncertainty is measured in bits.
Information transmission. In every theory of uncertainty, the difference
between the sum of marginal uncertainties and the joint uncertainty.
Interval-valued probability distribution. For a given finite set X, a tuple ·[p̲(x), p̄(x)] | x Œ XÒ such that Σ_{xŒX} p̲(x) £ 1 and Σ_{xŒX} p̄(x) ≥ 1.
452
APPENDIX F. GLOSSARY OF KEY CONCEPTS
k-Monotone measures. An alternative name for Choquet capacities of order
k.
Linguistic variable. A variable whose states are fuzzy intervals assigned to
relevant linguistic terms.
Lower probability function. For any given set D of probability distribution
functions p on a finite set X, the lower probability function Dm is defined
for all sets A 債 X by the formula
Dm(A) = inf_{pŒD} Σ_{xŒA} p(x).
Measure of fuzziness of a fuzzy set. A functional that usually measures the
lack of distinction between a given fuzzy set and its complement.
Möbius representation. Given a monotone measure μ on ·X, P(X)Ò, where X is finite, its Möbius representation is a set function m defined for all A Œ P(X) by the formula

m(A) = Σ_{BÕA} (-1)^{|A - B|} μ(B).
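For finite X this formula can be applied directly; a brief illustrative sketch (ours, with the measure given as a dict over all subsets of X) follows.

```python
from itertools import combinations

def mobius(mu, X):
    # Möbius representation m(A) = sum over B subset of A of (-1)^{|A - B|} * mu(B).
    # mu: dict mapping frozenset -> measure value, defined on every subset of X.
    subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(sorted(X), r)]
    return {A: sum((-1) ** len(A - B) * mu[B] for B in subsets if B <= A) for A in subsets}

# Example on X = {x, y}: for a belief measure, the Möbius values recover its basic
# probability assignment (here m({x, y}) = 1.0 - 0.2 - 0.3 = 0.5).
mu = {frozenset(): 0.0, frozenset({'x'}): 0.2, frozenset({'y'}): 0.3, frozenset({'x', 'y'}): 1.0}
m = mobius(mu, {'x', 'y'})
assert abs(m[frozenset({'x', 'y'})] - 0.5) < 1e-12
```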
Monotone measure. A set function m: C Æ ⺢+ defined on a nonempty family C of subsets of a given set X for which m(Ø) = 0 and m(A) £ m(B) for all A, B Œ C such that A 債 B.
Necessity measure. A superadditive measure m for which m(A « B) =
min{m(A), m(B)}.
Nested family of crisp sets. Family of sets {A1, A2, . . . , An} such that Ai 債 Ai+1
for all i = 1, 2, . . . , n - 1.
Nonspecificity. The type of information-based uncertainty that is measured
by the Hartley measure and its various generalizations, or, alternatively, by
the difference between the maximum and minimum values of the Shannon
entropy within a given convex set of probability distributions.
Partial ordering. A binary relation on X2 that is reflexive, antisymmetric, and
transitive.
Partition of X. A disjoint family {A1, A2, . . . , An} of nonempty subsets of X such that »_{i=1}^{n} Ai = X.
Plausibility measure. Alternating Choquet capacity of infinite order.
Possibility measure. A subadditive measure m for which m(A » B) =
max{m(A), m(B)}.
Possibility profile. A tuple consisting of all values of a basic possibility function in decreasing order.
Relation. A subset (crisp or fuzzy) of a Cartesian product.
Shannon cross-entropy. The functional

S[p(x), q(x) | x Œ X] = Σ_{xŒX} p(x) log_2 [p(x)/q(x)],

where p and q are probability distribution functions and X is a finite set. When X = ⺢, p and q are probability density functions, and

S[p(x), q(x) | x Œ ⺢] = ∫_⺢ p(x) log_2 [p(x)/q(x)] dx.
Shannon entropy. The functional S[p(x) | x Œ X] = -Σ_{xŒX} p(x) log_2 p(x), where p is a probability distribution function on a finite set X. This functional measures the amount of uncertainty associated with function p.
Standard complement of fuzzy set A. Fuzzy set whose membership function
is defined by 1 - A(x) for all x Œ X.
Standard intersection of fuzzy sets A and B. Fuzzy set whose membership
function is defined by min{A(x), B(x)} for all x Œ X.
Standard union of fuzzy sets A and B. Fuzzy set whose membership function
is defined by max {A(x), B(x)} for all x ŒX.
Strong a-cut of fuzzy set A. Crisp set a+A = {x | A(x) > a}.
a-cut of fuzzy set A. Crisp set aA = {x | A(x) ≥ a}.
Subadditive measure. A monotone measure m on ·X, CÒ for which m(A » B)
£ m(A) + m(B) whenever A, B, A » B Œ C.
Sugeno l-measure. A monotone measure lm on ·X, P(X)Ò such that

lm(A » B) = lm(A) + lm(B) + l · lm(A) · lm(B)

for any disjoint subsets A, B of X, where l Œ (-1, •).
Superadditive measure. A monotone measure m on ·X, CÒ for which m(A » B) ≥ m(A) + m(B) whenever A, B, A » B Œ C.
Triangular conorm (t-conorm). A function u : [0, 1]2 Æ [0, 1] that is commutative, associative, monotone nondecreasing, and such that u(a, 0) = a for
all a Œ [0, 1]. Triangular conorms qualify as union operations of fuzzy sets.
Triangular norm (t-norm). A function i : [0, 1]2 Æ [0, 1] that is commutative,
associative, monotone nondecreasing, and such that i(a, 1) = a for all a Œ
[0, 1]. Triangular norms qualify as intersection operations of fuzzy sets.
Uncertainty-based information. The difference between a priori uncertainty
and a reduced a posteriori uncertainty.
Universal set. The collection of all objects that are of interest in a given
application.
Upper probability function. For any given set D of probability distribution functions p on a finite set X, the upper probability function Dm̄ is defined for all A 債 X by the formula

Dm̄(A) = sup_{pŒD} Σ_{xŒA} p(x).
APPENDIX G

GLOSSARY OF SYMBOLS

GENERAL SYMBOLS
{x, y, . . .}                 Set of elements x, y, . . .
{x | p(x)}                    Set determined by property p
·x1, x2, . . . , xnÒ          n-tuple
[xij]                         Matrix
[x1, x2, . . . , xn]          Vector
[a, b]                        Closed interval of real numbers between a and b
[a, b), (a, b]                Interval of real numbers closed in a and open in b, and vice versa
(a, b)                        Open interval of real numbers
[a, •)                        Set of real numbers greater than or equal to a
A, B, C, . . .                Arbitrary sets (crisp or fuzzy)
X                             Universal set (universe of discourse)
Ø                             Empty set
x Œ A                         Element x belongs to crisp set A
cA                            Characteristic function of crisp set A
A(x)                          Membership grade of x in fuzzy set A
A = B                         Equality of sets (crisp or fuzzy)
A π B                         Inequality of sets (crisp or fuzzy)
A - B                         Difference of crisp sets (A - B = {x | x Œ A and x 僆 B})
A 債 B                        Set inclusion of crisp or fuzzy sets
A Ã B                         Proper set inclusion (A 債 B and A π B) of crisp or fuzzy sets
SUB(A, B)                     Degree of subsethood of A in B
P(X)                          Set of all crisp subsets of X (power set)
F(X)                          Set of all fuzzy subsets of X (fuzzy power set)
|A|                           Cardinality of crisp or fuzzy set A (sigma count)
hA                            Height of fuzzy set A
Ā                             Complement of crisp set A
A « B                         Intersection of crisp sets
A » B                         Union of crisp sets
A ¥ B                         Cartesian product of crisp sets A and B
A^2                           Cartesian product A ¥ A
[a, b]^2                      Cartesian product [a, b] ¥ [a, b]
X Æ Y                         Function from X to Y
R(X, Y)                       Relation on X ¥ Y (crisp or fuzzy)
R ° Q                         Standard composition of relations R and Q (crisp or fuzzy)
R * Q                         Standard join of relations R and Q (crisp or fuzzy)
R^{-1}                        Inverse of a binary relation (crisp or fuzzy)
<                             Less than
£                             Less than or equal to (also used for a partial ordering)
∧                             Meet (greatest lower bound) of x and y in a lattice or logic conjunction
∨                             Join (least upper bound) of x and y in a lattice or logic disjunction
x | y                         x given y
⇒                             x implies y
⇔                             x if and only if y
∀                             For all (universal quantifier)
∃                             There exists (existential quantifier)
Σ                             Summation
Π                             Product
max{a1, a2, . . . , an}       Maximum of {a1, a2, . . . , an}
min{a1, a2, . . . , an}       Minimum of {a1, a2, . . . , an}
i, j, k                       Arbitrary identifiers (indices)
I, J, K                       General sets of identifiers
⺞                             Set of positive integers (natural numbers)
⺞n                            Set {1, 2, . . . , n}
⺢                             Set of all real numbers
⺢+                            Set of nonnegative real numbers
⺪                             Set of all integers
p(A)                          Partition of crisp set A
SPECIAL SYMBOLS
AU                            Aggregate uncertainty in Dempster–Shafer theory
aA                            a-Cut of fuzzy set A for some value a Œ [0, 1]
a+A                           Strong a-cut of fuzzy set A for some value a Œ [0, 1]
Bel                           Belief measure
c                             Fuzzy complement
(C)∫A f dm                    Choquet integral of function f with respect to monotone measure m
d(A)                          Defuzzified value of fuzzy set A
D                             Convex set of probability distributions (credal set)
F                             Set of focal elements in evidence theory
·F, mÒ                        Body of evidence
i                             Fuzzy intersection or t-norm
m                             Möbius representation of a lower probability function
Nec                           Necessity measure
NecF                          Necessity measure associated with a fuzzy proposition “V is F”
Pl                            Plausibility measure
Pos                           Possibility measure
PosF                          Possibility measure associated with a fuzzy proposition “V is F”
pX, pY                        Marginal probability distributions
r                             Possibility profile
u                             Fuzzy union or t-conorm
f(A)                          Measure of fuzziness of fuzzy set A
GH                            Generalized Hartley measure
GS                            Generalized Shannon entropy
h                             Averaging operation
H                             Hartley measure
HL                            Hartley-like measure (for convex subsets of ⺢n)
S                             Shannon entropy
S                             Minimal value of Shannon entropy within a given convex set of probability distributions (a generalized Shannon entropy)
S̄                             Aggregate uncertainty for convex sets of probability distributions
T                             Information transmission
TU                            Disaggregated total uncertainty TU = ·GH, S̄ - GHÒ
aTU                           Alternative disaggregated total uncertainty aTU = ·S̄ - S, SÒ
U                             U-Uncertainty (measure of nonspecificity in the theory of graded possibilities)
m                             Monotone measure
lm                            Sugeno l-measure
m̲                             Lower probability function
m̄                             Upper probability function
BIBLIOGRAPHY
Abellán, J., and Moral, S. [1999], “Completing a total uncertainty measure in the
Dempster–Shafer theory.” International Journal of General Systems, 28(4–5),
pp. 299–314.
Abellán, J., and Moral, S. [2000], “A non-specificity measure for convex sets of probability distributions.” International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems, 8(3), pp. 357–367.
Abellán, J., and Moral, S. [2003a], “Maximum of entropy for credal sets.” International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems, 11(5),
pp. 587–597.
Abellán, J., and Moral, S. [2003b], “Building classification trees using the total
uncertainty criterion.” International Journal of Intelligent Systems, 18(12),
pp. 1215–1225.
Abellán, J., and Moral, S. [2004], “Range of entropy for credal sets.” In M. Lopez-Díaz
et al. (eds.), Soft Methodology and Random Information Systems. Springer, Berlin
and Heidelberg, pp. 157–164.
Abellán, J., and Moral, S. [2005], “Difference of entropies as a non-specificity function
on credal sets.” International Journal of General Systems, 34(3), pp. 203–217.
Aczél, J. [1966], Lectures on Functional Equations and Their Applications. Academic
Press, New York.
Aczél, J. [1984], “Measuring information beyond communication theory.” Information
Processing & Management, 20(3), pp. 383–395.
Aczél, J., and Daróczy, Z. [1975], On Measures of Information and Their Characterizations. Academic Press, New York.
Aczél, J.; Forte, B.; and Ng, C. T. [1974], “Why the Shannon and Hartley entropies are
‘natural’.” Advances in Applied Probability, 6, pp. 131–146.
Alefeld, G., and Herzberger, J. [1983], Introduction to Interval Computation. Academic
Press, New York.
Applebaum, D. [1996], Probability and Information: An Integrated Approach. Cambridge University Press, Cambridge and New York.
Arbib, M. A., and Manes, E. G. [1975], “A category-theoretic approach to systems in a
fuzzy world.” Synthese, 30, pp. 381–406.
Ash, R. B. [1965], Information Theory. Interscience, New York (reprinted by Dover,
New York, 1990).
Ashby, W. R. [1964], “Constraint analysis of many-dimensional relations.” General
Systems Yearbook, 9, pp. 99–105.
Ashby, W. R. [1965], “Measuring the internal informational exchange in a system.”
Cybernetica, 8(1), pp. 5–22.
Ashby, W. R. [1969], “Two tables of identities governing information flows within large
systems.” ASC Communications, 1(2), pp. 3–8.
Ashby, W. R. [1970], “Information flows within coordinated systems.” In J. Rose (ed.),
Progress in Cybernetics, Vol. 1. Gordon and Breach, London, pp. 57–64.
Ashby, W. R. [1972], “Systems and their informational measures.” In G. J. Klir (ed.),
Trends in General Systems Theory. Wiley-Interscience, New York, pp. 78–97.
Atanassov, K. T. [1986], “Intuitionistic fuzzy sets.” Fuzzy Sets and Systems, 20(1), pp.
87–96.
Atanassov, K. T. [2000], Intuitionistic Fuzzy Sets. Springer-Verlag, New York.
Attneave, F. [1959], Applications of Information Theory to Psychology. Holt, Rinehart
& Winston, New York.
Aubin, J. P., and Frankowska, H. [1990], Set-Valued Analysis. Birkhäuser, Boston.
Auman, R. J., and Shapley, L. S. [1974], Values of Non-Atomic Games. Princeton University Press, Princeton, NJ.
Avgers, T. G. [1983], “Axiomatic derivation of the mutual information principle as a
method of inductive inference.” Kybernetes, 12(2), pp. 107–113.
Babuška, R. [1998], Fuzzy Modeling for Control. Kluwer, Boston.
Ban, A. I., and Gal, S. G. [2002] Defects of Properties in Mathematics. World Scientific,
Singapore.
Bandler, W., and Kohout, L. J. [1980a], “Fuzzy power set and fuzzy implication operators.” Fuzzy Sets and Systems, 4(1), pp. 13–30.
Bandler, W., and Kohout, L. J. [1980b], “Semantics of implication operators and fuzzy
relational products.” International Journal of Man-Machine Studies, 12(1), pp. 89–116.
Bandler, W., and Kohout, L. J. [1988], “Special properties, closures and interiors of crisp
and fuzzy relations.” Fuzzy Sets and Systems, 26(3), pp. 317–331.
Bandler, W., and Kohout, L. J. [1993], “Cuts commute with closures.” In: R. Lowen and
M. Roubens (eds.), Fuzzy Logic: State of the Art. Kluwer, Dordrecht and Boston,
pp. 161–167.
Banon, G. [1981], “Distinction between several subsets of fuzzy measures.” Fuzzy Sets
and Systems, 5(3), pp. 291–305.
Bárdossy, G., and Fodor, J. [2004], Evaluation of Uncertainties and Risks in Geology.
Springer, Berlin and Heidelberg.
Batten, D. F. [1983], Spatial Analysis of Interacting Economics. Kluwer-Nighoff, Boston.
Bell, D. A. [1953], Information Theory and Its Engineering Applications. Pitman, New
York.
Bell, D. A.; Guan, J. W.; and Shapcott, C. M. [1998], “Using the Dempster-Shafer orthogonal sum for reasoning which involves space.” Kybernetes, 27(5), pp. 511–526.
Bellman, R. [1961], Adaptive Control Processes: A Guided Tour. Princeton University
Press, Princeton, NJ.
Bĕlohlávek, R. [2002], Fuzzy Relational Systems. Kluwer/Plenum, New York.
Bĕlohlávek, R. [2003], “Cutlike semantics for fuzzy logic and its applications.” International Journal of General Systems, 32(4), pp. 305–319.
Ben-Haim, Y. [2001], Information-Gap Decision Theory. Academic Press, San Diego.
Benvenuti, P., and Mesiar, R. [2000],“Integrals with respect to a general fuzzy measure.”
In M. Grabish et al. (eds.), Fuzzy Measures and Integrals. Springer-Verlag, New
York, pp. 203–232.
Bezdek, J. C.; Dubois, D.; and Prade, H. (eds) [1999], Fuzzy Sets in Approximate Reasoning and Information Systems. Kluwer, Boston.
Bharathi-Devi, B., and Sarma, V. V. S. [1985], “Estimation of fuzzy memberships from
histograms.” Information Sciences, 35(1), pp. 43–59.
Bhattacharya, P. [2000], “On the Dempster-Shafer evidence theory and nonhierarchical aggregation of belief structures.” IEEE Transactions on Systems, Man,
and Cybernetics, 30(5), pp. 526–536.
Billingsley, P. [1965], Ergodic Theory and Information. John Wiley, New York.
Billingsley, P. [1986], Probability and Measure. John Wiley, New York.
Billot, A. [1992], Economic Theory of Fuzzy Equilibria: An Axiomatic Analysis.
Springer-Verlag, New York.
Black, M. [1937], “Vagueness: an exercise in logical analysis.” Philosophy of Science, 4,
pp. 427–455 (reprinted in International Journal of General Systems, 17(2–3), 1990,
pp. 107–128).
Black, P. K. [1997], “Geometric structure of lower probabilities.” In J. Goutsias, R. P. S.
Mahler, and H. T. Nguyen (eds.), Random Sets. Springer, New York, pp. 361–383.
Blahut, R. E. [1987], Principles and Practice of Information Theory. Addison-Wesley,
Reading, MA.
Boekee, D. E., and Van Der Lubbe, J. C. A. [1980], “The R-norm information measure.”
Information and Control, 45(2), pp. 136–155.
Bolc, L., and Borowic, P. [1992], Many-Valued Logics: Theoretical Foundations.
Springer-Verlag, Berlin and Heidelberg.
Bordley, R. F. [1983], “A central principle of science: optimization.” Behavioral Science,
28(1), pp. 53–64.
Borgelt, C., and Kruse, R. [2002], Graphical Models: Methods for Data Analysis and
Mining. John Wiley, New York.
Brillouin, L. [1956], Science and Information Theory. Academic Press, New York.
Brillouin, L. [1964], Scientific Uncertainty and Information. Academic Press, New York.
Broekstra, G. [1976–77], “Constraint analysis and structure identification.” Annals of
Systems Research I, 5, pp. 67–80; II, 6, 1–20.
Broekstra, G. [1980], “On the foundation of GIT (General Information Theory).”
Cybernetics and Systems, 11, pp. 143–165.
Buck, B., and Macaulay, V. A. [1991], Maximum Entropy in Action. Oxford University
Press, New York.
Buckley, J. J. [2003], Fuzzy Probabilities. Physica-Verlag/Springer-Verlag, Heidelberg
and New York.
Cai, K. Y. [1996], Introduction to Fuzzy Reliability. Kluwer, Boston.
Cano, A., and Moral, S. [2000], “Algorithms for imprecise probabilities.” In J. Kohlas
and S. Moral (eds.), Algorithms for Uncertainty and Defeasible Reasoning. Kluwer,
Dordrecht and Boston, pp. 369–420.
Caratheodory, C. [1963], Algebraic Theory of Measure and Integration. Chelsea, New
York.
Carlsson, C.; Fedrizzi, M.; and Fuller, R. [2004], Fuzzy Logic in Management. Kluwer,
Boston.
Cavallo, R. E., and Klir, G. J. [1979], “Reconstructability analysis of multi-dimensional
relations.” International Journal of General Systems, 5(3), pp. 143–171.
Cavallo, R. E., and Klir, G. J. [1981], “Reconstructability analysis: evaluation of reconstruction hypotheses.” International Journal of General Systems, 7(1), pp. 7–32.
Cavallo, R. E., and Klir, G. J. [1982a], “Decision making in reconstructability analysis.”
International Journal of General Systems, 8(4), pp. 243–255.
Cavallo, R. E., and Klir, G. J. [1982b],“Reconstruction of possibilistic behavior systems.”
Fuzzy Sets and Systems, 8(2), pp. 175–197.
Chaitin, G. J. [1987], Information, Randomness, and Incompleteness: Papers on Algorithmic Information Theory. World Scientific, Singapore.
Chameau, J. L., and Santamarina, J. C. [1987], “Membership functions I, II.” International Journal of Approximate Reasoning, 1(3), pp. 287–301, 303–317.
Chateauneuf, A., and Jaffray, J. Y. [1989], “Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion.” Mathematical Social Sciences, 17, pp. 263–283.
Chau, C. W. R.; Lingras, P.; and Wong, S. K. M. [1993], “Upper and lower entropies of
belief functions using compatible probability functions.” In J. Komorovski and Z. W.
Ras (eds.), Methodologies for Intelligent Systems. Springer-Verlag, New York, pp.
303–315.
Chellas, B. F. [1980], Modal Logic: An Introduction. Cambridge University Press,
Cambridge and New York.
Cherry, C. [1957], On Human Communication. MIT Press, Cambridge, MA.
Chokr, B. A., and Kreinovich V. Y. [1991], “How far are we from the complete knowledge? Complexity of knowledge acquisition in the Dempster-Shafer approach.”
Proc. of the Fourth Univ. of New Brunswick Artificial Intelligence Workshop,
Fredericton, N.B. Canada, pp. 551–561.
Chokr, B. A., and Kreinovich, V. [1994], “How far are we from the complete knowledge? Complexity of knowledge acquisition in the Dempster-Shafer approach.” In
R. R. Yager, M. Federizzi, and J. Kacprzyk (eds.), Advances in the Dempster-Shafer
Theory of Evidence. John Wiley, New York, pp. 555–576.
Choquet, G. [1953–54], “Theory of capacities.” Annales de L’Institut Fourier, 5, pp.
131–295.
Christensen, R. [1980–81], Entropy Minimax Sourcebook. Entropy Limited, Lincoln,
MA.
Christensen, R. [1985],“Entropy minimax multivariate statistical modeling—I:Theory.”
International Journal of General Systems, 11(3), pp. 231–277.
Christensen, R. [1986], “Entropy minimax multivariate statistical modeling—II: Applications.” International Journal of General Systems, 12(3), pp. 227–305.
Clarke, M.; Kruse, R.; and Moral, S. (eds.) [1993], Symbolic and Quantitative Approaches
to Reasoning and Uncertainty. Springer-Verlag, Berlin and New York.
Coletti, G.; Dubois, D.; and Scozzafava, R. (eds.) [1995], Mathematical Models for Handling Partial Knowledge in Artificial Intelligence. Plenum Press, New York and
London.
Coletti, G., and Scozzafava, R. [2002], Probabilistic Logic in a Coherent Setting. Kluwer,
Boston.
Colyvan, M. [2004], “The philosophical significance of Cox’s theorem.” International
Journal of Approximate Reasoning, 37(1), pp. 71–85.
Conant, R. C. [1976], “Laws of information which govern systems.” IEEE Transactions
on Systems, Man, and Cybernetics, 6(4), pp. 240–255.
Conant, R. C. [1981], “Efficient proofs of identities in N-dimensional information
theory.” Cybernetica, 24(3), pp. 191–197.
Conant, R. C. [1988], “Extended dependency analysis of large systems—Part I:
Dynamic analysis; Part II: Static analysis.” International Journal of General Systems,
14(2), pp. 97–141.
Cordón, O.; Herrera, F.; Hoffmann, F.; and Magdalena, L. [2001], Genetic Fuzzy
Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases. World Scientific, Singapore.
Cover, T. M., and Thomas, J. A. [1991], Elements of Information Theory. John Wiley,
New York.
Cox, R. T. [1946], “Probability, frequency, and reasonable expectation.” American
Journal of Physics, 14(1), pp. 1–13.
Cox, R. T. [1961], The Algebra of Probable Inference. Johns Hopkins Press, Baltimore.
Csiszár, I., and Körner, J. [1981], Information Theory: Coding Theorems for Discrete
Memoryless Systems. Academic Press, New York.
Daróczy, Z. [1970], “Generalized information functions.” Information and Control,
16(1), pp. 36–51.
De Baets, B., and Kerre, E. E. [1994], “A primer on solving fuzzy relational equations
on the unit interval.” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2(2), pp. 205–225.
De Bono, E. [1991], I Am Right, You Are Wrong: From Rock Logic to Water Logic.
Viking Penguin, New York.
De Campos, L. M., and Bolaños, M. J. [1989], “Representation of fuzzy measures
through probabilities.” Fuzzy Sets and Systems, 31(1), pp. 23–36.
De Campos, L. M., and Bolaños, M. J. [1992], “Characterization and comparison of
Sugeno and Choquet integrals.” Fuzzy Sets and Systems, 52(1), pp. 61–67.
De Campos, L. M., and Huete, J. F. [1993], “Independence concepts in upper and lower
probabilities.” In B. Bouchon-Meunier, L. Velverde, and R. R. Yager (eds.), Uncertainty in Intelligent Systems. North-Holland, Amsterdam, pp. 85–96.
De Campos, L. M., and Huete, J. F. [1999], “Independence concepts in possibility
theory.” Fuzzy Sets and Systems, 103(1&3), pp. 127–152 & 487–505.
De Campos, L. M.; Huete, J. F.; and Moral, S. [1994], “Probability intervals: a tool
for uncertain reasoning.” International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, 2(2), pp. 167–196.
De Campos, L. M.; Lamata, M. T.; and Moral, S. [1990], “The concept of conditional
fuzzy measure.” International Journal of Intelligent Systems, 5(3), pp. 237–246.
De Cooman, G. [1997], “Possibility theory—I, II, III.” International Journal of General
Systems, 25(4), pp. 291–371.
De Cooman, G.; Ruan, D.; and Kerre, E. E. (eds.) [1995], Foundations and Applications
of Possibility Theory. World Scientific, Singapore.
De Finetti, B. [1974], Theory of Probability (Vol. 1). John Wiley, New York and London.
De Finetti, B. [1975], Theory of Probability, Vol. 2. John Wiley, New York and London.
De Luca, A., and Termini, S. [1972], “A definition of a nonprobabilistic entropy in the
setting of fuzzy sets theory.” Information and Control, 20(4), pp. 301–312.
De Luca, A., and Termini, S. [1974], “Entropy of L-fuzzy sets.” Information and Control,
24(1), pp. 55–73.
Delgado, M., and Moral, S. [1989], “Upper and lower fuzzy measures.” Fuzzy Sets and
Systems, 33(2), pp. 191–200.
Delmotte, F. [2001], “Comparison of the performances of decision aimed algorithms
with Bayesian and belief bases.” International Journal of Intelligent Systems, 16(8),
pp. 963–981.
Dembo, A.; Cover, T. C.; and Thomas, J. A. [1991], “Information theoretic inequalities.”
IEEE Transactions on Information Theory, 37(6), pp. 1501–1518.
Demicco, R. V., and Klir, G. J. [2003], Fuzzy Logic in Geology. Academic Press, San
Diego.
Dempster, A. P. [1967a], “Upper and lower probabilities induced by a multivalued
mapping.” Annals of Mathematical Statistics, 38(2), pp. 325–339.
Dempster, A. P. [1967b], “Upper and lower probability inferences based on a sample
from a finite univariate population.” Biometrika, 54, pp. 515–528.
Dempster, A. P. [1968a], “A generalization of Bayesian inference.” Journal of the Royal
Statistical Society, Ser. B, 30, pp. 205–247.
Dempster, A. P. [1968b], “Upper and lower probabilities generated by a random closed
interval.” Annals of Mathematical Statistics, 39, pp. 957–966.
Denneberg, D. [1994], Non-Additive Measure and Integral. Kluwer, Boston.
Devlin, K. [1991], Logic and Information. Cambridge University Press, Cambridge and
New York.
Di Nola, A.; Sessa, S.; Pedrycz, W.; and Sanches, E. [1989], Fuzzy Relation Equations
and Their Applications to Knowledge Engineering. Kluwer, Dordrecht.
Dockx, S., and Bernays, P. (eds), [1965], Information and Prediction in Science. Academic Press, New York.
Dretske, F. I. [1981], Knowledge and the Flow of Information. MIT Press, Cambridge,
MA.
Dretske, F. I. [1983], “Precis of knowledge and the flow of information.” Behavioral and
Brain Sciences, 6, pp. 55–90.
Dubois, D.; Lang, J.; and Prade, H. [1994], “Possibilistic logic.” In D. M. Gabbay, et al.
(eds.), Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3.
Clarendon Press, Oxford, pp. 439–513.
Dubois, D.; Moral, S.; and Prade, H. [1997], “A semantics for possibility theory based
on likelihoods.” Journal of Mathematical Analysis and Applications, 205, pp. 359–380.
Dubois, D.; Nguyen, H. T.; and Prade, H. [2000] “Possibility theory, probability theory
and fuzzy sets: misunderstandings, bridges and gaps.” In D. Dubois and H. Prade
(ed.), Fundamentals of Fuzzy Sets. Kluwer, Boston, pp. 343–438.
Dubois, D., and Prade, H. [1978], “Operations on fuzzy numbers.” International Journal
of Systems Science, 9(6), pp. 613–626.
Dubois, D., and Prade, H. [1979], “Fuzzy real algebra: some results.” Fuzzy Sets and
Systems, 2(4), pp. 327–348.
Dubois, D., and Prade, H. [1980], Fuzzy Sets and Systems: Theory and Applications.
Academic Press, New York.
Dubois, D., and Prade, H. [1982a], “Towards fuzzy differential calculus.” Fuzzy Sets and
Systems, 8(1), pp. 1–17; 8(2), 105–116; 8(3), 225–233.
Dubois, D., and Prade, H. [1982b], “A class of fuzzy measures based on triangular
norms.” International Journal of General Systems, 8(1), pp. 43–61.
Dubois, D., and Prade, H. [1985a], “Evidence measures based on fuzzy information.”
Automatica, 21, pp. 547–562.
Dubois, D., and Prade, H. [1985b], “A review of fuzzy set aggregation connectives.”
Information Sciences, 36(1–2), pp. 85–121.
Dubois, D., and Prade, H. [1985c], “A note on measures of specificity for fuzzy sets.”
International Journal of General Systems, 10(4), pp. 279–283.
Dubois, D., and Prade, H. [1986a], “A set-theoretic view of belief functions.” International Journal of General Systems, 12(3), pp. 193–226.
Dubois, D., and Prade, H. [1986b], “On the unicity of Dempster rule of combination.”
International Journal of Intelligent Systems, 1(2), pp. 133–142.
Dubois, D., and Prade, H. [1987a], “Fuzzy numbers: an overview.” In J. C. Bezdek (ed.),
Analysis of Fuzzy Information—Vol. 1: Mathematics and Logic. CRC Press, Boca
Raton, FL, pp. 3–39.
Dubois, D., and Prade, H. [1987b], “The principle of minimum specificity as a basis for
evidential reasoning.” In B. Bouchon and R. R. Yager (eds.), Uncertainty in Knowledge-Based Systems. Springer-Verlag, Berlin, pp. 75–84.
Dubois, D., and Prade, H. [1987c], “Two-fold fuzzy sets and rough sets: some issues in
knowledge representation.” Fuzzy Sets and Systems, 23(1), pp. 3–18.
Dubois, D., and Prade, H. [1987d], “Properties of measures of information in evidence
and possibility theories.” Fuzzy Sets and Systems, 24(2), pp. 161–182.
Dubois, D., and Prade, H. [1988a], Possibility Theory. Plenum Press, New York.
Dubois, D., and Prade, H. [1988b], “Modelling uncertainty and inductive inference: a
survey of recent non-additive probability systems.” Acta Psychologica, 68, pp. 53–78.
Dubois, D., and Prade, H. [1990a], “Consonant approximations of belief functions.”
International Journal of Approximate Reasoning, 4(5–6), pp. 419–449.
Dubois, D., and Prade, H. [1990b], “Rough fuzzy sets and fuzzy rough sets.” International Journal of General Systems, 17(2–3), pp. 191–209.
Dubois, D., and Prade, H. [1991], “Fuzzy sets in approximate reasoning.” Fuzzy Sets
and Systems, 40(1), pp. 143–244.
Dubois, D., and Prade, H. [1992a], “On the combination of evidence in various mathematical frameworks.” In J. Flamm and T. Luisi (eds.), Reliability Data Collection and
Analysis. Kluwer, Dordrecht and Boston, pp. 213–241.
Dubois, D., and Prade, H. [1992b], “Putting rough sets and fuzzy sets together.” In
R. Slowinski (ed.), Intelligent Decision Support. Kluwer, Boston, pp. 203–232.
Dubois, D., and Prade, H. [1992c], “Evidence, knowledge, and belief functions.” International Journal of Approximate Reasoning, 6(3), pp. 295–319.
Dubois, D., and Prade, H. [1994], “A survey of belief revision and updating rules in
various uncertainty models.” International Journal of Intelligent Systems, 9(1), pp.
61–100.
Dubois, D., and Prade, H. [1998], “Possibility Theory: Qualitative and Quantitative
Aspects.” In D. M. Gabbay and P. Smets [1998–], Handbook of Defeasible Reasoning and Uncertainty Management Systems, Kluwer, Dordrecht, Boston, and London,
Vol. 1, pp. 169–226.
Dubois, D., and Prade, H. (eds.) [1998-], The Handbooks of Fuzzy Sets Series. Kluwer,
Boston.
Dubois, D., and Prade, H. (eds.) [2000], Fundamentals of Fuzzy Sets. Kluwer, Boston.
Dvořák, A. [1999], “On linguistic approximation in the frame of fuzzy logic deduction.” Soft Computing, 3(2), pp. 111–115.
Ebanks, B.; Sahoo, P.; and Sanders, W. [1997], Characterizations of Information Measures. World Scientific, Singapore.
Eckschlager, K. [1979], Information Theory as Applied to Chemical Analysis. John
Wiley, New York.
Elsasser, W. M. [1937], “On quantum measurements and the role of the uncertainty
relations in statistical mechanics.” Physical Review, 52, pp. 987–999.
Fagin, R., and Halpern, J. Y. [1991], “A new approach to updating beliefs.” In P. P.
Bonissone et al. (eds.), Uncertainty in Artificial Intelligence 6. North-Holland,
Amsterdam and New York, pp. 347–374.
Fast, J. D. [1962], Entropy. Gordon and Breach, New York.
Feinstein, A. [1958], Foundations of Information Theory. McGraw-Hill, New York.
Feller, W. [1950], An Introduction to Probability Theory and Its Applications, Vol. I.
John Wiley, New York.
Feller, W. [1966], An Introduction to Probability Theory and Its Applications, Vol. II.
John Wiley, New York.
Fellin, W. et al. [2005], Analyzing Uncertainty in Civil Engineering. Springer, Berlin and
Heidelberg.
Ferdinand, A. E. [1974], “A theory of system complexity.” International Journal of
General Systems, 1(1), pp. 19–35.
Ferson, S. [2002], RAMAS Risk Calc 4.0 Software: Risk Assessment with Uncertainty
Numbers. Lewis Publishers, Boca Raton, FL.
Ferson, S.; Ginzburg, L.; Kreinovich, V.; Myers, D.; and Sentz, K. [2003], Constructing
Probability Boxes and Dempster–Shafer Structures. Sandia National Laboratory,
SAND 2002–4015, Albuquerque, NM.
Ferson, S., and Hajagos, J. G. [2004], “Arithmetic with uncertainty numbers: rigorous
and (often) best possible answers.” Reliability Engineering & System Safety, 85(1–3), pp.
135–152.
Fine, T. L. [1973], Theories of Probability: An Examination of Foundations. Academic
Press, New York.
Fisher, R. A. [1950], Contributions to Mathematical Statistics. John Wiley, New York.
Forte, B. [1975], “Why the Shannon entropy?” Symposia Mathematica, 15, pp. 137–152.
Frieden, B. R. [1998], Physics from Fisher Information: A Unification. Cambridge University Press, Cambridge and New York.
Gabbay, D. M., and Smets, P. (eds.) [1998-], Handbook of Defeasible Reasoning and
Uncertainty Management Systems. Kluwer, Dordrecht, Boston, and London.
Gaines, B. R. [1978], “Fuzzy and Probability Uncertainty Logics.” Information and
Control, 38, pp. 154–169.
Garner, W. R. [1962], Uncertainty and Structure as Psychological Concepts. John Wiley,
New York.
Gatlin, L. L. [1972], Information Theory and the Living System. Columbia University
Press, New York.
Geer, J. F., and Klir, G. J. [1991], “Discord in possibility theory.” International Journal
of General Systems, 19(2), pp. 119–132.
Geer, J. F., and Klir, G. J. [1992], “A mathematical analysis of information preserving
transformations between probabilistic and possibilistic formulations of uncertainty.”
International Journal of General Systems, 20(2), pp. 143–176.
Georgescu-Roegen, N. [1971], The Entropy Law and the Economic Process. Harvard
University Press, Cambridge, MA.
Gerla, G. [2001], Fuzzy Logic: Mathematical Tools for Approximate Reasoning. Kluwer,
Boston.
Giachetti, R. E., and Young, R. E. [1997], “A parametric representation of fuzzy
numbers and their arithmetic operators.” Fuzzy Sets and Systems, 91(2), pp. 185–202.
Gibbs, J. W. [1902], Elementary Principles in Statistical Mechanics. Yale University Press,
New Haven (reprinted by Ox Bow Press, Woodbridge, CT, 1981).
Gil, M. A. (ed.) [2001], “Special Issue on Fuzzy Random Variables.” Information Sciences, 133(1–2), pp. 1–100.
Glasersfeld, E. von [1995], Radical Constructivism: A Way of Knowing and Learning.
The Falmer Press, London.
Gnedenko, B. V. [1962], Theory of Probability. Chelsea, New York.
Godo, L., and Sandri, S. [2004],“Special Issue on Possibilistic Logic and Related Issues.”
Fuzzy Sets and Systems, 144(1), pp. 1–249.
Goguen, J. A. [1967], “L-fuzzy sets.” Journal of Mathematical Analysis and Applications,
18, pp. 145–174.
Goguen, J. A. [1968–69], “The logic of inexact concepts.” Synthese, 19, pp. 325–373.
Goguen, J. A. [1974], “Concept representation in natural and artificial languages:
axioms, extensions and applications for fuzzy sets.” International Journal of Man-Machine Studies, 6(5), pp. 513–561.
Goldman, S. [1953], Information Theory. Prentice Hall, Englewood Cliffs, NJ.
(Reprinted by Dover, New York, 1968.)
Good, I. J. [1950], Probability and the Weighting of Evidence. Hafner, New York, Charles
Griffin, London.
Good, I. J. [1962], “Subjective probability as the measure of a non-measurable set.” In E.
Nagel (ed.), Logic, Methodology, and Philosophy of Science. Stanford University
Press, Stanford, CA.
Good, I. J. [1983], Good Thinking. Minnesota University Press, Minneapolis.
Goodman, I. R., and Nguyen, H. T. [1985], Uncertainty Models for Knowledge-Based
Systems. North-Holland, New York.
Gottwald, S. [1979], “Set theory for fuzzy sets of higher level.” Fuzzy Sets and Systems,
2(2), pp. 125–151.
Gottwald, S. [1993], Fuzzy Sets and Fuzzy Logic. Verlag Vieweg, Wiesbaden.
Gottwald, S. [2001], A Treatise on Many-Valued Logics. Research Studies Press,
Baldock, UK.
Goutsias, J.; Mahler, R. P. S.; and Nguyen, H. T. (eds.) [1997], Random Sets: Theory and
Applications. Springer-Verlag, New York.
Grabisch, M. [1997a], “k-order additive discrete fuzzy measures and their representation.” Fuzzy Sets and Systems, 92(2), pp. 167–189.
Grabisch, M. [1997b], “Alternative representations of discrete fuzzy measures for decision making.” International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems, 5(5), pp. 587–607.
Grabisch, M. [1997c], “Fuzzy measures and integrals: a survey of applications and
recent issues.” In D. D. Dubois, H. Prade, and R. R. Yager (eds.), Fuzzy Information
Engineering. John Wiley, New York, pp. 507–529.
Grabisch, M. [2000], “The interaction and Möbius representations of fuzzy measures
on finite spaces, k-additive measures: a survey.” In M. Grabisch et al. (eds.), Fuzzy
Measures and Integrals: Theory and Applications. Springer-Verlag, New York, pp.
70–93.
Grabisch, M.; Murofushi, T.; and Sugeno, M. (eds.) [2000], Fuzzy Measures and Integrals: Theory and Applications. Springer-Verlag, New York.
Grabisch, M.; Nguyen, H. T.; and Walker, E. A. [1995], Fundamentals of Uncertainty
Calculi With Applications to Fuzzy Inference. Kluwer, Dordrecht and Boston.
Gray, R. M. [1990], Entropy and Information Theory. Springer-Verlag, New York.
Greenberg, H. J. [2001], “Special Issue on Representations of Uncertainty.” Annals of
Mathematics and Artificial Intelligence, 32(1–4), pp. 1–431.
Guan, J. W., and Bell, D. A. [1991–92], Evidence Theory and Its Applications: Vol. 1
(1991), Vol. 2 (1992). North-Holland, New York.
Guiasu, S. [1977], Information Theory with Applications. McGraw-Hill, New York.
Hacking, I. [1975], The Emergence of Probability. Cambridge University Press, Cambridge and New York.
Hájek, P. [1993], “Deriving Dempster’s rule.” In L. Valverde and R. R. Yager (eds.),
Uncertainty in Intelligent Systems. North-Holland, Amsterdam, pp. 75–83.
Hájek, P. [1998], Metamathematics of Fuzzy Logic. Kluwer, Boston.
Halmos, P. R. [1950], Measure Theory. Van Nostrand, Princeton, NJ.
Halpern, J. Y. [1999], “A counterexample to theorems of Cox and Fine.” Journal of Artificial Intelligence Research, 10, pp. 67–85.
Halpern, J. Y. [2003], Reasoning about Uncertainty. MIT Press, Cambridge, MA.
Hamming, R. W. [1980], Coding and Information Theory. Prentice Hall, Englewood
Cliffs, NJ.
Hansen, E. R. [1992], Global Optimization Using Interval Analysis. Marcel Dekker,
New York.
Harmanec, D. [1995], “Toward a characterization of uncertainty measure for the
Dempster-Shafer theory.” In Proceedings of the Eleventh International Conference
on Uncertainty in Artificial Intelligence, Montreal, Canada, pp. 255–261.
Harmanec, D. [1996], Uncertainty in Dempster-Shafer Theory. Ph.D. Dissertation in
Systems Science, T.J. Watson School, Binghamton University—SUNY, Binghamton,
NY.
Harmanec, D. [1997], “A note on uncertainty, Dempster rule of combination, and conflict.” International Journal of General Systems, 26(1–2), pp. 63–72.
Harmanec, D., and Klir, G. J. [1994], “Measuring total uncertainty in Dempster-Shafer
theory: a novel approach.” International Journal of General Systems, 22(4), pp.
405–419.
Harmanec, D., and Klir, G. J. [1997], “On information-preserving transformations.”
International Journal of General Systems, 26(3), pp. 265–290.
Harmanec, D.; Klir, G. J.; and Resconi, G. [1994], “On a modal logic interpretation of
Dempster-Shafer theory of evidence.” International Journal of Intelligent Systems,
9(10), pp. 941–951.
Harmanec, D.; Klir, G. J.; and Wang, Z. [1996], “Modal logic interpretation of
Dempster-Shafer theory: An infinite case.” International Journal of Approximate
Reasoning, 14(2–3), pp. 81–93.
Harmanec, D.; Resconi, G.; Klir, G. J.; and Pan, Y. [1996], “On the computation of uncertainty measure in Dempster-Shafer theory.” International Journal of General
Systems, 25(2), pp. 153–163.
Harmuth, H. F. [1992], Information Theory Applied to Space-Time Physics. World Scientific, River Edge, NJ.
Hartley, R. V. L. [1928], “Transmission of information.” The Bell System Technical
Journal, 7(3), pp. 535–563.
Hawkins, T. [1975], Lebesgue’s Theory of Integration: Its Origins and Development.
Chelsea, New York.
Helton, J. C., and Oberkampf, W. L. [2004], “Special Issue on Alternative Representations of
Epistemic Uncertainty.” Reliability Engineering & System Safety, 85(1–3), pp. 1–369.
Hernandez, E., and Recasens, J. [2004], “Indistinguishability relation in Dempster-Shafer theory of evidence.” International Journal of Approximate Reasoning, 37(3),
pp. 145–187.
Higashi, M., and Klir, G. J. [1982], “On measures of fuzziness and fuzzy complements.”
International Journal of General Systems, 8(3), pp. 169–180.
Higashi, M., and Klir, G. J. [1983a], “Measures of uncertainty and information based on
possibility distributions.” International Journal of General Systems, 9(1), pp. 43–58.
Higashi, M., and Klir, G. J. [1983b], “On the notion of distance representing information closeness: possibility and probability distributions.” International Journal of
General Systems, 9(2), pp. 103–115.
Higashi, M.; Klir, G. J.; and Pittarelli, M. A. [1984], “Reconstruction families of possibilistic structure systems.” Fuzzy Sets and Systems, 12(1), pp. 37–60.
Hirshleifer, J., and Riley, J. G. [1992], The Analytics of Uncertainty and Information.
Cambridge University Press, Cambridge, MA.
Hisdal, E. [1978], “Conditional possibilities, independence and noninteraction.” Fuzzy
Sets and Systems, 1(4), pp. 283–297.
Höhle, U. [1982], “Entropy with respect to plausibility measures.” In Proceedings of the
12th IEEE International Symposium on Multiple-Valued Logic, Paris, pp. 167–169.
Höhle, U., and Klement, E. P. (eds.) [1995], Non-Classical Logics and Their Applications to Fuzzy Subsets. Kluwer, Boston.
Höhle, U., and Rodabaugh, S. E. (eds.) [1999], Mathematics of Fuzzy Sets: Logic, Topology, and Measure Theory. Kluwer, Boston.
Huber, P. J. [1981], Robust Statistics. John Wiley, New York.
Hughes, G. E., and Cresswell, M. J. [1996], A New Introduction to Modal Logic.
Routledge, London and New York.
Hyvärinen, L. P. [1968], Information Theory for Systems Engineers. Springer-Verlag,
New York.
Ihara, S. [1993], Information Theory for Continuous Systems. World Scientific,
Singapore.
Jaffray, J. Y. [1997], “On the maximum of conditional entropy for upper/lower probabilities generated by random sets.” In J. Goutsias, R. P. S. Mahler, and H. T. Nguyen
(eds.), Random Sets: Theory and Applications. Springer, New York, pp. 107–128.
Jaynes, E. T. [1968], “Prior probabilities.” IEEE Transactions on Systems Science and
Cybernetics, 4(3), pp. 227–241.
Jaynes, E. T. [1979], “Where do we stand on maximum entropy?” In R. L. Levine and
M. Tribus (eds.), The Maximum Entropy Formalism. MIT Press, Cambridge, MA,
pp. 15–118.
Jaynes, E. T. [2003], Probability Theory: The Logic of Science. Cambridge University
Press, Cambridge, MA.
Jeffreys, H. [1939], Theory of Probability. Oxford University Press, Oxford.
Jelinek, F. [1968], Probabilistic Information Theory: Discrete and Memoryless Models.
McGraw-Hill, New York.
Jiroušek, R.; Kleiter, G. D.; and Vejnarová, J. [2003], “Special Issue on Uncertainty
Processing.” Soft Computing, 7(5), pp. 279–368.
Jiroušek, R., and Vejnarová, J. [2003], “General framework for multidimensional
models.” International Journal of Intelligent Systems, 18(1), pp. 107–127.
John, R. [1998], “Type 2 fuzzy sets: an appraisal of theory and applications.” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(6), pp.
563–576.
Jones, B. [1982], “Determination of reconstruction families.” International Journal of
General Systems, 8(4), pp. 225–228.
Jones, B. [1985], “Reconstructability considerations with arbitrary data.” International
Journal of General Systems, 11(2), pp. 143–151.
Jones, B. [1986], “K-systems versus classical multivariate systems.” International Journal
of General Systems, 12(1), pp. 1–6.
Jones, D. S. [1979], Elementary Information Theory. Clarendon Press, Oxford.
Josang, A., and McAnally, D. [2005], “Multiplication and comultiplication of beliefs.”
International Journal of Approximate Reasoning, 38(1), pp. 19–51.
Joslyn, C. [1994], Possibilistic Processes for Complex Systems Modeling. Ph.D. Dissertation in Systems Science, T.J. Watson School, SUNY-Binghamton, Binghamton,
NY.
Joslyn, C. [1997], “Measurement of possibilistic histograms from interval data.” International Journal of General Systems, 26(1–2), pp. 9–33.
Joslyn, C., and Klir, G. J. [1992], “Minimal information loss in possibilistic approximations of random sets.” Proceedings of the IEEE International Conference on Fuzzy
Systems, San Diego, pp. 1081–1088.
Jumarie, G. [1986], Subjectivity, Information Systems: Introduction to a Theory of Relativistic Cybernetics. Gordon and Breach, New York.
Jumarie, G. [1990], Relative Information: Theories and Applications. Springer-Verlag,
New York.
Kåhre, J. [2002], The Mathematical Theory of Information. Kluwer, Boston.
Kaleva, O. [1987], “Fuzzy differential equations.” Fuzzy Sets and Systems, 24(3), pp.
301–317.
Kandel, A. [1986], Fuzzy Mathematical Techniques with Applications. Addison-Wesley,
Reading, MA.
Kapur, J. N. [1983], “Twenty-five years of maximum entropy principle.” Journal of Mathematical and Physical Sciences, 17, pp. 103–156.
Kapur, J. N. [1989], Maximum Entropy Models in Science and Engineering. John Wiley,
New York.
Kapur, J. N. [1994/1996], “Insight into Entropy Optimization Principles.” Mathematical
Sciences Trust Society, New Delhi, India: Vol. I, 1994; Vol. II, 1996. (Some parts are
also published in Bulletin of the Mathematical Association of India, 21, 1989, pp. 1–38;
22, 1990, pp. 1–42.)
Kapur, J. N. [1994], Measures of Information and Their Applications. John Wiley, New
York.
Kapur, J. N. [1997], Measures of Fuzzy Information. Mathematical Sciences Trust
Society, New Delhi.
Kapur, J. N.; Baciu, G.; and Kesavan, H. K. [1995], “The MinMax information measure.”
International Journal of Systems Science, 26(1), pp. 1–12.
Kapur, J. N., and Kesavan, H. K. [1987], The Generalized Maximum Entropy Principle.
Sandford Educational Press, Waterloo.
Kapur, J. N., and Kesavan, H. K. [1992], Entropy Optimization Principles with Applications. Academic Press, San Diego.
Karmeshu (ed.) [2003], Entropy Measures, Maximum Entropy Principle and Emerging
Applications. Springer-Verlag, Heidelberg and New York.
Kaufmann, A. [1975], Introduction to the Theory of Fuzzy Subsets. Academic Press, New
York.
Kaufmann, A., and Gupta, M. M. [1985], Introduction to Fuzzy Arithmetic: Theory and
Applications. Van Nostrand Reinhold, New York.
Kendall, D. G. [1973] Foundations of a Theory of Random Sets in Stochastic Geometry.
John Wiley, New York.
Kendall, D. G. [1974], “Foundations of a theory of random sets.” In E. F. Harding and
D. G. Kendall (eds.), Stochastic Geometry. John Wiley, New York, pp. 322–376.
Kern-Isberner, G. [1998], “Characterizing the principle of minimum cross-entropy
within conditional-logical framework.” Artificial Intelligence, 98(1–2), pp. 169–208.
Kerre, E. E., and De Cock, M. [1999], “Linguistic modifiers: an overview.” In G. Chen,
M. Ying, and K. Y. Cai (eds.), Fuzzy Logic and Soft Computing. Kluwer, Boston.
Khinchin, A. I. [1957], Mathematical Foundations of Information Theory. Dover, New
York.
Kingman, J. F. C., and Taylor, S. J. [1966], Introduction to Measure and Probability.
Cambridge University Press, New York.
Kirschenmann, P. P. [1970], Information and Reflections. Humanities Press, New York.
Klawonn, F., and Schwecke, E. [1992], “On the axiomatic justification of Dempster’s rule of combination.” International Journal of Intelligent Systems, 7(5),
pp. 469–478.
Klement, E. P.; Mesiar, R.; and Pap, E. [2000], Triangular Norms. Kluwer, Dordrecht,
Boston, and London.
Klement, E. P.; Mesiar, R.; and Pap, E. [2004], “Triangular norms. Position paper I. Basic
analytical and algebraic properties.” Fuzzy Sets and Systems, 143(1), pp. 5–26.
Klir, G. J. [1976], “Identification of generative structures in empirical data.” International Journal of General Systems, 3(2), pp. 89–104.
Klir, G. J. [1985], Architecture of Systems Problem Solving. Plenum Press, New York.
Klir, G. J. [1986a], “Reconstructability analysis: an offspring of Ashby’s constraint
theory.” Systems Research, 3(4), pp. 267–271.
Klir, G. J. [1986b], “The role of reconstructability analysis in social science research.”
Mathematical Social Sciences, 12, pp. 205–225.
Klir, G. J. (ed.) [1987], “Special Issue on Measures of Uncertainty.” Fuzzy Sets and
Systems, 24(2), pp. 139–254.
Klir, G. J. [1989a], “Is there more to uncertainty than some probability theorists might
have us believe?” International Journal of General Systems, 15(4), pp. 347–378.
Klir, G. J. [1989b], “Probability-possibility conversion.” Proceedings of the Third IFSA
Congress, Seattle, WA, pp. 408–411.
Klir, G. J. [1990a], “A principle of uncertainty and information invariance.” International
Journal of General Systems, 17(2–3), pp. 249–275.
Klir, G. J. [1990b], “Dynamic aspects in reconstructability analysis: the role of minimum
uncertainty principles.” Revue Internationale de Systemique, 4(1), pp. 33–43.
Klir, G. J. [1991], “Generalized information theory.” Fuzzy Sets and Systems, 40(1), pp.
127–142.
Klir, G. J. [1994], “Multivalued logics versus modal logics: alternative frameworks for
uncertainty modeling.” In P. P. Wang (ed.), Advances in Fuzzy Theory and Technology, Vol. II. Duke Univ., Durham, NC, pp. 3–47.
Klir, G. J. [1995], “Principles of uncertainty: What are they? Why do we need them?”
Fuzzy Sets and Systems, 74(1), pp. 15–31.
Klir, G. J. [1997a], “Fuzzy arithmetic with requisite constraints.” Fuzzy Sets and Systems,
91(2), pp. 165–175.
Klir, G. J. [1997b], “The role of constrained fuzzy arithmetic in engineering.” In B. M.
Ayyub (ed.), Uncertainty Analysis in Engineering and the Sciences. Kluwer, Boston,
pp. 1–19.
Klir, G. J. [1999], “On fuzzy-set interpretation of possibility theory.” Fuzzy Sets and
Systems, 108(3), pp. 263–273.
Klir, G. J. [2000], Fuzzy Sets: Fundamentals, Applications, and Personal Views. Beijing
University Press, Beijing.
Klir, G. J. [2001], “Foundations of fuzzy set theory and fuzzy logic: a historical overview.”
International Journal of General Systems, 30(2), pp. 91–132.
Klir, G. J. [2002a], “Uncertainty in economics: the heritage of G. L. S. Shackle.” Fuzzy
Economic Review, VII(2), pp. 3–21.
Klir, G. J. [2002b], “Basic issues of computing with granular probabilities.” In T. Y. Lin,
Y. Y. Yao, and L. A. Zadeh (eds.), Data Mining, Rough Sets and Granular Computing. Physica-Verlag/Springer-Verlag, Heidelberg and New York, pp. 339–349.
Klir, G. J. [2003], “An update on generalized information theory.” Proceedings of
ISIPTA ’03, Carleton Scientific, Lugano, Switzerland, pp. 321–334.
Klir, G. J. [2005], “Measuring uncertainty associated with convex sets of probability distributions: a new approach.” Proceedings of NAFIPS ’05, Ann Arbor, MI (on CD).
Klir, G. J., and Folger, T. A. [1988], Fuzzy Sets, Uncertainty and Information. Prentice
Hall, Englewood Cliffs, NJ.
Klir, G. J., and Harmanec, D. [1994], “On modal logic interpretation of possibility
theory.” International Journal of Uncertainty, Fuzziness, and Knowledge-Based
Systems, 2(2), pp. 237–245.
Klir, G. J., and Harmanec, D. [1995], “On some bridges to possibility theory.” In G.
De Cooman et al. (eds.) Foundations and Applications of Possibility Theory. World
Scientific, Singapore, pp. 3–19.
Klir, G. J., and Mariano, M. [1987], “On the uniqueness of possibilistic measure of uncertainty and information.” Fuzzy Sets and Systems, 24(2), pp. 197–219.
Klir, G. J., and Pan, Y. [1998], “Constrained fuzzy arithmetic: basic questions and some
answers.” Soft Computing, 2(2), pp. 100–108.
Klir, G. J., and Parviz, B. [1992], “Probability-possibility transformations: a comparison.”
International Journal of General Systems, 21(3), pp. 291–310.
Klir, G. J.; Parviz, B.; and Higashi, M. [1986], “Relationship between true and estimated
possibilistic systems and their reconstruction.” International Journal of General
Systems, 12(4), pp. 319–331.
Klir, G. J., and Ramer, A. [1990], “Uncertainty in the Dempster-Shafer theory: a critical re-examination.” International Journal of General Systems, 18(2), pp. 155–166.
Klir, G. J., and Sentz, K. [2005], “On the issue of linguistic approximation.” In A. Mehler
and R. Kohler (eds.), Aspects of Automatic Text Analysis. Springer, Berlin and New
York.
Klir, G. J., and Smith, R. M. [2001], “On measuring uncertainty and uncertainty-based
information: recent developments.” Annals of Mathematics and Artificial Intelligence, 32(1–4), pp. 5–33.
Klir, G. J.; Wang Z.; and Harmanec, D. [1997], “Constructing fuzzy measures in expert
systems.” Fuzzy Sets and Systems, 92(2), pp. 251–264.
Klir, G. J., and Way, E. C. [1985], “Reconstructability analysis: aims, results, open problems.” Systems Research, 2(2), pp. 141–163.
Klir, G. J., and Wierman, M. J. [1998], Uncertainty-Based Information: Elements of Generalized Information Theory. Physica-Verlag/Springer-Verlag, Heidelberg and New
York (2nd ed., 1999).
Klir, G. J., and Yuan, B. [1993], “On measures of conflict among set-valued statements.”
Proceedings of the 1993 World Congress on Neural Networks, Vol. II, Portland, OR,
pp. 627–630.
Klir, G. J., and Yuan, B. [1995a], Fuzzy Sets and Fuzzy Logic: Theory and Applications.
Prentice Hall, Upper Saddle River, NJ.
Klir, G. J., and Yuan, B. [1995b], “On nonspecificity of fuzzy sets with continuous membership functions.” Proceedings 1995 International Conference on Systems, Man, and
Cybernetics, Vancouver, pp. 25–29.
Klir, G. J., and Yuan, B. (eds.) [1996], Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems:
Selected Papers by Lotfi A. Zadeh. World Scientific, Singapore.
Knopfmacher, J. [1975], “On measures of fuzziness.” Journal of Mathematical Analysis
and Applications, 49, pp. 529–534.
Kogan, I. M. [1988], Applied Information Theory. Gordon and Breach, New York and
London.
Kohlas, J. [1991], “The reliability of reasoning with unreliable arguments.” Annals of
Operations Research, 32, pp. 67–113.
Kohlas, J., and Monney, P. A. [1994], “Theory of evidence: a survey of its mathematical
foundations, applications and computational aspects.” ZOR—Mathematical
Methods of Operations Research, 39, pp. 35–68.
Kohlas, J., and Monney, P. A. [1995], A Mathematical Theory of Hints: An Approach to
the Dempster-Shafer Theory of Evidence. Springer, Berlin.
Kohlas, J., and Moral, S. (eds.) [2000], Algorithms for Uncertainty and Defeasible Reasoning. Kluwer, Dordrecht and Boston.
Kolmogorov, A. N. [1950], Foundations of the Theory of Probability. Chelsea, New York.
(First published in German in 1933.)
Kolmogorov, A. N. [1965], “Three approaches to the quantitative definition of information.” Problems of Information Transmission, 1, pp. 1–7.
Kong, A. [1986], Multivariate Belief Functions and Graphical Models. Research Report
S-107, Harvard University, Cambridge, MA.
Kornwachs, K., and Jacoby, K. (eds.) [1996], Information: New Questions to a Multidisciplinary Concept. Akademie Verlag, Berlin.
Kosko, B. [1987], “Fuzzy entropy and conditioning.” Information Sciences, 40(1), pp.
1–10.
Kosko, B. [1993a], Fuzzy Thinking: The New Science of Fuzzy Logic. Hyperion, New
York.
Kosko, B. [1993b], “Addition as fuzzy mutual entropy.” Information Sciences, 73(3), pp.
273–284.
Kosko, B. [1999], The Fuzzy Future. Harmony Books, New York.
Kosko, B. [2004], “Probable equivalence, superpower sets, and superconditionals.” International Journal of Intelligent Systems, 19(12), pp. 1151–1171.
Kramosil, I. [2001], Probabilistic Analysis of Belief Functions. Kluwer Academic/
Plenum Publishers, New York.
Krätschmer, V. [2001], “A unified approach to fuzzy random variables.” Fuzzy Sets and
Systems, 123(1), pp. 1–9.
Kreinovich, V.; Nguyen, H. T.; and Yam, Y. [2000], “Fuzzy systems are universal approximators for a smooth function and its derivatives.” International Journal of Intelligent Systems, 15(6), pp. 565–574.
Krippendorff, K. [1986], Information Theory: Structural Models for Qualitative Data.
Sage, Beverly Hills, CA.
Kříž, O. [2003], “Envelopes of a simplex of discrete probabilities.” Soft Computing, 7(5), pp.
336–343.
Kruse, R. [1982a], “On the construction of fuzzy measures.” Fuzzy Sets and Systems,
8(3), pp. 323–327.
Kruse, R. [1982b], “A note on λ-additive fuzzy measures.” Fuzzy Sets and Systems, 8(2),
pp. 219–222.
Kruse, R.; Gebhardt, J.; and Klawonn, F. [1994], Foundations of Fuzzy Systems. John Wiley,
Chichester, UK.
Kruse, R., and Meyer, K. D. [1987], Statistics with Vague Data. D. Reidel, Dordrecht and
Boston.
Kruse, R.; Schwecke, E.; and Heinsohn, J. [1991], Uncertainty and Vagueness in Knowledge Based Systems: Numerical Methods. Springer-Verlag, New York.
Kruse, R., and Siegel, P. (eds.) [1991], Symbolic and Quantitative Approaches to Uncertainty. Springer-Verlag, New York.
Kuhn, T. S. [1962], The Structure of Scientific Revolutions. University of Chicago Press,
Chicago.
Kullback, S. [1959], Information Theory and Statistics. John Wiley, New York.
Kyburg, H. E. [1961], Probability and the Logic of Rational Belief. Wesleyan University
Press, Middletown, CT.
Kyburg, H. E. [1987], “Bayesian and non-Bayesian evidential updating.” Artificial Intelligence, 31, pp. 271–293.
Kyburg, H. E., and Teng, C. M. [2001], Uncertain Inference. Cambridge University Press,
Cambridge and New York.
Kyburg, H. E., and Pittarelli, M. [1996], “Set-based Bayesianism.” IEEE Transactions
on Systems, Man and Cybernetics, Part A, 26(3), pp. 324–339.
Lamata, M. T., and Moral, S. [1988], “Measures of entropy in the theory of evidence.”
International Journal of General Systems, 14(4), pp. 297–305.
Lamata, M. T., and Moral, S. [1989], “Classification of fuzzy measures.” Fuzzy Sets and
Systems, 33(2), pp. 243–253.
Lao Tsu [1972], Tao Te Ching. Vintage Books, New York.
Lees, E. S., and Shu, Q. [1995], Fuzzy and Evidential Reasoning. Physica-Verlag,
Heidelberg.
Levine, R. D., and Tribus, M. (eds.) [1979], The Maximum Entropy Formalism. MIT
Press, Cambridge, MA.
Lewis, H. W. [1997], The Foundations of Fuzzy Control. Plenum Press, New York.
Lewis, P. M. [1959], “Approximating probability distributions to reduce storage requirements.” Information and Control, 2(3), pp. 214–225.
Levi, I. [1967], Gambling with Truth. Knopf, New York.
Levi, I. [1980], The Enterprise of Knowledge. MIT Press, London.
Levi, I. [1984], Decisions and Revisions. Cambridge University Press, New York.
Levi, I. [1986], Hard Choices. Cambridge University Press, New York.
Levi, I. [1991], The Fixation of Belief and Its Undoing. Cambridge University Press,
New York.
Levi, I. [1996], For the Sake of Argument. Cambridge University Press, New York.
Levi, I. [1997], The Covenant of Reason. Cambridge University Press, New York.
Li, M., and Vitányi, P. [1993], An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, New York.
Lin, C. T., and Lee, C. S. G. [1996], Neural Fuzzy Systems: A Neuro Fuzzy Synergism to
Intelligent Systems. Prentice Hall, Upper Saddle River, NJ.
Lingras, P., and Wong, S. K. M. [1990], “Two perspectives of the Dempster-Shafer theory
of belief functions.” International Journal of Man-Machine Studies, 33(4), pp.
467–488.
Lipschutz, S. [1964], Set Theory and Related Topics. Schaum, New York.
Loo, S. G. [1977], “Measures of fuzziness.” Cybernetica, 20(3), pp. 201–210.
MacKay, D. M. [1969], Information, Mechanism and Meaning. MIT Press, Cambridge,
MA.
Madden, R. F., and Ashby, W. R. [1972], “On the identification of many-dimensional
relations.” International Journal of Systems Science, 3, pp. 343–356.
Maeda, Y., and Ichihashi, H. [1993], “An uncertainty measure with monotonicity under
the random set inclusion.” International Journal of General Systems, 21(4), pp.
379–392.
Maeda, Y.; Nguyen, H. T.; and Ichihashi, H. [1993], “Maximum entropy algorithms
for uncertainty measures.” International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, 1(1), pp. 69–93.
Malinowski, G. [1993], Many-Valued Logics. Oxford University Press, Oxford.
Manton, K. G.; Woodbury, M. A.; and Tolley, H. D. [1994], Statistical Applications Using
Fuzzy Sets. John Wiley, New York.
Mansuripur, M. [1987], Introduction to Information Theory. Prentice Hall, Englewood
Cliffs, NJ.
Mareš, M. [1994], Computation Over Fuzzy Quantities. CRC Press, Boca Raton, FL.
Mariano, M. [1985], “The problem of resolving inconsistency in reconstructability analysis.” IEEE Workshop on Languages for Automation, Palma de Mallorca,
Spain.
Mariano, M. [1997], Aspects of Inconsistency in Reconstructability Analysis. Ph.D.
Dissertation in System Science, Binghamton University—SUNY, Binghamton, NY.
Marichal, J., and Roubens, M. [2000], “Entropy of discrete fuzzy measures.” International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems, 8(6), pp.
625–640.
Martin, N. F. G., and England, J. W. [1981], Mathematical Theory of Entropy. Addison-Wesley, Reading, MA.
Mathai, A. M., and Rathie P. N. [1975], Basic Concepts of Information Theory and Statistics. John Wiley, New York.
Matheron, G. [1975], Random Sets and Integral Geometry. John Wiley, New York.
Maung, I. [1995], “Two characterizations of a minimum-information principle for
possibilistic reasoning.” International Journal of Approximate Reasoning, 12(2),
pp. 133–156.
McLaughlin, D. W. [1984], Inverse Problems. American Mathematical Society, Providence, RI.
McNeill, D., and Freiberger, P. [1993], Fuzzy Logic: The Discovery of a Revolutionary
Computer Technology—and How It Is Changing Our World. Simon & Schuster, New
York.
Mendel, J. M. [2001], Uncertain Rule-Based Fuzzy Logic Systems. Prentice Hall PTR,
Upper Saddle River, NJ.
Menger, K. [1942], “Statistical metrics.” Proceedings of the National Academy of
Sciences, 28, pp. 535–537.
Meyerowitz, A.; Richman, F.; and Walker, E. A. [1994], “Calculating maximum-entropy
probability densities for belief functions.” International Journal of Uncertainty,
Fuzziness, and Knowledge-Based Systems, 2(4), pp. 377–389.
Miranda, E.; Couso, I.; and Gil, P. [2003], “Extreme points of credal sets generated by
2-alternating capacities.” International Journal of Approximate Reasoning, 33(1), pp.
95–115.
Molchanov, I. [2005], Theory of Random Sets. Springer, New York.
Moles, A. [1966], Information Theory and Esthetic Perception. University of Illinois
Press, Urbana, IL.
Möller, B., and Beer, M. [2004], Fuzzy Randomness. Springer, Berlin, Heidelberg, and
New York.
Moore, R. E. [1966], Interval Analysis. Prentice Hall, Englewood Cliffs, NJ.
Moore, R. E. [1979], Methods and Applications of Interval Analysis. SIAM,
Philadelphia.
Mordeson, J. N., and Malik, D. S. [2002], Fuzzy Automata and Languages: Theory and
Applications. Chapman & Hall/CRC, Boca Raton, FL.
Mordeson, J. N., and Nair, P. S. [1998], Fuzzy Mathematics: An Introduction for
Engineers and Scientists. Physica-Verlag/Springer-Verlag, Heidelberg and New
York.
Murofushi, T., and Sugeno, M. [1989], “An interpretation of fuzzy measures and the
Choquet integral as an integral with respect to a fuzzy measure.” Fuzzy Sets and
Systems, 29(2), pp. 201–227.
Murofushi, T., and Sugeno, M. [1993], “Some quantities represented by the Choquet
integral.” Fuzzy Sets and Systems, 56(2), pp. 229–235.
Murofushi, T., and Sugeno, M. [2000], “Fuzzy measures and fuzzy integrals.” In M.
Grabisch et al. (eds.), Fuzzy Measures and Integrals: Theory and Applications.
Springer-Verlag, New York, pp. 3–41.
Murofushi, T.; Sugeno, M.; and Machida, M. [1994], “Non-monotonic fuzzy measures
and the Choquet integral.” Fuzzy Sets and Systems, 64(1), pp. 73–86.
Natke, H. G., and Ben-Haim, Y. (eds.) [1997], Uncertainty: Models and Measures.
Akademie Verlag, Berlin.
Nauck, D.; Klawonn, F.; and Kruse, R. [1997], Foundations of Neuro-Fuzzy Systems.
John Wiley, New York.
Negoita, C. V., and Ralescu, D. A. [1975], Applications of Fuzzy Sets to Systems Analysis. Birkhäuser, Basel and Stuttgart.
Negoita, C. V., and Ralescu, D. A. [1987], Simulation, Knowledge-Based Computing, and
Fuzzy Statistics. Van Nostrand Reinhold, New York.
Neumaier, A. [1990], Interval Methods for Systems of Equations. Cambridge University Press, Cambridge and New York.
Nguyen, H. T. [1978a], “On conditional possibility distributions.” Fuzzy Sets and
Systems, 1(4), pp. 299–309.
Nguyen, H. T. [1978b], “On random sets and belief functions.” Journal of Mathematical
Analysis and Applications, 65, pp. 531–542.
Nguyen, H. T.; Kreinovich, V.; and Shekhter, V. [1998], “On the possibility of using
complex values in fuzzy logic for representing inconsistencies.” International Journal
of Intelligent Systems, 13(8), pp. 683–714.
Nguyen, H. T., and Walker, E. A. [1997], A First Course in Fuzzy Logic. CRC Press,
Boca Raton, FL.
Norton, J. [1988], “Limit theorems for Dempster’s rule of combination.” Theory and
Decision, 25, pp. 287–313.
Novák, V.; Perfilieva, I.; and Močkoř, J. [1999], Mathematical Principles of Fuzzy Logic.
Kluwer, Boston.
Padet, C. [1996], “On applying information principles to fuzzy control.” Kybernetes,
25(1), pp. 61–64.
Pal, N. R., and Bezdek, J. C. [1994], “Measuring fuzzy uncertainty.” IEEE Transactions
on Fuzzy Systems, 2(2), pp. 107–118.
Pan, Y. [1997a], Calculus of Fuzzy Probabilities and Its Applications. Ph.D. Dissertation in Systems Science, T.J. Watson School, Binghamton University—SUNY,
Binghamton, NY.
Pan, Y. [1997b], “Revised hierarchical analysis method based on crisp and fuzzy
entries.” International Journal of General Systems, 26(1–2), pp. 115–131.
Pan, Y., and Klir, G. J. [1997], “Bayesian inference based on interval probabilities.”
Journal of Intelligent and Fuzzy Systems, 5(3), pp. 193–203.
Pan, Y., and Yuan, B. [1997], “Bayesian inference of fuzzy probabilities.” International
Journal of General Systems, 26(1–2), pp. 73–90.
Pap, E. [1995], Null-Additive Set Functions. Kluwer, Boston.
Pap, E. [1997], “Decomposable measures and nonlinear equations.” Fuzzy Sets and
Systems, 92(2), pp. 205–221.
Pap, E. (ed.) [2002], Handbook of Measure Theory. Elsevier, Amsterdam.
Paris, J. B. [1994], The Uncertain Reasoner’s Companion: A Mathematical Perspective.
Cambridge University Press, Cambridge, UK.
Paris, J. B., and Vencovská, A. [1989], “On the applicability of maximum entropy
to inexact reasoning.” International Journal of Approximate Reasoning, 3(1),
pp. 1–34.
Paris, J. B., and Vencovská, A. [1990], “A note on the inevitability of maximum entropy.”
International Journal of Approximate Reasoning, 4(3), pp. 183–223.
Pavelka, J. [1979], “On fuzzy logic I, II, III.” Zeitschrift für Math. Logik und Grundlagen der Mathematik, 25(1), pp. 45–52; 25(2), pp. 119–134; 25(5), pp. 447–464.
Pawlak, Z. [1982], “Rough sets.” International Journal of Computer and Information
Sciences, 11, pp. 341–356.
Pawlak, Z. [1991], Rough Sets. Kluwer, Boston.
Pedrycz, W., and Gomide, F. [1998], An Introduction to Fuzzy Sets: Analysis and Design.
MIT Press, Cambridge, MA.
Peeva, K., and Kyosev, Y. [2004], Fuzzy Relational Calculus: Theory, Applications and
Software. World Scientific, Singapore.
Piegat, A. [2001], Fuzzy Modeling and Control. Physica-Verlag/Springer-Verlag,
Heidelberg and New York.
Pinsker, M. S. [1964], Information and Information Stability of Random Variables and
Processes. Holden-Day, San Francisco.
Pittarelli, M. [1990], “Reconstructability analysis: an overview.” Revue Internationale de
Systemique, 4(1), pp. 5–32.
Pollack, H. N. [2003], Uncertain Science . . . Uncertain World. Cambridge University
Press, Cambridge and New York.
Prade, H., and Yager, R. R. [1994], “Estimations of expectedness and potential surprise
in possibility theory.” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2(4), pp. 417–428.
Press, S. J. [2003], Subjective and Objective Bayesian Statistics. Wiley-Interscience,
Hoboken, NJ.
Qiao, Z. [1990], “On fuzzy measure and fuzzy integral on fuzzy sets.” Fuzzy Sets and
Systems, 37(1), pp. 77–92.
Quastler, H. [1955], Information Theory in Psychology. The Free Press, Glencoe,
IL.
Ragin, C. C. [2000], Fuzzy-Set Social Science. University of Chicago Press, Chicago.
Ramer, A. [1986], Informational and Combinatorial Aspects of Reconstructability
Analysis: A Mathematical Inquiry. Ph.D. Dissertation in Systems Science, T.J. Watson
School, Binghamton University—SUNY, Binghamton, NY.
Ramer, A. [1987], “Uniqueness of information measure in the theory of evidence.”
Fuzzy Sets and Systems, 24(2), pp. 183–196.
Ramer, A. [1989], “Conditional possibility measures.” Cybernetics and Systems, 20(3),
pp. 233–247.
Ramer, A. [1990a], “Axioms of uncertainty measures: dependence and independence.”
Fuzzy Sets and Systems, 35(2), pp. 185–196.
Ramer, A. [1990b], “Information measures for continuous possibility distributions.”
International Journal of General Systems, 17(2–3), pp. 241–248.
Ramer, A., and Klir, G. J. [1993], “Measures of discord in the Dempster-Shafer theory.”
Information Sciences, 67(1–2), pp. 35–50.
Ramer, A., and Lander, L. [1987], “Classification of possibilistic uncertainty and information functions.” Fuzzy Sets and Systems, 24(2), pp. 221–230.
Ramer, A., and Padet, C. [2001], “Nonspecificity in Rⁿ.” International Journal of General
Systems, 30(6), pp. 661–680.
Reche, F., and Salmerón, A. [2000], “Operational approach to general fuzzy measures.”
International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems, 8(3),
pp. 369–382.
Regan, H. M.; Ferson, S.; and Berleant, D. [2004], “Equivalence of methods for
uncertainty propagation of real-valued random variables.” International Journal of
Approximate Reasoning, 36(1), pp. 1–30.
Reichenbach, H. [1949], The Theory of Probability. University of California Press,
Berkeley and Los Angeles.
Rényi, A. [1970a], Foundations of Probability. Holden-Day, San Francisco.
Rényi, A. [1970b], Probability Theory. North-Holland, Amsterdam (Chapter IX, “Introduction to information theory,” pp. 540–616).
Rényi, A. [1987], A Diary on Information Theory. John Wiley, New York.
Rescher, N. [1969], Many-Valued Logic. McGraw-Hill, New York.
Rescher, N. [1976], Plausible Reasoning. Van Gorcum, Amsterdam.
Resconi, G.; Klir, G. J.; and St. Clair, U. [1992], “Hierarchical uncertainty metatheory
based upon modal logic.” International Journal of General Systems, 21(1), pp. 23–50.
Resconi, G.; Klir, G. J.; St. Clair, U.; and Harmanec, D. [1993], “On the integration of
uncertainty theories.” International Journal of Uncertainty, Fuzziness, and
Knowledge-Based Systems, 1(1), pp. 1–18.
Resnikoff, H. L. [1989], The Illusion of Reality. Springer-Verlag, New York.
Reza, F. M. [1961], Introduction to Information Theory. McGraw-Hill, New York.
(Reprinted by Dover, New York, 1994.)
Rissanen, J. [1989], Stochastic Complexity in Statistical Inquiry. World Scientific,
Teaneck, NJ.
Rodabaugh, S. E., and Klement, E. P. (eds.) [2003], Topological and Algebraic Structures in Fuzzy Sets. Kluwer, Boston.
Rodabaugh, S. E.; Klement, E. P.; and Höhle, U. (eds.) [1992], Applications of Category
Theory to Fuzzy Subsets. Kluwer, Boston.
Rosenkrantz, R. D. (ed.) [1983], Jaynes, E. T.: Papers on Probability, Statistics and Statistical Physics. Reidel, Boston.
Ross, T. J.; Booker, J. M.; and Parkinson, W. J. (eds.) [2002], Fuzzy Logic and Probability Applications: Bridging the Gap. ASA-SIAM, Philadelphia.
Rouvray, D. H. (ed.) [1997], Fuzzy Logic in Chemistry. Academic Press, San Diego.
Rubinstein, R. Y., and Kroese, D. P. [2004], The Cross-Entropy Method: A Unified
Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine
Learning. Springer, New York.
Ruspini, E. H.; Bonissone, P. P.; and Pedrycz, W. (eds.) [1998], Handbook of Fuzzy
Computation. Institute of Physics Publication, Bristol, UK, and Philadelphia.
Russell, B. [1950], Unpopular Essays. Simon and Schuster, New York.
Rutkowska, D. [2002], Neuro-Fuzzy Architectures and Hybrid Learning. Physica-Verlag/Springer-Verlag, Heidelberg and New York.
Rutkowski, L. [2004], Flexible Neuro-Fuzzy Systems. Kluwer, Boston.
Sanchez, E. [1976], “Resolution in composite fuzzy relation equations.” Information
and Control, 30(1), pp. 38–48.
Sancho-Royo, A., and Verdegay, J. L. [1999], “Methods for the construction of membership functions.” International Journal of Intelligent Systems, 14(12), pp.
1213–1230.
Savage, L. J. [1972], The Foundations of Statistics. Dover, New York.
Schubert, J. [1994], Cluster-Based Specification Techniques in Dempster-Shafer Theory
for an Evidential Intelligence Analysis of Multiple Target Tracks. Royal Institute of
Technology, Stockholm.
Schweizer, B., and Sklar, A. [1983], Probabilistic Metric Spaces. North-Holland, New
York.
Sgarro, A. [1997], “Bodies of evidence versus simple interval probabilities.” International
Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 5(2), pp. 199–209.
Shackle, G. L. S. [1949], Expectation in Economics. Cambridge University Press,
Cambridge.
Shackle, G. L. S. [1955], Uncertainty in Economics and Other Reflections. Cambridge
University Press, Cambridge.
Shackle, G. L. S. [1961], Decision, Order and Time in Human Affairs. Cambridge University Press, New York and Cambridge.
Shackle, G. L. S. [1979], Imagination and the Nature of Choice. Edinburgh University
Press, Edinburgh.
Shafer, G. [1976a], A Mathematical Theory of Evidence. Princeton University Press,
Princeton, NJ.
Shafer, G. [1976b], “A theory of statistical evidence.” In W. L. Harper and C. A. Hooker
(eds.), Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science. D. Reidel, Dordrecht, pp. 365–436.
Shafer, G. [1978], “Non-additive probabilities in the work of Bernoulli and Lambert.”
Archive for History of Exact Sciences, 19, pp. 309–370.
Shafer, G. [1979], “The allocation of probability.” Annals of Probability, 7(5), pp.
827–839.
Shafer, G. [1981], “Constructive probability.” Synthese, 48, pp. 1–60.
Shafer, G. [1982], “Belief functions and parametric models.” Journal of the Royal Statistical Society, B-44, pp. 322–352.
Shafer, G. [1985], “Belief functions and possibility measures.” In J. C. Bezdek (ed.),
Analysis of Fuzzy Information. CRC Press, Boca Raton, FL.
Shafer, G. [1986], “The Combination of Evidence.” International Journal of Intelligent
Systems, 1(3), pp. 155–179.
Shafer, G. [1990], “Perspectives on the theory and practice of belief functions.” International Journal of Approximate Reasoning, 4(5–6), pp. 323–362.
Shannon, C. E. [1948], “A mathematical theory of communication.” The Bell System
Technical Journal, 27(3&4), pp. 379–423, 623–656.
Shannon, C. E., and Weaver, W. [1949], The Mathematical Theory of Communication.
University of Illinois Press, Urbana, IL.
Shapley, L. S. [1971], “Cores of convex games.” International Journal of Game Theory,
1(1), pp. 11–26.
Shore, J. E., and Johnson, R. W. [1980], “Axiomatic derivation of the principle of
maximum entropy and the principle of minimum cross-entropy.” IEEE Transactions
on Information Theory, 26(1), pp. 26–37.
Shore, J. E., and Johnson, R. W. [1981], “Properties of cross-entropy minimization.”
IEEE Transactions on Information Theory, 27(4), pp. 472–482.
Sims, J. R., and Wang, Z. [1990], “Fuzzy measures and fuzzy integrals: an overview.”
International Journal of General Systems, 17(2–3), pp. 157–189.
Slepian, D. (ed.) [1974], Key Papers in the Development of Information Theory. IEEE
Press, New York.
Sloane, N. J. A., and Wyner, A. D. (eds.) [1993], Claude Elwood Shannon: Collected Papers.
IEEE Press, Piscataway, NJ.
Smets, P. [1981], “The degree of belief in a fuzzy event.” Information Sciences, 25(1),
pp. 1–19.
Smets, P. [1983], “Information content of an evidence.” International Journal of Man-Machine Studies, 19(1), pp. 33–43.
Smets, P. [1988], “Belief functions.” In P. Smets et al. (eds.), Non-standard Logics for
Automated Reasoning. Academic Press, San Diego, pp. 253–286.
Smets, P. [1990], “The combination of evidence in the transferable belief model.”
IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5), pp. 447–458.
Smets, P. [1992a], “The transferable belief model and random sets.” International
Journal of Intelligent Systems, 7(1), pp. 37–46.
Smets, P. [1992b], “Resolving misunderstandings about belief functions.” International
Journal of Approximate Reasoning, 6(3), pp. 321–344.
Smets, P. [1998], “The transferable belief model for quantified belief representations.”
In D. M. Gabbay and P. Smets (eds.), Handbook of Defeasible Reasoning and Uncertainty Management Systems, Vol. 1, Kluwer, Boston, pp. 267–301.
Smets, P., and Kennes, R. [1994], “The transferable belief model.” Artificial Intelligence,
66, pp. 191–234.
Smith, R. M. [2000], Generalized Information Theory: Resolving Some Old Questions
and Opening Some New Ones. Ph.D. Dissertation in Systems Science, T.J. Watson
School, Binghamton University—SUNY, Binghamton, NY.
Smith, S. A. [1974], “A derivation of entropy and the maximum entropy criterion in the
context of decision problems.” IEEE Transactions on Systems, Man, and Cybernetics, 4(2), pp. 157–163.
Smithson, M. [1989], Ignorance and Uncertainty: Emerging Paradigms. Springer-Verlag,
New York.
Smuts, J. C. [1926], Holism and Evolution. Macmillan, London. (Reprinted by Greenwood Press, Westport, CT, 1973.)
Stonier, T. [1990], Information and the Internal Structure of the Universe. Springer-Verlag,
New York.
Sugeno, M. [1974], Theory of Fuzzy Integrals and Its Applications. Ph.D. Thesis, Tokyo
Institute of Technology.
Sugeno, M. [1977], “Fuzzy measures and fuzzy integrals: a survey.” In M. M. Gupta,
G. N. Saridis, and B. R. Gaines (eds.), Fuzzy Automata and Decision Processes.
North-Holland, Amsterdam and New York, pp. 89–102.
Tanaka, H.; Sugihara, K.; and Maeda, Y. [2004], “Non-additive measures by interval
probability functions.” Information Sciences, 164, pp. 209–227.
Tarantola, A. [1987], Inverse Problem Theory. Elsevier, New York.
Temple, G. [1971], The Structure of Lebesgue Integration Theory. Oxford University
Press, London.
Theil, H. [1967], Economics and Information Theory. North-Holland, Amsterdam, and
Rand McNally & Co., Chicago.
Theil, H., and Fiebig, D. G. [1984], Exploiting Continuity: Maximum Entropy Estimation of Continuous Distributions. Ballinger, Cambridge, MA.
Thomson, W. [1891], Popular Lectures and Addresses. MacMillan, London.
Tribus, M. [1969], Rational Descriptions, Decisions and Designs. Pergamon Press,
Oxford.
Tsiporkova, E.; Boeva, V.; and De Baets, B. [1999], “Evidence measures induced by
Kripke’s accessibility relations.” International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, 7(6), pp. 589–613.
Tyler, S. [1978], The Said and the Unsaid. Academic Press, New York.
Vajda, I. [1989], Theory of Statistical Inference and Information. Kluwer, Boston.
Van der Lubbe, J. C. A. [1984], “A generalized class of certainty and information.” Information Sciences, 32(3), pp. 187–215.
Van Leekwijck, W., and Kerre, E. E. [1999], “Defuzzification: criteria and classification.”
Fuzzy Sets and Systems, 108(2), pp. 159–178.
Vejnarová, J. [1991], “A few remarks on measures of uncertainty in Dempster-Shafer
theory.” In Proceedings of the Workshop on Uncertainty in Expert Systems, Alšovice,
Czech Republic.
Vejnarová, J. [1998], “A note on the interval-valued marginal problem and its maximum
entropy solution.” Kybernetika, 34(1), pp. 17–26.
Vejnarová, J. [2000], “Conditional independence relations in possibility theory.” International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems, 8(3), pp.
253–269.
Vejnarová, J., and Klir, G. J. [1993], “Measure of strife in Dempster-Shafer theory.” International Journal of General Systems, 22(1), pp. 25–42.
Verdú, S., and McLaughlin, S. W. (eds.) [2000], Information Theory: 50 Years of Discovery. IEEE Press, Piscataway, NJ.
Vicig, P. [2000], “Epistemic independence for imprecise probabilities.” International
Journal of Approximate Reasoning, 24(2–3), pp. 235–250.
Viertl, R. [1996], Statistical Methods for Non-Precise Data. CRC Press, Boca Raton,
FL.
Walker, C. L. [2003], “Categories of fuzzy sets.” Soft Computing, 8(4), pp. 299–304.
Walley, P. [1991], Statistical Reasoning With Imprecise Probabilities. Chapman & Hall,
London.
Walley, P. [1996], “Measures of uncertainty in expert systems.” Artificial Intelligence, 83,
pp. 1–58.
Walley, P. [1997], “Statistical inferences based on a second-order possibility distribution.” International Journal of General Systems, 26(4), pp. 337–383.
Walley, P. [2000], “Towards a unified theory of imprecise probability.” International
Journal of Approximate Reasoning, 24(2–3), pp. 125–148.
Walley, P., and De Cooman, G. [1999], “Coherence of rules for defining conditional possibility.” International Journal of Approximate Reasoning, 21(1), pp. 63–107.
Walley, P., and De Cooman, G. [2001], “A behavioral model for linguistic uncertainty.”
Information Sciences, 134(1), pp. 1–37.
Walley, P., and Fine, T. L. [1979], “Varieties of modal (classificatory) and comparative
probability.” Synthese, 41(3), pp. 321–374.
Walley, P., and Fine, T. L. [1982], “Toward a frequentist theory of upper and lower probability.” The Annals of Statistics, 10(3), pp. 741–761.
Wang, J., and Wang, Z. [1997], “Using neural networks to determine Sugeno measures
by statistics.” Neural Networks, 10(1), pp. 183–195.
Wang, W.; Wang, Z.; and Klir, G. J. [1998], “Genetic algorithms for determining
fuzzy measures from data.” Journal of Intelligent and Fuzzy Systems, 6(2),
pp. 171–183.
Wang, Z., and Klir, G. J. [1992], Fuzzy Measure Theory. Plenum Press, New York.
Wang, Z.; Klir, G. J.; and Wang, W. [1996], “Monotone set functions defined by Choquet
integral.” Fuzzy Sets and Systems, 81(2), pp. 241–250.
Watanabe, S. [1969], Knowing and Guessing. John Wiley, New York.
Watanabe, S. [1981], “Pattern recognition as a quest for minimum entropy.” Pattern
Recognition, 13(5), pp. 381–387.
Watanabe, S. [1985], Pattern Recognition: Human and Mechanical. John Wiley, New
York.
Weaver, W. [1948], “Science and complexity.” American Scientist, 36, pp. 536–544.
Webber, M. J. [1979], Information Theory and Urban Spatial Structure. Croom Helm,
London.
Weber, S. [1984], “Decomposable measures and integrals for Archimedean t-conorms.”
Journal of Mathematical Analysis and Applications, 101(1), pp. 114–138.
Weichselberger, K. [2000], “The theory of interval-probability as a unifying concept
for uncertainty.” International Journal of Approximate Reasoning, 24(2–3),
pp. 149–170.
Weichselberger, K., and Pöhlmann, S. [1990], A Methodology for Uncertainty in
Knowledge-Based Systems. Springer-Verlag, New York.
Weir, A. J. [1973], Lebesgue Integration and Measure. Cambridge University Press, New
York.
Weltner, K. [1973], The Measurement of Verbal Information in Psychology and Education. Springer-Verlag, New York.
Whittemore, B. J., and Yovits, M. C. [1973], “A generalized conceptual development for
the analysis and flow of information.” Journal of the American Society for Information Science, 24(3), pp. 221–231.
Whittemore, B. J., and Yovits, M. C. [1974], “The quantification and analysis of information used in decision processes.” Information Sciences, 7(2), pp. 171–184.
Wierman, M. [1994], Possibilistic Image Processing. Ph.D. Dissertation in Systems
Science, T.J. Watson School, SUNY-Binghamton, Binghamton, NY.
Wierzchoń, S. T. [1982], “On fuzzy measure and fuzzy integral.” In M. M. Gupta and
E. Sanchez (eds.), Fuzzy Information and Decision Processes. North-Holland,
New York, pp. 79–86.
Wierzchoń, S. T. [1983], “An algorithm for identification of fuzzy measure.” Fuzzy Sets
and Systems, 9(1), pp. 69–78.
Williams, P. M. [1980],“Bayesian conditionalisation and the principle of minimum information.” British Journal for the Philosophy of Science, 31, pp. 131–144.
Williamson, R. C., and Downs, T. [1990], “Probabilistic arithmetic. I. Numerical methods
for calculating convolutions and dependency bounds.” International Journal of
Approximate Reasoning, 4(2), pp. 89–158.
Wilson, A. G. [1970], Entropy in Urban and Regional Modelling. Pion, London.
Wilson, N. [2000], “Algorithms for Dempster-Shafer theory.” In J. Kohlas and S. Moral
(eds.), Algorithms for Uncertainty and Defeasible Reasoning. Kluwer, Dordrecht and
Boston, pp. 421–475.
Wolf, R. G. [1977], “A survey of many-valued logic (1966–1974).” In J. M. Dunn and
G. Epstein (eds.), Modern Uses of Multiple-Valued Logic. D. Reidel, Boston,
pp. 167–323.
Wolkenhauer, O. [1998], Possibility Theory with Applications to Data Analysis.
Research Studies Press, Taunton, UK.
Wong, S. K. M.; Wang, L. S.; and Yao, Y. Y. [1995], “On modeling uncertainty with interval structures.” Computational Intelligence, 11(2), pp. 406–426.
Wonneberger, S. [1994], “Generalization of an invertible mapping between probability
and possibility.” Fuzzy Sets and Systems, 64(2), pp. 229–240.
Wyner, A. D. [1981], “Fundamental limits in information theory.” Proceedings of the
IEEE, 69, pp. 239–251.
Yager, R. R. [1979], “On the measure of fuzziness and negation. Part I: membership in
the unit interval.” International Journal of General Systems, 5(4), pp. 221–229.
Yager, R. R. [1980a], “On a general class of fuzzy connectives.” Fuzzy Sets and Systems,
4(3), pp. 235–242.
Yager, R. R. [1980b], “On the measure of fuzziness and negation. Part II: lattices.” Information and Control, 44(3), pp. 236–260.
Yager, R. R. (ed.) [1982a], Fuzzy Set and Possibility Theory. Pergamon Press, Oxford.
Yager, R. R. [1982b], “Generalized probabilities of fuzzy events from fuzzy belief structures.” Information Sciences, 28(1), pp. 45–62.
Yager, R. R. [1982c], “Measuring tranquility and anxiety in decision making: an application of fuzzy sets.” International Journal of General Systems, 8(3), pp. 139–146.
Yager, R. R. [1983], “Entropy and specificity in a mathematical theory of evidence.”
International Journal of General Systems, 9(4), pp. 249–260.
Yager, R. R. [1984], “Probabilities from fuzzy observations.” Information Sciences,
32(1), pp. 1–31.
Yager, R. R. [1986], “Toward a general theory of reasoning with uncertainty: nonspecificity and fuzziness.” International Journal of Intelligent Systems, 1(1), pp. 45–67.
Yager, R. R. [1987a], “Set based representations of conjunctive and disjunctive knowledge.” Information Sciences, 41(1), pp. 1–22.
Yager, R. R. [1987b], “On the Dempster-Shafer framework and new combination
rules.” Information Sciences, 41, pp. 93–137.
Yager, R. R. [1990], “Ordinal measures of specificity.” International Journal of General
Systems, 17(1), pp. 57–72.
Yager, R. R. [1991], “Similarity based specificity measures.” International Journal of
General Systems, 19(2), pp. 91–105.
Yager, R. R. [2000], “On the entropy of fuzzy measures.” IEEE Transactions on Fuzzy
Systems, 8(4), pp. 453–461.
Yager, R. R. [2004], “On the retranslation process in Zadeh’s paradigm of computing
with words.” IEEE Transactions on Systems, Man and Cybernetics (Part B), 34(2),
pp. 1184–1195.
Yager, R. R.; Fedrizzi, M.; and Kacprzyk, J. (eds.) [1994], Advances in the Dempster-Shafer Theory of Evidence. John Wiley, New York.
Yager, R. R., and Filev, D. P. [1994], Essentials of Fuzzy Modeling and Control. John
Wiley, New York.
Yager, R. R.; Ovchinnikov, S.; Tong, R. M.; and Nguyen, H. T. (eds.) [1987], Fuzzy Sets
and Applications: Selected Papers by L. A. Zadeh. John Wiley, New York.
Yaglom, A. M., and Yaglom, I. M. [1983], Probability and Information. Reidel, Boston.
Yang, M.; Chen, T.; and Wu, K. [2003], “Generalized belief function, plausibility
function, and Dempster’s combination rule to fuzzy sets.” International Journal of
Intelligent Systems, 18(8), pp. 925–937.
Yen, J. [1990], “Generalizing the Dempster-Shafer theory to fuzzy sets.” IEEE Transactions on Systems, Man, and Cybernetics, 20(3), pp. 559–570.
Yeung, R. W. [2002], A First Course in Information Theory. Kluwer, Boston.
Yovits, M. C.; Foulk, C. R.; and Rose, L. L. [1981],“Information flow and analysis: theory,
simulation, and experiments.” Journal of the American Society for Information
Science, 32, pp. 187–210, 243–248.
Yu, F. T. S. [1976], Optics and Information Theory. John Wiley, New York.
Zadeh, L. A. [1965], “Fuzzy Sets.” Information and Control, 8(3), pp. 338–353.
Zadeh, L. A. [1968], “Probability measures of fuzzy events.” Journal of Mathematical
Analysis and Applications, 23, pp. 421–427.
Zadeh, L. A. [1971], “Similarity relations and fuzzy orderings.” Information Sciences,
3(2), pp. 177–200.
Zadeh, L. A. [1975–76], “The concept of a linguistic variable and its application to
approximate reasoning.” Information Sciences, 8, pp. 199–249, 301–357; 9, pp. 43–80.
Zadeh, L. A. [1978a], “Fuzzy sets as a basis for a theory of possibility.” Fuzzy Sets and
Systems, 1(1), pp. 3–28.
Zadeh, L. A. [1978b], “PRUF—a meaning representation language for natural languages.” International Journal of Man-Machine Studies, 10(4), pp. 395–460.
Zadeh, L. A. [1981], “Possibility theory and soft data analysis.” In L. Cobb and R. M.
Thrall (eds.), Mathematical Frontiers of the Social and Policy Sciences. Westview
Press, Boulder, CO, pp. 69–129.
Zadeh, L. A. [1986], “A simple view of the Dempster-Shafer theory of evidence and its
implication for the rule of combination.” AI Magazine, 7(2), pp. 85–90.
Zadeh, L. A. [1996], “Fuzzy logic = computing with words.” IEEE Transactions on
Fuzzy Systems, 4(2), pp. 103–111.
Zadeh, L. A. [1997], “Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic.” Fuzzy Sets and Systems, 90(2), pp.
111–127.
Zadeh, L. A. [1999], “From computing with numbers to computing with words—from
manipulation of measurements to manipulation of perceptions.” IEEE Transactions
on Circuits and Systems (I. Fundamental Theory and Applications), 45(1), pp.
105–119.
Zadeh, L. A. [2002], “Toward a perception-based theory of probabilistic reasoning with
imprecise probabilities.” Journal of Statistical Planning and Inference, 105, pp.
233–264.
Zadeh, L. A. [2005], “Toward a generalized theory of uncertainty (GTU)—An outline.”
Information Sciences, 172(1–2), pp. 1–40.
Zimmermann, H. J. [1996], Fuzzy Set Theory—And Its Applications. Kluwer, Boston.
Zimmermann, H. J. [2000], “An application-oriented view of modeling uncertainty.”
European Journal of Operational Research, 122, pp. 190–198.
Zwick, M. [2004], “An overview of reconstructability analysis.” Kybernetes, 33(5–6), pp.
877–905.
Zwick, R., and Wallsten, T. S. [1989], “Combining stochastic uncertainty and linguistic
inexactness.” International Journal of Man-Machine Studies, 30(1), pp. 69–111.
SUBJECT INDEX
Additive measure, 61, 449
Additivity, 47, 62, 73, 102, 103, 105, 159,
197, 201, 203, 205, 206, 209, 214, 227,
228, 235, 236, 250, 419
countable, 102
finite, 102
Aggregate uncertainty measure, 226–239,
248, 253, 255, 375, 388, 417, 421,
437–441, 449
Algebra:
Boolean, 102, 261, 268, 269
non Boolean, 261, 262
Alpha-cut (α-cut), 263, 264, 328, 350
Alpha-cut representation, 264–266, 315,
338, 340, 350, 453
Alternative rule of combination (in
DST), 173, 174, 188
Ample field, 144
Ampliative reasoning, 356, 369, 409
Approximate reasoning, 287, 293, 305,
307, 410
Approximating:
belief functions by necessity functions,
399, 410
graded possibilities by crisp
possibilities, 403–408
Approximation problems, 388–389
Average Shannon entropy, 225
Averaging operation, 269
Base variable, 295
Basic probability assignment, 167–170,
210, 449
Bayes’ theorem, 66, 67
Belief measure (function), 166–169, 175,
188, 237, 399, 400, 449
Bit, 30, 68, 209, 214, 449
Body of evidence, 159, 168, 170
consonant, 176, 177
Boltzmann entropy, 92–94, 97
Boolean algebra, 102, 261, 268, 269
Boolean lattice, 116–117
Branching, 29, 74, 90, 91, 197, 205–207,
253, 254
Cardinality, 13
Cartesian product, 14, 16, 17
Category theory, 320, 350
Cauchy equation, 70
Characteristic function, 11, 12, 335,
449
Choquet capacity, 106, 115, 135, 138, 143,
185, 418
alternating, 116, 166, 449
of infinite order, 107, 143, 165, 166,
180, 416
of order k, 107, 450
of order 2, 107, 123, 180
Choquet integral, 133–135, 138, 450
Classical information theory, 7, 95, 101,
102, 356, 357, 369, 420
Classical measure, 101–103, 137, 449
Classical measure theory, 5, 9, 19, 20, 95,
101, 415
Classical set theory, 5, 9, 22, 101, 135,
266, 268, 315, 320, 415, 420
Classical uncertainty theory, 8, 26, 196,
252
Commonality function, 169
Compatibility relation, 14, 316, 317, 450
Composition, 16, 284, 285
Compositional rule of inference, 294
Computing with perceptions, 305
Computational complexity, 186, 250, 399
Conditional possibility measure, 155,
156, 159, 189, 204
Conditional Shannon entropy, 80, 81, 85,
88, 97
Conditional U-uncertainty, 122, 184
Conflict, 68, 69, 196, 218–222, 239, 248,
249, 253, 450
Conflict-resolution problems, 364–369
Confusion, 219, 254
Consistency axioms, 409, 410
Consonant body of evidence, 176, 177
Constrained fuzzy arithmetic, 275–280,
347
Constraint:
equality, 275, 277, 278, 347
generalized, 423
probabilistic, 278–280, 347
Continuity, 47, 73, 104, 197, 205, 250
Convex set of probability distributions,
123, 139, 178, 183, 214, 237, 249,
334
Coordinate invariance, 47, 198
Core (of a fuzzy set), 263
Countable additivity, 102
Credal set, 139, 214, 215, 250, 255, 418
Cutworthy property, 266, 273, 284, 286,
315–317, 350, 450
Cylindric closure, 17, 41, 43, 282, 283,
316, 373–375
Cylindric extension, 17, 41, 281–283
Decision making, 1, 132
Decomposable measure, 145, 185, 186,
188, 418
Defuzzification, 298, 299, 307, 404, 450
Degree:
of membership, 19, 261, 287
of truth, 287
De Morgan laws, 13, 268, 269
Dempster’s rule of combination,
170–174, 188
Dempster-Shafer theory (DST), 143, 144,
166, 167, 169, 176, 178, 180, 187, 188,
190, 209, 218, 240–243, 253, 254, 351,
385, 399, 416, 417, 421, 430–441,
450
Dirac measure, 145, 189
Directed divergence, 94
Disaggregated total uncertainty, 239, 240,
248, 249, 254, 255, 375, 417, 450
alternative, 250, 255, 375, 450
Discord, 221, 254
Disorganized complexity, 3, 4
Disjoint sets, 13
Dissonance, 219, 254
Duality, 112, 144, 159
Entropies of order β, 96
Equivalence relation, 14, 316–318, 451
Euclidean vector space, 18, 375
Expansibility, 72, 197, 205, 206, 209
Expected value, 62, 65, 133, 369, 370
Extension principle, 318–320, 350
Finite additivity, 102
Focal set (element), 168, 328
Frame of discernment, 166
Function, 16
characteristic, 11, 12, 335, 449
membership, 19–21, 260, 335
Fuzzification, 261, 315, 320, 321, 334, 338,
348, 350, 351, 416, 451
Fuzziness, 321, 322, 451
Fuzzy arithmetic, 273–280, 306
constrained, 275–280, 347
standard, 273–277, 306
Fuzzy conjunction, 287
Fuzzy complement, 266, 287, 306, 317,
322, 350, 451
Fuzzy differential calculus, 306
Fuzzy differential equations, 306
Fuzzy disjunction, 287
Fuzzy event, 334
Fuzzy implication, 293, 307, 451
Lukasiewicz, 293
Fuzzy inference rules, 297
Fuzzy intersection, 266–268, 287, 306
Fuzzy interval, 270, 295, 316, 337, 338
canonical form of, 270, 271
trapezoidal, 271
Fuzzy logic, 286, 287, 305, 306, 419,
420
in a broad sense, 286, 287, 307, 321
in a narrow sense, 286, 287, 307
Fuzzy measure, 106, 137
Fuzzy negation, 287
Fuzzy number, 271, 306, 451
complex, 307
triangular, 271
Fuzzy partition, 337, 338, 451
Fuzzy predicate, 287
Fuzzy probability, 351
Fuzzy proposition, 287, 326
conditional, 291
probability qualified, 290, 291
truth qualified, 288, 291
Fuzzy propositional form:
canonical, 287
conditional, 291
probability-qualified, 290
truth-qualified, 288
Fuzzy qualifier, 293
Fuzzy relation, 280–286, 294, 307, 327,
451
binary, 284–286
inverse, 284
Fuzzy relation equations, 286, 294,
307
Fuzzy sets, 19, 20, 186, 260–270, 305, 403,
419, 420
convex, 315, 316, 450
interval-valued, 299, 300, 303, 307
intuitionistic, 302, 308
lattice-based (L-fuzzy), 302, 307
nonstandard, 261, 299, 417, 418
normal, 263
of level l (l ≥ 2), 301, 302, 307
of type k (k ≥ 2), 300, 301, 303, 307
rough, 302, 303, 308
standard, 19, 260–270, 299, 303, 315
Fuzzy set theory, 9, 20, 260–262, 266, 270,
287, 303, 305, 306, 420
Fuzzy set union, 266–268, 287, 306
Fuzzy system, 294, 305, 451
Games:
cooperative, 138
non-atomic, 139
General interval structures, 139
Generalized Hartley measure, 198,
209–216, 243, 248, 253, 254, 346, 375,
380, 430–436, 451
conditional, 213
Generalized Information Theory (GIT),
7–10, 22, 102, 261, 355–357, 375, 383,
415–424
Generalized measure, 103
Generalized modus ponens, 293
Generalized Shannon entropy, 216, 218,
219, 226, 239–249, 375, 442–448, 451
Generalized theory of uncertainty
(GTU), 423, 424
Genetic algorithms, 308
Gibbs’ theorem, 79, 81, 82
Graded possibility, 144, 145, 186, 206, 403
Granulation, 295–297
Hartley-like measure, 45–57, 451
Hartley-like U-uncertainty, 206, 208
Hartley measure, 27–34, 36, 38, 42, 44, 46,
56, 197–199, 203, 204, 355, 416, 417,
451
conditional, 33, 44
joint, 32, 33
marginal, 32, 33
normalized, 31
Height (of a fuzzy set), 263
Identification problems, 373
Imprecise probabilities, 115, 129–133,
135, 136, 138, 139, 143–145, 160,
178–180, 185, 416
Independence, 63, 64, 156, 159, 187
Infimum, 15, 18
Information, 6, 9, 10, 22, 38, 42, 53, 415,
420, 423, 424
algorithmic, 23
linguistic, 417
uncertainty-based, 7, 22, 23, 95, 101,
424, 453
Information gap, 418, 421
Information transmission, 34, 43, 44, 46,
80, 84, 85, 94, 97, 204, 451
normalized, 35, 83
Informativeness, 38
predictive, 87
Interaction representation, 116–119, 136,
137, 138, 225
International Fuzzy Systems Association
(IFSA), 306
Interval:
closed, 17, 316
fuzzy, 270, 271, 295, 316, 337, 338
open, 17
semiopen, 18
Interval arithmetic, 306
Interval-valued probability distributions,
144, 145, 178, 451
conditional, 184, 188
joint, 183
marginal, 183
reachable (feasible), 179, 180, 188, 255,
280, 338, 350, 351, 416
Inverse:
of binary relation, 16
of function, 16
Inverse problems, 286, 294, 409
k-additive measures, 145, 186, 188, 418
k-monotone measure, 145, 452
Knowledge acquisition, 303
Lagrange multipliers, 370
Lambda measures (λ-measures), 144,
145, 160–165, 178, 348–350, 377, 402,
403, 416
Lambda rule (λ-rule), 161, 162, 185, 186,
187
Lattice, 15
Lebesgue integral, 133, 134, 137
Linguistic hedge, 269
Linguistic uncertainty, 321
Linguistic variable, 294–296, 423, 452
Locally inconsistent subsystems,
364–369, 408
Log-interval scales, 392–394, 396, 398,
402, 403, 410
Lower probability function, 112–115,
119, 130, 132, 137, 138, 161, 178, 179,
196, 237, 280, 416, 452
conditional, 122, 184
joint, 122, 183
marginal, 121
Marginal problem, 385, 386
Measure:
additive, 61, 449
classical, 101–103, 137, 449
decomposable, 145, 185, 186, 188,
418
fuzzy, 106, 137
generalized, 103
k-additive, 145, 186, 188, 418
k-monotone, 145, 452
monotone, 103–107, 117–119, 135, 137,
139, 143–148, 160, 167, 185, 196, 262,
351, 452
nonadditive, 106, 137
probability, 62, 64, 146, 162, 166, 174,
175, 334
regular, 104
semicontinuous, 104
subadditive, 105, 453
Sugeno, 144, 160, 187, 453
superadditive, 105, 453
2-monotone, 145
∞-monotone, 145
Measurement, 105
Measurement unit, 30, 46, 68
Measure of fuzziness, 321–326, 350, 351,
452
Membership function, 19–21, 260, 335
constructing, 303, 308
trapezoidal, 339
Möbius representation, 108, 136–138,
149, 150, 153, 165, 186, 196, 237, 329,
346, 452
Möbius transform, 108, 119, 138, 168,
214, 416
Modal logic, 56, 188
Modifier, 269, 306
Monotone measure, 9, 10, 103–107,
117–119, 135, 137, 139, 143–148, 160,
167, 185, 196, 262, 351, 452
Monotonicity, 30, 47, 74, 103–105, 197,
205, 206, 211–213, 215, 250, 254
Natural language, 20, 305, 307, 321, 334,
420
Necessity, 26, 56
Necessity measure (function), 26, 56,
106, 144, 147–149, 158, 159, 176, 187,
330, 399, 400, 452
Nested sets, 14, 263, 328, 452
Neural network, 187, 304, 308
Nonadditive measure, 106, 137
Noninteraction, 33, 34, 63–66, 123–125,
138, 156, 159, 177, 202, 203, 210,
214
Nonspecificity, 28, 53, 196, 209, 239, 243,
248, 249, 254, 255, 385, 452
identification, 41, 42
predictive, 37
Normalized Shannon entropy, 83
Normalization, 30, 75, 147, 158, 159, 197,
205, 206, 209, 329
North American Fuzzy Information
Processing Society (NAFIPS),
306
Optimization problems, 375
Ordinal scales, 394–398
Organized complexity, 3, 4
Organized simplicity, 3, 4
Partial ordering, 15, 316, 317, 452
Partition, 14, 336, 452
fuzzy, 337, 338, 451
Plausibility measure (function), 166–169,
188, 452
joint, 169
Possibility, 7, 26, 56, 423
comparative, 160
graded, 144, 145, 186, 206, 403
similarity-based, 160
Possibility—λ-measure transformations,
402–403
Possibility function, 26, 27, 56, 452
basic, 26, 146, 149, 159, 449
conditional, 155, 156, 159, 189, 204
joint, 153–155
marginal, 153–155
Possibility measure, 106, 144–149, 158,
159, 176, 185, 187, 402, 403, 416
Possibility profile, 149–153, 187, 198–200,
326, 452
greatest, 152
smallest, 152
Possibility theory:
classical, 26, 144, 252
fuzzy-set interpretation of, 160,
326–331, 351
generalized, 144, 145
graded, 144, 145, 148, 158–160, 176,
186, 253, 326, 399, 410, 423
Potential surprise, 186
Power set:
crisp, 12
fuzzy, 262
Prediction, 37, 87
Principle:
of information preservation, 357,
388
of maximum entropy, 356, 369–372,
408, 409, 421
of maximum nonspecificity, 373,
375–381, 410
of maximum uncertainty, 356–358, 369,
375, 383, 410
of minimum cross-entropy, 372, 409,
410
of minimum entropy, 356
of minimum information loss, 357, 367,
368
of minimum nonspecificity, 364
of minimum uncertainty, 356–359,
408
of requisite generalization, 356–358,
383–387
of uncertainty invariance, 356–358,
387, 388, 399, 402–404, 410
Probability, 5, 7, 26, 61, 95, 187, 420, 422,
423
Probability box (p-box), 188, 189
Probability consistency, 227, 228,
235
Probability density function, 65, 334
conditional, 65
joint, 65
marginal, 65
Probability distribution, 62
Probability distribution function, 62, 64,
65, 146, 159, 167, 334
conditional, 63, 159
joint, 63
marginal, 63
Probability granule, 339, 351
Probability measure, 62, 64, 146, 162, 166,
174, 175, 334
Probability of fuzzy event, 335, 351
Probability-possibility consistency index,
396
Probability-possibility transformations,
390–398, 410
Probability qualifier, 290
Probability theory, 3, 22, 61, 95, 105, 110,
130, 137, 158, 159, 252, 334, 408, 409,
420, 422
Projection, 17, 42, 169, 281–283, 373
Quantization, 295–297, 336
Random set, 190
Reconstructability analysis (RA), 408,
410, 411
Random variable, 62, 336, 351, 370
Range, 47, 197, 205, 209, 214, 227, 228,
235
Regular measure, 104
Relation, 14, 452
antisymmetric, 14
binary, 14, 16
compatibility, 14, 316, 317, 450
equivalence, 14, 316–318, 451
n-dimensional, 17, 323
of partial ordering, 15, 316, 317,
452
reflexive, 14, 316
symmetric, 14, 316
transitive, 14, 316
Relational join, 17, 284, 286, 316,
373–375, 380
Rényi entropies, 96
R-norm entropies, 96
Robust statistics, 137
Rough sets, 308
Rule of combination:
alternative, 173, 174, 188
Dempster’s, 170–174, 188
Scalar cardinality (of a fuzzy set),
263
Scales, 390
difference, 391
interval, 391, 392, 410
log-interval, 392–394, 396, 398, 402,
403, 410
ordinal, 394–398
ratio, 390
Semicontinuous measure, 104
Semilattice:
join, 15
meet, 15
Set, 11
classical, 19, 419
crisp, 260, 403
empty, 13
fuzzy, 19, 20, 186, 260–270, 403, 419
power, 12
universal, 11, 453
Set complement, 12
fuzzy, 266, 287, 306, 317, 322, 350,
451
involutive fuzzy, 267
standard fuzzy, 267, 323, 453
Set consistency, 227, 228, 235
Set difference, 12
Set intersection, 12
drastic fuzzy, 268
fuzzy, 266–268, 287, 306
standard fuzzy, 268, 270, 316, 317, 453
Set of probability distribution functions,
112, 216, 238, 254
convex, 123, 139, 178, 183, 214, 237,
249, 334
marginal, 121
Set theory:
classical, 5, 9, 22, 101, 135, 266, 268,
315, 320, 415, 420
fuzzy, 9, 20, 260–262, 266, 270, 287, 303,
305, 306, 420
Set union, 12
drastic fuzzy, 268
fuzzy, 266–268, 287, 306
standard fuzzy, 268, 270, 316, 317, 453
Set-valued statement:
conjunctive, 223
disjunctive, 223
Shannon cross-entropy, 94, 453
Shannon entropy, 68–77, 79–84, 86–89,
91–97, 197, 198, 203, 204, 218, 325,
350, 355, 369, 408, 409, 416, 417
average, 225
conditional, 80, 81, 85, 88, 97
joint, 79, 97
normalized, 83
weighted, 97
Sigma algebra (σ-algebra), 64, 102, 166,
334
Sigma count, 263
Similarity, 160, 303
Simplification problems, 358–364
Standard fuzzy arithmetic, 273–277, 306
Standard fuzzy complement, 267, 323,
453
Standard fuzzy intersection, 268, 270,
316, 317, 453
Standard fuzzy set union, 268, 270, 316,
317, 453
State-transition relation, 35
Statistical mechanics, 3, 22
Strife, 222, 254
Strong α-cut, 263, 264, 453
Subadditive measure, 105, 453
Subadditivity, 47, 73, 197, 201, 203, 205,
209, 216, 218, 220, 223–228, 235, 239,
249, 254, 255
Subset, 11
Subsethood:
crisp, 262
fuzzy, 262
Sugeno λ-measures, 144, 160, 187, 453
Superadditive measure, 105, 453
Superset, 11
Support, 263
Supremum, 15, 18
Symmetry, 73, 197, 205, 209, 254
System, 5
deterministic, 5
fuzzy, 294, 305, 451
hybrid, 297
knowledge-based, 297, 298
model-based, 297
nondeterministic, 5, 35, 86
System identification, 39, 40–43
Total ignorance, 37, 129, 152, 159, 215, 240
Triangular conorm (t-conorm), 185, 186, 267, 306, 453
Triangular norm (t-norm), 267, 306, 320, 453
Truth qualifier, 288
Uncertainty, 1–10, 22, 95, 101, 105, 415,
420, 422–424
diagnostic, 2, 5, 35
information-based, 424
linguistic, 321
predictive, 5, 35, 87
prescriptive, 5, 35
retrodictive, 5, 35
Uncertainty function, 8, 101, 196
Uncertainty gap, 421
Uncertainty measure, 196–198
aggregate, 226–239, 248, 253, 255, 375,
388, 417, 421, 437–441, 449
Uncertainty theory, 6, 8–10, 33, 64, 188,
196–198
classical, 8, 26, 196, 252
fuzzified, 261, 262, 287
generalized, 252
Universal set, 11, 453
Upper probability function, 112, 115,
119, 130, 132, 137, 138, 162, 178, 179,
196, 237, 280, 416, 454
conditional, 122, 184
joint, 122, 183
marginal, 121
U-uncertainty, 198–206, 253, 425–429
conditional, 200, 203, 204
joint, 200–202
marginal, 200–202
Vacuous probabilities, 130
Weighted average, 270
Weighted Shannon entropy, 97
NAME INDEX
Abellán, J., 254, 255, 458
Aczél, J., 95, 96, 458
Alefeld, G., 306, 458
Apostol, T.M., 23
Applebaum, D., 458
Arbib, M. A., 350, 458
Aristotle, 101
Ash, R. B., 95, 97, 459
Ashby, W. R., 97, 411, 459, 475
Atanassov, K. T., 308, 459
Attneave, F., 95, 459
Aubin, J.P., 139, 459
Aumann, R.J., 139, 459
Avgers, T.G., 409, 459
Babuška, R., 307, 459
Baciu, G., 470
Ban, A.I., 139, 459
Bandler, W., 307, 350, 459
Banon, G., 187, 459
Bárdossy, G., 308, 459
Barrett, J.D., 422
Batten, D.F., 95, 408, 459
Beer, M., 351, 476
Bell, D. A., 95, 187, 459, 467
Bellman, R., 5, 459
Bělohlávek, R., 307, 350, 460
Ben-Haim,Y., 418, 421, 460, 476
Benvenuti, P., 139, 460
Berleant, D., 479
Bernays, P., 463
Bernoulli, J., 61, 138
Bezdek, J. C., 307, 351, 460, 477
Bharathi-Devi, B., 308, 460
Bhattacharya, P., 460
Billingsley, P., 95, 137, 460
Billot, A., 308, 460
Black, M., 305, 460
Black, P.K., 460
Blahut, R. E., 95, 460
Boekee, D. E., 96, 460
Boeva, V., 482
Bolaños, M.J., 138, 139, 462
Bolc, L., 460
Bonissone, P.P., 479
Booker, J.M., 473, 479
Bordley, R.F., 410, 460
Borel, É., 137
Borgelt, C., 186, 460
Borowic, P., 307, 460
Bouchon-Meunier, B., 462
Brillouin, L., 95, 460
Broekstra, G., 411, 460
Buck, B., 408, 460
Buckley, J.J., 351, 460
Cai, K.Y., 351, 460
Cano, A., 138, 461
Cantor, G., 137
Caratheodory, C., 137, 461
Carlsson, C., 461
Cauchy, A., 137
Cavallo, R.E., 408, 411, 461
Chaitin, G.J., 23, 461
Chameau, J.- L., 308, 461
Chateauneuf, A., 138, 461
Chau, C. W. R., 255, 461
Cheesman, P., 422
Chellas, B. F., 56, 461
Chen, T., 485
Cherry, C., 22, 461
Chokr, B. A., 255, 461
Choquet, G., 137, 461
Christensen, R., 408, 409, 461
Clarke, M., 462
Coletti, G., 139, 462
Colyvan, M., 423, 462
Conant, R.C., 97, 462
Cordón, O., 308, 462
Cousco, I., 476
Cover, T. M., 95, 462, 463
Cox, R. T., 423, 462
Cresswell, M.J., 56, 469
Csiszár, I., 95, 462
Daróczy, Z., 95, 96, 458, 462
De Baets, B., 307, 462, 482
De Bono, E., 306, 462
De Campos, L. M., 138, 139, 187, 188,
462, 463
De Cock, M., 306, 471
De Cooman, G., 186, 189, 463, 482
De Finetti, B., 95, 463
De Luca, A., 350, 463
Delgado, M., 463
Delmotte, F., 188, 463
Dembo, A., 463
Demicco, R.V., 308, 463
Dempster, A. P., 138, 187, 188, 422, 437,
463
Denneberg, D., 137, 139, 463
Devlin, K., 23, 463
Di Nola, A., 307, 463
Dirichlet, P.G., 137
Dockx, S., 463
Downs, T., 188, 484
Dretske, F. I., 23, 423, 463
Dubois, D., 186–188, 253, 254, 306–308,
410, 460, 462–465
Dvořák, A., 305, 465
Ebanks, B., 95, 465
Ecksehlager, K., 95, 465
Elkan, C., 422
Elsasser, W.M., 97, 465
England, J.W., 95, 475
Erdös, P., 56
Fadeev, D.K., 56
Fagin, J., 465
Fast, J.D., 97, 465
Fedrizzi, M., 461, 485
Feinstein, A., 95, 465
Feller, W., 95, 465
Fellin, W., 351, 465
Ferdinand, A. E., 465
Ferson, S., 188, 465, 479
Fiebig, D.G., 408, 482
Filev, D. P., 307, 485
Fine, T. L., 95, 138, 143, 465, 483
Fisher, R.A., 23, 466
Fodor, J., 308, 459
Folger, T.A., 472
Forte, B., 95, 96, 458, 466
Foulk, C.R., 485
Frankowska, H., 139, 459
Freiberger, P., 306, 476
Frieden, B.R., 23, 466
Fuller, R., 461
Gabbay, D.M., 463, 466
Gaines, B. R., 466
Gal, S.G., 139, 459
Garner, W.R., 95, 466
Gatlin, L.L., 95, 466
Gebhardt, J., 474
Geer, J.F., 254, 410, 466
Georgescu-Roegen, N., 95, 466
Gerla, G., 307, 466
Giachetti, R.E., 351, 466
Gibbs, J. W., 22, 466
Gil, P., 466, 476
Ginzburg, L., 465
Glasersfeld, E. von, 466
Gnedenko, B.V., 95, 466
Godo, L., 466
Goguen, J. A., 308, 350, 466
Goldman, S., 95, 466
Gomide, F., 306, 478
Good, I. J., 466, 467
Goodman, I. R., 467
Gottwald, S., 307, 467
Goutsias, J., 190, 467
Grabisch, M., 137–139, 188, 467
Gray, R.M., 95, 467
Greenberg, H.J., 467
Guan, J. W., 187, 459, 467
Guiasu, S., 95, 97, 467
Gupta, M. M., 306, 470
Hacking, I., 95, 467
Hajagos, J.G., 188, 465
Hájek, P., 188, 307, 467
Hal, J., 421
Halmos, P.R., 95, 137, 467
Halpern, J.Y., 423, 465, 467
Hamming, R.W., 467
Hankel, H., 137
Hansen, E.R., 306, 467
Harmanec, D., 188, 255, 399, 400, 410,
421, 468, 472, 479
Harmuth, H.F., 23, 468
Hartley, R.V.L., 27–29, 56, 468
Hawkins, T., 137, 468
Heinsohn, J., 474
Helton, J.C., 468
Hernandez, E., 458, 468
Herrera, F., 462
Herzberger, J., 306, 458
Higashi, M., 22, 253, 350, 408, 411, 468,
472
Hirshleifer, J., 468
Hisdal, E., 468
Hoffmann, F., 462
Höhle, U., 22, 254, 350, 469, 479
Holmes, O.W., 1
Holmes, S., 26
Huber, P. J., 137, 469
Huete, J. F., 138, 187, 462
Hughes, G.E., 56, 469
Hyvärinen, L.P., 95, 469
Ichihashi, H., 255, 475
Ihara, S., 95, 97, 469
Jacoby, K., 22, 473
Jaffray, J.Y., 138, 461, 469
Jaynes, E.T., 95, 408, 409, 469
Jeffreys, H., 95, 469
Jelinek, F., 95, 469
Jiroušek, R., 469
John, R., 307, 469
Johnson, 409, 480
Jones, B., 411, 469
Jones, D.S., 95, 97, 469
Jordan, C., 137
Josang, A., 469
Joslyn, C., 187, 469, 470
Jumarie, G.M., 22, 470
Kacprzyk, J., 461, 485
Kåhre, J., 95, 470
Kaleva, O., 306, 470
Kandel, A., 470
Kapur, J. N., 255, 408, 421, 470
Karmeshu, 408, 470
Kaufmann, A., 306, 470
Kendall, D.G., 190, 470
Kennes, R., 188, 481
Kern-Isberner, G., 463, 470
Kerre, E.E., 306, 307, 462, 463, 471,
482
Kesavan, H.K., 408, 470
Khinchin, A.I., 95, 471
Kingman, J.F.C., 95, 137, 471
Kirschenmann, P.P., 471
Klawonn, F., 188, 471, 474, 476
Kleitner, G.D., 469
Klement, E.P., 188, 306, 350, 469, 471,
479
Klir, G. J., 22, 56, 137, 139, 161, 186–188,
253–255, 305–308, 350, 351, 399, 400,
408, 410, 411, 422, 461, 463, 466, 468,
470–473, 477–479, 482, 483
Knopfmacher, J., 351, 473
Kogan, I.M., 95, 473
Kohlas, J., 187, 473
Kohout, L. J., 307, 350, 459
Kolmogorov, A. N., 23, 56, 95, 423, 473
Komorovski, J., 461
Kong, A., 422, 473
Körner, J., 95, 462
Kornwachs, K., 22, 473
Kosko, B., 306, 473
Kramosil, I., 187, 188, 473
Krätschmer, V., 473
Kreinovich, V. Y., 255, 307, 461, 465, 474,
477
Krippendorff, K., 474
Kříž, O., 474
Kroese, D.P., 409, 479
Kruse, R., 186, 187, 460, 462, 474, 476
Kuhn, T. S., 308, 474
Kullback, S., 95, 474
Kyburg, H.E., 138, 474
Kyosev, Y., 307, 478
Lamata, M. T., 254, 463, 474
Lambert, J.H., 138
Lander, L., 253, 478
Lang, J., 463
Lao Tsu, 9, 355, 474
Laviolette, M., 422
Lebesgue, H., 137
Lee, C.S.G., 306, 308, 475
Lees, E.S., 474
Levi, I., 139, 186, 474, 475
Levine, R.D., 408, 474
Lewis, H.W., 474
Lewis, P.M., 474
Li, M., 23, 475
Lin, C.T., 306, 308, 475
Lindley, D.V., 422
Lingras, P., 461, 475
Lipschutz, S., 23, 475
Loo, S.G., 351, 475
Macaulay, V.A., 408, 460
Machida, M., 476
Mackay, D.M., 475
Madden, R.F., 411, 475
Maeda, Y., 255, 475, 481
Magdalena, L., 462
Mahler, R.P.S., 467
Malik, D.S., 476
Malinowski, G., 307, 475
Manes, E. G., 350, 458
Mansuripur, M., 95, 475
Manton, K. G., 351, 475
Mareš, M., 306, 475
Mariano, M., 253, 408, 472, 475
Marichal, J., 255, 475
Martin, N.F.G., 95, 475
Mathai, A. M., 95, 475
Matheron, G., 190, 475
Maung, I., 476
McAnally, D., 469
McLaughlin, S., 95, 409, 476
McNeil, D., 306, 476
Mendel, J.M., 307, 476
Menger, K., 306, 476
Mesiar, R., 139, 460, 471
Meyer, K.D., 474
Meyerowitz, A., 255, 476
Miranda, E., 138, 476
Močkoř, J., 477
Molchanov, I., 190, 476
Moles, A., 95, 476
Möller, B., 351, 476
Monney, P.A., 187, 473
Moore, R. E., 306, 476
Moral, S., 138, 254, 255, 458, 461–463,
473, 474
Mordeson, J.N., 351, 476
Murofushi, T., 138, 139, 467, 476
Myers, D., 465
Nagel, E., 466
Nair, P.S., 351, 476
Natke, H.G., 476
Nauck, D., 308, 476
Negoita, C.V., 307, 351, 477
Neumaier, A., 306, 477
Ng, C.T., 96, 458
Nguyen, H. T., 190, 306, 307, 464, 467,
474, 475, 477, 485
Norton, J., 188, 477
Novák, V., 307, 477
Oberkampf, W.L., 468
Ovchinnikov, S., 485
Padet, C., 57, 410, 477, 478
Pal, N. R., 351, 477
Pan, Y., 188, 306, 351, 468, 472, 477
Pap, E., 137, 139, 188, 471, 477
Paris, J.B., 409, 477
Parkinson, W.J., 479
Parviz, B., 254, 410, 472
Pavelka, J., 307, 477
Pawlak, Z., 308, 478
Peano, G., 137
Pedrycz, W., 306, 463, 478, 479
Peeva, K., 307, 478
Peirce, C.S., 260
Perfilieva, I., 477
Piegat, A., 478
Pinsker, M.S., 478
Pittarelli, M., 411, 468, 474, 478
Pöhlmann, S., 188, 483
Pollack, H.N., 22, 478
Prade, H., 186–188, 253, 254, 306–308,
410, 460, 463–465, 478
Press, S.J., 478
Quaio, Z., 478
Quastler, H., 95, 478
Radon, J., 137
Ragin, C.C., 308, 478
Ralescu, D.A., 307, 351, 477
Ramer, A., 57, 253, 254, 472, 478
Ras, Z. W., 461
Rathie, P. N., 95, 475
Recasens, J., 468
Reche, F., 139, 478
Regan, H.M., 188, 479
Reichenbach, H., 95, 479
Rényi, A., 29, 30, 56, 95, 96, 196, 423, 479
Rescher, N., 307, 479
Resconi, G., 188, 468, 479
Resnikoff, H.L., 479
Reza, F.M., 95, 97, 479
Richman, F., 476
Rieman, E.M., 137
Riley, J.G., 468
Rissanen, J., 479
Rodabaugh, S.E., 350, 469, 479
Rose, L.L., 485
Rosenkrantz, R. D., 408, 479
Ross, T.J., 351, 479
Roubens, M., 255, 475
Rouvray, D.H., 308, 479
Ruan, D., 463
Rubinstein, R.Y., 409, 479
Ruspini, E.H., 306, 479
Russell, B., 369, 479
Rutkowska, D., 308, 479
Rutkowski, L., 308, 479
Saffioti, A., 422
Sahoo, P., 465
Salmerón, A., 139, 478
Sanchez, E., 307, 463, 479
Sancho-Royo, A., 308, 479
Sanders, W., 465
Sandri, S., 466
Santamarina, J. C., 308, 461
Sarma, V.V.S., 308, 460
Savage, L.J., 95, 480
Schubert, J., 480
Schwecke, E., 188, 471, 474
Schweizer, B., 306, 480
Scozzafava, R., 139, 462
Seaman, J.W., 422
Sentz, K., 305, 465, 472
Sessa, S., 463
Sgarro, A., 188, 480
Shackle, G. L. S., 1, 186, 480
Shafer, G., 138, 187, 188, 422, 480
Shannon, C. E., 68, 95, 480
Shapcott, C.M., 459
Shapley, L.S., 139, 459, 480
Shekhter, V., 477
Shore, J.E., 409, 480
Shu, Q., 474
Siegel, P., 474
Sims, J.R., 137, 481
Sklar, A., 306, 480
Slepian, D., 95, 481
Sloane, N.J., 95, 481
Smets, P., 188, 190, 466, 481
Smith, R.M., 255, 442, 472, 481
Smith, S.A., 409, 481
Smithson, M., 481
Smuts, J. C., 308, 481
Spiegelhalter, D.J., 422
St. Clair, U., 479
Stonier, T., 23, 481
Sugeno, M., 137–139, 187, 467, 476, 481
Sugihara, K., 481
Tanaka, H., 188, 481
Tarantola, A., 409, 481
Taylor, S.J., 137, 471
Temple, G., 137, 481
Teng, C.M., 474
Termini, S., 350, 463
Theil, H., 95, 408, 482
Thomas, J.A., 95, 462, 463
Thomson, W. (Lord Kelvin), 2, 482
Tolley, H.D., 475
Tong, R. M., 485
Tribus, M., 408, 474, 482
Tsiporkova, E., 188, 482
Tyler, S.A., 315, 482
Vajda, I., 482
Van Der Lubbe, J.C.A., 96, 460, 482
Van Leekwijck, W., 307, 482
Vejnarová, J., 187, 254, 469, 482
Velverde, L., 462
Vencovská, A., 409, 477
Verdegay, J.L., 308, 479
Verdu, S., 95, 482
Vicig, P., 482
Viertl, R., 351, 482
Vitányi, P., 23, 475, 482
Volterra, V., 137
Walker, E.A., 306, 350, 467, 476, 477, 482
Walley, P., 138, 187, 189, 482, 483
Wallsten, T.S., 351, 485
Wang, J., 187, 483
Wang, L.S., 484
Wang, W., 483
Wang, Z., 137, 139, 161, 187, 351, 472,
481, 483
Watanabe, S., 95, 408, 483
Watson, S.R., 422
Way, E. C., 408, 411, 472
Weaver, W., 3, 4, 95, 483
Webber, M. J., 95, 408, 483
Weber, S., 138, 188, 483
Weichselberger, K., 188, 483
Weierstrass, K., 137
Weir, A.J., 137, 483
Weltner, K., 95, 483
Whittemore, B.J., 423, 483
Wierman, M., 410, 472, 483
Wierzchoń, S. T., 187, 483
Williams, P.M., 409, 483
Williamson, R.C., 188, 484
Wilson, A.G., 408, 484
Wilson, N., 188, 484
Wolf, R. G., 307, 484
Wolkenhauer, O., 186, 484
Wong, S. K. M., 461, 475, 484
Wonneberger, S., 410, 484
Woodal, W.H., 422
Woodbury, M.A., 475
Wu, K., 485
Wymer, A. D., 95, 484
Wyner, A., 481
Yager, R. R., 22, 186–188, 254, 255, 305,
307, 350, 461, 462, 478, 484, 485
Yaglom, A. M., 95, 485
Yaglom, I. M., 95, 485
Yam, Y., 474
Yang, M., 351, 485
Yao, Y.Y., 484
Yen, J., 351, 485
Yeung, R.W., 95, 485
Yovits, M.C., 423, 483, 485
Young, P.E., 351, 466
Yu, F.T.S., 95, 485
Yuan, B., 56, 254, 305–308, 351, 473, 477
Zadeh, L. A., 186, 188, 305, 307, 350, 351,
423, 485, 486
Zimmermann, H. J., 306, 486
Zwick, R., 351, 411, 486