Logic, both in mathematics and in common speech, relies on clear notions of truth and falsity. Reasoning about information that is either true or false is known as Boolean logic. For example, consider a statement such as "Unless I turn the lights on, the room will be dark." It leaves no room for uncertainty.
But what about daylight coming in through the windows? Everyday life presents us with many situations in which the accumulation of evidence leads to a conclusion. For example, a small amount of oil underneath a car in your driveway may be no great cause for concern. You may remember that you recently changed the car's oil, spilling some on the driveway in the process. If so, the oil stain evidence may be unimportant; if not, you may have your car checked for an oil leak, or wait to see if another oil stain shows up underneath your car when you park it in different areas.
Bayesian probability theory is a branch of mathematical probability theory that allows one to model uncertainty about the world and outcomes of interest by combining common-sense knowledge and observational evidence.
A belief network is:
a set of variables,
a graphical structure connecting the variables, and
a set of conditional distributions.
A belief network is commonly represented as a graph, which is a set of vertices and edges. The vertices, or nodes, represent the variables, and the edges, or arcs, represent the conditional dependencies in the model. The absence of an arc between two variables indicates conditional independence; that is, the probabilities of one variable do not depend directly on the state of the other.
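The three parts of a belief network (variables, graph, and conditional distributions) map naturally onto a small data structure. The following Python sketch is illustrative only: the class name BeliefNetwork, its methods, and the oil-stain variables Spill, Leak, and Stain are hypothetical and are not part of MSBNX.

```python
from dataclasses import dataclass, field

@dataclass
class BeliefNetwork:
    """Hypothetical container for the three parts of a belief network."""
    states: dict = field(default_factory=dict)   # variable -> list of its states
    parents: dict = field(default_factory=dict)  # variable -> list of parent variables
    cpts: dict = field(default_factory=dict)     # variable -> {parent-state tuple: {state: probability}}

    def add_variable(self, name, states, parents=(), cpt=None):
        """Register a variable, the arcs into it, and its conditional distributions."""
        self.states[name] = list(states)
        self.parents[name] = list(parents)
        self.cpts[name] = dict(cpt or {})

    def arcs(self):
        """Every arc in the graph as a (parent, child) pair."""
        return [(p, child) for child, ps in self.parents.items() for p in ps]

    def has_arc(self, a, b):
        """True if an arc joins a and b in either direction."""
        return (a, b) in self.arcs() or (b, a) in self.arcs()

net = BeliefNetwork()
net.add_variable("Spill", ["yes", "no"])
net.add_variable("Leak", ["yes", "no"])
net.add_variable("Stain", ["yes", "no"], parents=["Spill", "Leak"])
print(net.arcs())                    # [('Spill', 'Stain'), ('Leak', 'Stain')]
print(net.has_arc("Spill", "Leak"))  # False
```

The missing arc between Spill and Leak records that neither is modeled as directly influencing the other.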
Construction of a belief network follows a common set of guidelines:
Include all variables that are important in modeling your system.
Use causal knowledge to guide the connections made in the graph.
Use your prior knowledge to specify the conditional distributions.
Causal knowledge in this context means linking variables in the model in such a way that arcs lead from causes to effects.
A variable is an element of a probability model that can take on a set of different values that are mutually exclusive and exhaustive. Consider a medical experiment in which men and women of different ages are studied. Some of the relevant variables would be the gender of the participant, the age of the participant and the experimental result. The variable gender has only two possible values: male or female. The variable age, on the other hand, can take on many values.
While probability theory as a whole can handle variables with a few values as well as variables with many, MSBNX can only accept variables with a limited number of values. We call these discrete variables, and we call each of the possible values a state.
How would you handle the age variable in MSBNX? Most likely, you would create a variable where each state represents a range of years. For example:
Child (0-17)
Adult (18-44)
Middle Aged (45-64)
Senior (65+)
This process is known as discretization.
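Discretization can be sketched as a function that maps a numeric value to exactly one state. In this Python sketch the state names follow the age ranges above, and each range is assumed to include its lower bound and exclude its upper bound, so that a boundary age such as 18 falls into exactly one state. The names AGE_STATES and discretize_age are hypothetical.

```python
import math

# Hypothetical state names and cut points based on the age ranges above.
# Each range includes its lower bound and excludes its upper bound.
AGE_STATES = [
    ("Child", 0, 18),
    ("Adult", 18, 45),
    ("Middle Aged", 45, 65),
    ("Senior", 65, math.inf),
]

def discretize_age(age):
    """Map a numeric age to exactly one state of the discrete age variable."""
    for state, low, high in AGE_STATES:
        if low <= age < high:
            return state
    raise ValueError(f"age {age!r} is outside every range")

print(discretize_age(18))  # Adult
print(discretize_age(70))  # Senior
```

Because the ranges do not overlap and together cover every non-negative age, the resulting states are mutually exclusive and exhaustive.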
In probability theory, there is no a priori way of knowing which variables influence other variables. In general, the complete or joint probability distribution must be known to correctly perform inference. For a real-world model, the joint distribution is usually far too large to store directly on a computer, because its size grows exponentially with the number of variables: a model with just 20 two-state variables already has 1,048,576 entries.
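To see why explicit structure matters, the Python sketch below counts table entries for a hypothetical chain of 20 two-state variables, comparing one full joint table with the per-variable conditional tables of a network in which each variable has at most one parent. The function names and the chain itself are illustrative assumptions.

```python
from math import prod

def joint_table_size(num_states):
    """Entries in the full joint distribution: the product of all state counts."""
    return prod(num_states.values())

def factored_table_size(num_states, parents):
    """Entries across all conditional tables: for each variable, its own state
    count times the number of combinations of its parents' states."""
    return sum(num_states[v] * prod(num_states[p] for p in parents[v]) for v in num_states)

# Hypothetical chain X0 -> X1 -> ... -> X19 of two-state variables.
names = [f"X{i}" for i in range(20)]
num_states = {n: 2 for n in names}
parents = {n: ([names[i - 1]] if i > 0 else []) for i, n in enumerate(names)}

print(joint_table_size(num_states))              # 1048576
print(factored_table_size(num_states, parents))  # 78
```

The savings come entirely from the missing arcs: each variable's table grows only with the number of states of its own parents.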
One of the primary roles of a Bayesian model is to allow the model creator to use commonsense and real-world knowledge to eliminate needless complexity in the model. For example, a model builder would be likely to know that the time of day would not normally directly influence a car's oil leak. Any such influence would be based on other, more direct factors, such as temperature and driving conditions.
The method used to remove meaningless relationships in a Bayesian model is to explicitly declare the meaningful ones. After establishing all the variables in a model, you must deliberately link the variables that cause changes in the system to the variables they influence. Only those influences are considered.
These influences are represented by conditioning arcs between nodes. Each arc should represent a causal relationship between a temporal antecedent (known as the parent) and its later outcome (known as the child).
Inference, or model evaluation, is the process of updating probabilities of outcomes based upon the relationships in the model and the evidence known about the situation at hand.
When a Bayesian model is actually used, the end user applies evidence about recent events or observations. This information is applied to the model by "instantiating" or "clamping" a variable to a state that is consistent with the observation. Then the inference calculations are performed to update the probabilities of all the other variables that are connected to the variable representing the new evidence.
After inference, the updated probabilities reflect the new levels of belief in (or probabilities of) all possible outcomes coded in the model. These beliefs are still shaped by the original assessment of belief performed by the author of the model.
The beliefs originally encoded in the model are known as prior probabilities, because they are entered before any evidence is known about the situation. The beliefs computed after evidence is entered are known as posterior probabilities, because they reflect the levels of belief computed in light of the new evidence.
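Prior and posterior probabilities can be made concrete with a tiny model based on the oil-stain example. The Python sketch below uses inference by enumeration, which sums the joint distribution over every assignment consistent with the evidence. It is exact but exponential in the number of variables, so it suits only toy models and is not presented as the algorithm MSBNX uses. The variables Spill, Leak, and Stain, all probabilities, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical oil-stain model; every number is an illustrative assumption.
STATES = {"Spill": ["yes", "no"], "Leak": ["yes", "no"], "Stain": ["yes", "no"]}
PARENTS = {"Spill": [], "Leak": [], "Stain": ["Spill", "Leak"]}
CPT = {
    "Spill": {(): {"yes": 0.2, "no": 0.8}},
    "Leak": {(): {"yes": 0.1, "no": 0.9}},
    "Stain": {  # keyed by (Spill, Leak)
        ("yes", "yes"): {"yes": 0.99, "no": 0.01},
        ("yes", "no"): {"yes": 0.90, "no": 0.10},
        ("no", "yes"): {"yes": 0.80, "no": 0.20},
        ("no", "no"): {"yes": 0.05, "no": 0.95},
    },
}

def joint(assignment):
    """Probability of one full assignment: the product of each node's CPT entry."""
    p = 1.0
    for var, parents in PARENTS.items():
        key = tuple(assignment[q] for q in parents)
        p *= CPT[var][key][assignment[var]]
    return p

def posterior(query, evidence):
    """P(query | evidence), summing the joint over assignments that match the evidence."""
    names = list(STATES)
    totals = {s: 0.0 for s in STATES[query]}
    for values in product(*(STATES[n] for n in names)):
        a = dict(zip(names, values))
        if all(a[v] == s for v, s in evidence.items()):
            totals[a[query]] += joint(a)
    z = sum(totals.values())
    return {s: t / z for s, t in totals.items()}

print(round(posterior("Leak", {})["yes"], 3))                                # 0.1
print(round(posterior("Leak", {"Stain": "yes"})["yes"], 3))                  # 0.297
print(round(posterior("Leak", {"Stain": "yes", "Spill": "yes"})["yes"], 3))  # 0.109
```

With no evidence the result is simply the prior. Observing a stain raises the probability of a leak to about 0.3; learning that oil was spilled during a recent oil change lowers it again to about 0.11, because the spill already accounts for the stain.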
To recap, every variable in the real world situation is represented by a Bayesian variable. Each such variable describes a set of states that represent all possible distinct situations for the variable.
Once the set of variables and their states are known, the next step is to define the causal relationships among them. For any variable, this means asking the questions:
What other variables (if any) directly influence this variable?
What other variables (if any) are directly influenced by this variable?
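Given a list of arcs, both questions can be answered mechanically. In this Python sketch each arc is a hypothetical (parent, child) pair; Temperature echoes the earlier remark about oil leaks, but the structure and the function names parents_of and children_of are illustrative only.

```python
# Hypothetical arcs, each a (parent, child) pair.
ARCS = [("Spill", "Stain"), ("Leak", "Stain"), ("Temperature", "Leak")]

def parents_of(var, arcs):
    """Variables that directly influence var: arcs pointing into it."""
    return [p for p, c in arcs if c == var]

def children_of(var, arcs):
    """Variables directly influenced by var: arcs leaving it."""
    return [c for p, c in arcs if p == var]

for v in ["Temperature", "Leak", "Spill", "Stain"]:
    print(v, "parents:", parents_of(v, ARCS), "children:", children_of(v, ARCS))
```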
In a standard Bayesian belief network, each variable is represented by a colored ellipse; this graphical representation is called a node.
Each causal influence relationship is described by a line (or arc) connecting the influencing variable to the influenced variable. The influence arc has a terminating arrowhead pointing to the influenced variable.
The common terminology, then, is as follows:
A node is a Bayesian variable.
An arc connects a parent (influencing) node to a child (influenced) node.
To create a belief network, then, the following steps are necessary.
Create a set of variables representing the distinct elements of the situation being modeled.
For each such variable, define the set of outcomes or states that it can have. This set is referred to in the mathematical literature as "mutually exclusive and exhaustive," meaning that it must cover all possibilities for the variable, and that no two states overlap, so the variable is always in exactly one state.
Establish the causal dependency relationships between the variables. This involves creating arcs (lines with arrowheads) leading from the parent variable to the child variable.
Assess the prior probabilities. This means supplying the model with numeric probabilities for each variable: one probability distribution over the variable's states for each combination of states of the parents it was given in the previous step.
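The last step can be checked mechanically: a variable needs one distribution per combination of its parents' states, and each distribution must cover exactly the variable's states and sum to 1. The Python sketch below is a hypothetical validator, not an MSBNX feature; the function names and the oil-stain numbers are assumptions.

```python
from itertools import product

def cpt_rows(var, states, parents):
    """Every combination of parent states; each needs its own distribution.
    A variable with no parents has a single row, the empty tuple."""
    return list(product(*(states[p] for p in parents[var])))

def check_cpt(var, states, parents, cpt, tol=1e-9):
    """Check that every row exists, covers exactly the variable's states, and sums to 1."""
    for row in cpt_rows(var, states, parents):
        if row not in cpt:
            raise ValueError(f"{var}: missing row for parent states {row}")
        dist = cpt[row]
        if set(dist) != set(states[var]):
            raise ValueError(f"{var}{row}: states do not match {states[var]}")
        total = sum(dist.values())
        if abs(total - 1.0) > tol:
            raise ValueError(f"{var}{row}: probabilities sum to {total}")
    return True

# Hypothetical oil-stain model used only for illustration.
states = {"Spill": ["yes", "no"], "Leak": ["yes", "no"], "Stain": ["yes", "no"]}
parents = {"Spill": [], "Leak": [], "Stain": ["Spill", "Leak"]}
stain_cpt = {  # keyed by (Spill, Leak)
    ("yes", "yes"): {"yes": 0.99, "no": 0.01},
    ("yes", "no"): {"yes": 0.90, "no": 0.10},
    ("no", "yes"): {"yes": 0.80, "no": 0.20},
    ("no", "no"): {"yes": 0.05, "no": 0.95},
}
print(len(cpt_rows("Spill", states, parents)))         # 1
print(len(cpt_rows("Stain", states, parents)))         # 4
print(check_cpt("Stain", states, parents, stain_cpt))  # True
```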
See Model Creation for more information.
After the model has been developed, MSBNX allows you to model or mimic real-world situations by entering or removing evidence on the model. As you do so, there are various methods for displaying the resulting probabilities.
See the topic Evaluation Window for more information.
When inference is performed on a model, there are various mathematical schemes for determining which pieces of evidence would be the most valuable to obtain. These procedures produce a list of the variables that have not yet been given evidence, ordered by the degree to which obtaining evidence for each would clarify the situation.
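One common way to rank candidate observations, offered here only as an illustration and not as a description of the troubleshooting algorithms in MSBNX, is expected information gain: the expected reduction in the entropy of the variable of interest after observing a candidate variable. The Python sketch assumes each candidate's likelihood given the target is known and that candidates are scored one at a time; the variables OilStain and DipstickLow, all probabilities, and the function names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a {state: probability} distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def information_gain(prior, likelihood):
    """Expected drop in the target's entropy after observing one variable.

    prior: {target_state: P(target_state)}
    likelihood: {target_state: {obs_state: P(obs_state | target_state)}}
    """
    obs_states = next(iter(likelihood.values())).keys()
    expected = 0.0
    for o in obs_states:
        joint = {t: prior[t] * likelihood[t][o] for t in prior}  # P(target, o)
        p_o = sum(joint.values())                                # P(o)
        if p_o > 0:
            expected += p_o * entropy({t: j / p_o for t, j in joint.items()})
    return entropy(prior) - expected

def rank_observations(prior, candidates):
    """Order unobserved variables by expected information gain, highest first."""
    scores = {name: information_gain(prior, lik) for name, lik in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

prior = {"leak": 0.1, "no_leak": 0.9}
candidates = {
    "OilStain": {"leak": {"yes": 0.8, "no": 0.2}, "no_leak": {"yes": 0.3, "no": 0.7}},
    "DipstickLow": {"leak": {"yes": 0.7, "no": 0.3}, "no_leak": {"yes": 0.05, "no": 0.95}},
}
order, scores = rank_observations(prior, candidates)
print(order)  # ['DipstickLow', 'OilStain']
```

In this toy model a low dipstick reading is rarely seen without a leak, so observing it is expected to resolve more uncertainty than observing a stain, which a spill can also explain.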
See Troubleshooting for more information.
The following books are useful guides in this field.
An Introduction to Bayesian Networks. Finn V. Jensen, Springer-Verlag, New York NY, 1996. ISBN 0-387-91502-8.
Probabilistic Reasoning in Intelligent Systems. Judea Pearl, Morgan Kaufmann, San Mateo CA, 1988. ISBN 0-934613-73-7.
Computational Intelligence. David Poole, et al., Oxford University Press, New York NY, 1998. ISBN 0-19-510270-3.