A common use of Bayesian belief network models is to diagnose system failures in a probabilistic framework.
This has several advantages over conventional rule-based or decision-tree methods, since Bayes nets support uncertain evidence in a theoretically correct fashion. In addition, prior distributions in Bayes nets can be built that model logical functions such as AND, OR and NOT using what are known as deterministic nodes; that is, nodes whose distributions contain only zeroes and ones. Such nodes, therefore, act as logic gates.
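As an illustration of deterministic nodes, the following minimal sketch builds conditional probability tables for AND, OR and NOT gates over binary parents. The function name `deterministic_cpt` and the dictionary layout are assumptions made for this example; they are not part of MSBNX.

```python
from itertools import product


def deterministic_cpt(gate, n_parents):
    """Build P(child = 1 | parents) for a logic gate.

    Every entry is exactly 0.0 or 1.0, which is what makes the node
    deterministic. Keys are tuples of parent states (0 or 1).
    """
    cpt = {}
    for parents in product((0, 1), repeat=n_parents):
        cpt[parents] = 1.0 if gate(parents) else 0.0
    return cpt


# Hypothetical gates over binary parents.
and_cpt = deterministic_cpt(all, 2)
or_cpt = deterministic_cpt(any, 2)
not_cpt = deterministic_cpt(lambda parents: not parents[0], 1)
```

Because the table holds only zeroes and ones, inference through such a node behaves like evaluating the corresponding logic gate.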
A key question in decision theory is this: In the current evidence setting, what new evidence would most effectively lead to a clear diagnosis? The answer is often known as the value of information, and information theory provides mathematical approaches to computing it.
MSBNX supports two algorithms that use information theory to order or rank variables in a Bayes net according to their information weight or influence.
In either scenario, variables or nodes in the model play certain roles. These roles are also known as labels, and must be assigned correctly or the results cannot be interpreted.
Both methods produce an ordered list of variables ranked by a value of information score. In a typical implementation, for example, this list would determine the order of the questions asked of a diagnostician or technician.
In such a model, variables are assigned one of two roles:
Hypothesis Node. Also known as a hidden variable, this is typically a variable that cannot be directly observed. It is the target or purpose of the overall diagnosis.
Information Node. An observable variable that influences the hypothesis node(s) in the model.
There may be other nodes in the model which are not labeled; although they influence inference in the normal way, they do not otherwise enter into the diagnostic process.
Utility-based diagnosis uses mutual information to compute the amount of weight or "lift" that evidence about the state of each information node would bring to each hypothesis variable. The resulting ranking of uncertain (undetermined) information nodes is used to expedite the diagnostic process.
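The ranking idea can be sketched with the textbook definition of mutual information, I(H; X) = sum over h, x of P(h, x) log2(P(h, x) / (P(h) P(x))). The node names and joint tables below are invented for illustration, and the sketch does not claim to reproduce the exact computation MSBNX performs.

```python
import math


def mutual_information(joint):
    """Return I(H; X) in bits, where joint[h][x] = P(H = h, X = x)."""
    p_h = [sum(row) for row in joint]          # marginal of the hypothesis node
    p_x = [sum(col) for col in zip(*joint)]    # marginal of the information node
    mi = 0.0
    for h, row in enumerate(joint):
        for x, p in enumerate(row):
            if p > 0.0:                        # zero cells contribute nothing
                mi += p * math.log2(p / (p_h[h] * p_x[x]))
    return mi


# Hypothetical joint tables P(hypothesis, information node) given current evidence.
joints = {
    "RadioWorks": [[0.25, 0.25], [0.25, 0.25]],   # independent of the hypothesis
    "LightsDim": [[0.35, 0.15], [0.15, 0.35]],    # weakly informative
    "BatteryTest": [[0.45, 0.05], [0.05, 0.45]],  # strongly informative
}

# Rank information nodes from most to least informative.
ranking = sorted(joints, key=lambda name: mutual_information(joints[name]), reverse=True)
```

A node that is independent of the hypothesis scores zero, so it falls to the bottom of the list.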
In addition to being assigned to roles, variables in a troubleshooting model are also given one or more costs. The belief network author may measure these costs in dollars (or another currency), time (in minutes or seconds) or any other unit that is consistent with the problem formulation.
Troubleshooting uses an algorithm that iterates over all reasonable repair plans in an attempt to find the ones with the highest likelihood of success at the lowest cost. The result is a list of nodes, ordered by cost. Establishing evidence about the top-ranked (cheapest) node is guaranteed (within the limits of the model) to lead to a correct diagnosis in the fewest steps and at the lowest cost.
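To make the idea of comparing repair plans concrete, here is a minimal brute-force sketch. It is not the MSBNX algorithm. It assumes exactly one component is faulty, that each repair is perfect, that the system can be re-tested for free after each repair, and that observation costs are ignored. The component names, probabilities and costs are invented.

```python
from itertools import permutations


def expected_cost(plan, p_fault, fix_cost):
    """Expected cost of repairing components in order until the fault is fixed."""
    total = 0.0
    p_unresolved = 1.0                              # chance the fault is still present
    for component in plan:
        total += p_unresolved * fix_cost[component]  # pay only if still broken
        p_unresolved -= p_fault[component]
    return total


# Hypothetical single-fault model: probabilities sum to 1.
p_fault = {"battery": 0.5, "starter": 0.3, "alternator": 0.2}
fix_cost = {"battery": 100.0, "starter": 200.0, "alternator": 50.0}

# Enumerate every ordering and keep the one with the lowest expected cost.
best_plan = min(
    permutations(p_fault),
    key=lambda plan: expected_cost(plan, p_fault, fix_cost),
)
```

Under these assumptions the cheapest plan is not simply the one that starts with the cheapest component; the fault probabilities matter as well.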
To perform mutual information diagnosis in a model:
At least one node in the diagnostic model must be identified as a hypothesis node.
At least two of its nodes must be identified as information nodes.
There are three types of costs that are important in a troubleshooting model.
Cost to Observe. This is the cost of observing a symptom or sensor. For example, testing the battery of a car or running a blood test for gram-positive bacteria.
Cost to Fix. This is the cost of fixing or replacing a component in a system. For example, replacing the power supply in a computer.
Service Cost. This is a cost assigned to the network or model as a whole. In other words, it indicates the cost that would be expected if the diagnostic operation failed. For example, if a computer server in a network could not be repaired through diagnosis it would have to be replaced.
Depending on its role, a variable in a troubleshooting network may have a cost to observe, a cost to fix, both or neither. The service cost is treated as the cost to fix for the model as a whole. Note: The service cost is required to perform troubleshooting.
The roles of variables in a troubleshooting model and their costs are as follows:
| Name | Costs Allowed | Purpose |
| --- | --- | --- |
| informational | observe | Used to define observable evidentiary variables |
| problem-defining | fix | Used to define primary symptoms of failure; that is, the element of the model that is the target of the diagnosis. |
| fixable and observable | observe and fix | Used to define observable and replaceable elements |
| fixable but unobservable | fix | Used to define elements that can only be replaced or repaired |
| unfixable | neither | Used to define elements that can neither be fixed nor observed |
| other | neither | Used to define variables that play no direct part in the diagnostic process. These may be deterministic or "modal" variables that reshape the problem in a logical fashion. |
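The role table above can be captured as a small lookup structure, which makes it easy to check that a node carries only the costs its role allows. This is a sketch only: the `Role` enum, the `ALLOWED_COSTS` mapping and the `disallowed_costs` helper are hypothetical names, not MSBNX identifiers.

```python
from enum import Enum


class Role(Enum):
    INFORMATIONAL = "informational"
    PROBLEM_DEFINING = "problem-defining"
    FIXABLE_OBSERVABLE = "fixable and observable"
    FIXABLE_UNOBSERVABLE = "fixable but unobservable"
    UNFIXABLE = "unfixable"
    OTHER = "other"


# Costs allowed for each role, transcribed from the table.
ALLOWED_COSTS = {
    Role.INFORMATIONAL: {"observe"},
    Role.PROBLEM_DEFINING: {"fix"},
    Role.FIXABLE_OBSERVABLE: {"observe", "fix"},
    Role.FIXABLE_UNOBSERVABLE: {"fix"},
    Role.UNFIXABLE: set(),
    Role.OTHER: set(),
}


def disallowed_costs(role, costs):
    """Return the names of assigned costs that the role does not allow."""
    return sorted(set(costs) - ALLOWED_COSTS[role])
```

For example, `disallowed_costs(Role.INFORMATIONAL, {"observe": 5.0, "fix": 20.0})` flags the `fix` cost, since informational nodes may only carry a cost to observe.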
The most vital part of a troubleshooting network is its problem nodes. These nodes must be declared in a particular manner: state zero (the first state declared) must be associated with the normal behavior of the component or element. The remaining states must form a mutually exclusive and exhaustive set of failure modes for the component. Many problem nodes have only two states: "Works" and "Doesn't Work". If a problem node has more than two states, they must correspond to clearly distinct situations. Consider a computer printer with four states: "Works", "No Paper is Output", "Printing is Very Slow", and "Printing is Garbled". In each case, observation allows the problem state to be distinguished. (There is, however, some ambiguity: consider a case where printing is both garbled and slow.)
Multiple problem nodes may be defined, but only one is actually considered during any given troubleshooting session.
The mechanics of troubleshooting work as follows.
1. One of the problem nodes is set (instantiated) to one of its problem states (that is, a state other than state zero).
2. The Troubleshooting Recommendations algorithm is run, and a ranked list of nodes is returned, each with its predicted utility.
3. The highest (first) variable in the ranked list is the one with the lowest cost. The technician or diagnostician would then attempt to gather evidence about this variable.
4. The evidence found about the highest ranked variable is entered as evidence into the model. Alternatively, evidence can be entered for any other uninstantiated node in the collection.
5. Return to step 2.
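The cycle above can be sketched as a simple loop. The function `troubleshoot` and its `recommend` and `observe` callbacks are hypothetical; `toy_recommend` merely sorts uninstantiated nodes by an invented observation cost and stands in for the Troubleshooting Recommendations algorithm. The stopping rule (no recommendations left, or a step limit) is also an assumption, since the steps above do not state one.

```python
def troubleshoot(problem_node, problem_state, recommend, observe, max_steps=20):
    """Run the recommend/observe cycle and return the accumulated evidence."""
    evidence = {problem_node: problem_state}     # set the problem node
    for _ in range(max_steps):
        ranked = recommend(evidence)             # get the ranked list of nodes
        if not ranked:                           # assumed stopping rule
            break
        top_node = ranked[0]                     # lowest-cost node comes first
        evidence[top_node] = observe(top_node)   # enter the finding as evidence
    return evidence                              # each pass repeats the cycle


# Toy stand-ins (all values invented for illustration).
costs = {"PaperTray": 1.0, "TonerLevel": 2.0, "DriverVersion": 5.0}
findings = {"PaperTray": "Full", "TonerLevel": "Low", "DriverVersion": "Current"}
asked = []


def toy_recommend(evidence):
    pending = [node for node in costs if node not in evidence]
    return sorted(pending, key=costs.get)


def toy_observe(node):
    asked.append(node)
    return findings[node]


result = troubleshoot("Printer", "Printing is Garbled", toy_recommend, toy_observe)
```

In a real session the ranking would be recomputed by inference after each new piece of evidence, so the order of questions could change from one pass to the next.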
The evaluation window of MSBNX will attempt to determine the correct type of procedure to perform. The rules it uses are as follows.
If there is a node labeled as problem-defining, then troubleshooting diagnosis is enabled.
If there is a node labeled as hypothesis, then mutual information diagnosis is enabled.
Otherwise, diagnosis is disabled.
If other criteria for diagnosis are not met, an error message will indicate the situation.
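These selection rules, together with the requirements stated earlier (a service cost for troubleshooting; at least one hypothesis node and two information nodes for mutual information diagnosis), can be sketched as follows. The function name, label strings and error texts are hypothetical, and the sketch assumes the rules are checked in the order listed.

```python
def select_diagnosis(labels, service_cost=None):
    """Return (mode, error) for a model.

    labels maps node names to role labels; error is None when the
    criteria for the selected mode are met.
    """
    roles = list(labels.values())
    if "problem-defining" in roles:
        if service_cost is None:
            return "troubleshooting", "a service cost is required"
        return "troubleshooting", None
    if "hypothesis" in roles:
        if roles.count("information") < 2:
            return "mutual information", "at least two information nodes are required"
        return "mutual information", None
    return "disabled", None


# Hypothetical models.
car = {"EngineStarts": "problem-defining", "Battery": "fixable and observable"}
flu = {"Flu": "hypothesis", "Fever": "information", "Cough": "information"}

car_mode = select_diagnosis(car, service_cost=500.0)
flu_mode = select_diagnosis(flu)
```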
See Web Links for Microsoft Research Technical Reports about troubleshooting under uncertainty.