Belief propagation

Belief propagation is an iterative algorithm for computing marginals of functions on a graphical model most commonly used in artificial intelligence and information theory. Judea Pearl in 1986 and Lauritzen and Spiegelhalter also in 1986 formulated independently this algorithm, also known as the sum-product algorithm. It is provably efficient on trees and has demonstrated empirical success in numerous applications including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability. It is commonly used in pairwise Markov random fields (MRFs with a maximum clique size of 2), Bayesian networks, and factor graphs.

Recall that the marginal distribution of a single variable [itex]X_i[itex] is simply the summation of a joint distribution over all variables except [itex]X_i[itex], and let [itex]\mathbf{x}[itex] be an assignment of all variables in the joint distribution:

[itex]P(x_i) = \sum_{\mathbf{x}: X_i=x_i} P(\mathbf{x})[itex]

For the purposes of explaining this algorithm, consider the marginal function, which is simply an unnormalized marginal distribution with a generic global function [itex]g(\mathbf{x})[itex]:

[itex]z(x_i) = \sum_{\mathbf{x}: X_i=x_i} g(\mathbf{x})[itex]
 Contents

Exact algorithm for trees

This algorithm functions by passing real-valued messages across edges in a graphical model. More precisely, in trees: a vertex sends a message to an adjacent vertex if (a) it has received messages from all of its other adjacent vertices and (b) hasn't already sent one. So in the first iteration, the algorithm will send messages from all leaf nodes to the lone vertex adjacent to its respective leaf and continues sending messages in this manner until all messages have been sent exactly once, hence explaining the term propagation. It is easily proven that all messages will be sent (there are twice the number of edges of them). Upon termination, the marginal of a variable is simply the product of the incoming messages of all its adjacent vertices. A simple proof of this fact, though somewhat messy, can be done by mathematical induction.

The message definitions will be described in the factor graph setting, as the algorithms for other graphical models are nearly identical. Since factor graphs have variable and factor nodes, there are two types of messages to define:

A variable message is a real-valued function that is a message sent from a variable to a factor, and defined as

[itex]X_n\rightarrow f_m(x_n) = \prod_{f_i\in N(X_m)\setminus \{f_m\}} f_i\rightarrow X_n(x_n).[itex]

A factor message is a real-valued function that is a message sent from a factor to a variable, and defined as

[itex]f_m\rightarrow X_n(x_n) = \sum_{\mathbf{x_m}:X_n=x_n} f_m(\mathbf{x_m}) \prod_{X_i\in N(f_m)\setminus \{X_n\}} X_i\rightarrow f_m(x_i),[itex]

where [itex]N(u)[itex] is defined as the set of neighbours (adjacent vertices in a graph) of a vertex [itex]u[itex]. [itex]X_m[itex] is the set of vertices adjacent to [itex]f_m[itex] and [itex]\mathbf{x_m}[itex] is an assignment to these vertices.

As mentioned in the description of the algorithm, the marginal of [itex]X_i[itex] can be computed in the following manner:

[itex]z(x_i) = \prod_{f_j\in N(X_i)} f_j\rightarrow X_i(x_i)[itex]

One can also compute the marginal of a factor [itex]f_j[itex], equivalently, the marginal of the subset of variables [itex]X_j[itex] in the following manner:

[itex]z(\mathbf{x_j}) = f_j(\mathbf{x_j})\prod_{X_i\in N(f_j)} X_i\rightarrow f_j(x_i)[itex]

Approximate algorithm for general graphs

Curiously, nearly the same algorithm is used in general graphs and then the algorithm is sometimes called loopy belief propagation because graphs typically contain cycles, or loops. The procedure must be adjusted slightly because graphs might not contain any leaves. Instead, one initializes all variable messages to 1 and uses the same message definitions above. It is easy to show that in a tree, the message definitions of this modified procedure will converge to the set of message definitions given above in the diameter of the tree.

There are other approximate methods for marginalization including variational methods and Monte Carlo methods.

One method of exact marginalization in general graphs is called the junction-tree algorithm, which is simply belief propagation on a modified graph guaranteed to be a tree. The basic premise is to eliminate cycles by clustering them into single nodes.

Related algorithm and complexity issues

A similar algorithm is commonly referred to as the Viterbi algorithm, but also known as the max-product or min-sum algorithm, which solves the related problem of maximization, or most probable explanation. Instead of attempting to solve the marginal, the goal here is to find the values [itex]\mathbf{x}[itex] that maximises the global function (i.e. most probable values in a probabilistic setting), and it can be defined using the arg max:

[itex]\arg\max_{\mathbf{x}} g(\mathbf{x})[itex]

An algorithm that solves this problem is nearly identical to belief propagation, with the sums replaced by maxes in the definitions.

It is worth noting that inference problems like marginalization and maximization are NP-hard to solve exactly and approximately (at least for relative error) in a graphical model. More precisely, the marginalization problem defined above is #P-complete and maximization is NP-complete.

References

• Frey, Brendan (1998). Graphical Models for Machine Learning and Digital Communication. MIT Press
• Mackay, David (2003). Exact Marginalization in Graphs. In David Mackay, Information Theory, Inference, and Learning Algorithms, pp. 334340. Cambridge: Cambridge University Press.
• Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (2nd edition). San Francisco: Morgan Kaufmann Publishers, Inc.

• Art and Cultures
• Countries of the World (http://www.academickids.com/encyclopedia/index.php/Countries)
• Space and Astronomy