
Variational Approximate Inference Methods in Probabilistic Graphical Models

CHENG Qiang, CHEN Feng, DONG Jian-Wu, XU Wen-Li

Citation: CHENG Qiang, CHEN Feng, DONG Jian-Wu, XU Wen-Li. Variational approximate inference methods in probabilistic graphical models. ACTA AUTOMATICA SINICA, 2012, 38(11): 1721-1734. doi: 10.3724/SP.J.1004.2012.01721


doi: 10.3724/SP.J.1004.2012.01721
Details
    Corresponding author: CHEN Feng


  • Abstract:

    Probabilistic graphical models combine graph theory with probability theory, providing a unified framework for representing complex dependencies among multiple variables, and are widely applied in fields such as computer vision, natural language processing, and computational biology. Probabilistic inference (including computing marginal probabilities and finding the most probable state) is the core problem in both the study and the application of probabilistic graphical models. This paper surveys recent results on variational inference, a major family of approximate inference methods for probabilistic graphical models. Within the variational framework, it systematically organizes the basic lines of research on the inference problem, reviews the main approximate inference methods currently in use, and analyzes properties of the approximate algorithms such as monotonicity, convergence, and global optimality. Finally, it discusses future research directions and application prospects for approximate inference in probabilistic graphical models.
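The two tasks named in the abstract — computing marginal probabilities exactly, and approximating them variationally — can be illustrated with a minimal sketch. This is not code from the paper; the model, its potentials, and the naive mean-field update `m_i = tanh(h_i + Σ_j J_ij m_j)` are illustrative choices for a tiny Ising-style chain where brute-force enumeration is still feasible for comparison.

```python
import itertools
import math

# Tiny pairwise binary MRF (Ising-style), x_i in {-1, +1}:
#   p(x) ∝ exp( Σ_i h_i x_i  +  Σ_(i,j) J_ij x_i x_j )
h = [0.3, -0.2, 0.5]              # unary fields (illustrative values)
J = {(0, 1): 0.4, (1, 2): -0.3}   # pairwise couplings on a 3-node chain

def energy(x):
    """Log unnormalized probability of a configuration x."""
    e = sum(h[i] * x[i] for i in range(len(h)))
    e += sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    return e

def exact_marginals():
    """Exact means E[x_i] by brute-force enumeration (only viable for tiny models)."""
    states = list(itertools.product([-1, 1], repeat=len(h)))
    w = [math.exp(energy(s)) for s in states]
    Z = sum(w)  # partition function
    return [sum(wi * s[i] for wi, s in zip(w, states)) / Z
            for i in range(len(h))]

def mean_field(iters=200):
    """Naive mean field: fully factorized q(x) = Π_i q_i(x_i).
    Coordinate-ascent fixed point: m_i = tanh(h_i + Σ_j J_ij m_j)."""
    m = [0.0] * len(h)
    for _ in range(iters):
        for i in range(len(h)):
            field = h[i]
            for (a, b), Jab in J.items():
                if a == i:
                    field += Jab * m[b]
                elif b == i:
                    field += Jab * m[a]
            m[i] = math.tanh(field)
    return m
```

On this weakly coupled chain the mean-field means land close to the exact ones, but in general mean field only gives a lower bound on log Z and can be badly biased with strong couplings — which is exactly the gap that the more refined variational methods surveyed here aim to close.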

Publication history
  • Received:  2011-12-08
  • Revised:  2012-06-15
  • Published:  2012-11-20
