
Learning Technique of Probabilistic Graphical Models: A Review

LIU Jian-Wei, LI Hai-En, LUO Xiong-Lin

刘建伟, 黎海恩, 罗雄麟. 概率图模型学习技术研究进展. 自动化学报, 2014, 40(6): 1025-1044. doi: 10.3724/SP.J.1004.2014.01025
LIU Jian-Wei, LI Hai-En, LUO Xiong-Lin. Learning Technique of Probabilistic Graphical Models: A Review. ACTA AUTOMATICA SINICA, 2014, 40(6): 1025-1044. doi: 10.3724/SP.J.1004.2014.01025


doi: 10.3724/SP.J.1004.2014.01025

Details
    Author Biography:

    LI Hai-En  Master's degree candidate at the College of Geophysics and Information Engineering, China University of Petroleum (Beijing). His research interests include machine learning and the representation, learning, and inference of probabilistic graphical models. E-mail: lihaien1988@163.com

Learning Technique of Probabilistic Graphical Models: A Review

Funds: 

Supported by National Basic Research Program of China (973 Program) (2012CB720500), National Natural Science Foundation of China (21006127), and Basic Subject Research Fund of China University of Petroleum (JCXK-2011-07)

  • Abstract: Probabilistic graphical models handle reasoning under uncertainty effectively, and learning them accurately and efficiently from sample data is the key problem for their practical application. The representation of a probabilistic graphical model consists of two parts, parameters and structure, so its learning algorithms divide correspondingly into parameter learning and structure learning. This paper reviews parameter learning and structure learning algorithms for probabilistic graphical models in detail. Parameter learning algorithms are discussed separately according to whether the data set is complete, and structure learning algorithms are grouped by their characteristics into constraint-based learning, score-and-search-based learning, hybrid learning, dynamic-programming structure learning, model-averaging structure learning, and structure learning from incomplete data sets. Parameter and structure learning algorithms for Markov networks are also summarized. Finally, open problems in probabilistic graphical model learning and directions for further research are pointed out.
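The paper itself gives no code; as a minimal illustration of the complete-data parameter learning case surveyed in the abstract, maximum-likelihood estimation of a discrete Bayesian network's conditional probability tables reduces to counting, optionally with a Dirichlet pseudo-count for smoothing. The sketch below is illustrative only: the `mle_cpt` helper, the toy Rain/WetGrass network, and the four-sample data set are all invented for this example.

```python
from collections import Counter
from itertools import product

def mle_cpt(data, child, parents, card, alpha=1.0):
    """Estimate P(child | parents) from complete data by (smoothed) counting.

    data:    list of dicts mapping variable name -> observed value
    child:   name of the child variable
    parents: list of parent variable names
    card:    dict mapping variable name -> list of possible values
    alpha:   Dirichlet pseudo-count (alpha=0 gives the pure MLE)
    """
    # Count each (parent configuration, child value) pair in the data.
    counts = Counter()
    for row in data:
        key = tuple(row[p] for p in parents)
        counts[(key, row[child])] += 1
    # Normalize the counts within each parent configuration.
    cpt = {}
    for key in product(*(card[p] for p in parents)):
        total = sum(counts[(key, v)] + alpha for v in card[child])
        cpt[key] = {v: (counts[(key, v)] + alpha) / total for v in card[child]}
    return cpt

# Toy network Rain -> WetGrass, learned from four complete samples.
data = [
    {"Rain": 1, "WetGrass": 1},
    {"Rain": 1, "WetGrass": 1},
    {"Rain": 0, "WetGrass": 0},
    {"Rain": 0, "WetGrass": 1},
]
card = {"Rain": [0, 1], "WetGrass": [0, 1]}
cpt = mle_cpt(data, "WetGrass", ["Rain"], card, alpha=0.0)
print(cpt[(1,)][1])  # P(WetGrass=1 | Rain=1) = 1.0
```

With incomplete data this closed-form counting no longer applies, which is why the abstract treats the incomplete-data case (e.g. EM-style algorithms) separately.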
  • [1] Koller D, Firedman N. Probabilistic Graphical Models: Principles and Techniques. Cambridge: The MIT Press, 2009
    [2] Wainsright M J, Jordan M I. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 2008, 1(1-2): 1-305
    [3] Pourret O, Naim P, Marcot B. Bayesian Networks: A Practical Guide to Applications. Chichester: John Wiley, 2008
    [4] Larrñaga P, Moral S. Probabilistic graphical models in artificial intelligence. Applied Soft Computing, 2011, 11(2): 1511-1528
    [5] Weber P, Medina-Oliva G, Simon C, Iung B. Overview on Bayesian networks applications for dependability, risk analysis and maintenance areas. Engineering Applications of Artificial Intelligence, 2012, 25(4): 671-682
    [6] Korb K B, Nicholson A E. Bayesian Artificial Intelligence (2nd edition). Florida: CRC Press, 2010
    [7] Elvira Consortium. Elvira: an environment for probabilistic graphical models. In: Proceedings of the 1st European Workshop in Probabilistic Graphical Models. Cuenca, Spain, 2002. 222-230
    [8] Cheng J, Greiner R. Learning Bayesian belief network classifiers: algorithms and system. In: Proceedings of the 14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence. Ottawa, Canada: Springer, 2002. 141-151
    [9] Murphy K. The Bayes net toolbox for Matlab. Computing Science and Statistics, 2001, 33(2): 1024-1034
    [10] Spiegelhalter D, Thomas A, Best N, Gilks W. BUGS 0. 5: Bayesian Inference Using Gibbs Sampling Manual (version ii), Technical Report, MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK. 1996
    [11] Lauritzen S L. gRaphical models in R. R News, 2002, 3(2): 39
    [12] Cozman F G. The Javabayes system. The International Society for Bayesian Analysis Bulletin, 2001, 7(4): 16-21
    [13] Scheines R, Spirtes P, Glymour C, Meek C. TETRAD II: Tools for Discovery. Hillsdale, NJ: Lawrence Erlbaum Associates, 1994
    [14] Andersen S K, Olesen K G, Jensen F V, Jensen F. HUGIN-a shell for building Bayesian belief universes for expert systems. In: Proceedings of the 11th International Joint Conference on Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1989. 1080-1085
    [15] Prelee M A, Neuhoff D L, Pappas T N. Image reconstruction from a Manhattan grid via piecewise plane fitting and Gaussian Markov random fields. In: Proceedings of the 19th IEEE International Conference on Image Processing. Orlando, Florida, USA: IEEE, 2012. 2061-2064
    [16] Dawoud A, Netchaev A. Preserving objects in Markov random fields region growing image segmentation. Pattern Analysis and Applications, 2012, 15(2): 155-161
    [17] Yousefi S, Kehtarnavaz N, Cao Y, Razlighi Q R. Bilateral Markov mesh random field and its application to image restoration. Visual Communication and Image Representation, 2012, 23(7): 1051-1059
    [18] Xiong R, Wang J N, Chu J. Face alignment based on 3D face shape model and Markov random field. In: Proceedings of the 12th International Conference on Intelligent Autonomous Systems. Berlin, Heidelberg: Springer, 2012. 249 -261
    [19] Ghosh A, Subudhi B N, Ghosh S. Object detection from videos captured by moving camera by fuzzy edge incorporated Markov random field and local histogram matching. IEEE Transactions on Circuits and Systems for Video Technology, 2012, 22(8): 1127-1135
    [20] Li S Z. Markov Random Field Modeling in Image Analysis (3rd edition). Tokyo, Japan: Springer, 2009
    [21] Blake A, Kohli P, Rother C. Markov Random Fields for Vision and Image Processing. Cambridge: The MIT Press, 2011
    [22] Wei Z, Li H Z. A Markov random field model for network-based analysis of genomic data. Bioinformatics, 2007, 23(12): 1537-1544
    [23] Wei P, Pan W. Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model. Bioinformatics, 2008, 24(3): 404-411
    [24] Wei P, Pan W. Network-based genomic discovery: application and comparison of Markov random field models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2010, 59(1): 105-125
    [25] Neapolitan R E. Learning Bayesian Networks. Upper Saddle River: Pearson Prentice Hall, 2004
    [26] Geiger D, Heckerman D. A characterization of the Dirichlet distribution through global and local parameter independence. The Annals of Statistics, 1997, 25(3): 1344-1369
    [27] Burge J, Lane T. Shrinkage estimator for Bayesian network parameters. In: Proceedings of the 18th European Conference on Machine Learning. Berlin, Heidelberg: Springer, 2007. 67-78
    [28] Geiger D, Heckerman D. Learning Gaussian networks. In: Proceedings of the 10th International Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1994. 235-243
    [29] Bøttcher S G. Learning Bayesian Networks with Mixed Variables [Ph.D. dissertation], Aalborg University, Denmark, 2004
    [30] John G H, Langley P. Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1995. 338-345
    [31] Pérez A, Larrñaga P, Inza I. Bayesian classifiers based on kernel density estimation: flexible classifiers. International Journal of Approximate Reasoning, 2009, 50(2): 341-362
    [32] McLachlan G, Peel D. Finite Mixture Models. New York, USA: John Wiley and Sons, 2000
    [33] Anandkumar A, Hsu D, Kakade S M. A method of moments for mixture models and hidden Markov models. In: Proceedings of the 25th Annual Conference on Learning Theory. Edinburgh, Scotland, UK: The Journal of Machine Learning Research Workshop and Conference Proceedings, 2012, 23: 33.1-33.34
    [34] Hsu D, Kakade S M. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Proceedings of the 4th Conference on Innovations in Theoretical Computer Science. New York, USA: Association for Computing Machinery, 2013. 11-20
    [35] Mahjoub M A, Bouzaiene A, Ghanmy N. Tutorial and selected approaches on parameter learning in Bayesian network with incomplete data. In: Proceedings of the 9th International Symposium on Neural Networks. Berlin, Heidelberg: Springer, 2012. 478-488
    [36] Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, 6(6): 721-741
    [37] Dempster A P, Laird N M, Rubin D B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1977, 39(1): 1-38
    [38] Elidan G, Friedman N. The information bottleneck EM algorithm. In: Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 2003. 200-208
    [39] Elidan G, Ninio M, Friedman N, Schuurmans D. Data perturbation for escaping local maxima in learning. In: Proceedings of the 18th National Conference on Artificial Intelligence. Menlo Park, USA: American Association for Artificial Intelligence, 2002. 132-139
    [40] Niculescu R S, Mitchell T M, Rao R B. A theoretical framework for learning Bayesian networks with parameter inequality constraints. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 2007. 155-160
    [41] Druzdzel M J, Van Der Gaag L C. Elicitation of probabilities for belief networks: combining qualitative and quantitative information. In: Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1995. 141-148
    [42] Feelders A, Van Der Gaag L C. Learning Bayesian network parameters with prior knowledge about context-specific qualitative influences. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2005. 193-200
    [43] Liao W H, Ji Q. Learning Bayesian network parameters under incomplete data with domain knowledge. Pattern Recognition, 2009, 42(11): 3046-3056
    [44] Ramoni M, Sebastiani P. Learning Bayesian networks from incomplete databases. In: Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1997. 401-408
    [45] Ramoni M, Sebastiani P. The use of exogenous knowledge to learn Bayesian networks from incomplete databases. In: Proceedings of the 2nd International Symposium on Advances in Intelligent Data Analysis, Reasoning about Data. London, UK: Springer-Verlag, 1997. 537-548
    [46] Ramoni M, Sebastiani P. Robust learning with missing data. Machine Learning, 45(2): 147-170
    [47] Jaeger M. The AI&M procedure for learning from incomplete data. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2006. 225-232
    [48] Chickering D M. Learning Bayesian networks is NP-complete. Learning from Data, 1996, 112: 121-130
    [49] Spirtes P, Glymour C N, Scheines R. Causation, Prediction, and Search. Cambridge: The MIT Press, 2000
    [50] Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. Journal of Machine Learning Research, 2007, 8: 613-636
    [51] Li J N, Wang Z J. Controlling the false discovery rate of the association/causality structure learned with the PC algorithm. Journal of Machine Learning Research, 2009, 10: 475-514
    [52] Pearl J, Verma T S. A theory of inferred causation. Studies in Logic and the Foundations of Mathematics, 1995, 134: 789-811
    [53] Cheng J, Greiner R, Kelly J, Bell D, Liu W R. Learning Bayesian networks from data: an information-theory based approach. Artificial Intelligence, 2002, 137(1-2): 43-90
    [54] Chickering D M, Meek C. On the incompatibility of faithfulness and monotone DAG faithfulness. Artificial Intelligence, 2006, 170(8-9): 653-666
    [55] Yehezkel R, Lerner B. Bayesian network structure learning by recursive autonomy identification. Journal of Machine Learning Research, 2009, 10: 1527-1570
    [56] Xie X C, Geng Z. A recursive method for structural learning of directed acyclic graphs. Journal of Machine Learning Research, 2008, 9: 459-483
    [57] Villanueva E, Maciel C D. Efficient methods for learning Bayesian network super-structures. Neurocomputing, 2014, 123: 3-12
    [58] de Morais S R, Aussem A. An efficient and scalable algorithm for local Bayesian network structure discovery. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Barcelona, Spain: Springer-Verlag, 2010. 164-179
    [59] Cooper G F, Herskovits E. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 1992, 9(4): 309-347
    [60] Heckerman D, Geiger D, Chickering D M. Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning, 1995, 20(3): 197-243
    [61] Silander T, Myllymäki P. A simple approach for finding the globally optimal Bayesian network structure. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2006. 445-452
    [62] Steck H. Learning the Bayesian network structure: Dirichlet prior versus data. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence. Corvallis, USA: AUAI Press, 2008. 511-518
    [63] Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6): 716 -723
    [64] Schwarz G. Estimating the dimension of a model. Annals of Statistic, 1978, 6(2): 461-464
    [65] Rissanen J. A universal prior for integers and estimation by minimum description length. The Annals of Statistics, 1983, 11(2): 416-431
    [66] Cruz-Ramírez N, Acosta-Mesa H G, Barrientos-Martínez R E, Nava-Fernández L A. How good are the Bayesian information criterion and the minimum description length principle for model selection? A Bayesian network analysis. In: Proceedings of the 5th Mexican International Conference on Artificial Intelligence. Berlin, Heidelberg: Springer-Verlag, 2006. 494-504
    [67] Wallace C S, Korb K B, Dai H H. Causal discovery via MML. In: Proceedings of the 13th International Conference on Machine Learning. San Francisco, USA: Morgan Kaufmann, 1996. 516-524
    [68] Korb K B, Nicholson A E. Bayesian Artificial Intelligence (2nd edition). Boca Raton, USA: CRC Press, 2010
    [69] O'Donnell R T, Allison L, Korb K B. Learning hybrid Bayesian networks by MML. In: Proceedings of the 19th Australian Joint Conference on Artificial Intelligence. Berlin: Springer-Verlag, 2006. 192-203
    [70] Kayaalp M, Cooper G F. A Bayesian network scoring metric that is based on globally uniform parameter priors. In: Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 2002. 251-258
    [71] de Campos L M. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. Journal of Machine Learning Research, 2006, 7: 2149-2187
    [72] Riggelsen C. Learning Bayesian networks: a MAP criterion for joint selection of model structure and parameter. In: Proceedings of the 8th IEEE International Conference on Data Mining. Washington, USA: IEEE, 2008. 522-529
    [73] Silander T, Roos T, Myllymäki P. Learning locally minimax optimal Bayesian networks. International Journal of Approximate Reasoning, 2010, 51(5): 544-557
    [74] Carvalho A M, Roos T T, Oliveira A L, Myllymäki P. Discriminative learning of Bayesian networks via factorized conditional log-likelihood. Journal of Machine Learning Research, 2011, 12: 2181-2210
    [75] Bouckaert R R. Probabilistic network construction using the minimum description length principle. In: Proceedings of the 1993 European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty. Berlin, Heidelberg: Springer-Verlag, 1993. 41-48
    [76] Liu F, Zhu Q L. Max-relevance and min-redundancy greedy Bayesian network learning on high dimensional data. In: Proceedings of the 3rd International Conference on Natural Computation. Haikou, China: IEEE, 2007. 217-221
    [77] Gámez J A, Mateo J L, Puerta J M. Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood. Data Mining and Knowledge Discovery, 2011, 22(1-2): 106-148
    [78] Larranaga P, Kuijpers C, Murga R H, Yurramendi Y. Learning Bayesian network structures by searching for the best ordering with genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics—— Part A: Systems and Humans, 1996, 26(4): 487-493
    [79] Faulkner E. K2GA: heuristically guided evolution of Bayesian network structures from data. In: Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining. Honolulu, HI: IEEE, 2007. 18-25
    [80] Kabli R, Herrmann F, McCall J. A chain-model genetic algorithm for Bayesian network structure learning. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. New York, USA: ACM, 2007. 1264- 1271
    [81] Regnier-Coudert O, McCall J. An island model genetic algorithm for Bayesian network structure learning. In: Proceedings of the 2012 IEEE World Congress on Computational Intelligence. Brisbane, Australia: IEEE, 2012. 1-8
    [82] Wong M L, Lam W, Leung K S. Using evolutionary programming and minimum description length principle for data mining of Bayesian networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(2): 174 -178
    [83] Wong M L, Lee S Y, Leung K S. A hybrid approach to discover Bayesian networks from databases using evolutionary programming. In: Proceedings of the 2002 IEEE International Conference on Data Mining. Washington, USA: IEEE, 2002. 498-505
    [84] Wong M L, Leung K S. An efficient data mining method for learning Bayesian networks using an evolutionary algorithm-based hybrid approach. IEEE Transactions on Evolutionary Computation, 2004, 8(4): 378-404
    [85] Wong M L, Guo Y Y. Learning Bayesian networks from incomplete databases using a novel evolutionary algorithm. Decision Support Systems, 2008, 45(2): 368-383
    [86] Larrñaga P, Karshenas H, Bielza C, Santana R. A review on evolutionary algorithms in Bayesian network learning and inference tasks. Information Sciences, 2013, 233: 109-125
    [87] de Campos L M, Huete J F. Approximating causal orderings for Bayesian networks using genetic algorithms and simulated annealing. In: Proceedings of the 8th Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems. Madrid, Spain: Consejo Superior de Investigaciones Cientificas, 2000. 333-340
    [88] Heng X C, Qin Z, Tian L, Shao L P. Learning Bayesian network structures with discrete particle swarm optimization algorithm. In: Proceedings of the 2007 IEEE Symposium on Foundations of Computational Intelligence. Honolulu, HI: IEEE, 2007. 47-52
    [89] Heng X C, Qin Z, Tian L, Shao L P. Research on structure learning of dynamic Bayesian networks by particle swarm optimization. In: Proceedings of the 2007 IEEE Symposium on Artificial Life. Honolulu, HI: IEEE, 2007. 85-91
    [90] Li X L, Wang S C, He X D. Learning Bayesian networks structures based on memory binary particle swarm optimization. In: Proceedings of the 6th International Conference on Simulated Evolution and Learning. Berlin, Heidelberg: Springer-Verlag, 2006. 568-574
    [91] Sahin F, Devasia A. Distributed particle swarm optimization for structural Bayesian network learning. Swarm Intelligence: Focus on Ant and Particle Swarm Optimization. Vienna, Austria: I-Tech Education and Publishing, 2007, 27: 505-532
    [92] Wang T, Yang J. A heuristic method for learning Bayesian networks using discrete particle swarm optimization. Knowledge and Information Systems, 2010, 24(2): 269-281
    [93] de Campos L M, Fernández-Luna J M, Gámez J A, Puerta J M. Ant colony optimization for learning Bayesian networks. International Journal of Approximate Reasoning, 2002, 31(3): 291-311
    [94] Daly R, Shen Q. Learning Bayesian network equivalence classes with ant colony optimization. Journal of Artificial Intelligence Research, 2009, 35(1): 391-447
    [95] Pinto P C, Nagele A, Dejori M, Runkler T A, Sousa J M C. Using a local discovery ant algorithm for Bayesian network structure learning. IEEE Transactions on Evolutionary Computation, 2009, 13(4): 767-779
    [96] Wu Y H, McCall J, Coles D. Two novel ant colony optimization approaches for Bayesian network structure learning. In: Proceedings of the 2010 IEEE Congress on Evolutionary Computation. Barcelona: IEEE, 2010. 1-7
    [97] Ji J Z, Wei H K, Liu C N. An artificial bee colony algorithm for learning Bayesian networks. Soft Computing, 2013, 17(6): 983-994
    [98] Li B H, Liu S Y, Li Z G. Improved algorithm based on mutual information for learning Bayesian network structures in the space of equivalence classes. Multimedia Tools and Applications, 2012, 60(1): 129-137
    [99] Studený M. Probabilistic Conditional Independence Structures. London: Springer-Verlag, 2005
    [100] Studený M, Vomlel J, Hemmecke R. A geometric view on learning Bayesian network structures. International Journal of Approximate Reasoning, 2010, 51(5): 573-586
    [101] Hemmecke R, Lindner S, Studený M. Characteristic imsets for learning Bayesian network structure. International Journal of Approximate Reasoning, 2012, 53(9): 1336-1349
    [102] Friedman N, Koller D. Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 2003, 50(1-2): 95- 125
    [103] Teyssier M, Koller D. Ordering-based search: a simple and effective algorithm for learning Bayesian networks. In: Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2005. 584-590
    [104] Singh M, Valtorta M. Construction of Bayesian network structures from data: a brief survey and an efficient algorithm. International Journal of Approximate Reasoning, 1995, 12(2): 111-131
    [105] Dash D, Druzdzel M J. A hybrid anytime algorithm for the construction of causal models from sparse data. In: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1999. 142-149
    [106] de Campos L M, Fernández-Luna J M, Puerta J M. An iterated local search algorithm for learning Bayesian networks with restarts based on conditional independence tests. International Journal of Intelligent Systems, 2003, 18(2): 221- 235
    [107] Friedman N, Nachman I, Peér D. Learning Bayesian network structure from massive datasets: The ''sparse candidate'' algorithm. In: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1999. 206-215
    [108] Tsamardinos I, Brown L E, Aliferis C F. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 2006, 65(1): 31-78
    [109] Tsamardinos I, Aliferis C F, Statnikov A. Time and sample efficient discovery of Markov blankets and direct causal relations. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2003. 673-678
    [110] Perrier E, Imoto S, Miyano S, Chickering M. Finding optimal Bayesian network given a super-structure. Journal of Machine Learning Research, 2008, 9: 2251-2286
    [111] Kojima K, Perrier E, Imoto S, Miyano S. Optimal search on clustered structural constraint for learning Bayesian network structure. Journal of Machine Learning Research, 2010, 11: 285-310
    [112] de Campos C P, Ji Q. Efficient structure learning of Bayesian networks using constraints. Journal of Machine Learning Research, 2011, 12: 663-689
    [113] Ott S, Imoto S, Miyano S. Finding optimal models for small gene networks. Pacific Symposium on Biocomputing, 2004, 9: 557-567
    [114] Ott S, Miyano S. Finding optimal gene networks using biological constraints. Genome Informatics, 2003, 14: 124-133
    [115] Koivisto M, Sood K. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 2004, 5: 549-573
    [116] Koivisto M. Advances in exact Bayesian structure discovery in Bayesian networks. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. Corvallis, USA: AUAI Press, 2006. 241-248
    [117] Singh A P, Moore A W. Finding Optimal Bayesian Networks by Dynamic Programming, Technical Report CMU-CALD-05-106, School of Computer Science, Carnegie Mellon University, USA, 2005
    [118] Eaton D, Murphy K. Bayesian structure learning using dynamic programming and MCMC. In: Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2007. 101-108
    [119] Malone B, Yuan C H, Hansen E A. Memory-efficient dynamic programming for learning optimal Bayesian networks. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI Press, 2011. 1057- 1062
    [120] Madigan D, York J, Allard D. Bayesian graphical models for discrete data. International Statistical Review, 1995, 63(2): 215-232
    [121] Madigan D, Andersson S A, Perlman M D, Volinsky C T. Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs. Communications in Statistics-Theory and Methods, 1996, 25(11): 2493-2519
    [122] Giudici P, Castelo R. Improving Markov chain Monte Carlo model search for data mining. Machine Learning, 2003, 50(1-2): 127-158
    [123] Grzegorczyk M, Husmeier D. Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Machine Learning, 2008, 71(2-3): 265 -305
    [124] Liang F M, Zhang J. Learning Bayesian networks for discrete data. Computational Statistics and Data Analysis, 2009, 53(4): 865-876
    [125] Tian J, He R, Ram L. Bayesian model averaging using the k-best Bayesian network structures. In: Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. Corvallis, USA: AUAI Press, 2010. 589-597
    [126] Dash D, Cooper G F. Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research, 2004, 5: 1177-1203
    [127] Kim K J, Cho S B. Evolutionary aggregation and refinement of Bayesian networks. In: Proceedings of the IEEE Congress on Evolutionary Computation. Vancouver, BC: IEEE, 2006. 1513-1520
    [128] Gou K X, Jun G X, Zhao Z. Learning Bayesian network structure from distributed homogeneous data. In: Proceedings of the 8th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing. Washington, USA: IEEE Computer Society, 2007. 250-254
    [129] Liu F, Tian F Z, Zhu Q L. Bayesian network structure ensemble learning. In: Proceedings of the 3rd International Conference on Advanced Data Mining and Applications. Berlin: Springer-Verlag, 2007. 454-465
    [130] Kwoh C K, Gillies D F. Using hidden nodes in Bayesian networks. Artificial Intelligence, 1996, 88(1-2): 1-38
    [131] Sanscartier M J, Neufeld E. Identifying hidden variables from context-specific independencies. In: Proceedings of the 20th International Florida Artificial Intelligence Research Society Conference. Menlo Park, California, USA: The AAAI Press, 2007. 472-477
    [132] Geiger D, Heckerman D, Meek C. Asymptotic model selection for directed networks with hidden variables. Learning in Graphical Models. Netherlands: Springer, 1998, 89: 461- 477
    [133] Ramoni M, Sebastiani P. Learning Bayesian networks from incomplete databases. In: Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1997. 401-408
    [134] Parviainen P, Koivisto M. Ancestor relations in the presence of unobserved variables. In: Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases. Berlin, Heidelberg: Springer-Verlag, 2011. 581-596
    [135] Friedman N. The Bayesian structural EM algorithm. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1998. 129-138
    [136] Beal M J, Ghahramani Z. The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. In: Proceedings of the 7th Valencia International Meeting on Bayesian Statistics. Oxford: Oxford University Press, 2003. 453-464
    [137] Watanabe K, Shiga M, Watanabe S. Upper bound for variational free energy of Bayesian networks. Machine Learning, 2009, 75(2): 199-215
    [138] Elidan G. Bagged structure learning of Bayesian networks. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. Fort Lauderdale, FL: JMLR Workshop and Conference, 2011. 251-259
    [139] Wainwright M J, Jaakkola T S, Willsky A S. Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In: Proceedings of the 9th Workshop on Artificial Intelligence and Statistics. Key West, Florida: Society for Artificial Intelligence and Statistics, 2003. 97-105
    [140] Sutton C, Minka T. Local Training and Belief Propagation, Technical Report MSR-TR-2006-121, Microsoft Research, 2006
    [141] Wainwright M J. Estimating the ''wrong'' graphical model: benefits in the computation-limited setting. Journal of Machine Learning Research, 2006, 7: 1829-1859
    [142] Ganapathi V, Vickrey D, Duchi J, Koller D. Constrained approximate maximum entropy learning. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence. Corvallis, USA: AUAI Press, 2008. 196-203
    [143] Murray I, Ghahramani Z. Bayesian learning in undirected graphical models: approximate MCMC algorithms. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2004. 392-399
    [144] Murray I, Ghahramani Z, Mackay D. MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. Arlington, USA: AUAI Press, 2006. 359-366
    [145] Besag J. Efficiency of pseudolikelihood estimation for simple Gaussian fields. Biometrika, 1977, 64(3): 616-618
    [146] McCallum A, Pal C, Druck G, Wang X. Multi-conditional learning: generative/discriminative training for clustering and classification. In: Proceedings of the 21st National Conference on Artificial Intelligence. Boston, USA: AAAI Press, 2006. 433-439
    [147] Hinton G E. Training products of experts by minimizing contrastive divergence. Neural Computation, 2002, 14(8): 1771-1800
    [148] LeCun Y, Chopra S, Hadsell R, Marc'Aurelio R, Huang F J. A tutorial on energy-based learning. Predicting Structured Data. Cambridge, MA: MIT Press, 2006. 191-241
    [149] Taskar B, Guestrin C, Koller D. Max-margin Markov networks. In: Proceedings of the 17th Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2003. 24-32
    [150] Bromberg F, Margaritis D, Honavar V. Efficient Markov network structure discovery using independence tests. Journal of Artificial Intelligence Research, 2009, 35(1): 449-484
    [151] Della P S, Della P V, Lafferty J. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(4): 380-393
    [152] Kok S, Domingos P. Learning the structure of Markov logic networks. In: Proceedings of the 22nd International Conference on Machine Learning. New York, USA: ACM, 2005. 441-448
    [153] Domingos P. Unifying instance-based and rule-based induction. Machine Learning, 1996, 24(2): 141-168
    [154] Mihalkova L, Mooney R J. Bottom-up learning of Markov logic network structure. In: Proceedings of the 24th International Conference on Machine Learning. New York, USA: ACM, 2007. 625-632
    [155] Davis J, Domingos P. Bottom-up learning of Markov network structure. In: Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: Omnipress, 2010. 271-280
    [156] Ravikumar P, Wainwright M J, Lafferty J D. High-dimensional ising model selection using L1-regularized logistic regression. Annals of Statistics, 2010, 38(3): 1287-1319
    [157] Höfling H, Tibshirani R. Estimation of sparse binary pairwise Markov networks using pseudo-likelihood. Journal of Machine Learning Research, 2009, 10: 883-906
    [158] Perkins S, Lacker K, Theiler J. Grafting: fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, 2003, 3: 1333-1356
    [159] Lee S I, Ganapathi V, Koller D. Efficient structure learning of Markov networks using L1-regularization. In: Proceedings of the 20th Annual Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2006. 817-824
    [160] Yang E, Ravikumar P, Allen G I, Liu Z D. Graphical models via generalized linear models. In: Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates, 2012. 1367-1375
    [161] Lowd D, Davis J. Learning Markov network structure with decision trees. In: Proceedings of the 10th IEEE International Conference on Data Mining. Washington, USA: IEEE, 2010. 334-343
    [162] Haaren J V, Davis J. Markov network structure learning: a randomized feature generation approach. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press, 2012. 1148-1154
    [163] Yu K, Wang H, Wu X D. A parallel algorithm for learning Bayesian networks. In: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining. Berlin: Springer-Verlag, 2007. 1055-1063
    [164] Lam W, Bacchus F. Using new data to refine a Bayesian network. In: Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1994. 383-390
    [165] Friedman N, Goldszmidt M. Sequential update of Bayesian network structure. In: Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1997. 165-174
    [166] Nielsen S H, Nielsen T D. Adapting Bayes network structures to non-stationary domains. International Journal of Approximate Reasoning, 2008, 49(2): 379-397
    [167] Castillo G, Gama J. Adaptive Bayesian network classifiers. Intelligent Data Analysis, 2009, 13(1): 39-59
    [168] Yasin A, Leray P. iMMPC: a local search approach for incremental Bayesian network structure learning. In: Proceedings of the 10th International Conference on Advances in Intelligent Data Analysis X. Berlin: Springer-Verlag, 2011. 401-412
    [169] Cooper G F, Yoo C. Causal discovery from a mixture of experimental and observational data. In: Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 1999. 116-125
    [170] Tong S, Koller D. Active learning for structure in Bayesian networks. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 2001. 863-869
    [171] He Y B, Geng Z. Active learning of causal networks with intervention experiments and optimal designs. Journal of Machine Learning Research, 2008, 9: 2523-2547
    [172] Hauser A, Bühlmann P. Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs. Journal of Machine Learning Research, 2012, 13(1): 2409-2464
    [173] Hauser A, Bühlmann P. Two optimal strategies for active learning of causal models from interventions. In: Proceedings of the 6th European Workshop on Probabilistic Graphical Models. Granada, Spain, 2012. 123-130
    [174] Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359
    [175] Dai W Y, Xue G R, Yang Q, Yu Y. Transferring naive Bayes classifiers for text classification. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2007. 540-545
    [176] Roy D M, Kaelbling L P. Efficient Bayesian task-level transfer learning. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence. San Francisco, USA: Morgan Kaufmann Publishers, 2007. 2599-2604
    [177] Luis R, Sucar L E, Morales E F. Inductive transfer for learning Bayesian networks. Machine Learning, 2010, 79(1-2): 227-255
    [178] Honorio J, Samaras D. Multi-task learning of Gaussian graphical models. In: Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: Omnipress, 2010. 447-454
    [179] Oyen D, Lane T. Leveraging domain knowledge in multitask Bayesian network structure learning. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press, 2012. 1091-1097
Publication history
  • Received: 2013-06-05
  • Revised: 2013-08-01
  • Published: 2014-06-20