Adaptive Group Sparse Learning of Pairwise Markov Network
-
Abstract: Sparse learning can significantly reduce the complexity of parameter learning and structure learning in undirected graphical models, and thus handle their learning problems effectively. In a pairwise Markov network whose variables take more than two values, each edge is associated with a vector of parameters rather than a single one. This paper proposes group sparse learning for these edge parameter vectors and puts forward an adaptive group sparse learning algorithm, which adjusts the degree of penalization according to the norm of each parameter vector. Besides comparing the performance of sparse learning under different edge potentials, the paper gives three approximate objective functions to speed up training on complex networks: pseudo-likelihood approximation, mean-field free-energy approximation, and Bethe free-energy approximation. It also derives the optimal solution of the adaptive group sparse objective under the spectral projected gradient algorithm and the projected quasi-Newton algorithm, and compares the two optimizers for sparse learning. Experimental results show that the proposed adaptive group sparse learning algorithm outperforms standard sparse learning.
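The adaptive group penalty described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the common adaptive group-lasso choice of weighting each edge's group inversely to the norm of an initial estimate, and applies the resulting block soft-thresholding (proximal) step that projected-gradient-type solvers use to zero out weak edges. All names (`adaptive_group_weights`, `block_soft_threshold`, `theta`) are illustrative.

```python
import numpy as np

def adaptive_group_weights(theta_init, eps=1e-8):
    # Weight each edge's parameter group inversely to the norm of an
    # initial estimate, so edges with large estimated parameter vectors
    # (likely true edges) are penalized less. (Illustrative choice.)
    return {g: 1.0 / (np.linalg.norm(v) + eps) for g, v in theta_init.items()}

def block_soft_threshold(theta, weights, step, lam):
    # Proximal step for the adaptive group-lasso penalty
    #   lam * sum_g weights[g] * ||theta_g||_2:
    # shrink each group's norm by its threshold, zeroing whole groups
    # (i.e., removing edges) whose norm falls below it.
    out = {}
    for g, v in theta.items():
        norm = np.linalg.norm(v)
        thresh = step * lam * weights[g]
        out[g] = np.zeros_like(v) if norm <= thresh else (1.0 - thresh / norm) * v
    return out
```

Because the shrinkage acts on whole edge-parameter vectors, edges whose vectors have small norm are pruned as a group, which is how group sparsity performs structure learning in the multi-valued setting.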
-
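To make the pseudo-likelihood approximation mentioned in the abstract concrete, the following sketch evaluates the negative pseudo-log-likelihood of a multi-valued pairwise Markov network. It is a generic textbook formulation, not the paper's code; the variable names (`unary`, `pairwise`, `X`) and the dense representation are assumptions for illustration.

```python
import numpy as np

def neg_pseudo_log_likelihood(unary, pairwise, X):
    # Pseudo-likelihood replaces the intractable global partition function
    # with a product of per-variable conditionals P(x_i | x_{-i}); each
    # conditional only requires a sum over that variable's K states.
    #   unary:    (n, K) node log-potentials
    #   pairwise: dict mapping edge (i, j) -> (K, K) edge log-potentials
    #   X:        (m, n) samples of K-valued variables
    m, n = X.shape
    nll = 0.0
    for s in range(m):
        for i in range(n):
            # Log-potential of each state of variable i given the
            # observed values of its neighbours in sample s.
            logits = unary[i].copy()
            for (a, b), psi in pairwise.items():
                if a == i:
                    logits += psi[:, X[s, b]]
                elif b == i:
                    logits += psi[X[s, a], :]
            # Stable log-sum-exp for the local normalizer.
            log_z = np.log(np.exp(logits - logits.max()).sum()) + logits.max()
            nll -= logits[X[s, i]] - log_z
    return nll / m
```

The local normalizers cost O(K) per variable instead of O(K^n) for the exact likelihood, which is what makes this objective attractive for training on complex networks.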