| [1] | Tikhonov A N, Arsenin V Y. Solution of Ill-posed Problems. Washington:Winston and Sons, 1977. |
| [2] | Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6):716-723 doi: 10.1109/TAC.1974.1100705 |
| [3] | Akaike H. A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics, 1978, 30(1):9-14 doi: 10.1007/BF02480194 |
| [4] | Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2):461-464 doi: 10.1214/aos/1176344136 |
| [5] | Kolmogorov A N. On tables of random numbers. Sankhya:The Indian Journal of Statistics, 1963, 25:369-376 |
| [6] | Kolmogorov A N. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1965, 1(1):1-7 |
| [7] | Chaitin G J. On the length of programs for computing finite binary sequences. Journal of the ACM, 1966, 13(4):547-569 doi: 10.1145/321356.321363 |
| [8] | Solomonoff R J. A formal theory of inductive inference. Part II. Information and Control, 1964, 7(2):224-254 doi: 10.1016/S0019-9958(64)90131-7 |
| [9] | Wallace C S, Boulton D M. An information measure for classification. The Computer Journal, 1968, 11(2):185-194 doi: 10.1093/comjnl/11.2.185 |
| [10] | Vapnik V N, Chervonenkis A Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971, 16(2):264-280 doi: 10.1137/1116025 |
| [11] | Vapnik V N. The Nature of Statistical Learning Theory. New York:Springer, 1995. |
| [12] | Vapnik V N. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 1999, 10(5):988-999 doi: 10.1109/72.788640 |
| [13] | Popper K R. The Logic of Scientific Discovery. United Kingdom:Hutchinson, 1959. |
| [14] | Valiant L G. A theory of the learnable. Communications of the ACM, 1984, 27(11):1134-1142 doi: 10.1145/1968.1972 |
| [15] | Kearns M J, Schapire R E. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 1994, 48(3):464-497 doi: 10.1016/S0022-0000(05)80062-5 |
| [16] | Bartlett P L, Long P M, Williamson R C. Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 1996, 52(3):434-452 doi: 10.1006/jcss.1996.0033 |
| [17] | Shawe-Taylor J, Bartlett P L, Williamson R C, Anthony M. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 1998, 44(5):1926-1940 doi: 10.1109/18.705570 |
| [18] | Koltchinskii V. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 2001, 47(5):1902-1914 doi: 10.1109/18.930926 |
| [19] | Bartlett P L, Mendelson S. Rademacher and Gaussian complexities:risk bounds and structural results. The Journal of Machine Learning Research, 2003, 3:463-482 |
| [20] | Bartlett P L, Bousquet O, Mendelson S. Local Rademacher complexities. The Annals of Statistics, 2005, 33(4):1497-1537 doi: 10.1214/009053605000000282 |
| [21] | Koltchinskii V. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 2006, 34(6):2593-2656 doi: 10.1214/009053606000001019 |
| [22] | Koltchinskii V, Panchenko D. Rademacher processes and bounding the risk of function learning. High Dimensional Probability II. Boston:Birkhäuser, 2004. 443-457 |
| [23] | Bousquet O, Koltchinskii V, Panchenko D. Some local measures of complexity of convex hulls and generalization bounds. Computational Learning Theory. Sydney, Australia:Springer, 2004. 59-73 |
| [24] | 张海, 徐宗本. 学习理论综述(I):稳定性与泛化性. 工程数学学报, 2008, 25(1):1-9 Zhang Hai, Xu Zong-Ben. A survey on learning theory (I):stability and generalization. Chinese Journal of Engineering Mathematics, 2008, 25(1):1-9 http://www.cnki.com.cn/Article/CJFDTOTAL-GCSX200801004.htm |
| [25] | 胡政发. 统计量复杂性估计及其在机器学习中的应用. 自动化学报, 2008, 34(10):1332-1336 Hu Zheng-Fa. The statistic complexity estimates and its application to machine learning. Acta Automatica Sinica, 2008, 34(10):1332-1336 http://www.aas.net.cn/CN/abstract/abstract17904.shtml |
| [26] | 陈将宏. Rademacher复杂性与支持向量机的推广性能[硕士学位论文], 湖北大学, 中国, 2005. Chen Jiang-Hong. Rademacher Complexities and the Generalization Performance of SVM[Master dissertation], Hubei University, China, 2005. |
| [27] | Lei Y W, Ying Y M. Generalization analysis of multi-modal metric learning[Online], available:https://www.researchgate.net/publication/282969709, November 3, 2015 |
| [28] | Lei Y W, Ding L X, Bi Y Z. Local Rademacher complexity bounds based on covering numbers[Online], available:http://arxiv.org/pdf/1510.01463v1.pdf, October 6, 2015 |
| [29] | Lei Y W, Ding L X. Refined Rademacher chaos complexity bounds with applications to the multikernel learning problem. Neural Computation, 2014, 26(4):739-760 doi: 10.1162/NECO_a_00566 |
| [30] | Lei Y W, Ding L X, Zhang W S. Generalization performance of radial basis function networks. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(3):551-564 doi: 10.1109/TNNLS.2014.2320280 |
| [31] | Lei Y W, Ding L X, Wu W L. Universal learning using free multivariate splines. Neurocomputing, 2013, 119:253-263 |
| [32] | 雷云文. Rademacher复杂度及其在机器学习中的应用[博士学位论文], 武汉大学, 中国, 2015. Lei Yun-Wen. Rademacher Complexities with Applications to Machine Learning[Ph.D. dissertation], Wuhan University, China, 2015. |
| [33] | Xu C, Liu T L, Tao D C, Xu C. Local Rademacher complexity for multi-label learning[Online], available:http://arxiv.org/pdf/1410.6990.pdf, October 26, 2014 |
| [34] | Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Science China:Information Sciences, 2016, 59(7):072104:1-072104:12 doi: 10.1007/s11432-015-5470-z |
| [35] | de la Peña V, Giné E. Decoupling:From Dependence to Independence. New York:Springer-Verlag, 1999. |
| [36] | Cao Q, Guo Z C, Ying Y M. Generalization bounds for metric and similarity learning. Machine Learning, 2016, 102(1):115-132 doi: 10.1007/s10994-015-5499-7 |
| [37] | Mohri M, Medina A M. New analysis and algorithm for learning with drifting distributions. In:Proceedings of the 23rd International Conference on Algorithmic Learning Theory. Lyon, France:Springer-Verlag, 2012. 124-138 |
| [38] | Mohri M, Rostamizadeh A. Rademacher complexity bounds for non-I.I.D. processes. In:Proceedings of the 2008 Advances in Neural Information Processing Systems 21. Vancouver, BC, Canada:Curran Associates, Inc., 2008. 1097-1104 |
| [39] | Rakhlin A, Sridharan K. On martingale extensions of Vapnik-Chervonenkis theory with applications to online learning. Measures of Complexity. Switzerland:Springer International Publishing, 2015. 197-215 |
| [40] | Rakhlin A, Sridharan K, Tewari A. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 2015, 161(1-2):111-153 doi: 10.1007/s00440-013-0545-5 |
| [41] | Rakhlin A, Sridharan K, Tewari A. Online learning:random averages, combinatorial parameters, and learnability. In:Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada:Curran Associates, Inc., 2010. 1984-1992 |
| [42] | Kuznetsov V, Mohri M. Generalization bounds for time series prediction with non-stationary processes. In:Proceedings of the 25th International Conference on Algorithmic Learning Theory. Bled, Slovenia:Springer International Publishing, 2014. 260-274 |
| [43] | Kuznetsov V, Mohri M. Forecasting non-stationary time series:from theory to algorithms[Online], available:http://www.cims.nyu.edu/~vitaly/pub/fts.pdf, July 12, 2016 |
| [44] | Anguita D, Ghio A, Oneto L, Ridella S. A deep connection between the Vapnik-Chervonenkis entropy and the Rademacher complexity. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(12):2202-2211 doi: 10.1109/TNNLS.2014.2307359 |
| [45] | Kääriäinen M. Relating the Rademacher and VC Bounds, Technical Report, C-2004-57, Department of Computer Science, University of Helsinki, Finland, 2004. |
| [46] | Srebro N, Sridharan K. Note on refined dudley integral covering number bound[Online], available:http://ttic.uchicago.edu/~karthik/dudley.pdf, July 12, 2016 |
| [47] | Srebro N, Sridharan K, Tewari A. Smoothness, low noise and fast rates. In:Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada:Curran Associates, Inc., 2010. 2199-2207 |
| [48] | V'Yugin V V. VC dimension, fat-shattering dimension, Rademacher averages, and their applications. Measures of Complexity. Switzerland:Springer International Publishing, 2015. 57-74 |
| [49] | Ying Y M, Campbell C. Rademacher chaos complexities for learning the kernel problem. Neural Computation, 2010, 22(11):2858-2886 doi: 10.1162/NECO_a_00028 |
| [50] | Massart P. Some applications of concentration inequalities to statistics. Annales De La Faculté Des Sciences De Toulouse, 2000, 9(2):245-303 doi: 10.5802/afst.961 |
| [51] | Gnecco G, Sanguineti M. Approximation error bounds via Rademacher's complexity. Applied Mathematical Sciences, 2008, 2(4):153-176 |
| [52] | Boucheron S, Bousquet O, Lugosi G. Theory of classification:a survey of some recent advances. ESAIM:Probability and Statistics, 2005, 9:323-375 doi: 10.1051/ps:2005018 |
| [53] | Oneto L, Ghio A, Ridella S, Anguita D. Global Rademacher complexity bounds:from slow to fast convergence rates. Neural Processing Letters, 2016, 43(2):567-602 doi: 10.1007/s11063-015-9429-2 |
| [54] | Oneto L, Ghio A, Ridella S, Anguita D. Fast convergence of extended Rademacher complexity bounds. In:Proceedings of the 2015 International Joint Conference on Neural Networks. Killarney, Ireland:IEEE, 2015. 1-10 |
| [55] | Oneto L, Ghio A, Anguita D, Ridella S. An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Networks, 2013, 44:107-111 doi: 10.1016/j.neunet.2013.03.017 |
| [56] | Yu H F, Jain P, Kar P, Dhillon I S. Large-scale multi-label learning with missing labels. In:Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014. 593-601 |
| [57] | 丁万鼎. 测度论概要. 合肥:安徽人民出版社, 2005. Ding Wan-Ding. Essentials of Measure Theory. Hefei:Anhui People's Publishing House, 2005. |
| [58] | Hastie T, Friedman J, Tibshirani R. The Elements of Statistical Learning:Data Mining, Inference, and Prediction. New York:Springer, 2001. |
| [59] | van der Vaart A W, Wellner J A. Weak Convergence and Empirical Processes. New York:Springer-Verlag, 1996. |
| [60] | Ledoux M, Talagrand M. Probability in Banach Spaces:Isoperimetry and Processes. Berlin:Springer-Verlag, 1991. |
| [61] | Dudley R M. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. Journal of Functional Analysis, 1967, 1(3):290-330 doi: 10.1016/0022-1236(67)90017-1 |
| [62] | Lozano F. Model selection using Rademacher penalization. In:Proceedings of the 2nd ICSC Symposium on Neural Networks. London:Academic Press, 2000. |
| [63] | Boucheron S, Lugosi G, Massart P. Concentration Inequalities:A Nonasymptotic Theory of Independence. United Kingdom:Oxford University Press, 2013. |
| [64] | 周志华. 机器学习. 北京:清华大学出版社, 2016. Zhou Zhi-Hua. Machine Learning. Beijing:Tsinghua University Press, 2016. |
| [65] | Mohri M, Rostamizadeh A, Talwalkar A. Foundations of Machine Learning. Cambridge:MIT Press, 2012. |
| [66] | 汪洪桥, 孙富春, 蔡艳宁, 陈宁, 丁林阁. 多核学习方法. 自动化学报, 2010, 36(8):1037-1050 Wang Hong-Qiao, Sun Fu-Chun, Cai Yan-Ning, Chen Ning, Ding Lin-Ge. On multiple kernel learning methods. Acta Automatica Sinica, 2010, 36(8):1037-1050 doi: 10.3724/SP.J.1004.2010.01037 |
| [67] | McFee B, Lanckriet G. Learning multi-modal similarity. Journal of Machine Learning Research, 2011, 12:491-523 |
| [68] | Xie P T, Xing E P. Multi-modal distance metric learning. In:Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China:AAAI, 2013. 1806-1812 |
| [69] | Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors[Online], available:http://arxiv.org/pdf/1207.0580v1.pdf, July 3, 2012 |
| [70] | Wan L, Zeiler M, Zhang S X, LeCun Y, Fergus R. Regularization of neural networks using dropconnect. In:Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA, 2013. 1058-1066 |
| [71] | Rudin W. Functional Analysis (2nd edition). New York:McGraw-Hill, 1991. |
| [72] | Mendelson S. A few notes on statistical learning theory. Advanced Lectures on Machine Learning. Berlin Heidelberg:Springer, 2003. 1-40 |
| [73] | Sauer N. On the density of families of sets. Journal of Combinatorial Theory, Series A, 1972, 13(1):145-147 doi: 10.1016/0097-3165(72)90019-2 |
| [74] | Pollard D. Convergence of Stochastic Processes. New York:Springer-Verlag, 1984. |
| [75] | Duan H H. Bounding the fat shattering dimension of a composition function class built using a continuous logic connective[Online], available:http://arxiv.org/pdf/1105.4618v1.pdf, May 23, 2011 |
| [76] | Anthony M, Bartlett P L. Neural Network Learning:Theoretical Foundations. Cambridge:Cambridge University Press, 2009. |
| [77] | Alon N, Ben-David S, Cesa-Bianchi N, Haussler D. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 1997, 44(4):615-631 doi: 10.1145/263867.263927 |
| [78] | Kolmogorov A N, Tihomirov V M. ε-entropy and ε-capacity of sets in functional space. American Mathematical Society Translations, 1961, 17(2):277-364 |
| [79] | Bousquet O. New approaches to statistical learning theory. Journal of the Institute of Statistical Mathematics, 2003, 55(2):371-389 |
| [80] | Wu P C, Hoi S C H, Xia H, Zhao P L, Wang D Y, Miao C Y. Online multimodal deep similarity learning with application to image retrieval. In:Proceedings of the 21st ACM International Conference on Multimedia. New York, USA:ACM, 2013. 153-162 |
| [81] | Xia H, Wu P C, Hoi S C H. Online multi-modal distance learning for scalable multimedia retrieval. In:Proceedings of the 6th International Conference on Web Search and Data Mining. New York, USA:ACM, 2013. 455-464 |
| [82] | Krzyzak A, Linder T. Radial basis function networks and complexity regularization in function learning. IEEE Transactions on Neural Networks, 1998, 9(2):247-256 doi: 10.1109/72.661120 |
| [83] | Niyogi P, Girosi F. On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Computation, 1996, 8(4):819-842 doi: 10.1162/neco.1996.8.4.819 |
| [84] | Park J, Sandberg I W. Universal approximation using radial-basis-function networks. Neural Computation, 1991, 3(2):246-257 doi: 10.1162/neco.1991.3.2.246 |
| [85] | Girosi F. Approximation error bounds that use VC-bounds[Online], available:https://www.researchgate.net/publication/2782224_Approximation_Error_Bounds_That_Use_Vc-Bounds, February 18, 2013 |
| [86] | Györfi L, Kohler M, Krzyżak A, Walk H. A Distribution-Free Theory of Nonparametric Regression. Berlin:Springer-Verlag, 2010. |
| [87] | Haussler D. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 1992, 100(1):78-150 doi: 10.1016/0890-5401(92)90010-D |
| [88] | Yu B. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 1994, 22(1):94-116 doi: 10.1214/aop/1176988849 |
| [89] | Meir R. Nonparametric time series prediction through adaptive model selection. Machine Learning, 2000, 39(1):5-34 doi: 10.1023/A:1007602715810 |
| [90] | Bartlett P L. Learning with a slowly changing distribution. In:Proceedings of the 5th Annual Workshop on Computational Learning Theory. New York, USA:ACM, 1992. 243-252 |
| [91] | Mansour Y, Mohri M, Rostamizadeh A. Domain adaptation:learning bounds and algorithms. In:Proceedings of the 22nd Annual Conference on Learning Theory. Montreal, QC:Morgan Kaufmann Publishers, 2009. |
| [92] | Anguita D, Ghio A, Oneto L, Ridella S. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(9):1390-1406 doi: 10.1109/TNNLS.2012.2202401 |
| [93] | El-Yaniv R, Pechyony D. Transductive Rademacher complexity and its applications. Journal of Artificial Intelligence Research, 2009, 35:193-234 |
| [94] | Tolstikhin I, Blanchard G, Kloft M. Localized complexities for transductive learning[Online], available:http://arxiv.org/pdf/1411.7200.pdf, November 26, 2014 |
| [95] | Tolstikhin I, Zhivotovskiy N, Blanchard G. Permutational Rademacher complexity. Algorithmic Learning Theory. Switzerland:Springer International Publishing, 2015. 209-223 |
| [96] | Chazottes J R, Collet P, Külske C, Redig F. Concentration inequalities for random fields via coupling. Probability Theory and Related Fields, 2007, 137(1):201-225 |
| [97] | Belomestny D, Spokoiny V. Concentration inequalities for smooth random fields. Theory of Probability and Its Applications, 2014, 58(2):314-323 doi: 10.1137/S0040585X9798659X |
| [98] | Zhu X J, Rogers T T, Gibson B R. Human Rademacher complexity. In:Proceedings of the 2009 Advances in Neural Information Processing Systems 22. Vancouver, BC, Canada:Curran Associates, Inc., 2009. 2322-2330 |
| [99] | Vahdat M, Oneto L, Ghio A, Anguita D, Funk M, Rauterberg M. Human algorithmic stability and human Rademacher complexity. In:Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges, Belgium:Katholieke Universiteit Leuven, 2015. 313-318 |
| [100] | Nolan D, Pollard D. U-processes:rates of convergence. The Annals of Statistics, 1987, 15(2):780-799 doi: 10.1214/aos/1176350374 |
| [101] | Karlin S, Taylor H M. A First Course in Stochastic Processes (2nd edition). New York:Academic Press, 1975. |