
Researches on Rademacher Complexities in Statistical Learning Theory: A Survey

WU Xin-Xing, ZHANG Jun-Ping

Citation: WU Xin-Xing, ZHANG Jun-Ping. Researches on Rademacher Complexities in Statistical Learning Theory: A Survey. ACTA AUTOMATICA SINICA, 2017, 43(1): 20-39. doi: 10.16383/j.aas.2017.c160149

doi: 10.16383/j.aas.2017.c160149

Funds:

Shanghai Pujiang Program 16PJD009

National Natural Science Foundation of China 61673118, 61273299

Shanghai Talents Development Funds 201629

More Information
    Author Bio:

    WU Xin-Xing Visiting scholar at the School of Computer Science, Fudan University, and associate professor in the Department of Computer Application, Shanghai Technical Institute of Electronics and Information. His research interest covers statistical learning theory and formal methods. E-mail: xinxingwu@yeah.net

    Corresponding author: ZHANG Jun-Ping Professor at the School of Computer Science, Fudan University. His research interest covers machine learning, intelligent transportation systems, biometric authentication, and image processing. Corresponding author of this paper. E-mail: jpzhang@fudan.edu.cn
  • Abstract: The complexity of the hypothesis space is a key factor in statistical learning theory for analyzing the generalization ability of learning models. Unlike data-independent complexity measures, Rademacher complexity depends on the data distribution, and it therefore usually yields tighter generalization bounds than traditional complexity measures. In recent years, Rademacher complexity has played an important role in the development of generalization analysis in statistical learning theory. In view of this importance, this paper reviews the various forms of Rademacher complexity and their connections with traditional complexity measures, and discusses the basic techniques for analyzing the generalization ability of learning models via Rademacher complexity. Considering both independent and identically distributed (i.i.d.) and non-i.i.d. sample-generating environments, we summarize and analyze the state of research on Rademacher complexity for generalization analysis. Finally, we discuss the current shortcomings of Rademacher complexity research in areas such as unsupervised frameworks and non-sequential environments, and its further application and development.
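    To fix notation, here is a minimal statement of the central quantity, following the standard definition of (empirical) Rademacher complexity as in Bartlett and Mendelson [19]: for a function class $\mathcal{F}$ and a fixed sample $S=(x_{1},\ldots,x_{n})$,

    $$\hat{\mathfrak{R}}_{S}(\mathcal{F})=\mathbb{E}_{\sigma}\left[\sup_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^{n}\sigma_{i}f(x_{i})\right],\qquad \mathfrak{R}_{n}(\mathcal{F})=\mathbb{E}_{S\sim\mathcal{D}^{n}}\left[\hat{\mathfrak{R}}_{S}(\mathcal{F})\right],$$

    where $\sigma_{1},\ldots,\sigma_{n}$ are i.i.d. uniform $\{-1,+1\}$ (Rademacher) variables. The dependence on the sample $S$, and through it on the distribution $\mathcal{D}$, is exactly what makes the resulting bounds data-dependent.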
  • Fig. 1 Complexity measures and their relationships

    Table 1 Rademacher complexities and kinds of complexities of function classes

    | Complexity type | Sample-generating environment | Complexity name |
    |---|---|---|
    | Traditional complexity | i.i.d. / non-i.i.d. | VC entropy, annealed VC entropy, growth function, VC dimension, covering number, pseudo-dimension, fat-shattering dimension, etc. |
    | Rademacher complexity | i.i.d. / non-i.i.d. | Classical Rademacher complexity, local Rademacher complexity |
    | | i.i.d. | Rademacher chaos complexity, single-modal Rademacher complexity, multi-modal Rademacher complexity, Dropout Rademacher complexity |
    | | non-i.i.d. | Independently but non-identically distributed Rademacher complexity, block Rademacher complexity, sequential Rademacher complexity |
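    As a concrete illustration of the classical empirical Rademacher complexity in Table 1, the following is a minimal Monte Carlo sketch for a finite function class; the helper name `empirical_rademacher` and the random linear-threshold example are illustrative assumptions, not from the paper.

    ```python
    import numpy as np

    def empirical_rademacher(predictions, n_draws=2000, seed=0):
        """Monte Carlo estimate of the empirical Rademacher complexity
        E_sigma[ sup_f (1/n) sum_i sigma_i f(x_i) ] of a finite function
        class, given its predictions on a fixed sample.

        predictions: array of shape (m, n); row k holds f_k(x_1), ..., f_k(x_n).
        """
        rng = np.random.default_rng(seed)
        _, n = predictions.shape
        total = 0.0
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=n)    # i.i.d. Rademacher signs
            total += predictions.dot(sigma).max() / n  # sup over the m functions
        return total / n_draws

    # Illustration: 50 random linear threshold classifiers on n = 200 points.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 5))                 # fixed sample, n = 200
    W = rng.normal(size=(50, 5))                  # m = 50 weight vectors
    preds = np.sign(X.dot(W.T)).T                 # shape (50, 200), in {-1, +1}
    print(empirical_rademacher(preds))
    ```

    For infinite classes this quantity is instead bounded analytically, e.g. via Massart's finite-class lemma together with covering-number or chaining arguments [50, 61].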

    Table 2 Generalization analysis for the i.i.d. case

    | Sample-generating environment | Learning strategy | Hypothesis space | Generalization ability |
    |---|---|---|---|
    | i.i.d. | ERM | 1) $\mathcal{X}\rightarrow\{-1,+1\}$ | $O\left( \frac{1}{\sqrt{n}} \right)$ [52], $O\left(\frac{1}{n} \right)$ [53-55] |
    | | | 2) $\mathcal{X}\rightarrow{\bf R}$ | $O\left( \frac{1}{n} \right)$ [47], $O\left(\left( \frac{\log^{p} n}{n} \right)^{1/(2-\alpha)}\right)$ [28, 32] |
    | | | 3) ${\bf R}^{d}\rightarrow{\bf R}$ | $O\left( \frac{1}{\sqrt{n}} \right)$ [51] |
    | | | 4) $\mathcal{X}\rightarrow\{-1,+1\}^{L}$ | $O\left( \frac{1}{\sqrt{n}} \right)$ [56], $O\left( \frac{\log n}{n} \right)$ [33] |
    | | | 5) Free spline function space | $O\left( \left( \frac{\log n}{n} \right)^{2\alpha /(1+2\alpha )} \right)$ [31-32] |
    | | | 6) $\mathcal{X}\times{\bf R}_{s}\rightarrow{\bf R}$ | $O\left( \frac{p^{(k+1)/2}}{\sqrt{n}}+\frac{1}{\sqrt{n}} \right)$, $O\left( \frac{p^{k+1}}{\sqrt{n}}+\frac{1}{\sqrt{n}} \right)$ [34] |
    | | Regularization | 7) RKHS | $O\left(\frac{\sqrt{{\rm tr}(G)}}{n}+\frac{1}{\sqrt{n}}\right)$ [48] |
    | | | 8) ${S}^{d}$ | $O\left( \frac{1}{\sqrt{n}} \right)$ [36] |
    | | | 9) ${S}^{d\times(md)}$ | $O\left( \frac{1}{\sqrt{n}} \right)$ [27, 32] |
    | | SRM | 10) Neural network space | $O\left(\left(\frac{1}{n}\right)^{\frac{1}{2-\alpha_{p}}}\right)$ [30, 32] |
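    The $O\left(\frac{1}{\sqrt{n}}\right)$ entries in Table 2 typically follow from the standard Rademacher generalization bound (see e.g. [19, 52]); a representative form, for a class $\mathcal{F}$ of functions from $\mathcal{Z}$ into $[0,1]$ and an i.i.d. sample $z_{1},\ldots,z_{n}$, is that with probability at least $1-\delta$,

    $$\mathbb{E}[f(z)]\le \frac{1}{n}\sum_{i=1}^{n} f(z_{i})+2\,\mathfrak{R}_{n}(\mathcal{F})+\sqrt{\frac{\ln(1/\delta)}{2n}}\qquad \text{for all } f\in\mathcal{F}.$$

    Since $\mathfrak{R}_{n}(\mathcal{F})$ itself scales as $O(1/\sqrt{n})$ for VC-type classes, this gives the slow rate; the faster $O\left(\frac{1}{n}\right)$ and $O\left(\frac{\log n}{n}\right)$ entries instead rely on local Rademacher complexity arguments [20-21, 53-55].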

    Table 3 Generalization analysis for the non-i.i.d. case

    | Sample-generating environment | Hypothesis space | Generalization ability |
    |---|---|---|
    | Independently but non-identically distributed | 1) $\mathcal{X}\rightarrow\mathcal{Y}$ | $O\left(\frac{1}{\sqrt{n}}\right)$ [37] |
    | Stationary and $\beta\textrm{-mixing}$ | 2) $\mathcal{X}\rightarrow{\bf R}$ | $O\left(\frac{\sqrt{{\rm tr}(G)}}{n}+\frac{1}{\sqrt{n}}\right)$ [38] |
    | Non-stationary and $\beta\textrm{-mixing}$ | 3) $\mathcal{X}\rightarrow\mathcal{Y}$ | Non-stationarity may prevent convergence to 0 [42] |
    | | 4) $\mathcal{Z}\rightarrow[-1,1]$ | Uniform convergence [39-41] |
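    For the last row, the key quantity behind the uniform convergence results of [39-41] is the sequential Rademacher complexity, which replaces the fixed sample by a $\mathcal{Z}$-valued binary tree $\mathbf{z}$ of depth $n$, so that $\mathbf{z}_{t}$ maps a sign history $(\sigma_{1},\ldots,\sigma_{t-1})\in\{-1,+1\}^{t-1}$ to a point of $\mathcal{Z}$. A sketch of the definition, following Rakhlin et al. [40]:

    $$\mathfrak{R}^{\rm seq}_{n}(\mathcal{F})=\sup_{\mathbf{z}}\,\mathbb{E}_{\sigma}\left[\sup_{f\in\mathcal{F}}\frac{1}{n}\sum_{t=1}^{n}\sigma_{t}\,f\bigl(\mathbf{z}_{t}(\sigma_{1},\ldots,\sigma_{t-1})\bigr)\right].$$

    In [40], uniform martingale laws of large numbers for $\mathcal{F}$ are tied to this quantity vanishing as $n\to\infty$, which is the content of the "uniform convergence" entry above.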
  • [1] Tikhonov A N, Arsenin V Y. Solution of Ill-posed Problems. Washington:Winston and Sons, 1977.
    [2] Akaike H. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 1974, 19(6):716-723 doi: 10.1109/TAC.1974.1100705
    [3] Akaike H. A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics, 1978, 30(1):9-14 doi: 10.1007/BF02480194
    [4] Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6(2):461-464 doi: 10.1214/aos/1176344136
    [5] Kolmogorov A N. On tables of random numbers. Sankhya:The Indian Journal of Statistics, 1963, 25:369-376
    [6] Kolmogorov A N. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1965, 1(1):1-7
    [7] Chaitin G J. On the length of programs for computing finite binary sequences. Journal of the ACM, 1966, 13(4):547-569 doi: 10.1145/321356.321363
    [8] Solomonoff R J. A formal theory of inductive inference. Part II. Information and Control, 1964, 7(2):224-254 doi: 10.1016/S0019-9958(64)90131-7
    [9] Wallace C S, Boulton D M. An information measure for classification. The Computer Journal, 1968, 11(2):185-194 doi: 10.1093/comjnl/11.2.185
    [10] Vapnik V N, Chervonenkis A Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 1971, 16(2):264-280 doi: 10.1137/1116025
    [11] Vapnik V N. The Nature of Statistical Learning Theory. New York:Springer, 1995.
    [12] Vapnik V N. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 1999, 10(5):988-999 doi: 10.1109/72.788640
    [13] Popper K R. The Logic of Scientific Discovery. United Kingdom:Hutchinson, 1959.
    [14] Valiant L G. A theory of the learnable. Communications of the ACM, 1984, 27(11):1134-1142 doi: 10.1145/1968.1972
    [15] Kearns M J, Schapire R E. Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 1994, 48(3):464-497 doi: 10.1016/S0022-0000(05)80062-5
    [16] Bartlett P L, Long P M, Williamson R C. Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 1996, 52(3):434-452 doi: 10.1006/jcss.1996.0033
    [17] Shawe-Taylor J, Bartlett P L, Williamson R C, Anthony M. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 1998, 44(5):1926-1940 doi: 10.1109/18.705570
    [18] Koltchinskii V. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 2001, 47(5):1902-1914 doi: 10.1109/18.930926
    [19] Bartlett P L, Mendelson S. Rademacher and Gaussian complexities:risk bounds and structural results. The Journal of Machine Learning Research, 2003, 3:463-482
    [20] Bartlett P L, Bousquet O, Mendelson S. Local Rademacher complexities. The Annals of Statistics, 2005, 33(4):1497-1537 doi: 10.1214/009053605000000282
    [21] Koltchinskii V. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 2006, 34(6):2593-2656 doi: 10.1214/009053606000001019
    [22] Koltchinskii V, Panchenko D. Rademacher processes and bounding the risk of function learning. High Dimensional Probability II. Boston:Birkhäuser, 2004. 443-457
    [23] Bousquet O, Koltchinskii V, Panchenko D. Some local measures of complexity of convex hulls and generalization bounds. Computational Learning Theory. Sydney, Australia:Springer, 2004. 59-73
    [24] Zhang Hai, Xu Zong-Ben. A survey on learning theory (I):stability and generalization. Chinese Journal of Engineering Mathematics, 2008, 25(1):1-9 (in Chinese)
    [25] Hu Zheng-Fa. The statistic complexity estimates and its application to machine learning. Acta Automatica Sinica, 2008, 34(10):1332-1336 (in Chinese)
    [26] Chen Jiang-Hong. Rademacher Complexities and the Generalization Performance of SVM[Master dissertation], Hubei University, China, 2005 (in Chinese)
    [27] Lei Y W, Ying Y M. Generalization analysis of multi-modal metric learning[Online], available:https://www.researchgate.net/publication/282969709, November 3, 2015
    [28] Lei Y W, Ding L X, Bi Y Z. Local Rademacher complexity bounds based on covering numbers[Online], available:http://arxiv.org/pdf/1510.01463v1.pdf, October 6, 2015
    [29] Lei Y W, Ding L X. Refined Rademacher chaos complexity bounds with applications to the multikernel learning problem. Neural Computation, 2014, 26(4):739-760 doi: 10.1162/NECO_a_00566
    [30] Lei Y W, Ding L X, Zhang W S. Generalization performance of radial basis function networks. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(3):551-564 doi: 10.1109/TNNLS.2014.2320280
    [31] Lei Y W, Ding L X, Wu W L. Universal learning using free multivariate splines. Neurocomputing, 2013, 119(16):253-263
    [32] Lei Yun-Wen. Rademacher Complexities with Applications to Machine Learning[Ph.D. dissertation], Wuhan University, China, 2015 (in Chinese)
    [33] Xu C, Liu T L, Tao D C, Xu C. Local Rademacher complexity for multi-label learning[Online], available:http://arxiv.org/pdf/1410.6990.pdf, October 26, 2014
    [34] Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Science China:Information Sciences, 2016, 59(7):072104:1-072104:12 doi: 10.1007/s11432-015-5470-z
    [35] de la Peña V, Giné E. Decoupling:From Dependence to Independence. New York:Springer-Verlag, 1999.
    [36] Cao Q, Guo Z C, Ying Y M. Generalization bounds for metric and similarity learning. Machine Learning, 2016, 102(1):115-132 doi: 10.1007/s10994-015-5499-7
    [37] Mohri M, Medina A M. New analysis and algorithm for learning with drifting distributions. In:Proceedings of the 23rd International Conference on Algorithmic Learning Theory. Lyon, France:Springer-Verlag, 2012. 124-138
    [38] Mohri M, Rostamizadeh A. Rademacher complexity bounds for non-I.I.D. processes. In:Proceedings of the 2008 Advances in Neural Information Processing Systems 21. Vancouver, BC, Canada:Curran Associates, Inc., 2008. 1097-1104
    [39] Rakhlin A, Sridharan K. On martingale extensions of Vapnik-Chervonenkis theory with applications to online learning. Measures of Complexity. Switzerland:Springer International Publishing, 2015. 197-215
    [40] Rakhlin A, Sridharan K, Tewari A. Sequential complexities and uniform martingale laws of large numbers. Probability Theory and Related Fields, 2015, 161(1-2):111-153 doi: 10.1007/s00440-013-0545-5
    [41] Rakhlin A, Sridharan K, Tewari A. Online learning:random averages, combinatorial parameters, and learnability. In:Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada:Curran Associates, Inc., 2010. 1984-1992
    [42] Kuznetsov V, Mohri M. Generalization bounds for time series prediction with non-stationary processes. In:Proceedings of the 25th International Conference on Algorithmic Learning Theory. Bled, Slovenia:Springer International Publishing, 2014. 260-274
    [43] Kuznetsov V, Mohri M. Forecasting non-stationary time series:from theory to algorithms[Online], available:http://www.cims.nyu.edu/~vitaly/pub/fts.pdf, July 12, 2016
    [44] Anguita D, Ghio A, Oneto L, Ridella S. A deep connection between the Vapnik-Chervonenkis entropy and the Rademacher complexity. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(12):2202-2211 doi: 10.1109/TNNLS.2014.2307359
    [45] Kääriäinen M. Relating the Rademacher and VC Bounds, Technical Report, C-2004-57, Department of Computer Science, University of Helsinki, Finland, 2004.
    [46] Srebro N, Sridharan K. Note on refined dudley integral covering number bound[Online], available:http://ttic.uchicago.edu/~karthik/dudley.pdf, July 12, 2016
    [47] Srebro N, Sridharan K, Tewari A. Smoothness, low noise and fast rates. In:Proceedings of the 2010 Advances in Neural Information Processing Systems 23. Vancouver, BC, Canada:Curran Associates, Inc., 2010. 2199-2207
    [48] V'Yugin V V. VC dimension, fat-shattering dimension, Rademacher averages, and their applications. Measures of Complexity. Switzerland:Springer International Publishing, 2015. 57-74
    [49] Ying Y M, Campbell C. Rademacher chaos complexities for learning the kernel problem. Neural Computation, 2010, 22(11):2858-2886 doi: 10.1162/NECO_a_00028
    [50] Massart P. Some applications of concentration inequalities to statistics. Annales De La Faculté Des Sciences De Toulouse, 2000, 9(2):245-303 doi: 10.5802/afst.961
    [51] Gnecco G, Sanguineti M. Approximation error bounds via Rademacher's complexity. Applied Mathematical Sciences, 2008, 2(4):153-176
    [52] Boucheron S, Bousquet O, Lugosi G. Theory of classification:a survey of some recent advances. ESAIM:Probability and Statistics, 2005, 9:323-375 doi: 10.1051/ps:2005018
    [53] Oneto L, Ghio A, Ridella S, Anguita D. Global Rademacher complexity bounds:from slow to fast convergence rates. Neural Processing Letters, 2016, 43(2):567-602 doi: 10.1007/s11063-015-9429-2
    [54] Oneto L, Ghio A, Ridella S, Anguita D. Fast convergence of extended Rademacher complexity bounds. In:Proceedings of the 2015 International Joint Conference on Neural Networks. Killarney, Ireland:IEEE, 2015. 1-10
    [55] Oneto L, Ghio A, Anguita D, Ridella S. An improved analysis of the Rademacher data-dependent bound using its self bounding property. Neural Networks, 2013, 44:107-111 doi: 10.1016/j.neunet.2013.03.017
    [56] Yu H F, Jain P, Kar P, Dhillon I S. Large-scale multi-label learning with missing labels. In:Proceedings of the 31st International Conference on Machine Learning. Beijing, China, 2014. 593-601
    [57] Ding Wan-Ding. Essentials of Measure Theory. Hefei:Anhui People's Publishing House, 2005 (in Chinese)
    [58] Hastie T, Friedman J, Tibshirani R. The Elements of Statistical Learning:Data Mining, Inference, and Prediction. New York:Springer, 2001.
    [59] van der Vaart A W, Wellner J A. Weak Convergence and Empirical Processes. New York:Springer-Verlag, 1996.
    [60] Ledoux M, Talagrand M. Probability in Banach Spaces:Isoperimetry and Processes. Berlin:Springer-Verlag, 1991.
    [61] Dudley R M. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. Journal of Functional Analysis, 1967, 1(3):290-330 doi: 10.1016/0022-1236(67)90017-1
    [62] Lozano F. Model selection using Rademacher penalization. In:Proceedings of the 2nd ICSC Symposium on Neural Networks. London:Academic Press, 2000.
    [63] Boucheron S, Lugosi G, Massart P. Concentration Inequalities:A Nonasymptotic Theory of Independence. United Kingdom:Oxford University Press, 2013.
    [64] Zhou Zhi-Hua. Machine Learning. Beijing:Tsinghua University Press, 2016 (in Chinese)
    [65] Mohri M, Rostamizadeh A, Talwalkar A. Foundations of Machine Learning. Cambridge:MIT Press, 2012.
    [66] Wang Hong-Qiao, Sun Fu-Chun, Cai Yan-Ning, Chen Ning, Ding Lin-Ge. On multiple kernel learning methods. Acta Automatica Sinica, 2010, 36(8):1037-1050 (in Chinese) doi: 10.3724/SP.J.1004.2010.01037
    [67] McFee B, Lanckriet G. Learning multi-modal similarity. Journal of Machine Learning Research, 2011, 12:491-523
    [68] Xie P T, Xing E P. Multi-modal distance metric learning. In:Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China:AAAI, 2013. 1806-1812
    [69] Hinton G E, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov R R. Improving neural networks by preventing co-adaptation of feature detectors[Online], available:http://arxiv.org/pdf/1207.0580v1.pdf, July 3, 2012
    [70] Wan L, Zeiler M, Zhang S X, LeCun Y, Fergus R. Regularization of neural networks using dropconnect. In:Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA, 2013. 1058-1066
    [71] Rudin W. Functional Analysis (2nd edition). New York:McGraw-Hill, 1991.
    [72] Mendelson S. A few notes on statistical learning theory. Advanced Lectures on Machine Learning. Berlin Heidelberg:Springer, 2003. 1-40
    [73] Sauer N. On the density of families of sets. Journal of Combinatorial Theory, Series A, 1972, 13(1):145-147 doi: 10.1016/0097-3165(72)90019-2
    [74] Pollard D. Convergence of Stochastic Processes. New York:Springer-Verlag, 1984.
    [75] Duan H H. Bounding the fat shattering dimension of a composition function class built using a continuous logic connective[Online], available:http://arxiv.org/pdf/1105.4618v1.pdf, May 23, 2011
    [76] Anthony M, Bartlett P L. Neural Network Learning-Theoretical Foundations. Cambridge:Cambridge University Press, 2009.
    [77] Alon N, Ben-David S, Cesa-Bianchi N, Haussler D. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 1997, 44(4):615-631 doi: 10.1145/263867.263927
    [78] Kolmogorov A N, Tihomirov V M. ε-entropy and ε-capacity of sets in functional space. American Mathematical Society Translations, 1961, 17(2):277-364
    [79] Bousquet O. New approaches to statistical learning theory. Journal of the Institute of Statistical Mathematics, 2003, 55(2):371-389
    [80] Wu P C, Hoi S C H, Xia H, Zhao P L, Wang D Y, Miao C Y. Online multimodal deep similarity learning with application to image retrieval. In:Proceedings of the 21st ACM International Conference on Multimedia. New York, USA:ACM, 2013. 153-162
    [81] Xia H, Wu P C, Hoi S C H. Online multi-modal distance learning for scalable multimedia retrieval. In:Proceedings of the 6th International Conference on Web Search and Data Mining. New York, USA:ACM, 2013. 455-464
    [82] Krzyzak A, Linder T. Radial basis function networks and complexity regularization in function learning. IEEE Transactions on Neural Networks, 1998, 9(2):247-256 doi: 10.1109/72.661120
    [83] Niyogi P, Girosi F. On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Computation, 1996, 8(4):819-842 doi: 10.1162/neco.1996.8.4.819
    [84] Park J, Sandberg I W. Universal approximation using radial-basis-function networks. Neural Computation, 1991, 3(2):246-257 doi: 10.1162/neco.1991.3.2.246
    [85] Girosi F. Approximation error bounds that use VC-bounds[Online], available:https://www.researchgate.net/publication/2782224_Approximation_Error_Bounds_That_Use_Vc-Bounds, February 18, 2013
    [86] Györfi L, Kohler M, Krzyżak A, Walk H. A Distribution-Free Theory of Nonparametric Regression. Berlin:Springer-Verlag, 2010.
    [87] Haussler D. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 1992, 100(1):78-150 doi: 10.1016/0890-5401(92)90010-D
    [88] Yu B. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 1994, 22(1):94-116 doi: 10.1214/aop/1176988849
    [89] Meir R. Nonparametric time series prediction through adaptive model selection. Machine Learning, 2000, 39(1):5-34 doi: 10.1023/A:1007602715810
    [90] Bartlett P L. Learning with a slowly changing distribution. In:Proceedings of the 5th Annual Workshop on Computational Learning Theory. New York, USA:ACM, 1992. 243-252
    [91] Mansour Y, Mohri M, Rostamizadeh A. Domain adaptation:learning bounds and algorithms. In:Proceedings of the 22nd Annual Conference on Learning Theory. Montreal, QC:Morgan Kaufmann Publishers, 2009.
    [92] Anguita D, Ghio A, Oneto L, Ridella S. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(9):1390-1406 doi: 10.1109/TNNLS.2012.2202401
    [93] El-Yaniv R, Pechyony D. Transductive Rademacher complexity and its applications. Journal of Artificial Intelligence Research, 2007, 35:193-234
    [94] Tolstikhin I, Blanchard G, Kloft M. Localized complexities for transductive learning[Online], available:http://arxiv.org/pdf/1411.7200.pdf, November 26, 2014
    [95] Tolstikhin I, Zhivotovskiy N, Blanchard G. Permutational Rademacher complexity. Algorithmic Learning Theory. Switzerland:Springer International Publishing, 2015. 209-223
    [96] Chazottes J R, Collet P, Külske C, Redig F. Concentration inequalities for random fields via coupling. Probability Theory and Related Fields, 2007, 137(1):201-225
    [97] Belomestny D, Spokoiny V. Concentration inequalities for smooth random fields. Theory of Probability and Its Applications, 2014, 58(2):314-323 doi: 10.1137/S0040585X9798659X
    [98] Zhu X J, Rogers T T, Gibson B R. Human Rademacher complexity. In:Proceedings of the 2009 Advances in Neural Information Processing Systems 22. Vancouver, BC, Canada:Curran Associates, Inc., 2009. 2322-2330
    [99] Vahdat M, Oneto L, Ghio A, Anguita D, Funk M, Rauterberg M. Human algorithmic stability and human Rademacher complexity. In:Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges, Belgium:Katholieke Universiteit Leuven, 2015. 313-318
    [100] Nolan D, Pollard D. U-processes:rates of convergence. The Annals of Statistics, 1987, 15(2):780-799 doi: 10.1214/aos/1176350374
    [101] Karlin S, Taylor H M. A First Course in Stochastic Processes (2nd edition). New York:Academic Press, 1975.
Publication history
  • Received: 2016-02-18
  • Accepted: 2016-07-11
  • Published: 2017-01-01
