

面向大模型时代的持续学习方法论演变

王全子昂 王仁振 孟德宇 徐宗本

引用本文: 王全子昂, 王仁振, 孟德宇, 徐宗本. 面向大模型时代的持续学习方法论演变. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240805
Citation: Wang Quan-Zi-Ang, Wang Ren-Zhen, Meng De-Yu, Xu Zong-Ben. The evolution of continual learning methodologies in the era of large models. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240805


doi: 10.16383/j.aas.c240805 cstr: 32138.14.j.aas.c240805
基金项目: 国家重点研发计划 (2020YFA0713900), 鹏城实验室重大项目 (PCL2024A06), 国家自然科学基金数学天元基金天元数学西北中心强化项目 (12426105), 国家自然科学基金 (62272375, 62306233) 资助
    作者简介:

    王全子昂:西安交通大学数学与统计学院博士研究生. 2019 年获得西安交通大学学士学位. 主要研究方向为机器学习, 持续学习和半监督学习. E-mail: sniperwqza@stu.xjtu.edu.cn

    王仁振:西安交通大学数学与统计学院助理教授. 2022 年获得西安交通大学博士学位. 主要研究方向为半监督学习, 持续学习和医学图像分析. E-mail: rzwang@xjtu.edu.cn

    孟德宇:西安交通大学数学与统计学院教授, 澳门科技大学特聘教授. 2008 年获得西安交通大学博士学位. 主要研究方向为机器学习, 人工智能和计算机视觉. 本文通信作者. E-mail: dymeng@mail.xjtu.edu.cn

    徐宗本:中国科学院院士, 西安交通大学数学与统计学院教授. 1987 年获得西安交通大学博士学位. 主要研究方向为大数据与人工智能的数学基础与核心算法. E-mail: zbxu@mail.xjtu.edu.cn

The Evolution of Continual Learning Methodologies in the Era of Large Models

Funds: Supported by National Key R&D Program of China (2020YFA0713900), Major Key Project of PCL (PCL2024A06), Tianyuan Fund for Mathematics of the National Natural Science Foundation of China (12426105), National Natural Science Foundation of China (62272375, 62306233)
    Author Bio:

    WANG Quan-Zi-Ang Ph.D. candidate at the School of Mathematics and Statistics, Xi'an Jiaotong University. He received his bachelor's degree from Xi'an Jiaotong University in 2019. His research interest covers machine learning, continual learning, and semi-supervised learning

    WANG Ren-Zhen Assistant professor at the School of Mathematics and Statistics, Xi'an Jiaotong University. He received his Ph.D. degree from Xi'an Jiaotong University in 2022. His research interest covers semi-supervised learning, continual learning, and medical image analysis

    MENG De-Yu Professor at the School of Mathematics and Statistics, Xi'an Jiaotong University, and distinguished professor at Macau University of Science and Technology. He received his Ph.D. degree from Xi'an Jiaotong University in 2008. His research interest covers machine learning, artificial intelligence, and computer vision. Corresponding author of this paper

    XU Zong-Ben Academician of the Chinese Academy of Sciences, and professor at the School of Mathematics and Statistics, Xi'an Jiaotong University. He received his Ph.D. degree from Xi'an Jiaotong University in 1987. His main research interest is mathematical foundation and core algorithms of big data and artificial intelligence

  • 摘要: 以深度学习为代表的机器学习方法已经在多个领域取得显著进展, 然而大多方法局限于静态场景, 难以像人类一样在开放世界的动态场景中不断学习新知识, 同时保持已经学过的旧知识. 为解决该挑战, 持续学习 (Continual learning, CL) 受到越来越多的关注. 现有的持续学习方法大致可以分为两类, 即相对传统的非预训练模型持续学习方法以及大模型时代下逐步演进的预训练模型持续学习方法. 本文旨在对这两类方法的研究进展进行详细的综述, 主要从四个层面对比非预训练模型和预训练模型方法的异同点, 即数据层面、模型层面、损失/优化层面以及理论层面. 着重分析从应用非预训练模型的方法发展到应用预训练模型的方法的技术变化, 并分析出现此类差异的内在本质. 最后, 总结并展望未来持续学习发展的趋势.

    Abstract: Machine learning methods, represented by deep learning, have achieved remarkable progress in many fields. However, most of them are confined to static scenarios and, unlike humans, cannot keep acquiring new knowledge in open-world dynamic environments while retaining previously learned knowledge. To address this challenge, continual learning (CL) has attracted increasing attention. Existing CL methods fall roughly into two categories: relatively traditional methods built on non-pre-trained models, and the methods built on pre-trained models that have gradually evolved in the era of large models. This paper surveys the progress of both lines of research and compares non-pre-trained and pre-trained approaches at four levels: the data level, the model level, the loss/optimization level, and the theoretical level. It focuses on the technical shifts that arise when moving from non-pre-trained to pre-trained models and analyzes the underlying causes of these differences. Finally, it summarizes the field and discusses future trends of continual learning.
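The setting described in the abstract, together with the data-level replay family summarized later in Table 1, can be illustrated with a minimal sketch. The code below is not taken from any of the surveyed papers; PyTorch is assumed, and names such as `tasks`, `buffer_size`, and `replay_batch` are illustrative. A model is trained on a sequence of tasks, and a small reservoir buffer of past examples is replayed to counteract catastrophic forgetting.

```python
import random
import torch
import torch.nn.functional as F

def train_continually(model, tasks, buffer_size=200, lr=0.01, replay_batch=32):
    """Rough sketch of task-sequential training with experience replay.

    `tasks` is a list of iterables, each yielding (x, y) mini-batches of one task;
    data from earlier tasks is assumed unavailable except through the buffer.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    buffer, seen = [], 0  # reservoir sampling keeps a near-uniform sample of past data
    for task in tasks:
        for x, y in task:
            loss = F.cross_entropy(model(x), y)
            if buffer:  # interleave stored examples: a simple data-level replay method
                bx, by = zip(*random.sample(buffer, min(replay_batch, len(buffer))))
                loss = loss + F.cross_entropy(model(torch.stack(bx)), torch.stack(by))
            opt.zero_grad()
            loss.backward()
            opt.step()
            for xi, yi in zip(x, y):  # reservoir update with the new mini-batch
                seen += 1
                if len(buffer) < buffer_size:
                    buffer.append((xi.detach(), yi))
                elif random.random() < buffer_size / seen:
                    buffer[random.randrange(buffer_size)] = (xi.detach(), yi)
    return model
```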
  • 图  1  动态场景下的机器学习图示

    Fig.  1  Illustration of machine learning under the dynamic environment

    图  2  持续学习设置图示

    Fig.  2  Illustration of the continual learning setting

    图  3  数据层面的持续学习方法图示

    Fig.  3  Illustration of CL methods on the perspective of data

    图  4  模型结构相关持续学习方法图示

    Fig.  4  Illustration of CL methods on model architecture

    图  5  损失/优化层面梯度的对齐方法和损失平滑方法图示

    Fig.  5  Illustration of gradient alignment and loss flatness CL methods on the perspective of loss and optimization

    表  1  持续学习方法总结

    Table  1  Summary of continual learning methods

    方法分类: 各类方法按非预训练持续学习方法 (non-pre-trained CL methods) 与预训练持续学习方法 (pre-trained CL methods) 两栏列出
    数据层面 (Data level)
        基于重放 (replay-based)
            数据增广 (data augmentation): 非预训练 [47, 52−53]; 预训练 [96−98]
            数据表征 (data representation): 非预训练 [47−48, 54]
            数据选择 (data selection): 非预训练 [37−38, 46, 56−59, 61−64]
        基于伪重放 (pseudo-replay-based)
            生成模型 (generative models): 非预训练 [66−73]; 预训练: 生成预训练模型 (generative pre-trained models) [85−90, 93, 99−101]
            合成数据集 (synthetic datasets): 非预训练 [76−80]; 预训练 [95]
            特征重放 (feature replay): 非预训练 [82−83]; 预训练 [94]
    模型层面 (Model level)
        模型表征 (model representation): 非预训练 [105−108, 110−114, 116−118]; 预训练 [146−150]
        模型偏差 (model bias): 非预训练 [83, 119−131]; 预训练 [148, 151−153]
        模型结构 (model architecture): 非预训练: 扩展模型 (expansion models) [132−139], 路径模型 (path models) [140−145]; 预训练: 提示微调 (prompt tuning) [154−168], 适配器及专家模型 (adapters and expert models) [169−190]
    损失/优化层面 (Loss/optimization level)
        正则化 (regularization): 非预训练 [194−199]; 预训练 [170, 219]
        梯度对齐 (gradient alignment): 非预训练 [200−204]; 预训练 [220−221]
        损失平滑 (loss flatness): 非预训练 [205−211]
        元持续学习 (meta continual learning): 非预训练 [121, 131, 199, 214−218]; 预训练 [222−223]
    理论层面 (Theory level)
        PAC-Bayesian 理论 (PAC-Bayesian theory): [138, 224−226]
        概率模型 (probabilistic models): [195, 197−198, 227−228]
        线性模型 (linear models): [229−233]
        其他 (others): [234−235, 237−240]
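To make the loss/optimization-level entries above concrete, the following sketch shows the quadratic weight-regularization idea behind methods such as EWC [195]. It is a generic illustration, not any specific paper's implementation; PyTorch is assumed, and `old_params` and `fisher` are illustrative names for dictionaries (keyed by parameter name) that the caller is assumed to have stored after finishing the previous task, with the Fisher-information estimation itself omitted.

```python
import torch

def ewc_penalty(model, old_params, fisher, lam=1.0):
    """Penalize moving parameters that were important for earlier tasks."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            # Parameters with a large estimated Fisher value are anchored more strongly.
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Illustrative usage while training on a new task:
# loss = task_loss + ewc_penalty(model, old_params, fisher, lam=100.0)
```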
  • [1] Simonyan K. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv: 1409.1556, 2014.
    [2] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770−778
    [3] Alexey D, Lucas B, Alexander K, Dirk W, Zhai X, Thomas U, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv: 2010.11929, 2020.
    [4] Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 10012−10022
    [5] Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid Scene Parsing Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2881−2890
    [6] Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, et al. Rethinking Semantic Segmentation from a Sequence-to-sequence Perspective with Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 6881−6890
    [7] He K, Chen X, Xie S, Li Y, Dollár P, Girshick R. Masked Autoencoders are Scalable Vision Learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 16000−16009
    [8] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Networks. Communications of the ACM, 2020, 63(11): 139−144 doi: 10.1145/3422622
    [9] Kingma D P, Welling M. Auto-encoding Variational Bayes. arXiv preprint arXiv: 1312.6114, 2013.
    [10] Ho J, Jain A, Abbeel P. Denoising Diffusion Probabilistic Models. Advances in neural information processing systems, 2020, 33: 6840−6851
    [11] De Lange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis G, et al. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(7): 3366−3385
    [12] Wang L, Zhang X, Su H, Zhu J. A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
    [13] Zhou D W, Wang Q W, Qi Z H, Ye H J, Zhan D C. Class-incremental Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
    [14] McCloskey M, Cohen N J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. Psychology of Learning and Motivation. Elsevier, 1989, 24: 109−165
    [15] Zhang Y, Yang Q. A Survey on Multi-task Learning. IEEE Transactions on Knowledge and Data Engineering, 2021, 34(12): 5586−5609
    [16] Sener O, Koltun V. Multi-task Learning as Multi-objective Optimization. Advances in Neural Information Processing Systems, 2018, 31.
    [17] Hoi S C H, Sahoo D, Lu J, Zhao P. Online Learning: A Comprehensive Survey. Neurocomputing, 2021, 459: 249−289 doi: 10.1016/j.neucom.2021.04.112
    [18] Glorot X, Bengio Y. Understanding the Difficulty of Training Deep Feedforward Neural Networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2010: 249-256.
    [19] He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-level Performance on Imagenet Classification. Proceedings of the IEEE International Conference on Computer Vision, 2015: 1026−1034
    [20] Bommasani R, Hudson D A, Adeli E, Altman R, Arora S, von Arx S, et al. On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv: 2108.07258, 2021.
    [21] Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, et al. Learning Transferable Visual Models from Natural Language Supervision. International Conference on Machine Learning. PMLR, 2021: 8748-8763.
    [22] Zhao W X, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A Survey of Large Language Models. arXiv preprint arXiv: 2303.18223, 2023, 1(2).
    [23] Han Z, Gao C, Liu J, Zhang J, Zhang S Q. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey. arXiv preprint arXiv: 2403.14608, 2024.
    [24] Xin Y, Luo S, Zhou H, Du J, Liu X, Fan Y, et al. Parameter-Efficient Fine-Tuning for Pre-trained Vision Models: A Survey. arXiv preprint arXiv: 2402.02242, 2024.
    [25] Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021: 3045−3059
    [26] Jia M, Tang L, Chen B C, Cardie C, Belongie S, Hariharan B, et al. Visual Prompt Tuning. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 709-727.
    [27] Hu E J, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations, 2022.
    [28] Zhou D W, Sun H L, Ning J, Ye H J, Zhan D C. Continual Learning with Pre-trained Models: A Survey. arXiv preprint arXiv: 2401.16386, 2024.
    [29] Wu T, Luo L, Li Y F, Pan S, Vu TT, Haffari G. Continual Learning for Large Language Models: A Survey. arXiv preprint arXiv: 2402.01364, 2024.
    [30] Shi H, Xu Z, Wang H, Qin W, Wang W, Wang Y, et al. Continual Learning of Large Language Models: A Comprehensive Survey. arXiv preprint arXiv: 2404.16789, 2024.
    [31] Zhang J, Liu L, Silven O, Pietikainen M, Hu D. Few-shot Class-incremental Learning: A Survey. arXiv preprint arXiv: 2308.06764, 2023.
    [32] Tian S, Li L, Li W, Ran H, Ning X, Tiwari P. A Survey on Few-shot Class-incremental Learning. Neural Networks, 2024, 169: 307−324 doi: 10.1016/j.neunet.2023.10.039
    [33] Yu D, Zhang X, Chen Y, Liu A, Zhang Y, Yu PS, et al. Recent Advances of Multimodal Continual Learning: A Comprehensive Survey. arXiv preprint arXiv: 2410.0535, 2024.
    [34] Van de Ven G M, Tolias A S. Three Scenarios for Continual Learning. arXiv preprint arXiv: 1904.07734, 2019.
    [35] Aljundi R, Kelchtermans K, Tuytelaars T. Task-free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 11254−11263
    [36] Lee S, Ha J, Zhang D, Kim G. A Neural Dirichlet Process Mixture Model for Task-free Continual Learning. arXiv preprint arXiv: 2001.00689, 2020.
    [37] Aljundi R, Lin M, Goujaud B, Bengio Y. Gradient-based Sample Selection for Online Continual Learning. Advances in Neural Information Processing Systems, 2019, 32.
    [38] Bang J, Kim H, Yoo Y J, Ha J W, Choi J. Rainbow Memory: Continual Learning with a Memory of Diverse Samples. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 8218−8227
    [39] Kim C D, Jeong J, Moon S, Kim G. Continual Learning on Noisy Data Streams via Self-purified Replay. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 537−547
    [40] Karim N, Khalid U, Esmaeili A, Rahnavard N. Cnll: A Semi-supervised Approach for Continual Noisy Label Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 3878−3888
    [41] Chrysakis A, Moens M F. Online Continual Learning from Imbalanced Data. International Conference on Machine Learning. PMLR, 2020: 1952-1961.
    [42] Kim C D, Jeong J, Kim G. Imbalanced Continual Learning with Partitioning Reservoir Sampling. European Conference on Computer Vision, 2020: 411−428
    [43] Koh H, Kim D, Ha J W, Choi J. Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference. International Conference on Learning Representations, 2022.
    [44] Ratcliff R. Connectionist Models of Recognition Memory: Constraints Imposed by Learning and Forgetting Functions. Psychological Review, 1990, 97(2): 285−308 doi: 10.1037/0033-295X.97.2.285
    [45] Robins A. Catastrophic Forgetting, Rehearsal and Pseudo Rehearsal. Connection Science, 1995, 7(2): 123−146 doi: 10.1080/09540099550039318
    [46] Rebuffi S A, Kolesnikov A, Sperl G, Lampert C H. iCaRL: Incremental Classifier and Representation Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 2001−2010
    [47] Buzzega P, Boschini M, Porrello A, Abati D, Calderara S. Dark Experience for General Continual Learning: a Strong, Simple Baseline. Advances in Neural Information Processing Systems, 2020, 33: 15920−15930
    [48] Bellitto G, Salanitri F P, Pennisi M, Bonicelli L, Porrello A, Calderara S, et al. Saliency-driven Experience Replay for Continual Learning. Advances in Neural Information Processing Systems, 2024.
    [49] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998, 86(11): 2278−2324 doi: 10.1109/5.726791
    [50] Krizhevsky A, Sutskever I, Hinton G E. Imagenet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012.
    [51] Zhang H, Cisse M, Dauphin Y N, Lopez-Paz D. Mixup: Beyond Empirical Risk Minimization. arXiv preprint arXiv: 1710.09412, 2017.
    [52] Buzzega P, Boschini M, Porrello A, Calderara S. Rethinking Experience Replay: a Bag of Tricks for Continual Learning. 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021: 2180-2187.
    [53] Zhang Y, Pfahringer B, Frank E, Bifet A, Lim N J S, Jia Y. A Simple but Strong Baseline for Online Continual Learning: Repeated Augmented Rehearsal. Advances in Neural Information Processing Systems, 2022, 35: 14771−14783
    [54] Wang L, Zhang X, Yang K, Yu L, Li C, Hong L, et al. Memory Replay with Data Compression for Continual Learning. International Conference on Learning Representations, 2022.
    [55] Wallace G K. The JPEG Still Picture Compression Standard. Communications of the ACM, 1991, 34(4): 30−44 doi: 10.1145/103085.103089
    [56] Isele D, Cosgun A. Selective Experience Replay for Lifelong Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).
    [57] Killamsetty K, Sivasubramanian D, Ramakrishnan G, et al. Glister: Generalization-based Data Subset Selection for Efficient and Robust Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(9): 8110−8118 doi: 10.1609/aaai.v35i9.16988
    [58] Yoon J, Madaan D, Yang E, Hwang S J. Online Coreset Selection for Rehearsal-based Continual Learning. International Conference on Learning Representations, ICLR, 2022.
    [59] Sun S, Calandriello D, Hu H, Li A, Titsias M. Information-theoretic Online Memory Selection for Continual Learning. International Conference on Learning Representations, 2022.
    [60] Welling M. Herding Dynamical Weights to Learn. Proceedings of the 26th Annual International Conference on Machine Learning, 2009: 1121−1128
    [61] Borsos Z, Mutny M, Krause A. Coresets via Bilevel Optimization for Continual Learning and Streaming. Advances in Neural Information Processing Systems, 2020, 33: 14879−14890
    [62] Zhou X, Pi R, Zhang W, Lin Y, Chen Z, Zhang T. Probabilistic Bilevel Coreset Selection. International Conference on Machine Learning. PMLR, 2022: 27287-27302.
    [63] Hao J, Ji K, Liu M. Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm. Advances in Neural Information Processing Systems, 2024, 36.
    [64] Tong R, Liu Y, Shi J Q, Gong D. Coreset Selection via Reducible Loss in Continual Learning. The Thirteenth International Conference on Learning Representations, 2025.
    [65] Verma T, Jin L, Zhou J, Huang J, Tan M, Choong B C M, et al. Privacy-Preserving Continual Learning Methods for Medical Image Classification: A Comparative Analysis. Frontiers in Medicine, 2023, 10: 1227515 doi: 10.3389/fmed.2023.1227515
    [66] Robins A. Catastrophic Forgetting, Rehearsal and Pseudorehearsal. Connection Science, 1995, 7(2): 123−146 doi: 10.1080/09540099550039318
    [67] Shin H, Lee J K, Kim J, Kim J. Continual Learning with Deep Generative Replay. Advances in Neural Information Processing Systems, 2017, 30.
    [68] Wu C, Herranz L, Liu X, Van De Weijer J, Raducanu B. Memory Replay GANs: Learning to Generate New Categories without Forgetting. Advances in Neural Information Processing Systems, 2018, 31.
    [69] Rios A, Itti L. Closed-loop Memory GAN for Continual Learning. Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019: 3332−3338
    [70] Xiang Y, Fu Y, Ji P, Huang H. Incremental Learning Using Conditional Adversarial Networks. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 6619−6628
    [71] Wang Z, Liu L, Duan Y, Tao D. Continual Learning through Retrieval and Imagination. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(8): 8594−8602 doi: 10.1609/aaai.v36i8.20837
    [72] Ayub A, Wagner A. EEC: Learning to Encode and Regenerate Images for Continual Learning. International Conference on Learning Representations, 2021.
    [73] Chen P H, Wei W, Hsieh C J, Dai B. Overcoming Catastrophic Forgetting by Bayesian Generative Regularization. International Conference on Machine Learning. PMLR, 2021: 1760-1770.
    [74] Wang T, Zhu J Y, Torralba A, Efros A A. Dataset Distillation. arXiv preprint arXiv: 1811.10959, 2018.
    [75] Lei S, Tao D. A Comprehensive Survey of Dataset Distillation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
    [76] Wiewel F, Yang B. Condensed Composite Memory Continual Learning. International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-8.
    [77] Sangermano M, Carta A, Cossu A, Bacciu D. Sample Condensation in Online Continual Learning. International Joint Conference on Neural Networks (IJCNN). IEEE, 2022: 01-08.
    [78] Gu J, Wang K, Jiang W, You Y. Summarizing Stream Data for Memory-Constrained Online Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(11): 12217−12225 doi: 10.1609/aaai.v38i11.29111
    [79] Yin H, Molchanov P, Alvarez J M, Li Z, Mallya A, Hoiem D, et al. Dreaming to Distill: Data-free Knowledge Transfer via Deepinversion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 8715−8724
    [80] Yin H, Mallya A, Vahdat A, Alvarez J M, Kautz J, Molchanov P. See Through Gradients: Image Batch Recovery via Gradinversion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 16337−16346
    [81] Smith J, Hsu Y C, Balloch J, Shen Y, Jin H, Kira Z. Always be Dreaming: A New Approach for Data-free Class-incremental Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 9374−9384
    [82] Liu X, Wu C, Menta M, Herranz L, Raducanu B, Bagdanov A D, et al. Generative Feature Replay for Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020: 226−227
    [83] Iscen A, Zhang J, Lazebnik S, Schmid C. Memory-Efficient Incremental Learning Through Feature Adaptation. European Conference on Computer Vision, Springer, 2020: 699-715.
    [84] Smith J S, Tian J, Halbe S, Hsu Y C, Kira Z. A Closer Look at Rehearsal-free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 2410−2420
    [85] Gao R, Liu W. DDGR: Continual Learning with Deep Diffusion-based Generative Replay. International Conference on Machine Learning. PMLR, 2023: 10744-10763.
    [86] Smith J S, Hsu Y C, Zhang L, Hua T, Kira Z, Shen Y, et al. Continual Diffusion: Continual Customization of Text-to-image Diffusion with C-LoRA. arXiv preprint arXiv: 2304.06027, 2023.
    [87] Jodelet Q, Liu X, Phua Y J, Murata T. Class-incremental Learning Using Diffusion Model for Distillation and Replay. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 3425−3433
    [88] Zajac M, Deja K, Kuzina A, Tomczak J M, Trzciński T, Shkurti F, et al. Exploring Continual Learning of Diffusion Models. arXiv preprint arXiv: 2303.15342, 2023.
    [89] Masip S, Rodriguez P, Tuytelaars T, van de Ven G M. Continual Learning of Diffusion Models with Generative Distillation. arXiv preprint arXiv: 2311.14028, 2023.
    [90] Cywiński B, Deja K, Trzciński T, Twardowski B, Kucinski L. GUIDE: Guidance-based Incremental Learning with Diffusion Models. arXiv preprint arXiv: 2403.03938, 2024.
    [91] Hataya R, Bao H, Arai H. Will Large-Scale Generative Models Corrupt Future Datasets? Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 20555−20565
    [92] Martínez G, Watson L, Reviriego P, Hernandez J A, Juarez M, Sarkar R. Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet. International Workshop on Epistemic Uncertainty in Artificial Intelligence. Cham: Springer Nature Switzerland, 2023: 59-73.
    [93] Wang M, Michel N, Mao J, Yamasaki T. Dealing with Synthetic Data Contamination in Online Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
    [94] Zuo Y, Yao H, Yu L, Zhuang L, Xu C. Hierarchical Prompts for Rehearsal-free Continual Learning. arXiv preprint arXiv: 2401.11544, 2024.
    [95] Hatamizadeh A, Yin H, Roth H R, Li W, Kautz J, Xu D, et al. Gradvit: Gradient Inversion of Vision Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 10021−10030
    [96] Cai Y, Thomason J, Rostami M. Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation. Conference on Empirical Methods in Natural Language Processing, 2023.
    [97] Zhang X, Zhang F, Xu C. VQACL: A Novel Visual Question Answering Continual Learning Setting. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 19102−19112
    [98] Yang R, Wang S, Zhang H, Xu S, Guo Y, Ye X, et al. Knowledge Decomposition and Replay: A Novel Cross-modal Image-Text Retrieval Continual Learning Method. Proceedings of the 31st ACM International Conference on Multimedia, 2023: 6510−6519
    [99] Yan S, Hong L, Xu H, Han J, Tuytelaars T, Li Z, et al. Generative Negative Text Replay for Continual Vision-Language Pretraining. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 22-38.
    [100] Lei S W, Gao D, Wu J Z, Wang Y, Liu W, Zhang M, et al. Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(1): 1250−1259 doi: 10.1609/aaai.v37i1.25208
    [101] Cheng S, He C, Chen K, Xu L, Li H, Meng F, et al. Vision-Sensor Attention Based Continual Multimodal Egocentric Activity Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024: 6300-6304.
    [102] Geirhos R, Jacobsen J H, Michaelis C, Zemel R, Brendel W, Bethge M, et al. Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence, 2020, 2(11): 665−673 doi: 10.1038/s42256-020-00257-z
    [103] Wei Y, Ye J, Huang Z, Zhang J, Shan H. Online Prototype Learning for Online Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 18764−18774
    [104] Kim D, Park D, Shin Y, Bang J, Song H, Lee J G. Adaptive Shortcut Debiasing for Online Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(12): 13122−13131 doi: 10.1609/aaai.v38i12.29211
    [105] Jing L, Tian Y. Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 4037−4058
    [106] Cha H, Lee J, Shin J. Co2L: Contrastive Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 9516−9525
    [107] Gomez-Villa A, Twardowski B, Yu L, Bagdanov A D, Van de Weijer J. Continually Learning Self-supervised Representations with Projected Functional Regularization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 3867−3877
    [108] Purushwalkam S, Morgado P, Gupta A. The Challenges of Continuous Self-supervised Learning. European Conference on Computer Vision, 2022: 702−721
    [109] Yao L, Chu Z, Li S, Li Y, Gao J, Zhang A. A Survey on Causal Inference. ACM Transactions on Knowledge Discovery from Data (TKDD), 2021, 15(5): 1−46
    [110] Hu X, Tang K, Miao C, Hua X S, Zhang H. Distilling Causal Effect of Data in Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3957−3966
    [111] Chu Z, Li R, Rathbun S, Li S. Continual Causal Inference with Incremental Observational Data. 2023 IEEE 39th International Conference on Data Engineering (ICDE). IEEE, 2023: 3430-3439.
    [112] Wang L, Yang K, Li C, Hong L, Li Z, Zhu J. Ordisco: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 5383−5392
    [113] Smith J, Balloch J, Hsu Y C, Kira Z. Memory-efficient Semi-supervised Continual Learning: The World is Its Own Replay Buffer. 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-8.
    [114] Luo Y, Wong Y, Kankanhalli M, Zhao Q. Learning to Predict Gradients for Semi-Supervised Continual Learning. IEEE Transactions on Neural Networks and Learning Systems, 2024.
    [115] O'Reilly R C, Bhattacharyya R, Howard M D, Ketz N. Complementary Learning Systems. Cognitive science, 2014, 38(6): 1229−1248 doi: 10.1111/j.1551-6709.2011.01214.x
    [116] Pham Q, Liu C, Hoi S. DualNet: Continual Learning, Fast and Slow. Advances in Neural Information Processing Systems, 2021, 34: 16131−16144
    [117] Arani E, Sarfraz F, Zonooz B. Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System. International Conference on Learning Representations, 2022.
    [118] Ren X, Qin Y, Wang B, Cheng X, Jia L. A Complementary Continual Learning Framework Using Incremental Samples for Remaining Useful Life Prediction of Machinery. IEEE Transactions on Industrial Informatics, 2024.
    [119] Mai Z, Li R, Kim H, Sanner S. Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-incremental Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3589−3599
    [120] Rypesc G, Cygert S, Trzcinski T, Twardowski B. Task-recency bias strikes back: Adapting covariances in Exemplar-Free Class Incremental Learning. Advances in Neural Information Processing Systems, 2024, 37: 63268−63289
    [121] Wang Q, Wang R, Wu Y, Jia X, Meng D. CBA: Improving Online Continual Learning via Continual Bias Adaptor. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202319082−19092
    [122] Hou S, Pan X, Loy C C, Wang Z, Lin D. Learning a Unified Classifier Incrementally via Rebalancing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 831−839
    [123] Ahn H, Kwak J, Lim S, Bang H, Kim H, Moon T. SS-IL: Separated Softmax for Incremental Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021: 844−853
    [124] Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, et al. Large Scale Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019: 374−382
    [125] Caccia L, Aljundi R, Asadi N, Tuytelaars T, Pineau J, Belilovsky E. New Insights on Reducing Abrupt Representation Change in Online Continual Learning. arXiv preprint arXiv: 2104.05025, 2021.
    [126] Yu L, Twardowski B, Liu X, Herranz L, Wang K, Cheng Y, et al. Semantic Drift Compensation for Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 6982−6991
    [127] Zhu K, Zhai W, Cao Y, Luo J, Zha Z J. Self-sustaining Representation Expansion for Non-exemplar Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 9296−9305
    [128] Pham Q, Liu C, Hoi S. Continual Normalization: Rethinking Batch Normalization for Online Continual Learning. International Conference on Learning Representations, 2022.
    [129] Cha S, Cho S, Hwang D, Hong S, Lee M, Moon T. Rebalancing Batch Normalization for Exemplar-based Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 20127−20136
    [130] Lyu Y, Wang L, Zhang X, Sun Z, Su H, Zhu J. Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and adaptation. Advances in Neural Information Processing Systems, 2024, 36.
    [131] Wang Q, Wang R, Wu Y, Jia X, Meng D. Dual-CBA: Improving Online Continual Learning via Dual Continual Bias Adaptors from a Bi-level Optimization Perspective. arXiv preprint arXiv: 2408.13991, 2024.
    [132] Rusu A A, Rabinowitz N C, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, et al. Progressive Neural Networks. arXiv preprint arXiv: 1606.04671, 2016.
    [133] Yan S, Xie J, He X. DER: Dynamically Expandable Representation for Class Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 3014−3023
    [134] Mallya A, Lazebnik S. Packnet: Adding Multiple Tasks to a Single Network by Iterative Pruning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 7765−7773
    [135] Golkar S, Kagan M, Cho K. Continual Learning via Neural Pruning. arXiv preprint arXiv: 1903.04476, 2019.
    [136] Yoon J, Kim S, Yang E, Hwang S J. Scalable and Order-robust Continual Learning with Additive Parameter Decomposition. International Conference on Learning Representations, 2020.
    [137] Hihn H, Braun D A. Mixture-of-Variational-Experts for Continual Learning. ICLR Workshop on Agent Learning in Open-Endedness, 2021.
    [138] Wang L, Zhang X, Li Q, Zhu J. CoSCL: Cooperation of Small Continual Learners is Stronger Than a Big One. European Conference on Computer Vision, 2022: 254−271
    [139] Zhou Y, Lei T, Liu H, Du N, Huang Y, Zhao V, et al. Mixture-of-experts with Expert Choice Routing. Advances in Neural Information Processing Systems, 2022, 35: 7103−7114
    [140] Abati D, Tomczak J, Blankevoort T, Calderara S, Cucchiara R, Bejnordi B E. Conditional Channel Gated Networks for Task-aware Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 3931−3940
    [141] Mallya A, Davis D, Lazebnik S. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights. European Conference on Computer Vision, 2018: 67−82
    [142] Wortsman M, Ramanujan V, Liu R, Kembhavi A, Rastegari M, Yosinski J, et al. Supermasks in Superposition. Advances in Neural Information Processing Systems, 2020, 33: 15173−15184
    [143] Kang H, Mina R J L, Madjid S R H, Yoon J, Hasegawa-Johnson M, Hwang S J, et al. Forget-free Continual Learning with Winning Subnetworks. International Conference on Machine Learning. PMLR, 2022: 10734-10750.
    [144] Yoon J, Madjid S, Hwang S J, Yoo C D. On the Soft-Subnetwork for Few-Shot Class Incremental Learning. International Conference on Learning Representations, 2023.
    [145] Gao Q, Shan X, Zhang Y, Zhou F. Enhancing Knowledge Transfer for Task Incremental Learning with Data-free Subnetwork. Advances in Neural Information Processing Systems, 2023, 36: 68471−68484
    [146] Fini E, Da Costa V G T, Alameda-Pineda X, Ricci E, Alahari K, Mairal J. Self-supervised Models are Continual Learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 9621−9630
    [147] Ye Y, Xie Y, Zhang J, Chen Z, Wu Q, Xia Y. Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 11114−11124
    [148] McDonnell M D, Gong D, Parvaneh A, Abbasnejad E, van den Hengel A. RanPAC: Random Projections and Pre-trained Models for Continual Learning. Advances in Neural Information Processing Systems, 2024, 36.
    [149] Zhang G, Wang L, Kang G, Cheng L, Wei Y. SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 19148−19158
    [150] Zhang G, Wang L, Kang G, Cheng L, Wei Y. SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training. arXiv preprint arXiv: 2408.08295, 2024.
    [151] He J, Zhu F. Exemplar-free Online Continual Learning. 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022: 541-545.
    [152] Zhuang H, Weng Z, Wei H, Xie R, Toh K A, Lin Z. ACIL: Analytic Class-incremental Learning with Absolute Memorization and Privacy Protection. Advances in Neural Information Processing Systems, 2022, 35: 11602−11614
    [153] Zhuang H, He R, Tong K, et al. DS-AL: A Dual-stream Analytic Learning for Exemplar-free Class-incremental Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(15): 17237−17244 doi: 10.1609/aaai.v38i15.29670
    [154] Wang Z, Zhang Z, Lee C Y, Zhang H, Sun R, Ren X, et al. Learning to Prompt for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 139−149
    [155] Wang Z, Zhang Z, Ebrahimi S, Sun R, Zhang H, Lee C Y, et al. DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. European Conference on Computer Vision, 2022: 631−648
    [156] Smith J S, Karlinsky L, Gutta V, Cascante-Bonilla P, Kim D, Arbelle A, et al. Coda-Prompt: Continual Decomposed Attention-based Prompting for Rehearsal-free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 11909−11919
    [157] Wang L, Xie J, Zhang X, Huang M, Zhu J. Hierarchical Decomposition of Prompt-based Continual Learning: Rethinking Obscured Sub-optimality. Advances in Neural Information Processing Systems, 2024, 36.
    [158] Chen H, Wu Z, Han X, Jia M, Jiang Y G. PromptFusion: Decoupling Stability and Plasticity for Continual Learning. arXiv preprint arXiv: 2303.07223, 2023.
    [159] Wang Y, Huang Z, Hong X. S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning. Advances in Neural Information Processing Systems, 2022, 35: 5682−5695
    [160] Kang Z Q, Wang L, Zhang X, Alahari K. Advancing Prompt-Based Methods for Replay-Independent General Continual Learning. The Thirteenth International Conference on Learning Representations, 2025.
    [161] Liu Y, Yang M. SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
    [162] Huang W C, Chen C F, Hsu H. OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning. The Twelfth International Conference on Learning Representations, 2024.
    [163] Jung D, Han D, Bang J, Song H. Generating Instance-level Prompts for Rehearsal-free Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 11847−11857
    [164] Tang Y M, Peng Y X, Zheng W S. When Prompt-based Incremental Learning Does Not Meet Strong Pretraining. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 1706−1716
    [165] Yang C, Liu W, Chen S, Qi J, Zhou A. Generating Prompts in Latent Space for Rehearsal-free Continual Learning. Proceedings of the 32nd ACM International Conference on Multimedia, 2024: 8913−8922
    [166] Zheng J, Ma Q, Liu Z, Wu B, Feng H. Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer. arXiv preprint arXiv: 2401.09181, 2024.
    [167] D'Alessandro M, Alonso A, Calabres E, Galar M. Multimodal Parameter-Efficient Few-shot Class Incremental Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 3393−3403
    [168] Qian Z, Wang X, Duan X, Qin P, Li Y, Zhu W. Decouple before Interact: Multi-modal Prompt Learning for Continual Visual Question Answering. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 2953−2962
    [169] Li J, Wang S, Qian B, He Y, Wei X, Gong Y. Dynamic Integration of Task-Specific Adapters for Class Incremental Learning. arXiv preprint arXiv: 2409.14983, 2024.
    [170] Liang Y S, Li W J. InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 23638−23647
    [171] Zhao L, Zhang X, Yan K, Ding S, Huang W. SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
    [172] Zhang X, Bai L, Yang X, Liang J. C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models. arXiv preprint arXiv: 2502.17920, 2025.
    [173] Wu Y, Piao H, Huang L K, Wang R, Li W, Pfister H, et al. SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning. The Thirteenth International Conference on Learning Representations, 2025.
    [174] Wei X, Li G, Marculescu R. Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025: 6634−6645
    [175] He J P, Duan Z H, Zhu F Q. CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
    [176] Zhu H, Zhang Y F, Dong J H, Koniusz P. BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
    [177] Liu X, Chang X B. LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025.
    [178] Yu J, Zhuge Y, Zhang L, Hu P, Wang D, Lu H, et al. Boosting Continual Learning of Vision-language Models via Mixture-of-experts Adapters. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 23219−23230
    [179] Le M, Nguyen A, Nguyen H, Nguyen T, Pham T, Van Ngo L, et al. Mixture of Experts Meets Prompt-Based Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
    [180] Jung M J, Kim J H. PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning. arXiv preprint arXiv: 2407.21571, 2024.
    [181] Yang S, Ali M A, Wang C L, Hu L, Wang D. MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning. arXiv preprint arXiv: 2402.11260, 2024.
    [182] Marouf I E, Roy S, Tartaglione E, Lathuiliere S. Weighted Ensemble Models are Strong Continual Learners. European Conference on Computer Vision. Springer, Cham, 2024: 306-324.
    [183] Wang H, Lu H, Yao L, Gong D. Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning. NeurIPS 2024 Workshop on Scalable Continual Learning for Lifelong Foundation Models, 2024.
    [184] Li H, Lin S, Duan L, Liang Y, Shroff N B. Theory on Mixture-of-Experts in Continual Learning. The Thirteenth International Conference on Learning Representations, 2025.
    [185] Song G, Tan X. Real-world Cross-modal Retrieval via Sequential Learning. IEEE Transactions on Multimedia, 2020, 23: 1708−1721
    [186] Sun F, Liu H, Yang C, Fang B. Multimodal Continual Learning Using Online Dictionary Updating. IEEE Transactions on Cognitive and Developmental Systems, 2020, 13(1): 171−178
    [187] Peng Y, Qi J, Ye Z, Zhuo Y. Hierarchical Visual-Textual Knowledge Distillation for Life-long Correlation Learning. International Journal of Computer Vision, 2021, 129(4): 921−941 doi: 10.1007/s11263-020-01392-1
    [188] Yu J, Zhuge Y, Zhang L, Hu P, Wang D, Lu H, et al. Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024: 23219−23230
    [189] Jha S, Gong D, Yao L. CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models. Neural Information Processing Systems, 2024.
    [190] Gao Z, Zhang X, Xu K, Mao X, Wang H. Stabilizing Zero-Shot Prediction: A Novel Antidote to Forgetting in Continual Vision-Language Tasks. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
    [191] Zheng J, Cai X, Qiu S, Ma Q. Spurious Forgetting in Continual Learning of Language Models. The Thirteenth International Conference on Learning Representations, 2025.
    [192] Hinton G. Distilling the Knowledge in a Neural Network. arXiv preprint arXiv: 1503.02531, 2015.
    [193] Gou J, Yu B, Maybank S J, Tao D. Knowledge Distillation: A Survey. International Journal of Computer Vision, 2021, 129(6): 1789−1819 doi: 10.1007/s11263-021-01453-z
    [194] Li Z, Hoiem D. Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(12): 2935−2947
    [195] Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, et al. Overcoming Catastrophic Forgetting in Neural Networks. Proceedings of the National Academy of Sciences, 2017, 114(13): 3521−3526 doi: 10.1073/pnas.1611835114
    [196] Huszár F. On Quadratic Penalties in Elastic Weight Consolidation. arXiv preprint arXiv: 1712.03847, 2017.
    [197] Ritter H, Botev A, Barber D. Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting. Advances in Neural Information Processing Systems, 2018, 31.
    [198] Zenke F, Poole B, Ganguli S. Continual Learning Through Synaptic Intelligence. International conference on machine learning. PMLR, 2017: 3987-3995.
    [199] Wu Y, Huang L K, Wang R, Meng D, Wei Y. Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction. The Twelfth International Conference on Learning Representations. 2024.
    [200] Lopez-Paz D, Ranzato M A. Gradient Episodic Memory for Continual Learning. Advances in neural information processing systems, 2017, 30.
    [201] Chaudhry A, Ranzato M A, Rohrbach M, Elhoseiny M. Efficient Lifelong Learning with A-GEM. International Conference on Learning Representations, 2018.
    [202] Tang S, Chen D, Zhu J, Yu S, Ouyang W. Layerwise Optimization by Gradient Decomposition for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 9634−9643
    [203] Wang S, Li X, Sun J, Xu Z. Training Networks in Null Space of Feature Covariance for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 184−193
    [204] Kong Y, Liu L, Wang Z, Tao D. Balancing Stability and Plasticity Through Advanced Null Space in Continual Learning. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 219-236.
    [205] Dinh L, Pascanu R, Bengio S, Bengio Y. Sharp Minima Can Generalize for Deep Nets. International Conference on Machine Learning. PMLR, 2017: 1019-1028.
    [206] Foret P, Kleiner A, Mobahi H, Neyshabur B. Sharpness-aware Minimization for Efficiently Improving Generalization. arXiv preprint arXiv: 2010.01412, 2020.
    [207] Liu Y, Mai S, Chen X, et al. Towards Efficient and Scalable Sharpness-aware Minimization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 12360−12370
    [208] Yang E, Shen L, Wang Z, Liu S, Guo G, Wang X. Data Augmented Flatness-aware Gradient Projection for Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023: 5630−5639
    [209] Chen R, Jing X Y, Wu F, Chen H. Sharpness-aware Gradient Guidance for Few-shot Class-incremental Learning. Knowledge-Based Systems, 2024: 112030
    [210] Yang E, Shen L, Wang Z, Liu S, Guo G, Wang X, et al. Revisiting Flatness-aware Optimization in Continual Learning with Orthogonal Gradient Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
    [211] Bian A, Li W, Yuan H, Wang M, Zhao Z, Lu A, et al. Make Continual Learning Stronger via C-flat. Advances in Neural Information Processing Systems, 2024, 37: 7608−7630
    [212] Finn C, Abbeel P, Levine S. Model-agnostic Meta-learning for Fast Adaptation of Deep Networks. International Conference on Machine Learning. PMLR, 2017: 1126-1135.
    [213] Hospedales T, Antoniou A, Micaelli P, Storkey A. Meta-learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(9): 5149−5169
    [214] Riemer M, Cases I, Ajemian R, Liu M, Rish I, Tu Y, et al. Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference. International Conference on Learning Representations, 2019.
    [215] Gupta G, Yadav K, Paull L. Look-ahead Meta Learning for Continual Learning. Advances in Neural Information Processing Systems, 2020, 33: 11588−11598
    [216] Javed K, White M. Meta-learning Representations for Continual Learning. Advances in Neural Information Processing Systems, 2019, 32.
    [217] He X, Sygnowski J, Galashov A, Rusu A A, Teh Y W, Pascanu R. Task Agnostic Continual Learning via Meta Learning. 4th Lifelong Machine Learning Workshop at ICML, 2020.
    [218] Beaulieu S, Frati L, Miconi T, Lehman J, Stanley K O, Clune J, et al. Learning to Continually Learn. ECAI. IOS Press, 2020: 992-1001.
    [219] He J, Guo H, Tang M, Wang J. Continual Instruction Tuning for Large Multimodal Models. arXiv preprint arXiv: 2311.16206, 2023.
    [220] Qiao J, Tan X, Chen C, Qu Y, Peng Y, Xie Y. Prompt Gradient Projection for Continual Learning. The Twelfth International Conference on Learning Representations, 2023.
    [221] Lu Y, Zhang S, Cheng D, Xing Y, Wang N, Wang P, et al. Visual Prompt Tuning in Null Space for Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
    [222] Liu R, Zhang J, Song Y, Zhang Y, Yang B. Discarding the Crutches: Adaptive Parameter-Efficient Expert Meta-Learning for Continual Semantic Parsing. Proceedings of the 31st International Conference on Computational Linguistics, 2025: 3560−3578
    [223] Yeongbin S, Lee D, Yeo J. Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning. Advances in Neural Information Processing Systems, 2025, 37: 58284−58308
    [224] Pentina A, Lampert C. A PAC-Bayesian Bound for Lifelong Learning. International Conference on Machine Learning, PMLR, 2014: 991-999.
    [225] Pentina A, Lampert C H. Lifelong Learning with Non-iid Tasks. Advances in Neural Information Processing Systems, 2015, 28.
    [226] Ramesh R, Chaudhari P. Model Zoo: A Growing Brain That Learns Continually. International Conference on Learning Representations, 2022.
    [227] Nguyen C V, Li Y, Bui T D, Turner R E. Variational Continual Learning. International Conference on Learning Representations, 2018.
    [228] Andle J, Yasaei Sekeh S. Theoretical Understanding of the Information Flow on Continual Learning Performance. European Conference on Computer Vision, 2022: 86−101
    [229] Peng B, Risteski A. Continual Learning: A Feature Extraction Formalization, an Efficient Algorithm, and Barriers. Advances in Neural Information Processing Systems, 2022.
    [230] Lin S, Ju P, Liang Y, Shroff N. Theory on Forgetting and Generalization of Continual Learning. International Conference on Machine Learning, PMLR, 2023: 21078-21100.
    [231] Goldfarb D, Hand P. Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime. International Conference on Artificial Intelligence and Statistics, PMLR, 2023: 2975-2993.
    [232] Ding M, Ji K Y, Wang D, Xu J H. Understanding Forgetting in Continual Learning with Linear Regression. Forty-first International Conference on Machine Learning, 2024.
    [233] Li H, Lin S, Duan L, Liang Y, Shroff N B. Theory on Mixture-of-experts in Continual Learning. The Thirteenth International Conference on Learning Representations, 2025.
    [234] Alquier P, Pontil M. Regret Bounds for Lifelong Learning. International Conference on Artificial Intelligence and Statistics, PMLR, 2017: 261-269.
    [235] Wu Y S, Wang P A, Lu C J. Lifelong Optimization with Low Regret. International Conference on Artificial Intelligence and Statistics, PMLR, 2019: 448-456.
    [236] Jacot A, Gabriel F, Hongler C. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems, 2018, 31.
    [237] Doan T, Bennani M A, Mazoure B, Rabusseau G, Alquier P. A Theoretical Analysis of Catastrophic Forgetting Through the NTK Overlap Matrix. International Conference on Artificial Intelligence and Statistics, PMLR, 2021: 1072-1080.
    [238] Raghavan K, Balaprakash P. Formalizing the Generalization-forgetting Trade-off in Continual Learning. Advances in Neural Information Processing Systems, 2021, 34: 17284−17297
    [239] Kim G, Xiao C, Konishi T, et al. A Theoretical Study on Solving Continual Learning. Advances in Neural Information Processing Systems, 2022, 35: 5065−5079
    [240] Sun S, Calandriello D, Hu H, Li A, Titsias M. Information-theoretic Online Memory Selection for Continual Learning. International Conference on Learning Representations, 2022.
    [241] Peng L, Elenter J, Agterberg J, Ribeiro A, Vidal R. TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models. The Thirteenth International Conference on Learning Representations, 2025.
    [242] Wang D, Shelhamer E, Liu S, Olshausen B, Darrell T. Tent: Fully Test-Time Adaptation by Entropy Minimization. International Conference on Learning Representations, 2021.
    [243] Wang Z, Yang E, Shen L, Huang H. A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
    [244] Liang J, He R, Tan T. A Comprehensive Survey on Test-Time Adaptation Under Distribution Shifts. International Journal of Computer Vision, 2025, 133(1): 31−64 doi: 10.1007/s11263-024-02181-w
    [245] Gong T, Jeong J, Kim T, Shin J, Lee S J. Note: Robust Continual Test-Time Adaptation Against Temporal Correlation. Advances in Neural Information Processing Systems, 2022, 35: 27253−27266
    [246] Wang Q, Fink O, Van Gool L, Dai D. Continual Test-Time Domain Adaptation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 7201−7211
    [247] Chen H, Goldblum M, Wu Z, Jiang Y G. Adaptive Retention & Correction: Test-Time Training for Continual Learning. The Thirteenth International Conference on Learning Representations, 2025.
    [248] Niu S, Wu J, Zhang Y, Chen Y, Zheng S, Zhao P, et al. Efficient Test-Time Model Adaptation Without Forgetting. International conference on machine learning. PMLR, 2022: 16888-16905.
    [249] Dobler M, Marsden R A, Yang B. Robust Mean Teacher for Continual and Gradual Test-Time Adaptation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023: 7704−7714
    [250] Yang P, Liang J, Cao J, He R. Auto: Adaptive Outlier Optimization for Online Test-Time OOD Detection. arXiv preprint arXiv: 2303.12267, 2023.
    [251] Cao Y, Yang J. Towards Making Systems Forget with Machine Unlearning. IEEE symposium on security and privacy. IEEE, 2015: 463-480.
    [252] Bourtoule L, Chandrasekaran V, Choquette-Choo C A, Jia H, Travers A, Zhang B, et al. Machine Unlearning. IEEE Symposium on Security and Privacy. IEEE, 2021: 141-159.
    [253] Nguyen T T, Huynh T T, Ren Z, et al. A Survey of Machine Unlearning. arXiv preprint arXiv: 2209.02299, 2022.
    [254] Wang W, Tian Z, Zhang C, Yu S. Machine Unlearning: A Comprehensive Survey. arXiv preprint arXiv: 2405.07406, 2024.
    [255] Wu Y, Dobriban E, Davidson S. DeltaGrad: Rapid Retraining of Machine Learning Models. International Conference on Machine Learning. PMLR, 2020: 10355-10366.
    [256] Sekhari A, Acharya J, Kamath G, Suresh A T. Remember what you Want to Forget: Algorithms for Machine Unlearning. Advances in Neural Information Processing Systems, 2021, 34: 18075−18086
    [257] Guo C, Goldstein T, Hannun A, Van Der Maaten L. Certified Data Removal from Machine Learning Models. International Conference on Machine Learning. PMLR, 2020: 3832-3842.
    [258] Golatkar A, Achille A, Soatto S. Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020: 9304−9312
    [259] Nguyen Q P, Low B K H, Jaillet P. Variational Bayesian Unlearning. Advances in Neural Information Processing Systems, 2020, 33: 16025−16036
    [260] Du M, Chen Z, Liu C, Oak R, Song D. Lifelong Anomaly Detection Through Unlearning. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, 2019: 1283−1297
    [261] Ma Z, Liu Y, Liu X, Ma J, Ren K. Learn to Forget: Machine Unlearning via Neuron Masking. IEEE Transactions on Dependable and Secure Computing, 2022, 20(4): 3194−3207
    [262] Gao C, Wang L, Ding K, Weng C, Wang X, Zhu Q. On Large Language Model Continual Unlearning. The Thirteenth International Conference on Learning Representations, 2025.
    [263] Lin L J. Self-improving Reactive Agents based on Reinforcement Learning, Planning and Teaching. Machine Learning, 1992, 8: 293−321
    [264] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, et al. Human-level Control Through Deep Reinforcement Learning. Nature, 2015, 518(7540): 529−533 doi: 10.1038/nature14236
    [265] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized Experience Replay. arXiv preprint arXiv: 1511.05952, 2015.
    [266] Lyle C, Rowland M, Dabney W, Kwiatkowska M, Gal Y. Learning Dynamics and Generalization in Deep Reinforcement Learning. International Conference on Machine Learning. PMLR, 2022: 14560-14581.
    [267] Dohare S, Hernandez-Garcia J F, Lan Q, Rahman P, Mahmood A R, Sutton R S. Loss of Plasticity in Deep Continual Learning. Nature, 2024, 632(8026): 768−774 doi: 10.1038/s41586-024-07711-7
    [268] Kumar S, Marklund H, Rao A, Zhu Y, Jeon H J, Liu Y. Continual Learning as Computationally Constrained Reinforcement Learning. arXiv preprint arXiv: 2307.04345, 2023.
    [269] Abel D, Barreto A, Van Roy B, Precup D, van Hasselt H P, Singh S. A Definition of Continual Reinforcement Learning. Advances in Neural Information Processing Systems, 2023, 36: 50377−50407
    [270] Daniels Z A, Raghavan A, Hostetler J, Rahman A, Sur I, Piacentino M, et al. Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2. Conference on Lifelong Learning Agents. PMLR, 2022: 1120-1145.
    [271] Igl M, Farquhar G, Luketina J, Boehmer W, Whiteson S. Transient Non-stationarity and Generalisation in Deep Reinforcement Learning. International Conference on Learning Representations, 2021.
    [272] Gaya J B, Doan T, Caccia L, Soulier L, Denoyer L, Raileanu R. Building a Subspace of Policies for Scalable Continual Learning. International Conference of Learning Representations, 2023.
    [273] Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, et al. Gpt-4 Technical Report. arXiv preprint arXiv: 2303.08774, 2023.
    [274] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, et al. Llama: Open and Efficient Foundation Language Models. arXiv preprint arXiv: 2302.13971, 2023.
    [275] Bai J, Bai S, Chu Y, Cui Z, Dang K, Deng X, et al. Qwen Technical Report. arXiv preprint arXiv: 2309.16609, 2023.
    [276] Liu A, Feng B, Xue B, Wang B, Wu B, Lu C, et al. Deepseek-V3 Technical Report. arXiv preprint arXiv: 2412.19437, 2024.
    [277] Sun Y, Wang S, Li Y, Feng S, Tian H, Wu H, et al. Ernie 2.0: A Continual Pre-training Framework for Language Understanding. Proceedings of the AAAI conference on artificial intelligence, 2020, 34(05): 8968−8975 doi: 10.1609/aaai.v34i05.6428
    [278] Jang J, Ye S, Yang S, Shin J, Han J, Kim G, et al. Towards Continual Knowledge Learning of Language Models. International Conference on Learning Representations, 2022.
    [279] Ke Z, Shao Y, Lin H, Konishi T, Kim G, Liu B. Continual Pre-training of Language Models. International Conference on Learning Representations. 2023.
    [280] Yang X, Gao J, Xue W, Alexandersson E. Pllama: An Open-source Large Language Model for Plant Science. arXiv preprint arXiv: 2401.01600, 2024.
    [281] Gogoulou E, Lesort T, Boman M, Nivre J. A Study of Continual Learning Under Language Shift. CoRR, 2023.
    [282] Razdaibiedina A, Mao Y, Hou R, Khabsa M, Lewis M, Almahairi A. Progressive Prompts: Continual Learning for Language Models. arXiv preprint arXiv: 2301.12314, 2023.
    [283] Bohao P, Tian Z, Liu S, Yang M C, Jia J. Scalable Language Model with Generalized Continual Learning. The Twelfth International Conference on Learning Representations. 2024.
    [284] Wang X, Zhang Y, Chen T, Gao S, Jin S, Yang X, et al. TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models. arXiv preprint arXiv: 2310.06762, 2023.
    [285] Song C, Han X, Zeng Z, Li K, Chen C, Liu Z, et al. Conpet: Continual Parameter-Efficient Tuning for Large Language Models. arXiv preprint arXiv: 2309.14763, 2023.
    [286] Hao S, Liu T, Wang Z, Hu Z. ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings. Advances in neural information processing systems, 2024, 36.
    [287] Zhang H, Gui L, Zhai Y, Wang H, Lei Y, Xu R. Copf: Continual Learning Human Preference Through Optimal Policy Fitting. arXiv preprint arXiv: 2310.15694, 2023.
    [288] Zhang H, Lei Y, Gui L, Yang M, He Y, Wang H, et al. CPPO: Continual Learning for Reinforcement Learning with Human Feedback. The Twelfth International Conference on Learning Representations. 2024.
    [289] Suhr A, Artzi Y. Continual Learning for Instruction Following from Realtime Feedback. Advances in Neural Information Processing Systems, 2024, 36.
    [290] Wang X, Chen T, Ge Q, Xia H, Bao R, Zheng R, et al. Orthogonal Subspace Learning for Language Model Continual Learning. Conference on Empirical Methods in Natural Language Processing, 2023.
    [291] Jang J, Kim S, Ye S, Kim D, Logeswaran L, Lee M, et al. Exploring the Benefits of Training Expert Language Models over Instruction Tuning. International Conference on Machine Learning. PMLR, 2023: 14702-14729.
    [292] Qiao F, Mahdavi M. Learn More, but Bother Less: Parameter Efficient Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
Publication history
  • Received:  2024-12-24
  • Accepted:  2025-04-10
  • Published online:  2025-05-20
