摘要: 以深度学习为代表的机器学习方法已经在多个领域取得显著进展, 然而大多数方法局限于静态场景, 难以像人类一样在开放世界的动态场景中不断学习新知识, 同时保持已经学过的旧知识. 为解决该挑战, 持续学习 (Continual learning, CL) 受到越来越多的关注. 现有的持续学习方法大致可以分为两类, 即相对传统的非预训练模型持续学习方法以及大模型时代下逐步演进的预训练模型持续学习方法. 本文旨在对这两类方法的研究进展进行详细的综述, 主要从四个层面对比非预训练模型和预训练模型方法的异同点, 即数据层面、模型层面、损失/优化层面以及理论层面. 着重分析从应用非预训练模型的方法发展到应用预训练模型的方法的技术变化, 并剖析出现此类差异的内在本质. 最后, 总结并展望未来持续学习发展的趋势.

Abstract: Machine learning methods, especially deep learning, have achieved remarkable progress across various fields. However, most approaches are limited to static scenarios and, unlike humans, struggle to continually learn new knowledge in dynamic, open-world environments while retaining previously acquired knowledge. To address this challenge, continual learning (CL) has attracted increasing attention. Existing CL methods can be broadly categorized into two types: traditional CL methods based on non-pretrained models, and CL methods based on pretrained models that have emerged with the advent of large models. This paper aims to provide a detailed review of these two categories of methods, mainly comparing the similarities and differences between non-pretrained and pretrained model approaches from four perspectives: the data level, the model level, the loss/optimization level, and the theoretical level. We focus on analyzing the technical evolution from methods employing non-pretrained models to those employing pretrained models, and examine the underlying reasons for these differences. Finally, we summarize and envision future trends in continual learning.
Key words: Continual learning / catastrophic forgetting / pretrained model / machine learning / deep learning
表 1 持续学习方法总结
Table 1 Summary of continual learning methods
| 方法分类 | | 非预训练持续学习方法 | 预训练持续学习方法 |
|---|---|---|---|
| 数据层面 | 基于重放 | 数据增广: [47, 52−53]; 数据表征: [47−48, 54]; 数据选择: [37−38, 46, 56−59, 61−64] | [96−98] |
| | 基于伪重放 | 生成模型: [66−73]; 合成数据集: [76−80]; 特征重放: [82−83] | 生成预训练模型: [85−90, 93, 99−101]; 合成数据集: [95]; 特征重放: [94] |
| 模型层面 | 模型表征 | [105−108, 110−114, 116−118] | [146−150] |
| | 模型偏差 | [83, 119−131] | [148, 151−153] |
| | 模型结构 | 扩展模型: [132−139]; 路径模型: [140−145] | 提示微调: [154−168]; 适配器及专家模型: [169−190] |
| 损失/优化层面 | 正则化 | [194−199] | [170, 219] |
| | 梯度对齐 | [200−204] | [220−221] |
| | 损失平滑 | [205−211] | |
| | 元持续学习 | [121, 131, 199, 214−218] | [222−223] |
| 理论层面 | PAC-Bayesian 理论 | [138, 224−226] | |
| | 概率模型 | [195, 197−198, 227−228] | |
| | 线性模型 | [229−233] | |
| | 其他 | [234−235, 237−240] | |
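To make the two columns of Table 1 concrete, the sketch below illustrates the data-level and loss/optimization-level ingredients that dominate the non-pretrained column: an experience-replay buffer maintained by reservoir sampling (in the spirit of [37, 46−47]) combined with an EWC-style quadratic regularizer (in the spirit of [195]). It is a minimal, PyTorch-style illustration rather than the implementation of any cited method; names such as `ReservoirBuffer` and `train_step` are illustrative only. The pretrained-model column typically replaces this recipe with a frozen backbone plus lightweight prompt or adapter tuning (e.g., [154−168, 169−190]).

```python
import random

import torch
import torch.nn.functional as F


class ReservoirBuffer:
    """Fixed-size rehearsal memory updated by reservoir sampling over the data stream."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []       # stored (x, y) pairs
        self.num_seen = 0    # total examples observed so far

    def add(self, x, y):
        self.num_seen += 1
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            j = random.randrange(self.num_seen)
            if j < self.capacity:        # replace a random slot with prob. capacity / num_seen
                self.data[j] = (x, y)

    def sample(self, batch_size):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)


def ewc_penalty(model, fisher, old_params):
    """EWC-style quadratic penalty sum_i F_i * (theta_i - theta_i^old)^2 (cf. [195])."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return loss


def train_step(model, optimizer, x, y, buffer, fisher=None, old_params=None, lam=1.0):
    """One continual-learning update: current batch + replayed batch + optional EWC term."""
    loss = F.cross_entropy(model(x), y)
    if buffer.data:                           # data level: rehearse stored old-task examples
        x_mem, y_mem = buffer.sample(x.size(0))
        x_mem, y_mem = x_mem.to(x.device), y_mem.to(x.device)
        loss = loss + F.cross_entropy(model(x_mem), y_mem)
    if fisher is not None:                    # loss level: stay close to the old-task solution
        loss = loss + lam * ewc_penalty(model, fisher, old_params)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    for xi, yi in zip(x, y):                  # keep the memory up to date with the stream
        buffer.add(xi.detach().cpu(), yi.detach().cpu())
    return loss.item()
```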
[1] Simonyan K. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv: 1409.1556, 2014. [2] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016770−778 [3] Alexey D, Lucas B, Alexander K, Dirk W, Zhai X, Thomas U, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv: 2010.11929, 2020. [4] Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202110012−10022 [5] Zhao H, Shi J, Qi X, Wang X Jia J. Pyramid Scene Parsing Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 20172881−2890 [6] Zheng S, Lu J, Zhao H, Zhu X, Luo Z, Wang Y, et al. Rethinking Semantic Segmentation from a Sequence-to-sequence Perspective with Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20216881−6890 [7] He K, Chen X, Xie S, Li Y, Dollár, P, Girshick R. Masked Autoencoders are Scalable Vision Learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202216000−16009 [8] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Networks. Communications of the ACM, 2020, 63(11): 139−144 doi: 10.1145/3422622 [9] Kingma D P, Welling M. Auto-encoding Variational Bayes. arXiv preprint arXiv: 1312.6114, 2013. [10] Ho J, Jain A, Abbeel P. Denoising Diffusion Probabilistic Models. Advances in neural information processing systems, 2020, 33: 6840−6851 [11] De Lange M, Aljundi R, Masana M, Parisot S, Jia X, Leonardis G, et al. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(7): 3366−3385 [12] Wang L, Zhang X, Su H, Zhu J. A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. [13] Zhou D W, Wang Q W, Qi Z H, Ye H J, Zhan D C. Class-incremental Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. [14] M. McCloskey and N. J. Cohen. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. Psychology of Learning and Motivation. Elsevier, 1989, 24: 109−165 [15] Zhang Y, Yang Q. A Survey on Multi-task Learning. IEEE Transactions on Knowledge and Data Engineering, 2021, 34(12): 5586−5609 [16] Sener O, Koltun V. Multi-task Learning as Multi-objective Optimization. Advances in Neural Information Processing Systems, 2018, 31. [17] Hoi S C H, Sahoo D, Lu J, Zhao P. Online Learning: A Comprehensive Survey. Neurocomputing, 2021, 459: 249−289 doi: 10.1016/j.neucom.2021.04.112 [18] Glorot X, Bengio Y. Understanding the Difficulty of Training Deep Feedforward Neural Networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2010: 249-256. [19] He K, Zhang X, Ren S, Sun J. Delving Deep into Rectifiers: Surpassing Human-level Performance on Imagenet Classification. Proceedings of the IEEE International Conference on Computer Vision, 20151026−1034 [20] Bommasani R, Hudson D A, Adeli E, Altman R, Arora S, von Arx S, et al. On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv: 2108.07258, 2021. 
[21] Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, et al. Learning Transferable Visual Models from Natural Language Supervision. International Conference on Machine Learning. PMLR, 2021: 8748-8763. [22] Zhao W X, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A Survey of Large Language Models. arXiv preprint arXiv: 2303.18223, 2023, 1(2). [23] Han Z, Gao C, Liu J, Zhang J, Zhang S Q. Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey. arXiv preprint arXiv: 2403.14608, 2024. [24] Xin Y, Luo S, Zhou H, Du J, Liu X, Fan Y, et al. Parameter-Efficient Fine-Tuning for Pre-trained Vision Models: A Survey. arXiv preprint arXiv: 2402.02242, 2024. [25] Lester B, Al-Rfou R, Constant N. The Power of Scale for Parameter-Efficient Prompt Tuning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 20213045−3059 [26] Jia M, Tang L, Chen B C, Cardie C, Belongie S, Hariharan B, et al. Visual Prompt Tuning. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 709-727. [27] Hu E J, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations, 2022. [28] Zhou D W, Sun H L, Ning J, Ye H J, Zhan D C. Continual Learning with Pre-trained Models: A Survey. arXiv preprint arXiv: 2401.16386, 2024. [29] Wu T, Luo L, Li Y F, Pan S, Vu TT, Haffari G. Continual Learning for Large Language Models: A Survey. arXiv preprint arXiv: 2402.01364, 2024. [30] Shi H, Xu Z, Wang H, Qin W, Wang W, Wang Y, et al. Continual Learning of Large Language Models: A Comprehensive Survey. arXiv preprint arXiv: 2404.16789, 2024. [31] Zhang J, Liu L, Silven O, Pietikainen M, Hu D. Few-shot Class-incremental Learning: A Survey. arXiv preprint arXiv: 2308.06764, 2023. [32] Tian S, Li L, Li W, Ran H, Ning X, Tiwari P. A Survey on Few-shot Class-incremental Learning. Neural Networks, 2024, 169: 307−324 doi: 10.1016/j.neunet.2023.10.039 [33] Yu D, Zhang X, Chen Y, Liu A, Zhang Y, Yu PS, et al. Recent Advances of Multimodal Continual Learning: A Comprehensive Survey. arXiv preprint arXiv: 2410.0535, 2024. [34] Van de Ven G M, Tolias A S. Three Scenarios for Continual Learning. arXiv preprint arXiv: 1904.07734, 2019. [35] Aljundi R, Kelchtermans K, Tuytelaars T. Task-free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 201911254−11263 [36] Lee S, Ha J, Zhang D, Kim G. A Neural Dirichlet Process Mixture Model for Task-free Continual Learning. arXiv preprint arXiv: 2001.00689, 2020. [37] Aljundi R, Lin M, Goujaud B, Bengio Y. Gradient-based Sample Selection for Online Continual Learning. Advances in Neural Information Processing Systems, 2019, 32. [38] Bang J, Kim H, Yoo Y J, Ha J W, Choi J. Rainbow Memory: Continual Learning with a Memory of Diverse Samples. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20218218−8227 [39] Kim C D, Jeong J, Moon S, Kim G. Continual Learning on Noisy Data Streams via Self-purified Replay. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021537−547 [40] Karim N, Khalid U, Esmaeili A, Rahnavard N. Cnll: A Semi-supervised Approach for Continual Noisy Label Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20223878−3888 [41] Chrysakis A, Moens M F. Online Continual Learning from Imbalanced Data. International Conference on Machine Learning. PMLR, 2020: 1952-1961. 
[42] Kim C D, Jeong J, Kim G. Imbalanced Continual Learning with Partitioning Reservoir Sampling. European Conference on Computer Vision, 2020411−428 [43] Koh H, Kim D, Ha J W, Choi J. Online Continual Learning on Class Incremental Blurry Task Configuration with Anytime Inference. International Conference on Learning Representations, 2022. [44] Ratcliff R. Connectionist Models of Recognition Memory: Constraints Imposed by Learning and Forgetting Functions. Psychological Review, 1990, 97(2): 285−308 doi: 10.1037/0033-295X.97.2.285 [45] Robins A. Catastrophic Forgetting, Rehearsal and Pseudo Rehearsal. Connection Science, 1995, 7(2): 123−146 doi: 10.1080/09540099550039318 [46] Rebuffi S A, Kolesnikov A, Sperl G, Lampert C H. iCaRL: Incremental Classifier and Representation Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 20172001−2010 [47] Buzzega P, Boschini M, Porrello A, Abati D, Calderara S. Dark Experience for General Continual Learning: a Strong, Simple Baseline. Advances in Neural Information Processing Systems, 2020, 33: 15920−15930 [48] Bellitto G, Salanitri F P, Pennisi M, Bonicelli L, Porrello A, Calderara S, et al. Saliency-driven Experience Replay for Continual Learning. Advances in Neural Information Processing Systems, 2024. [49] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998, 86(11): 2278−2324 doi: 10.1109/5.726791 [50] Krizhevsky A, Sutskever I, Hinton G E. Imagenet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012. [51] Zhang H, Cisse M, Dauphin Y N, Paz D L. Mixup: Beyond Empirical Risk Minimization. arXiv preprint arXiv: 1710.09412, 2017. [52] Buzzega P, Boschini M, Porrello A, Calderara S. Rethinking Experience Replay: a Bag of Tricks for Continual Learning. 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021: 2180-2187. [53] Zhang Y, Pfahringer B, Frank E, Bifet A, Lim N J S, Jia Y. A Simple but Strong Baseline for Online Continual Learning: Repeated Augmented Rehearsal. Advances in Neural Information Processing Systems, 2022, 35: 14771−14783 [54] Wang L, Zhang X, Yang K, Yu, L, Li, C, Hong, L, et al. Memory Replay with Data Compression for Continual Learning. International Conference on Learning Representations, 2022. [55] Wallace G K. The JPEG Still Picture Compression Standard. Communications of the ACM, 1991, 34(4): 30−44 doi: 10.1145/103085.103089 [56] Isele D, Cosgun A. Selective Experience Replay for Lifelong Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1). [57] Killamsetty K, Sivasubramanian D, Ramakrishnan G, et al. Glister: Generalization-based Data Subset Selection for Efficient and Robust Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(9): 8110−8118 doi: 10.1609/aaai.v35i9.16988 [58] Yoon J, Madaan D, Yang E, Hwang S J. Online Coreset Selection for Rehearsal-based Continual Learning. International Conference on Learning Representations, ICLR, 2022. [59] Sun S, Calandriello D, Hu H, Li A, Titsias M. Information-theoretic Online Memory Selection for Continual Learning. International Conference on Learning Representations, 2022. [60] Welling M. Herding Dynamical Weights to Learn. Proceedings of the 26th Annual International Conference on Machine Learning, 20091121−1128 [61] Borsos Z, Mutny M, Krause A. Coresets via Bilevel Optimization for Continual Learning and Streaming. 
Advances in Neural Information Processing Systems, 2020, 33: 14879−14890 [62] Zhou X, Pi R, Zhang W, Lin Y, Chen Z, Zhang T. Probabilistic Bilevel Coreset Selection. International Conference on Machine Learning. PMLR, 2022: 27287-27302. [63] Hao J, Ji K, Liu M. Bilevel Coreset Selection in Continual Learning: A New Formulation and Algorithm. Advances in Neural Information Processing Systems, 2024, 36. [64] Tong R, Liu Y, Shi J Q, Gong D. Coreset Selection via Reducible Loss in Continual Learning. The Thirteenth International Conference on Learning Representations, 2025. [65] Verma T, Jin L, Zhou J, Huang J, Tan M, Choong B C M, et al. Privacy-Preserving Continual Learning Methods for Medical Image Classification: A Comparative Analysis. Frontiers in Medicine, 2023, 10: 1227515 doi: 10.3389/fmed.2023.1227515 [66] Robins A. Catastrophic Forgetting, Rehearsal and Pseudorehearsal. Connection Science, 1995, 7(2): 123−146 doi: 10.1080/09540099550039318 [67] Shin H, Lee J K, Kim J, Kim J. Continual Learning with Deep Generative Replay. Advances in Neural Information Processing Systems, 2017, 30. [68] Wu C, Herranz L, Liu X, Van De Weijer J, Raducanu B. Memory Replay GANs: Learning to Generate New Categories without Forgetting. Advances in Neural Information Processing Systems, 2018, 31. [69] Rios A, Itti L. Closed-loop Memory GAN for Continual Learning. Proceedings of the 28th International Joint Conference on Artificial Intelligence, 20193332−3338 [70] Xiang Y, Fu Y, Ji P, Huang H. Incremental Learning Using Conditional Adversarial Networks. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20196619−6628 [71] Wang Z, Liu L, Duan Y, Tao D. Continual Learning through Retrieval and Imagination. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(8): 8594−8602 doi: 10.1609/aaai.v36i8.20837 [72] Ayub A, Wagner A. EEC: Learning to Encode and Regenerate Images for Continual Learning. International Conference on Learning Representations, 2021. [73] Chen P H, Wei W, Hsieh C J, Dai B. Overcoming Catastrophic Forgetting by Bayesian Generative Regularization. International Conference on Machine Learning. PMLR, 2021: 1760-1770. [74] Wang T, Zhu J Y, Torralba A, Efros A A. Dataset Distillation. arXiv preprint arXiv: 1811.10959, 2018. [75] Lei S, Tao D. A Comprehensive Survey of Dataset Distillation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. [76] Wiewel F, Yang B. Condensed Composite Memory Continual Learning. International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-8. [77] Sangermano M, Carta A, Cossu A, Bacciu D. Sample Condensation in Online Continual Learning. International Joint Conference on Neural Networks (IJCNN). IEEE, 2022: 01-08. [78] Gu J, Wang K, Jiang W, You Y. Summarizing Stream Data for Memory-Constrained Online Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(11): 12217−12225 doi: 10.1609/aaai.v38i11.29111 [79] Yin H, Molchanov P, Alvarez J M, Li Z, Mallya A, Hoiem D, et al. Dreaming to Distill: Data-free Knowledge Transfer via Deepinversion. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 20208715−8724 [80] Yin H, Mallya A, Vahdat A, Alvarez J M, Kautz J, Molchanov P. See Through Gradients: Image Batch Recovery via Gradinversion. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202116337−16346 [81] Smith J, Hsu Y C, Balloch J, Shen Y, Jin H, Kira Z. 
Always be Dreaming: A New Approach for Data-free Class-incremental Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20219374−9384 [82] Liu X, Wu C, Menta M, Herranz L, Raducanu B, Bagdanov A D, et al. Generative Feature Replay for Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020226−227 [83] Iscen A, Zhang J, Lazebnik S, Schmid C. Memory-Efficient Incremental Learning Through Feature Adaptation. European Conference on Computer Vision, Springer, 2020: 699-715. [84] Smith J S, Tian J, Halbe S, Hsu Y C, Kira Z. A Closer Look at Rehearsal-free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20232410−2420 [85] Gao R, Liu W. DDGR: Continual Learning with Deep Diffusion-based Generative Replay. International Conference on Machine Learning. PMLR, 2023: 10744-10763. [86] Smith J S, Hsu Y C, Zhang L, Hua T, Kira Z, Shen Y, et al. Continual Diffusion: Continual Customization of Text-to-image Diffusion with C-LoRA. arXiv preprint arXiv: 2304.06027, 2023. [87] Jodelet Q, Liu X, Phua Y J, Murata T. Class-incremental Learning Using Diffusion Model for Distillation and Replay. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20233425−3433 [88] Zajac M, Deja K, Kuzina A, Tomczak J M, Trzciński T, Shkurti F, et al. Exploring Continual Learning of Diffusion Models. arXiv preprint arXiv: 2303.15342, 2023. [89] Masip S, Rodriguez P, Tuytelaars T, van de Ven G M. Continual Learning of Diffusion Models with Generative Distillation. arXiv preprint arXiv: 2311.14028, 2023. [90] Cywiński B, Deja K, Trzciński T, Twardowski B, Kucinski L. GUIDE: Guidance-based Incremental Learning with Diffusion Models. arXiv preprint arXiv: 2403.03938, 2024. [91] Hataya R, Bao H, Arai H. Will Large-Scale Generative Models Corrupt Future Datasets?. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202320555−20565 [92] Martínez G, Watson L, Reviriego P, Hernandez J A, Juarez M, Sarkar R. Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet. International Workshop on Epistemic Uncertainty in Artificial Intelligence. Cham: Springer Nature Switzerland, 2023: 59-73. [93] Wang M, Michel N, Mao J, Yamasaki T. Dealing with Synthetic Data Contamination in Online Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [94] Zuo Y, Yao H, Yu L, Zhuang L, Xu C. Hierarchical Prompts for Rehearsal-free Continual Learning. arXiv preprint arXiv: 2401.11544, 2024. [95] Hatamizadeh A, Yin H, Roth H R, Li W, Kautz J, Xu D, et al. Gradvit: Gradient Inversion of Vision Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202210021−10030 [96] Cai Y, Thomason J, Rostami M. Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation. Conference on Empirical Methods in Natural Language Processing, 2023. [97] Zhang X, Zhang F, Xu C. VQACL: A Novel Visual Question Answering Continual Learning Setting. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202319102−19112 [98] Yang R, Wang S, Zhang H, Xu S, Guo Y, Ye X, et al. Knowledge Decomposition and Replay: A Novel Cross-modal Image-Text Retrieval Continual Learning Method. 
Proceedings of the 31st ACM International Conference on Multimedia, 20236510−6519 [99] Yan S, Hong L, Xu H, Han J, Tuytelaars T, Li Z, et al. Generative Negative Text Replay for Continual Vision-Language Pretraining. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 22-38. [100] Lei S W, Gao D, Wu J Z, Wang Y, Liu W, Zhang M, et al. Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(1): 1250−1259 doi: 10.1609/aaai.v37i1.25208 [101] Cheng S, He C, Chen K, Xu L, Li H, Meng F, et al. Vision-Sensor Attention Based Continual Multimodal Egocentric Activity Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024: 6300-6304. [102] Geirhos R, Jacobsen J H, Michaelis C, Zemel R, Brendel W, Bethge M, et al. Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence, 2020, 2(11): 665−673 doi: 10.1038/s42256-020-00257-z [103] Wei Y, Ye J, Huang Z, Zhang J, Shan H. Online Prototype Learning for Online Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202318764−18774 [104] Kim D, Park D, Shin Y, Bang J, Song H, Lee J G. Adaptive Shortcut Debiasing for Online Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(12): 13122−13131 doi: 10.1609/aaai.v38i12.29211 [105] Jing L, Tian Y. Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 43(11): 4037−4058 [106] Cha H, Lee J, Shin J. Co2L: Contrastive Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20219516−9525 [107] Gomez-Villa A, Twardowski B, Yu L, Bagdanov A D, Van de Weijer J. Continually Learning Self-supervised Representations with Projected Functional Regularization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20223867−3877 [108] Purushwalkam S, Morgado P, Gupta A. The Challenges of Continuous Self-supervised Learning. European Conference on Computer Vision, 2022702−721 [109] Yao L, Chu Z, Li S, Li Y, Gao J, Zhang A. A Survey on Causal Inference. ACM Transactions on Knowledge Discovery from Data (TKDD), 2021, 15(5): 1−46 [110] Hu X, Tang K, Miao C, Hua X S, Zhang H. Distilling Causal Effect of Data in Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20213957−3966 [111] Chu Z, Li R, Rathbun S, Li S. Continual Causal Inference with Incremental Observational Data. 2023 IEEE 39th International Conference on Data Engineering (ICDE). IEEE, 2023: 3430-3439. [112] Wang L, Yang K, Li C, Hong L, Li Z, Zhu J. Ordisco: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20215383−5392 [113] Smith J, Balloch J, Hsu Y C, Kira Z. Memory-efficient Semi-supervised Continual Learning: The World is Its Own Replay Buffer. 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021: 1-8. [114] Luo Y, Wong Y, Kankanhalli M, Zhao Q. Learning to Predict Gradients for Semi-Supervised Continual Learning. IEEE Transactions on Neural Networks and Learning Systems, 2024. [115] O'Reilly R C, Bhattacharyya R, Howard M D, Ketz N. Complementary Learning Systems. 
Cognitive science, 2014, 38(6): 1229−1248 doi: 10.1111/j.1551-6709.2011.01214.x [116] Pham Q, Liu C, Hoi S. DualNet: Continual Learning, Fast and Slow. Advances in Neural Information Processing Systems, 2021, 34: 16131−16144 [117] Arani E, Sarfraz F, Zonooz B. Learning Fast, Learning Slow: A General Continual Learning Method based on Complementary Learning System. International Conference on Learning Representations, 2022. [118] Ren X, Qin Y, Wang B, Cheng X, Jia L. A Complementary Continual Learning Framework Using Incremental Samples for Remaining Useful Life Prediction of Machinery. IEEE Transactions on Industrial Informatics, 2024. [119] Mai Z, Li R, Kim H, Sanner S. Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-incremental Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20213589−3599 [120] Rypesc G, Cygert S, Trzcinski T, Twardowski B. Task-recency bias strikes back: Adapting covariances in Exemplar-Free Class Incremental Learning. Advances in Neural Information Processing Systems, 2024, 37: 63268−63289 [121] Wang Q, Wang R, Wu Y, Jia X, Meng D. CBA: Improving Online Continual Learning via Continual Bias Adaptor. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202319082−19092 [122] Hou S, Pan X, Loy C C, Wang Z, Lin D. Learning a Unified Classifier Incrementally via Rebalancing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019831−839 [123] Ahn H, Kwak J, Lim S, Bang H, Kim H, Moon T. SS-IL: Separated Softmax for Incremental Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021844−853 [124] Wu Y, Chen Y, Wang L, Ye Y, Liu Z, Guo Y, et al. Large Scale Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019374−382 [125] Caccia L, Aljundi R, Asadi N, Tuytelaars T, Pineau J, Belilovsky E. New Insights on Reducing Abrupt Representation Change in Online Continual Learning. arXiv preprint arXiv: 2104.05025, 2021. [126] Yu L, Twardowski B, Liu X, Herranz L, Wang K, Cheng Y, et al. Semantic Drift Compensation for Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20206982−6991 [127] Zhu K, Zhai W, Cao Y, Luo J, Zha Z J. Self-sustaining Representation Expansion for Non-exemplar Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20229296−9305 [128] Pham Q, Liu C, Hoi S. Continual Normalization: Rethinking Batch Normalization for Online Continual Learning. International Conference on Learning Representations, 2022. [129] Cha S, Cho S, Hwang D, Hong S, Lee M, Moon T. Rebalancing Batch Normalization for Exemplar-based Class-incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202320127−20136 [130] Lyu Y, Wang L, Zhang X, Sun Z, Su H, Zhu J. Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and adaptation. Advances in Neural Information Processing Systems, 2024, 36. [131] Wang Q, Wang R, Wu Y, Jia X, Meng D. Dual-CBA: Improving Online Continual Learning via Dual Continual Bias Adaptors from a Bi-level Optimization Perspective. arXiv preprint arXiv: 2408.13991, 2024. [132] Rusu A A, Rabinowitz N C, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, et al. Progressive Neural Networks. arXiv preprint arXiv: 1606.04671, 2016. [133] Yan S, Xie J, He X. 
DER: Dynamically Expandable Representation for Class Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20213014−3023 [134] Mallya A, Lazebnik S. Packnet: Adding Multiple Tasks to a Single Network by Iterative Pruning. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 20187765−7773 [135] Golkar S, Kagan M, Cho K. Continual Learning via Neural Pruning. arXiv preprint arXiv: 1903.04476, 2019. [136] Yoon J, Kim S, Yang E, Hwang, S. J. Scalable and Order-robust Continual Learning with Additive Parameter Decomposition. International Conference on Learning Representations, 2020. [137] Hihn H, Braun D A. Mixture-of-Variational-Experts for Continual Learning. ICLR Workshop on Agent Learning in Open-Endedness, 2021. [138] Wang L, Zhang X, Li Q, Zhu J. CoSCL: Cooperation of Small Continual Learners is Stronger Than a Big One. European Conference on Computer Vision, 2022254−271 [139] Zhou Y, Lei T, Liu H, Du N, Huang Y, Zhao V, et al. Mixture-of-experts with Expert Choice Routing. Advances in Neural Information Processing Systems, 2022, 35: 7103−7114 [140] Abati D, Tomczak J, Blankevoort T, Calderara S, Cucchiara R, Bejnordi B E. Conditional Channel Gated Networks for Task-aware Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20203931−3940 [141] Mallya A, Davis D, Lazebnik S. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights. European Conference on Computer Vision, 201867−82 [142] Wortsman M, Ramanujan V, Liu R, Kembhavi, A., Rastegari, M., Yosinski, J., et al. Supermasks in Superposition. Advances in Neural Information Processing Systems, 2020, 33: 15173−15184 [143] Kang H, Mina R J L, Madjid S R H, Yoon J, Hasegawa-Johnson M, Hwang S J, et al. Forget-free Continual Learning with Winning Subnetworks. International Conference on Machine Learning. PMLR, 2022: 10734-10750. [144] Yoon J, Madjid S, Hwang S J, Yoo C D. On the Soft-Subnetwork for Few-Shot Class Incremental Learning. International Conference on Learning Representations, 2023. [145] Gao Q, Shan X, Zhang Y, Zhou F. Enhancing Knowledge Transfer for Task Incremental Learning with Data-free Subnetwork. Advances in Neural Information Processing Systems, 2023, 36: 68471−68484 [146] Fini E, Da Costa V G T, Alameda-Pineda X, Ricci E, Alahari K, Mairal J. Self-supervised Models are Continual Learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20229621−9630 [147] Ye Y, Xie Y, Zhang J, Chen Z, Wu Q, Xia Y. Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202411114−11124 [148] McDonnell M D, Gong D, Parvaneh A, Abbasnejad E, van den Hengel A. RanPAC: Random Projections and Pre-trained Models for Continual Learning. Advances in Neural Information Processing Systems, 2024, 36. [149] Zhang G, Wang L, Kang G, Cheng L, Wei Y. SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202319148−19158 [150] Zhang G, Wang L, Kang G, Cheng L, Wei Y. SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training. arXiv preprint arXiv: 2408.08295, 2024. [151] He J, Zhu F. Exemplar-free Online Continual Learning. 2022 IEEE International Conference on Image Processing (ICIP). 
IEEE, 2022: 541-545. [152] Zhuang H, Weng Z, Wei H, Xie R, Toh K A, Lin Z. ACIL: Analytic Class-incremental Learning with Absolute Memorization and Privacy Protection. Advances in Neural Information Processing Systems, 2022, 35: 11602−11614 [153] Zhuang H, He R, Tong K, et al. DS-AL: A Dual-stream Analytic Learning for Exemplar-free Class-incremental Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(15): 17237−17244 doi: 10.1609/aaai.v38i15.29670 [154] Wang Z, Zhang Z, Lee C Y, Zhang H, Sun R, Ren X, et al. Learning to Prompt for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022139−149 [155] Wang Z, Zhang Z, Ebrahimi S, Sun R, Zhang H, Lee C Y, et al. DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. European Conference on Computer Vision, 2022631−648 [156] Smith J S, Karlinsky L, Gutta V, Cascante-Bonilla P, Kim D, Arbelle A, et al. Coda-Prompt: Continual Decomposed Attention-based Prompting for Rehearsal-free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202311909−11919 [157] Wang L, Xie J, Zhang X, Huang M, Zhu J. Hierarchical Decomposition of Prompt-based Continual Learning: Rethinking Obscured Sub-optimality. Advances in Neural Information Processing Systems, 2024, 36. [158] Chen H, Wu Z, Han X, Jia M, Jiang Y G. PromptFusion: Decoupling Stability and Plasticity for Continual Learning. arXiv preprint arXiv: 2303.07223, 2023. [159] Wang Y, Huang Z, Hong X. S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning. Advances in Neural Information Processing Systems, 2022, 35: 5682−5695 [160] Kang Z Q, Wang L, Zhang X, Alahari K. Advancing Prompt-Based Methods for Replay-Independent General Continual Learning. The Thirteenth International Conference on Learning Representations, 2025. [161] Liu Y, Yang M. SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. [162] Huang W C, Chen C F, Hsu H. OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning. The Twelfth International Conference on Learning Representations, 2024. [163] Jung D, Han D, Bang J, Song H. Generating Instance-level Prompts for Rehearsal-free Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 202311847−11857 [164] Tang Y M, Peng Y X, Zheng W S. When Prompt-based Incremental Learning Does Not Meet Strong Pretraining. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20231706−1716 [165] Yang C, Liu W, Chen S, Qi J, Zhou A. Generating Prompts in Latent Space for Rehearsal-free Continual Learning. Proceedings of the 32nd ACM International Conference on Multimedia, 20248913−8922 [166] Zheng J, Ma Q, Liu Z, Wu B, Feng H. Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer. arXiv preprint arXiv: 2401.09181, 2024. [167] D'Alessandro M, Alonso A, Calabres E, Galar M. Multimodal Parameter-Efficient Few-shot Class Incremental Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20233393−3403 [168] Qian Z, Wang X, Duan X, Qin P, Li Y, Zhu W. Decouple before Interact: Multi-modal Prompt Learning for Continual Visual Question Answering. 
Proceedings of the IEEE/CVF International Conference on Computer Vision, 20232953−2962 [169] Li J, Wang S, Qian B, He Y, Wei X, Gong Y. Dynamic Integration of Task-Specific Adapters for Class Incremental Learning. arXiv preprint arXiv: 2409.14983, 2024. [170] Liang Y S, Li W J. InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202423638−23647 [171] Zhao L, Zhang X, Yan K, Ding S, Huang W. SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [172] Zhang X, Bai L, Yang X, Liang J. C-LoRA: Continual Low-Rank Adaptation for Pre-trained Models. arXiv preprint arXiv: 2502.17920, 2025. [173] Wu Y, Piao H, Huang L K, Wang R, Li W, Pfister H, et al. SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning. The Thirteenth International Conference on Learning Representations, 2025. [174] Wei X, Li G, Marculescu R. Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 20256634−6645 [175] He J P, Duan Z H, Zhu F Q. CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. [176] Zhu H, Zhang Y F, Dong J H, Koniusz P. BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. [177] Liu X, Chang X B. LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025. [178] Yu J, Zhuge Y, Zhang L, Hu P, Wang D, Lu H, et al. Boosting Continual Learning of Vision-language Models via Mixture-of-experts Adapters. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202423219−23230 [179] Le M, Nguyen A, Nguyen H, Nguyen T, Pham T, Van Ngo L, et al. Mixture of Experts Meets Prompt-Based Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [180] Jung M J, Kim J H. PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning. arXiv preprint arXiv: 2407.21571, 2024. [181] Yang S, Ali M A, Wang C L, Hu L, Wang D. MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning. arXiv preprint arXiv: 2402.11260, 2024. [182] Marouf I E, Roy S, Tartaglione E, Lathuiliere S. Weighted Ensemble Models are Strong Continual Learners. European Conference on Computer Vision. Springer, Cham, 2024: 306-324. [183] Wang H, Lu H, Yao L, Gong D. Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning. NeurIPS 2024 Workshop on Scalable Continual Learning for Lifelong Foundation Models, 2024. [184] Li H, Lin S, Duan L, Liang Y, Shroff N B. Theory on Mixture-of-Experts in Continual Learning. The Thirteenth International Conference on Learning Representations, 2025. [185] Song G, Tan X. Real-world Cross-modal Retrieval via Sequential Learning. IEEE Transactions on Multimedia, 2020, 23: 1708−1721 [186] Sun F, Liu H, Yang C, Fang B. Multimodal Continual Learning Using Online Dictionary Updating. IEEE Transactions on Cognitive and Developmental Systems, 2020, 13(1): 171−178 [187] Peng Y, Qi J, Ye Z, Zhuo Y. 
Hierarchical Visual-Textual Knowledge Distillation for Life-long Correlation Learning. International Journal of Computer Vision, 2021, 129(4): 921−941 doi: 10.1007/s11263-020-01392-1 [188] Yu J, Zhuge Y, Zhang L, Hu P, Wang D, Lu H, et al. Boosting Continual Learning of Vision-Llanguage Models via Mixture-of-Experts Adapters. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202423219−23230 [189] Jha S, Gong D, Yao L. CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models. Neural Information Processing Systems, 2024. [190] Gao Z, Zhang X, Xu K, Mao X, Wang H. Stabilizing Zero-Shot Prediction: A Novel Antidote to Forgetting in Continual Vision-Language Tasks. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [191] Zheng J, Cai X, Qiu S, Ma Q. Spurious Forgetting in Continual Learning of Language Models. The Thirteenth International Conference on Learning Representations, 2025. [192] Hinton G. Distilling the Knowledge in a Neural Network. arXiv preprint arXiv: 1503.02531, 2015. [193] Gou J, Yu B, Maybank S J, Tao D. Knowledge Distillation: A Survey. International Journal of Computer Vision, 2021, 129(6): 1789−1819 doi: 10.1007/s11263-021-01453-z [194] Li Z, Hoiem D. Learning without Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(12): 2935−2947 [195] Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, et al. Overcoming Catastrophic Forgetting in Neural Networks. Proceedings of the National Academy of Sciences, 2017, 114(13): 3521−3526 doi: 10.1073/pnas.1611835114 [196] Ferenc Huszár. On Quadratic Penalties in Elastic Weight Consolidation. arXiv preprint arXiv: 1712.03847, 2017. [197] Ritter H, Botev A, Barber D. Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting. Advances in Neural Information Processing Systems, 2018, 31. [198] Zenke F, Poole B, Ganguli S. Continual Learning Through Synaptic Intelligence. International conference on machine learning. PMLR, 2017: 3987-3995. [199] Wu Y, Huang L K, Wang R, Meng D, Wei Y. Meta Continual Learning Revisited: Implicitly Enhancing Online Hessian Approximation via Variance Reduction. The Twelfth International Conference on Learning Representations. 2024. [200] Lopez-Paz D, Ranzato M A. Gradient Episodic Memory for Continual Learning. Advances in neural information processing systems, 2017, 30. [201] Chaudhry A, Ranzato M A, Rohrbach M, Elhoseiny M. Efficient Lifelong Learning with A-GEM. International Conference on Learning Representations, 2018. [202] Tang S, Chen D, Zhu J, Yu S, Ouyang W. Layerwise Optimization by Gradient Decomposition for Continual Learning. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20219634−9643 [203] Wang S, Li X, Sun J, Xu Z. Training Networks in Null Space of Feature Covariance for Continual Learning. Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, 2021184−193 [204] Kong Y, Liu L, Wang Z, Tao D. Balancing Stability and Plasticity Through Advanced Null Space in Continual Learning. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 219-236. [205] Dinh L, Pascanu R, Bengio S, Bengio Y. Sharp Minima Can Generalize for Deep Nets. International Conference on Machine Learning. PMLR, 2017: 1019-1028. [206] Foret P, Kleiner A, Mobahi H, Neyshabur B. Sharpness-aware Minimization for Efficiently Improving Generalization. 
arXiv preprint arXiv: 2010.01412, 2020. [207] Liu Y, Mai S, Chen X, et al. Towards Efficient and Scalable Sharpness-aware Minimization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 202212360−12370 [208] Yang E, Shen L, Wang Z, Liu S, Guo G, Wang X. Data Augmented Flatness-aware Gradient Projection for Continual Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision, 20235630−5639 [209] Chen R, Jing X Y, Wu F, Chen H. Sharpness-aware Gradient Guidance for Few-shot Class-incremental Learning. Knowledge-Based Systems, 2024112030 [210] Yang E, Shen L, Wang Z, Liu S, Guo G, Wang X, et al. Revisiting Flatness-aware Optimization in Continual Learning with Orthogonal Gradient Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. [211] Bian A, Li W, Yuan H, Wang M, Zhao Z, Lu A, et al. Make Continual Learning Stronger via C-flat. Advances in Neural Information Processing Systems, 2024, 37: 7608−7630 [212] Finn C, Abbeel P, Levine S. Model-agnostic Meta-learning for Fast Adaptation of Deep Networks. International Conference on Machine Learning. PMLR, 2017: 1126-1135. [213] Hospedales T, Antoniou A, Micaelli P, Storkey A. Meta-learning in Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(9): 5149−5169 [214] Riemer M, Cases I, Ajemian R, Liu M, Rish I, Tu Y, et al. Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference. International Conference on Learning Representations, 2019. [215] Gupta G, Yadav K, Paull L. Look-ahead Meta Learning for Continual Learning. Advances in Neural Information Processing Systems, 2020, 33: 11588−11598 [216] Javed K, White M. Meta-learning Representations for Continual Learning. Advances in Neural Information Processing Systems, 2019, 32. [217] He X, Sygnowski J, Galashov A, Rusu A A, Teh Y W, Pascanu R. Task Agnostic Continual Learning via Meta Learning. 4th Lifelong Machine Learning Workshop at ICML, 2020. [218] Beaulieu S, Frati L, Miconi T, Lehman J, Stanley K O, Clune J, et al. Learning to Continually Learn. ECAI. IOS Press, 2020: 992-1001. [219] He J, Guo H, Tang M, Wang J. Continual Instruction Tuning for Large Multimodal Models. arXiv preprint arXiv: 2311.16206, 2023. [220] Qiao J, Tan X, Chen C, Qu Y, Peng Y, Xie Y. Prompt Gradient Projection for Continual Learning. The Twelfth International Conference on Learning Representations, 2023. [221] Lu Y, Zhang S, Cheng D, Xing Y, Wang N, Wang P, et al. Visual Prompt Tuning in Null Space for Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [222] Liu R, Zhang J, Song Y, Zhang Y, Yang B. Discarding the Crutches: Adaptive Parameter-Efficient Expert Meta-Learning for Continual Semantic Parsing. Proceedings of the 31st International Conference on Computational Linguistics, 20253560−3578 [223] Yeongbin S, Lee D, Yeo J. Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning. Advances in Neural Information Processing Systems, 2025, 37: 58284−58308 [224] Pentina A, Lampert C. A PAC-Bayesian Bound for Lifelong Learning. International Conference on Machine Learning, PMLR, 2014: 991-999. [225] Pentina A, Lampert C H. Lifelong Learning with Non-iid Tasks. Advances in Neural Information Processing Systems, 2015, 28. [226] Ramesh R, Chaudhari P. Model Zoo: A Growing Brain That Learns Continually. International Conference on Learning Representations, 2022. 
[227] Nguyen C V, Li Y, Bui T D, Turner R E. Variational Continual Learning. International Conference on Learning Representations, 2018. [228] Andle J, Yasaei Sekeh S. Theoretical Understanding of the Information Flow on Continual Learning Performance. European Conference on Computer Vision, 202286−101 [229] Peng B, Risteski A. Continual Learning: A Feature Extraction Formalization, an Efficient Algorithm, and Barriers. Advances in Neural Information Processing Systems, 2022. [230] Lin S, Ju P, Liang Y, Shroff N. Theory on Forgetting and Generalization of Continual Learning. International Conference on Machine Learning, PMLR, 2023: 21078-21100. [231] Goldfarb D, Hand P. Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime. International Conference on Artificial Intelligence and Statistics, PMLR, 2023: 2975-2993. [232] Ding M, Ji K Y, Wang D, Xu J H. Understanding Forgetting in Continual Learning with Linear Regression. Forty-first International Conference on Machine Learning, 2024. [233] Li H, Lin S, Duan L, Liang Y, Shroff N B. Theory on Mixture-of-experts in Continual Learning. The Thirteenth International Conference on Learning Representations, 2025. [234] Alquier P, Pontil M. Regret Bounds for Lifelong Learning. International Conference on Artificial Intelligence and Statistics, PMLR, 2017: 261-269. [235] Wu Y S, Wang P A, Lu C J. Lifelong Optimization with Low Regret. International Conference on Artificial Intelligence and Statistics, PMLR, 2019: 448-456. [236] Jacot A, Gabriel F, Hongler C. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Advances in Neural Information Processing Systems, 2018, 31. [237] Doan T, Bennani M A, Mazoure B, Rabusseau G Alquier P. A Theoretical Analysis of Catastrophic Forgetting Through the NTK Overlap Matrix. International Conference on Artificial Intelligence and Statistics, PMLR, 2021: 1072-1080. [238] Raghavan K, Balaprakash P. Formalizing the Generalization-forgetting Trade-off in Continual Learning. Advances in Neural Information Processing Systems, 2021, 34: 17284−17297 [239] Kim G, Xiao C, Konishi T, et al. A Theoretical Study on Solving Continual Learning. Advances in Neural Information Processing Systems, 2022, 35: 5065−5079 [240] Sun S, Calandriello D, Hu H, Li A, Titsias M. Information-theoretic Online Memory Selection for Continual Learning. International Conference on Learning Representations, 2022. [241] Peng L, Elenter J, Agterberg J, Ribeiro A, Vidal R. TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models. The Thirteenth International Conference on Learning Representations, 2025. [242] Wang D, Shelhamer E, Liu S, Olshausen B, Darrell T. Tent: Fully Test-Time Adaptation by Entropy Minimization. International Conference on Learning Representations, 2021. [243] Wang Z, Yang E, Shen L, Huang H. A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. [244] Liang J, He R, Tan T. A Comprehensive Survey on Test-Time Adaptation Under Distribution Shifts. International Journal of Computer Vision, 2025, 133(1): 31−64 doi: 10.1007/s11263-024-02181-w [245] Gong T, Jeong J, Kim T, Shin J, Lee S J. Note: Robust Continual Test-Time Adaptation Against Temporal Correlation. Advances in Neural Information Processing Systems, 2022, 35: 27253−27266 [246] Wang Q, Fink O, Van Gool L, Dai D. Continual Test-Time Domain Adaptation. 
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20227201−7211 [247] Chen H, Goldblum M, Wu Z, Jiang Y G. Adaptive Retention & Correction: Test-Time Training for Continual Learning. The Thirteenth International Conference on Learning Representations, 2025. [248] Niu S, Wu J, Zhang Y, Chen Y, Zheng S, Zhao P, et al. Efficient Test-Time Model Adaptation Without Forgetting. International conference on machine learning. PMLR, 2022: 16888-16905. [249] Dobler M, Marsden R A, Yang B. Robust Mean Teacher for Continual and Gradual Test-Time Adaptation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20237704−7714 [250] Yang P, Liang J, Cao J, He R. Auto: Adaptive Outlier Optimization for Online Test-Time OOD Detection. arXiv preprint arXiv: 2303.12267, 2023. [251] Cao Y, Yang J. Towards Making Systems Forget with Machine Unlearning. IEEE symposium on security and privacy. IEEE, 2015: 463-480. [252] Bourtoule L, Chandrasekaran V, Choquette-Choo C A, Jia H, Travers A, Zhang B, et al. Machine Unlearning. IEEE Symposium on Security and Privacy. IEEE, 2021: 141-159. [253] Nguyen T T, Huynh T T, Ren Z, et al. A Survey of Machine Unlearning. arXiv preprint arXiv: 2209.02299, 2022. [254] Wang W, Tian Z, Zhang C, Yu S. Machine Unlearning: A Comprehensive Survey. arXiv preprint arXiv: 2405.07406, 2024. [255] Wu Y, Dobriban E, Davidson S. DeltaGrad: Rapid Retraining of Machine Learning Models. International Conference on Machine Learning. PMLR, 2020: 10355-10366. [256] Sekhari A, Acharya J, Kamath G, Suresh A T. Remember what you Want to Forget: Algorithms for Machine Unlearning. Advances in Neural Information Processing Systems, 2021, 34: 18075−18086 [257] Guo C, Goldstein T, Hannun A, Van Der Maaten L. Certified Data Removal from Machine Learning Models. International Conference on Machine Learning. PMLR, 2020: 3832-3842. [258] Golatkar A, Achille A, Soatto S. Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20209304−9312 [259] Nguyen Q P, Low B K H, Jaillet P. Variational Bayesian Unlearning. Advances in Neural Information Processing Systems, 2020, 33: 16025−16036 [260] Du M, Chen Z, Liu C, Oak R, Song D. Lifelong Anomaly Detection Through Unlearning. Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, 20191283−1297 [261] Ma Z, Liu Y, Liu X, Ma J, Ren K. Learn to Forget: Machine Unlearning via Neuron Masking. IEEE Transactions on Dependable and Secure Computing, 2022, 20(4): 3194−3207 [262] Gao C, Wang L, Ding K, Weng C, Wang X, Zhu Q. On Large Language Model Continual Unlearning. The Thirteenth International Conference on Learning Representations, 2025. [263] Lin L J. Self-improving Reactive Agents based on Reinforcement Learning, Planning and Teaching. Machine Learning, 1992, 8: 293−321 [264] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, et al. Human-level Control Through Deep Reinforcement Learning. Nature, 2015, 518(7540): 529−533 doi: 10.1038/nature14236 [265] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized Experience Replay. arXiv preprint arXiv: 1511.05952, 2015. [266] Lyle C, Rowland M, Dabney W, Kwiatkowska M, Gal Y. Learning Dynamics and Generalization in Deep Reinforcement Learning. International Conference on Machine Learning. PMLR, 2022: 14560-14581. [267] Dohare S, Hernandez-Garcia J F, Lan Q, Rahman P, Mahmood A R, Sutton R S. 
Loss of Plasticity in Deep Continual Learning. Nature, 2024, 632(8026): 768−774 doi: 10.1038/s41586-024-07711-7 [268] Kumar S, Marklund H, Rao A, Zhu Y, Jeon H J, Liu Y. Continual Learning as Computationally Constrained Reinforcement Learning. arXiv preprint arXiv: 2307.04345, 2023. [269] Abel D, Barreto A, Van Roy B, Precup D, van Hasselt H P, Singh S. A Definition of Continual Reinforcement Learning. Advances in Neural Information Processing Systems, 2023, 36: 50377−50407 [270] Daniels Z A, Raghavan A, Hostetler J, Rahman A, Sur I, Piacentino M, et al. Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2. Conference on Lifelong Learning Agents. PMLR, 2022: 1120-1145. [271] Igl M, Farquhar G, Luketina J, Boehmer W, Whiteson S. Transient Non-stationarity and Generalisation in Deep Reinforcement Learning. International Conference on Learning Representations, 2021. [272] Gaya J B, Doan T, Caccia L, Soulier L, Denoyer L, Raileanu R. Building a Subspace of Policies for Scalable Continual Learning. International Conference of Learning Representations, 2023. [273] Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, et al. Gpt-4 Technical Report. arXiv preprint arXiv: 2303.08774, 2023. [274] Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M A, Lacroix T, et al. Llama: Open and Efficient Foundation Language Models. arXiv preprint arXiv: 2302.13971, 2023. [275] Bai J, Bai S, Chu Y, Cui Z, Dang K, Deng X, et al. Qwen Technical Report. arXiv preprint arXiv: 2309.16609, 2023. [276] Liu A, Feng B, Xue B, Wang B, Wu B, Lu C, et al. Deepseek-V3 Technical Report. arXiv preprint arXiv: 2412.19437, 2024. [277] Sun Y, Wang S, Li Y, Feng S, Tian H, Wu H, et al. Ernie 2.0: A Continual Pre-training Framework for Language Understanding. Proceedings of the AAAI conference on artificial intelligence, 2020, 34(05): 8968−8975 doi: 10.1609/aaai.v34i05.6428 [278] Jang J, Ye S, Yang S, Shin J, Han J, Kim G, et al. Towards Continual Knowledge Learning of Language Models. International Conference on Learning Representations, 2022. [279] Ke Z, Shao Y, Lin H, Konishi T, Kim G, Liu B. Continual Pre-training of Language Models. International Conference on Learning Representations. 2023. [280] Yang X, Gao J, Xue W, Alexandersson E. Pllama: An Open-source Large Language Model for Plant Science. arXiv preprint arXiv: 2401.01600, 2024. [281] Gogoulou E, Lesort T, Boman M, Nivre J. A Study of Continual Learning Under Language Shift. CoRR, 2023. [282] Razdaibiedina A, Mao Y, Hou R, Khabsa M, Lewis M, Almahairi A. Progressive Prompts: Continual Learning for Language Models. arXiv preprint arXiv: 2301.12314, 2023. [283] Bohao P, Tian Z, Liu S, Yang M C, Jia J. Scalable Language Model with Generalized Continual Learning. The Twelfth International Conference on Learning Representations. 2024. [284] Wang X, Zhang Y, Chen T, Gao S, Jin S, Yang X, et al. TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models. arXiv preprint arXiv: 2310.06762, 2023. [285] Song C, Han X, Zeng Z, Li K, Chen C, Liu Z, et al. Conpet: Continual Parameter-Efficient Tuning for Large Language Models. arXiv preprint arXiv: 2309.14763, 2023. [286] Hao S, Liu T, Wang Z, Hu Z. ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings. Advances in neural information processing systems, 2024, 36. [287] Zhang H, Gui L, Zhai Y, Wang H, Lei Y, Xu R. Copf: Continual Learning Human Preference Through Optimal Policy Fitting. arXiv preprint arXiv: 2310.15694, 2023. 
[288] Zhang H, Lei Y, Gui L, Yang M, He Y, Wang H, et al. CPPO: Continual Learning for Reinforcement Learning with Human Feedback. The Twelfth International Conference on Learning Representations. 2024. [289] Suhr A, Artzi Y. Continual Learning for Instruction Following from Realtime Feedback. Advances in Neural Information Processing Systems, 2024, 36. [290] Wang X, Chen T, Ge Q, Xia H, Bao R, Zheng R, et al. Orthogonal Subspace Learning for Language Model Continual Learning. Conference on Empirical Methods in Natural Language Processing, 2023. [291] Jang J, Kim S, Ye S, Kim D, Logeswaran L, Lee M, et al. Exploring the Benefits of Training Expert Language Models over Instruction Tuning. International Conference on Machine Learning. PMLR, 2023: 14702-14729. [292] Qiao F, Mahdavi M. Learn More, but Bother Less: Parameter Efficient Continual Learning. The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.