
Review on Deep Learning Based Video Frame Interpolation

Wu Chen-Yang, Zhang Yong, Han Shu-Hao, Guo Chun-Le, Li Chong-Yi, Cheng Ming-Ming

Citation: Wu Chen-Yang, Zhang Yong, Han Shu-Hao, Guo Chun-Le, Li Chong-Yi, Cheng Ming-Ming. Review on deep learning based video frame interpolation. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240572

doi: 10.16383/j.aas.c240572 cstr: 32138.14.j.aas.c240572
Funds: Supported by National Natural Science Foundation of China (62306153, U23B2011, 62176130), Fundamental Research Funds for the Central Universities (070-63243143), Natural Science Foundation of Tianjin (24JCJQJC00020) and Shenzhen Science and Technology Program (JCYJ20240813114237048)
More Information
    Author Bio:

    WU Chen-Yang  Ph.D. candidate at the College of Computer Science, Nankai University. His research interest covers deep learning and video frame interpolation. E-mail: wucy0519@gmail.com

    ZHANG Yong  Engineer at Chongqing Chang'an Wangjiang Industry Co., Ltd., and Ph.D. candidate at the College of Computer Science, Nankai University. His research interest covers object detection and tracking, and multimodal perception data fusion. Corresponding author of this paper. E-mail: zhangyongtju@163.com

    HAN Shu-Hao  Master student at the College of Computer Science, Nankai University. His research interest covers deep learning and video frame interpolation. E-mail: hansh@mail.nankai.edu.cn

    GUO Chun-Le  Associate professor at the College of Computer Science, Nankai University, and Nankai International Advanced Research Institute (SHENZHEN-FUTIAN). His research interests cover computational imaging, image enhancement and restoration. E-mail: guochunle@nankai.edu.cn

    LI Chong-Yi  Professor at the College of Computer Science, Nankai University, and Nankai International Advanced Research Institute (SHENZHEN-FUTIAN). His main research interest is computational imaging. E-mail: lichongyi@nankai.edu.cn

    CHENG Ming-Ming  Professor at the College of Computer Science, Nankai University, and Nankai International Advanced Research Institute (SHENZHEN-FUTIAN). His research interests cover artificial intelligence, computer vision and computer graphics. E-mail: cmm@nankai.edu.cn

  • Abstract: Video frame interpolation is a hot research topic in the field of video processing. By synthesizing intermediate frames it raises the frame rate of a video and makes playback smoother, and it plays an important role in old-film restoration, film post-production, and slow-motion generation. With the rapid development of deep learning, deep-learning-based video frame interpolation has become the mainstream approach. This paper comprehensively surveys existing deep-learning-based video frame interpolation work and analyzes the strengths and weaknesses of these methods in depth. It then introduces the datasets commonly used in the field, which provide important support for video frame interpolation research and algorithm training. Finally, it reflects on the challenges that remain in current research and discusses future research directions from multiple perspectives, aiming to provide a reference for the subsequent development of the field.
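
    Most of the flow-based methods surveyed in this paper (Super SloMo[4], RIFE[24], IFRNet[25], among others) synthesize an intermediate frame by warping the two input frames toward the target time and blending the results. The following PyTorch sketch illustrates only that warp-and-blend step, assuming the bidirectional flows from the target time to each input frame have already been produced by some flow estimator and that a soft occlusion mask in [0, 1] is available; the flow channel order (horizontal displacement first) and the helper names are illustrative assumptions, not any specific paper's implementation.

    import torch
    import torch.nn.functional as F

    def backward_warp(frame, flow):
        # Sample `frame` (N, C, H, W) at positions displaced by `flow` (N, 2, H, W);
        # flow channel 0 is assumed to hold the horizontal (x) displacement in pixels.
        _, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys)).float().to(frame.device)   # (2, H, W) pixel grid
        coords = base.unsqueeze(0) + flow                       # (N, 2, H, W)
        # Normalize coordinates to [-1, 1], the range grid_sample expects.
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                    # (N, H, W, 2)
        return F.grid_sample(frame, grid, align_corners=True)

    def synthesize_middle(frame0, frame1, flow_t0, flow_t1, mask):
        # Blend the two warped candidates with a soft occlusion mask in [0, 1].
        return mask * backward_warp(frame0, flow_t0) + (1 - mask) * backward_warp(frame1, flow_t1)
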
  • Fig.  1  The flowchart of deep-learning-based video frame interpolation development

    Fig.  2  Visualization results of video frame interpolation algorithms on datasets

    Table  1  Comparison of deep-learning-based video frame interpolation methods

    Year | Method | Category | Loss functions | Training set | Evaluation metrics | Framework
    2017 | AdaConv[1] | − | Color loss, gradient loss | Flickr | SSIM, IE | PyTorch
    2017 | SepConv[2] | − | Reconstruction loss | YouTube | PSNR, SSIM, MAE, RMSE | PyTorch
    2018 | PhaseNet[3] | Phase | Reconstruction loss, phase loss | DAVIS | SSIM | TensorFlow
    2018 | Super SloMo[4] | Flow / backward | Reconstruction loss, perceptual loss, warping loss, smoothness loss | Adobe240, YouTube240 | PSNR, SSIM, IE | PyTorch
    2018 | CS-VFI[5] | Flow / forward | Reconstruction loss, perceptual loss, color loss | YouTube | PSNR, SSIM, IE | PyTorch
    2019 | IM-Net[6] | − | Reconstruction loss, warping loss, similarity loss | YouTube | PSNR, SSIM, IE | Caffe
    2019 | MEMC-Net[7] | Flow / backward | Charbonnier loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch
    2019 | TOFlow[8] | Flow / backward | Reconstruction loss | Vimeo-90K | PSNR, SSIM, SSD | PyTorch
    2019 | Q-VFI[9] | Flow / backward | Reconstruction loss, perceptual loss | Internet videos | PSNR, SSIM, IE | PyTorch
    2019 | DAIN[10] | Flow / backward | Charbonnier loss | Vimeo-90K | PSNR, SSIM, IE, NIE | PyTorch
    2020 | FeFlow[11] | − | MMG loss, reconstruction loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch
    2020 | DSepConv[12] | − | Charbonnier loss, gradient loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch
    2020 | AdaCoF[13] | − | Reconstruction loss, distortion loss, perceptual loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch
    2020 | CAIN[14] | Generation | Reconstruction loss, perceptual loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2020 | FISR[15] | Generation | Temporal loss, reconstruction loss | YouTube | PSNR, SSIM | TensorFlow
    2020 | BMBC[16] | Flow / backward | Photometric loss, smoothness loss | Vimeo-90K | PSNR, SSIM, IE, NIE | PyTorch
    2020 | SoftSplat[17] | Flow / forward | Color loss, perceptual loss | Vimeo-90K | PSNR, SSIM, LPIPS | PyTorch
    2021 | EDSC[18] | − | Charbonnier loss, perceptual loss | Vimeo-90K | PSNR, SSIM, IE, LPIPS | PyTorch
    2021 | CDFI[19] | − | Charbonnier loss, perceptual loss, offset loss | Vimeo-90K | PSNR, SSIM, LPIPS | PyTorch
    2021 | XVFI[20] | Flow / backward | Reconstruction loss, smoothness loss | X-TRAIN | PSNR, SSIM, tOF, EPE | PyTorch
    2021 | ABME[21] | Flow / backward | Charbonnier loss, census loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2022 | VFIT[22] | Generation | Reconstruction loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2022 | M2M[23] | Flow / forward | Charbonnier loss, census loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2022 | RIFE[24] | Flow / backward | Reconstruction loss, distillation loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch
    2022 | IFRNet[25] | Flow / backward | Charbonnier loss, census loss, distillation loss, geometry consistency loss | Vimeo-90K | PSNR, SSIM, IE, NIE | PyTorch
    2022 | FILM[26] | Flow / backward | Reconstruction loss, perceptual loss, Gram loss | Vimeo-90K | PSNR, SSIM | TensorFlow
    2022 | VFIFormer[27] | Flow / backward | Reconstruction loss, census loss, distillation loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2023 | FLAVR[28] | Generation | Reconstruction loss | GoPro, Vimeo-90K | PSNR, SSIM, TCC | PyTorch
    2023 | UPR-Net[29] | Flow / forward | Charbonnier loss, census loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2023 | AMT[30] | Flow / backward | Charbonnier loss, census loss, flow loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2023 | EMA-VFI[31] | Flow / backward | Reconstruction loss, distillation loss, color loss, perceptual loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2023 | BiFormer[32] | Flow / backward | Charbonnier loss, census loss | X-TRAIN | PSNR, SSIM | PyTorch
    2024 | MSEConv[33] | − | Reconstruction loss, perceptual loss, adversarial loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2024 | LDMVFI[34] | Generation | LDM loss | Vimeo-90K | LPIPS, FloLPIPS, FID | PyTorch
    2024 | SwinCS-VFIT[35] | Generation | Reconstruction loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2024 | VFIMamba[36] | Generation | Laplacian loss, warping loss | X-TRAIN, Vimeo-90K | PSNR, SSIM | PyTorch
    2024 | PerVFI[37] | Flow / forward | Negative log-likelihood loss, perceptual loss | Vimeo-90K | PSNR, SSIM, LPIPS, FloLPIPS, VFIPS | PyTorch
    2024 | IQ-VFI[38] | Flow / forward | Reconstruction loss, distillation loss | Vimeo-90K | PSNR, SSIM | PyTorch
    2024 | SGM[39] | Flow / backward | Reconstruction loss, warping loss | X-TRAIN, Vimeo-90K | PSNR, SSIM | PyTorch
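
    Several loss functions in Table 1 recur across many methods. As a concrete example, the Charbonnier loss adopted by MEMC-Net[7], DAIN[10], IFRNet[25], AMT[30], and others is a smooth, robust relaxation of the L1 loss; the sketch below states the standard definition, with the epsilon value an illustrative assumption (papers typically use a small constant in the range of roughly 1e-3 to 1e-6).

    import torch

    def charbonnier_loss(pred, target, eps=1e-3):
        # sqrt(err^2 + eps^2) behaves like |err| for large errors but remains
        # differentiable at zero, which stabilizes training compared with plain L1.
        return torch.mean(torch.sqrt((pred - target) ** 2 + eps ** 2))
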

    Table  2  Performance comparison of deep-learning-based video frame interpolation methods (evaluation metrics: PSNR↑ / SSIM↑ / LPIPS↓)

    Year Method Vimeo-90K UCF101 X-Test Xiph DAVIS SNU-FILM (Easy / Medium / Hard / Extreme)
    2017 AdaConv[1] 32.33/0.957/−
    2017 SepConv[2] 33.45/0.967/0.019 33.02/0.935/0.024 24.34/0.742/− 32.61/0.880/− 26.21/0.857/− 39.68/0.990/− 35.07/0.976/− 29.39/0.926/− 34.32/0.845/−
    2018 Super SloMo[4] 32.90/0.957/− 33.14/0.938/− 25.76/0.850/− 37.28/0.986/− 33.80/0.973/− 28.98/0.925/− 24.15/0.845/−
    2019 MEMC-Net[7] 34.02/0.970/0.027 34.95/0.968/0.030
    2019 IM-Net[6] 33.50/0.947/−
    2019 TOFlow[8] 33.53/0.967/0.027 34.58/0.967/0.027 39.08/0.989/− 34.39/0.974/− 28.44/0.918/− 23.39/0.831/−
    2019 Q-VFI[9] 35.15/0.971/− 32.54/0.948/− 27.73/0.894/−
    2019 DAIN[10] 34.71/0.976/0.022 34.99/0.968/0.028 26.78/0.807/− 26.12/0.870/− 39.73/0.990/− 35.46/0.978/− 30.17/0.934/− 25.09/0.858/−
    2020 FeFlow[11] 35.28/0.976/− 24.00/0.756/−
    2020 DSepConv[12] 34.73/0.974/0.028 35.08/0.969/0.030
    2020 AdaCoF[13] 35.40/0.971/0.031 35.06/0.974/0.033 24.13/0.734/− 32.72/0.881/− 27.07/0.874/− 39.80/0.990/0.019 35.05/0.975/0.036 29.46/0.924/0.075 24.30/0.844/0.148
    2020 CAIN[14] 34.65/0.973/− 34.91/0.969/− 24.50/0.752/− 24.50/0.752/− 26.46/0.856/− 39.78/0.990/− 35.49/0.977/− 29.86/0.929/− 24.69/0.850/−
    2020 FISR[15]
    2020 BMBC[16] 35.01/0.976/− 32.61/0.955/0.032 22.86/0.727/− 31.27/0.880/− 26.42/0.868/− 39.89/0.990/0.018 35.31/0.977/0.034 29.32/0.927/0.075 23.92/0.843/0.152
    2020 SoftSplat[17] 35.48/0.964/0.013 35.10/0.948/0.022 25.48/0.725/− 27.42/0.878/−
    2021 EDSC[18] 34.84/0.975/0.026 35.13/0.968/0.029 24.54/0.768/0.205
    2021 CDFI[19] 35.17/0.964/0.010 35.21/0.950/0.015 24.49/0.742/− 33.01/0.872/− 40.11/0.990/0.013 35.50/0.978/0.024 29.74/0.928/0.056 24.54/0.847/0.121
    2021 XVFI[20] 35.07/0.968/− 32.65/0.968/0.033 30.12/0.870/− 34.06/0.895/− 39.55/0.989/0.020 35.06/0.976/0.037 29.51/0.927/0.075 24.43/0.848/0.143
    2021 ABME[21] 36.18/0.981/− 32.05/0.967/0.058 30.16/0.879/− 33.81/0.903/− 39.69/0.990/0.022 35.28/0.977/0.042 29.64/0.929/0.092 24.54/0.853/0.182
    2022 RIFE[24] 35.61/0.978/0.020 35.28/0.969/− 24.67/0.797/− 25.89/0.803/0.134 40.06/0.991/− 35.75/0.979/− 30.10/0.933/− 24.84/0.853/−
    2022 VFIT[22] 36.96/0.978/− 33.44/0.971/− 28.09/0.888/−
    2022 IFRNet[25] 36.20/0.981/− 35.42/0.970/0.031 30.46/−/− 40.10/0.991/0.017 36.12/0.980/0.029 30.63/0.937/0.058 25.27/0.861/0.128
    2022 FILM[26] 35.87/0.968/− 35.16/0.949/−
    2022 M2M[23] 35.40/0.978/− 35.17/0.970/− 30.81/0.912/− 34.46/0.925/− 39.66/0.991/− 35.74/0.980/− 30.32/0.936/− 25.07/0.860/−
    2022 VFIFormer[27] 36.50/0.982/0.021 35.43/0.970/0.034 24.58/0.805/− 33.69/0.925/− 40.13/0.991/0.018 36.09/0.980/0.033 30.67/0.938/0.069 25.43/0.864/0.146
    2023 FLAVR[28] 36.25/0.975/− 33.31/0.971/− 27.43/0.874/−
    2023 AMT[30] 36.53/0.982/0.021 35.45/0.970/− 39.88/0.991/− 36.12/0.981/− 30.78/0.939/− 25.43/0.865/−
    2023 EMA-VFI[31] 36.64/0.982/0.026 35.48/0.970/− 31.46/−/− 37.61/0.846/0.203 39.98/0.991/− 36.09/0.980/− 30.94/0.939/− 25.69/0.866/−
    2023 BiFormer[32] 31.32/0.921/− 34.48/0.927/−
    2023 UPR-Net[29] 36.42/0.982/− 35.47/0.970/− 30.50/0.905/− 40.44/0.991/− 36.29/0.980/− 30.86/0.938/− 25.63/0.864/−
    2024 LDMVFI[34] 32.16/0.964/0.026 38.89/0.988/0.013 33.97/0.971/0.027 28.14/0.911/0.068 23.34/0.827/0.139
    2024 SGM[39] 29.91/0.897/− 29.25/0.818/− 40.15/0.991/− 36.05/0.980/− 28.88/0.922/− 23.62/0.838/−
    2024 PerVFI[37] 33.89/0.953/0.018 26.23/0.808/0.114
    2024 IQ-VFI[38] 36.60/0.982/− 35.48/0.970/− 40.24/0.991/− 36.24/0.980/− 30.83/0.938/− 25.45/0.863/−
    2024 SwinCS-VFIT[35] 37.13/0.978/− 33.36/0.971/− 28.28/0.891/−
    2024 VFIMamba[36] 36.64/0.982/− 35.45/0.970/− 32.15/0.925/− 34.62/0.906/− 40.51/0.991/− 36.40/0.981/− 30.99/0.940/− 25.79/0.868/−
    2024 MSEConv[33] 35.10/0.966/−
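
    For reference, the PSNR values reported in Table 2 follow the standard definition sketched below (assuming predictions and ground truth are float tensors scaled to [0, 1]); SSIM and LPIPS measure structural and learned perceptual similarity, respectively, and are normally computed with dedicated implementations (e.g. scikit-image, the lpips package) rather than a few lines of code.

    import torch

    def psnr(pred, target, peak=1.0):
        # PSNR = 10 * log10(peak^2 / MSE); higher is better, matching the
        # upward arrow in the caption of Table 2.
        mse = torch.mean((pred - target) ** 2)
        return 10.0 * torch.log10(peak ** 2 / mse)
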

    Table  3  Datasets used for deep-learning-based video frame interpolation

    Dataset | Release year | Number of videos | Resolution | Common evaluation metrics
    Xiph | 1994 | 8 | 4096 × 2160 | PSNR, SSIM, LPIPS
    Middlebury[61] | 2011 | 24 | 640 × 480 | IE
    UCF101[64] | 2012 | 13 320 | 256 × 256 | PSNR, SSIM, LPIPS
    DAVIS[59] | 2017 | 90 | 4096 × 2160 | PSNR, SSIM, LPIPS
    GoPro[62] | 2017 | 33 | 1280 × 720 | PSNR, SSIM, IE
    Adobe240[63] | 2017 | 71 | 1280 × 720 | PSNR, SSIM, IE
    Vimeo-90K[8] | 2019 | 4 278 | 448 × 256 | PSNR, SSIM, LPIPS
    HD[7] | 2019 | 7 | 1280 × 720 | PSNR
    SNU-FILM[14] | 2020 | 31 | 1280 × 720 | PSNR, SSIM, LPIPS
    X4K1000FPS (Test)[20] | 2021 | 15 | 4096 × 2160 | PSNR, SSIM, LPIPS
    SportsSloMo[65] | 2024 | 8 498 | 1280 × 720 | PSNR, SSIM, IE
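
    Most methods in Table 1 are trained on frame triplets such as those of Vimeo-90K[8]: the two outer frames serve as input and the middle frame as supervision. The sketch below illustrates that sampling convention; the im1/im2/im3 file names follow the public Vimeo-90K triplet layout, while the class name and the directory-list argument are placeholders for illustration.

    import os
    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision.transforms.functional import to_tensor

    class TripletDataset(Dataset):
        # Each sequence directory holds three consecutive frames; a model
        # receives (im1, im3) and is trained to reconstruct im2.
        def __init__(self, root, sequence_dirs):
            self.paths = [os.path.join(root, d) for d in sequence_dirs]

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            def load(name):
                img = Image.open(os.path.join(self.paths[idx], name)).convert("RGB")
                return to_tensor(img)
            frame0, gt, frame1 = load("im1.png"), load("im2.png"), load("im3.png")
            return (frame0, frame1), gt
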
  • [1] Niklaus S, Mai L, Liu F. Video frame interpolation via adaptive convolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2270−2279
    [2] Niklaus S, Mai L, Liu F. Video frame interpolation via adaptive separable convolution. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 261−270
    [3] Meyer S, Djelouah A, McWilliams B, Sorkine-Hornung A, Gross M, Schroers C. PhaseNet for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 498−507
    [4] Jiang H Z, Sun D Q, Jampani V, Yang M H, Learned-Miller E, Kautz J. Super SloMo: High quality estimation of multiple intermediate frames for video interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 9000−9008
    [5] Niklaus S, Liu F. Context-aware synthesis for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1701−1710
    [6] Peleg T, Szekely P, Sabo D, Sendik O. IM-Net for high resolution video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 2393−2402
    [7] Bao W B, Lai W S, Zhang X Y, Gao Z Y, Yang M H. MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(3): 933−948 doi: 10.1109/TPAMI.2019.2941941
    [8] Xue T F, Chen B A, Wu J J, Wei D L, Freeman W T. Video enhancement with task-oriented flow. International Journal of Computer Vision, 2019, 127(8): 1106−1125 doi: 10.1007/s11263-018-01144-2
    [9] Xu X Y, Li S Y, Sun W X, Yin Q, Yang M H. Quadratic video interpolation. In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS). Vancouver, Canada: MIT Press, 2019. 1645−1654
    [10] Bao W B, Lai W S, Ma C, Zhang X Y, Gao Z Y, Yang M H. Depth-aware video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 3698−3707
    [11] Gui S R, Wang C Y, Chen Q H, Tao D C. FeatureFlow: Robust video interpolation via structure-to-texture generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 14001−14010
    [12] Cheng X H, Chen Z Z. Video frame interpolation via deformable separable convolution. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 10607−10614
    [13] Lee H, Kim T, Chung T Y, Pak D, Ban Y, Lee S. AdaCoF: Adaptive collaboration of flows for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 5315−5324
    [14] Choi M, Kim H, Han B, Xu N, Lee K M. Channel attention is all you need for video frame interpolation. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 10663−10671
    [15] Kim S Y, Oh J, Kim M. FISR: Deep joint frame interpolation and super-resolution with a multi-scale temporal loss. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11278−11286
    [16] Park J, Ko K, Lee C, Kim C S. BMBC: Bilateral motion estimation with bilateral cost volume for video interpolation. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 109−125
    [17] Niklaus S, Liu F. Softmax splatting for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 5436−5445
    [18] Cheng X H, Chen Z Z. Multiple video frame interpolation via enhanced deformable separable convolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(10): 7029−7045
    [19] Ding T Y, Liang L M, Zhu Z H, Zharkov I. CDFI: Compression-driven network design for frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 7997−8007
    [20] Sim H, Oh J, Kim M. XVFI: Extreme video frame interpolation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 14469−14478
    [21] Park J, Lee C, Kim C S. Asymmetric bilateral motion estimation for video frame interpolation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 14519−14528
    [22] Shi Z H, Xu X Y, Liu X H, Chen J, Yang M H. Video frame interpolation transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17461−17470
    [23] Hu P, Niklaus S, Sclaroff S, Saenko K. Many-to-many splatting for efficient video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 3543−3552
    [24] Huang Z W, Zhang T Y, Heng W, Shi B X, Zhou S C. Real-time intermediate flow estimation for video frame interpolation. In: Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 624−642
    [25] Kong L T, Jiang B Y, Luo D H, Chu W Q, Huang X M, Tai Y, et al. IFRNet: Intermediate feature refine network for efficient frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 1959−1968
    [26] Reda F, Kontkanen J, Tabellion E, Sun D Q, Pantofaru C, Curless B. FILM: Frame interpolation for large motion. In: Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 250−266
    [27] Lu L Y, Wu R Z, Lin H J, Lu J B, Jia J Y. Video frame interpolation with transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 3522−3532
    [28] Kalluri T, Pathak D, Chandraker M, Tran D. FLAVR: Flow-agnostic video representations for fast frame interpolation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE, 2023. 2070−2081
    [29] Jin X, Wu L H, Chen J, Chen Y X, Koo J, Hahm C H. A unified pyramid recurrent network for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 1578−1587
    [30] Li Z, Zhu Z L, Han L H, Hou Q B, Guo C L, Cheng M M. AMT: All-pairs multi-field transforms for efficient frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 9801−9810
    [31] Zhang G Z, Zhu Y H, Wang H N, Chen Y X, Wu G S, Wang L M. Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 5682−5692
    [32] Park J, Kim J, Kim C S. BiFormer: Learning bilateral motion estimation via bilateral transformer for 4K video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 1568−1577
    [33] Ding X L, Huang P, Zhang D Y, Liang W, Li F, Yang G B, et al. MSEConv: A unified warping framework for video frame interpolation. ACM Transactions on Asian and Low-Resource Language Information Processing, to be published, DOI: 10.1145/3648364
    [34] Danier D, Zhang F, Bull D. LDMVFI: Video frame interpolation with latent diffusion models. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI, 2024. 1472−1480
    [35] Shi Chang-Tong, Shan Hong-Tao, Zheng Guang-Yuan, Zhang Yu-Jin, Liu Huai-Yuan, Zong Zhi-Hao. Video frame interpolation method based on improved Visual Transformer. Application Research of Computers, 2024, 41(4): 1252−1257 (in Chinese)
    [36] Zhang G Z, Liu C X, Cui Y T, Zhao X T, Ma K, Wang L M. VFIMamba: Video frame interpolation with state space models. In: Proceedings of the 38th Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [37] Wu G Y, Tao X, Li C L, Wang W Y, Liu X H, Zheng Q Q. Perception-oriented video frame interpolation via asymmetric blending. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 2753−2762
    [38] Hu M S, Jiang K, Zhong Z H, Wang Z, Zheng Y Q. IQ-VFI: Implicit quadratic motion estimation for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 6410−6419
    [39] Liu C X, Zhang G Z, Zhao R, Wang L M. Sparse global matching for video frame interpolation with large motion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 19125−19134
    [40] Parihar A S, Varshney D, Pandya K, Aggarwal A. A comprehensive survey on video frame interpolation techniques. The Visual Computer, 2022, 38(1): 295−319 doi: 10.1007/s00371-020-02016-y
    [41] Dong J, Ota K, Dong M X. Video frame interpolation: A comprehensive survey. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2s): Article No. 78
    [42] Meyer S, Wang O, Zimmer H, Grosse M, Sorkine-Hornung A. Phase-based frame interpolation for video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 1410−1418
    [43] Prashnani E, Noorkami M, Vaquero D, Sen P. A phase-based approach for animating images using video examples. Computer Graphics Forum, 2017, 36(6): 303−311 doi: 10.1111/cgf.12940
    [44] Wadhwa N, Rubinstein M, Durand F, Freeman W T. Phase-based video motion processing. ACM Transactions on Graphics (TOG), 2013, 32(4): Article No. 80
    [45] Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 4489−4497
    [46] Lin Chuan-Jian, Deng Wei, Tong Tong, Gao Qin-Quan. Blurred video frame interpolation method based on deep voxel flow. Journal of Computer Applications, 2020, 40(3): 819−824 doi: 10.11772/j.issn.1001-9081.2019081474 (in Chinese)
    [47] Cho H, Kim T, Jeong Y, Yoon K J. TTA-EVF: Test-time adaptation for event-based video frame interpolation via reliable pixel and sample estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 25701−25711
    [48] Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V, et al. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2758−2766
    [49] Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. Flownet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2462−2470
    [50] Ranjan A, Black M J. Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2720−2729
    [51] Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8934−8943
    [52] Hui T W, Tang X O, Loy C C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8981−8989
    [53] Yang G S, Ramanan D. Volumetric correspondence networks for optical flow. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. Article No. 72
    [54] Teed Z, Deng J. RAFT: Recurrent all-pairs field transforms for optical flow. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 402−419
    [55] Zhao S Y, Zhao L, Zhang Z X, Zhou E Y, Metaxas D. Global matching with overlapping attention for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17592−17601
    [56] Zhang Qian, Jiang Feng. Video interpolation based on deep learning. Intelligent Computer and Applications, 2019, 9(4): 252−257 doi: 10.3969/j.issn.2095-2163.2019.04.058 (in Chinese)
    [57] Ma Jing-Yuan, Wang Chuan-Ming. Real-time video frame interpolation based on multi-scale optical flow prediction and fusion. Journal of Chinese Computer Systems, 2021, 42(12): 2567−2571 (in Chinese)
    [58] Yang Hua, Wang Jiao, Zhang Wei-Jun, Wu Jie-Hong, Gao Li-Jun. Lightweight video frame interpolation algorithm based on optical flow estimation. Journal of Shenyang Aerospace University, 2022, 39(6): 57−64 doi: 10.3969/j.issn.2095-1248.2022.06.008 (in Chinese)
    [59] Perazzi F, Pont-Tuset J, McWilliams B, Van Gool L, Gross M, Sorkine-Hornung A. A benchmark dataset and evaluation methodology for video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 724−732
    [60] Ding C, Lin M Y, Zhang H J, Liu J Z, Yu L. Video frame interpolation with stereo event and intensity cameras. IEEE Transactions on Multimedia, 2024, 26: 9187−9202 doi: 10.1109/TMM.2024.3387690
    [61] Baker S, Scharstein D, Lewis J P, Roth S, Black M J, Szeliski R. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 2011, 92(1): 1−31 doi: 10.1007/s11263-010-0390-2
    [62] Nah S, Kim T H, Lee K M. Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 257−265
    [63] Su S C, Delbracio M, Wang J, Sapiro G, Heidrich W, Wang O. Deep video deblurring for hand-held cameras. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 237−246
    [64] Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv: 1212.0402, 2012.
    [65] Chen J B, Jiang H Z. SportsSloMo: A new benchmark and baselines for human-centric video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2024. 6475−6486
    [66] Kiefhaber S, Niklaus S, Liu F, Schaub-Meyer S. Benchmarking video frame interpolation. arXiv preprint arXiv: 2403.17128, 2024.
Publication history
  • Received:  2024-08-14
  • Published online:  2025-04-20
