Abstract: Video frame interpolation is a hot research topic in video processing. It raises a video's frame rate by generating intermediate frames, making playback smoother, and it plays a crucial role in applications such as old-film restoration, film post-production, and slow-motion generation. With the rapid development of deep learning, deep-learning-based video frame interpolation has become the mainstream approach. This paper comprehensively reviews existing deep-learning-based video frame interpolation methods and analyzes their strengths and weaknesses. It then introduces the datasets commonly used in the field, which provide important support for related research and for training interpolation models. Finally, it examines the challenges that remain in current research and outlines future directions from multiple perspectives, aiming to serve as a reference for subsequent work in this field.
Table 1 Comparison of deep-learning-based video frame interpolation methods
| Year | Method | Category | Loss functions | Training set | Metrics | Framework |
|---|---|---|---|---|---|---|
| 2017 | AdaConv[1] | Kernel | Color loss, gradient loss | Flickr | SSIM, IE | PyTorch |
| 2017 | SepConv[2] | Kernel | Reconstruction loss | YouTube | PSNR, SSIM, MAE, RMSE | PyTorch |
| 2018 | PhaseNet[3] | Phase | Reconstruction loss, phase loss | DAVIS | SSIM | TensorFlow |
| 2018 | Super SloMo[4] | Flow/backward | Reconstruction, perceptual, warping, smoothness losses | Adobe240, YouTube240 | PSNR, SSIM, IE | PyTorch |
| 2018 | CS-VFI[5] | Flow/forward | Reconstruction, perceptual, color losses | YouTube | PSNR, SSIM, IE | PyTorch |
| 2019 | IM-Net[6] | Kernel | Reconstruction, warping, similarity losses | YouTube | PSNR, SSIM, IE | Caffe |
| 2019 | MEMC-Net[7] | Flow/backward | Charbonnier loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch |
| 2019 | TOFlow[8] | Flow/backward | Reconstruction loss | Vimeo-90K | PSNR, SSIM, SSD | PyTorch |
| 2019 | Q-VFI[9] | Flow/backward | Reconstruction, perceptual losses | Internet videos | PSNR, SSIM, IE | PyTorch |
| 2019 | DAIN[10] | Flow/backward | Charbonnier loss | Vimeo-90K | PSNR, SSIM, IE, NIE | PyTorch |
| 2020 | FeFlow[11] | Kernel | MMG loss, reconstruction loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch |
| 2020 | DSepConv[12] | Kernel | Charbonnier loss, gradient loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch |
| 2020 | AdaCoF[13] | Kernel | Reconstruction, distortion, perceptual losses | Vimeo-90K | PSNR, SSIM, IE | PyTorch |
| 2020 | CAIN[14] | Synthesis | Reconstruction, perceptual losses | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2020 | FISR[15] | Synthesis | Temporal loss, reconstruction loss | YouTube | PSNR, SSIM | TensorFlow |
| 2020 | BMBC[16] | Flow/backward | Photometric loss, smoothness loss | Vimeo-90K | PSNR, SSIM, IE, NIE | PyTorch |
| 2020 | SoftSplat[17] | Flow/forward | Color loss, perceptual loss | Vimeo-90K | PSNR, SSIM, LPIPS | PyTorch |
| 2021 | EDSC[18] | Kernel | Charbonnier loss, perceptual loss | Vimeo-90K | PSNR, SSIM, IE, LPIPS | PyTorch |
| 2021 | CDFI[19] | Kernel | Charbonnier, perceptual, offset losses | Vimeo-90K | PSNR, SSIM, LPIPS | PyTorch |
| 2021 | XVFI[20] | Flow/backward | Reconstruction loss, smoothness loss | X-TRAIN | PSNR, SSIM, tOF, EPE | PyTorch |
| 2021 | ABME[21] | Flow/backward | Charbonnier loss, census loss | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2022 | VFIT[22] | Synthesis | Reconstruction loss | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2022 | M2M[23] | Flow/forward | Charbonnier loss, census loss | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2022 | RIFE[24] | Flow/backward | Reconstruction loss, distillation loss | Vimeo-90K | PSNR, SSIM, IE | PyTorch |
| 2022 | IFRNet[25] | Flow/backward | Charbonnier, census, distillation, geometry-consistency losses | Vimeo-90K | PSNR, SSIM, IE, NIE | PyTorch |
| 2022 | FILM[26] | Flow/backward | Reconstruction, perceptual, Gram losses | Vimeo-90K | PSNR, SSIM | TensorFlow |
| 2022 | VFIFormer[27] | Flow/backward | Reconstruction, census, distillation losses | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2023 | FLAVR[28] | Synthesis | Reconstruction loss | GoPro, Vimeo-90K | PSNR, SSIM, TCC | PyTorch |
| 2023 | UPR-Net[29] | Flow/forward | Charbonnier loss, census loss | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2023 | AMT[30] | Flow/backward | Charbonnier, census, flow losses | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2023 | EMA-VFI[31] | Flow/backward | Reconstruction, distillation, color, perceptual losses | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2023 | BiFormer[32] | Flow/backward | Charbonnier loss, census loss | X-TRAIN | PSNR, SSIM | PyTorch |
| 2024 | MSEConv[33] | Kernel | Reconstruction, perceptual, adversarial losses | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2024 | LDMVFI[34] | Synthesis | LDM loss | Vimeo-90K | LPIPS, FloLPIPS, FID | PyTorch |
| 2024 | SwinCS-VFIT[35] | Synthesis | Reconstruction loss | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2024 | VFIMamba[36] | Synthesis | Laplacian loss, warping loss | X-TRAIN, Vimeo-90K | PSNR, SSIM | PyTorch |
| 2024 | PerVFI[37] | Flow/forward | Negative log-likelihood loss, perceptual loss | Vimeo-90K | PSNR, SSIM, LPIPS, FloLPIPS, VFIPS | PyTorch |
| 2024 | IQ-VFI[38] | Flow/forward | Reconstruction loss, distillation loss | Vimeo-90K | PSNR, SSIM | PyTorch |
| 2024 | SGM[39] | Flow/backward | Reconstruction loss, warping loss | X-TRAIN, Vimeo-90K | PSNR, SSIM | PyTorch |

Table 2 Performance comparison of deep-learning-based video frame interpolation methods (evaluation metrics: PSNR↑/SSIM↑/LPIPS↓)
| Year | Method | Vimeo-90K | UCF101 | X-Test | Xiph | DAVIS | SNU-FILM Easy | SNU-FILM Medium | SNU-FILM Hard | SNU-FILM Extreme |
|---|---|---|---|---|---|---|---|---|---|---|
| 2017 | AdaConv[1] | 32.33/0.957/− | — | — | — | — | — | — | — | — |
| 2017 | SepConv[2] | 33.45/0.967/0.019 | 33.02/0.935/0.024 | 24.34/0.742/− | 32.61/0.880/− | 26.21/0.857/− | 39.68/0.990/− | 35.07/0.976/− | 29.39/0.926/− | 34.32/0.845/− |
| 2018 | Super SloMo[4] | 32.90/0.957/− | 33.14/0.938/− | — | — | 25.76/0.850/− | 37.28/0.986/− | 33.80/0.973/− | 28.98/0.925/− | 24.15/0.845/− |
| 2019 | MEMC-Net[7] | 34.02/0.970/0.027 | 34.95/0.968/0.030 | — | — | — | — | — | — | — |
| 2019 | IM-Net[6] | 33.50/0.947/− | — | — | — | — | — | — | — | — |
| 2019 | TOFlow[8] | 33.53/0.967/0.027 | 34.58/0.967/0.027 | — | — | — | 39.08/0.989/− | 34.39/0.974/− | 28.44/0.918/− | 23.39/0.831/− |
| 2019 | Q-VFI[9] | 35.15/0.971/− | 32.54/0.948/− | — | — | 27.73/0.894/− | — | — | — | — |
| 2019 | DAIN[10] | 34.71/0.976/0.022 | 34.99/0.968/0.028 | 26.78/0.807/− | — | 26.12/0.870/− | 39.73/0.990/− | 35.46/0.978/− | 30.17/0.934/− | 25.09/0.858/− |
| 2020 | FeFlow[11] | 35.28/0.976/− | — | 24.00/0.756/− | — | — | — | — | — | — |
| 2020 | DSepConv[12] | 34.73/0.974/0.028 | 35.08/0.969/0.030 | — | — | — | — | — | — | — |
| 2020 | AdaCoF[13] | 35.40/0.971/0.031 | 35.06/0.974/0.033 | 24.13/0.734/− | 32.72/0.881/− | 27.07/0.874/− | 39.80/0.990/0.019 | 35.05/0.975/0.036 | 29.46/0.924/0.075 | 24.30/0.844/0.148 |
| 2020 | CAIN[14] | 34.65/0.973/− | 34.91/0.969/− | 24.50/0.752/− | 24.50/0.752/− | 26.46/0.856/− | 39.78/0.990/− | 35.49/0.977/− | 29.86/0.929/− | 24.69/0.850/− |
| 2020 | FISR[15] | — | — | — | — | — | — | — | — | — |
| 2020 | BMBC[16] | 35.01/0.976/− | 32.61/0.955/0.032 | 22.86/0.727/− | 31.27/0.880/− | 26.42/0.868/− | 39.89/0.990/0.018 | 35.31/0.977/0.034 | 29.32/0.927/0.075 | 23.92/0.843/0.152 |
| 2020 | SoftSplat[17] | 35.48/0.964/0.013 | 35.10/0.948/0.022 | 25.48/0.725/− | — | 27.42/0.878/− | — | — | — | — |
| 2021 | EDSC[18] | 34.84/0.975/0.026 | 35.13/0.968/0.029 | — | — | 24.54/0.768/0.205 | — | — | — | — |
| 2021 | CDFI[19] | 35.17/0.964/0.010 | 35.21/0.950/0.015 | 24.49/0.742/− | 33.01/0.872/− | — | 40.11/0.990/0.013 | 35.50/0.978/0.024 | 29.74/0.928/0.056 | 24.54/0.847/0.121 |
| 2021 | XVFI[20] | 35.07/0.968/− | 32.65/0.968/0.033 | 30.12/0.870/− | 34.06/0.895/− | — | 39.55/0.989/0.020 | 35.06/0.976/0.037 | 29.51/0.927/0.075 | 24.43/0.848/0.143 |
| 2021 | ABME[21] | 36.18/0.981/− | 32.05/0.967/0.058 | 30.16/0.879/− | 33.81/0.903/− | — | 39.69/0.990/0.022 | 35.28/0.977/0.042 | 29.64/0.929/0.092 | 24.54/0.853/0.182 |
| 2022 | RIFE[24] | 35.61/0.978/0.020 | 35.28/0.969/− | 24.67/0.797/− | — | 25.89/0.803/0.134 | 40.06/0.991/− | 35.75/0.979/− | 30.10/0.933/− | 24.84/0.853/− |
| 2022 | VFIT[22] | 36.96/0.978/− | 33.44/0.971/− | — | — | 28.09/0.888/− | — | — | — | — |
| 2022 | IFRNet[25] | 36.20/0.981/− | 35.42/0.970/0.031 | 30.46/−/− | — | — | 40.10/0.991/0.017 | 36.12/0.980/0.029 | 30.63/0.937/0.058 | 25.27/0.861/0.128 |
| 2022 | FILM[26] | 35.87/0.968/− | 35.16/0.949/− | — | — | — | — | — | — | — |
| 2022 | M2M[23] | 35.40/0.978/− | 35.17/0.970/− | 30.81/0.912/− | 34.46/0.925/− | — | 39.66/0.991/− | 35.74/0.980/− | 30.32/0.936/− | 25.07/0.860/− |
| 2022 | VFIFormer[27] | 36.50/0.982/0.021 | 35.43/0.970/0.034 | 24.58/0.805/− | 33.69/0.925/− | — | 40.13/0.991/0.018 | 36.09/0.980/0.033 | 30.67/0.938/0.069 | 25.43/0.864/0.146 |
| 2023 | FLAVR[28] | 36.25/0.975/− | 33.31/0.971/− | — | — | 27.43/0.874/− | — | — | — | — |
| 2023 | AMT[30] | 36.53/0.982/0.021 | 35.45/0.970/− | — | — | — | 39.88/0.991/− | 36.12/0.981/− | 30.78/0.939/− | 25.43/0.865/− |
| 2023 | EMA-VFI[31] | 36.64/0.982/0.026 | 35.48/0.970/− | 31.46/−/− | — | 37.61/0.846/0.203 | 39.98/0.991/− | 36.09/0.980/− | 30.94/0.939/− | 25.69/0.866/− |
| 2023 | BiFormer[32] | — | — | 31.32/0.921/− | 34.48/0.927/− | — | — | — | — | — |
| 2023 | UPR-Net[29] | 36.42/0.982/− | 35.47/0.970/− | 30.50/0.905/− | — | — | 40.44/0.991/− | 36.29/0.980/− | 30.86/0.938/− | 25.63/0.864/− |
| 2024 | LDMVFI[34] | — | 32.16/0.964/0.026 | — | — | — | 38.89/0.988/0.013 | 33.97/0.971/0.027 | 28.14/0.911/0.068 | 23.34/0.827/0.139 |
| 2024 | SGM[39] | — | — | 29.91/0.897/− | 29.25/0.818/− | — | 40.15/0.991/− | 36.05/0.980/− | 28.88/0.922/− | 23.62/0.838/− |
| 2024 | PerVFI[37] | 33.89/0.953/0.018 | — | — | — | 26.23/0.808/0.114 | — | — | — | — |
| 2024 | IQ-VFI[38] | 36.60/0.982/− | 35.48/0.970/− | — | — | — | 40.24/0.991/− | 36.24/0.980/− | 30.83/0.938/− | 25.45/0.863/− |
| 2024 | SwinCS-VFIT[35] | 37.13/0.978/− | 33.36/0.971/− | — | — | 28.28/0.891/− | — | — | — | — |
| 2024 | VFIMamba[36] | 36.64/0.982/− | 35.45/0.970/− | 32.15/0.925/− | 34.62/0.906/− | — | 40.51/0.991/− | 36.40/0.981/− | 30.99/0.940/− | 25.79/0.868/− |
| 2024 | MSEConv[33] | — | 35.10/0.966/− | — | — | — | — | — | — | — |

Table 3 Datasets commonly used for deep-learning-based video frame interpolation
| Dataset | Year released | # Videos | Resolution | Common metrics |
|---|---|---|---|---|
| Xiph | 1994 | 8 | 4096×2160 | PSNR, SSIM, LPIPS |
| Middlebury[61] | 2011 | 24 | 640×480 | IE |
| UCF101[64] | 2012 | 13,320 | 256×256 | PSNR, SSIM, LPIPS |
| DAVIS[59] | 2017 | 90 | 4096×2160 | PSNR, SSIM, LPIPS |
| GOPRO[62] | 2017 | 33 | 1280×720 | PSNR, SSIM, IE |
| Adobe240[63] | 2017 | 71 | 1280×720 | PSNR, SSIM, IE |
| Vimeo-90K[8] | 2019 | 4,278 | 448×256 | PSNR, SSIM, LPIPS |
| HD[7] | 2019 | 7 | 1280×720 | PSNR |
| SNU-FILM[14] | 2020 | 31 | 1280×720 | PSNR, SSIM, LPIPS |
| X4K1000FPS (Test)[20] | 2021 | 15 | 4096×2160 | PSNR, SSIM, LPIPS |
| SportsSloMo[65] | 2024 | 8,498 | 1280×720 | PSNR, SSIM, IE |
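As a concrete reference for the loss and fidelity measures that recur in the tables above, the following is a minimal pure-Python sketch of the Charbonnier loss (the most common training loss in Table 1) and PSNR (the primary metric in Tables 2 and 3), written over flat pixel sequences; the function names and the `eps`/`max_val` defaults are illustrative, not taken from any cited work.

```python
import math

def charbonnier(pred, target, eps=1e-3):
    # Charbonnier loss: a smooth, robust variant of the L1 loss,
    # sqrt((x - y)^2 + eps^2) averaged over pixels. Widely used as a
    # training objective by methods in Table 1 (e.g. MEMC-Net, DAIN, IFRNet).
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, max_val=255.0):
    # Peak signal-to-noise ratio in dB (higher is better), computed
    # from the mean squared error against the ground-truth frame.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)
```

SSIM and LPIPS, the other two metrics reported in Table 2, require windowed statistics and a pretrained network respectively, so they are not reproduced here.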
[1] Niklaus S, Mai L, Liu F. Video frame interpolation via adaptive convolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2270−2279
[2] Niklaus S, Mai L, Liu F. Video frame interpolation via adaptive separable convolution. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 261−270
[3] Meyer S, Djelouah A, McWilliams B, Sorkine-Hornung A, Gross M, Schroers C. PhaseNet for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 498−507
[4] Jiang H Z, Sun D Q, Jampani V, Yang M H, Learned-Miller E, Kautz J. Super SloMo: High quality estimation of multiple intermediate frames for video interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 9000−9008
[5] Niklaus S, Liu F. Context-aware synthesis for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 1701−1710
[6] Peleg T, Szekely P, Sabo D, Sendik O. IM-Net for high resolution video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 2393−2402
[7] Bao W B, Lai W S, Zhang X Y, Gao Z Y, Yang M H. MEMC-Net: Motion estimation and motion compensation driven neural network for video interpolation and enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(3): 933−948 doi: 10.1109/TPAMI.2019.2941941
[8] Xue T F, Chen B A, Wu J J, Wei D L, Freeman W T. Video enhancement with task-oriented flow. International Journal of Computer Vision, 2019, 127(8): 1106−1125 doi: 10.1007/s11263-018-01144-2
[9] Xu X Y, Li S Y, Sun W X, Yin Q, Yang M H. Quadratic video interpolation. In: Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS). Vancouver, Canada: MIT Press, 2019. 1645−1654
[10] Bao W B, Lai W S, Ma C, Zhang X Y, Gao Z Y, Yang M H. Depth-aware video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 3698−3707
[11] Gui S R, Wang C Y, Chen Q H, Tao D C. FeatureFlow: Robust video interpolation via structure-to-texture generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 14001−14010
[12] Cheng X H, Chen Z Z. Video frame interpolation via deformable separable convolution. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 10607−10614
[13] Lee H, Kim T, Chung T Y, Pak D, Ban Y, Lee S. AdaCoF: Adaptive collaboration of flows for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 5315−5324
[14] Choi M, Kim H, Han B, Xu N, Lee K M. Channel attention is all you need for video frame interpolation. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 10663−10671
[15] Kim S Y, Oh J, Kim M. FISR: Deep joint frame interpolation and super-resolution with a multi-scale temporal loss. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11278−11286
[16] Park J, Ko K, Lee C, Kim C S. BMBC: Bilateral motion estimation with bilateral cost volume for video interpolation. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 109−125
[17] Niklaus S, Liu F. Softmax splatting for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 5436−5445
[18] Cheng X H, Chen Z Z. Multiple video frame interpolation via enhanced deformable separable convolution. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(10): 7029−7045
[19] Ding T Y, Liang L M, Zhu Z H, Zharkov I. CDFI: Compression-driven network design for frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 7997−8007
[20] Sim H, Oh J, Kim M. XVFI: Extreme video frame interpolation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 14469−14478
[21] Park J, Lee C, Kim C S. Asymmetric bilateral motion estimation for video frame interpolation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Montreal, Canada: IEEE, 2021. 14519−14528
[22] Shi Z H, Xu X Y, Liu X H, Chen J, Yang M H. Video frame interpolation transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17461−17470
[23] Hu P, Niklaus S, Sclaroff S, Saenko K. Many-to-many splatting for efficient video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 3543−3552
[24] Huang Z W, Zhang T Y, Heng W, Shi B X, Zhou S C. Real-time intermediate flow estimation for video frame interpolation. In: Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 624−642
[25] Kong L T, Jiang B Y, Luo D H, Chu W Q, Huang X M, Tai Y, et al. IFRNet: Intermediate feature refine network for efficient frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 1959−1968
[26] Reda F, Kontkanen J, Tabellion E, Sun D Q, Pantofaru C, Curless B. FILM: Frame interpolation for large motion. In: Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 250−266
[27] Lu L Y, Wu R Z, Lin H J, Lu J B, Jia J Y. Video frame interpolation with transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 3522−3532
[28] Kalluri T, Pathak D, Chandraker M, Tran D. FLAVR: Flow-agnostic video representations for fast frame interpolation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, USA: IEEE, 2023. 2070−2081
[29] Jin X, Wu L H, Chen J, Chen Y X, Koo J, Hahm C H. A unified pyramid recurrent network for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 1578−1587
[30] Li Z, Zhu Z L, Han L H, Hou Q B, Guo C L, Cheng M M. AMT: All-pairs multi-field transforms for efficient frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 9801−9810
[31] Zhang G Z, Zhu Y H, Wang H N, Chen Y X, Wu G S, Wang L M. Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 5682−5692
[32] Park J, Kim J, Kim C S. BiFormer: Learning bilateral motion estimation via bilateral transformer for 4K video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver, Canada: IEEE, 2023. 1568−1577
[33] Ding X L, Huang P, Zhang D Y, Liang W, Li F, Yang G B, et al. MSEConv: A unified warping framework for video frame interpolation. ACM Transactions on Asian and Low-Resource Language Information Processing, to be published, doi: 10.1145/3648364
[34] Danier D, Zhang F, Bull D. LDMVFI: Video frame interpolation with latent diffusion models. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI, 2024. 1472−1480
[35] Shi Chang-Tong, Shan Hong-Tao, Zheng Guang-Yuan, Zhang Yu-Jin, Liu Huai-Yuan, Zong Zhi-Hao. Video frame interpolation method based on improved Visual Transformer. Application Research of Computers, 2024, 41(4): 1252−1257 (in Chinese)
[36] Zhang G Z, Liu C X, Cui Y T, Zhao X T, Ma K, Wang L M. VFIMamba: Video frame interpolation with state space models. In: Proceedings of the 38th Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
[37] Wu G Y, Tao X, Li C L, Wang W Y, Liu X H, Zheng Q Q. Perception-oriented video frame interpolation via asymmetric blending. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 2753−2762
[38] Hu M S, Jiang K, Zhong Z H, Wang Z, Zheng Y Q. IQ-VFI: Implicit quadratic motion estimation for video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 6410−6419
[39] Liu C X, Zhang G Z, Zhao R, Wang L M. Sparse global matching for video frame interpolation with large motion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 19125−19134
[40] Parihar A S, Varshney D, Pandya K, Aggarwal A. A comprehensive survey on video frame interpolation techniques. The Visual Computer, 2022, 38(1): 295−319 doi: 10.1007/s00371-020-02016-y
[41] Dong J, Ota K, Dong M X. Video frame interpolation: A comprehensive survey. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2s): Article No. 78
[42] Meyer S, Wang O, Zimmer H, Grosse M, Sorkine-Hornung A. Phase-based frame interpolation for video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 1410−1418
[43] Prashnani E, Noorkami M, Vaquero D, Sen P. A phase-based approach for animating images using video examples. Computer Graphics Forum, 2017, 36(6): 303−311 doi: 10.1111/cgf.12940
[44] Wadhwa N, Rubinstein M, Durand F, Freeman W T. Phase-based video motion processing. ACM Transactions on Graphics (TOG), 2013, 32(4): Article No. 80
[45] Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 4489−4497
[46] Lin Chuan-Jian, Deng Wei, Tong Tong, Gao Qin-Quan. Blurred video frame interpolation method based on deep voxel flow. Journal of Computer Applications, 2020, 40(3): 819−824 doi: 10.11772/j.issn.1001-9081.2019081474 (in Chinese)
[47] Cho H, Kim T, Jeong Y, Yoon K J. TTA-EVF: Test-time adaptation for event-based video frame interpolation via reliable pixel and sample estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 25701−25711
[48] Dosovitskiy A, Fischer P, Ilg E, Häusser P, Hazirbas C, Golkov V, et al. FlowNet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2758−2766
[49] Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2462−2470
[50] Ranjan A, Black M J. Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 2720−2729
[51] Sun D Q, Yang X D, Liu M Y, Kautz J. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8934−8943
[52] Hui T W, Tang X O, Loy C C. LiteFlowNet: A lightweight convolutional neural network for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 8981−8989
[53] Yang G S, Ramanan D. Volumetric correspondence networks for optical flow. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2019. Article No. 72
[54] Teed Z, Deng J. RAFT: Recurrent all-pairs field transforms for optical flow. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 402−419
[55] Zhao S Y, Zhao L, Zhang Z X, Zhou E Y, Metaxas D. Global matching with overlapping attention for optical flow estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New Orleans, USA: IEEE, 2022. 17592−17601
[56] Zhang Qian, Jiang Feng. Video interpolation based on deep learning. Intelligent Computer and Applications, 2019, 9(4): 252−257 doi: 10.3969/j.issn.2095-2163.2019.04.058 (in Chinese)
[57] Ma Jing-Yuan, Wang Chuan-Ming. Real-time video frame interpolation based on multi-scale optical flow prediction and fusion. Journal of Chinese Computer Systems, 2021, 42(12): 2567−2571 (in Chinese)
[58] Yang Hua, Wang Jiao, Zhang Wei-Jun, Wu Jie-Hong, Gao Li-Jun. Lightweight video frame interpolation algorithm based on optical flow estimation. Journal of Shenyang Aerospace University, 2022, 39(6): 57−64 doi: 10.3969/j.issn.2095-1248.2022.06.008 (in Chinese)
[59] Perazzi F, Pont-Tuset J, McWilliams B, Van Gool L, Gross M, Sorkine-Hornung A. A benchmark dataset and evaluation methodology for video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 724−732
[60] Ding C, Lin M Y, Zhang H J, Liu J Z, Yu L. Video frame interpolation with stereo event and intensity cameras. IEEE Transactions on Multimedia, 2024, 26: 9187−9202 doi: 10.1109/TMM.2024.3387690
[61] Baker S, Scharstein D, Lewis J P, Roth S, Black M J, Szeliski R. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 2011, 92(1): 1−31 doi: 10.1007/s11263-010-0390-2
[62] Nah S, Kim T H, Lee K M. Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 257−265
[63] Su S C, Delbracio M, Wang J, Sapiro G, Heidrich W, Wang O. Deep video deblurring for hand-held cameras. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 237−246
[64] Soomro K, Zamir A R, Shah M. UCF101: A dataset of 101 human action classes from videos in the wild. arXiv preprint arXiv: 1212.0402, 2012.
[65] Chen J B, Jiang H Z. SportsSloMo: A new benchmark and baselines for human-centric video frame interpolation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2024. 6475−6486
[66] Kiefhaber S, Niklaus S, Liu F, Schaub-Meyer S. Benchmarking video frame interpolation. arXiv preprint arXiv: 2403.17128, 2024.