AFST: Anchor-free Fully Convolutional Siamese Tracker With Searching Center Point

doi: 10.16383/j.aas.c200469

TAN Jian-Hao^{1, 2}, ZHENG Ying-Shuai^{1, 2}, WANG Yao-Nan^{1, 2}, MA Xiao-Ping^{1, 2}

1. School of Electrical and Information Engineering, Hunan University, Changsha 410082
2. National Engineering Laboratory for Robot Visual Perception and Control Technology, Changsha 410082

Funds: Supported by National Natural Science Foundation of China (61433016)

Author Bio:
TAN Jian-Hao　Professor at the School of Electrical and Information Engineering, Hunan University. His research interest covers intelligent robots, data mining, and pattern recognition. E-mail: tanjianhao96@sina.com

ZHENG Ying-Shuai　Master student at the School of Electrical and Information Engineering, Hunan University. His research interest covers computer vision and machine learning. Corresponding author of this paper. E-mail: zheng_ys415@163.com

WANG Yao-Nan　Professor at the School of Electrical and Information Engineering, Hunan University. His research interest covers intelligent control theory, robot systems, and computer vision. E-mail: yaonan@hnu.edu.cn

MA Xiao-Ping　Master student at the School of Electrical and Information Engineering, Hunan University. Her research interest covers machine vision and UAV control technology. E-mail: maxiaoping@hnu.edu.cn

Publication history
- Received: 2020-06-28
- Accepted: 2020-11-18
- Published online: 2021-01-14
- Issue date: 2021-04-23

Abstract: To address the poor robustness of Siamese trackers, this paper redesigns the classification and regression branches and proposes a highly robust tracking algorithm based on direct per-pixel prediction, the anchor-free fully convolutional Siamese tracker (AFST). Current high-performance trackers, such as SiamRPN, SiamRPN++, and CRPN, perform classification and target box regression on predefined anchor boxes. In contrast, AFST classifies and predicts the target box directly at each pixel. Removing the anchor boxes greatly simplifies the classification and regression tasks and eliminates the mismatch between anchors and targets. During training, image pairs of different instances of the same category are also added, which introduces semantically similar distractors and trains the network more thoroughly. Experiments on three public benchmark datasets, VOT2016, GOT-10k, and OTB2015, show that AFST achieves advanced performance compared with existing tracking algorithms.

Key words: Siamese tracker, prediction on pixels, similar semantic interferers, anchor-free, center score
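The abstract describes per-pixel classification and box prediction but gives no formulas. The following is a minimal sketch of how such anchor-free targets are commonly built, assuming an FCOS-style parameterization: each pixel inside the box regresses its distances to the four box edges, and a centerness value stands in for the center score. The function names, the example box, and the centerness formula are assumptions for illustration, not the authors' definitions.

```python
import math

def pixel_targets(x, y, box):
    """Label and regression target for one pixel.

    box is (x0, y0, x1, y1) in search-image coordinates. The target is the
    distance from the pixel to the left, top, right and bottom box edges.
    """
    x0, y0, x1, y1 = box
    l, t, r, b = x - x0, y - y0, x1 - x, y1 - y
    label = 1 if min(l, t, r, b) > 0 else 0  # positive only strictly inside the box
    return label, (l, t, r, b)

def center_score(l, t, r, b):
    """FCOS-style centerness; an assumed stand-in for the paper's center score.

    Only meaningful for positive pixels, where all four distances are > 0.
    """
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

box = (40.0, 60.0, 120.0, 140.0)  # hypothetical ground-truth box
for x, y in [(80.0, 100.0), (50.0, 100.0), (10.0, 10.0)]:
    label, dists = pixel_targets(x, y, box)
    score = center_score(*dists) if label else 0.0
    print((x, y), label, dists, round(score, 3))
```

In this sketch no anchor shapes or IoU-based anchor assignment are needed: a pixel is positive simply because it lies inside the box, and the score decays from 1 at the box center toward 0 at the edges.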

Fig. 1 Framework of the AFST network

Fig. 2 Multistage feature fusion module

Fig. 3 Regression approach

Fig. 4 Two ways to calculate the center score (CS)

Fig. 5 Search process based on the center score

Fig. 6 Comparison of sampling strategies

Fig. 7 Accuracy-robustness curves under different challenges

Fig. 8 Tracking results on different video sequences

Fig. 9 Comparison of results on OTB2015

Fig. 10 Comparison of success rates on GOT-10k

Fig. 11 Mismatch between the anchor box and the target box

Fig. 12 Distribution of anchor boxes

Table 1 Ablation experiments

No.	Backbone	Subnetwork	Quality score	A	R	EAO	Fusion method	New sampling strategy
1	AlexNet	cls	none	0.530	0.466	0.235	none	none
2	ResNet50	cls	none	0.579	0.386	0.280	none	none
3	ResNet50	cls + reg	none	0.592	0.333	0.345	none	none
4	ResNet50	cls + reg	none	0.602	0.302	0.355	sum	none
5	ResNet50	cls + reg	none	0.607	0.242	0.382	sum	yes
6	ResNet50	cls + reg	CS	0.610	0.224	0.415	concat	yes
7	ResNet50	cls + reg	CS	0.614	0.238	0.397	sum	yes
8	ResNet50	cls + reg	CS	0.624	0.205	0.412	msf	yes

Note: A = accuracy, R = robustness (lower is better), EAO = expected average overlap; cls and reg denote the classification and regression branches; CS = center score.
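The "new sampling strategy" column of Table 1 corresponds to the training change described in the abstract: image pairs of different instances of the same category are added as semantically similar distractors. A minimal sketch of such a sampler follows, assuming a toy dataset index; the function name, the data layout, and the mixing probability `p_distractor` are hypothetical and are not taken from the paper.

```python
import random

def sample_pair(dataset, rng, p_distractor=0.2):
    """Draw one (template, search, label) training pair.

    dataset maps category -> instance id -> list of frames.
    label 1: both frames show the same instance (positive pair).
    label 0: same category, different instances (semantic distractor pair).
    p_distractor is a hypothetical mixing ratio, not a value from the paper.
    """
    category = rng.choice(sorted(dataset))
    instances = dataset[category]
    ids = sorted(instances)
    if len(ids) >= 2 and rng.random() < p_distractor:
        a, b = rng.sample(ids, 2)  # two distinct instances of one category
        return rng.choice(instances[a]), rng.choice(instances[b]), 0
    a = rng.choice(ids)
    return rng.choice(instances[a]), rng.choice(instances[a]), 1

# Toy index: each frame is (category, instance id, frame number).
toy = {
    c: {i: [(c, i, f) for f in range(5)] for i in range(n)}
    for c, n in {"person": 3, "car": 2, "dog": 1}.items()
}

rng = random.Random(0)
for _ in range(5):
    print(sample_pair(toy, rng, p_distractor=0.5))
```

Categories with a single instance (here "dog") can only yield positive pairs, so the effective share of distractor pairs depends on the dataset as well as on `p_distractor`.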

Table 2 Comparison with multiple trackers on VOT2016

	CCOT	ECO	MDNet	DeepSRDCF	SiamRPN	DaSiamRPN	Ours	SiamRPN++
A	0.541	0.550	0.542	0.529	0.560	0.609	0.651	0.642
R	0.238	0.200	0.337	0.326	0.260	0.224	0.149	0.196
EAO	0.331	0.375	0.257	0.276	0.344	0.411	0.485	0.464

Table 3 Failure rates under different challenge factors

	Camera motion	Target loss	Illumination change	Object motion	Occlusion	Scale change	Mean	Weighted mean
CCOT	24	11	2	20	14	13	14.0	16.6
Ours	20	3	2	9	11	7	8.7	10.2
DaSiamRPN	26	4	2	15	16	10	12.2	14.2
SiamRPN	33	13	1	22	20	11	16.7	20.1
SiamRPN++	20	7	1	12	15	9	10.7	12.4
MDNet	33	18	4	21	13	12	17.0	21.1
DeepSRDCF	28	17	3	23	25	11	17.9	20.3

Table 4 Comparison with multiple trackers on GOT-10k

	SiamFC	ECO	MDNet	DeepSRDCF	SiamRPN++	Ours
AO	0.348	0.316	0.299	0.451	0.507	0.529
SR0.75	0.098	0.111	0.099	0.216	0.311	0.370
SR0.5	0.353	0.303	0.303	0.543	0.605	0.617
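Table 4 reports the GOT-10k metrics: average overlap (AO) and success rates at overlap thresholds 0.5 and 0.75 (the SR5 and SR75 rows). As a rough illustration of what these numbers measure, the sketch below computes them per frame on a single toy sequence. The boxes are made up, and the official benchmark aggregates results over many sequences, so this is a simplified sketch rather than the benchmark's evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax1, ay1 = a[0] + a[2], a[1] + a[3]
    bx1, by1 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax1, bx1) - max(a[0], b[0]))
    ih = max(0.0, min(ay1, by1) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def got10k_metrics(preds, gts):
    """Return (AO, SR0.5, SR0.75) over paired predicted and ground-truth boxes."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    n = len(overlaps)
    ao = sum(overlaps) / n                       # mean overlap over all frames
    sr50 = sum(o > 0.5 for o in overlaps) / n    # share of frames with IoU > 0.5
    sr75 = sum(o > 0.75 for o in overlaps) / n   # share of frames with IoU > 0.75
    return ao, sr50, sr75

# Made-up boxes: per-frame IoUs are 1.0, 0.8, 0.6 and 0.0.
gts = [(0, 0, 10, 10)] * 4
preds = [(0, 0, 10, 10), (0, 0, 10, 8), (0, 0, 10, 6), (20, 20, 10, 10)]
print(got10k_metrics(preds, gts))
```

SR0.75 is the stricter of the two success rates, which is why it is always the smaller value in each column of Table 4.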
