Video-based Person Re-identification Method Based on GAN and Pose Estimation

LIU Yi-Min, JIANG Jian-Guo, QI Mei-Bin, LIU Hao, ZHOU Hua-Jie

Citation: LIU Yi-Min, JIANG Jian-Guo, QI Mei-Bin, LIU Hao, ZHOU Hua-Jie. Video-based Person Re-identification Method Based on GAN and Pose Estimation. ACTA AUTOMATICA SINICA, 2020, 46(3): 576-584. doi: 10.16383/j.aas.c180054

doi: 10.16383/j.aas.c180054
Funds: 

National Natural Science Foundation of China 61371155

National Natural Science Foundation of China 61771180

Anhui Province Key Research and Development Projects 1704d0802183

More Information
    Author Bio:

    LIU Yi-Min  Master student at the School of Computer and Information, Hefei University of Technology. His research interest covers computer vision, image processing, and person re-identification. E-mail: yiminliu@mail.hfut.edu.cn

    JIANG Jian-Guo  Professor at the School of Computer and Information, Hefei University of Technology. His research interest covers digital image analysis and processing, distributed intelligent systems, and digital signal processing (DSP) technology and applications. E-mail: jgjiang@hfut.edu.cn

    LIU Hao  Researcher at Tencent YouTu Laboratory. He received his Ph.D. degree from Hefei University of Technology in 2018. His research interest covers computer vision, person re-identification, and image retrieval. E-mail: hfut.haoliu@gmail.com

    ZHOU Hua-Jie  Master student at the School of Computer and Information, Hefei University of Technology. His research interest covers computer vision, image processing, and person re-identification. E-mail: Zhou hj@mail.hfut.edu.cn

    Corresponding author:

    QI Mei-Bin  Professor at the School of Computer and Information, Hefei University of Technology. His research interest covers video coding, moving target detection and tracking, and DSP technology. Corresponding author of this paper. E-mail: qimeibin@163.com
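  • Abstract: With growing national attention to public safety, surveillance systems with non-overlapping fields of view have been deployed on a large scale. Person re-identification, which matches pedestrian targets across cameras with different fields of view, is therefore particularly important. Deep learning relies on large amounts of data to avoid overfitting, yet current video-based person re-identification datasets are small and the learned features are of a single type. To address both problems, we propose an improved video-based person re-identification method. The method uses a generative adversarial network (GAN) to generate video frame sequences, which increases the number of training samples, and adds pedestrian joint (keypoint) feature information to improve model performance. Experimental results show that the proposed method effectively improves the recognition rate on public datasets: on PRID2011 and iLIDS-VID, Rank 1 reaches 80.2% and 66.3%, respectively.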
    Recommended by Associate Editor LIU Qing-Shan
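The recursive frame generation mentioned in the abstract can be pictured with a minimal sketch. Everything named here is an assumption for illustration only: `mean_of_context` is a hypothetical stand-in for the trained GAN generator, `context_len` is an arbitrary choice, and frames are reduced to short lists of pixel values. It shows only the control flow, in which each generated frame is appended to the tracklet and can feed the next prediction.

```python
from typing import Callable, List

Frame = List[float]  # toy representation: a frame flattened to pixel intensities


def mean_of_context(context: List[Frame]) -> Frame:
    """Hypothetical stand-in for a trained generator: averages the context frames."""
    return [sum(pixels) / len(context) for pixels in zip(*context)]


def extend_tracklet(
    frames: List[Frame],
    n_generated: int,
    predict_next: Callable[[List[Frame]], Frame] = mean_of_context,
    context_len: int = 4,  # assumed context window, not taken from the paper
) -> List[Frame]:
    """Append n_generated frames, each predicted from the latest context_len frames."""
    extended = list(frames)
    for _ in range(n_generated):
        context = extended[-context_len:]
        # Generated frames become part of the context for later predictions.
        extended.append(predict_next(context))
    return extended


tracklet = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
augmented = extend_tracklet(tracklet, n_generated=5)
print(len(augmented))  # 4 original frames plus 5 generated frames
```

Because later frames are predicted partly from earlier generated ones, prediction errors can accumulate as the number of generated frames grows.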
  • Fig. 1  Multi-scale architecture

    Fig. 2  A sequence of video frames generated by GAN (last five frames)

    Fig. 3  Structure of the CPM algorithm

    Fig. 4  Pedestrian keypoint features detected by the CPM algorithm

    Fig. 5  Structure of the network integrating GAN and the pose estimation algorithm

    Table 1  Matching rates of different methods on the PRID2011 dataset (%)

    Method            Rank 1   Rank 5   Rank 10   Rank 20
    AFDA[5]           43.0     72.7     84.6      91.9
    VR[4]             41.8     64.5     77.5      89.4
    STA[7]            64.1     87.3     89.9      92.0
    RFA[9]            64.1     85.8     93.7      98.4
    RNN-CNN[8]        70.0     90.0     95.0      97.0
    ASTPN[11]         77.0     95.0     99.0      99.0
    Proposed method   80.2     96.0     99.1      99.2
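The Rank k columns report the percentage of probe sequences whose correct match appears among the k closest gallery entries. The sketch below shows one common way to compute such rates from a probe-by-gallery distance matrix. The function name, the toy distances, and the assumption that every probe identity appears exactly once in the gallery are illustrative and not taken from the paper.

```python
from typing import Dict, Sequence


def cmc_rank_k(
    distances: Sequence[Sequence[float]],
    probe_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ranks: Sequence[int] = (1, 5, 10, 20),
) -> Dict[int, float]:
    """Percentage of probes whose true match is among the k nearest gallery entries."""
    hits = {k: 0 for k in ranks}
    for row, pid in zip(distances, probe_ids):
        # Gallery indices sorted from the smallest to the largest distance.
        order = sorted(range(len(gallery_ids)), key=lambda j: row[j])
        # 1-based position of the correct identity in the ranked list.
        position = next(i for i, j in enumerate(order) if gallery_ids[j] == pid) + 1
        for k in ranks:
            if position <= k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(probe_ids) for k in ranks}


# Toy example: 3 probes against 3 gallery identities.
toy_distances = [
    [0.1, 0.9, 0.5],  # probe 0: correct match ranked 1st
    [0.6, 0.7, 0.2],  # probe 1: correct match ranked 3rd
    [0.3, 0.8, 0.4],  # probe 2: correct match ranked 2nd
]
scores = cmc_rank_k(toy_distances, [0, 1, 2], [0, 1, 2], ranks=(1, 2, 3))
print(scores)
```

By construction the rates never decrease as k grows, which is why each row of the table increases from left to right.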

    Table 2  Influence of different methods on matching rates on the PRID2011 dataset (%)

    Method            Rank 1   Rank 5   Rank 10   Rank 20
    ASTPN             77.0     95.0     99.0      99.0
    ASTPN+GAN         79.2     95.3     99.2      99.2
    ASTPN+KeyPoint    78.6     95.1     99.1      99.1
    Proposed method   80.2     96.0     99.1      99.2

    Table 3  Matching rates of different methods on the iLIDS-VID dataset (%)

    Method            Rank 1   Rank 5   Rank 10   Rank 20
    AFDA[5]           37.5     62.7     73.0      81.8
    VR[4]             34.5     56.7     67.5      77.5
    STA[7]            44.3     71.7     83.7      91.7
    RFA[9]            49.3     76.8     85.3      90.1
    RNN-CNN[8]        58.0     84.0     91.0      96.0
    ASTPN[11]         62.0     86.0     94.0      98.0
    Proposed method   66.3     88.4     96.2      98.1

    Table 4  Influence of different methods on matching rates on the iLIDS-VID dataset (%)

    Method            Rank 1   Rank 5   Rank 10   Rank 20
    ASTPN             62.0     86.0     94.0      98.0
    ASTPN+GAN         64.4     87.5     95.1      98.0
    ASTPN+KeyPoint    64.5     87.5     96.1      98.1
    Proposed method   66.3     88.4     96.2      98.1
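The ablation results in Tables 2 and 4 are easier to compare as percentage-point gains over the ASTPN baseline. The short calculation below uses only the Rank 1 values reported in those tables; the variable names are illustrative.

```python
# Rank 1 matching rates (%) copied from Tables 2 and 4.
rank1 = {
    "PRID2011": {
        "ASTPN": 77.0,
        "ASTPN+GAN": 79.2,
        "ASTPN+KeyPoint": 78.6,
        "Proposed method": 80.2,
    },
    "iLIDS-VID": {
        "ASTPN": 62.0,
        "ASTPN+GAN": 64.4,
        "ASTPN+KeyPoint": 64.5,
        "Proposed method": 66.3,
    },
}

# Gain of each variant over the ASTPN baseline, in percentage points.
gains = {
    dataset: {
        name: round(score - scores["ASTPN"], 1)
        for name, score in scores.items()
        if name != "ASTPN"
    }
    for dataset, scores in rank1.items()
}

for dataset, per_method in gains.items():
    print(dataset, per_method)
```

On both datasets each component alone improves Rank 1 over the baseline, and the combined method gives the largest gain (3.2 points on PRID2011 and 4.3 points on iLIDS-VID).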

    Table 5  Influence of the number $N$ of frames generated recursively for each pedestrian tracklet on matching rates on the PRID2011 dataset (%)

    $N$    Rank 1   Rank 5   Rank 10   Rank 20
    1      77.4     95.2     99.0      99.0
    3      78.6     95.6     99.1      99.1
    5      80.2     96.0     99.2      99.3
    7      80.1     96.0     99.2      99.2
    9      77.6     95.5     98.7      99.1
    11     76.7     95.4     98.2      99.0

    Table 6  Influence of the number $N$ of frames generated recursively for each pedestrian tracklet on matching rates on the iLIDS-VID dataset (%)

    $N$    Rank 1   Rank 5   Rank 10   Rank 20
    1      61.9     86.7     94.2      97.8
    3      63.1     87.5     95.6      98.0
    5      66.3     88.4     96.2      98.1
    7      66.0     88.4     96.0      98.1
    9      64.8     87.9     94.3      97.9
    11     64.6     86.6     94.1      97.6
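The effect of $N$ in Tables 5 and 6 can be summarized by selecting the value with the highest Rank 1 rate on each dataset. The snippet below does this with the reported numbers only; the dictionary layout and names are illustrative.

```python
# Rank 1 matching rates (%) by number of generated frames N, from Tables 5 and 6.
rank1_by_n = {
    "PRID2011": {1: 77.4, 3: 78.6, 5: 80.2, 7: 80.1, 9: 77.6, 11: 76.7},
    "iLIDS-VID": {1: 61.9, 3: 63.1, 5: 66.3, 7: 66.0, 9: 64.8, 11: 64.6},
}

# For each dataset, pick the N with the highest Rank 1 rate.
best_n = {
    dataset: max(scores, key=scores.get)
    for dataset, scores in rank1_by_n.items()
}
print(best_n)
```

Rank 1 peaks at $N = 5$ on both datasets and declines for larger $N$.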
  • [1] Yi D, Lei Z, Liao S C, Li S Z. Deep metric learning for person re-identification. In: Proceedings of the 22nd International Conference on Pattern Recognition. Stockholm, Sweden: IEEE, 2014. 34-39 http://dl.acm.org/citation.cfm?id=2703838
    [2] Varior R R, Shuai B, Lu J W, Xu D, Wang G. A Siamese long short-term memory architecture for human re-identification. In: Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016. 135-153 doi: 10.1007/978-3-319-46478-7_9
    [3] Liu H, Feng J S, Qi M B, Jiang J G, Yan S C. End-to-end comparative attention networks for person re-identification. IEEE Transactions on Image Processing, 2017, 26(7): 3492-3506 doi: 10.1109/TIP.2017.2700762
    [4] Wang T Q, Gong S G, Zhu X T, Wang S J. Person re-identification by video ranking. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 688-703
    [5] Li Y, Wu Z Y, Karanam S, Radke R J. Multi-shot human re-identification using adaptive fisher discriminant analysis. In: Proceedings of the 2015 British Machine Vision Conference (BMVC). Swansea, UK: BMVA Press, 2015. 73.1-73.12
    [6] Zhu X K, Jing X Y, Wu F, Feng H. Video-based person re-identification by simultaneously learning intra-video and inter-video distance metrics. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI). New York, USA: ACM, 2016. 3552-3558 http://dl.acm.org/citation.cfm?id=3061053.3061117
    [7] Liu K, Ma B P, Zhang W, Huang R. A spatio-temporal appearance representation for video-based pedestrian re-identification. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 3810-3818 http://www.researchgate.net/publication/304409873_A_Spatio-Temporal_Appearance_Representation_for_Viceo-Based_Pedestrian_Re-Identification
    [8] McLaughlin N, del Rincon J M, Miller P. Recurrent convolutional network for video-based person re-identification. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 1325-1334 http://ieeexplore.ieee.org/document/7780517/
    [9] Yan Y C, Ni B B, Song Z C, Ma C, Yan Y, Yang X K. Person re-identification via recurrent feature aggregation. In: Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer, 2016. 701-716 doi: 10.1007/978-3-319-46466-4_42
    [10] Liu H, Jie Z Q, Jayashree K, Qi M B, Jiang J G, Yan S C, et al. Video-based person re-identification with accumulative motion context. IEEE Transactions on Circuits and Systems for Video Technology, 2017, doi: 10.1109/TCSVT.2017.2715499
    [11] Xu S J, Cheng Y, Gu K, Yang Y, Chang S Y, Zhou P. Jointly attentive spatial-temporal pooling networks for video-based person re-identification. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 4743-4752 http://ieeexplore.ieee.org/document/8237769/
    [12] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS). Montreal, Canada: ACM, 2014. 2672-2680
    [13] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proceedings of the 2016 International Conference on Learning Representations (ICLR). Caribe Hilton, San Juan, Puerto Rico, 2016
    [14] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv: 1411.1784, 2014
    [15] Denton E, Chintala S, Szlam A, Fergus R. Deep generative image models using a Laplacian pyramid of adversarial networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS). Montreal, Canada: ACM, 2015. 1486-1494
    [16] Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv: 1701.07875, 2017
    [17] Agarwal A, Triggs B. 3D human pose from silhouettes by relevance vector regression. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). Washington, DC, USA: IEEE, 2004. Ⅱ-882-Ⅱ-888
    [18] Mori G, Malik J. Estimating human body configurations using shape context matching. In: Proceedings of the 7th European Conference on Computer Vision (ECCV). Copenhagen, Denmark: Springer, 2002. 666-680
    [19] Taylor G W, Fergus R, Williams G, Spiro I, Bregler C. Pose-sensitive embedding by nonlinear NCA regression. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS). Vancouver, BC, Canada: ACM, 2010. 2280-2288
    [20] Felzenszwalb P F, Huttenlocher D P. Pictorial structures for object recognition. International Journal of Computer Vision, 2005, 61(1): 55-79 http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1108.4079
    [21] Jain A, Tompson J, Andriluka M, Taylor G, Bregler C. Learning human pose estimation features with convolutional networks. In: Proceedings of the 2014 ICLR. Banff, Canada, 2014. 1-14
    [22] Pfister T, Charles J, Zisserman A. Flowing convnets for human pose estimation in videos. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1913-1921
    [23] Jing X Y, Zhu X K, Wu F, Hu R M, You X G, Wang Y H, et al. Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning. IEEE Transactions on Image Processing, 2017, 26(3): 1363-1378 doi: 10.1109/TIP.2017.2651364
    [24] Zheng Z D, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 3774-3782
    [25] Qian X L, Fu Y W, Xiang T, Wang W X, Qiu J, Wu Y, et al. Pose-normalized image generation for person re-identification. arXiv: 1712.02225, 2018
    [26] Deng W J, Zheng L, Ye Q X, Kang G L, Yang Y, Jiao J B. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018. 994-1003
    [27] Mathieu M, Couprie C, LeCun Y. Deep multi-scale video prediction beyond mean square error. In: Proceedings of the 4th International Conference on Learning Representations (ICLR). Caribe Hilton, San Juan, Puerto Rico, 2016
    [28] Wei S E, Ramakrishna V, Kanade T, Sheikh Y. Convolutional pose machines. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4724-4732
    [29] Ramakrishna V, Munoz D, Hebert M, Bagnell J A, Sheikh Y. Pose machines: articulated pose estimation via inference machines. In: Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland: Springer, 2014. 33-47 http://link.springer.com/openurl?id=doi:10.1007/978-3-319-10605-2_3
    [30] Cao Z, Simon T, Wei S E, Sheikh Y. Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 1302-1310 http://www.researchgate.net/publication/310953055_Realtime_Multi-Person_2D_Pose_Estimation_using_Part_Affinity_Fields
Publication history
  • Received:  2018-01-22
  • Accepted:  2018-07-02
  • Published:  2020-03-30
