
Interpretable Attention Part Model for Person Re-Identification

Zhou Yong, Wang Han-Zheng, Zhao Jia-Qi, Chen Ying, Yao Rui, Chen Si-Lin

Citation: Zhou Yong, Wang Han-Zheng, Zhao Jia-Qi, Chen Ying, Yao Rui, Chen Si-Lin. Interpretable attention part model for person re-identification. Acta Automatica Sinica, 2020, 41(x): 1−13 doi: 10.16383/j.aas.c200493

doi: 10.16383/j.aas.c200493

Funds: Supported by the National Natural Science Foundation of China (61806206, U1610124, 61772530, 61773383), the Natural Science Foundation of Jiangsu Province (BK20180639, BK20171192), and the Six Talent Peaks Project in Jiangsu Province (2015-DZXX-010)

    Author biographies:

    Zhou Yong: Professor at the School of Computer Science and Technology, China University of Mining and Technology. Main research interests: data mining, machine learning, and artificial intelligence. E-mail: yzhou@cumt.edu.cn

    Wang Han-Zheng: Master student at the School of Computer Science and Technology, China University of Mining and Technology. Main research interests: computer vision, image processing, and person re-identification. E-mail: hzwang@cumt.edu.cn

    Zhao Jia-Qi: Associate professor at the School of Computer Science and Technology, China University of Mining and Technology. Main research interests: multi-objective optimization, deep learning, and image processing. E-mail: jiaqizhao88@126.com

    Chen Ying: Ph.D. candidate at the School of Computer Science and Technology, China University of Mining and Technology. Main research interests: computer vision, image processing, and person re-identification. E-mail: cheny@cumt.edu.cn

    Yao Rui: Associate professor at the School of Computer Science and Technology, China University of Mining and Technology. Main research interests: computer vision and machine learning. E-mail: ruiyao@cumt.edu.cn

    Chen Si-Lin: Master student at the School of Computer Science and Technology, China University of Mining and Technology. Main research interests: computer vision, image processing, and object detection. E-mail: silin.chen@cumt.edu.cn

  • Abstract: Most person re-identification methods use the attention mechanism only as an auxiliary tool for extracting salient features, and lack a quantitative study of how strongly the network attends to different regions of a pedestrian image. Motivated by this, this paper proposes an interpretable attention part model (IAPM). The model has three advantages: 1) part features are extracted with attention masks, which solves the part misalignment problem; 2) an interpretable weight generation module (IWM) is designed to generate interpretable weights according to the saliency of each part; 3) a salient part triplet loss (SPTL) is proposed for training the IWM, which improves both recognition accuracy and interpretability. Experiments on three mainstream datasets verify that the proposed method outperforms existing person re-identification methods. Finally, a subjective human evaluation compares the relative magnitudes of the interpretable weights generated by the IWM against scores from intuitive human judgment, demonstrating that the method has good interpretability.
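    This page carries no implementation, so the following is a minimal PyTorch sketch (PyTorch is the platform listed in Table 1) of the two components named in the abstract: the IWM, which maps each part feature to a scalar saliency weight, and an SPTL-style triplet loss in which those weights modulate the per-part distances. The two-layer scoring head, the softmax normalization, and the exact way the weights enter the distance are assumptions for illustration, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IWM(nn.Module):
    """Sketch of the interpretable weight generation module:
    one scalar saliency weight per body part."""
    def __init__(self, feat_dim: int):
        super().__init__()
        # Assumed two-layer scoring head; the paper's architecture may differ.
        self.score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 4, 1),
        )

    def forward(self, part_feats: torch.Tensor) -> torch.Tensor:
        # part_feats: (batch, num_parts, feat_dim)
        scores = self.score(part_feats).squeeze(-1)  # (batch, num_parts)
        # Softmax so the weights of one image are comparable across parts.
        return F.softmax(scores, dim=1)

def salient_part_triplet_loss(anchor, positive, negative, weights, alpha=1.2):
    """Sketch of a salient-part triplet loss: per-part Euclidean distances
    are weighted by the IWM output before the usual margin hinge.
    alpha = 1.2 follows Table 2; the weighting scheme is an assumption."""
    d_ap = (weights * (anchor - positive).pow(2).sum(-1).sqrt()).sum(1)
    d_an = (weights * (anchor - negative).pow(2).sum(-1).sqrt()).sum(1)
    return F.relu(d_ap - d_an + alpha).mean()
```

    Here anchor, positive, and negative are (batch, num_parts, feat_dim) part-feature tensors; training the scoring head through this loss is what ties the learned weights to part saliency.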
  • Fig. 1 Overall structure of IAPM

    Fig. 2 Schematic diagram of horizontal split

    Fig. 3 Pseudo-labels used by the PS module [16]

    Fig. 4 Structure of the IWM (interpretable weight generation module)

    Fig. 5 Distance curves of negative sample pairs

    Fig. 6 Distance curves of positive sample pairs

    Fig. 7 SPTL loss curve

    Fig. 8 Display of interpretable weights

    Fig. 9 Subjective evaluation results

    Fig. 10 Comparison of interpretable weights and subjective evaluation results


    Table 1 Experimental environment

    Hardware/software   Configuration
    Platform            PyTorch
    GPU                 NVIDIA Tesla P100
    RAM                 40 GB
    GPU memory          16 GB

    Table 2 Experimental parameters

    Parameter                            Value
    Input image size                     384×128
    Training iterations                  100
    Optimizer                            SGD
    Momentum                             0.9
    Weight decay                         5×10−4
    Batch size                           128
    Salient part triplet loss $\alpha $  1.2
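    The sketch below shows how the Table 1/Table 2 settings translate into PyTorch. The learning rate and its schedule are not listed in this section, so the value below is a placeholder, and the model is a stand-in module rather than the IAPM network.

```python
import torch
import torchvision.transforms as T

# Input preprocessing: 384x128 images (Table 2).
transform = T.Compose([
    T.Resize((384, 128)),
    T.ToTensor(),
])

model = torch.nn.Conv2d(3, 64, 3)  # stand-in for the IAPM network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # placeholder: the learning rate is not given here
    momentum=0.9,       # momentum factor (Table 2)
    weight_decay=5e-4,  # weight decay coefficient (Table 2)
)
# Remaining settings from Table 2: batch size 128, 100 training
# iterations, SPTL margin alpha = 1.2.
```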

    Table 3 Performance comparison with EANet (Rank-1 % (mAP %))

    Method          Market-1501   DukeMTMC-reID   CUHK03
    PAP-6P          94.3(84.3)    85.6(72.4)      68.1(62.4)
    PAP             94.5(84.9)    86.1(73.3)      72.0(66.2)
    PAP-S-PS        94.6(85.6)    87.5(74.6)      72.5(66.8)
    IAPM-6P (ours)  95.0(85.3)    86.9(74.3)      72.5(65.2)
    IAPM-9P (ours)  95.1(86.0)    87.9(75.6)      72.6(67.4)
    IAPM (ours)     95.2(86.3)    88.0(75.7)      72.6(67.2)

    Table 4 Comparison results with other methods (Rank-1 % (mAP %); "−" marks results not reported by the original papers)

    Method              Market-1501   DukeMTMC-reID   CUHK03
    Verif-Identify[37]  79.5(59.9)    68.9(49.3)      −
    MSCAN[29]           80.8(57.5)    −               −
    MGCAM[12]           83.8(74.3)    −               50.1(50.2)
    Part-Aligned[38]    91.7(79.6)    84.4(69.3)      −
    SPReID[39]          92.5(81.3)    84.4(71.0)      −
    AlignedReID[40]     91.8(79.3)    −               −
    Deep-Person[41]     92.3(79.6)    80.9(64.8)      −
    PCB[7]              85.3(68.5)    73.2(52.8)      43.8(38.9)
    PCB+RPP[7]          93.8(81.6)    83.3(69.2)      63.7(57.5)
    HA-CNN[42]          91.2(75.7)    80.5(63.8)      44.4(41.0)
    Mancs[43]           93.1(82.3)    84.9(71.8)      69.0(63.9)
    P2-Net[44]          95.1(85.6)    86.5(73.1)      74.9(68.9)
    M3+ResNet50[45]     95.4(82.6)    84.7(68.5)      66.9(60.7)
    IAPM (ours)         95.2(86.3)    88.0(75.7)      72.6(67.2)
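    Tables 3 and 4 report Rank-1 and mAP. For readers unfamiliar with the protocol, the sketch below shows the conventional way these two numbers are computed from a query-gallery distance matrix; the function name is illustrative, and the standard same-camera/same-identity filtering of the gallery is omitted for brevity.

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """dist: (num_query, num_gallery) distance matrix.
    Returns (Rank-1 accuracy, mean average precision)."""
    rank1_hits, average_precisions = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                  # gallery sorted by distance
        matches = gallery_ids[order] == query_ids[i]
        rank1_hits.append(matches[0])                # correct at rank 1?
        # AP: mean precision at each position where a true match occurs.
        positions = np.where(matches)[0]
        precisions = [(k + 1) / (p + 1) for k, p in enumerate(positions)]
        average_precisions.append(np.mean(precisions) if precisions else 0.0)
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))
```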

    Table 5 Ablation experiment 1

    Model                               Rank-1 (%)   mAP (%)
    Baseline model                      92.4         80.5
    Baseline + IWM + SPTL               95.0         86.1
    Baseline + IWM + SPTL + center loss 95.2         86.3

    Table 6 Ablation experiment 2

    Number of body parts   Rank-1 (%)   mAP (%)
    6                      95.0         85.3
    7                      95.2         86.3
    9                      95.1         86.0
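    Table 6 varies the number of body parts. As context for what that parameter controls, the sketch below shows the common PCB-style horizontal split illustrated in Fig. 2, which pools the backbone feature map into P equal-height stripes; in IAPM itself part features are extracted with attention masks rather than this rigid split, so this is background, not the paper's method.

```python
import torch
import torch.nn.functional as F

def horizontal_split(feat_map: torch.Tensor, num_parts: int) -> torch.Tensor:
    """feat_map: (batch, channels, H, W) backbone output.
    Returns (batch, num_parts, channels) stripe feature vectors."""
    pooled = F.adaptive_avg_pool2d(feat_map, (num_parts, 1))  # (B, C, P, 1)
    return pooled.squeeze(-1).permute(0, 2, 1)                # (B, P, C)

# Example: split a ResNet-style feature map into the 7 parts that
# performed best in Table 6.
parts = horizontal_split(torch.randn(2, 2048, 24, 8), num_parts=7)
assert parts.shape == (2, 7, 2048)
```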

    Table 7 Ablation experiment 3

    $\alpha $   Rank-1 (%)   mAP (%)
    0.1         94.4         85.2
    0.5         94.5         85.3
    0.8         94.8         85.7
    1.0         94.7         85.6
    1.2         95.2         86.3
    1.5         94.6         85.6
    2.0         94.7         85.3
    5.0         93.5         83.5
    10.0        93.3         81.0

    Table 8 Ablation experiment 4

    $\lambda $   Rank-1 (%)   mAP (%)
    0.2          94.4         85.4
    0.4          94.8         85.4
    0.6          94.4         85.1
    0.8          94.8         85.7
    1.0          95.2         86.3
  • [1] Yi D, Lei Z, Liao S C, Li S Z. Deep metric learning for person re-identification. In: Proceedings of the 22nd IEEE International Conference on Pattern Recognition. Stockholm, Sweden: IEEE, 2014.34−39.
    [2] Liao S C, Hu Y, Zhu X Y, Li S Z. Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015.2197−2206.
    [3] Luo Hao, Jiang Wei, Fan Xing, Zhang Si-Peng. A survey on deep learning based person re-identification. Acta Automatica Sinica, 2019, 45(11): 2032−2049 (in Chinese)
    [4] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533−536 doi: 10.1038/323533a0
    [5] Wu Fei, Liao Bin-Bing, Han Ya-Hong. Interpretability for deep learning. Aero Weaponry, 2019, 26(1): 43−50 (in Chinese)
    [6] Chen W H, Chen X T, Zhang J G, Huang K Q. A multi-task deep network for person re-identification. In: Proceedings of the Thirty-First Conference on Artificial Intelligence. San Francisco, USA: AAAI, 2017.3988−3994.
    [7] Sun Y G, Zheng L, Yang Y, Tian Q, Wang S J. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: Springer, 2018.480−496.
    [8] Zhou S P, Wang J J, Wang J Y, Gong Y H, Zheng N N. Point to set similarity based deep feature learning for person re-identification. In: Proceedings of the 2017 Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017.5028−5037.
    [9] Sarfraz M S, Schumann A, Eberle A, Stiefelhagen R. A pose-sensitive embedding for person re-identification with expanded cross neighborhood re-ranking. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.420−429.
    [10] Zhao L M, Li X, Zhuang Y T, Wang J D. Deeply-learned part-aligned representations for person re-identification. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017.3239−3248.
    [11] Zhou S P, Wang J J, Meng D Y, Liang Y D, Gong Y H, Zheng N N. Discriminative feature learning with foreground attention for person re-identification. IEEE Transactions on Image Processing, 2019, 28(9): 4671−4684
    [12] Song C F, Huang Y, Ouyang W L, Wang L. Mask-guided contrastive attention model for person re-identification. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.1179−1188.
    [13] Xu J, Zhao R, Zhu F, Wang H M, Ouyang W L. Attention-aware compositional network for person re-identification. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.2119−2128.
    [14] Tay C P, Roy S, Yap K H. AANet: Attribute attention network for person re-identifications. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019.7134−7143.
    [15] Zhou S P, Wang F, Huang Z Y, Wang J J. Discriminative feature learning with consistent attention regularization for person re-identification. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019.8039−8048.
    [16] Huang H J, Yang W J, Chen X T, Zhao X, Huang K Q, Lin J B, et al. EANet: Enhancing alignment for cross-domain person re-identification [Online], available: http://arxiv.org/abs/1812.11369, October 21, 2020.
    [17] Bach S, Binder A, Montavon G, Klauschen F, Muller K, Samek W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One, 2015, 10(7): e0130140 doi: 10.1371/journal.pone.0130140
    [18] Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016.2921−2929.
    [19] Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I J, et al. Intriguing properties of neural networks [Online], available: http://arxiv.org/abs/1312.6199, October 21, 2020.
    [20] Bau D, Zhou B, Khosla A, Oliva A, Torralba A. Network dissection: quantifying interpretability of deep visual representations. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017.3319−3327.
    [21] Dong Y P, Su H, Zhu J, Zhang B. Improving interpretability of deep neural networks with semantic information. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017.975−983.
    [22] Zhang Q S, Wu Y N, Zhu S C. Interpretable convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.8827−8836.
    [23] Zheng L, Yang Y, Hauptmann A G. Person re-identification: past, present and future [Online], available: http://arxiv.org/abs/1610.02984, October 21, 2020.
    [24] Zheng L, Zhang H H, Sun S Y, Chandraker M, Yang Y, Tian Q. Person re-identification in the wild. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017.3346−3355.
    [25] Lin Y T, Zheng L, Zheng Z D, Wu Y, Hu Z L, Yan C G, et al. Improving person re-identification by attribute and identity learning. Pattern Recognition, 2019, 95: 151−161 doi: 10.1016/j.patcog.2019.06.006
    [26] Geng M Y, Wang Y W, Xiang T, Tian Y H. Deep transfer learning for person re-identification [Online], available: http://arxiv.org/abs/1611.05244, October 21, 2020.
    [27] Varior R R, Haloi M, Wang G. Gated siamese convolutional neural network architecture for human re-identification. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016.791−808.
    [28] Hermans A, Beyer L, Leibe B. In defense of the triplet loss for person re-identification [Online], available: http://arxiv.org/abs/1703.07737, October 21, 2020.
    [29] Li D W, Chen X T, Zhang Z, Huang K Q. Learning deep context-aware features over body and latent parts for person re-identification. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE, 2017.7398−7407.
    [30] Fang P F, Zhou J M, Roy S K, Petersson L, Harandi M. Bilinear attention networks for person retrieval. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019.8029−8038.
    [31] Liu H, Feng J S, Qi M B, Jiang J G, Yan S C. End-to-end comparative attention networks for person re-identification. IEEE Transactions on Image Processing, 2017, 26(7): 3492−3506 doi: 10.1109/TIP.2017.2700762
    [32] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735−1780 doi: 10.1162/neco.1997.9.8.1735
    [33] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, Nevada, USA: IEEE, 2016.770−778.
    [34] Zheng L, Shen L, Tian L, Wang S J, Wang J D, Tian Q. Scalable person re-identification: a benchmark. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015.1116−1124.
    [35] Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C. Performance measures and a data set for multi-target, multi-camera tracking. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016.17−35.
    [36] Li W, Zhao R, Xiao T, Wang X G. DeepReID: deep filter pairing neural network for person re-identification. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014.152−159.
    [37] Zheng Z D, Zheng L, Yang Y. A discriminatively learned CNN embedding for person reidentification. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018, 14(1): Article No. 13
    [38] Suh Y, Wang J, Tang S, Mei T, Lee K M. Part-aligned bilinear representations for person re-identification. In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: Springer, 2018.418−437.
    [39] Kalayeh M M, Basaran E, Gökmen M, Kamasak M E, Shah M. Human semantic parsing for person re-identification. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.1062−1071.
    [40] Zhang X, Luo H, Fan X, Xiang W L, Sun Y X, Xiao Q Q, et al. AlignedReID: surpassing human-level performance in person re-identification [Online], available: http://arxiv.org/abs/1711.08184, October 21, 2020.
    [41] Bai X, Yang M K, Huang T T, Dou Z Y, Yu R, Xu Y C. Deep-person: learning discriminative deep features for person re-identification. Pattern Recognition, 2020, 98: 107036 doi: 10.1016/j.patcog.2019.107036
    [42] Li W, Zhu X T, Gong S G. Harmonious attention network for person re-identification. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018.2285−2294.
    [43] Wang C, Zhang Q, Huang C, Liu W Y, Wang X G. Mancs: a multi-task attentional network with curriculum sampling for person re-identification. In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: Springer, 2018.384−400.
    [44] Guo J Y, Yuan Y H, Huang L, Zhang C, Yao J G, Han K. Beyond human parts: dual part-aligned representations for person re-identification. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea (South): IEEE, 2019.3641−3650.
    [45] Zhou J H, Su B, Wu Y. Online joint multi-metric adaptation from frequent sharing-subset mining for person re-identification. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition. Online: IEEE, 2020.2909−2918.
    [46] Wen Y D, Zhang K P, Li Z F, Qiao Y. A discriminative feature learning approach for deep face recognition. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016.499−515.
Publication history
  • Received: 2020-07-03
  • Revised: 2020-08-23
