Fine-grained Image Classification by Integrating Object Localization and Heterogeneous Local Interactive Learning

Chen Quan, Chen Fei, Wang Yan-Gen, Cheng Hang, Wang Mei-Qing

Citation: Chen Quan, Chen Fei, Wang Yan-Gen, Cheng Hang, Wang Mei-Qing. Fine-grained image classification by integrating object localization and heterogeneous local interactive learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c230507


doi: 10.16383/j.aas.c230507
Funds: Supported by the National Natural Science Foundation of China (61771141, 62172098) and the Natural Science Foundation of Fujian Province (2021J01620)

Author biographies:

    Chen Quan  Master's student at the College of Computer and Data Science, Fuzhou University. His main research interest is computer vision. E-mail: justchenquan@gmail.com

    Chen Fei  Associate professor at the College of Computer and Data Science, Fuzhou University. His research interests cover computer vision, machine learning, and graph signal processing. Corresponding author of this paper. E-mail: chenfei314@fzu.edu.cn

    Wang Yan-Gen  Master's student at the College of Computer and Data Science, Fuzhou University. His main research interest is computer vision. E-mail: lCRZakHCfh237@hotmail.com

    Cheng Hang  Professor at the School of Mathematics and Statistics, Fuzhou University. His research interests cover machine learning and multimedia information security. E-mail: hcheng@fzu.edu.cn

    Wang Mei-Qing  Professor at the School of Mathematics and Statistics, Fuzhou University. Her research interests cover image processing and numerical calculation. E-mail: mqwang@fzu.edu.cn


  • Abstract: Because fine-grained images exhibit small inter-class variance and large intra-class variance, existing classification algorithms focus only on extracting and representing the salient local features of a single image. They ignore the heterogeneous local semantic discriminative information shared across multiple images, struggle to attend to the subtle details that separate categories, and therefore learn features that lack sufficient discriminability. This paper proposes a progressive network that learns information at different granularity levels of an image in a weakly supervised manner. First, an attention accumulation object location module (AAOLM) is constructed to integrate attention information from different training epochs and feature-extraction stages for semantic object localization on a single image. Second, a multi-image heterogeneous local interaction graph module (HLIGM) is designed: it extracts the salient local region features of each image and, guided by the category labels, builds a graph network over the local region features of multiple images, aggregating the local features to strengthen the discriminability of the representations. Finally, knowledge distillation feeds the optimization information produced by HLIGM back to the backbone network, so that the backbone can directly extract highly discriminative features and the computational overhead of graph construction is avoided at test time. Experiments on multiple datasets demonstrate the effectiveness of the proposed method and its ability to improve fine-grained classification accuracy.
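The knowledge-distillation feedback described in the abstract — the backbone learning to match the refined predictions produced by HLIGM — can be sketched with a standard temperature-softened KL loss. The temperature value and the exact loss form are assumptions; the excerpt does not specify them.

```python
import math

def softmax(logits, t=1.0):
    """Softmax with temperature t; t > 1 softens the distribution."""
    m = max(x / t for x in logits)
    exps = [math.exp(x / t - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, t=2.0):
    """KL(teacher || student) between temperature-softened predictions.
    Here the HLIGM branch plays the teacher and the backbone the student."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the backbone already reproduces HLIGM's output and positive otherwise, so minimizing it transfers the graph module's discriminative information into the backbone.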
  • Fig.  1  Illustration of HLIGM. Because objects in fine-grained datasets differ only in subtle local details, features learned from a single image are hard to discriminate. HLIGM explicitly contrasts the local relationships between different images and aggregates information among them to obtain a more discriminative feature representation

    Fig.  2  (a, b) Sampling local object patches directly in the original image with fixed-size anchors: the patches neither separate different parts well nor exclude irrelevant background. (c, d) The effect of sampling patches after the object has been located and zoomed in to a suitable scale

    Fig.  3  The basic framework of the model, which consists of three branches: a global stream, an object stream, and a part stream. The global stream extracts features from the original image, and AAOLM locates the object so that the object image can be cropped. In the object stream, salient patches are selected by score from the candidate patches generated by different anchors to serve as the final sampled local parts. The part stream then extracts more discriminative features by learning the semantic relationships among these local parts with the heterogeneous local interaction graph
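The object stream's score-based patch selection can be sketched minimally as follows. The number of kept patches and the (score, box) representation are assumptions; the caption only states that salient patches are chosen by score from anchor-generated candidates.

```python
def select_patches(candidates, k=4):
    """Object-stream patch selection: keep the k highest-scoring
    candidate patches produced by the anchors.

    candidates: list of (score, box) pairs, where box is (x, y, w, h).
    """
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:k]

# Three hypothetical anchor candidates; the two best are kept, best first.
parts = select_patches([(0.9, (0, 0, 64, 64)),
                        (0.2, (10, 10, 64, 64)),
                        (0.7, (30, 5, 64, 64))], k=2)
```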

    Fig.  4  The structure of HLIGM. The local-part feature vectors of the input images serve as the graph's node representations. A shared linear transformation is first applied to the node representations; each pair of transformed representations is then concatenated and mapped to a real number by a single feedforward layer with a nonlinear activation, giving the edge values that reflect the degree of association between nodes. A softmax over each node's edge values yields a comparable attention matrix, in which dark entries denote attention between local-region nodes of different categories and bright entries denote attention between nodes of the same category; attention regularization losses are computed on both to constrain the attention weights. Finally, each node's neighbors are aggregated by an attention-weighted sum, and the node's original representation is updated through residual learning

    Fig.  5  Visualization of peak response maps produced by CAM and by the proposed AAOLM. (a) Original image. (b) Heat map generated by CAM. (c) Attention map from AAOLM on the output features of the $ Conv_{5b} $ convolution block of ResNet-50. (d) Attention map from AAOLM on the output features of the $ Conv_{5c} $ convolution block of ResNet-50

    Fig.  6  Clustering distributions of the part-stream backbone's output features, visualized with t-SNE, comparing the effect of HLIGM on discriminability on the CUB-200-2011 test set. (a) Output features of the part-stream backbone trained without HLIGM. (b) Output features of the part-stream network after learning the information fed back by HLIGM

    Table  1  Comparative experimental results on the CUB-200-2011 dataset. Backbone indicates the backbone network used as the feature extractor; Anno./DATA indicates whether additional annotation information or auxiliary data is used

    Methods | Backbone | Anno./DATA | Accuracy (%)
    RA-CNN[6] | VGG-19 | — | 85.3
    HSnet[29] | Inception | Anno. | 87.5
    PART[27] | ResNet-50 | — | 89.6
    Mask-CNN[30] | VGG-16 | Anno. | 87.3
    S3N[28] | ResNet-50 | — | 88.5
    NTSN[46] | ResNet-50 | — | 87.5
    ACNet[47] | ResNet-50 | — | 88.1
    GDSMP-Net[48] | ResNet-101 | — | 88.1
    MetaFGNet[31] | ResNet-50 | Data | 87.6
    DCL[37] | ResNet-50 | — | 88.6
    DBT[32] | ResNet-101 | — | 88.1
    GCL[12] | ResNet-50 | — | 88.3
    AENet[19] | ResNet-101 | — | 88.6
    MGE-CNN[17] | ResNet-101 | — | 89.4
    GHRD[20] | ResNet-50 | — | 89.6
    PMG[33] | ResNet-50 | — | 89.9
    Ours | ResNet-50 | — | 90.2
    Ours | ResNet-101 | — | 90.5

    Table  2  Comparison results on the NABirds dataset

    Methods | Backbone | Anno./DATA | Accuracy (%)
    DSTL[49] | Inception-v3 | — | 87.9
    MaxEnt[50] | DenseNet-161 | — | 83.0
    PMG[33] | ResNet-50 | — | 87.9
    MGE-CNN[17] | ResNet-101 | — | 88.6
    CS-Part[51] | ResNet-50 | — | 88.5
    API-NET[52] | ResNet-101 | — | 88.1
    FixSENet-154[35] | SENet-154 | — | 89.2
    GHRD[20] | ResNet-50 | — | 88.0
    Ours | ResNet-50 | — | 89.5
    Ours | ResNet-101 | — | 89.9

    Table  3  Comparison results on the Stanford Cars dataset

    Methods | Backbone | Anno./DATA | Accuracy (%)
    RA-CNN[6] | VGG-19 | — | 92.5
    PSA-CNN[53] | VGG-19 | Anno. | 92.6
    HSnet[29] | Inception | Anno. | 93.9
    ACNet[47] | ResNet-50 | — | 94.6
    S3N[28] | ResNet-50 | — | 94.7
    NTSN[46] | ResNet-50 | — | 93.9
    DCL[37] | ResNet-50 | — | 94.5
    GCL[12] | ResNet-50 | — | 94.0
    AENet[19] | ResNet-101 | — | 93.7
    MGE-CNN[17] | ResNet-101 | — | 93.9
    API-NET[52] | ResNet-101 | — | 94.9
    SDNs[54] | ResNet-101 | — | 94.6
    M2B[55] | ResNet-50 | — | 94.7
    TransFG[36] | ViT-B/16 | — | 94.8
    Ours | ResNet-50 | — | 95.1
    Ours | ResNet-101 | — | 95.5

    Table  4  Comparison results on the FGVC-Aircraft dataset

    Methods | Backbone | Anno./DATA | Accuracy (%)
    DTRG[56] | ResNet-50 | — | 94.1
    MG-CNN[40] | VGG-19 | Anno. | 83.0
    ACNet[47] | ResNet-50 | — | 92.4
    S3N[28] | ResNet-50 | — | 92.8
    NTSN[46] | ResNet-50 | — | 91.4
    DCL[37] | ResNet-50 | — | 93.0
    DBT[32] | ResNet-101 | — | 91.6
    GCL[12] | ResNet-50 | — | 93.2
    AENet[19] | ResNet-101 | — | 93.8
    API-NET[52] | ResNet-101 | — | 93.4
    GHRD[20] | ResNet-50 | — | 94.3
    M2B[55] | ResNet-50 | — | 93.3
    PMG[33] | ResNet-50 | — | 94.1
    Ours | ResNet-50 | — | 94.6
    Ours | ResNet-101 | — | 94.8

    Table  5  Ablation results on the CUB-200-2011 dataset

    Methods | Accuracy (%)
    BL | 84.5
    BL+DP | 85.0
    BL+DP+HLIGM | 88.4
    BL+DP+AAOLM | 89.3
    BL+DP+AAOLM+HLIGM | 90.2

    Table  6  Comparison results of the generalization experiment analysis

    Methods | SUN397 | Aircraft | Car
    SimCLR[43] | 63.9 | — | —
    BYOL[44] | 63.7 | — | —
    WSL[45] | 67.9 | 53.9 | 72.3
    Ours | 66.0 | 94.6 | 95.1
  • [1] Luo Jian-Hao, Wu Jian-Xin. A survey on fine-grained image categorization using deep convolutional features. Acta Automatica Sinica, 2017, 43(8): 1306−1318 (in Chinese)
    [2] Chen Jun-Ying, Chen Ying. Saliency enhanced hierarchical bilinear pooling for fine-grained classification. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(2): 241−249 (in Chinese)
    [3] Liu D C, Zhao L J, Wang Y, Kato J. Learn from each other to classify better: Cross-layer mutual attention learning for fine-grained visual classification. Pattern Recognition, 2023, 140: Article No. 109550 doi: 10.1016/j.patcog.2023.109550
    [4] Song Y, Sebe N, Wang W. On the eigenvalues of global covariance pooling for fine-grained visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3554−3566
    [5] Chou P Y, Kao Y Y, Lin C H. Fine-grained visual classification with high-temperature refinement and background suppression. arXiv: 2303.06442, 2023.
    [6] Fu J L, Zheng H L, Mei T. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 4476−4484
    [7] Nie X, Chai B S, Wang L Y, Liao Q Y, Xu M. Learning enhanced features and inferring twice for fine-grained image classification. Multimedia Tools and Applications, 2023, 82(10): 14799−14813 doi: 10.1007/s11042-022-13619-z
    [8] Zheng S J, Wang G C, Yuan Y J, Huang S Q. Fine-grained image classification based on TinyVit object location and graph convolution network. Journal of Visual Communication and Image Representation, 2024, 100: Article No. 104120 doi: 10.1016/j.jvcir.2024.104120
    [9] Hu X B, Zhu S N, Peng T L. Hierarchical attention vision transformer for fine-grained visual classification. Journal of Visual Communication and Image Representation, 2023, 91: Article No. 103755 doi: 10.1016/j.jvcir.2023.103755
    [10] Zheng H L, Fu J L, Mei T, Luo J B. Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 5219−5227
    [11] He X T, Peng Y X, Zhao J J. Fine-grained discriminative localization via saliency-guided faster R-CNN. In: Proceedings of the 25th ACM International Conference on Multimedia. Mountain View, USA: ACM, 2017. 627−635
    [12] Wang Z H, Wang S J, Li H J, Dou Z, Li J J. Graph-propagation based correlation learning for weakly supervised fine-grained image classification. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 12289−12296
    [13] Wang S J, Wang Z H, Li H J, Ouyang W L. Category-specific semantic coherency learning for fine-grained image recognition. In: Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM, 2020. 174−183
    [14] Li K P, Wu Z Y, Peng K C, Ernst J, Fu Y. Tell me where to look: Guided attention inference network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 9215−9223
    [15] Jiang P T, Han L H, Hou Q B, Cheng M M, Wei Y C. Online attention accumulation for weakly supervised semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10): 7062−7077 doi: 10.1109/TPAMI.2021.3092573
    [16] Liu Y, Zhou L, Zhang P C, Bai X, Gu L, Yu X H, et al. Where to focus: Investigating hierarchical attention relationship for fine-grained visual classification. In: Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer, 2022. 57−73
    [17] Zhang L B, Huang S L, Liu W, Tao D C. Learning a mixture of granularity-specific experts for fine-grained categorization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 8330−8339
    [18] Chen W J, Ran S, Wang T, Cao L H. Learning how to zoom in: Weakly supervised ROI-based-DAM for fine-grained visual classification. In: Proceedings of the 30th International Conference on Artificial Neural Networks. Bratislava, Slovakia: Springer, 2021. 118−130
    [19] Hu Y T, Liu X H, Zhang B C, Han J G, Cao X B. Alignment enhancement network for fine-grained visual categorization. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(1s): Article No. 12
    [20] Zhao Y F, Yan K, Huang F Y, Li J. Graph-based high-order relation discovery for fine-grained recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 15074−15083
    [21] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 770−778
    [22] Zhou B L, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 2921−2929
    [23] Wei X S, Luo J H, Wu J X, Zhou Z H. Selective convolutional descriptor aggregation for fine-grained image retrieval. IEEE Transactions on Image Processing, 2017, 26(6): 2868−2881 doi: 10.1109/TIP.2017.2688133
    [24] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640−651 doi: 10.1109/TPAMI.2016.2572683
    [25] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137−1149 doi: 10.1109/TPAMI.2016.2577031
    [26] Wah C, Branson S, Welinder P, et al. The Caltech-UCSD Birds-200-2011 Dataset, Technical Report CNS-TR-2011-001, California Institute of Technology, Pasadena, USA, 2011.
    [27] Zhao Y F, Li J, Chen X W, Tian Y H. Part-guided relational transformers for fine-grained visual recognition. IEEE Transactions on Image Processing, 2021, 30: 9470−9481 doi: 10.1109/TIP.2021.3126490
    [28] Ding Y, Zhou Y Z, Zhu Y, Ye Q X, Jiao J B. Selective sparse sampling for fine-grained image recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 6598−6607
    [29] Lam M, Mahasseni B, Todorovic S. Fine-grained recognition as HSnet search for informative image parts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 6497−6506
    [30] Wei X S, Xie C W, Wu J X, Shen C H. Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recognition, 2018, 76: 704−714 doi: 10.1016/j.patcog.2017.10.002
    [31] Zhang Y B, Tang H, Jia K. Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 241−256
    [32] Zheng H L, Fu J L, Zha Z J, Luo J B. Learning deep bilinear transformation for fine-grained image representation. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: ACM, 2019. Article No. 385
    [33] Du R Y, Xie J Y, Ma Z Y, Chang D L, Song Y Z, Guo J. Progressive learning of category-consistent multi-granularity features for fine-grained visual classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 9521−9535 doi: 10.1109/TPAMI.2021.3126668
    [34] Van Horn G, Branson S, Farrell R, Haber S, Barry J, Ipeirotis P, et al. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 595−604
    [35] Touvron H, Vedaldi A, Douze M, Jegou H. Fixing the train-test resolution discrepancy. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: ACM, 2019. Article No. 741
    [36] Krause J, Stark M, Deng J, Li F F. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops. Sydney, Australia: IEEE, 2013. 554−561
    [37] Chen Y, Bai Y L, Zhang W, Mei T. Destruction and construction learning for fine-grained image recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5152−5161
    [38] He J, Chen J N, Liu S, Kortylewski A, Yang C, Bai Y T, et al. TransFG: A transformer architecture for fine-grained recognition. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI, 2022. 852−860
    [39] Maji S, Rahtu E, Kannala J, Blaschko M, Vedaldi A. Fine-grained visual classification of aircraft. arXiv: 1306.5151, 2013.
    [40] Wang D Q, Shen Z Q, Shao J, et al. Multiple granularity descriptors for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 2399−2406
    [41] Van Der Maaten L. Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research, 2014, 15(1): 3221−3245
    [42] Xiao J X, Hays J, Ehinger K A, Oliva A, Torralba A. Sun database: Large-scale scene recognition from abbey to zoo. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, USA: IEEE, 2010. 3485−3492
    [43] Chen T, Kornblith S, Norouzi M, Hinton G. A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: ACM, 2020. Article No. 149
    [44] Grill J B, Strub F, Altché F, Tallec C, Richemond P H, Buchatskaya E, et al. Bootstrap your own latent: A new approach to self-supervised learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: ACM, 2020. Article No. 1786
    [45] Mahajan D, Girshick R, Ramanathan V, He K M, Paluri M, Li Y X, et al. Exploring the limits of weakly supervised pretraining. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 185−201
    [46] Yang Z, Luo T G, Wang D, Hu Z Q, Gao J, Wang L W. Learning to navigate for fine-grained classification. In: Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 420−435
    [47] Ji R Y, Wen L Y, Zhang L B, Du D W, Wu Y J, Zhao C, et al. Attention convolutional binary neural tree for fine-grained visual categorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 10465−10474
    [48] Ke X, Cai Y H, Chen B T, Liu H, Guo W Z. Granularity-aware distillation and structure modeling region proposal network for fine-grained image classification. Pattern Recognition, 2023, 137: Article No. 109305 doi: 10.1016/j.patcog.2023.109305
    [49] Cui Y, Song Y, Sun C, Howard A, Belongie S. Large scale fine-grained categorization and domain-specific transfer learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE, 2018. 4109−4118
    [50] Dubey A, Gupta O, Raskar R, Naik N. Maximum entropy fine-grained classification. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: ACM, 2018. 635−645
    [51] Korsch D, Bodesheim P, Denzler J. Classification-specific parts for improving fine-grained visual categorization. In: Proceedings of the 41st DAGM German Conference on Pattern Recognition. Dortmund, Germany: Springer, 2019. 62−75
    [52] Zhuang P Q, Wang Y L, Qiao Y. Learning attentive pairwise interaction for fine-grained classification. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 13130−13137
    [53] Krause J, Jin H L, Yang J C, Li F F. Fine-grained recognition without part annotations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE, 2015. 5546−5555
    [54] Zhang L B, Huang S L, Liu W. Learning sequentially diversified representations for fine-grained categorization. Pattern Recognition, 2022, 121: Article No. 108219 doi: 10.1016/j.patcog.2021.108219
    [55] Liang Y Z, Zhu L C, Wang X H, Yang Y. Penalizing the hard example but not too much: A strong baseline for fine-grained visual classification. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(5): 7048−7059 doi: 10.1109/TNNLS.2022.3213563
    [56] Liu K J, Chen K, Jia K. Convolutional fine-grained classification with self-supervised target relation regularization. IEEE Transactions on Image Processing, 2022, 31: 5570−5584 doi: 10.1109/TIP.2022.3197931
Publication history
  • Received: 2023-08-16
  • Accepted: 2024-03-07
  • Published online: 2024-06-14
