Projective Dictionary Learning Hashing for Cross-modal Retrieval

YAO Tao, KONG Xiang-Wei, FU Hai-Yan, TIAN Qi

Citation: YAO Tao, KONG Xiang-Wei, FU Hai-Yan, TIAN Qi. Projective Dictionary Learning Hashing for Cross-modal Retrieval. Acta Automatica Sinica, 2018, 44(8): 1475-1485. doi: 10.16383/j.aas.2017.c160433

doi: 10.16383/j.aas.2017.c160433
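Funds:

National Natural Science Foundation of China 71421001

Open Projects Program of National Laboratory of Pattern Recognition 201407349

National Natural Science Foundation of China 61429201

National Natural Science Foundation of China 61172109

National Natural Science Foundation of China 61502073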

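More Information
    Author Bio:

    YAO Tao  Ph. D. candidate at the School of Information and Communication Engineering, Dalian University of Technology. His research interests cover multimedia retrieval, computer vision, and pattern recognition. E-mail: yaotaoedu@mail.dlut.edu.cn

    FU Hai-Yan  Associate professor at the School of Information and Communication Engineering, Dalian University of Technology. She received her Ph. D. degree from Dalian University of Technology in 2014. Her research interests cover image retrieval and computer vision. E-mail: fuhy@dlut.edu.cn

    TIAN Qi  Professor in the Department of Computer Science at the University of Texas at San Antonio, USA. IEEE Fellow. He received his Ph. D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 2002. His research interests cover multimedia information retrieval, pattern recognition, and computer vision. E-mail: qitian@cs.utsa.edu

    Corresponding author:

    KONG Xiang-Wei  Professor at the Department of Data Science and Engineering Management, Zhejiang University. She received her Ph. D. degree in management science and engineering from Dalian University of Technology in 2003. She was a visiting scholar at Purdue University, USA, from 2006 to 2007. Her research interests cover artificial intelligence and business analytics, big data analysis, and cross-media retrieval and security. Corresponding author of this paper. E-mail: kongxiangwei@zju.edu.cn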

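  • Abstract: With the rapid growth of multimodal data on the Web, retrieving data of different modalities from massive collections has become a new challenge. Hashing methods map data into Hamming space, which greatly reduces computational complexity and offers an effective route to cross-modal retrieval at scale. However, the hash codes generated by most existing methods carry no semantic information, which degrades retrieval performance. To address this problem, this paper proposes a cross-modal hashing retrieval algorithm based on projective dictionary learning. First, projective dictionaries are used to learn a shared semantic subspace in which the inter-modal similarity of the data is preserved. Then, an efficient iterative optimization algorithm is proposed to obtain the hash functions; it can be shown, however, that the solution of this problem is not unique. Therefore, this paper proposes learning an orthogonal rotation matrix that minimizes the quantization error, which yields better hash functions. Finally, experimental results on two public datasets show that the proposed algorithm outperforms other existing methods.
    1)  Editorial board member responsible for this paper: ZHU Jun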
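
The last step described in the abstract, learning an orthogonal rotation that minimizes the quantization error, can be illustrated with a small sketch. It assumes the alternating scheme popularized by iterative quantization (ITQ), which may differ from the paper's exact procedure: with the real-valued representations `V` fixed, the codes are the signs of `V R`; with the codes fixed, the best orthogonal `R` follows from an SVD (the orthogonal Procrustes solution). The names `learn_rotation`, `quantization_error`, `V`, and `R0` are hypothetical and are not taken from the paper.

```python
import numpy as np


def quantization_error(V, R):
    """Squared Frobenius error between sign(V R) and V R."""
    Z = V @ R
    B = np.where(Z >= 0, 1.0, -1.0)
    return float(np.sum((B - Z) ** 2))


def learn_rotation(V, n_iter=50, R0=None, seed=0):
    """ITQ-style alternating minimisation of ||B - V R||_F^2 (a sketch).

    V  : (n, k) real-valued representations, assumed zero-centred.
    R0 : optional (k, k) orthogonal starting point (hypothetical parameter).
    Returns the orthogonal rotation R and the +/-1 codes B = sign(V R).
    """
    k = V.shape[1]
    if R0 is None:
        # Random orthogonal start from the QR factorisation of a Gaussian matrix.
        rng = np.random.default_rng(seed)
        R, _ = np.linalg.qr(rng.standard_normal((k, k)))
    else:
        R = np.array(R0, dtype=float)
    for _ in range(n_iter):
        # Fix R, update the codes: take the sign of the rotated data.
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Fix B, update R: orthogonal Procrustes problem solved by an SVD.
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return R, B


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.standard_normal((500, 16))
    V -= V.mean(axis=0)
    R, B = learn_rotation(V, n_iter=30, R0=np.eye(16))
    print("error before rotation:", quantization_error(V, np.eye(16)))
    print("error after rotation: ", quantization_error(V, R))
```

Each of the two updates can only lower the objective, so the error after rotation is never larger than at the starting point. How much it drops depends on the data.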
  • Fig. 1  Convergence analysis of the proposed optimization algorithm

    Fig. 2  PR curves on Wiki dataset with the code length fixed to 16 bits

    Fig. 3  PR curves on Wiki dataset with the code length fixed to 32 bits

    Fig. 4  PR curves on NUS-WIDE dataset with the code length fixed to 16 bits

    Fig. 5  PR curves on NUS-WIDE dataset with the code length fixed to 32 bits

    Table 1  MAP@200 results on Wiki dataset for the tasks of using the image to query texts and vice versa

    Algorithm  Task (code length, bits)        Task (code length, bits)
               8       16      24      32      8       16      24      32
    CCA        0.2047  0.1815  0.1634  0.1694  0.2036  0.1663  0.1527  0.1595
    CVH        0.2038  0.1951  0.1682  0.1674  0.1997  0.1833  0.1703  0.1613
    SCM-O      0.1907  0.1718  0.1673  0.1704  0.1889  0.1669  0.1610  0.1661
    SCM-S      0.2129  0.2353  0.2337  0.2377  0.2037  0.2411  0.2419  0.2507
    CMFH       0.2185  0.2300  0.2377  0.2420  0.2216  0.2333  0.2352  0.2390
    LSSH       0.1895  0.2084  0.2232  0.2094  0.1841  0.2127  0.2308  0.2157
    STMH       0.1907  0.1926  0.2201  0.2321  0.1896  0.2130  0.2260  0.2240
    PDLH-      0.2188  0.2175  0.2385  0.2316  0.2217  0.2162  0.2364  0.2325
    PDLH       0.2196  0.2301  0.2499  0.2384  0.2225  0.2276  0.2423  0.2430

    Table 2  MAP@200 results on NUS-WIDE dataset for the tasks of using the image to query texts and vice versa

    Algorithm  Task (code length, bits)        Task (code length, bits)
               8       16      24      32      8       16      24      32
    CCA        0.3445  0.3413  0.3465  0.3424  0.3722  0.3620  0.3731  0.3562
    CVH        0.3395  0.3435  0.3440  0.3357  0.3676  0.3706  0.3620  0.3481
    SCM-O      0.3687  0.3580  0.3567  0.3501  0.4205  0.4023  0.3866  0.3977
    SCM-S      0.4098  0.4443  0.4413  0.4482  0.4828  0.5012  0.5067  0.5222
    CMFH       0.3374  0.3586  0.3778  0.3803  0.3843  0.3984  0.4093  0.4120
    LSSH       0.3465  0.3716  0.3770  0.4073  0.3686  0.3736  0.3841  0.4184
    STMH       0.3723  0.3922  0.4067  0.4156  0.3979  0.4114  0.4235  0.4322
    PDLH--     0.4010  0.4423  0.4478  0.4505  0.4362  0.5003  0.5078  0.5128
    PDLH       0.4137  0.4456  0.4530  0.4714  0.4530  0.5034  0.5135  0.5172

    Table 3  Training time (s) and MAP results with different sizes of training dataset

    Training set size  Training time (s)  Text-to-image MAP  Image-to-text MAP
    10 000             30.25              0.4839             0.4603
    20 000             58.75              0.5466             0.4973
    50 000             750.77             0.5643             0.5520
    100 000            325.90             0.5719             0.5584
    150 000            504.59             0.6028             0.5603
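
The tables above report MAP@200, and the abstract notes that retrieval takes place in Hamming space. The sketch below shows one common way such a score is computed: rank the database by Hamming distance to each query code, then average the precision at every relevant position within the top 200. It rests on assumptions not stated on this page: codes are stored as +/-1 vectors, relevance means an identical single class label, and the average precision is normalised by the number of relevant items found in the top 200. The function names are hypothetical.

```python
import numpy as np


def hamming_distances(query_code, db_codes):
    """Hamming distance from one +/-1 code to every row of db_codes."""
    k = db_codes.shape[1]
    # For +/-1 codes, distance = (code length - inner product) / 2.
    return (k - db_codes @ query_code) / 2


def average_precision_at_k(relevant_ranked, k=200):
    """Average precision over the first k entries of a ranked relevance list."""
    rel = np.asarray(relevant_ranked[:k], dtype=float)
    if rel.sum() == 0:
        return 0.0  # no relevant item retrieved in the top k
    precision_at_r = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_r * rel).sum() / rel.sum())


def map_at_k(query_codes, db_codes, query_labels, db_labels, k=200):
    """Mean of the per-query average precision at k (single-label relevance)."""
    db_labels = np.asarray(db_labels)
    aps = []
    for code, label in zip(query_codes, query_labels):
        # Stable sort keeps database order among items at equal distance.
        order = np.argsort(hamming_distances(code, db_codes), kind="stable")
        aps.append(average_precision_at_k(db_labels[order] == label, k))
    return float(np.mean(aps))


if __name__ == "__main__":
    db_codes = np.array([[1, 1], [1, 1], [-1, -1], [-1, -1]])
    db_labels = np.array([0, 0, 1, 1])
    query_codes = np.array([[1, 1], [-1, -1]])
    query_labels = np.array([0, 1])
    print(map_at_k(query_codes, db_codes, query_labels, db_labels, k=200))
```

For multi-label data such as NUS-WIDE, relevance is often defined as sharing at least one label. Only the comparison `db_labels[order] == label` would need to change in that case.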
Publication history
  • Received:  2016-05-27
  • Accepted:  2017-04-21
  • Published:  2018-08-20
