2.624

2020影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于成对约束的偏标记数据消歧算法

征察 吉立新 高超 李邵梅 吴翼腾

征察, 吉立新, 高超, 李邵梅, 吴翼腾. 基于成对约束的偏标记数据消歧算法. 自动化学报, 2020, 46(7): 1367-1377. doi: 10.16383/j.aas.c170522
引用本文: 征察, 吉立新, 高超, 李邵梅, 吴翼腾. 基于成对约束的偏标记数据消歧算法. 自动化学报, 2020, 46(7): 1367-1377. doi: 10.16383/j.aas.c170522
ZHENG Cha, JI Li-Xin, GAO Chao, LI Shao-Mei, WU Yi-Teng. Partial Label Data Disambiguation Algorithm Based on Pairwise Constraints. ACTA AUTOMATICA SINICA, 2020, 46(7): 1367-1377. doi: 10.16383/j.aas.c170522
Citation: ZHENG Cha, JI Li-Xin, GAO Chao, LI Shao-Mei, WU Yi-Teng. Partial Label Data Disambiguation Algorithm Based on Pairwise Constraints. ACTA AUTOMATICA SINICA, 2020, 46(7): 1367-1377. doi: 10.16383/j.aas.c170522

基于成对约束的偏标记数据消歧算法

doi: 10.16383/j.aas.c170522
基金项目: 

国家自然科学基金 61601513

详细信息
    作者简介:

    吉立新  国家数字交换系统工程技术研究中心研究员.主要研究方向为电信网信息关防, 信息安全. E-mail: jlx@ndsc.com.cn

    高超  国家数字交换系统工程技术研究中心助理研究员.主要研究方向为计算机视觉. E-mail: chaosndsc@163.com

    李邵梅  国家数字交换系统工程技术研究中心副研究员.主要研究方向为计算机视觉. E-mail: lishaomei may@126.com

    吴翼腾  国家数字交换系统工程技术研究中心博士研究生.主要研究方向为网络大数据分析. E-mail: wuyiteng1992@163.com

    通讯作者:

    征察  国家数字交换系统工程技术研究中心硕士研究生.主要研究方向为机器学习, 计算机视觉.本文通信作者. E-mail: zcpi31415926@163.com

Partial Label Data Disambiguation Algorithm Based on Pairwise Constraints

Funds: 

National Natural Science Foundation of China 61601513

More Information
    Author Bio:

    JI Li-Xin   Professor at the China National Digital Switching System Engineering and Technological Research and Development Center. His research interest covers telecom network information gateway, and information security

    GAO Chao   Assistant professor at the China National Digital Switching System Engineering and Technological Research and Development Center. His main research interest is computer vision

    LI Shao-Mei   Associate professor at the China National Digital Switching System Engineering and Technological Research and Development Center. Her main research interest is computer vision

    WU Yi-Teng   Ph. D. candidate at the China National Digital Switching System Engineering and Technological Research and Development Center. His main research interest is network big data analysis

    Corresponding author: ZHENG Cha   Master student at the China National Digital Switching System Engineering and Technological Research and Development Center. His research interest covers machine learning and computer vision. Corresponding author of this paper
  • 摘要: 偏标记数据消歧是利用偏标记数据进行机器学习的基础.针对偏标记数据中广泛存在的数据不平衡问题, 以及现有消歧算法对样本间约束信息利用不足的问题, 本文提出一种基于成对约束的偏标记数据消歧算法.首先, 基于低秩表示, 推导出数据不平衡条件下样本低秩表示系数和样本相似度之间的关系; 其次, 基于推导结果, 分别构建基于样本间正约束和负约束的图模型, 通过最小化图模型的能量函数求解偏标记数据的标签.在5个公开数据集上的实验结果表明本文方法相对基准算法在消歧准确率上平均提高了2.9 % ~ 14.9 %.
    Recommended by Associate Editor WANG Li-Wei
    1)  本文责任编委 王立威
  • 图  1  典型的偏标记数据

    Fig.  1  Examples of typical partial label data

    图  2  正负约束作用于消歧的效果

    Fig.  2  The effects of positive and negative constraints on disambiguation

    图  3  基于成对约束的偏标记数据消歧算法流程

    Fig.  3  The main procedure of PLDPC

    图  4  数据集中不同类别样本数量分布

    Fig.  4  The distributions of different categories$'$ sample number in datasets

    图  5  PLDPC消歧准确率随不同参数的变化趋势

    Fig.  5  The accuracy of disambiguation changes as different parameters varying

    表  1  数据集信息

    Table  1  The information of datasets

    数据集 样本数量 特征维度 类别数量 平均候选标签数量 领域
    Lost 1 122 108 16 2.23 人脸自动标注
    MSRCV2 1 758 48 23 3.16 目标分类
    BirdSong 4 998 38 13 2.18 鸟鸣分类
    Soccer Player 17 472 279 171 2.09 人脸自动标注
    Yahoo!News 22 991 163 219 1.91 人脸自动标注
    下载: 导出CSV

    表  2  各算法消歧准确率(%)

    Table  2  The disambiguation accuracy of each algorithm (%)

    算法 Lost MSRCV2 BirdSong Soccer Player Yahoo!News
    PLDPC-abs 57.93 65.81 76.73 70.28 82.03
    PLDPC-$p$ 65.81 67.46 77.05 71.58 83.29
    PLDPC-$n$ 41.98 37.26 62.30 62.84 57.17
    PL-KNN 64.53 58.25 70.99 57.76 72.32
    IPAL 77.54 71.44 76.61 67.35 82.37
    PL-LEAF 79.32 66.67 75.55 70.50 82.90
    MMS 91.71 68.27 66.47 70.03 87.32
    PLDPC 87.61 72.70 79.25 73.68 85.22
    下载: 导出CSV

    表  3  各算法消歧处理时间(秒(s)、分钟(min)、天(d))

    Table  3  The processing time of each algorithm (second (s), minute (min), day (d))

    算法 Lost MSRCV2 BirdSong Soccer Player Yahoo!News
    PLDPC-abs 2.13 s 2.39 s 11.62 s 4 min 6 min
    PLDPC-$p$ 2.05 s 2.30 s 11.07 s 4 min 6 min
    PLDPC-$n$ 2.12 s 2.69 s 11.71 s 4 min 6 min
    PL-KNN 0.06 s 0.08 s 0.10 s 59.27 69.78 s
    IPAL 0.51 s 0.63 s 1.56 s 73.75 s 94.62 s
    PL-LEAF 56.04 s 4 min 35 min $>$1 d $>$1 d
    MMS 57.02 s 1 min 2 min 34 min 35min
    PLDPC 2.16 s 2.45 s 11.61 s 4 min 6 min
    下载: 导出CSV
  • [1] Su X P, Peng J Y, Feng X Y, Wu J. Labeling faces with names based on the name semantic network. Multimedia Tools and Applications, 2016, 75(11): 6445-6462 doi: 10.1007/s11042-015-2581-x
    [2] Jin R, Ghahramani Z. Learning with multiple labels. In: Proceedings of the 15th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002. 921-928
    [3] Zhang M L, Yu F, Tang C Z. Disambiguation-free partial label learning. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(10): 2155-2167 doi: 10.1109/TKDE.2017.2721942
    [4] 张敏灵.偏标记学习研究综述.数据采集与处理, 2015, 30(1): 77-87 http://d.old.wanfangdata.com.cn/Periodical/sjcjycl201501007

    Zhang Min-Ling. Research on partial label learning. Journal of Data Acquisition and Processing, 2015, 30(1): 77-87 http://d.old.wanfangdata.com.cn/Periodical/sjcjycl201501007
    [5] Nguyen N, Caruana R. Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Las Vegas, Nevada, USA: ACM, 2008: 551-559
    [6] Chen Y C, Patel V M, Chellappa R, Phillips P J. Ambiguously labeled learning using dictionaries. IEEE Transactions on Information Forensics and Security, 2014, 9(12): 2076-2088 doi: 10.1109/TIFS.2014.2359642
    [7] Hüllermeier E, Beringer J. Learning from ambiguously labeled examples. In: Proceedings of the 6th International Symposium on Intelligent Data Analysis. Madrid, Spain: Springer, 2005. 168-179
    [8] Cour T, Sapp B, Taskar B. Learning from partial labels. Journal of Machine Learning Research, 2011, 12: 1501-1536 http://d.old.wanfangdata.com.cn/NSTLQK/NSTL_QKJJ0230239804/
    [9] Zhang M L, Yu F. Solving the partial label learning problem: an instance-based approach. In: Proceedings of the 24th International Conference on Artificial Intelligence. Buenos Aires, Argentina: AAAI Press, 2015. 4048-4054
    [10] Yu F, Zhang M L. Maximum margin partial label learning. Machine Learning, 2017, 106(4): 573-593 doi: 10.1007/s10994-016-5606-4
    [11] Olson C C, Judd K P, Nichols J M. Manifold learning techniques for unsupervised anomaly detection. Expert Systems with Applications, 2018, 91: 374-385 doi: 10.1016/j.eswa.2017.08.005
    [12] Liu G C, Lin Z C, Yu Y. Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: ICML, 2010. 663-670
    [13] 王卫卫, 李小平, 冯象初, 王斯琪.稀疏子空间聚类综述.自动化学报, 2015, 41(8): 1373-1384 doi: 10.16383/j.aas.2015.c140891

    Wang Wei-Wei, Li Xiao-Ping, Feng Xiang-Chu, Wang Si-Qi. A survey on sparse subspace clustering. Acta Automatica Sinica, 2015, 41(8): 1373-1384 doi: 10.16383/j.aas.2015.c140891
    [14] 李波, 卢春园, 冷成财, 金连宝.基于局部图拉普拉斯约束的鲁棒低秩表示聚类方法.自动化学报, 2015, 41(11): 1971-1980 doi: 10.16383/j.aas.2015.c150031

    Li Bo, Lu Chun-Yuan, Leng Cheng-Cai, Jin Lian-Bao. Robust low rank subspace clustering based on local graph Laplace constraint. Acta Automatica Sinica, 2015, 41(11): 1971-1980 doi: 10.16383/j.aas.2015.c150031
    [15] Hou X, Yao G J, Wang J. Semi-supervised classification based on low rank representation. Algorithms, 2016, 9(3): Article No. 48
    [16] Pasteris S, Vitale F, Gentile C, Herbster M. On pairwise clustering with side information[Online], available http://arxiv.org/abs/1706.06474, December 9, 2017
    [17] 徐明亮, 王士同, 杭文龙.一种基于同类约束的半监督近邻反射传播聚类方法.自动化学报, 2016, 42(2): 255-269 doi: 10.16383/j.aas.2016.c150059

    Xu Ming-Liang, Wang Shi-Tong, Hang Wen-Long. A semi-supervised affinity propagation clustering method with homogeneity constraint. Acta Automatica Sinica, 2016, 42(2): 255-269 doi: 10.16383/j.aas.2016.c150059
    [18] Zhu X J, Ghahramani Z, Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the Twentieth International Conference on Machine Learning. Washington DC, USA: ICML, 2003. 912-919
    [19] Zhu X J, Goldberg A B. Introduction to Semi-Supervised Learning. San Rafael: Morgan and Claypool Publishers, 2009. 1-130
    [20] 由从哲.子空间聚类分析新算法及应用研究[博士学位论文], 江南大学, 中国, 2017

    You Cong-Zhe. Novel Subspace Clustering Algorithms and Applications[Ph. D. dissertation], Jiangnan University, China, 2017
    [21] Zeng Z N, Xiao S J, Jia K, Chan T H, Gao S H, Xu D, et al. Learning by associating ambiguously labeled images. In: Proceedings of the 2013 IEEE Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013. 708-715
    [22] Guillaumin M, Verbeek J, Schmid C. Multiple instance metric learning from automatically labeled bags of faces. In: Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece: Springe, 2010. 634-647
    [23] Liu L P, Dietterich T G. A conditional multinomial mixture model for superset label learning. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates Inc., 2012. 548-556
    [24] Briggs F, Fern X Z, Raich R. Rank-loss support instance machines for MIML instance annotation. In: Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining. Beijing, China: ACM, 2012. 534-542
    [25] Luo J, Orabona F. Learning from candidate labeling sets. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: Curran Associates Inc., 2010. 1504-1512
    [26] Zhang M L, Zhou B B, Liu X Y. Partial label learning via feature-aware disambiguation. In: Proceedings of the 22th International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: ACM, 2016. 1335-1344
  • 加载中
图(5) / 表(3)
计量
  • 文章访问数:  802
  • HTML全文浏览量:  143
  • PDF下载量:  90
  • 被引次数: 0
出版历程
  • 收稿日期:  2017-09-13
  • 录用日期:  2018-04-16
  • 刊出日期:  2020-07-24

目录

    /

    返回文章
    返回