2.765

2022影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

带任意长度通配符的模式匹配

强继朋 谢飞 高隽 胡学钢 吴信东

强继朋, 谢飞, 高隽, 胡学钢, 吴信东. 带任意长度通配符的模式匹配. 自动化学报, 2014, 40(11): 2499-2511. doi: 10.3724/SP.J.1004.2014.02499
引用本文: 强继朋, 谢飞, 高隽, 胡学钢, 吴信东. 带任意长度通配符的模式匹配. 自动化学报, 2014, 40(11): 2499-2511. doi: 10.3724/SP.J.1004.2014.02499
QIANG Ji-Peng, XIE Fei, GAO Jun, HU Xue-Gang, WU Xin-Dong. Pattern Matching with Arbitrary-length Wildcards. ACTA AUTOMATICA SINICA, 2014, 40(11): 2499-2511. doi: 10.3724/SP.J.1004.2014.02499
Citation: QIANG Ji-Peng, XIE Fei, GAO Jun, HU Xue-Gang, WU Xin-Dong. Pattern Matching with Arbitrary-length Wildcards. ACTA AUTOMATICA SINICA, 2014, 40(11): 2499-2511. doi: 10.3724/SP.J.1004.2014.02499

带任意长度通配符的模式匹配

doi: 10.3724/SP.J.1004.2014.02499
基金项目: 

国家自然科学基金(61229301),国家重点基础研究发展计划 (973计划) (2013CB329604),教育部长江学者和创新团队发展计划(IRT13059), 中国博士后科学基金(2013M541822)资助

详细信息
    作者简介:

    强继朋 合肥工业大学计算机与信息学院博士研究生. 主要研究方向为模式匹配与挖掘.E-mail: qjp2100@163.com

    通讯作者:

    吴信东, 合肥工业大学计算机与信息学院长江学者. 1993 年获英国爱丁堡大学人工智能博士学位. 美国佛蒙特大学计算机科学系统教授, IEEE Fellow 和AAAS Fellow. 主要研究领域为数据挖掘, 专家系统和万维网信息处理. 本文通信作者. E-mail: xwu@hfut.edu.cn

Pattern Matching with Arbitrary-length Wildcards

Funds: 

Supported by National Natural Science Foundation of China (61229301), National Basic Research Development Program of China (973 Program) (2013CB329604), the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education (IRT13059), and the Postdoctoral Science Foundation of China (2013M541822)

  • 摘要: 基因序列中,许多病毒并不是简单的直接复制自己,而是相邻字符间插入或者删除序列片段,如何从序列数据中检索这些病毒具有重要的研究价值.提出了一个更普遍的问题,带任意长度通配符的模式匹配问题(Pattern matching with arbitrary-length wildcards,PMAW),这里模式中不仅可以有多个通配符约束,而且每个通配符的约束可以是两个整数,也可以从整数到无穷大.给定序列S和带通配符的模式P,目标是从S中检索P的所有出现和每一次出现的匹配位置,并且要求任意两次出现不能共享序列中同一位置.为了有效地解决该问题,设计了两个基于位并行的匹配算法MOTW (Method of ocurrence then window)算法和MWTO (Method of window then ocurrence)算法.同时,MWTO算法进行细微改动就可以满足全局长度约束.实验结果既验证了算法求解问题的正确性,又验证了比相关的模式匹配算法具有更好的时间性能.
  • [1] Pisanti N, Crochemore M, Grossi R, Sagot M F. Bases of motifs for generating repeated patterns with wildcards. IEEE/ACM Transactions on Computational Biology Bioinformatics, 2005, 2(1): 40-50
    [2] Yue Feng, Sun Liang, Wang Kuan-Quan, Wang Yong-Ji, Zuo Wang-Meng. State-of-the-art of cluster analysis of gene expression data. Acta Automatica Sinica, 2008, 34(2): 113-120(岳峰, 孙亮, 王宽全, 王永吉, 左旺孟. 基因表达数据的聚类分析研究进展. 自动化学报, 2008, 34(2): 113-120)
    [3] Rahman M S, Iliopoulos C S, Lee I, Mohamed M, Smyth W F. Finding patterns with variable length gaps or don't cares. Computing and Combinatorics. Berlin, Heidelberg: Springer, 2006. 146-155
    [4] Ding B L, Lo D, Han J W, Khoo S C. Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of the 25th International Conference on Data Engineering. Shanghai, China: IEEE, 2009. 1024-1035
    [5] Wu X D, Zhu X Q, He Y, Arslan A N. PMBC: Pattern mining from biological sequences with wildcard constraints. Computers in Biology and Medicine, 2013, 43(5): 481-492
    [6] Wang Bao-Xun, Liu Bing-Quan, Sun Cheng-Jie, Wang Xiao-Long, Sun Lin. Thread segmentation based answer detection in Chinese online forums. Acta Automatica Sinica, 2009, 39(1): 11-20(王宝勋, 刘秉权, 孙承杰, 王晓龙, 孙林. 基于论坛话题段落划分的答案识别. 自动化学报, 2013, 39(1): 11-20)
    [7] Pan Yun-He, Wang Jin-Long, Xu Cong-Fu. State-of-the-art on frequent pattern mining in data streams. Acta Automatica Sinica, 2006, 32(4): 594-602(潘云鹤, 王金龙, 徐从富. 数据流频繁模式挖掘研究进展. 自动化学报, 2006, 32(4): 594-602)
    [8] Tanbeer S K, Ahmed C F, Jeong B S, Lee Y K. Efficient frequent pattern mining over data streams. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. California, USA: ACM, 2008. 1447-1448
    [9] Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215(3): 403-410
    [10] Fischer M J, Paterson M S. String-matching and other products. Proceeding of the 7th SIAM AMS Complexity of Computation. Cambridge, USA, 1974. 113-125
    [11] Clifford P, Clifford R. Simple deterministic wildcard matching. Knowledge and Information Systems, 2007, 101(2): 53-54
    [12] Cole R, Gottlieb L A, Lewenstein M. Dictionary matching and indexing with errors and don't cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing. New York, USA: ACM, 2004. 91-100
    [13] Kucherov G, Rusinowitch M. Matching a set of strings with variable length don't cares. Theoretical Computer Science, 1997, 78(1-2): 129-154
    [14] Chen G, Wu X D, Zhu X Q, Arslan A N, He Y. Efficient string matching with wildcards and length constraints. Knowledge and Information Systems, 2006, 10(4): 399-419
    [15] Wu You-Xi, Liu Ya-Wei, Guo Lei, Wu Xin-Dong. Subnettrees for strict pattern matching with general gaps and length constraints. Journal of Software, 2013, 24(5): 915-932(武优西, 刘亚伟, 郭磊, 吴信东. 子网树求解一般间隙和长度约束严格模式匹配. 软件学报, 2013, 24(5): 915-932)
    [16] Wu You-Xi, Wu Xin-Dong, Jiang He, Min Fan. A heuristic algorithm for MPMGOOC. Chinese Journal of Computers, 2011, 34(8): 1452-1462(武优西, 吴信东, 江贺, 闵帆. 一种求解MPMGOOC问题的启发式算法. 计算机学报, 2011, 34(8): 1452-1462)
    [17] Bille P, Görtz I L, Vildhój H W, Wind D K. String matching with variable length gaps. Theoretical Computer Science, 2012, 443(1): 25-34
    [18] Wu Xin-Dong, Xie Fei, Huang Yong-Ming, Hu Xue-Gang, Gao Jun. Mining sequential patterns with wildcards and the one-off condition. Journal of Software, 2013, 24(8): 1804-1815(吴信东, 谢飞, 黄咏明, 胡学钢, 高隽. 带通配符和One-Off条件的序列模式挖掘. 软件学报, 2013, 24(8): 1804-1815)
    [19] Ji X N, Bailey J, Dong G Z. Mining minimal distinguishing subsequence patterns with gap constraints. Knowledge and Information Systems, 2007, 11(3): 259-286
    [20] Bille P, Thorup M. Languages and Programming. Berlin: Springer Heidelberg, 2009. 171-182
    [21] Navarro G, Raffinot M. New techniques for regular expression searching. Algorithmica, 2005, 41(2): 89-116
    [22] Indyk P. Faster algorithms for string matching problems: matching the convolution bound. In: Proceedings of the 39th Symposium on Foundations of Computer Science. Washington, DC, USA: IEEE, 1998. 166-173
    [23] Kalai A. Efficient pattern-matching with don't cares. In: Proceedings of the 13th ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA, USA: ACM, 2002. 655-656
    [24] Manber U, Baeza-Yates R. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 1991, 37(3): 133-136
    [25] Navarro G, Raffinot M. Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology, 2003, 10(6): 903-923
    [26] Min F, Wu X D, Lu Z Y. Pattern matching with independent wildcard gaps. In: Proceedings of the 8th IEEE International Conference on Dependable, Autonomic and Secure Computing. Chengdu, China: IEEE, 2009. 194-199
    [27] Ferreira P G, Azevedo P J. Protein sequence pattern mining with constraints. In: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin Heidelberg: Springer, 2005. 96-107
    [28] Aho A V, Corasick M J. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 1975, 18(6): 333-340
    [29] Guo D, Hong X L, Hu X G, Gao J, Liu Y L, Wu G Q, Wu X D. A bit-parallel algorithm for sequential pattern matching with wildcards. Cybernetics and Systems, 2011, 42(6): 382-401
    [30] Blumer A, Blumer J, Haussler D, Ehrenfeucht A, Chen M T, Seiferas J. The smallest automation recognizing the subwords of a text. Theoretical Computer Science, 1985, 40(1): 31-55
    [31] Zhang M, Zhang Y, Hu L. A faster algorithm for matching a set of patterns with variable length don't cares. Information Processing Letters, 2010, 110(6): 216-220
    [32] Haapasalo T, Silvasti P, Sippu S, Soisalon-Soininen E. Online dictionary matching with variable-length gaps. In: Proceedings of the 10th International Symposium, SEA Kolimpari, Chania, Crete, Greece. Berlin Heidelberg: Springer, 2011. 76-87
  • 加载中
计量
  • 文章访问数:  1939
  • HTML全文浏览量:  93
  • PDF下载量:  1041
  • 被引次数: 0
出版历程
  • 收稿日期:  2013-12-05
  • 修回日期:  2014-02-17
  • 刊出日期:  2014-11-20

目录

    /

    返回文章
    返回