An Exemplar Selection Algorithm for Protein Structures Clustering

HUANG Xu, LV Qiang, QIAN Pei-De

Citation: HUANG Xu, LV Qiang, QIAN Pei-De. An Exemplar Selection Algorithm for Protein Structures Clustering. ACTA AUTOMATICA SINICA, 2011, 37(6): 682-692. doi: 10.3724/SP.J.1004.2011.00682


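Abstract: This paper proposes an algorithm for selecting the cluster centers (exemplars) used in protein structure clustering. Clustering is an indispensable post-processing step in protein structure prediction. However, the quality threshold (QT) clustering algorithm commonly used in protein structure prediction depends on an empirically determined clustering radius, and other clustering algorithms, such as affinity propagation (AP), also have parameters that affect the resulting cluster distribution. To remove this dependence on subjective, empirical parameters, we propose an exemplar selection algorithm (ESA) that analyzes the clustering results obtained under different parameter settings, selects the best exemplars, and from them determines empirical parameters such as the clustering radius. The algorithm was evaluated on real protein structure datasets, where it selected the best exemplars without prior knowledge of the empirical parameters. It also helps different clustering algorithms find objective clustering parameters suited to a given dataset.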
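The abstract's central observation, that the clusters produced by QT-style clustering hinge on an empirically chosen radius, can be illustrated with a small sketch. The code below is a simplified, center-based variant of QT clustering of the kind often applied to sets of predicted structures: repeatedly take the structure with the most unassigned neighbors within the radius as a cluster center, then remove that whole cluster. This is an assumption for illustration only; it is neither the paper's ESA nor the full diameter-based QT algorithm. The function name `qt_center_clustering`, the toy one-dimensional data, and the radii are hypothetical. In practice `dist` would hold pairwise structural distances such as RMSD.

```python
import numpy as np


def qt_center_clustering(dist, radius):
    """Hypothetical, simplified center-based QT-style clustering.

    dist   : (n, n) symmetric matrix of pairwise distances (e.g. RMSD)
    radius : points within `radius` of a chosen center join its cluster
    Returns a list of (center_index, member_indices) pairs, largest first.
    """
    n = dist.shape[0]
    remaining = np.ones(n, dtype=bool)  # points not yet assigned to a cluster
    clusters = []
    while remaining.any():
        # Neighbor relation restricted to points that are still unassigned.
        within = (dist <= radius) & remaining[None, :] & remaining[:, None]
        counts = within.sum(axis=1)       # assigned points get a count of 0
        center = int(np.argmax(counts))   # ties go to the lowest index
        members = np.flatnonzero(within[center])  # always includes the center
        clusters.append((center, members.tolist()))
        remaining[members] = False
    return clusters


# Toy data: six points on a line, distance = absolute difference.
points = np.array([0.0, 0.5, 1.0, 5.0, 5.4, 10.0])
dist = np.abs(points[:, None] - points[None, :])

for r in (0.6, 1.0, 6.0):
    clusters = qt_center_clustering(dist, r)
    print(r, [c for c, _ in clusters])
# 0.6 [1, 3, 5]
# 1.0 [0, 3, 5]
# 6.0 [3]
```

On this toy data, radii of 0.6 and 1.0 both yield three clusters but with different centers, while a radius of 6.0 merges everything into a single cluster. This is the kind of sensitivity to an empirical parameter that the ESA aims to remove.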
Publication history
  • Received: 2010-09-07
  • Revised: 2010-12-27
  • Published: 2011-06-20
