Abstract: The flood of gene expression data produced by DNA microarray technology has driven the development of automated analysis techniques and tools. Cluster analysis has become an effective and practical way to mine these data for genetic and biological information. Many improved versions of conventional clustering algorithms, as well as entirely new algorithms, have been proposed in recent years for gene expression data. This survey first briefly introduces how gene expression data are produced and represented, and then systematically reviews the clustering algorithms recently applied to gene expression data analysis. According to the goal of clustering, the algorithms are divided into three categories: gene-based clustering, sample-based clustering, and biclustering. For each category, the biological meaning and the main challenges are presented, and the basic principle, advantages, and drawbacks of each algorithm are discussed in detail. The paper concludes with a summary of current clustering methods for gene expression data and a discussion of future trends.
Key words: DNA microarray; gene expression data; cluster analysis