Nonparametric Bayesian Clustering Methods of DNA Methylation Microarray
Abstract: This paper proposes a model-based clustering algorithm for Illumina GoldenGate methylation microarray data. The algorithm builds an infinite beta mixture model with a Dirichlet process as the prior, so that the cluster structure is determined by both the data and the model. Experimental results show that the method can effectively estimate the number of clusters, the mixing weight of each cluster, and the characteristics of each cluster, and that it achieves reasonably good clustering performance.
Key words:
- DNA methylation microarray
- Dirichlet process
- beta mixture model
- Gibbs sampling
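
To make the model named in the abstract concrete, the following is a minimal sketch of the generative side of a beta mixture with a Dirichlet process prior, written with the Chinese restaurant process representation. It is not the paper's Gibbs sampler: it only draws synthetic methylation-like values in [0, 1]. The function names, the concentration parameter `alpha`, and the uniform base measure over beta shape parameters are all assumptions chosen for illustration.

```python
import random


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n samples from a Chinese restaurant process.

    An existing cluster k is chosen with weight equal to its current size;
    a new cluster is opened with weight alpha (the DP concentration).
    """
    labels, counts = [], []
    for _ in range(n):
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels, counts


def sample_beta_mixture(n_samples, n_loci, alpha=1.0, seed=0):
    """Generate data from a DP beta mixture (illustrative assumptions only).

    Each cluster gets one (a, b) beta shape pair per locus, drawn from a
    hypothetical uniform base measure on [0.5, 5.0]; this range is an
    assumption, not a value taken from the paper.
    """
    rng = random.Random(seed)
    labels, counts = crp_assignments(n_samples, alpha, rng)
    params = [
        [(rng.uniform(0.5, 5.0), rng.uniform(0.5, 5.0)) for _ in range(n_loci)]
        for _ in counts
    ]
    data = [
        [rng.betavariate(a, b) for (a, b) in params[z]]
        for z in labels
    ]
    return data, labels, params


if __name__ == "__main__":
    data, labels, params = sample_beta_mixture(n_samples=30, n_loci=4, alpha=1.0)
    print("number of clusters drawn:", len(params))
    print("first sample:", [round(x, 3) for x in data[0]])
```

The number of clusters is not fixed here; it grows with the sample size at a rate governed by `alpha`, which is the property that lets a Dirichlet process prior leave the cluster count to be estimated from the data.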