
Research and Application of Clustering Analysis Algorithms for Gene Expression Data

Published: 2018-10-18 19:26
[Abstract]: The widespread use of gene chip technology has produced massive amounts of gene expression data. Analyzing and processing these data to extract useful biological or medical information is both the key to and the main difficulty in applying gene chip technology, and it has become one of the research hotspots of the post-genome era. Cluster analysis groups functionally related genes into co-expression classes according to the similarity of their expression profiles, which supports the integrated study of gene function, gene regulation, cellular processes and cell subtypes. It is currently one of the main techniques for analyzing gene expression data. This thesis addresses specific problems in the cluster analysis of gene expression data: the choice of clustering algorithm and parameters, the validation of clustering results, and the estimation of the number of clusters. The main work and contributions are as follows:

1. Using gene expression data sets with external standards for the first time, the thesis studies how the most commonly used algorithms in gene cluster analysis (hierarchical clustering, K-means clustering and SOMs) respond to the choice of similarity measure and data transformation, and compares their performance. The results show that hierarchical clustering works best with the Pearson correlation coefficient as the similarity measure and row-standardized data, whereas K-means clustering and SOMs work best with the Euclidean distance and standardized log-transformed data. Single-linkage hierarchical clustering should be avoided where possible, and K-means clustering and SOMs perform significantly better than hierarchical clustering.

2. The thesis studies how well the Silhouette index, Dunn's index, the Davies-Bouldin index and the FOM measure validate the results of gene cluster analysis. The results show that the Silhouette index and the FOM measure reflect both the performance of a clustering algorithm and the quality of its results well. Dunn's index is highly sensitive to noise and cannot be used directly to validate gene clustering results. The Davies-Bouldin index validates better than Dunn's index but favors single-linkage clustering.

3. The thesis studies how well the Silhouette index, the Davies-Bouldin index and the FOM measure estimate the number of clusters. The results show that the Silhouette index and the Davies-Bouldin index both identify the exact number of clusters with low accuracy, which makes them hard to apply in practice. The position of the inflection point of the FOM measure gives only a rough estimate of the number of clusters, and that estimate is uncertain and subjective. A new relative Silhouette index and a new relative Davies-Bouldin index are defined to extend the ability of the existing indices to estimate the number of clusters. Prediction strength, a function designed specifically for estimating the number of clusters, is introduced into gene cluster analysis and improves the reliability of the estimate.

4. Cluster boundaries are difficult to determine in high-resolution SOM projections. To address this, K-means is used to cluster the network units of a trained SOM, which integrates the SOM algorithm with K-means clustering. This combined SOM and K-means method was used to analyze whole-genome expression data from the yeast diauxic shift systematically. It yielded gene classes with highly similar expression profiles, which provide important clues for predicting the functions of unknown genes.
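The Silhouette index mentioned in the abstract can be illustrated with a small sketch. The code below is not the thesis's implementation. It assumes the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance from item i to the other members of its own cluster and b is the smallest mean distance to any other cluster, and it uses 1 minus the Pearson correlation as the distance. The function names and the four toy expression profiles are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def pearson_distance(a, b):
    """Return 1 - Pearson correlation (0 = same profile shape, 2 = opposite)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    # A constant profile has no defined correlation; treat it as uncorrelated.
    return 1.0 - cov / norm if norm > 0 else 1.0


def silhouette_index(profiles, labels, dist=pearson_distance):
    """Average silhouette width over all items, for a given cluster labelling."""
    scores = []
    for i, (p, own) in enumerate(zip(profiles, labels)):
        # Collect distances from item i to every other item, grouped by cluster.
        by_cluster = {}
        for j, (q, lab) in enumerate(zip(profiles, labels)):
            if i != j:
                by_cluster.setdefault(lab, []).append(dist(p, q))
        others = [mean(d) for lab, d in by_cluster.items() if lab != own]
        if own not in by_cluster or not others:
            # Singleton cluster, or only one cluster overall: s(i) is set to 0.
            scores.append(0.0)
            continue
        a = mean(by_cluster[own])   # cohesion within the item's own cluster
        b = min(others)             # separation from the nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: two rising and two falling expression profiles.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [9.0, 6.0, 4.0, 1.0],
]
good_score = silhouette_index(toy_profiles, [0, 0, 1, 1])  # groups by shape
bad_score = silhouette_index(toy_profiles, [0, 1, 0, 1])   # mixes the shapes
print(good_score, bad_score)
```

A labelling that groups profiles of similar shape scores close to 1, while one that mixes rising and falling profiles scores below 0. This is the sense in which the index reflects the quality of a clustering result.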
[Degree-granting institution]: Tianjin University
[Degree level]: Doctoral
[Year degree granted]: 2006
[Classification number]: R311



Article link: http://sikaile.net/yixuelunwen/binglixuelunwen/2280135.html

