Research on Key Methods of Disease Classification Based on Traditional Chinese Medicine (TCM) Clinical Data
Source: Southwest Petroleum University, master's thesis, 2017. Document type: degree thesis.
Keywords: TCM clinical data; disease classification; imbalanced data classification; multi-label classification; feature selection
[Abstract]: With the growing adoption of information technology in traditional Chinese medicine (TCM), research on making TCM diagnosis objective has drawn increasing attention in China and abroad. How to make full use of valuable TCM clinical data to provide scientific decision support for TCM diagnosis and treatment, and thereby advance TCM, has become a research focus. Data mining offers a new approach to these problems, and classification, one of its main research topics, is receiving increasing attention in computer-aided TCM clinical diagnosis. Feature selection can improve classification performance and also offers a new way to explore the relationships between TCM features and diseases. Based on the actual characteristics of the collected TCM clinical data, this thesis studies disease classification on clinical data from three key aspects: imbalanced data classification, multi-label classification, and the effect of feature selection on classification. The aim is to improve classification performance and, in turn, the capability of computer-aided diagnosis. The main contributions are as follows.

First, disease classification on imbalanced data. Working at the data level and taking the actual characteristics of TCM clinical data into account, the undersampling procedure is improved. Combining the improved sampling scheme with Asymmetric Bagging yields the improved algorithm FPUSAB. Experiments show that, compared with Asymmetric Bagging, FPUSAB improves AUC by 10.5% on average and Bacc by 8.4% on average.

Second, disease classification on multi-label data. To address the class imbalance in TCM clinical data and the weakness of ML-kNN in finding nearest neighbors, granular computing is introduced into WML-kNN to obtain the improved algorithm WML-GkNN. Experiments show that, compared with the algorithm before improvement, WML-GkNN improves Hamming Loss by 11.2%, Average Precision by 5.3%, Coverage by 2.1%, One-Error by 5.1%, and Ranking Loss by 7.6% on average.

Third, the effect of feature selection on classification. TCM clinical data contain many features, which hinders computer-aided diagnosis. For feature selection in imbalanced disease classification, a prediction risk criterion is introduced and the PRFS-FPUSAB algorithm is proposed on the basis of FPUSAB; experiments show that AUC improves by 7.4% on average after feature selection. For multi-label disease classification, the HOML algorithm, which has shown good selection performance on coronary heart disease data, is used to select features from the multi-label data. Experiments show that after feature selection Hamming Loss improves by 17.77%, Average Precision by 6.28%, Coverage by 15.73%, One-Error by 10.21%, and Ranking Loss by 25.22% on average, and that the selected features are consistent with TCM theory on the relevant diseases.
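The five multi-label metrics reported in the abstract can be made concrete with a small sketch. The code below follows the standard textbook definitions commonly used alongside ML-kNN; it is not the thesis's own evaluation code, and the function names, the toy data, and the 0.5 decision threshold are all assumptions chosen for illustration. It also assumes every instance has at least one relevant and one irrelevant label. Note that Hamming Loss, One-Error, Coverage, and Ranking Loss are lower-is-better while Average Precision is higher-is-better, so an "improvement" on the first four presumably means a reduction, although the abstract does not say so explicitly.

```python
# Illustrative multi-label metrics (standard definitions; names are hypothetical).
# y_true: 0/1 relevance matrix; scores: real-valued label scores per instance.

def ranks(scores):
    """Rank labels by descending score; the best-scored label gets rank 1."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order, start=1):
        r[j] = pos
    return r

def hamming_loss(y_true, y_pred):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    n, q = len(y_true), len(y_true[0])
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (n * q)

def one_error(y_true, scores):
    """Fraction of instances whose top-ranked label is not relevant."""
    miss = 0
    for yt, s in zip(y_true, scores):
        top = max(range(len(s)), key=lambda j: s[j])
        miss += yt[top] == 0
    return miss / len(y_true)

def coverage(y_true, scores):
    """Average number of steps down the ranking needed to cover all relevant labels."""
    total = 0
    for yt, s in zip(y_true, scores):
        r = ranks(s)
        total += max(r[j] for j, t in enumerate(yt) if t) - 1
    return total / len(y_true)

def ranking_loss(y_true, scores):
    """Average fraction of (relevant, irrelevant) pairs that are ordered wrongly."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        pos = [j for j, t in enumerate(yt) if t]
        neg = [j for j, t in enumerate(yt) if not t]
        bad = sum(s[p] <= s[m] for p in pos for m in neg)
        total += bad / (len(pos) * len(neg))
    return total / len(y_true)

def average_precision(y_true, scores):
    """Average, over relevant labels, of the share of relevant labels ranked at or above each."""
    total = 0.0
    for yt, s in zip(y_true, scores):
        r = ranks(s)
        pos = [j for j, t in enumerate(yt) if t]
        total += sum(sum(r[k] <= r[j] for k in pos) / r[j] for j in pos) / len(pos)
    return total / len(y_true)

# Toy data (made up): two instances, three labels.
y_true = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, 0.6, 0.4], [0.2, 0.7, 0.5]]
y_pred = [[int(v > 0.5) for v in row] for row in scores]  # assumed 0.5 threshold

print(hamming_loss(y_true, y_pred))
print(one_error(y_true, scores))
print(coverage(y_true, scores))
print(ranking_loss(y_true, scores))
print(average_precision(y_true, scores))
```

Only Hamming Loss depends on the thresholded predictions; the other four are computed from the label ranking, which is why a change in the neighbor search (as in the granular-computing variant described above) can move them independently.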
[Degree-granting institution]: Southwest Petroleum University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R24; TP311.13