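Research on Decision Tree Algorithms and Their Application in the Diagnosis and Treatment of Coronary Heart Disease
Topic: multi-valued attribute, multi-label; Entry point: decision tree; Source: Dalian Maritime University, 2017 master's thesis; Type: degree thesis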
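【Abstract (Chinese)】: In recent years, coronary heart disease (CHD), with its high incidence and high mortality, has become a health threat and an economic burden for the Chinese population. Rapidly developing computer technology provides a technical basis for exploring the patterns of disease prevention and treatment and the medication knowledge contained in traditional Chinese medicine (TCM) diagnosis and treatment data. This thesis studies how to mine these patterns and this knowledge from existing data, and how to use them to provide decision support for the TCM diagnosis and treatment of CHD. Decision tree results are intuitive and easy to understand. To show the relationship between CHD symptoms and TCM syndromes intuitively, this thesis takes the multi-valued attribute, multi-label decision tree algorithm as its research object, improves the algorithm to address problems that arise when it processes TCM diagnosis and treatment data for CHD, and verifies the effectiveness of the improvements experimentally. The specific work is as follows. First, when selecting the splitting attribute, the original multi-valued attribute, multi-label decision tree algorithm ignores records whose value for the current attribute is null. Because CHD data contain many missing values, the original algorithm discards a large amount of data, which lowers classification accuracy. To address this, the attribute selection step is modified to check for null values, and records whose current attribute value is null are placed in a new child node, so that no data is lost. Second, if the data contain too many null attributes, the above method tends to overfit, making the decision tree too large and the classification accuracy low. To address this, a threshold on the number of null values is introduced before the tree is built, and the data set is preprocessed with this threshold to exclude records with too many null values, which resolves the problem of classification accuracy dropping too quickly. Third, for evaluating split quality, a new similarity formula is proposed that computes the similarity between label sets more reasonably; its parameters reflect the characteristics of the sets being compared and are adjusted automatically according to them. Fourth, for practical application, an auxiliary diagnosis and treatment system was designed that uses the improved algorithm. It predicts TCM syndromes from selected symptoms and provides the patient's TCM diagnosis for the physician's reference. In addition, the training set of the system's classification model can be expanded dynamically, and topic-specific mining submodules for different purposes can be built on top of the system.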
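The null-count threshold described in the second point can be pictured as a simple preprocessing filter applied before tree construction. This is a minimal sketch under stated assumptions, not the thesis's code: the `Record` layout, the function name `filter_by_null_threshold`, the parameter `max_nulls`, and the toy attribute names are all hypothetical, and since the abstract does not say whether the threshold is inclusive, a record is kept here when its number of missing values is at most `max_nulls`.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Record:
    # Hypothetical layout: symptom name -> set of observed values;
    # None marks a missing (null) value.
    attrs: Dict[str, Optional[FrozenSet[str]]]
    # Multi-label target: the set of TCM syndromes for this case.
    labels: FrozenSet[str]


def filter_by_null_threshold(records: List[Record], attributes: List[str],
                             max_nulls: int) -> List[Record]:
    """Hypothetical preprocessing step: drop records that have more than
    max_nulls missing attribute values before the tree is built."""
    kept = []
    for rec in records:
        # An attribute absent from the dict also counts as missing.
        null_count = sum(1 for a in attributes if rec.attrs.get(a) is None)
        if null_count <= max_nulls:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    attrs = ["s1", "s2", "s3"]  # toy symptom names, illustrative only
    data = [
        Record({"s1": frozenset({"yes"}), "s2": None,
                "s3": frozenset({"pale"})}, frozenset({"syndrome_A"})),
        Record({"s1": None, "s2": None, "s3": None},
               frozenset({"syndrome_B"})),
    ]
    print(len(filter_by_null_threshold(data, attrs, max_nulls=1)))  # 1
```

A lower `max_nulls` removes more sparse records; the abstract reports that such filtering curbs tree growth and overfitting, but it gives no concrete threshold value.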
[Abstract]: In recent years, coronary heart disease has been marked by high morbidity and high mortality. The rapid development of computer technology provides a technical basis for exploring the patterns of disease prevention and treatment and the medication knowledge contained in TCM diagnosis and treatment data. How to mine these patterns and this knowledge from existing data, and how to use them to support decision-making in the TCM diagnosis and treatment of coronary heart disease, is the main subject of this thesis. Decision tree results are intuitive and easy to understand. To reflect the relationship between coronary heart disease symptoms and syndromes intuitively, this thesis takes the multi-valued attribute, multi-label decision tree algorithm as its research object, improves it to address problems that arise when it processes TCM diagnosis and treatment data for coronary heart disease, and demonstrates the effectiveness of the improvements experimentally. The specific work is as follows. First, when selecting the splitting attribute, the original multi-valued attribute, multi-label decision tree algorithm ignores records whose value for the current attribute is null. Coronary heart disease data contain many missing values, so the original algorithm loses a large amount of data and its classification accuracy is relatively low. To solve this problem, attribute selection is improved by adding a null-value check and placing records with a null value for the current attribute in a new child node, which ensures that no data is lost.
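The first improvement amounts to one extra branch in the splitting step. The sketch below is an illustration under assumptions rather than the author's implementation: it partitions records on a multi-valued attribute and routes records with a missing value to a dedicated child instead of discarding them. `Record`, `split_on_attribute`, and `NULL_BRANCH` are hypothetical names, and copying a record with several values into every matching branch is an assumption, not something the abstract states.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical key for the extra child that collects missing values.
NULL_BRANCH = "<null>"


@dataclass
class Record:
    attrs: Dict[str, Optional[FrozenSet[str]]]  # None = missing value
    labels: FrozenSet[str]                      # multi-label target


def split_on_attribute(records: List[Record], attr: str) -> Dict[str, List[Record]]:
    """Partition records by the values of a multi-valued attribute.

    Records whose value for attr is missing go to NULL_BRANCH instead of
    being ignored, so every record reaches at least one child."""
    children: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        values = rec.attrs.get(attr)
        if not values:  # None, an absent key, or an empty set counts as missing
            children[NULL_BRANCH].append(rec)
        else:
            # Assumption: a multi-valued record is copied into every
            # branch that matches one of its values.
            for v in sorted(values):
                children[v].append(rec)
    return dict(children)


if __name__ == "__main__":
    data = [
        Record({"s1": frozenset({"yes"})}, frozenset({"A"})),
        Record({"s1": None}, frozenset({"B"})),
        Record({"s1": frozenset({"yes", "no"})}, frozenset({"A", "B"})),
    ]
    for value, members in split_on_attribute(data, "s1").items():
        print(value, len(members))
```

The null child is then grown like any other child, which is why, as the next paragraph notes, many nulls can inflate the tree.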
Second, if the data contain too many null attributes, the method above is prone to overfitting, producing an overly large decision tree and low classification accuracy. This thesis therefore introduces a threshold on the number of null values before the tree is built and preprocesses the data set with it to exclude records with too many null values, which solves the problem of classification accuracy falling too quickly. Third, in the split evaluation stage, a new similarity formula is proposed that makes the similarity calculation between label sets more reasonable; its parameters reflect the characteristics of the sets and are adjusted automatically according to them. Fourth, an auxiliary diagnosis and treatment system is designed that applies the improved algorithm. Based on the selected symptoms, it predicts TCM syndromes and provides the patient's TCM diagnosis for the physician's reference. The training set of the system's classification model can be expanded dynamically, and topic-mining submodules for different purposes can be built on top of the system.
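The abstract does not give the new similarity formula or how its parameter adapts to the sets, so that formula cannot be reproduced here. As a hedged stand-in, the sketch below uses plain Jaccard similarity between label sets and scores a candidate split by the size-weighted mean of the pairwise similarity inside each child, one common way to judge how homogeneous the label sets become after a split. All function names are hypothetical; the thesis's formula would take the place of `jaccard`.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Stand-in label-set similarity (not the thesis's formula).
    Two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def node_similarity(label_sets: List[FrozenSet[str]]) -> float:
    """Mean pairwise similarity of the label sets in one node
    (1.0 when the node holds fewer than two records)."""
    if len(label_sets) < 2:
        return 1.0
    pairs = list(combinations(label_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def split_score(children: Dict[str, List[FrozenSet[str]]]) -> float:
    """Size-weighted mean of child similarities; a higher score means
    more homogeneous label sets after the split."""
    total = sum(len(sets) for sets in children.values())
    if total == 0:
        return 0.0
    return sum(len(sets) / total * node_similarity(sets)
               for sets in children.values())


if __name__ == "__main__":
    split = {
        "yes": [frozenset({"A"}), frozenset({"A"})],
        "<null>": [frozenset({"A"}), frozenset({"B"})],
    }
    print(split_score(split))  # 0.5
```

Under this scheme the attribute whose split scores highest would be chosen; swapping in a different set-similarity measure only changes `jaccard`.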
【Degree-granting institution】: Dalian Maritime University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: R541.4; TP311.13
Article No.: 1610631
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1610631.html