Research on an Improved Association Rule Algorithm for Chronic Disease Data Mining
Topic: chronic disease + association rules; Source: Zhejiang Sci-Tech University, master's thesis, 2017
[Abstract]: Association rule mining, an important research branch of data mining, aims to discover relationships among data items in large volumes of data. Because the rules it produces are simple in form and easy to understand, research on and applications of association rule techniques have flourished. China has a large number of chronic disease patients. To make effective use of their medical data and to provide a scientific basis for preventing and controlling chronic diseases, this thesis selects hypertension, one such disease, as the subject of a data mining study. It mainly explores the correlation between the physical signs of hypertensive patients and their cardiovascular risk level, and the association between hypertension and other chronic diseases. The main work is as follows:
(1) The domestic and international literature was surveyed, the state of research on data mining in chronic disease and other medical fields was analyzed, the problems currently found in medical data analysis in China were summarized, and the main content and route of the thesis were established.
(2) The theory of data mining and association rules is set out, with a focus on the Apriori algorithm for association rule mining. The algorithm's performance bottlenecks are analyzed and existing optimization methods are discussed, which broadens the options for improving it.
(3) To address the running-efficiency shortcomings of Apriori, three improvements are made: a clustered matrix stores the transaction database in compressed form, so the original database does not have to be scanned repeatedly; an advance pruning strategy generates fewer candidate itemsets, avoiding the overhead of large numbers of joins among frequent itemsets; and the chronic disease type is added as a constraint, reducing the number of frequent itemsets and irrelevant rules produced. A comparative analysis based on Matlab simulation experiments shows that the improved algorithm effectively reduces the number of candidate itemsets and improves running efficiency.
(4) A chronic disease data mining scheme is designed in which the improved Apriori algorithm is applied to the physical examination data of hypertensive patients. The data are preprocessed, minimum support and confidence thresholds are set, constraint and correlation conditions are specified, and association rules are mined. Logistic regression analysis is then used to explore the correlations among chronic diseases, and its results are compared with the mined rules. The two methods produce consistent results, which confirms the validity of the experiment. The experiment ultimately yields association rules that agree with medical knowledge. These rules can be used to judge the cardiovascular risk level of hypertensive patients accurately and to anticipate complications of their chronic disease, providing a valuable reference for physicians' diagnoses and a theoretical basis for automated diagnosis.
(5) A chronic disease data mining system that integrates the improved Apriori algorithm was developed. The system can uncover the knowledge hidden in chronic disease medical data and support physicians' decisions, and it has some practical value.
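The matrix storage and constraint ideas in item (3) can be illustrated with a small Python sketch. It is not the thesis's algorithm: the abstract does not specify how the clustered matrix is compressed or how the advance pruning works, so the sketch assumes a plain boolean matrix stored as one bitmask per item, the standard Apriori subset pruning, and a constraint that every rule must have a given target item as its consequent. The function names, item labels, and toy records are all hypothetical.

```python
from itertools import combinations

def build_columns(transactions):
    """Scan the database once and store one bitmask per item
    (bit r is set when transaction r contains the item)."""
    columns = {}
    for row, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << row)
    return columns

def count_rows(itemset, columns, start):
    """Count the rows in `start` that contain every item of `itemset`."""
    mask = start
    for item in itemset:
        mask &= columns[item]
    return bin(mask).count("1")

def mine_rules(transactions, target, min_support, min_confidence):
    """Mine rules `antecedent -> target` (assumed constraint: fixed consequent)."""
    n = len(transactions)
    columns = build_columns(transactions)
    all_rows = (1 << n) - 1
    target_rows = columns.get(target, 0)   # rows that satisfy the constraint
    min_count = max(min_support * n, 1)

    # Frequent 1-itemsets, counted only on rows that contain the target.
    level = [frozenset([i]) for i in sorted(columns)
             if i != target and count_rows([i], columns, target_rows) >= min_count]
    rules = []
    while level:
        frequent = set(level)
        for itemset in level:
            joint = count_rows(itemset, columns, target_rows)
            antecedent = count_rows(itemset, columns, all_rows)
            confidence = joint / antecedent
            if confidence >= min_confidence:
                rules.append((tuple(sorted(itemset)), target, joint / n, confidence))
        # Join step, then keep a candidate only if all of its (k-1)-subsets
        # are frequent and it meets the support threshold itself.
        k = len(level[0]) + 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                 and count_rows(c, columns, target_rows) >= min_count]
    return sorted(rules)

# Hypothetical toy records, not data from the thesis.
transactions = [
    {"smoker", "high_bmi", "high_risk"},
    {"smoker", "high_bmi", "high_risk"},
    {"smoker", "high_risk"},
    {"high_bmi"},
    {"smoker", "high_bmi", "high_risk"},
    {"diabetes"},
]
rules = mine_rules(transactions, "high_risk", min_support=0.5, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "->", consequent, round(support, 3), round(confidence, 3))
```

Because the bitmasks are built in a single pass, every later support count is a bitwise AND rather than another scan of the records. Restricting the counts to rows that contain the target item is one simple way to push a constraint into the search, so that itemsets unrelated to the target are never generated.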
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13