Research and Application of Rough Set Theory for Processing Massive Electronic Medical Records
Topics: knowledge mining; rough set theory. Source: master's thesis, Zhejiang Sci-Tech University, 2017
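[Abstract]: With the rise of smart healthcare, large volumes of medical data resources have been integrated. Medical big data is a valuable asset, and mining knowledge from it has become a research focus in academia. Growing data volumes and redundant attributes, however, make knowledge mining difficult. This thesis investigates how to reduce the dimensionality of massive medical data effectively and thereby make knowledge mining more efficient. Rough set theory is well suited to analyzing incomplete data and to representing, generalizing, and learning from imprecise knowledge; attribute reduction is one of its main applications. The thesis summarizes the problems of commonly used rough set attribute reduction algorithms and proposes two things: an optimization strategy that combines rough set attribute reduction with tabu search, and a parallelization scheme for it. It then verifies the algorithm's performance through simulation experiments and disease classification experiments. The work suggests a direction for improving reduction algorithms and opens a path to processing large datasets efficiently. The specific research contents are as follows:
(1) The relevant domestic and international literature was reviewed and common rough set attribute reduction algorithms were analyzed. Their shortcomings were summarized, and the main research content of this thesis was defined on that basis.
(2) Drawing on the characteristics of rough set theory and tabu search, a tabu search attribute reduction algorithm is proposed. The algorithm's components are described first: the solution representation, the solution accuracy measure, the tabu list, the generation of neighboring candidate solutions, and the diversification and intensification modes. The complete implementation flow is then presented. To improve the algorithm's scalability, a parallelization scheme for it is also proposed.
(3) To test the basic performance of the tabu search attribute reduction algorithm, simulation experiments were run on UCI datasets using the proposed algorithm and several common attribute reduction algorithms. Based on the results, the algorithms were compared in terms of feasibility, stability, and reduction effectiveness.
(4) To test the algorithm's effectiveness, a Hadoop experimental environment was built with massive electronic medical records as experimental data. In the data preprocessing stage, attributes were reduced with four traditional attribute reduction algorithms and with the proposed tabu-search-based algorithm. In the classification stage, naive Bayes was used to build classifiers for five diseases. The disease classification experiments demonstrate the effectiveness of the tabu-search-based attribute reduction algorithm.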
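Item (2) names the components of the proposed algorithm without defining them. The sketch below is one plausible instantiation under stated assumptions, not the thesis's own algorithm:

- The solution is a 0/1 mask over the condition attributes.
- The accuracy measure is the standard rough set dependency degree γ_B(D) = |POS_B(D)| / |U|, with ties broken by subset size.
- A neighbor is produced by flipping a single bit.
- The tabu list uses a fixed tenure plus an aspiration criterion.

The names `dependency`, `score`, `tabu_reduct`, `max_iter` and `tenure` are hypothetical. The diversification and intensification modes are omitted, and the decision table is assumed to be non-empty and already discretized.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough set dependency degree gamma_B(D) = |POS_B(D)| / |U| (assumes rows is non-empty).

    Objects are grouped into equivalence classes by their values on `attrs`;
    a class lies in the positive region when all its objects share one decision.
    """
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    pos = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return pos / len(rows)


def score(rows, mask, decision):
    """Hypothetical solution-accuracy measure: higher dependency first, then fewer attributes."""
    attrs = [i for i, bit in enumerate(mask) if bit]
    return (dependency(rows, attrs, decision), -len(attrs))


def tabu_reduct(rows, n_attrs, decision, max_iter=50, tenure=3):
    """Tabu search over attribute subsets encoded as 0/1 masks (one bit per condition attribute)."""
    current = [1] * n_attrs                 # start from the full condition attribute set
    best, best_score = current[:], score(rows, current, decision)
    tabu_until = {}                         # attribute index -> last iteration at which flipping it is tabu
    for it in range(max_iter):
        candidates = []
        for i in range(n_attrs):
            neighbor = current[:]
            neighbor[i] ^= 1                # neighbor: add or drop exactly one attribute
            s = score(rows, neighbor, decision)
            # Aspiration criterion: a tabu move is allowed only if it beats the best so far.
            if tabu_until.get(i, -1) >= it and s <= best_score:
                continue
            candidates.append((s, i, neighbor))
        if not candidates:                  # every move is tabu; wait for the tenure to expire
            continue
        s, i, current = max(candidates)     # take the best admissible neighbor, even if worse
        tabu_until[i] = it + tenure
        if s > best_score:
            best, best_score = current[:], s
    return [i for i, bit in enumerate(best) if bit]


if __name__ == "__main__":
    # Toy decision table: columns 0-2 are condition attributes, column 3 is the decision.
    table = [
        [0, 0, 1, "no"],
        [0, 1, 0, "yes"],
        [1, 0, 1, "no"],
        [1, 1, 0, "yes"],
    ]
    print(tabu_reduct(table, n_attrs=3, decision=3))
```

In the toy table, attribute 0 is irrelevant, while attributes 1 and 2 each determine the decision on their own, so {1} and {2} are both reducts; the search returns `[1]`. Like any tabu search, this is a heuristic and does not guarantee a minimal reduct on larger tables.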
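In item (4), the reduced attribute set feeds naive Bayes disease classifiers. The following is a minimal single-machine sketch of that classification step. It assumes categorical (already discretized) attributes and add-one (Laplace) smoothing. The thesis ran its experiments on Hadoop, and the abstract does not give its classifier configuration. `train_nb` and `predict_nb` are hypothetical names, and the toy table stands in for electronic medical records.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, attrs, decision):
    """Fit a categorical naive Bayes model on the reduced attribute set `attrs` only."""
    priors = Counter(row[decision] for row in rows)   # class label -> number of records
    counts = defaultdict(Counter)                     # (attribute, label) -> value frequencies
    values = defaultdict(set)                         # attribute -> distinct observed values
    for row in rows:
        for a in attrs:
            counts[(a, row[decision])][row[a]] += 1
            values[a].add(row[a])
    return priors, counts, values, list(attrs)


def predict_nb(model, row):
    """Return the label with the largest log posterior (add-one smoothing is an assumption)."""
    priors, counts, values, attrs = model
    total = sum(priors.values())
    best_label, best_logp = None, -math.inf
    for label, n_label in priors.items():
        logp = math.log(n_label / total)
        for a in attrs:
            k = len(values[a])  # number of distinct values seen for attribute a
            logp += math.log((counts[(a, label)][row[a]] + 1) / (n_label + k))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


if __name__ == "__main__":
    # Toy records: columns 0-2 are condition attributes, column 3 is the diagnosis label.
    table = [
        [0, 0, 1, "no"],
        [0, 1, 0, "yes"],
        [1, 0, 1, "no"],
        [1, 1, 0, "yes"],
    ]
    reduct = [1]  # hypothetical output of an attribute reduction step
    model = train_nb(table, reduct, decision=3)
    print(predict_nb(model, [0, 1, 1, None]))
```

Training only on the reduct columns is where attribute reduction pays off in this pipeline: there are fewer conditional probability tables to estimate and fewer factors per prediction. A five-disease setup like the one in item (4) could train one such model per disease label. The abstract does not say how the five classifiers were organized.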
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: R-05; TP18