Research on the Prediction and Analysis of Ovarian Tumors Based on Machine Learning
Topics: machine learning + data mining; Source: Jilin University, master's thesis, 2016
[Abstract]: With the rapid development of information technology in the 21st century, computers play an increasingly important role in society. As hospitals have adopted hospital information systems and electronic medical records, and as data storage technology has advanced, hospital databases have accumulated data on a large scale. However, most hospitals still process this data only through basic insert, delete, update, and query operations. They lack techniques for data integration and analysis, and so cannot use the data they already hold to support medical decisions or to acquire knowledge automatically. Traditional methods of data analysis and processing also fail to uncover the hidden information and internal relationships in large volumes of data. Data collection and data storage have both improved markedly, so the main problem now is how to put this hard-won data to practical use.

After reviewing the theoretical foundations of data mining, this thesis first applies them to screen and preprocess the key medical data collected for evaluating ovarian tumors. Four classifiers suited to medical data mining are then selected. The support vector machine (SVM) performs well on small sample sets, nonlinear sample sets, and pattern recognition problems involving high-dimensional data, and it can be extended to other problems such as function fitting. The naive Bayes classifier has a solid mathematical foundation, gives stable classification results, needs only a small amount of training data, is relatively insensitive to incomplete or defective data, and is simple to implement. The nearest neighbor classifier performs well on sample sets whose class regions cross or overlap heavily. The random forest algorithm can produce highly accurate classifiers for many kinds of data, handles a large number of input variables, and trains quickly. In addition, an artificial neural network algorithm is designed for the collected data; owing to its self-learning ability, its ability to search quickly for an optimal solution, and its associative memory, it is effective for building data classification algorithms.

The five algorithms are each used for classification and prediction. The experimental results are tested with statistical methods, and their accuracy is compared with that of results reported in domestic and international studies. The results are interpreted from a machine learning perspective, and the overall performance of the algorithms is evaluated. From the analysis of these results, classification rules for clinical data on ovarian tumors are extracted, with the aim of predicting ovarian cancer early to assist clinical diagnosis. The goal is early prediction and early treatment, and thereby a higher survival rate for patients with ovarian cancer.
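To make the nearest neighbor classification and the performance evaluation mentioned in the abstract more concrete, here is a minimal sketch in pure Python. It is not the thesis's implementation: the two features, the sample values, the labels, the choice of k = 3, and all function names are hypothetical and made up for illustration. It classifies each sample from the remaining ones (leave-one-out) and reports sensitivity and specificity, two measures commonly used for diagnostic classifiers.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    # Order the training samples by Euclidean distance to x, nearest first.
    neighbors = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_predictions(X, y, k=3):
    """Predict every sample from all the other samples (leave-one-out)."""
    return [
        knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        for i in range(len(X))
    ]


def sensitivity_specificity(y_true, y_pred, positive="malignant"):
    """Return (sensitivity, specificity); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical, made-up samples: two normalized features per patient.
X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25), (0.30, 0.20),
     (0.80, 0.90), (0.90, 0.80), (0.85, 0.75), (0.70, 0.90)]
y = ["benign"] * 4 + ["malignant"] * 4

predictions = leave_one_out_predictions(X, y, k=3)
sensitivity, specificity = sensitivity_specificity(y, predictions)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

An odd k is used so that a two-class vote cannot tie. On real clinical data the classes would overlap far more than in this toy set, which is where the comparison across several classifiers described in the abstract becomes informative.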
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP181; TP311.13
Article ID: 1799146
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1799146.html