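Application of Neural Network Models in Predicting Acute Myocardial Infarction and a Comparative Study of Model Predictive Ability
Topic: cardiovascular disease + acute myocardial infarction; Source: Peking Union Medical College, 2013 doctoral dissertation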
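【Abstract (Chinese)】: Objective: Cardiovascular disease seriously endangers human health worldwide, and recent studies show that its incidence and mortality are rising in developing countries. Many studies have examined the risk factors for myocardial infarction and predicted the probability of onset. Predicting that probability requires a statistical model, but the statistical models in routine use have limited predictive ability. We sought a mathematical model that better captures complex nonlinear relationships among variables, to provide a reference for the diagnosis and prevention of acute myocardial infarction in the Chinese population. A neural network model is a computing system developed by imitating the neural tissue of the human brain: a network of many processing units that are extensively interconnected. It shares basic characteristics of biological nervous systems, offering nonlinear mapping, learning, self-adaptation, fault tolerance, and associative memory, and it is a very important class of model in data mining. This study constructs a Logistic regression model, a BP neural network model, and an Elman neural network model, combining conventional statistical methods with neural network methods to predict acute myocardial infarction, with the aim of improving predictive ability.
Methods: The variables in the epidemiological survey data on acute myocardial infarction in the Chinese population were divided into conventional variables and gene SNP locus variables. Conventional variables were divided into qualitative and quantitative variables, which were described and examined by univariate analysis. For the SNP locus variables, allele and genotype frequencies were calculated, Hardy-Weinberg equilibrium was verified, trend tests were performed, and haplotype blocks of the SNP loci were constructed. Three statistical prediction models were then built: a conventional Logistic regression model, a BP neural network model, and an Elman neural network model. The area under the ROC curve was first calculated by back-substituting the fitting data, giving a preliminary comparison of predictive accuracy. The data were then randomly split into a training set and a validation set, and the models were rebuilt to evaluate their generalization ability, with repeated sampling used to compare predictive accuracy. Finally, data were simulated at random. Because continuous and discrete variables behave differently in the models, the simulation covered two cases: in the first, the continuous variables were statistically significant; in the second, the discrete variables were statistically significant. Models were built for each case, and their adaptability to the variable types and their stability were studied.
Results: The data were randomly split into a fitting data set and a validation data set to compare the predictive ability of the three models. With four validation-set proportions from 10% to 40%, the area under the ROC curve of the BP neural network model was 4.5%, 3.1%, 3.3%, and 2.9% higher than that of the Logistic regression model, and the differences were statistically significant. The area under the ROC curve of the Elman neural network model was 4.2%, 2.1%, 2.9%, and 1.4% higher than that of the Logistic regression model; the differences were not statistically significant when 20% or 40% of the population was used as the validation set. The differences in area under the ROC curve between the BP and Elman models at the four proportions were 0.2%, 0.9%, 0.4%, and 1.6%, none of them statistically significant. Compared with conventional Logistic regression, the BP neural network model significantly improved generalization ability. In the simulation study, when the continuous variables were statistically significant, all three models predicted well. When the discrete variables were statistically significant, the areas under the ROC curve of the BP and Elman neural network models were 3.2%, 2.9%, 3.2%, and 3.1% higher than that of the Logistic regression model across the four validation-set proportions, and the differences were statistically significant. Both neural network models predicted significantly better than the Logistic regression model, and the difference between the Elman and BP models was not statistically significant.
Conclusion: In this application, the BP and Elman neural network models showed good predictive ability, fast computation, and good stability, and could handle complex nonlinear relationships. In data with modest sample sizes, many discrete variables, and complex nonlinear relationships in particular, the neural network models predicted better than Logistic regression analysis, which demonstrates the advantages and soundness of the neural network approach. These two neural network methods should have good practical value for prediction and evaluation in the epidemiology of heart disease.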
[Abstract]: Objective
Cardiovascular disease seriously endangers human health worldwide. Recent studies show that its morbidity and mortality are increasing in developing countries. Many studies have examined the risk factors for myocardial infarction and predicted the probability of onset. Predicting that probability requires a statistical model, but the statistical models in routine use have limited predictive ability. We hope to find a mathematical model that better captures the more complex nonlinear relationships between variables, so as to provide a reference for the diagnosis and prevention of acute myocardial infarction in the Chinese population. A neural network model is a computing system developed by imitating the neural tissue of the human brain. It is a network of many processing units that are extensively interconnected, and it has the basic characteristics of a biological nervous system: nonlinear mapping, learning, self-adaptation, fault tolerance, and associative memory. It is a very important class of model in data mining.
The purpose of this study is to construct a Logistic regression model, a BP neural network model, and an Elman neural network model, and to combine conventional statistical methods with neural network methods in predicting acute myocardial infarction, with the aim of improving predictive ability.
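To make the structural difference between the two network types concrete, here is a minimal sketch of one forward step of a single-hidden-layer Elman network. It is the generic textbook form, not the thesis's configuration: the function name `elman_step`, the weight layout, and the sigmoid activations are all assumptions made for illustration. Setting the context weights `w_ctx` to zero removes the recurrence and leaves the forward pass of an ordinary feed-forward network of the kind trained by BP (backpropagation).

```python
import math


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def elman_step(x, context, w_in, w_ctx, b_hid, w_out, b_out):
    """One forward step of a hypothetical single-hidden-layer Elman network.

    x       : input vector
    context : copy of the hidden activations from the previous step
    w_in    : hidden-by-input weight matrix
    w_ctx   : hidden-by-hidden weight matrix applied to the context units
    Returns (output, hidden); the caller passes `hidden` back in as the
    next step's `context`.
    """
    hidden = []
    for j in range(len(b_hid)):
        z = b_hid[j]
        z += sum(w_in[j][i] * x[i] for i in range(len(x)))
        # The context term is what distinguishes Elman from plain feed-forward.
        z += sum(w_ctx[j][k] * context[k] for k in range(len(context)))
        hidden.append(sigmoid(z))
    output = sigmoid(b_out + sum(w_out[j] * hidden[j] for j in range(len(hidden))))
    return output, hidden
```

In use, the context starts at zeros and each call's returned `hidden` becomes the next call's `context`. How the thesis presented its tabular survey records to the Elman model is not stated in the abstract.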
Method
We divided the variables in the epidemiological survey data on acute myocardial infarction in the Chinese population into conventional variables and gene SNP locus variables. The conventional variables were divided into qualitative and quantitative variables, which were described and examined by univariate analysis. For the SNP locus variables, allele frequencies and genotype frequencies were calculated, Hardy-Weinberg equilibrium was verified, trend tests were performed, and haplotype blocks of the SNP loci were constructed.
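The allele-frequency and Hardy-Weinberg steps can be sketched for a single biallelic SNP as follows. This is a minimal illustration using the standard one-degree-of-freedom chi-square goodness-of-fit test. The abstract does not say which test variant or software the thesis used, and the function names and genotype labels (`AA`, `AB`, `BB`) are assumptions.

```python
import math


def allele_and_genotype_freqs(n_aa, n_ab, n_bb):
    """Allele and genotype frequencies from genotype counts at one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    alleles = {"A": p, "B": 1.0 - p}
    genotypes = {"AA": n_aa / n, "AB": n_ab / n, "BB": n_bb / n}
    return alleles, genotypes


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium (1 degree of freedom).

    Returns (chi2, p_value). Expected counts are n*p^2, 2*n*p*q, n*q^2.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts exactly at Hardy-Weinberg proportions give chi2 = 0 and p = 1.
chi2, p_value = hwe_chi_square(25, 50, 25)
```

In case-control studies this check is commonly applied to the control group, and a small p-value flags a locus for possible genotyping error. Whether the thesis followed that convention is not stated.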
We then constructed three statistical prediction models: a conventional Logistic regression model, a BP neural network model, and an Elman neural network model. We first calculated the area under the ROC curve by back-substituting the fitting data, giving a preliminary comparison of the predictive accuracy of the three models. We then divided the data into a training set and a validation set by random sampling and rebuilt the models to evaluate their generalization ability, using repeated sampling to compare the predictive accuracy of the three models. Finally, we simulated data at random. Because continuous and discrete variables behave differently in the models, we divided the simulation into two cases: in the first, the continuous variables were statistically significant; in the second, the discrete variables were statistically significant. We built models for each case and studied their adaptability to the variable types and their stability.
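The evaluation loop described above (random split, refit, score the held-out set by area under the ROC curve, repeat) can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the thesis's code: `fit` is a hypothetical callable standing in for any of the three models, and the AUC is computed with the pairwise (Mann-Whitney) definition, which is quadratic in sample size but easy to verify.

```python
import random


def roc_auc(labels, scores):
    """AUC as P(score of a case > score of a control), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def holdout_split(n_samples, validation_fraction, rng):
    """Shuffle row indices and return (train_indices, validation_indices)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = round(n_samples * validation_fraction)
    return indices[n_val:], indices[:n_val]


def repeated_holdout_auc(features, labels, fit, validation_fraction,
                         n_repeats, seed=0):
    """Validation AUC for each of n_repeats random splits.

    `fit(train_features, train_labels)` is a hypothetical hook that must
    return a function mapping one feature row to a risk score.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        train, val = holdout_split(len(labels), validation_fraction, rng)
        model = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [model(features[i]) for i in val]
        aucs.append(roc_auc([labels[i] for i in val], scores))
    return aucs


# Textbook check: three of the four case/control pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

Running the same loop with validation fractions of 0.1, 0.2, 0.3, and 0.4 gives the four proportions the abstract reports. With small data sets a stratified split may be needed so that every validation set contains both cases and controls.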
Results
The data were randomly split into a fitting data set and a validation data set to compare the predictive ability of the three models. With four validation-set proportions from 10% to 40%, the area under the ROC curve of the BP neural network model was 4.5%, 3.1%, 3.3%, and 2.9% higher than that of the Logistic regression model, and the differences were statistically significant. The area under the ROC curve of the Elman neural network model was 4.2%, 2.1%, 2.9%, and 1.4% higher than that of the Logistic regression model; the differences were not statistically significant when 20% or 40% of the population was used as the validation set. The differences in area under the ROC curve between the BP model and the Elman model at the four proportions were 0.2%, 0.9%, 0.4%, and 1.6%, none of them statistically significant. Compared with the conventional Logistic regression model, the BP neural network model significantly improved generalization ability.
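The abstract does not name the test behind "statistically significant". One simple option for paired AUC values from the same repeated splits is an exact sign-flip permutation test on the per-split differences, sketched below. The function name and the AUC values in the usage lines are made up for illustration and are not results from the thesis.

```python
from itertools import product


def sign_flip_test(auc_a, auc_b):
    """Exact two-sided sign-flip permutation test on paired AUC differences.

    Returns (mean_difference, p_value). Every sign pattern is enumerated,
    so this is only practical for a small number of pairs (about 20 or fewer).
    """
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so floating-point noise does not drop exact ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return sum(diffs) / len(diffs), extreme / total


# Hypothetical validation AUCs from eight repeated splits (illustrative only).
auc_network = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.82, 0.81]
auc_logistic = [0.77, 0.78, 0.76, 0.78, 0.79, 0.77, 0.78, 0.78]
mean_diff, p_value = sign_flip_test(auc_network, auc_logistic)
```

Repeated splits of one data set are not independent samples, so a p-value obtained this way is at best approximate. It is shown only as one way to attach a significance statement to an AUC difference.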
The simulation study showed that in the first case, where the continuous variables were statistically significant, all three models predicted well. In the second case, where the discrete variables were statistically significant, the areas under the ROC curve of the BP neural network model and the Elman neural network model were 3.2%, 2.9%, 3.2%, and 3.1% higher than that of the Logistic regression model across the four validation-set proportions from 10% to 40%, and the differences were statistically significant. Both neural network models predicted significantly better than the Logistic regression model, and the difference between the Elman model and the BP model was not statistically significant.
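The abstract does not describe how the simulated data were generated. As a purely hypothetical illustration of how discrete predictors can favour a nonlinear model, the sketch below simulates two binary predictors whose effect on the outcome is a pure interaction (XOR) with label noise. Each predictor alone then carries almost no information, which is the situation where a main-effects Logistic model struggles, while the interaction separates cases from controls well. All names and parameters are assumptions.

```python
import random


def simulate_xor_cohort(n, noise, seed=0):
    """Rows of (x1, x2, y) where y = x1 XOR x2, flipped with probability `noise`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.randint(0, 1)
        x2 = rng.randint(0, 1)
        y = x1 ^ x2
        if rng.random() < noise:
            y = 1 - y
        rows.append((x1, x2, y))
    return rows


def binary_score_auc(rows, score):
    """AUC of a 0/1 score: 0.5 * (1 + TPR - FPR), the one-point ROC curve."""
    cases = [score(r) for r in rows if r[2] == 1]
    controls = [score(r) for r in rows if r[2] == 0]
    tpr = sum(cases) / len(cases)
    fpr = sum(controls) / len(controls)
    return 0.5 * (1.0 + tpr - fpr)


rows = simulate_xor_cohort(4000, noise=0.1, seed=1)
auc_single = binary_score_auc(rows, lambda r: r[0])              # x1 alone
auc_interaction = binary_score_auc(rows, lambda r: r[0] ^ r[1])  # x1 XOR x2
```

With 10% label noise the interaction score has an expected AUC of 0.9, while `x1` alone has an expected AUC of 0.5. A network with a hidden layer can learn such an interaction from the raw predictors, whereas Logistic regression needs the product term to be supplied explicitly.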
Conclusion
In this application, the BP and Elman neural network models showed good predictive ability, fast computation, and good stability, and could handle complex nonlinear relationships. In data with modest sample sizes, many discrete variables, and complex nonlinear relationships in particular, the neural network models predicted better than Logistic regression analysis, which demonstrates the advantages and soundness of the neural network approach. These two neural network methods should have good practical value for prediction and evaluation in the epidemiology of heart disease.
[Degree-granting institution]: Peking Union Medical College
[Degree level]: Doctoral
[Year degree conferred]: 2013
[Classification number]: R542.22
Article ID: 1873676
Article link: http://sikaile.net/yixuelunwen/jjyx/1873676.html