
A Simulation-Based Comparison of Missing-Value Handling Techniques

Published: 2018-03-02 18:02

  Topic: missing values. Approach: simulation techniques. Source: Zhengzhou University, master's thesis, 2012. Document type: degree thesis.


[Abstract]:
Objective. Missing data are common in research on traditional Chinese medicine (TCM) syndromes in AIDS. They complicate the analysis and can bias the results, so a suitable imputation method for this database has to be found before the data can be analyzed. Based on field survey data on TCM syndromes, this study uses data simulation to compare the strengths and weaknesses of different handling methods, discuss where each applies, determine the best number of imputations for multiple imputation (MI), and identify the most accurate, efficient and convenient method under different missing-data patterns and mechanisms.

Methods. A complete data set and data sets with different missing rates were simulated in SAS 9.1. For continuous variables that were missing completely at random or missing at random, the gaps were handled with expectation maximization (EM), regression imputation, mean imputation, listwise deletion and multiple imputation (MI), and the methods were compared on precision, accuracy and the resulting mean. For binary variables, listwise deletion and logistic regression within MI were used, and the methods were compared on the regression coefficients and standard errors.

Results.
1. Continuous variables: all data here followed an arbitrary missing pattern. Imputation efficiency rose with the number of imputations and exceeded 0.95 in every case at 10 imputations. Precision also rose with the number of imputations and was highest at 10. For accuracy, a small number of imputations (3 to 5) was enough when less than 20% of the data were missing; at missing rates of 30-40%, MI with 10 imputations was relatively more accurate; above 50%, accuracy was unstable.
2. Missing completely at random: below 10% missing, every method reproduced the mean of the complete data set, and MI had the highest precision and accuracy. At 20% or more, listwise deletion and MI outperformed the other methods; MI had the higher precision and listwise deletion the higher accuracy.
3. Missing at random: with little missing data (10%-20%), MI was more accurate and more precise than the other methods. At 30% missing, listwise deletion was accurate but imprecise. With more missing data (a missing rate of 40%), no method performed well.
4. Binary variables: with less missing data (missing rate below 40%), listwise deletion was simple, accurate and efficient, whereas MI required a more complex program and more memory and time for repeated imputation, and still gave worse results. At 40%-50% missing, MI with logistic regression performed well with only a few imputations (2). At missing rates of 60% or more, neither method performed well.

Conclusion. Large-sample continuous data can be treated as normally distributed, and the tolerable proportion of missing data is below 30%. Traditional methods such as mean imputation and listwise deletion are simple and convenient and have some advantages, but MI handles the more common situations, has more room to show its strengths, can be applied to most types of missing values, and imputes efficiently.
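The comparison described in the abstract can be illustrated with a small sketch. It is not the thesis's SAS program: the function names, the normal distribution with mean 50 and standard deviation 10, the seeds and the sample size are all assumptions made for illustration. The `simple_mi_mean` function is a deliberately simplified stand-in for MI (it does not redraw the model parameters for each imputation), and `relative_efficiency` assumes that the "efficiency" reported in the abstract is Rubin's relative efficiency, 1 / (1 + γ/m), which the source does not state.

```python
import random
import statistics


def simulate_complete(n, mu=50.0, sigma=10.0, seed=0):
    """Draw n values from a normal distribution (the 'complete' data set)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def impose_mcar(values, missing_rate, seed=1):
    """Blank out each value with the same probability, independent of the data (MCAR)."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_rate else v for v in values]


def listwise_deletion(values):
    """Keep only the observed values."""
    return [v for v in values if v is not None]


def mean_imputation(values):
    """Replace every missing value with the mean of the observed values."""
    observed_mean = statistics.fmean(listwise_deletion(values))
    return [observed_mean if v is None else v for v in values]


def simple_mi_mean(values, m=10, seed=2):
    """Toy multiple imputation: fill gaps with random normal draws, m times.

    Returns the pooled mean and its total variance via Rubin's rules.
    Simplification (assumption): the normal's parameters are fixed at the
    observed estimates instead of being redrawn for each imputation.
    """
    rng = random.Random(seed)
    observed = listwise_deletion(values)
    mu, sigma = statistics.fmean(observed), statistics.stdev(observed)
    estimates, within = [], []
    for _ in range(m):
        filled = [rng.gauss(mu, sigma) if v is None else v for v in values]
        estimates.append(statistics.fmean(filled))
        # Within-imputation variance of the mean for this filled data set.
        within.append(statistics.variance(filled) / len(filled))
    pooled = statistics.fmean(estimates)
    between = statistics.variance(estimates) if m > 1 else 0.0
    total_var = statistics.fmean(within) + (1 + 1 / m) * between
    return pooled, total_var


def relative_efficiency(gamma, m):
    """Rubin's relative efficiency of m imputations; gamma = fraction of missing information."""
    return 1.0 / (1.0 + gamma / m)


complete = simulate_complete(1000)
for rate in (0.1, 0.3, 0.5):
    incomplete = impose_mcar(complete, rate)
    deleted = statistics.fmean(listwise_deletion(incomplete))
    filled_sd = statistics.stdev(mean_imputation(incomplete))
    mi_mean, mi_var = simple_mi_mean(incomplete, m=10)
    print(f"rate={rate:.0%} deletion mean={deleted:.2f} "
          f"mean-imputed sd={filled_sd:.2f} MI mean={mi_mean:.2f} "
          f"MI se={mi_var ** 0.5:.3f}")
```

Under these assumptions, mean imputation leaves the mean unchanged but shrinks the standard deviation as the missing rate grows, which is one reason a comparison on precision as well as accuracy is informative. Under the assumed formula, `relative_efficiency(0.5, 10)` is above 0.95, which would be consistent with the efficiency threshold quoted for 10 imputations.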
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: R181.3
