Nearest-Neighbor Optimal Imputation Based on Logistic Regression

Published: 2018-01-31 17:52

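Keywords: nonresponse; nearest-neighbor imputation; Logistic regression imputation. Source: Tianjin University of Finance and Economics, master's thesis, 2013. Document type: degree thesis.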

[Abstract]: Nonresponse is common when data are collected in practice. A respondent may refuse or forget to answer a survey question, and lost files or incorrectly recorded data also produce nonresponse. Nonresponse in survey data makes statistical analysis harder, can bias the results substantially, and lowers the quality of statistical work. No data collection method yields perfectly "accurate" data, and in many cases time and cost constraints rule out re-surveying. Preventing nonresponse in advance is the most effective treatment, but because of practical constraints, preventive measures often cannot eliminate it completely. Post hoc imputation of nonresponse has therefore received increasing attention, and many researchers have studied it in depth. This thesis briefly summarizes previously studied imputation methods and, building on them, tries a further method: nearest-neighbor optimal imputation based on Logistic regression. The method inherits the high accuracy of Logistic regression imputation and the unit-selection property of nearest-neighbor imputation.
The thesis compares the proposed method by simulation with four commonly used methods: mean imputation, nearest-neighbor imputation, regression imputation, and Logistic regression imputation. The simulations consider nonresponse rates of 5%, 10%, 20%, 30%, 40%, and 50%, with 2, 3, 4, and 5 regression variables. For categorical data, both the proposed method and Logistic regression imputation outperform nearest-neighbor imputation, and in some cases the proposed method outperforms Logistic regression imputation. For continuous data, when the variance is large (for example 0.25 or 1), the proposed method is clearly better than the other methods; when the variance is small (for example 0.01 or 0.04), its advantage is less pronounced, and its mean squared error tends to rise as the number of variables increases. For real data, the mean squared error tends to increase with the missing rate, and the proposed method has the smallest mean squared error and the smallest fluctuation, so it imputes well. The simulated and real data together indicate that nearest-neighbor optimal imputation based on Logistic regression has certain advantages, and the thesis hopes it offers a new method of reference value for practical problems.
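The simulation design described in the abstract (delete values at a chosen nonresponse rate, impute, then score by mean squared error) can be illustrated with a small sketch. This is not the thesis's code and does not implement the proposed Logistic-regression-based method, whose details the abstract does not give. It only compares two of the baseline methods, mean imputation and nearest-neighbor imputation, on continuous data. The data-generating model (target equal to the sum of standard-normal covariates plus noise), the completely-at-random missingness, and all function names are assumptions made for illustration.

```python
import random

def simulate(n=400, n_vars=2, noise_sd=0.5, seed=0):
    """Assumed model: y = sum of standard-normal covariates + Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n)]
    y = [sum(row) + rng.gauss(0, noise_sd) for row in X]
    return X, y

def make_missing(n, rate, seed=1):
    """Choose the nonresponding units completely at random (an assumption)."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), int(round(rate * n))))

def mean_impute(X, y, missing):
    """Fill every missing target with the mean of the observed targets."""
    observed = [y[i] for i in range(len(y)) if i not in missing]
    mu = sum(observed) / len(observed)
    return {i: mu for i in missing}

def nn_impute(X, y, missing):
    """Fill each missing target with the target of the closest respondent,
    where closeness is squared Euclidean distance on the covariates."""
    donors = [i for i in range(len(y)) if i not in missing]

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return {i: y[min(donors, key=lambda j: dist2(X[i], X[j]))] for i in missing}

def mse(y, imputed):
    """Mean squared error between the true (deleted) and imputed values."""
    return sum((y[i] - v) ** 2 for i, v in imputed.items()) / len(imputed)

if __name__ == "__main__":
    X, y = simulate()
    # Nonresponse rates taken from the abstract.
    for rate in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
        miss = make_missing(len(y), rate)
        print(rate, mse(y, mean_impute(X, y, miss)), mse(y, nn_impute(X, y, miss)))
```

Changing `n_vars` to 3, 4, or 5 and `noise_sd` to 0.1, 0.2, 0.5, or 1 (variances 0.01, 0.04, 0.25, and 1) mirrors the grid of settings the abstract mentions, though the thesis's actual simulation model may differ.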
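[Degree-granting institution]: Tianjin University of Finance and Economics
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: O212.1;C81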



Article link: http://sikaile.net/shekelunwen/shgj/1479644.html
