A Nearest-Neighbor Optimal-Selection Imputation Method Based on Logistic Regression
Keywords: nonresponse; nearest-neighbor imputation; Logistic regression imputation. Source: Tianjin University of Finance and Economics, master's thesis, 2013. Type: degree thesis.
【Abstract】: Nonresponse is common in real-world data collection. Respondents may refuse or forget to answer a survey question, and lost files or incorrectly recorded data also produce nonresponse. Nonresponse in survey data makes statistical analysis harder, can bias the results substantially, and lowers the quality of statistical work. No data-collection method can guarantee perfectly "accurate" data, and in many cases time and cost constraints rule out a re-survey. Prevention before collection is the most effective remedy, but for various practical reasons and constraints it rarely eliminates nonresponse completely. Post-collection imputation for nonresponse has therefore received growing attention, and many researchers have studied it in depth.
This thesis briefly reviews the imputation methods studied in earlier work and, building on them, tries another method: nearest-neighbor optimal-selection imputation based on Logistic regression. The method inherits the high accuracy of Logistic regression imputation and the property of nearest-neighbor imputation that it selects the best-matching donor unit. It is compared by simulation with four commonly used methods: mean imputation, nearest-neighbor imputation, regression imputation, and Logistic regression imputation. The simulations use nonresponse rates of 5%, 10%, 20%, 30%, 40%, and 50%, with 2, 3, 4, and 5 regression variables.
The simulation results for categorical data show that both the proposed method and Logistic regression imputation outperform nearest-neighbor imputation. In some cases the proposed method also outperforms Logistic regression imputation. For continuous data with a large variance (e.g., 0.25 or 1), the proposed method is clearly better than the other methods. With a small variance (e.g., 0.01 or 0.04), its advantage is less pronounced, and its mean squared error tends to rise as the number of variables increases. On real data, the mean squared error of all methods tends to increase with the missing rate. The proposed method has the smallest mean squared error and the least variability, so it gives comparatively good imputation results.
Taken together, the simulated and real data show that nearest-neighbor optimal-selection imputation based on Logistic regression has certain advantages. The author hopes it offers a new method of practical reference value for real-world problems.
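The abstract lists the competing methods but does not say how the Logistic regression step and the nearest-neighbor step are combined. The sketch below is therefore one plausible reading, not the thesis's own algorithm. It is a minimal Python example that uses only the standard library and handles a binary (categorical) survey item. All function names are hypothetical. The sketch makes the following assumptions:

- Logistic regression is fitted by plain gradient ascent.
- Mode imputation stands in for mean imputation, because the target is categorical.
- Regression imputation for continuous data is omitted.
- `impute_logistic_nn` matches each nonrespondent to the respondent with the closest fitted probability and copies that donor's value. This is similar in spirit to predictive mean matching.
- `simulate` masks values completely at random and uses an illustrative generating model.

```python
import math
import random


def predict_proba(w, x):
    """P(y = 1 | x) under a logistic model; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def split(y):
    """Indices of respondents (y observed) and nonrespondents (y is None)."""
    obs = [i for i, v in enumerate(y) if v is not None]
    mis = [i for i, v in enumerate(y) if v is None]
    return obs, mis


def impute_mode(X, y):
    """Categorical stand-in for mean imputation: fill with the majority class."""
    obs, _ = split(y)
    fill = 1 if 2 * sum(y[i] for i in obs) >= len(obs) else 0
    return [fill if v is None else v for v in y]


def impute_nearest_neighbor(X, y):
    """Copy y from the respondent closest in covariate space (Euclidean)."""
    obs, mis = split(y)
    out = list(y)
    for i in mis:
        donor = min(obs, key=lambda j: math.dist(X[i], X[j]))
        out[i] = y[donor]
    return out


def impute_logistic(X, y):
    """Fit on respondents, then impute the more probable class."""
    obs, mis = split(y)
    w = fit_logistic([X[i] for i in obs], [y[i] for i in obs])
    out = list(y)
    for i in mis:
        out[i] = 1 if predict_proba(w, X[i]) >= 0.5 else 0
    return out


def impute_logistic_nn(X, y):
    """HYPOTHETICAL combination (not confirmed by the abstract): pick the
    respondent whose fitted probability is closest to the nonrespondent's
    and copy that donor's observed value."""
    obs, mis = split(y)
    w = fit_logistic([X[i] for i in obs], [y[i] for i in obs])
    score = [predict_proba(w, x) for x in X]
    out = list(y)
    for i in mis:
        donor = min(obs, key=lambda j: abs(score[i] - score[j]))
        out[i] = y[donor]
    return out


METHODS = {
    "mode": impute_mode,
    "nearest_neighbor": impute_nearest_neighbor,
    "logistic": impute_logistic,
    "logistic_nn": impute_logistic_nn,
}


def simulate(rate, n=150, seed=0):
    """MSE of each method on the masked cells of one synthetic binary dataset."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    true_w = [0.0, 1.5, -1.0]  # illustrative generating model, not from the thesis
    truth = [1 if rng.random() < predict_proba(true_w, x) else 0 for x in X]
    missing = set(rng.sample(range(n), int(rate * n)))  # MCAR: random cells masked
    y = [None if i in missing else v for i, v in enumerate(truth)]
    results = {}
    for name, f in METHODS.items():
        filled = f(X, y)  # impute once per method, then score the masked cells
        results[name] = sum((filled[i] - truth[i]) ** 2 for i in missing) / len(missing)
    return results


if __name__ == "__main__":
    for rate in (0.05, 0.10, 0.20, 0.30, 0.40, 0.50):
        print(rate, simulate(rate))
```

Running the script prints one dictionary of MSE values per nonresponse rate, mirroring the 5% to 50% grid described in the abstract. Because the target is 0/1, each imputed cell has a squared error of either 0 or 1. The reported MSE is therefore the share of imputed cells that were wrong. The numbers depend on the seed and on the assumed generating model, so they should not be read as reproducing the thesis's findings.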
【Degree-granting institution】: Tianjin University of Finance and Economics
【Degree level】: Master's
【Year degree conferred】: 2013
【Classification number】: O212.1; C81