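Statistical Inference for Zero-Inflated, Overdispersed Count Data

Published: 2018-05-02 00:12

Topics: zero inflation + ZIP model; source: master's thesis, Kunming University of Science and Technology (昆明理工大學), 2011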

[Abstract]: Count data are a class of discrete data that arise widely in everyday life and in research. Such data are usually analysed by regression based on the ordinary Poisson distribution, an approach that has been widely applied in past practice and research.
However, count data with far more zeros than the ordinary Poisson distribution allows are also encountered frequently. If the ordinary Poisson distribution is still used to fit such data, the parameter estimates will be badly biased and the resulting inferences wrong. To address this, the zero-inflated Poisson (ZIP) mixture regression model, which mixes an ordinary Poisson distribution with a distribution degenerate at zero, has been proposed. Deciding whether the count data under study really are zero-inflated is decisive for model selection. This thesis therefore proposes a score test for the presence of zero inflation. If zero inflation is present, the ZIP model is used for the regression analysis; otherwise the traditional, simpler ordinary Poisson regression can continue to be used.
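The decision rule described above can be illustrated with a minimal sketch. The code below is not the score test developed in the thesis; it assumes the simplest setting (no covariates, a single Poisson mean) and uses the classical score statistic for zero inflation in that setting, which is referred to a chi-square distribution with one degree of freedom. The function name `zero_inflation_score_test` is hypothetical.

```python
import math


def zero_inflation_score_test(counts):
    """Score test of H0: plain Poisson vs. H1: zero-inflated Poisson.

    Intercept-only case: the Poisson mean is estimated under H0 by the
    sample mean. Returns (statistic, p_value); the statistic is compared
    with a chi-square distribution on 1 degree of freedom.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must not be empty")
    mean = sum(counts) / n
    if mean <= 0:
        raise ValueError("all counts are zero; the test is undefined")

    n_zero = sum(1 for y in counts if y == 0)
    p0 = math.exp(-mean)  # P(Y = 0) under the fitted Poisson model

    # Squared gap between observed and expected number of zeros,
    # scaled by its estimated variance under H0.
    numerator = (n_zero - n * p0) ** 2
    denominator = n * p0 * (1.0 - p0) - n * mean * p0 ** 2
    statistic = numerator / denominator

    # Upper tail of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small p-value points towards the ZIP model; a large one gives no reason to abandon ordinary Poisson regression. With covariates the statistic takes a more involved form that this sketch does not cover.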
In addition, count data may be correlated and hierarchically structured, for example because of the way longitudinal data are collected. Ordinary single-level models then fail to give satisfactory parameter estimates and test results, and multilevel regression models have been proposed for data with such hierarchical structure. Focusing on two-level data, the most common hierarchical structure, this thesis uses Bayesian methods for parameter estimation and testing.
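For orientation, a two-level ZIP regression with a cluster-level random intercept is often written as below. This is a common textbook formulation and not necessarily the exact specification used in the thesis. All symbols are notation assumed here: i indexes clusters, j indexes observations within a cluster, x and z are covariate vectors, and b is the random intercept.

```latex
\begin{align*}
P(Y_{ij}=0 \mid b_i) &= \pi_{ij} + (1-\pi_{ij})\,e^{-\lambda_{ij}},\\
P(Y_{ij}=y \mid b_i) &= (1-\pi_{ij})\,\frac{e^{-\lambda_{ij}}\lambda_{ij}^{y}}{y!},
  \qquad y = 1, 2, \dots,\\
\log \lambda_{ij} &= x_{ij}^{\top}\beta + b_i, \qquad
\operatorname{logit}\pi_{ij} = z_{ij}^{\top}\gamma, \qquad
b_i \sim N(0, \sigma_b^{2}).
\end{align*}
```

In a Bayesian treatment, priors are placed on the regression coefficients and the variance of the random intercept, and the posterior is explored by simulation.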
Besides excess zeros, the non-zero part of the counts may show a variance that departs substantially from the mean relative to the ordinary Poisson distribution, that is, overdispersion. In that case the ordinary ZIP model no longer gives the best fit. Because the negative binomial (NB) distribution carries a dispersion parameter and can account for the excess dispersion more fully, the zero-inflated negative binomial (ZINB) mixture regression model can be used instead to obtain the best fit. Before choosing a model, testing whether the data are overdispersed is also essential. This thesis therefore proposes a score test for overdispersion in such data in the two-level setting. If the test indicates no overdispersion, the ZIP model can be used for the regression analysis; otherwise the ZINB model should be chosen.
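The relationship between the two models can be made concrete with a short sketch of their probability mass functions. The parameterisation is an assumption made here, not taken from the thesis: `mu` is the mean of the count component, `pi` the zero-inflation probability, and `alpha` the NB dispersion parameter, so that the count component has variance `mu + alpha * mu**2`. As `alpha` approaches zero, the ZINB model approaches the ZIP model.

```python
import math


def zip_pmf(y, mu, pi):
    """P(Y = y) under a zero-inflated Poisson with mean parameter mu."""
    log_pois = -mu + y * math.log(mu) - math.lgamma(y + 1)
    return pi * (y == 0) + (1.0 - pi) * math.exp(log_pois)


def zinb_pmf(y, mu, alpha, pi):
    """P(Y = y) under a zero-inflated negative binomial.

    The NB component has mean mu and variance mu + alpha * mu**2.
    """
    r = 1.0 / alpha  # NB "size" parameter
    log_nb = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + y * math.log(alpha * mu / (1.0 + alpha * mu))
        - r * math.log1p(alpha * mu)
    )
    return pi * (y == 0) + (1.0 - pi) * math.exp(log_nb)


def mean_and_variance(pmf, upper=500):
    """Numerical mean and variance of a pmf, summed over 0..upper-1."""
    mean = sum(y * pmf(y) for y in range(upper))
    second_moment = sum(y * y * pmf(y) for y in range(upper))
    return mean, second_moment - mean ** 2
```

Under this parameterisation both models have mean `(1 - pi) * mu`, while the variance is `(1 - pi) * mu * (1 + mu * pi)` for ZIP and `(1 - pi) * mu * (1 + mu * (pi + alpha))` for ZINB. The extra `alpha` term is the overdispersion that a test of `alpha = 0` targets.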
Missing data are often encountered in practice and complicate parameter estimation and model inference. Many methods for handling missing data have been developed, but they all assume that the data are missing at random and that the covariates follow a common multivariate distribution. In reality, many values are missing because the measurement fell outside the measurable range or for other non-random reasons, that is, they are missing not at random. Traditional missing-data methods are no longer appropriate for such data. This thesis refines the traditional approach by treating the missing values as unknown parameters and imputing them with Gibbs sampling together with a data decomposition technique, and applies the method to the models studied. Simulation results show that, for data missing not at random, this method clearly outperforms traditional methods based on the missing-at-random assumption.
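The idea of treating missing values as unknown parameters inside a Gibbs sampler can be sketched on a deliberately simplified problem. The code below is not the sampler from the thesis. It assumes plain Poisson counts, a conjugate Gamma(a, b) prior on the rate, and a known detection limit above which values go unrecorded. All function names and default settings are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw from Poisson(lam) by Knuth's method (fine for small lam)."""
    floor = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > floor:
        k += 1
        prod *= rng.random()
    return k


def sample_poisson_above(rng, lam, limit):
    """Rejection-sample a Poisson(lam) draw that exceeds the limit."""
    while True:
        y = sample_poisson(rng, lam)
        if y > limit:
            return y


def gibbs_censored_poisson(observed, n_missing, limit,
                           n_iter=1000, burn_in=200,
                           a=1.0, b=1.0, seed=0):
    """Posterior mean of the Poisson rate when values > limit are missing.

    Each sweep (1) imputes every missing count from its full conditional,
    a Poisson truncated to values above the limit, and (2) redraws the
    rate from its Gamma full conditional given the completed data.
    """
    rng = random.Random(seed)
    n = len(observed) + n_missing
    sum_obs = sum(observed)
    lam = max(sum_obs / len(observed), 0.1)  # start at the naive mean
    draws = []
    for it in range(n_iter):
        imputed = [sample_poisson_above(rng, lam, limit)
                   for _ in range(n_missing)]
        shape = a + sum_obs + sum(imputed)
        lam = rng.gammavariate(shape, 1.0 / (b + n))  # second arg is scale
        if it >= burn_in:
            draws.append(lam)
    return sum(draws) / len(draws)
```

Averaging only the recorded values underestimates the rate here, because the largest counts are exactly the ones that are missing. Imputing them from the truncated distribution inside the sampler corrects for that.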
Finally, the thesis closes with a summary of the work and a preliminary outlook on directions for further research on models for count data.

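[Degree-granting institution]: Kunming University of Science and Technology (昆明理工大學)
[Degree level]: Master's
[Year degree granted]: 2011
[Classification number]: C81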

Thesis record: 李克春 (Li Kechun); Statistical Inference for Zero-Inflated, Overdispersed Count Data [D]; Kunming University of Science and Technology; 2011


Article ID: 1831592


Article link: http://sikaile.net/shekelunwen/shgj/1831592.html
