A Study of Semiparametric Models for Biased Data
Published: 2018-05-25 22:14
Topic: left truncation + right censoring; Source: University of Science and Technology of China, 2015 doctoral dissertation
[Abstract]: Survival analysis has become one of the principal areas of biostatistics, and it has important applications in other fields, including reliability theory, actuarial science, demography, epidemiology, sociology, and economics. Because sampling schemes are complex, most data collected in practice are biased; the familiar censored and truncated data, for example, can both be viewed as general biased data. Biased data also arise in many other fields, such as biomedicine, sociology, economics, and quality control. Data are biased when the probability that an individual is sampled depends on the individual's own value, so that individuals are sampled with unequal probabilities. This is an interesting sampling problem because the scheme favors some individuals and overlooks others. When the collected data are biased, inference procedures developed for simple random samples no longer apply, and methods designed for biased data are needed. This dissertation uses estimating equations to study semiparametric models under general biased data, because semiparametric models contain both easily interpretable finite-dimensional parameters and infinite-dimensional unknown functions that add flexibility to the model.
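The sampling mechanism described above can be written compactly. The following is a standard textbook formulation rather than notation taken from the dissertation: the weight function $w$, the population density $f$, and the mean $\mu$ are symbols assumed here for illustration. Length-biased sampling is the special case $w(y) = y$.

```latex
f_w(y) = \frac{w(y)\, f(y)}{\int w(u)\, f(u)\, du},
\qquad
f_{\mathrm{LB}}(y) = \frac{y\, f(y)}{\mu},
\quad \mu = \int_0^\infty u\, f(u)\, du .
```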
In Chapter 1, we first introduce the types of biased data to be studied, namely censored data, length-biased data, and data collected under a case-cohort design. We then introduce the semiparametric models common in survival analysis that are studied here: the Cox model, the additive hazards model, the semiparametric linear transformation model, the quantile regression model, and the proportional mean residual life model.
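For reference, the five models are usually written as below. These are the common textbook forms, given here as an assumption; the dissertation may parametrize them differently. $T$ is the survival time, $Z$ the covariate vector, $\lambda$ the hazard function, $H$ an unspecified increasing function, $\varepsilon$ an error with known distribution, and $m(t \mid Z) = E(T - t \mid T > t, Z)$ the mean residual life.

```latex
\begin{align*}
\text{Cox:} \quad & \lambda(t \mid Z) = \lambda_0(t)\exp(\beta^\top Z) \\
\text{Additive hazards:} \quad & \lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z \\
\text{Linear transformation:} \quad & H(T) = -\beta^\top Z + \varepsilon \\
\text{Quantile regression:} \quad & Q_{\log T}(\tau \mid Z) = Z^\top \beta(\tau), \quad 0 < \tau < 1 \\
\text{Proportional mean residual life:} \quad & m(t \mid Z) = m_0(t)\exp(\beta^\top Z)
\end{align*}
```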
In Chapter 2, we exploit an important property of length-biased data, namely that the truncation time and the residual time after enrollment have the same distribution (Huang and Qin, 2011, 2012), to construct a composite estimator under the additive hazards model. The resulting estimator is roughly twice as efficient as the original left-truncated, right-censored estimator. We and Cheng and Huang (2014) were, almost simultaneously, the first to use the concept of composite estimating equations. This chapter also presents the large-sample properties of the estimator and finite-sample simulation results, and we apply the proposed method to the Channing House data from the United States, where it performs well.
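The distributional identity behind this chapter can be checked numerically. The sketch below is not the dissertation's simulation design: it assumes onset times uniform over a long window before the sampling date (a stationary incidence process), Gamma(2, 1) survival times, and no censoring. The function name `simulate_prevalent_cohort` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_prevalent_cohort(n_onsets=300_000, window=50.0, seed=0):
    """Cross-sectional sampling at calendar time 0 (illustrative assumptions only)."""
    rng = random.Random(seed)
    a_list, v_list, t_list = [], [], []
    for _ in range(n_onsets):
        onset = rng.uniform(-window, 0.0)   # onset time, uniform before sampling
        t = rng.gammavariate(2.0, 1.0)      # survival time, population mean 2.0
        if onset + t > 0.0:                 # still alive at sampling, so observed
            a = -onset                      # truncation time (onset to sampling)
            a_list.append(a)
            v_list.append(t - a)            # residual time after enrollment
            t_list.append(t)
    return a_list, v_list, t_list


a, v, t_obs = simulate_prevalent_cohort()
print("quartiles of A:", [round(q, 3) for q in statistics.quantiles(a, n=4)])
print("quartiles of V:", [round(q, 3) for q in statistics.quantiles(v, n=4)])
print("mean of sampled T:", round(statistics.fmean(t_obs), 3), "(population mean is 2.0)")
```

Under these assumptions the quartiles of the truncation time A and the residual time V should agree up to simulation noise, while the sampled survival times are visibly longer than the population average, which is the length bias itself.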
In Chapter 3, we use the martingale structure of general left-truncated, right-censored data together with the key property of length-biased data introduced in Chapter 2 to propose a simple estimating equation method and a composite estimating equation method for quantile regression with length-biased data. Our methods do not require estimating the distribution of the censoring variable and are therefore less complex than those of Chen and Zhou (2012) and Wang and Wang (2014). We establish asymptotic properties, including uniform consistency and weak convergence, using empirical process and stochastic integral techniques. As in Peng and Huang (2008), a simple algorithm is obtained by minimizing a sequence of L1-type convex functions, so the new estimators can be computed with existing functions in R. Because the limiting variance involves an unknown density function whose finite-sample estimates are very unstable, we estimate the variance by extending the method of Jin et al. (2001). Finally, we apply the proposed methods to the Channing House data from the United States.
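To show what an "L1-type convex function" looks like in the simplest setting, the sketch below minimizes the quantile check loss for a single quantile with no covariates, no censoring, and no truncation. It is a deliberately reduced illustration, not the sequential algorithm of Peng and Huang (2008) or the estimator proposed in the dissertation. The names `check_loss` and `check_quantile` are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def check_quantile(values, tau):
    """Return the data point minimizing the summed check loss.

    The objective is convex and piecewise linear with kinks only at the
    data points, so a minimizer always exists among the observations.
    """
    return min(values, key=lambda q: sum(check_loss(y - q, tau) for y in values))


data = list(range(1, 100))          # toy sample: 1, 2, ..., 99
print(check_quantile(data, 0.5))    # sample median
print(check_quantile(data, 0.25))   # sample first quartile
```

With covariates the same objective becomes a linear program, which is why standard quantile regression software can be reused once the estimating equation has been put in this form.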
In Chapter 4, we study the proportional mean residual life model for censored data under a case-cohort design. The work is motivated by real data from a nickel refinery in South Wales, where the question is how much longer a nickel worker can be expected to live given the available covariates. Because the incidence in this study is very low, a case-cohort design is preferred. We propose weighted estimating equations to estimate the regression parameters and the baseline mean residual life function, and we give the large-sample properties of the resulting estimators. We then present simulation results on the finite-sample performance of the proposed method. Finally, we illustrate the method by analyzing the South Wales nickel refinery data mentioned above.
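The weighting idea behind a case-cohort analysis can be sketched as follows. This is a generic inverse-probability-weighting illustration, not the weighted estimating equation proposed in the dissertation: it assumes a subcohort drawn by simple random sampling without replacement, weight 1 for every case, and weight n/m for sampled non-cases. The cohort size, subcohort size, and event model are hypothetical.

```python
import math
import random

rng = random.Random(1)
n, m = 20_000, 4_000  # full cohort size and subcohort size (assumed values)

# One covariate per subject; the rare event becomes more likely as z grows.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
is_case = [rng.random() < 1.0 / (1.0 + math.exp(3.0 - zi)) for zi in z]

# Subcohort: simple random sample without replacement from the whole cohort.
subcohort = set(rng.sample(range(n), m))


def ipw_weight(i):
    """Inverse of the probability that subject i has covariates measured."""
    if is_case[i]:
        return 1.0        # every case is included with probability 1
    if i in subcohort:
        return n / m      # non-cases enter only through the subcohort
    return 0.0            # covariates are never measured


weights = [ipw_weight(i) for i in range(n)]
full_mean = sum(z) / n
ipw_mean = sum(w * zi for w, zi in zip(weights, z)) / n
measured = sum(w > 0 for w in weights)

print("subjects with measured covariates:", measured, "of", n)
print("full-cohort mean of z:", round(full_mean, 3))
print("weighted case-cohort mean of z:", round(ipw_mean, 3))
```

The weighted mean uses covariates from only a fraction of the cohort yet should track the full-cohort mean, which is the reason such designs are attractive when events are rare and covariates are expensive to measure.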
In Chapter 5, we study the Cox model for length-biased data under a case-cohort design. Inspired by the pseudo-likelihood method of Self and Prentice (1988) and the composite partial likelihood method of Huang and Qin (2012), we propose a simple composite pseudo-partial-likelihood method. Using empirical process theory and convergence results for sampling without replacement, we give the large-sample properties of the maximum composite pseudo-likelihood estimator and the corresponding cumulative hazard function estimator under the case-cohort design. We also present simulation results and illustrate the proposed method with the Oscar data.
In Chapter 6, we discuss the semiparametric linear transformation model for length-biased data under a case-cohort design. Lu and Tsiatis (2006) used a martingale integral representation and inverse probability weighting to handle the semiparametric linear transformation model for right-censored data under a case-cohort design. Even though the martingale integral representation can also handle left truncation, the resulting estimator is not fully efficient under length-biased sampling. We therefore again use the key property of length-biased data from Chapter 2, together with inverse probability weighting, to construct composite estimating equations. The resulting equations can be solved by a simple iterative algorithm to estimate the regression parameters and the unknown transformation function. We give the asymptotic distributions of the proposed estimators with proofs, and we assess the finite-sample performance of the regression parameter estimator through simulations and a real-data example.
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree granted]: 2015
[Classification number]: O212.1
[References]
Related journal articles (1 listed)
1. Quantile Regression for Right-Censored and Length-Biased Data [J]; Acta Mathematicae Applicatae Sinica (English Series); 2012, No. 03
Article ID: 1934836
Article link: http://sikaile.net/shoufeilunwen/jckxbs/1934836.html