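Influence Analysis of Sufficient Dimension Reduction Methods Based on the Idea of Distribution Weighting
Published: 2018-07-04 08:15
Topic: sufficient dimension reduction + distribution-weighted estimation; Source: Yunnan University of Finance and Economics, master's thesis, 2014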
[Abstract]: Inferring the distribution of one variable (the response) given a set of random variables (the predictors) is a central problem in statistics. When the number of predictors is large, fitting the response directly on the predictors is likely to run into the "curse of dimensionality". In many settings the response is in fact related to the predictors only through a few linear combinations of them; in other words, given the values of these linear combinations, the response is independent of the predictors. If these linear combinations can be found and the response regressed on them, the problems caused by high-dimensional predictors are resolved. The task of sufficient dimension reduction (SDR) is to find these linear combinations without assuming a parametric model in advance. In recent years, as the dimension and scale of data in many disciplines have grown, dimension reduction has attracted wide attention, and model-free sufficient dimension reduction has become a focus of the statistics community.
Because SDR is an important stage of high-dimensional nonparametric regression, and its output is the basis for the subsequent regression analysis, its robustness is particularly important in the modeling process. This makes influence analysis of SDR methods necessary. Influence analysis is an important component of statistical diagnostics; it is concerned with how sensitive the results of statistical inference are to the initial model specification. In SDR, influence analysis examines the robustness of the dimension reduction methods, that is, whether certain aspects of the model (for example, particular data points) affect the results far more than average. In a sense, influence analysis assesses whether the dimension reduction result can be trusted. However, because the object being estimated in SDR is a vector space, existing influence analysis methods do not apply to it.
This thesis studies influence analysis for the distribution-weighted partial least squares estimator under the single-index model, and for the cumulative slicing estimation method under the multi-index model. Using case deletion and local influence analysis, it addresses the detection of influential points, in particular special influence patterns such as the masking effect. The main results are as follows. 1. In the influence analysis of both the distribution-weighted partial least squares estimator and cumulative slicing estimation, the canonical trace correlation coefficient of Hooper (1959) is used to construct a space displacement function that measures the difference between the estimated SDR subspaces before and after perturbation. This measure is invariant to the choice of basis vectors for the subspace, and it takes full account of the covariance structure of the predictors and the statistical meaning of the reduced space. 2. Building on this space displacement function, a notion of quasi-curvature is proposed to measure the local influence of a perturbation on the estimated subspace, together with a method for finding the perturbation direction that maximizes the quasi-curvature. After standardization, this maximal perturbation direction serves as an influence assessment statistic. These results generalize the normal curvature method based on the likelihood displacement function proposed by Cook (1986). Simulation results show that the proposed method detects influential points reasonably well.
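The abstract gives no formulas, so the following is only a minimal sketch of the case-deletion idea under stated assumptions, not the thesis's actual procedure. It assumes the displacement between two subspaces is one minus the squared trace correlation, that is, the average squared canonical correlation between the two sets of reduced predictors, weighted by the predictor covariance. It also assumes a plain sample version of cumulative slicing estimation. The function names `trace_correlation_sq`, `space_displacement`, `cume_directions`, and `case_deletion_influence` are hypothetical.

```python
import numpy as np


def trace_correlation_sq(A, B, Sigma):
    """Squared trace correlation between span(A) and span(B).

    Assumed form: (1/d) * tr[(A'SA)^-1 A'SB (B'SB)^-1 B'SA] with S = Sigma,
    i.e. the mean squared canonical correlation between A'x and B'x.
    """
    p = Sigma.shape[0]
    A = np.asarray(A, dtype=float).reshape(p, -1)
    B = np.asarray(B, dtype=float).reshape(p, -1)
    SAA = A.T @ Sigma @ A
    SBB = B.T @ Sigma @ B
    SAB = A.T @ Sigma @ B
    M = np.linalg.solve(SAA, SAB) @ np.linalg.solve(SBB, SAB.T)
    return float(np.trace(M)) / A.shape[1]


def space_displacement(A, B, Sigma):
    """Assumed displacement: 0 for identical subspaces, larger when they differ."""
    return 1.0 - trace_correlation_sq(A, B, Sigma)


def cume_directions(X, y, d):
    """Sketch of cumulative slicing estimation; returns (basis, sample covariance)."""
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False).reshape(p, p)
    # Standardize the predictors: Z = (X - mean) Sigma^{-1/2}.
    w, V = np.linalg.eigh(Sigma)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Z = (X - X.mean(axis=0)) @ S_inv_half
    # Row i of m is the sample mean of z_j * 1(y_j <= y_i).
    ind = (y[None, :] <= y[:, None]).astype(float)
    m = ind @ Z / n
    kernel = m.T @ m / n
    # eigh returns ascending eigenvalues; keep the d leading eigenvectors.
    _, evecs = np.linalg.eigh(kernel)
    eta = evecs[:, ::-1][:, :d]
    # Map back to the original predictor scale.
    return S_inv_half @ eta, Sigma


def case_deletion_influence(X, y, d):
    """Displacement of the estimated subspace when each case is deleted in turn."""
    B_full, Sigma = cume_directions(X, y, d)
    n = X.shape[0]
    influence = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        B_i, _ = cume_directions(X[keep], y[keep], d)
        influence[i] = space_displacement(B_full, B_i, Sigma)
    return influence
```

Calling `case_deletion_influence(X, y, d)` on an `n × p` predictor matrix returns one displacement per observation, and unusually large values flag candidate influential points. Because the measure depends only on the spanned subspaces, replacing a basis `A` by `A @ M` for any invertible `M` leaves it unchanged, which is the basis-invariance property the abstract describes. Single-case deletion of this kind can miss masked points, which is the motivation the abstract gives for the local influence (quasi-curvature) approach.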
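[Degree-granting institution]: Yunnan University of Finance and Economics
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: C81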
Article ID: 2095422
Article link: http://sikaile.net/shekelunwen/shgj/2095422.html