A Study of Two Problems in Multivariate Statistical Analysis
Published: 2018-12-31 13:38
[Abstract]: Statistical distributions are a principal means of describing the characteristics and regularities of random variables. Multivariate statistical analysis is the general name for a class of methods for handling multivariate data that are built on multivariate statistical distributions; it is a major branch of statistical analysis, with rich theoretical results and many applied methods. This thesis consists of two relatively independent parts: the first is a theoretical study of discriminant analysis within Bayesian decision theory, and the second is an applied study of canonical correlation analysis.

Discriminant analysis in Bayesian decision theory: statistical pattern recognition is based on the statistical probabilities of sample feature values, and this thesis applies Bayesian decision theory and statistical theory to a series of investigations. Bayesian decision theory has been applied in many fields because it minimizes the classification error rate. Starting from Bayes' formula, earlier work derived the Bayesian discriminant function and its decision surface under the multivariate normal probability model, and verified and analyzed the conclusions experimentally; the corresponding results under other statistical distributions, however, remain unknown. Years of research have shown that real-world samples do not always follow a multivariate normal distribution. When the sample data are sharply peaked and the probability density plot shows heavy tails, the multivariate normal distribution cannot accommodate them. If a multivariate normal distribution is nevertheless used to describe long-tailed data, outliers in the sample inevitably distort the estimates of the covariance matrix and the mean, so that the discriminant result departs substantially from the true one and the normal-based method loses robustness. The multivariate t distribution is more robust than the multivariate normal distribution: its degrees-of-freedom parameter can be adjusted to reduce the influence of outliers on the results. The first part of this thesis therefore takes the probability density function of the multivariate t distribution as the basis for classifier design, draws sample sets according to the multivariate t probability model, and analyzes them, which is of considerable practical significance. Six cases are distinguished according to the form of the covariance structure and whether the degrees of freedom are equal or unequal, and the discriminant function under the multivariate t density model is discussed for each. Each of the six cases is then further examined for equal and unequal prior probabilities. Finally, the discriminant function expressions of the two multivariate t density models are derived for each case; from these expressions the decision surface equations are obtained and the decision surfaces are plotted.

Application of canonical correlation analysis in the tobacco field: canonical correlation analysis is a research topic within multivariate statistical analysis. Borrowing the idea of principal components, it uses a small number of pairs of composite variables to reflect the linear correlation between two sets of variables, and it is now widely used for correlation analysis and prediction in many fields. After examining the theory of canonical correlation analysis, this thesis applies it to a case study of flue-cured tobacco, carrying out a canonical correlation analysis of 35 chemical components and 10 sensory comfort indexes. The results show that certain chemical components of flue-cured tobacco have a significant effect on certain sensory comfort indexes, so the production, manufacture, and processing of flue-cured tobacco can focus on these significant indexes in order to improve sensory comfort. This further illustrates the value of studying canonical correlation analysis.
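To make the discriminant rule described in the abstract concrete, the following is a minimal sketch rather than the thesis's own derivation. It assumes the standard multivariate t density with a location vector, a scale matrix, and a degrees-of-freedom parameter, and takes the discriminant of a class to be the log density plus the log prior. The function names `t_discriminant` and `classify`, and the toy parameters in the usage lines, are hypothetical and chosen only for illustration.

```python
import math

import numpy as np


def t_discriminant(x, mean, scale, dof, prior):
    """Return g(x) = ln p(x | class) + ln P(class) under a multivariate t density.

    Assumed density (standard form, not taken from the thesis):
    p(x) = Gamma((dof+p)/2) / (Gamma(dof/2) (dof*pi)^(p/2) |scale|^(1/2))
           * (1 + d2/dof)^(-(dof+p)/2),
    where d2 is the squared Mahalanobis distance of x from the location vector.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    scale = np.asarray(scale, dtype=float)
    p = x.shape[0]

    diff = x - mean
    # Squared Mahalanobis distance, computed without forming the inverse.
    d2 = float(diff @ np.linalg.solve(scale, diff))

    sign, logdet = np.linalg.slogdet(scale)
    if sign <= 0:
        raise ValueError("scale matrix must be positive definite")

    log_norm = (
        math.lgamma((dof + p) / 2.0)
        - math.lgamma(dof / 2.0)
        - 0.5 * p * math.log(dof * math.pi)
        - 0.5 * logdet
    )
    return log_norm - 0.5 * (dof + p) * math.log1p(d2 / dof) + math.log(prior)


def classify(x, classes):
    """Assign x to the class whose discriminant value is largest."""
    scores = [t_discriminant(x, **params) for params in classes]
    return int(np.argmax(scores))


# Toy two-class example: equal scale matrices, equal degrees of freedom,
# equal priors (one of the simplest settings the abstract mentions).
toy_classes = [
    {"mean": [0.0, 0.0], "scale": np.eye(2), "dof": 5.0, "prior": 0.5},
    {"mean": [3.0, 3.0], "scale": np.eye(2), "dof": 5.0, "prior": 0.5},
]
label = classify([0.2, -0.1], toy_classes)
```

The decision surface between two classes is the set of points where the two discriminant values are equal. Changing `dof`, `scale`, or `prior` per class in this sketch corresponds to the different cases the abstract distinguishes.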
[Degree-granting institution]: Yunnan University of Finance and Economics
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: O212.4
Article ID: 2396658
Article link: http://sikaile.net/kejilunwen/yysx/2396658.html