Kernel-Function-Based Handling of Missing Values in Compositional Data
Published: 2018-10-17 10:17
[Abstract]: As scientific awareness grows and the habit of analytical inquiry takes hold, modern life increasingly involves collecting and processing data in order to work more efficiently. Among the kinds of data encountered, compositional data are complex multidimensional data with special properties, generally used to study the proportions of the parts of a whole with respect to a specified factor. As the economy develops, more industries recognize the benefits of accurate statistics, and compositional data are therefore used ever more widely. In practice, however, collected data are often incomplete: invalid or blank questionnaire responses, omissions during collection, and similar problems all produce missing data. Missing data degrade statistical quality, bias estimates, and lead to poor results, so imputing the missing values is particularly important. Considerable work on handling missing data already exists in China and abroad. Building on that work, this thesis uses kernel function methods to impute missing values and compares the strengths and weaknesses of different methods. The thesis has five chapters. Chapter 1 explains the significance of the research, describes the research background and the state of research in China and abroad, and gives an overview of some basics. Chapter 2 briefly introduces the basic concepts of compositional data and the background knowledge required, outlines the general procedure used in the study, and introduces some existing methods. Chapter 3, the core of the thesis, proposes several kernel-function-based methods for imputing missing values in compositional data and explains the motivation, the process, and the concrete implementation steps. Chapter 4 compares the proposed kernel-based imputation methods with common existing methods in simulation experiments, reports the results, and analyzes a real data set to verify the feasibility of the methods. The final chapter summarizes the conclusions of the thesis and discusses directions for future research.
[Degree-granting institution]: Shanxi University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification numbers]: O212.1; F224
Article ID: 2276339
Article link: http://sikaile.net/jingjilunwen/jiliangjingjilunwen/2276339.html