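Methods for Functional Data Modeling and Their Applications
Published: 2018-01-29 16:47
Keywords: functional data; basis representation; orthonormal basis; functional principal component analysis; classification performance; clustering; functional k-means clustering algorithm; regression; partial functional linear model. Source: Shanxi University, 2017 doctoral dissertation. Document type: degree thesis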
[Abstract]: The rapid development of information technology has produced large volumes of functional data. Such data arise in economics, finance, bioinformatics, medicine, meteorology, human kinematics, speech recognition, and many other fields, and functional data analysis has become a research hotspot in data mining. Traditional data mining methods treat functional data as discrete, finite observation sequences. They ignore the continuity and high dimensionality of functional data, which limits knowledge discovery from such data. To address these limitations, this dissertation builds on the basis representation of functional data to explore the theory and methods of modeling functional data in classification, clustering, and regression problems, and verifies the effectiveness of the modeling methods through concrete case studies. The main work and contributions are summarized as follows:
(1) For the representation of functional data, the modeling principle of the functional principal component representation is investigated. Variational theory is used to establish the model satisfied by the data-driven functional principal component basis, which provides a method for solving for that basis. It is proved that the functional principal component representation is the optimal orthonormal representation under the mean squared error criterion, which gives the principal component representation of functional data a theoretical foundation.
(2) For the classification of functional data, the differences in classification performance among basis representations are investigated. It is proved that, under an orthonormal representation, the L2 distance between functional data is equivalent to the Euclidean distance between their basis coefficient vectors, which lays the theoretical foundation for two-stage classification of functional data. Based on the two-stage classification method, the types of functional data suited to Fourier basis, wavelet basis, and functional principal component basis representations are identified from the perspective of classification performance. In addition, experiments compare the classification performance of functional data under non-orthogonal and orthogonal representations.
(3) For the clustering of functional data, the class-center representation of the functional k-means clustering algorithm is investigated. It is proved that a measure of similarity between multidimensional functional samples is a distance. The construction of this distance takes the derivative information of the functional samples into account, and it lays the foundation for building the functional k-means clustering algorithm. Based on this distance, the class-center representation of the functional k-means clustering algorithm is given, and it is proved that this center minimizes the within-class sum of squared distances. Experiments on real data verify the effectiveness of the functional k-means clustering algorithm.
(4) For the regression of functional data, a modeling method is investigated for the partial functional linear model, which is used to handle mixed data. To improve the prediction accuracy of the model, the basis representation of the functional coefficient in a Sobolev-Hilbert space is used to transform the semiparametric model into a parametric model. To increase the robustness of the model, a more relaxed penalty strategy is introduced into penalized least squares, and the resulting penalized least squares method is used to fit the model. Experiments on both synthetic and real data verify the effectiveness of the method.
In response to the limitations of traditional data mining methods in handling functional data, this dissertation provides theory and methods for functional data modeling based on the basis representation strategy. The results have theoretical value and practical significance for functional data mining.
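The abstract states that, under an orthonormal representation, the L2 distance between functional data equals the Euclidean distance between their basis coefficient vectors. The following is a minimal numerical sketch of that identity, not the dissertation's own code. It assumes an orthonormal Fourier basis on [0, 1], random coefficient vectors, and trapezoidal integration on a uniform grid; the helper names `fourier_basis` and `l2_distance` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fourier_basis(t, n_basis):
    """Evaluate the first n_basis orthonormal Fourier functions on [0, 1] at t.

    Columns are 1, sqrt(2) sin(2*pi*k*t), sqrt(2) cos(2*pi*k*t) for k = 1, 2, ...
    """
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)


def l2_distance(x_vals, y_vals, t):
    """Approximate the L2 distance between two curves sampled on grid t
    using the trapezoidal rule."""
    diff2 = (x_vals - y_vals) ** 2
    integral = np.sum(0.5 * (diff2[1:] + diff2[:-1]) * np.diff(t))
    return np.sqrt(integral)


# Illustrative setup (assumed, not from the source): 7 basis functions,
# random coefficients, and a fine uniform grid on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 2001)
B = fourier_basis(t, 7)

a = rng.normal(size=7)  # coefficient vector of curve x
b = rng.normal(size=7)  # coefficient vector of curve y
x = B @ a               # curve x evaluated on the grid
y = B @ b               # curve y evaluated on the grid

d_func = l2_distance(x, y, t)   # distance computed between the curves
d_coef = np.linalg.norm(a - b)  # distance computed between coefficients

print("L2 distance between curves:     ", d_func)
print("Euclidean distance of coefficients:", d_coef)
```

The two printed values should agree up to numerical integration error. This is what makes a two-stage approach possible: once curves are expanded in an orthonormal basis, any distance-based classifier can work on the coefficient vectors instead of the curves. For a non-orthogonal basis the identity generally fails unless the Gram matrix of the basis is taken into account.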
[Degree-granting institution]: Shanxi University
[Degree level]: Doctoral
[Year degree conferred]: 2017
[Classification numbers]: TP311.13; O212.1
Article ID: 1473870
Article link: http://sikaile.net/kejilunwen/yysx/1473870.html