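Application of a Consensus Model Based on SOM-Clustering Variable Selection to Near-Infrared Spectral Data
Topics: quantitative analysis + consensus model; Source: Wenzhou University, master's thesis, 2017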
[Abstract]: Data modeling is a central topic in chemometrics. Depending on the task, it is divided into quantitative analysis and qualitative analysis. Single-model modeling is currently the common approach: a series of prediction models is built while the measured data are analyzed repeatedly, and the one with the best prediction performance is selected. Modern high-throughput analytical instruments, however, have thousands of analysis channels and supply abundant measurement data for each sample, so the data often have few samples and many variables, and a single model struggles to meet the modeling requirements. To make up for this shortcoming, multi-model consensus modeling has been widely studied and applied in many fields in recent years. Consensus modeling builds several member models with some modeling method and combines them through a consensus strategy to predict unknown samples, producing a consensus result that improves prediction accuracy and reliability.
This thesis applies consensus modeling to near-infrared spectral data and examines both linear and nonlinear member models. It first introduces the background and significance of the topic, the basic principles of data modeling, and the modeling methods used. It then studies consensus modeling with multiple regression member models built on variable selection, analyzes the advantages of variable selection, and proposes a consensus model based on partial least squares (C-SOM-PLS) and a consensus model based on the least squares support vector machine (C-SOM-LS-SVM), which are a linear and a nonlinear multi-member consensus model respectively. In this method, the Kohonen self-organizing feature map (SOM) clustering algorithm first selects variables by grouping similar variables together, yielding N sub-datasets. The Duplex algorithm then splits each of the N sub-datasets of near-infrared spectral data into a training set, a validation set, and a test set. A series of member regression models is built on the training set, and the validation set is used to pick the model with the best prediction performance and its error. The validation-set errors are used to compute the weights of the consensus model, and the member models' predictions for unknown samples are finally combined by weighted summation into a consensus result. The results show that most consensus models predict better than single models, improving both prediction accuracy and model stability.
A comparison of the predictions of C-SOM-PLS, C-SOM-LS-SVM, and their member models shows that some consensus models predict worse than their member models, and the study attributes this to overfitting in the member models affecting the consensus model. To reduce the effect of overfitting, the thesis introduces model population analysis (MPA) into the consensus model. MPA is carried out in three steps: first, sub-datasets are obtained by Monte Carlo sampling; second, a sub-model is built for each sub-dataset; third, the parameters of all the resulting sub-models are analyzed statistically in the sample space to extract useful information. The results show that introducing MPA effectively reduces the effect of overfitting on the consensus model.
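The weighted-sum consensus step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract says the weights are computed from validation-set errors but does not give the formula, so the inverse-squared-RMSE weighting, the function names `consensus_weights` and `consensus_predict`, and all numbers below are hypothetical and not taken from the thesis.

```python
import numpy as np


def consensus_weights(val_rmse):
    """Turn per-member validation RMSEs into weights that sum to 1.

    Assumption: inverse-squared-error weighting. The abstract only says the
    weights come from the validation-set errors, not how.
    """
    val_rmse = np.asarray(val_rmse, dtype=float)
    if np.any(val_rmse <= 0):
        raise ValueError("validation errors must be positive")
    inv = 1.0 / val_rmse**2
    return inv / inv.sum()


def consensus_predict(member_preds, weights):
    """Combine member predictions by weighted summation.

    member_preds has shape (n_members, n_samples); the result has
    shape (n_samples,).
    """
    member_preds = np.asarray(member_preds, dtype=float)
    return weights @ member_preds


# Hypothetical example: three member models (one per variable cluster),
# each with a validation RMSE and predictions for two unknown samples.
val_rmse = [0.5, 1.0, 1.0]
member_preds = [[10.0, 12.0],
                [11.0, 13.0],
                [9.0, 11.0]]

w = consensus_weights(val_rmse)
y_hat = consensus_predict(member_preds, w)
print(w)      # member with the smallest validation error gets the largest weight
print(y_hat)  # consensus prediction for each unknown sample
```

With a weighting of this kind, a member model that fits its training set well but validates poorly contributes little to the consensus, although a member that overfits the validation set itself could still receive a large weight.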
[Degree-granting institution]: Wenzhou University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: O657.33
Article number: 2059049
Article link: http://sikaile.net/kejilunwen/huaxue/2059049.html