Research and Application of a Negative Correlation Learning Algorithm Based on Gradient Boosting Models

Published: 2019-03-24 20:51

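[Abstract]: The first contribution of this thesis is a study of ensemble learning. In machine learning, a system or model that can learn from experience is called a learner. Training a weak learner is generally far cheaper than training a strong one. Ensemble learning builds on this observation: rather than training one strong learner directly, it combines a group of weak learners into an ensemble with strong learning ability. The performance of an ensemble learning algorithm depends mainly on two factors: the performance of the individual base learners and the diversity among them. Commonly used ensemble algorithms such as Bagging and Boosting improve each base learner while implicitly maintaining diversity among the base learners, so that the final ensemble performs as well as possible. Negative correlation learning (NCL) is an ensemble learning algorithm commonly used for neural network ensembles. It introduces the diversity among base learners into the neural network loss function as an explicit measure, which in turn influences training. Adjusting a weighting factor trades off the performance of the base networks against their diversity, with the aim of obtaining the best-performing neural network ensemble. Drawing on the idea behind NCL, we propose a new ensemble learning algorithm. NCL was originally proposed with neural networks as base learners, and most NCL research still uses neural networks as the base model. The main reason is that a neural network has an explicit loss function, and the backpropagation (BP) algorithm used to train it is an optimization algorithm that minimizes this loss by gradient descent. We compare neural networks with another widely used learning model, the gradient boosting machine (GBM), identify their similarities, and propose using GBMs in place of neural networks to put negative correlation learning into practice. The result is a new ensemble learning algorithm, GB-NCL. The thesis presents the design rationale and detailed steps of GB-NCL and experimentally compares its classification performance with that of the original neural-network-based NCL algorithm and of the gradient boosting algorithm. The experimental results show that GB-NCL outperforms both.

The second contribution is RCASSL, a new classification algorithm for hyperspectral remote sensing images, designed and implemented on top of GB-NCL. Hyperspectral remote sensing image classification is characterized by few labeled samples and many unlabeled ones, and manually labeling the land-cover class of each pixel is costly. Previous work mainly takes one of two approaches. The first uses active learning to select, from a large pool of unlabeled samples, the pixels most worth labeling and asks a human expert to assign their land-cover classes. The added training samples are of high quality (the class labels are 100% correct) but few in number. The second uses semi-supervised learning: an already trained classifier assigns class labels to some unlabeled samples, which are then treated as genuine samples and added to the training set. We call these "pseudo-labeled" samples. Such algorithms can greatly increase the number of training samples but cannot guarantee that the new pseudo-labels are correct. Large quantity but poor quality is the hallmark of semi-supervised learning. We propose combining active learning with semi-supervised learning and introducing a pseudo-labeled sample verification mechanism that checks the pseudo-labeled samples brought in by semi-supervised learning and discards those that fail. This yields enough training samples while preserving the quality of the training set. With a larger and more complete training set, the trained classifier naturally performs better. Following this idea, the thesis designs the RCASSL algorithm for hyperspectral remote sensing classification. When training the classifier, RCASSL uses not only labeled samples but also the pseudo-labeled samples introduced by semi-supervised learning. We use GB-NCL to verify these pseudo-labeled samples and thereby improve the quality of the pseudo-labeled set. On hyperspectral remote sensing datasets we compare RCASSL with MCLU-ECBD and RCASSL-NoPLV. MCLU-ECBD is a commonly used active learning algorithm. RCASSL-NoPLV is RCASSL with the pseudo-label verification step removed. The experimental results show that, for the same number of newly labeled samples, RCASSL achieves the best classification performance. The comparison between RCASSL and MCLU-ECBD shows that combining semi-supervised learning can improve the performance of an active learning algorithm, and the comparison between RCASSL and RCASSL-NoPLV demonstrates the effectiveness of the pseudo-label verification mechanism implemented with GB-NCL.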
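The negative correlation idea described in the abstract can be illustrated with a small sketch. The abstract does not state the loss itself, so the code assumes the squared-error formulation usually given in the NCL literature: each base learner i minimizes 0.5*(f_i - y)^2 + lam * p_i, where p_i = -(f_i - F)^2 and F is the ensemble mean. The function names, the parameter `lam` (the weighting factor), and the use of negative gradients as boosting targets are illustrative assumptions, not the thesis's GB-NCL procedure.

```python
# Minimal sketch of an NCL-style loss for one training example.
# Assumption: squared-error loss with the commonly used correlation
# penalty p_i = -(f_i - F)^2, where F is the ensemble mean output.

def ensemble_mean(preds):
    """Average output F of all base learners."""
    return sum(preds) / len(preds)

def ncl_losses(preds, target, lam):
    """Per-learner loss: 0.5*(f_i - y)^2 - lam*(f_i - F)^2."""
    f_bar = ensemble_mean(preds)
    return [0.5 * (f - target) ** 2 - lam * (f - f_bar) ** 2 for f in preds]

def ncl_gradients(preds, target, lam):
    """Gradient of each learner's loss w.r.t. its own output.

    F is treated as a constant when differentiating, which is the
    usual simplification: (f_i - y) - lam*(f_i - F).
    """
    f_bar = ensemble_mean(preds)
    return [(f - target) - lam * (f - f_bar) for f in preds]

def pseudo_residuals(preds, target, lam):
    """Negative gradients. In a gradient-boosting setting these would be
    the targets fitted by the next weak learner of each ensemble member
    (hypothetical reading of how a GBM could replace backpropagation)."""
    return [-g for g in ncl_gradients(preds, target, lam)]

if __name__ == "__main__":
    preds = [1.0, 2.0, 3.0]
    print(ncl_losses(preds, 2.0, 0.5))
    print(pseudo_residuals(preds, 2.0, 0.5))
```

With `lam = 0` each learner is trained independently on plain squared error. Increasing `lam` rewards outputs that differ from the ensemble mean, which is the explicit performance-versus-diversity trade-off the abstract refers to.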
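The pseudo-label verification step can be sketched as a filter over pseudo-labeled samples. The abstract says only that GB-NCL is used to check pseudo-labels and discard unqualified ones. It does not give the acceptance rule, so the majority-agreement criterion, the `min_agreement` threshold, and the toy threshold classifiers below are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

def verify_pseudo_labels(samples, pseudo_labels, verifiers, min_agreement=0.8):
    """Keep a pseudo-labeled sample only if enough verifiers agree with it.

    `verifiers` is a list of callables mapping a sample to a class label;
    in the thesis this role is played by GB-NCL, here they are stand-ins.
    """
    kept = []
    for x, label in zip(samples, pseudo_labels):
        votes = Counter(clf(x) for clf in verifiers)
        # Fraction of ensemble members that predict the pseudo-label.
        agreement = votes[label] / len(verifiers)
        if agreement >= min_agreement:
            kept.append((x, label))
    return kept

def make_threshold_classifier(threshold):
    """Toy 1-D classifier: class 1 if the feature exceeds the threshold."""
    def classify(x):
        return int(x > threshold)
    return classify

# Hypothetical ensemble of three slightly different base classifiers.
toy_verifiers = [make_threshold_classifier(t) for t in (0.4, 0.5, 0.6)]

if __name__ == "__main__":
    samples = [0.1, 0.55, 0.9]
    pseudo_labels = [0, 1, 1]
    # The sample at 0.55 gets only 2 of 3 votes for class 1 and is dropped.
    print(verify_pseudo_labels(samples, pseudo_labels, toy_verifiers))
```

Samples near the decision boundary, where the ensemble members disagree, are the ones rejected. Lowering `min_agreement` admits more pseudo-labeled samples at the cost of more label noise.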
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP18; TP751

Article number: 2446666

Article link: http://sikaile.net/guanlilunwen/gongchengguanli/2446666.html
