Research on a Particle Swarm Optimized Weighted Random Forest Algorithm
Topic: random forest + particle swarm. Source: Zhengzhou University, master's thesis, 2017.
【Abstract】: Random Forest (RF) is a classification model proposed by Breiman in 2001. In essence it combines the Bagging (Bootstrap Aggregating) algorithm with Ho's Random Subspace method, and determines the final classification by a voting mechanism over the predictions of many decision trees. Since its introduction, random forest has been widely applied to data mining and classification problems, and many researchers have since improved the model. Random forest is an efficient classifier: it requires no background knowledge of the samples, needs no variable selection, and is highly tolerant of noise, so much of the tedious data-preprocessing work can be omitted. However, the voting mechanism gives decision trees with low training accuracy the same voting power as strong ones, which lowers the voting accuracy. Moreover, the number of trees and the other model parameters usually have a considerable influence on the final classification result. Through detailed experiments on and analysis of the traditional random forest algorithm, this thesis traces its performance shortfall to precisely this cause: the equal-vote mechanism lets poorly trained trees count as much as accurate ones, which substantially degrades the accuracy of the final classification. Voting can also produce ties in which several classes receive the same highest vote count, making a sample hard to classify; this thesis calls such ties the "deadlock phenomenon". To overcome the classification difficulties caused by low-accuracy trees and tied votes, this thesis builds on the traditional model and proposes an Accuracy Weighted Random Forest (AWRF) algorithm, in which each decision tree's vote is multiplied by a weight proportional to its training accuracy. To address the difficulty of parameter selection, particle swarm optimization is used to iteratively optimize the parameters that affect the new model. Simulation experiments implemented in Matlab on six standard data sets from the UCI repository compare the new model against other algorithms, and the comparison demonstrates the advantages of the new model for classifying such data.
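The accuracy-weighted vote at the core of AWRF can be sketched in a few lines. This is an illustrative sketch only, not the thesis's Matlab implementation; the function name and the example trees and accuracies below are hypothetical. Each tree's vote is scaled by a weight proportional to its training accuracy before tallying.

```python
from collections import defaultdict

def weighted_vote(tree_predictions, tree_accuracies):
    """Accuracy-weighted voting: each tree's vote counts in proportion
    to its training accuracy, so a poorly trained tree no longer carries
    the same weight as a strong one."""
    scores = defaultdict(float)
    for label, acc in zip(tree_predictions, tree_accuracies):
        scores[label] += acc
    # Continuous weights also make exact ties (the thesis's "deadlock"
    # cases) far less likely than integer vote counts.
    return max(scores, key=scores.get)

# Hypothetical example: 5 trees; plain majority voting deadlocks at 2-2-1,
# but accuracy weighting resolves it in favour of the stronger trees.
preds = ["A", "A", "B", "B", "C"]
accs  = [0.91, 0.88, 0.62, 0.65, 0.70]
print(weighted_vote(preds, accs))  # weighted scores: A=1.79, B=1.27, C=0.70 → "A"
```

The same scores could alternatively be normalized to sum to one; only the argmax matters for the predicted class.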
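The parameter search described in the abstract can likewise be sketched. The code below is a generic textbook particle swarm optimizer, not the thesis's Matlab implementation; the objective function, bounds, and coefficients (`w`, `c1`, `c2`) are illustrative assumptions. In the thesis's setting, `f` would be the validation error of the AWRF model as a function of its parameters (for example, the number of trees).

```python
import random

def pso_minimize(f, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over box-constrained parameters.
    Each particle tracks its personal best; the swarm shares a global best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate back into its allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective standing in for validation error, with its minimum at (30, 8).
err = lambda p: (p[0] - 30) ** 2 + (p[1] - 8) ** 2
best, val = pso_minimize(err, [(1, 100), (1, 20)])
print(best, val)  # converges near [30, 8]
```

In practice the objective would be evaluated by cross-validation, so each `f(p)` call is expensive; small swarms and few iterations, as sketched here, are the usual compromise.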
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year of conferral】: 2017
【CLC number】: TP18; TP311.13
【References】
Related journal articles (top 10):
1. 王杰; 李紅文. Particle Swarm Optimization with Directed Mutation [J]. Journal of Donghua University (English Edition), 2016(05).
2. 黃寶瑩; 周臣清; 黃玲玲; 蘇妙儀. Comparing Staphylococcus aureus counts in milk powder from three detection methods by paired t-test [J]. China Dairy Industry, 2016(08).
3. 潘峰. Predicting examination results with the C5.0 decision tree algorithm [J]. Microcomputer & Its Applications, 2016(08).
4. 王杰; 蔡良健; 高瑜. A multi-instance learning algorithm based on decision trees [J]. Journal of Zhengzhou University (Natural Science Edition), 2016(01).
5. 楊飚; 尚秀偉. Research on a weighted random forest algorithm [J]. Microcomputer & Its Applications, 2016(03).
6. 潘大勝; 屈遲文. An improved ID3 decision tree mining algorithm [J]. Journal of Huaqiao University (Natural Science), 2016(01).
7. 王超學; 張濤; 馬春森. An improved SMOTE algorithm for imbalanced data sets [J]. Journal of Frontiers of Computer Science and Technology, 2014(06).
8. 李欣海. Applications of the random forest model in classification and regression analysis [J]. Chinese Journal of Applied Entomology, 2013(04).
9. 董師師; 黃哲學. A brief analysis of random forest theory [J]. Journal of Integration Technology, 2013(01).
10. 馮變英; 張旭; 張春枝. On the t-test, analysis of variance, and multiple comparisons [J]. Journal of Taiyuan Normal University (Natural Science Edition), 2012(04).
Related doctoral dissertations (top 1):
1. 張麗平. Theory and practice of particle swarm optimization [D]. Zhejiang University, 2005.