Research on a Particle Swarm Optimized Weighted Random Forest Algorithm
Topic: random forest + particle swarm. Source: Zhengzhou University, master's thesis, 2017.
【Abstract】: Random Forest (RF) is a classification model proposed by Breiman in 2001. In essence it combines the Bagging (Bootstrap Aggregating) algorithm with Ho's Random Subspace method, and determines the final classification by a voting mechanism over the predictions of many decision trees. Since its introduction, random forest has been widely applied to data mining and classification problems, and many researchers have since improved the model. Random forest is an efficient classifier: it requires no background knowledge of the samples, needs no variable selection, and is highly tolerant of noise, so much of the tedious data-preprocessing work can be omitted. However, the voting mechanism gives decision trees with low training accuracy the same voting power as strong ones, which lowers the voting accuracy. Moreover, the number of trees and the other model parameters usually have a considerable influence on the final classification result. Through detailed experiments on and analysis of the traditional random forest algorithm, this thesis traces its performance shortfall to precisely this cause: the equal-vote mechanism lets poorly trained trees count as much as accurate ones, which substantially degrades the accuracy of the final classification. Voting can also produce ties in which several classes receive the same highest vote count, making a sample hard to classify; this thesis calls such ties the "deadlock phenomenon". To overcome the classification difficulties caused by low-accuracy trees and tied votes, this thesis builds on the traditional model and proposes an Accuracy Weighted Random Forest (AWRF) algorithm, in which each decision tree's vote is multiplied by a weight proportional to its training accuracy. To address the difficulty of parameter selection, particle swarm optimization is used to iteratively optimize the parameters that affect the new model. Simulation experiments implemented in Matlab on six standard data sets from the UCI repository compare the new model against other algorithms, and the comparison demonstrates the advantages of the new model for classifying such data.
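The accuracy-weighted vote at the core of AWRF can be sketched in a few lines. This is an illustrative sketch only, not the thesis's Matlab implementation; the function name and the example trees and accuracies below are hypothetical. Each tree's vote is scaled by a weight proportional to its training accuracy before tallying.

```python
from collections import defaultdict

def weighted_vote(tree_predictions, tree_accuracies):
    """Accuracy-weighted voting: each tree's vote counts in proportion
    to its training accuracy, so a poorly trained tree no longer carries
    the same weight as a strong one."""
    scores = defaultdict(float)
    for label, acc in zip(tree_predictions, tree_accuracies):
        scores[label] += acc
    # Continuous weights also make exact ties (the thesis's "deadlock"
    # cases) far less likely than integer vote counts.
    return max(scores, key=scores.get)

# Hypothetical example: 5 trees; plain majority voting deadlocks at 2-2-1,
# but accuracy weighting resolves it in favour of the stronger trees.
preds = ["A", "A", "B", "B", "C"]
accs  = [0.91, 0.88, 0.62, 0.65, 0.70]
print(weighted_vote(preds, accs))  # weighted scores: A=1.79, B=1.27, C=0.70 → "A"
```

The same scores could alternatively be normalized to sum to one; only the argmax matters for the predicted class.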
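The parameter search described in the abstract can likewise be sketched. The code below is a generic textbook particle swarm optimizer, not the thesis's Matlab implementation; the objective function, bounds, and coefficients (`w`, `c1`, `c2`) are illustrative assumptions. In the thesis's setting, `f` would be the validation error of the AWRF model as a function of its parameters (for example, the number of trees).

```python
import random

def pso_minimize(f, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over box-constrained parameters.
    Each particle tracks its personal best; the swarm shares a global best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each coordinate back into its allowed range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective standing in for validation error, with its minimum at (30, 8).
err = lambda p: (p[0] - 30) ** 2 + (p[1] - 8) ** 2
best, val = pso_minimize(err, [(1, 100), (1, 20)])
print(best, val)  # converges near [30, 8]
```

In practice the objective would be evaluated by cross-validation, so each `f(p)` call is expensive; small swarms and few iterations, as sketched here, are the usual compromise.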
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year of conferral】: 2017
【CLC number】: TP18; TP311.13
【References】
Related journal articles (top 10):
1. 王杰; 李紅文. Particle Swarm Optimization with Directed Mutation [J]. Journal of Donghua University (English Edition), 2016(05).
2. 黃寶瑩; 周臣清; 黃玲玲; 蘇妙儀. Comparing Staphylococcus aureus counts in milk powder from three detection methods by paired t-test [J]. China Dairy Industry, 2016(08).
3. 潘峰. Predicting examination results with the C5.0 decision tree algorithm [J]. Microcomputer & Its Applications, 2016(08).
4. 王杰; 蔡良健; 高瑜. A multi-instance learning algorithm based on decision trees [J]. Journal of Zhengzhou University (Natural Science Edition), 2016(01).
5. 楊飚; 尚秀偉. Research on a weighted random forest algorithm [J]. Microcomputer & Its Applications, 2016(03).
6. 潘大勝; 屈遲文. An improved ID3 decision tree mining algorithm [J]. Journal of Huaqiao University (Natural Science), 2016(01).
7. 王超學; 張濤; 馬春森. An improved SMOTE algorithm for imbalanced data sets [J]. Journal of Frontiers of Computer Science and Technology, 2014(06).
8. 李欣海. Applications of the random forest model in classification and regression analysis [J]. Chinese Journal of Applied Entomology, 2013(04).
9. 董師師; 黃哲學. A brief analysis of random forest theory [J]. Journal of Integration Technology, 2013(01).
10. 馮變英; 張旭; 張春枝. On the t-test, analysis of variance, and multiple comparisons [J]. Journal of Taiyuan Normal University (Natural Science Edition), 2012(04).
Related doctoral dissertations (top 1):
1. 張麗平. Theory and practice of particle swarm optimization [D]. Zhejiang University, 2005.