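Research on Key Technologies of Visual Data Classification Based on Random Forests
Topics: pattern classification + random forest; source: South China University of Technology, 2016 doctoral dissertation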
[Abstract]: With economic and social development, computer technology and electronic products have become commonplace in daily life, and large volumes of video and images are produced every day. Research on visual data such as video and images has become one of the focal points for researchers in computer vision, pattern recognition, and machine learning. As visual data grows in both volume and complexity and eventually forms visual big data, a traditional single statistical model can no longer analyze, understand, mine, and classify it well. In recent years, machine learning methods have gradually become the main tools for mining and statistically analyzing visual data in computer vision, pattern recognition, digital signal processing, automatic control, and artificial intelligence.

Random forest is an important branch of ensemble learning and an effective method for understanding and analyzing visual data. It can be applied to both classification and regression. Its main idea is to build a set of weak classifiers (predictors) and combine them into one ensemble system. When a new instance arrives, each weak classifier (predictor) first classifies (predicts) it individually, and the individual results are then combined by voting (or, for regression, by averaging) to give the output for that instance. As an effective ensemble learning method, random forest has achieved a great deal in data mining, pattern recognition, machine vision, and artificial intelligence, and it performs well in practice. Nevertheless, its behavior with respect to feature selection for visual data, the distribution of visual data instances, and the design of the base models of the ensemble has not yet been fully proven or adequately explained.

This dissertation studies key technologies of visual data classification based on random forests, focusing on the core problems that arise when random forest learning is used as a visual data classifier. The main work and contributions are as follows.

(1) In the study of random forest ensemble learning based on feature selection, the dissertation examines how feature selection affects a random forest used as a visual data classifier, and proposes a random forest method with attribute feature selection based on moving block search. The block-search feature selection algorithm first partitions the attribute features of the visual data into blocks according to a predetermined rule. The features in one block, together with features drawn at random from the remaining features, then form the candidate set for splitting a decision tree node. Once all base decision trees have been built, the class label of a new test sample is obtained by a vote over all trees. Using features generated by the gray-level co-occurrence matrix, local binary patterns, and the multifractal spectrum, the proposed algorithm gives competitive classification results on five data sets: UIUC, UMD, KTH-TIPS, ALOT, and FMD.

(2) In the study of random forest learning based on the distribution of instance measures, the random forest model is examined from the viewpoint of how the sample instances are distributed. The strategy is to capture the data distribution through metric (measure) learning when the base models are built, and to use that information in the classification decision. In the proposed method, a Gaussian mixture model is fitted to the original data set and to each sampled data subset, so that every data set is described by a set of fitted mixture parameters. The number of parameters is the same for every data set, so each set of parameters is a vector of the same dimension. The similarity between any data subset and the original training set is then computed by metric learning, and the trees are combined on the principle that a subset whose distribution is either too similar or too dissimilar to the original training set should receive a small voting weight. The proposed algorithm achieves good classification results in four experiments: the ALOT data set, the Flower102 flower image data set, the Scene-15 scene image data set, and the Food101 food image data set.

(3) The dissertation examines the complexity of the base model itself in ensemble-based visual data classifiers and proposes a random-depth decision forest boosting model. Building on deep boosting, the model is designed and optimized on the principle that different data subsets should be fitted by decision trees of different depths. It is an ensemble learning strategy that combines ideas from deep learning and deep decision trees. The method takes boosting as its main framework and replaces the single decision tree of traditional boosting with a random-depth decision forest, which gives a two-layer learning model. Experimental results on the English letter recognition data set from the machine learning repository and on the FMD data set indicate that the proposed model has high reliability and accuracy.
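To make contributions (1) and (2) of the abstract more concrete, the following is a minimal sketch, not the dissertation's implementation. It shows how a block of features can be combined with randomly drawn features to form a split candidate set, and how per-tree votes might be weighted so that subsets that are too similar or too dissimilar to the training set count less. The function names, the contiguous wrap-around blocking rule, the similarity scale in [0, 1], and the bell-shaped weight with its `target` and `width` parameters are all assumptions introduced here for illustration; the abstract does not specify them.

```python
import math
import random
from collections import Counter


def block_candidate_features(n_features, block_size, block_index, n_random, rng):
    """Candidate features for one split: one block plus random extras.

    Assumption: blocks are contiguous index ranges that wrap around;
    the abstract only says features are blocked by a predetermined rule.
    """
    start = (block_index * block_size) % n_features
    block = [(start + i) % n_features for i in range(min(block_size, n_features))]
    in_block = set(block)
    remaining = [f for f in range(n_features) if f not in in_block]
    # Draw the extra features at random from outside the block.
    extras = rng.sample(remaining, min(n_random, len(remaining)))
    return block + extras


def subset_weight(similarity, target=0.5, width=0.2):
    """Hypothetical bell-shaped weight on a similarity in [0, 1].

    The weight is largest at `target` and small when the subset is
    either too similar or too dissimilar to the training set.
    """
    return math.exp(-((similarity - target) ** 2) / (2 * width ** 2))


def weighted_vote(predictions, weights):
    """Return the label whose supporting trees have the largest total weight."""
    totals = Counter()
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    rng = random.Random(0)
    features = block_candidate_features(12, 4, 1, 3, rng)
    print("candidate features:", features)

    # Three hypothetical trees with made-up similarity scores.
    similarities = [0.5, 0.95, 0.05]
    weights = [subset_weight(s) for s in similarities]
    print("winning label:", weighted_vote(["grass", "brick", "brick"], weights))
```

With equal weights `weighted_vote` reduces to the plain majority vote described for the standard random forest. The Gaussian mixture fitting and the metric learning step that would produce the similarity scores are outside the scope of this sketch.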
[Degree-granting institution]: South China University of Technology
[Degree level]: Doctorate
[Year conferred]: 2016
[Classification number]: TP18
Related doctoral dissertations
1 Zhang Qian; Research on Key Technologies of Visual Data Classification Based on Random Forests [D]; South China University of Technology; 2016