Research on a Graph-Based SVM Classification Algorithm in the Semi-Supervised Learning Framework
Published: 2018-02-26 02:18
Keywords: SVM; semi-supervised classification; pseudo-labels; LRR graph; denoising. Source: North Minzu University (北方民族大學), master's thesis, 2017. Type: degree thesis.
[Abstract]: In machine learning, the support vector machine (SVM) is one of the earlier supervised learning algorithms. It addresses problems such as overfitting and the "curse of dimensionality" that affected early neural networks, and it has been applied successfully in many fields. Semi-supervised learning makes effective use of both labeled and unlabeled samples and exploits the clustering structure of the whole sample set. Compared with supervised classification, it requires fewer labeled samples while still performing well. Among semi-supervised methods, graph-based semi-supervised learning is currently the most popular.

This thesis proposes a graph-model-based SVM classification algorithm within the semi-supervised learning framework. By incorporating the feature information of unlabeled samples into training, it further improves the classification accuracy of SVM. First, a graph-based semi-supervised learning method assigns pseudo-labels to the unlabeled samples; then the pseudo-labeled samples and the labeled samples are fed together into the SVM. Because the generated pseudo-labeled samples may contain noise samples, the pseudo-labeled set should be denoised first, so that noise does not weaken the positive effect of enlarging the labeled set. In addition, the higher the accuracy of the pseudo-labels, the fewer the noise samples, the more valuable the sample information, and the smaller the workload. Therefore, in the preprocessing stage that enlarges the number of labeled samples in the training set, this thesis selects, through experimental comparison, a graph model with high classification accuracy and good performance, and combines it with the SVM algorithm to complete the experiments. The main research work is as follows:

(1) In the first stage, experiments and analysis are carried out on UCI datasets and the USPS handwritten dataset for the exponential-weight graph (EW), the k-nearest-neighbor graph (kNN), the ℓ1-norm graph (LN), and the low-rank representation graph (LRR). Each graph model is combined with the Gaussian fields and harmonic functions (GHF) propagation algorithm to complete the classification experiments, and the LRR graph is finally selected for sample preprocessing.

(2) In the second stage, the samples pseudo-labeled with the LRR graph are passed through a k-nearest-neighbor graph algorithm, which compares label values to remove noise samples. Experiments on UCI datasets and the USPS handwritten dataset show that, when labeled samples are scarce, the proposed algorithm can exploit the distribution information of the whole sample set more fully than the traditional SVM, turns SVM into a new semi-supervised learning algorithm with an expandable sample set, and achieves higher final classification accuracy.
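The two stages described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it assumes binary labels (+1/-1), uses the simpler exponential-weight (EW) graph in place of the LRR graph, computes the harmonic solution by plain iterative averaging, and interprets "comparing label values" as a majority vote among the k nearest neighbors. The function names (`exp_weight_graph`, `ghf_pseudo_labels`, `knn_denoise`), the parameters `sigma`, `n_iter`, `k`, and the toy data are all hypothetical.

```python
import math

def exp_weight_graph(X, sigma=1.0):
    """Dense exponential-weight (Gaussian) affinity matrix with a zero diagonal."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def ghf_pseudo_labels(W, labels, n_iter=200):
    """Harmonic-function propagation: each unlabeled node is repeatedly set to
    the weighted average of its neighbors; labeled nodes stay clamped.
    `labels` maps sample index -> +1 or -1. Returns soft scores per sample."""
    n = len(W)
    f = [float(labels.get(i, 0.0)) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labels:
                continue
            deg = sum(W[i])
            if deg > 0:
                f[i] = sum(W[i][j] * f[j] for j in range(n)) / deg
    return f

def knn_denoise(X, y, pseudo_idx, k=3):
    """Keep a pseudo-labeled sample only if its label agrees with the majority
    label of its k nearest neighbors (assumed reading of the denoising step;
    k should be odd so that the vote cannot tie)."""
    kept = []
    for i in pseudo_idx:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(len(X)) if j != i
        )
        votes = sum(y[j] for _, j in dists[:k])
        if votes * y[i] > 0:
            kept.append(i)
    return kept

# Toy data (hypothetical): two well-separated clusters, one labeled sample each.
X = [(0, 0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
     (3, 3), (3.2, 3.1), (3.1, 2.8), (2.9, 3.2)]
labels = {0: 1, 4: -1}

W = exp_weight_graph(X, sigma=1.0)
scores = ghf_pseudo_labels(W, labels)
y = [1 if s > 0 else -1 for s in scores]  # stage 1: pseudo-labels

# Flip one pseudo-label by hand to simulate a noise sample.
y_noisy = list(y)
y_noisy[2] = -1
pseudo_idx = [i for i in range(len(X)) if i not in labels]
kept = knn_denoise(X, y_noisy, pseudo_idx, k=3)  # stage 2: denoising
```

In this sketch the retained pseudo-labeled samples in `kept`, together with the originally labeled samples, would form the enlarged training set passed to an SVM trainer; that final step is omitted here.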
[Degree-granting institution]: North Minzu University (北方民族大學)
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP181
Article ID: 1536118
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1536118.html