Research on Feature Selection Algorithms Based on Information Granulation
Topic: feature selection + information granulation; Source: Minnan Normal University, 2016 master's thesis
【Abstract】: As a key step in data preprocessing, feature selection is an important research topic in data mining, pattern recognition, and machine learning. It is the process of removing irrelevant and redundant features from the original data to find a feature subset that retains all or most of the classification information of the original feature space. For high-dimensional data, following the idea of representing the whole by its parts, a data set can be refined from one large information granule into several small granules that together effectively represent the whole, which helps analyze the data at multiple levels and from multiple perspectives. This thesis therefore applies the representation mechanism of information granulation to feature selection and constructs a series of feature selection models based on it. The thesis first surveys the state of research on feature selection, focusing on neighborhood granulation, large margin, and local subspace models. Then, to eliminate redundant and irrelevant features, it develops a series of granulation-based methods for different classification and prediction problems from three angles: sample granulation, feature granulation, and joint sample-feature granulation. The main contributions are:
(1) From the angle of sample granulation, and exploiting the fact that each feature carries its own quality, a feature selection algorithm based on feature quality is proposed. The algorithm defines feature quality via information entropy and the nearest neighbor via the large margin, and uses that neighbor to granulate the samples. Experiments validate the model on three criteria: the compactness of the selected subset, classification accuracy, and how accuracy varies with the number of features. The results show that feature quality can guide the selection of an effective feature subset.
(2) From the angle of sample granulation, using neighborhood relations, a feature selection algorithm based on maximum nearest-neighbor rough approximation (MNNRS) is proposed. Built on the neighborhood rough set feature selection algorithm NRS, it defines the maximum nearest neighbor via the large margin to granulate samples and revises the computation of the positive region. MNNRS retains the advantages of NRS while effectively reducing computational complexity and improving classification performance.
(3) From the angle of feature granulation, addressing the high dimensionality of multi-label data sets and the label-specific relations between labels and features, a multi-label feature selection algorithm based on local subspaces is proposed. Built on the local subspace model combined with information entropy theory, it identifies features that are relatively minor for the label set as a whole yet must not be discarded. Experiments show that the algorithm effectively reduces computational complexity, improves classification performance, and makes the selection strategy more flexible.
(4) From the angle of joint sample and feature granulation, addressing the high dimensionality and over-fitting tendency of high-dimensional small-sample data, a heuristic local random feature selection method is proposed. The algorithm granulates features with the local subspace model and combines this with neighborhood granulation of the samples, improving classification accuracy, reducing computational cost, and alleviating over-fitting to some extent.
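Contribution (1) defines a feature's quality through information entropy. The thesis's exact definition is not given in this abstract; as a minimal illustrative sketch, one common entropy-based quality measure is the information gain H(Y) - H(Y|X) of the class labels given a discrete feature (the toy data and function names below are assumptions for illustration, not the thesis's algorithm):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def feature_quality(column, labels):
    """Information gain H(Y) - H(Y|X): how much class information
    one discrete feature column carries. Higher is better."""
    n = len(labels)
    cond = 0.0
    for value, count in Counter(column).items():
        subset = [y for x, y in zip(column, labels) if x == value]
        cond += (count / n) * entropy(subset)
    return entropy(labels) - cond

# Toy data (hypothetical): f1 determines the class, f2 is pure noise.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
y = ['a', 'a', 'b', 'b']
print(feature_quality(f1, y))  # 1.0 (fully informative)
print(feature_quality(f2, y))  # 0.0 (uninformative)
```

Ranking features by such a score and keeping the top-scoring ones is the simplest filter-style use of an entropy-based quality measure.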
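Contribution (2) builds on the NRS framework, in which a sample belongs to the positive region if its entire δ-neighborhood shares its class, and the dependency of the decision on a feature subset is the fraction of such samples. The following is a baseline sketch of that standard NRS-style dependency (not the MNNRS refinement itself; δ and the toy data are assumptions):

```python
import math

def neighborhood(data, i, feats, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the selected feature subset."""
    xi = [data[i][f] for f in feats]
    nbrs = []
    for j, row in enumerate(data):
        d = math.sqrt(sum((xi[k] - row[f]) ** 2 for k, f in enumerate(feats)))
        if d <= delta:
            nbrs.append(j)
    return nbrs

def dependency(data, labels, feats, delta):
    """|positive region| / |U|: fraction of samples whose whole
    delta-neighborhood is pure in class under the given features."""
    pos = 0
    for i in range(len(data)):
        nbrs = neighborhood(data, i, feats, delta)
        if all(labels[j] == labels[i] for j in nbrs):
            pos += 1
    return pos / len(data)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
data = [[0.0, 0.9], [0.1, 0.1], [1.0, 0.8], [0.9, 0.2]]
labels = ['a', 'a', 'b', 'b']
print(dependency(data, labels, [0], 0.3))  # 1.0
print(dependency(data, labels, [1], 0.3))  # 0.0
```

A greedy forward search that repeatedly adds the feature giving the largest dependency gain is the usual reduction strategy on top of this measure; MNNRS, per the abstract, replaces the fixed δ-neighborhood with a large-margin-based maximum nearest neighbor and revises the positive-region computation.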
【學(xué)位授予單位】:閩南師范大學(xué)
【學(xué)位級別】:碩士
【學(xué)位授予年份】:2016
【分類號】:TP18;TP311.13
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 李鴻;;粒化思維研究[J];滁州學(xué)院學(xué)報;2010年05期
2 修保新,任雙橋,張維明;基于模糊信息;碚摰膱D像插值方法[J];國防科技大學(xué)學(xué)報;2004年03期
3 趙興永;陳慶凱;李傳紅;;高爐渣處理;啌p壞原因分析及改進(jìn)[J];科技傳播;2010年21期
4 張燕姑;廣義模糊;倔w論在知識工程中的應(yīng)用——模糊理論本質(zhì)研究[J];計算機(jī)工程與應(yīng)用;2005年01期
5 閆林;宋金朋;;數(shù)據(jù)集的;瘶浼捌浣(yīng)用[J];計算機(jī)科學(xué);2014年03期
6 劉生福;信息的;c劃分(覆蓋)解粒[J];計算機(jī)工程與應(yīng)用;2004年02期
7 羅敏;;粒計算及其研究現(xiàn)狀[J];計算機(jī)與現(xiàn)代化;2007年01期
8 王曉丹;田永梅;;粒計算與WEB信息;痆J];數(shù)字技術(shù)與應(yīng)用;2011年09期
9 陳艷艷;馬杰偉;趙海濤;楊國華;;高爐渣離心;瘮(shù)值仿真與試驗研究[J];計算機(jī)仿真;2013年02期
10 李鴻;;粒計算的基本要素研究[J];計算機(jī)技術(shù)與發(fā)展;2009年11期
相關(guān)會議論文 前8條
1 閆兆民;周揚(yáng)民;楊志遠(yuǎn);儀垂杰;;離心;碚撆c設(shè)備[A];第十一屆全國MOCVD學(xué)術(shù)會議論文集[C];2010年
2 薛青;徐文超;鄭長偉;劉永紅;;城市作戰(zhàn)仿真中戰(zhàn)場環(huán)境信息;P脱芯縖A];第13屆中國系統(tǒng)仿真技術(shù)及其應(yīng)用學(xué)術(shù)年會論文集[C];2011年
3 仇志國;;青鋼圖拉法;に嚨膽(yīng)用與改進(jìn)[A];2009年山東省煉鐵學(xué)術(shù)交流會論文集[C];2009年
4 李順;張功多;孟慶波;謝國威;;熔渣離心;酂岢醮位厥諏嶒炑芯縖A];2013年全國冶金能源環(huán)保生產(chǎn)技術(shù)會論文集[C];2013年
5 董志鵬;林東;樊促軍;;轉(zhuǎn)爐爐渣;に囋诒句摰膽(yīng)用[A];2005中國鋼鐵年會論文集(第2卷)[C];2005年
6 朱文淵;李先旺;李社鋒;;高爐熔渣干式粒化及熱能回收技術(shù)及工業(yè)應(yīng)用分析[A];2012年全國冶金安全環(huán)保暨能效優(yōu)化學(xué)術(shù)交流會論文集[C];2012年
7 劉軍祥;于慶波;竇晨曦;胡賢忠;;高爐渣轉(zhuǎn)杯式;膶嶒炑芯縖A];2008全國能源與熱工學(xué)術(shù)年會論文集[C];2008年
8 代勁;何中市;;基于云模型的快速信息;惴╗A];第五屆全國信息檢索學(xué)術(shù)會議論文集[C];2009年
相關(guān)重要報紙文章 前8條
1 林立恒;熔渣干法粒化工藝技術(shù)經(jīng)濟(jì)評估[N];世界金屬導(dǎo)報;2013年
2 羅錫蘭 羅恒志;達(dá)鋼;こ炭⒐ね懂a(chǎn)[N];中國冶金報;2003年
3 太鋼設(shè)計院 郝正榮 太鋼計控處 郝穎;節(jié)水、節(jié)電的高爐渣輪法;b置[N];山西科技報;2000年
4 ;高爐爐渣;到y(tǒng)[N];世界金屬導(dǎo)報;2003年
5 崔艷萍;INBA渣粒化系統(tǒng)-環(huán)境過程控制[N];世界金屬導(dǎo)報;2007年
6 劉譚t,
Document ID: 1776085
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1776085.html