

An Attribute Reduction Algorithm Based on Dynamic Neighborhood Rough Sets for Text Classification

Published: 2018-05-18 13:40

  Topics: rough sets + neighborhood rough set model; Source: master's thesis, Shandong University of Science and Technology, 2017


[Abstract]: As machine learning has advanced, computers have become far more capable of processing massive data sets. Such data, however, contain large amounts of redundant and incomplete information, which severely degrades the performance of machine learning algorithms. To address this problem, researchers proposed data reduction: removing redundant information while preserving the classification ability of the original data. How to reduce massive data effectively while retaining as much useful information as possible is an important research direction in data mining and machine learning. In recent years, rough set theory, an effective tool for analyzing imprecise, inconsistent, and incomplete data, has been widely applied in machine learning and many other fields. The neighborhood rough set model, an extension of rough sets, handles continuous data directly and thereby avoids the information loss and the dependence on discretization methods that affect classical rough sets. This thesis studies the neighborhood rough set model and the attribute reduction algorithms based on it. The main work is as follows:

(1) To better determine neighborhood values suited to a particular data set and to improve the reduction result, the thesis combines the FCM algorithm with neighborhood rough sets and, using attribute significance as the heuristic, constructs a forward greedy attribute reduction algorithm based on a Canopy-FCM asymmetric dynamic neighborhood rough set model. The algorithm determines a specific neighborhood value for each attribute, so that the neighborhood values are set entirely according to the distribution of the data. This avoids the drawbacks of a single fixed global neighborhood value and selects the attributes that contribute most to decision-making ability more accurately. Experiments on public UCI data sets show that the algorithm retains fewer condition attributes while improving classification accuracy.

(2) The proposed attribute reduction algorithm is applied to Chinese text classification to extract key feature words and reduce the influence of redundant words on classification. Experiments on the Chinese text corpus compiled by Li Ronglu show that the algorithm effectively reduces the number of text feature words, lowers the dimensionality of the text set, and improves the classification of text data, which makes it easier to capture key information accurately and gives the method practical value.
[Degree-granting institution]: Shandong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP18; TP391.1
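
The abstract describes a forward greedy attribute reduction driven by attribute significance, with a separate neighborhood value per attribute. The following Python is a minimal sketch of that general idea only, not the thesis's algorithm: it does not implement the Canopy-FCM step or the asymmetric neighborhood, and the per-attribute radius rule (column standard deviation divided by `lam`) is an assumption chosen for illustration. All function names and the toy data are hypothetical.

```python
from statistics import pstdev


def neighborhood_radii(X, lam=2.0):
    """Assumed radius rule: standard deviation of each column divided by lam."""
    n_attrs = len(X[0])
    return [pstdev([row[a] for row in X]) / lam for a in range(n_attrs)]


def positive_region(X, y, attrs, radii):
    """Indices of samples whose neighbors all share the sample's decision label."""
    pos = []
    for i, xi in enumerate(X):
        pure = True
        for j, xj in enumerate(X):
            # xj is a neighbor of xi if it lies within the radius on every selected attribute
            if all(abs(xi[a] - xj[a]) <= radii[a] for a in attrs):
                if y[j] != y[i]:
                    pure = False
                    break
        if pure:
            pos.append(i)
    return pos


def dependency(X, y, attrs, radii):
    """Fraction of samples in the positive region of the decision."""
    return len(positive_region(X, y, attrs, radii)) / len(X)


def forward_greedy_reduct(X, y, radii, eps=1e-9):
    """Add the attribute with the largest dependency gain until no attribute helps."""
    selected = []
    remaining = list(range(len(X[0])))
    current = dependency(X, y, selected, radii)
    while remaining:
        # Significance of attribute a = gain in dependency when a is added
        gains = [(dependency(X, y, selected + [a], radii) - current, a)
                 for a in remaining]
        best_gain, best_attr = max(gains, key=lambda g: (g[0], -g[1]))
        if best_gain <= eps:
            break
        selected.append(best_attr)
        remaining.remove(best_attr)
        current += best_gain
    return selected, current


# Toy data: attribute 0 separates the two classes, attribute 1 does not.
X = [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1],
     [0.90, 0.5], [0.80, 0.9], [0.85, 0.1]]
y = [0, 0, 0, 1, 1, 1]

radii = neighborhood_radii(X)
reduct, gamma = forward_greedy_reduct(X, y, radii)
print(reduct, gamma)  # [0] 1.0
```

On the toy data, attribute 0 alone puts every sample in the positive region, so the search stops after one step and discards attribute 1. In a text classification setting the columns would be feature-word weights, and the same loop would discard words that add no dependency gain.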




Article ID: 1906041


Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1906041.html

