Research and Implementation of Text Classification Based on Deep Learning Theory and SVM Technology

Published: 2019-05-17 13:35

[Abstract]: With the rapid development of Internet technology, massive amounts of data are being produced. Every day, millions of Internet users look online for information that is valuable and meaningful to them, and how to let each of them obtain the knowledge and skills they need from massive data quickly and accurately has become an active research topic. To address this problem, researchers collect, analyze, mine and classify data to help people retrieve information more efficiently. The core work of this thesis is to combine deep-learning feature extraction with a support vector machine (SVM) to mine, classify and analyze large volumes of text, and ultimately to obtain the essential features of the text. Traditional text classification algorithms rely on statistical measures such as expected cross entropy, information gain and mutual information, and obtain the feature set by applying a threshold. When the training set is large, these methods tend to suffer from defects such as ambiguous feature terms and loss of feature information. To address these problems, and taking the characteristics of the available data into account, this thesis proposes a classifier that combines two deep learning methods with an SVM to perform text classification. The main research contents and contributions are as follows:
1. The research status and significance of existing text classification techniques in China and abroad are introduced, the importance of text classification is explained, and the work to be done in this thesis is outlined.
2. Traditional classification techniques are studied in three parts: text preprocessing, text feature extraction and text classification. The Bayesian, KNN and SVM classification algorithms are then described, and the applicable scope, advantages and disadvantages of the three algorithms are analyzed.
3. The relevant theory of deep learning is introduced. A sparse autoencoder is proposed to map the raw data into a high-dimensional space, and a deep belief network is used to project the output of the sparse autoencoder to obtain abstract text features. The process of text feature extraction that combines the sparse autoencoder with the deep belief network is studied.
4. Combining deep learning with an improved multi-class SVM method, a classifier is designed that combines a sparse autoencoder, a deep belief network and SVM classification to classify text. Finally, experiments are designed to test the proposed method and compare it with traditional text classification methods, and classification accuracy is evaluated under different parameter settings.
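To illustrate the threshold-based statistical feature selection that the abstract attributes to traditional methods, the following is a minimal sketch of information-gain selection. The toy corpus, the function names and the threshold value are assumptions made for illustration; none of them come from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term present / absent'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    p_present = len(with_term) / len(labels)
    conditional = (p_present * entropy(with_term)
                   + (1 - p_present) * entropy(without_term))
    return entropy(labels) - conditional


def select_features(docs, labels, threshold):
    """Keep every term whose information gain reaches the threshold."""
    vocabulary = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocabulary}
    return {t: s for t, s in scores.items() if s >= threshold}


# Hypothetical toy corpus: each document is a set of already-segmented tokens.
docs = [
    {"match", "goal", "team"},
    {"team", "coach", "goal"},
    {"stock", "market", "team"},
    {"market", "bank", "stock"},
]
labels = ["sports", "sports", "finance", "finance"]

# The threshold 0.5 is an arbitrary illustrative value.
selected = select_features(docs, labels, threshold=0.5)
print(sorted(selected))
```

On this toy corpus, terms that separate the two classes perfectly ("goal", "market", "stock") pass the threshold, while "team", which appears in both classes, is discarded. A fixed cut-off like this is the step the abstract says can lose feature information on large training sets.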
[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1



Article ID: 2479130


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2479130.html
