
Research on Text Classification Methods Based on Feature Selection and Their Applications

Published: 2018-03-31 12:57

Topic: text classification | Focus: feature selection | Source: Jiangnan University, master's thesis (2017)


【Abstract】: With the continuous development of computer technology, the volume of online information has grown explosively. This information enriches people's lives, but it also includes a great deal of useless or even harmful content, which makes it harder to use information sensibly and effectively. Accurately finding useful information within large volumes of data has therefore become an open problem in information technology, and text classification offers an effective solution. Traditional manual classification based on expert knowledge consumes considerable labor and time and can no longer keep pace with the growth of data, which motivated the development of automatic text classification methods. Feature selection is an indispensable step in text classification, and how well it selects features strongly affects the resulting category decisions. The traditional chi-square (CHI) statistic for feature selection does not fully account for within-class term frequency or for how feature terms are distributed within a class; to address this, this thesis proposes a feature selection method that optimizes the chi-square statistic with intra-class information. As for the classifier, the support vector machine (SVM) is one of the most typical machine learning methods for automatic text classification. It is simple, efficient, and accurate, and has attracted wide attention from researchers; this thesis uses an SVM for text classification.
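For reference, the baseline that the thesis improves on is the classic chi-square term-class score. With N documents, A = documents in class c containing term t, B = documents outside c containing t, C = documents in c lacking t, and D = documents outside c lacking t, the score is χ²(t, c) = N(AD − BC)² / ((A+C)(B+D)(A+B)(C+D)). The sketch below computes it from document frequencies and keeps the top-k terms. It is a minimal illustration only: the abstract does not give the thesis's intra-class frequency and distribution corrections, so they are not modeled, and the toy corpus, labels, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy corpus (pre-tokenized) and labels, for illustration only.
EXAMPLE_DOCS = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
EXAMPLE_LABELS = ["sports", "sports", "finance", "finance"]


def chi_square_scores(docs, labels):
    """Return the classic CHI score for every (term, class) pair.

    Counts are document frequencies: a term is counted once per document,
    so within-class term frequency is ignored by this baseline.
    """
    n = len(docs)
    class_size = defaultdict(int)   # documents per class
    df = defaultdict(int)           # documents containing term t
    df_in_class = defaultdict(int)  # documents of class cls containing term t
    for tokens, cls in zip(docs, labels):
        class_size[cls] += 1
        for t in set(tokens):
            df[t] += 1
            df_in_class[(t, cls)] += 1

    scores = {}
    for t in df:
        for cls in class_size:
            a = df_in_class[(t, cls)]  # in cls and contains t
            b = df[t] - a              # outside cls and contains t
            c = class_size[cls] - a    # in cls and lacks t
            d = n - a - b - c          # outside cls and lacks t
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            scores[(t, cls)] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k terms whose best (max over classes) CHI score is highest."""
    best = defaultdict(float)
    for (t, _cls), s in chi_square_scores(docs, labels).items():
        best[t] = max(best[t], s)
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


if __name__ == "__main__":
    print(select_features(EXAMPLE_DOCS, EXAMPLE_LABELS, 2))  # ['goal', 'market']
```

Taking the maximum over classes is one common way to turn per-class scores into a single ranking; a class-prior-weighted average is another.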
To further improve classification accuracy, and because SVM parameters are difficult to select, the thesis proposes an SVM model optimized by an improved artificial bee colony (ABC) algorithm, in which the search strategies of the employed (leading) bees and onlooker (following) bees of the basic ABC algorithm are modified; this effectively improves classification accuracy. To broaden the application of text classification, a secondary bioinformatics database of the human p53 cancer gene is constructed to serve as a text classification corpus. The database mainly contains exon and intron sequence information of the p53 gene for many types of cancer and provides a platform for further cancer research. In addition, a sequence alignment method based on a pseudo-alignment cellular neural network is proposed to align the cancer p53 gene sequences in the database. It effectively improves alignment similarity and provides a theoretical basis for further research on cancer text classification.
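The abstract does not spell out the modified employed/onlooker search rules, so the sketch below shows only the basic ABC algorithm (Karaboga's formulation) that the thesis starts from, applied to SVM parameter tuning. It assumes an RBF-kernel SVM tuned over log2 C and log2 gamma; `toy_cv_error` is a hypothetical smooth stand-in for a real cross-validation error, and the search box follows the commonly used LIBSVM grid ranges (C from 2^-5 to 2^15, gamma from 2^-15 to 2^3). All names are illustrative.

```python
import random


# Hypothetical stand-in for SVM cross-validation error over (log2 C, log2 gamma):
# a smooth bowl with its minimum at C = 2**3, gamma = 2**-5. Replace it with a
# real cross-validated error to tune an actual SVM.
def toy_cv_error(x):
    log2_c, log2_gamma = x
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


# Assumed search box, matching the usual LIBSVM grid ranges.
SEARCH_BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]


def abc_minimize(objective, bounds, n_sources=10, limit=20, max_iter=200, seed=0):
    """Basic artificial bee colony: minimize a non-negative objective in a box."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    foods = [random_source() for _ in range(n_sources)]
    costs = [objective(x) for x in foods]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=lambda i: costs[i])
    best_x, best_f = foods[best_i][:], costs[best_i]

    def search_near(i):
        """Perturb one dimension of source i relative to a random partner source."""
        nonlocal best_x, best_f
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        lo, hi = bounds[j]
        cand = foods[i][:]
        phi = rng.uniform(-1.0, 1.0)
        cand[j] = min(hi, max(lo, cand[j] + phi * (cand[j] - foods[k][j])))
        f = objective(cand)
        if f < costs[i]:  # greedy selection: keep the better source
            foods[i], costs[i], trials[i] = cand, f, 0
            if f < best_f:
                best_x, best_f = cand[:], f
        else:
            trials[i] += 1

    for _ in range(max_iter):
        # Employed-bee phase: every source is searched once.
        for i in range(n_sources):
            search_near(i)
        # Onlooker-bee phase: sources are picked with probability proportional
        # to fitness 1 / (1 + cost), so better sources get more searches.
        weights = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            search_near(rng.choices(range(n_sources), weights=weights)[0])
        # Scout-bee phase: abandon the most exhausted source if over the limit.
        worn = max(range(n_sources), key=lambda i: trials[i])
        if trials[worn] > limit:
            foods[worn] = random_source()
            costs[worn] = objective(foods[worn])
            trials[worn] = 0
            if costs[worn] < best_f:
                best_x, best_f = foods[worn][:], costs[worn]
    return best_x, best_f


if __name__ == "__main__":
    point, err = abc_minimize(toy_cv_error, SEARCH_BOUNDS)
    print("log2 C = %.3f, log2 gamma = %.3f, error = %.2e" % (point[0], point[1], err))
```

Any modified employed or onlooker strategy of the kind the thesis describes would presumably change the single-dimension update inside `search_near`; the greedy selection, fitness-proportional onlooker choice, and scout reset form the standard skeleton around it.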
【Degree-granting institution】: Jiangnan University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP391.1

