

Text Feature Evaluation and Selection Algorithm Based on Multi-Index Fusion

Published: 2019-03-13 09:20
[Abstract]: In text classification, several criteria are used to evaluate feature quality, chiefly the relevance of a feature to the categories, the redundancy of the feature itself, and the sparsity of the feature in the corpus. Because feature quality directly affects classification performance, it is necessary to consider all of these factors together. Feature selection is commonly split into three steps that measure relevance, redundancy, and sparsity in turn, and the weighting and filtering in each step are time-consuming, so this stepwise approach is hard to apply when both real-time performance and accuracy requirements are high. To address this problem, a coordinate model is first established in which relevance, redundancy, and sparsity are mapped onto a coordinate system. Text features are then weighted and filtered according to the angle between the vector from the origin to each feature's point and a coordinate plane or coordinate axis. This integrates the multiple evaluation criteria into a single one, greatly reducing the time spent on repeated weighting and filtering and improving the efficiency of feature selection. Experiments on the Fudan University Chinese text corpus and the NetEase text corpus show that, compared with the stepwise method, the multi-index fusion-based text feature evaluation and selection algorithm selects word and n-gram features faster and more accurately; the effectiveness of the selected features for classification is verified with a support vector machine (SVM).
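The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions: each feature is assumed to have relevance, redundancy, and sparsity scores already normalized to [0, 1], and the single fused score is assumed to be the angle between the feature's vector and the relevance axis (a smaller angle meaning high relevance with low redundancy and sparsity). The function names `axis_angle` and `select_features` and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def axis_angle(relevance, redundancy, sparsity):
    """Angle in radians between the vector (relevance, redundancy, sparsity)
    and the relevance axis. Assumption: all three scores lie in [0, 1]."""
    norm = math.sqrt(relevance ** 2 + redundancy ** 2 + sparsity ** 2)
    if norm == 0.0:
        # An all-zero feature carries no information; treat it as the worst case.
        return math.pi / 2
    # Clamp to guard against floating-point rounding just outside [-1, 1].
    cosine = max(-1.0, min(1.0, relevance / norm))
    return math.acos(cosine)


def select_features(scores, k):
    """Rank features by the fused angle (ascending) and keep the first k.

    scores: dict mapping feature name -> (relevance, redundancy, sparsity).
    """
    ranked = sorted(scores, key=lambda name: axis_angle(*scores[name]))
    return ranked[:k]


# Hypothetical scores, for illustration only.
scores = {
    "goal": (0.9, 0.1, 0.2),
    "the": (0.1, 0.8, 0.1),
    "midfield": (0.7, 0.2, 0.6),
}
selected = select_features(scores, 2)
print(selected)
```

Because every feature is reduced to one number, a single sort replaces three separate weighting and filtering passes, which is the source of the time saving the abstract describes. Whether the paper measures the angle against an axis, a coordinate plane, or a combination is not stated here, so the choice of the relevance axis should be read as one possible instantiation.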
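[Author affiliation]: School of Software, Liaoning Technical University
[Funding]: National Natural Science Foundation of China (No. 70971059); Liaoning Province Innovation Team Project (No. 2009T045); Liaoning Province Growth Program for Outstanding Young Scholars in Higher Education Institutions (No. LJQ2012027)
[Classification number]: TP391.1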

Article ID: 2439266


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2439266.html

