A Text Classification Algorithm Based on Feature Library Projection

Published: 2018-10-23 18:44
[Abstract]: Mainstream KNN-based text classification strategies are suited to automatic classification with large sample sizes, but they suffer from high time complexity and are prone to information loss during feature dimensionality reduction and sample pruning. This paper proposes a classification algorithm based on feature library projection (FLP). The algorithm first builds a feature library from the features of all training samples according to a weighting strategy, so that the library retains the feature information of every sample. Then, using a projection function, it maps each class's feature library to a projection sample according to the feature set of the sample to be classified, and completes classification by computing the similarity between the new sample and the projection sample of each class. The algorithm is validated on the corpus compiled by the Natural Language Processing Group of the International Database Center at Fudan University, tested in two scenarios (a small and a large number of training texts), and compared with a clustering-based KNN algorithm. The experimental results show that the FLP algorithm does not lose classification features and achieves high classification accuracy; its classification efficiency is not directly tied to growth in sample size, and its time complexity is low.
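The abstract does not give the weighting strategy, the projection function, or the similarity measure, so the following Python sketch only illustrates the general shape of the idea under stated assumptions: summed term frequencies stand in for the weights, the projection keeps only those library features that also occur in the new sample, and cosine similarity is used for comparison. All function names are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_feature_libraries(training_samples):
    """Build one feature library per class.

    training_samples: iterable of (label, tokens) pairs.
    Assumption: the weight of a feature is its summed term frequency
    over all training samples of the class.
    """
    libraries = defaultdict(Counter)
    for label, tokens in training_samples:
        libraries[label].update(tokens)
    return libraries


def project(library, sample_features):
    """Map a class library to a 'projection sample'.

    Assumption: the projection keeps only the library features that
    also appear in the sample to be classified.
    """
    return {f: library[f] for f in sample_features if library[f] > 0}


def cosine(a, b):
    """Cosine similarity between two sparse feature-weight mappings."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(libraries, tokens):
    """Return the best-scoring label and the per-class similarity scores."""
    sample = Counter(tokens)
    scores = {
        label: cosine(sample, project(library, sample))
        for label, library in libraries.items()
    }
    return max(scores, key=scores.get), scores


# Toy data, invented purely for illustration.
training = [
    ("sports", ["match", "goal", "team", "goal"]),
    ("sports", ["team", "coach", "match"]),
    ("finance", ["stock", "market", "bank"]),
    ("finance", ["bank", "loan", "market", "rate"]),
]
libraries = build_feature_libraries(training)
label, scores = classify(libraries, ["team", "goal", "market"])
print(label, scores)
```

In this sketch, classifying one sample costs time proportional to the number of classes times the number of distinct features in the sample, regardless of how many training texts were folded into the libraries. That is in line with the abstract's claim that efficiency is not directly tied to sample size, though the paper's actual formulation may differ.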
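[Author affiliations]: Office of Campus Informatization Construction and Management, Hunan University; School of Tourism Management, Hunan University of Commerce; College of Information Science and Engineering, Hunan University
[Funding]: National Natural Science Foundation of China (61672221, 61304184, 61672156)
[CLC number]: TP391.1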



Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2290154.html

