A Feature Weighting Algorithm Based on Term Frequency and Category Relevance

Published: 2019-05-13 10:17
[Abstract]: In text classification, current research on feature weighting has two shortcomings. First, feature weighting algorithms based on document frequency often ignore the term frequency information of a feature. Second, the relationship between features and categories is not expressed accurately or fully. To address these shortcomings, a new term-frequency-based, category-related feature weighting algorithm (CDF-AICF) is proposed. When measuring feature weight, the algorithm considers the document frequency of a feature at each term frequency. To express the relationship between features and categories accurately, two new concepts are introduced: category-related document frequency (CDF) and average inverse class frequency (AICF), which represent a feature's expressive power for a category and its discriminative power between categories, respectively. Finally, in classification experiments on three data sets, CDF-AICF is compared with five other feature weighting methods, and the results show that it achieves better classification performance than all five.
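The abstract does not give the formulas for CDF or AICF, so the following Python sketch is not the paper's algorithm. It only illustrates two raw statistics of the kind the abstract mentions, under stated assumptions: "document frequency at each term frequency" is read as the number of documents in which a term occurs exactly k times, and the inverse class frequency shown is the common textbook form log(C / cf), with no attempt at the averaging that AICF implies. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def df_by_term_frequency(docs, term):
    """Map each count k >= 1 to the number of documents in which
    `term` occurs exactly k times (an assumed reading of
    "document frequency at each term frequency")."""
    counts = Counter()
    for tokens in docs:
        k = tokens.count(term)
        if k > 0:
            counts[k] += 1
    return dict(counts)


def inverse_class_frequency(docs_by_class, term):
    """Textbook ICF: log(number of classes / number of classes whose
    documents contain `term`). Not the paper's AICF."""
    n_classes = len(docs_by_class)
    n_with_term = sum(
        1 for docs in docs_by_class.values()
        if any(term in tokens for tokens in docs)
    )
    if n_with_term == 0:
        return 0.0  # term unseen in every class; no evidence either way
    return math.log(n_classes / n_with_term)


# Hypothetical toy corpus: class label -> list of tokenized documents.
corpus = {
    "sports": [["goal", "goal", "match"], ["goal", "team"]],
    "tech": [["chip", "goal"], ["chip", "chip", "code"]],
}

goal_in_sports = df_by_term_frequency(corpus["sports"], "goal")
icf_goal = inverse_class_frequency(corpus, "goal")
icf_chip = inverse_class_frequency(corpus, "chip")
```

In this toy corpus, "goal" appears in both classes and so gets an inverse class frequency of zero, while "chip" appears in only one class and gets a positive value. The per-count table for "goal" in the "sports" class keeps the term frequency detail that a plain document frequency would discard.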
[Author affiliation]: Department of Networks, Electronic Engineering Institute;
[Classification number]: TP391.1


Article ID: 2475801


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2475801.html

