A "Word Frequency-Sieve" Hybrid Feature Selection Method for Spam Identification
Published: 2018-06-12 04:31
Topics: spam identification, hybrid feature selection method; Source: Journal of South China University of Technology (Natural Science Edition), 2017, Issue 03
【Abstract】: Spam is becoming increasingly widespread. This paper identifies and classifies spam using naive Bayes and support vector machine (SVM) classifiers, and applies a "word frequency-sieve" hybrid feature selection method to the classifier models to improve their recognition performance. In addition, the naive Bayes classification model is improved by taking a more comprehensive range of classification probabilities into account, which further improves the recognition performance of the naive Bayes classifier. Finally, experiments are conducted to measure the accuracy, recall, and F1 score of the spam identification system. The experimental results show that the "word frequency-sieve" hybrid feature selection method effectively improves the recognition performance of spam classifiers, and that a classification output adjustment module based on a cost-sensitive method greatly reduces the probability that the classifier misjudges legitimate mail as spam. The spam identification system designed in this paper is therefore highly practical and can be used in everyday work and life.
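The abstract gives no formulas, so the following is only a minimal sketch of the general techniques it names, under stated assumptions: a multinomial naive Bayes spam filter with Laplace smoothing, a cost-sensitive decision rule that flags a message as spam only when P(spam|x) > λ·P(ham|x), and the accuracy, precision, recall, and F1 metrics. The document-frequency cutoff in `select_vocabulary` is a hypothetical stand-in, not the paper's "word frequency-sieve" method, whose details are not given here; the names `NaiveBayesSpamFilter`, `cost_ratio`, and `evaluate` are likewise illustrative assumptions.

```python
import math
from collections import Counter


def select_vocabulary(docs, min_df=2):
    """Keep words that appear in at least `min_df` documents.

    Hypothetical stand-in for feature selection; NOT the paper's
    "word frequency-sieve" method.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each word once per document
    return {w for w, n in df.items() if n >= min_df}


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes (0 = ham, 1 = spam) with a cost-sensitive threshold."""

    def __init__(self, vocabulary, cost_ratio=9.0):
        # cost_ratio (lambda, an assumed value): how much worse a false positive
        # (ham flagged as spam) is than a false negative.
        self.vocab = set(vocabulary)
        self.cost_ratio = cost_ratio

    def fit(self, docs, labels):
        n = len(labels)
        self.log_prior, self.log_lik = {}, {}
        for c in (0, 1):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / n)
            counts = Counter(w for d in class_docs for w in d if w in self.vocab)
            total = sum(counts.values()) + len(self.vocab)  # Laplace (add-one) smoothing
            self.log_lik[c] = {w: math.log((counts[w] + 1) / total) for w in self.vocab}
        return self

    def log_joint(self, tokens, c):
        # log P(c) + sum of log P(w | c) over in-vocabulary tokens
        return self.log_prior[c] + sum(self.log_lik[c][w] for w in tokens if w in self.vocab)

    def predict(self, tokens):
        # P(spam|x) > lambda * P(ham|x)  <=>  log P(spam, x) - log P(ham, x) > log(lambda)
        margin = self.log_joint(tokens, 1) - self.log_joint(tokens, 0)
        return 1 if margin > math.log(self.cost_ratio) else 0


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 with spam as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Scores are kept in log space to avoid underflow on long messages. Raising `cost_ratio` above 1 makes the filter more reluctant to label mail as spam, trading some spam recall for fewer legitimate messages misjudged as spam, which is the effect the abstract attributes to its cost-sensitive output adjustment module.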
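【Author affiliations】: School of Software Engineering, South China University of Technology; Guangzhou Key Laboratory of Robot Software and Complex Information Processing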
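【Funding】: Natural Science Foundation of Guangdong Province (2016A030310412); Guangdong Higher Education Provincial Key Platforms and Research Projects, Young Innovative Talents Category (2015KQNCX003); Guangzhou Science and Technology Plan Key Laboratory Project (15180007); Guangzhou Science and Technology Plan Project (201707010223)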
【CLC numbers】: TP18; TP393.098
Article ID: 2008363
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2008363.html