A Semantics-Based Spam SMS Filtering Algorithm
Published: 2018-10-18 19:41
[Abstract]: Spam SMS filtering is a text classification task: messages received by a user are classified as either legitimate or spam so that spam can be blocked. This paper improves on the naive Bayes classification algorithm. Because SMS messages are short and carry little information, a synonym set is introduced to expand the feature words in each message, reducing the negative effect that the scattering of synonymous feature words has on classification. In addition, to exploit the special information that spam messages contain, the concept of a pattern is proposed: feature words that share the same pattern are replaced by that pattern concept, which concentrates the features of spam messages and strengthens the classifier's ability to identify spam. Finally, experiments compare the classification performance of the naive Bayes algorithm and the improved algorithm, and verify the effectiveness of the improved algorithm.
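The two ideas in the abstract, merging synonymous feature words and replacing same-pattern words with a single pattern token, can be illustrated with a minimal sketch on top of a multinomial naive Bayes classifier with add-one smoothing. Everything specific here is an assumption for illustration: the abstract does not give the paper's synonym set, pattern definitions, smoothing, or data, so `SYNONYMS`, `PATTERNS`, `normalize`, the `NaiveBayes` class, and the toy messages are all hypothetical. The sketch also simplifies synonym handling by mapping each synonym to one representative word rather than reproducing the paper's expansion procedure.

```python
import math
import re
from collections import Counter

# Hypothetical pattern rules: tokens matching a regex collapse to one pattern token.
PATTERNS = [
    (re.compile(r"^(https?://|www\.)\S+$"), "<URL>"),
    (re.compile(r"^\+?\d{7,}$"), "<PHONE>"),
]

# Hypothetical synonym set: each word maps to one representative feature word.
SYNONYMS = {
    "complimentary": "free", "gratis": "free",
    "reward": "prize", "award": "prize",
}


def normalize(text):
    """Tokenize, then apply pattern replacement and synonym merging."""
    tokens = []
    for tok in text.lower().split():
        for regex, label in PATTERNS:
            if regex.match(tok):
                tok = label
                break
        tokens.append(SYNONYMS.get(tok, tok))
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(normalize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        tokens = normalize(text)
        best, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy training data, invented for this sketch only.
train = [
    ("win a free prize now call 13800000000", "spam"),
    ("claim your reward at http://example.com", "spam"),
    ("are we still meeting for lunch today", "ham"),
    ("please send me the report tonight", "ham"),
]
model = NaiveBayes().fit(*zip(*train))
print(model.predict("complimentary award visit www.example.org"))
```

In this toy run, the test message shares no surface word with the training spam, yet after normalization it contributes `free`, `prize`, and `<URL>`, all of which were seen in spam. That is the effect the abstract describes: evidence that would otherwise be scattered across synonyms and distinct URLs or phone numbers is concentrated onto a few features.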
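[Author affiliation]: School of Information Engineering, Nanjing Normal University Taizhou College
[Funding]: Jiangsu Province College Students' Innovation Training Program (201613843015Y); Ministry of Education-Google 2014 University-Industry Cooperation Project (PO640068)
[Classification number]: TP391.1
Article ID: 2280169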
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2280169.html