
Design and Implementation of a Spam Filtering System Based on the Bayesian Algorithm

Published: 2018-03-26 19:04

  Topic: email protocols. Focus: Bayesian filter. Source: Jilin University, master's thesis, 2014


[Abstract]: With the explosive growth of the Internet, email has become an important means of everyday communication. Because it is easy to send and receive, simple to operate, and cheap, many Internet users list email as their preferred contact method. As email has spread, however, mailboxes increasingly receive messages from unknown people or addresses. These messages consist mainly of advertising, such as free calls, discounted goods, and various kinds of illegal information. They may have nothing to do with the recipient's work or life, or may be actively unwelcome, yet they fill the mailbox every day and disrupt daily life, and some carry viruses that can infect a computer and bring it down. Mail that is forced into a user's mailbox in this way is known as spam (UBE, Unsolicited Bulk Email), or as Unsolicited Commercial Email when its main content promotes goods.

Given the harm that spam does to modern society, research on how to curb it has become increasingly urgent, and anti-spam technology has long been a widely discussed topic internationally. Building on earlier theory and research, this thesis systematically studies the theory of email and the spam filtering methods in use internationally, with a focus on classifying spam with the naive Bayes algorithm. The thesis first reviews the history of email and how it works, and introduces several commonly used email protocols, such as MIME (Multipurpose Internet Mail Extensions) and SMTP (Simple Mail Transfer Protocol). It then introduces rule-based spam filtering, including sender address analysis, recipient address filtering, blacklist and whitelist filtering, and subject filtering. These rule sets form the first line of defense against spam. Finally, it concentrates on applying the content-based naive Bayes algorithm to spam filtering and makes some improvements that address the algorithm's shortcomings. Several approaches to Chinese word segmentation are introduced, mainly dictionary-based segmentation, the N-gram method, and manual segmentation. Feature vectors that represent the text content of an email are then built, the system learns from a corpus of emails with known classifications, and naive Bayes theory is used to classify each newly received email, which is finally presented to the user as either spam or normal mail.

Combining the theory with the related techniques, the thesis presents a simulation of naive Bayes spam classification in which spam is filtered by learning from email samples. The ratio of spam to normal mail follows the percentage of spam in users' mail reported in the "China Anti-Spam Status Survey Report". The experimental data reflect the effectiveness of the method at intercepting spam.
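The pipeline the abstract describes (segment the text, build features, learn from labeled mail, classify new mail) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the character-bigram features, the add-one (Laplace) smoothing, the `NaiveBayesSpamFilter` class name, and the toy training messages are all assumptions made for the example, and the thesis's own improvements to naive Bayes are not reproduced here.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    if len(chars) < n:
        return [chars] if chars else []
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes filter with add-one smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.doc_counts = Counter()   # label -> number of training emails
        self.feature_counts = {}      # label -> Counter of n-gram occurrences
        self.vocabulary = set()       # every n-gram seen during training

    def train(self, text, label):
        """Learn from one email whose class is already known."""
        self.doc_counts[label] += 1
        counts = self.feature_counts.setdefault(label, Counter())
        for gram in char_ngrams(text, self.n):
            counts[gram] += 1
            self.vocabulary.add(gram)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(gram | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        grams = char_ngrams(text, self.n)
        scores = {}
        for label, counts in self.feature_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(counts.values())
            for gram in grams:
                # Add-one smoothing keeps unseen n-grams from zeroing the product.
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        """Pick the label with the highest posterior score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy corpus, invented for illustration only.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("free calls discount offer", "spam")
spam_filter.train("discount goods free offer now", "spam")
spam_filter.train("meeting agenda for the project", "ham")
spam_filter.train("project report attached for review", "ham")

print(spam_filter.classify("free discount offer"))
print(spam_filter.classify("project meeting report"))
```

Character n-grams are used here because they need no dictionary, which makes them a convenient stand-in for Chinese text, where words are not separated by spaces. Scores are summed in log space to avoid numeric underflow on long messages.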
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.098




Article ID: 1669171


Article link: http://sikaile.net/guanlilunwen/ydhl/1669171.html

