Research on Spam Filtering Methods Based on Email Intent and Fingerprint Analysis
Published: 2018-05-09 14:10
Topics: spam + feature selection; Source: Xiamen University, master's thesis, 2014
[Abstract]: With the rapid growth of the Internet, email has become a very popular communication tool, widely used for personal communication and in enterprise environments. The spam that accompanies it, however, poses serious security risks for network users, including wasted user time, storage space, and valuable network bandwidth. More and more people now specialize in producing spam, and no business or Internet service with a mail interface is immune; even well-known companies behind mainstream social platforms, such as Tencent, Sina Weibo, and Google, are no exception. The explosive growth of spam has prompted many corresponding anti-spam techniques, and this arm-wrestling style of technical contest has made anti-spam methods more mature year by year. Although spam can be brought under control temporarily after a new anti-spam technique is deployed, spammers keep working to evade filters and devise newer spamming techniques. In view of this, the thesis carries out the following research:
● It analyzes the design principles, implementation methods, and current state of existing anti-spam systems, summarizes the characteristics of typical anti-spam techniques, and identifies new trends in spam identification technology.
● By analyzing a large number of spam and legitimate messages, it finds that the senders of the two kinds of mail differ in how their intent is expressed. It selects intent features for classification accordingly, studies how to extract these features efficiently and accurately, and verifies their efficiency and accuracy on a test data set.
● Working from a large mail corpus, it identifies image-based spam and its characteristics, analyzes how image spam differs from text spam, examines the shortcomings of machine-learning-based anti-image-spam techniques, and proposes an image filtering selection method based on statistical probability, rules, and a voting mechanism.
● It constructs an efficient hash generation algorithm that samples the body text and attachments of spam messages, computes hash values, and generates a fingerprint file, which is then compared against an online fingerprint database to decide whether a message is spam.
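The fingerprint step in the abstract (sample the message, hash the samples, compare against a fingerprint database) can be illustrated with a minimal Python sketch. The abstract does not specify the thesis's actual hash algorithm, so everything here is an assumption made for illustration: the function names `normalize`, `fingerprint`, and `is_spam`, the use of SHA-256, the 16-character window, the keep-the-smallest-digests sampling rule, the 0.5 match threshold, and the in-memory set standing in for the online fingerprint database.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", text.lower())


def fingerprint(body: str, attachments=(), window=16, keep=4):
    """Return a set of hex digests sampled from the body and attachments.

    Hypothetical scheme, not the thesis's algorithm: hash every
    `window`-character slice of the normalized body, keep the `keep`
    smallest digests as the sample, and add one digest per attachment.
    """
    text = normalize(body)
    digests = set()
    if text:
        # A body shorter than one window is hashed as a single slice.
        last_start = max(len(text) - window, 0)
        digests = {
            hashlib.sha256(text[i:i + window].encode("utf-8")).hexdigest()
            for i in range(last_start + 1)
        }
    sample = set(sorted(digests)[:keep])
    for blob in attachments:
        sample.add(hashlib.sha256(blob).hexdigest())
    return sample


def is_spam(fp, known, threshold=0.5):
    """Flag a message when enough of its fingerprint is already known."""
    if not fp:
        return False
    return len(fp & known) / len(fp) >= threshold


# Stand-in for the online fingerprint database: digests of one known spam.
known = fingerprint("Buy CHEAP pills now!!! Visit example.com")
variant = fingerprint("buy cheap pills now, visit example com")
print(is_spam(variant, known))
```

Because normalization removes case and punctuation before hashing, the lightly edited variant produces the same sampled digests as the known spam and is flagged. A production system would need a sampling rule that also tolerates inserted or reordered words, which this sketch does not attempt.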
[Degree-granting institution]: Xiamen University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.098
Article ID: 1866326
Article link: http://sikaile.net/guanlilunwen/ydhl/1866326.html