The Mahalanobis-Taguchi System Based on an Improved Measurement Scale and Threshold Determination Method, and Its Application in Email Filtering
Published: 2018-11-04 09:53
[Abstract]: With the growth of the Internet and the spread of mobile devices, email has become an important means of communication. At the same time, the large volume of spam poses many challenges for users and service providers, and email filtering has become an active research topic in recent years. The Mahalanobis-Taguchi System (MTS) is a pattern-recognition and classification-prediction method for multidimensional variables. It makes no assumptions about the type of data distribution and can perform classification after reducing the set of feature variables. This thesis proposes targeted improvements to address the shortcomings of the traditional MTS in its measurement scale and threshold calculation, and applies the improved MTS to email filtering. The work has three main parts. (1) A new measurement scale for MTS based on the grey relational degree. The value of the measurement scale reflects how close samples are to one another and is used to decide which class a sample belongs to. MTS uses the Mahalanobis distance to measure how close a sample is to the reference space. This statistic accounts for the correlation among variables but ignores the similarity in shape between the sequence curves of the sample and the population. The grey relational model is a newer method for computing the shape similarity of sequence curves and is broadly applicable. To measure the similarity between samples more comprehensively, the thesis combines the grey relational degree with the Mahalanobis distance through linear weighting, constructing a new sample measurement scale that improves the accuracy of MTS. (2) A threshold determination method for MTS based on the receiver operating characteristic (ROC) curve. Threshold calculation in MTS has long attracted attention, but the many existing methods all have limitations of varying degree and are difficult to generalize. The ROC curve is a method designed for analyzing diagnostic performance and computing a system threshold, and it is used mainly in medical diagnosis. The thesis applies the ROC curve to MTS so that the MTS threshold becomes more objective and precise. (3) Email filtering based on the improved MTS. The improved MTS is applied to email filtering. The final comparison shows that, relative to the traditional MTS, the improved MTS performs significantly better in accuracy, false-positive rate, and detection rate, which indicates that the improvements are effective and feasible. Compared with other commonly used email filtering methods, the improved MTS achieves relatively high accuracy, and its screening of feature variables can substantially reduce cost and improve filtering efficiency.
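The two improvements described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so every function name here is hypothetical and several choices are assumptions. These include using the standardized mean of the normal group as the grey relational reference sequence, Deng's grey relational coefficient with a distinguishing coefficient of 0.5, turning the grey relational degree into a dissimilarity as `1 - degree` before weighting, and picking the threshold that maximizes Youden's J statistic (true-positive rate minus false-positive rate) on the ROC curve.

```python
import numpy as np

def fit_reference_space(normal):
    """Estimate the reference space from normal samples of shape (n, k)."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0)  # population standard deviation (ddof=0)
    corr_inv = np.linalg.inv(np.corrcoef(normal, rowvar=False))
    return mean, std, corr_inv

def scaled_mahalanobis(samples, mean, std, corr_inv):
    """Scaled squared Mahalanobis distance z^T R^-1 z / k for each sample."""
    z = (samples - mean) / std
    k = samples.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k

def grey_relational_degree(samples, reference, rho=0.5):
    """Deng's grey relational degree of each row of `samples` to `reference`.

    The minimum and maximum differences are taken over the whole batch,
    so the result depends on which samples are scored together.
    """
    delta = np.abs(samples - reference)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        return np.ones(len(samples))
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)
    return coeff.mean(axis=1)

def combined_score(samples, mean, std, corr_inv, weight=0.5, rho=0.5):
    """Assumed linear weighting: larger scores mean 'further from normal'."""
    md = scaled_mahalanobis(samples, mean, std, corr_inv)
    z = (samples - mean) / std
    # Assumption: the reference sequence is the normal-group mean (0 after scaling).
    grd = grey_relational_degree(z, np.zeros(z.shape[1]), rho)
    return weight * md + (1.0 - weight) * (1.0 - grd)

def youden_threshold(scores, labels):
    """Pick the ROC operating point that maximizes TPR - FPR.

    `labels` marks abnormal samples (for example spam) with 1 or True.
    A sample is predicted abnormal when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & labels).sum() / labels.sum()
        fpr = (pred & ~labels).sum() / (~labels).sum()
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return float(best_t), float(best_j)
```

In use, one would fit the reference space on legitimate mail only, score a labeled validation set with `combined_score`, and pass those scores to `youden_threshold`. The weight between the two terms is a free parameter in this sketch, and because the two terms are on different scales it would need tuning on held-out data.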
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TP393.098;N941.5
Article ID: 2309476
Article link: http://sikaile.net/guanlilunwen/ydhl/2309476.html