Research and Implementation of an Email Classification Method Based on the WordNet Concept Vector Space Model
[Abstract]: With the development of computer and information technology, and especially the spread of the Internet, email has become a popular means of communication because it is fast and inexpensive. As a result, email often reflects current social issues and the focus of public opinion. However, as email is used more and more, users are flooded with spam, advertising, and mass mailings; they spend more time processing mail, and it becomes harder to organize and retrieve information. If email can be classified, people can find the content they care about accurately, comprehensively, and quickly, which improves work efficiency and reduces the cost in labor, money, and materials. Email classification has therefore attracted the interest of many researchers. Existing email classification techniques fall into three groups: statistical, connection-based, and rule-based. Common statistical methods include Naive Bayes, KNN, class-center vectors, regression models, support vector machines, and maximum entropy models. The common connection-based method is the artificial neural network. Common rule-based methods include decision trees and association rules. These methods share a problem: they ignore the semantic relationships between words in the email text. In real email, words are often related, for example as synonyms or through hypernym/hyponym (broader/narrower term) relationships. Ignoring these relationships tends to produce a high-dimensional vector space, and the high dimensionality in turn reduces classification performance and accuracy.
To address these problems, this paper proposes a feature extraction method based on the WordNet ontology that uses synonym sets (synsets) in place of individual terms and takes hypernym/hyponym relationships into account. It builds a concept vector space model of the email text and uses it as the text's feature vector, so that higher-level information usable as category features can be extracted during training. The paper also designs a method for setting a threshold (a percentage threshold); adjusting the threshold allows different recall and precision requirements to be met. Finally, the proposed method is implemented, and experiments demonstrate the validity of email classification based on the WordNet concept vector space model. The method improves both classification performance and efficiency, which makes it possible to obtain useful information quickly and accurately and thus improves people's working efficiency.
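The idea of replacing terms with synonym sets and crediting their hypernyms can be illustrated with a minimal sketch. This is not the thesis's implementation: the lexicon `TOY_SYNSETS`, the links in `TOY_HYPERNYMS`, the concept labels, and the `hypernym_weight` of 0.5 are all hypothetical stand-ins for WordNet data and for whatever weighting the author actually used.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: word -> concept (synonym set) label.
TOY_SYNSETS = {
    "car": "CAR",
    "automobile": "CAR",
    "auto": "CAR",
    "meeting": "MEETING",
    "conference": "MEETING",
}

# Hypothetical hypernym links: concept -> broader concept.
TOY_HYPERNYMS = {
    "CAR": "MOTOR_VEHICLE",
}


def concept_vector(tokens, hypernym_weight=0.5):
    """Map tokens to concept counts; each hit also credits the concept's hypernym.

    hypernym_weight is an assumed value, not one taken from the thesis.
    """
    vector = Counter()
    for token in tokens:
        concept = TOY_SYNSETS.get(token.lower())
        if concept is None:
            continue  # words outside the toy lexicon are ignored in this sketch
        vector[concept] += 1.0
        parent = TOY_HYPERNYMS.get(concept)
        if parent is not None:
            vector[parent] += hypernym_weight
    return dict(vector)


email = "Bring the car or automobile or auto to the meeting not the conference".split()

# Word-level features: every distinct known word is its own dimension.
word_dims = {t.lower() for t in email if t.lower() in TOY_SYNSETS}

# Concept-level features: synonyms share one dimension.
vec = concept_vector(email)

print(len(word_dims), "word dimensions ->", len(vec), "concept dimensions")
print(vec)
```

In this toy email, five distinct word features collapse into three concept features, because "car", "automobile", and "auto" share one dimension and "meeting" and "conference" share another. With a real WordNet interface, the two dictionaries would be replaced by synset and hypernym lookups, and word-sense disambiguation would need to be handled, which this sketch omits.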
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year degree granted]: 2008
[Classification number]: TP393.098