Research on Educational Resource Classification Techniques Based on Rules and SVM
[Abstract]: With the rapid development of network technology and the rapid growth of online information, a large number of educational resources have appeared on the web. Online educational resources are becoming an increasingly important source of information for students, educational researchers, and parents. However, existing search engines tend to return a large amount of irrelevant or useless content. How to obtain useful information quickly and effectively from this mass of resources, and how to classify educational resources, is therefore the focus of this paper. Automatic text classification is one of the key technologies for classifying online educational resources automatically. The main contents of this paper are as follows:
1. This paper analyzes the current state of online educational resources and the behavior and needs of their users, and constructs a classification system for basic education resources.
2. Many feature selection algorithms exist, so deciding which one to use in a specific situation requires a dependable criterion. This paper reviews basic feature selection algorithms from the relevant literature, compares the methods and algorithms, and then proposes such a criterion.
3. Educational resources stand in subordinate and parallel relationships to one another. Based on these relationships, this paper organizes them into a hierarchical structure and discusses how the main structural features of HTML web pages (i.e. title, anchor text, meta) influence web page classification. A rule-based classification method is proposed. The experimental results show that the title and anchor text have a positive effect on web page classification.
4. To construct a classifier for educational resources, this paper first introduces the basic theory of SVM. Building on the traditional SVM algorithm, it addresses the sensitivity of classification results to outliers in nonlinearly separable text problems and proposes an improved multi-class SVM algorithm (Weighted Multi-Class SVM). The experimental results show that this algorithm is more effective than the standard multi-class SVM algorithm. Because the rule-based classification algorithm has high precision but low recall, while the improved SVM algorithm has lower precision but high recall, this paper proposes a method that combines the two. The experimental results show that the combination improves the classification quality and efficiency of the system.
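The combination described in item 4 can be pictured as a rule-first cascade: high-precision rules on the page title and anchor text decide when they fire unambiguously, and a statistical classifier handles everything else. The sketch below is only an illustration under assumptions, not the thesis's implementation. The keyword table `RULES`, the category names, and the functions `rule_classify` and `hybrid_classify` are hypothetical, and the `fallback` argument is a placeholder callable standing in for a trained SVM.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical keyword rules: category -> keywords expected in the
# page title or anchor text. These are illustrative, not from the thesis.
RULES: Dict[str, List[str]] = {
    "mathematics": ["algebra", "geometry"],
    "physics": ["mechanics", "optics"],
}


def rule_classify(title: str, anchor_texts: List[str]) -> Optional[str]:
    """Return a category only when exactly one category's keywords match.

    Abstaining on zero or multiple matches keeps precision high at the
    cost of recall, which is the trade-off the abstract attributes to rules.
    """
    text = " ".join([title, *anchor_texts]).lower()
    hits = [cat for cat, kws in RULES.items() if any(kw in text for kw in kws)]
    return hits[0] if len(hits) == 1 else None


def hybrid_classify(
    title: str,
    anchor_texts: List[str],
    body: str,
    fallback: Callable[[str], str],
) -> str:
    """Apply the rules first; defer to the fallback classifier if they abstain."""
    label = rule_classify(title, anchor_texts)
    if label is not None:
        return label
    # Placeholder for the statistical classifier (an SVM in the thesis).
    return fallback(body)
```

In this arrangement the fallback is consulted only for pages the rules cannot decide, so a page with a clear title never reaches the more expensive classifier.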
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.1
Article ID: 2262357
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2262357.html