Research on Educational Resource Classification Technology Based on Rules and SVM

Published: 2018-10-10 15:48

[Abstract]: With the rapid development of network technology and the rapid growth of online information resources of all kinds, a large volume of educational resources has appeared on the web. Online educational resources are increasingly an important source of information for students, education researchers, and parents. However, existing search engines often return a large amount of irrelevant or useless content. How to obtain useful resources quickly and effectively, and how to classify educational resources within a large body of information, is therefore the focus of this thesis. Automatic text classification is one of the key technologies for classifying online educational resource texts automatically. The main research content is as follows:

1. The current state of online educational resources and the behavior and needs of web users are analyzed, and a classification scheme for basic-education resources is constructed.

2. Because many feature selection algorithms exist, a dependable criterion is needed for deciding which algorithm to use in a given situation. The thesis surveys basic feature selection algorithms from the literature, compares the methods and algorithms empirically, and then proposes such a criterion.

3. Educational resources are related by subordination and by coordination. The thesis organizes them into a hierarchy based on these relations, examines how the main structural features of HTML pages (title, anchor text, meta) affect web page classification, and proposes a rule-based classification method. Experimental results show that features such as the title and anchor text have a positive effect on web page classification.

4. To build a classifier for educational resources, the thesis first introduces the basic theory of SVM. Building on the traditional SVM algorithm, and addressing the sensitivity of classification results to outliers in non-linearly separable text problems, it proposes an improved multi-class SVM algorithm (Weighted Multi-Class SVM). Experimental results show that this algorithm classifies better than the multi-class SVM algorithm.

5. The rule-based classification algorithm has high precision but low recall, whereas the improved SVM algorithm has low precision but high recall. To address this, the thesis proposes combining the two methods. Experimental results show that both the classification performance and the efficiency of the system improve.
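The combination of structural-feature rules with a statistical fallback can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the keyword rules, field weights, threshold, and the `fallback` callable are all hypothetical, and the fallback merely stands in for a trained classifier such as the weighted multi-class SVM. The idea shown is that a page is labeled by rules over its title, anchor text, and meta tags when the evidence is strong (favoring precision), and handed to the statistical classifier otherwise (recovering recall).

```python
from html.parser import HTMLParser

# Hypothetical keyword rules and field weights, chosen only for illustration.
RULES = {
    "mathematics": ("algebra", "geometry"),
    "physics": ("mechanics", "optics"),
}
FIELD_WEIGHTS = {"title": 3, "anchor": 2, "meta": 1}


class StructureExtractor(HTMLParser):
    """Collects title text, anchor text, and meta keywords/description."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "anchor": [], "meta": []}
        self._current = None  # field that character data currently belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "a":
            self._current = "anchor"
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() in ("keywords", "description"):
                self.fields["meta"].append(attr.get("content") or "")

    def handle_endtag(self, tag):
        if tag in ("title", "a"):
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current].append(data)


def rule_classify(html, threshold=3):
    """Return a label if the weighted keyword score reaches threshold, else None."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    scores = {}
    for label, keywords in RULES.items():
        total = 0
        for field, weight in FIELD_WEIGHTS.items():
            text = " ".join(parser.fields[field]).lower()
            total += weight * sum(text.count(k) for k in keywords)
        scores[label] = total
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def classify(html, fallback):
    """Rules first; pages the rules cannot decide go to the fallback classifier."""
    label = rule_classify(html)
    return label if label is not None else fallback(html)
```

A call such as `classify(page_html, trained_model_predict)` would use the rule label when one fires and the model's prediction otherwise; `trained_model_predict` is an assumed name for whatever statistical classifier is available.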
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1


Article ID: 2262357

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2262357.html
