Research and Application of SVM-Based Automatic Classification of Elementary Mathematics Problems

Published: 2018-10-29 21:03

[Abstract]: With the rapid development of computer and information technology, information technology is now applied in every aspect of daily life. In education, attention has gradually shifted from offline tutoring and from manual grading and problem solving toward Internet-based intelligent education built on artificial intelligence. An important prerequisite for this kind of mathematics education is converting natural-language text: put simply, turning mathematical statements that humans understand into predefined knowledge stored in the computer, which the computer can then process further. That further processing mainly consists of problem solving and end-to-end grading. The conversion step is a natural language processing task, and classification is one of the central problems in natural language processing.

This thesis has two parts. The first covers word segmentation, part-of-speech tagging, and named entity recognition for elementary mathematics problem text. The second uses a support vector machine (SVM) to classify problem text by problem type, so that each problem can then be converted, according to its category, into the representation required for machine reasoning.

In English, words are separated by spaces. In Chinese, all characters run together, so Chinese text must first be segmented into words. Mathematical statements also contain many symbols with specific meanings, so general-purpose segmentation methods do not work, and a segmenter built specifically for mathematical statements is needed. Likewise, the entities in mathematical language differ from those in ordinary language. Ordinary named entities are mostly times, places, and personal names, whereas in mathematical statements the entities that carry important information are usually mathematical nouns, such as "triangle" and the various kinds of equations. Named entities therefore have to be defined specifically for elementary mathematics and then extracted. This thesis uses conditional random fields for named entity tagging.

Elementary mathematics problems come in many types. To solve them automatically, a problem must first be classified, and the solving method for its category is then invoked. The problem text tagged by the named entity model is preprocessed, which includes removing stop words and building a bag-of-words model. Text features are selected using the chi-square statistic; reducing dimensionality through feature selection cuts the computational cost considerably while maintaining classification accuracy. Finally, following the method proposed in this thesis, a system based on support vector machines is implemented that extracts named entities from elementary mathematics problems and classifies the problems. The system tags named entities accurately, providing a knowledge representation for later processing such as problem solving, and effective problem classification prunes the reasoning needed for later solving or grading.
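The classification stage described above (bag-of-words, chi-square feature selection, SVM) can be sketched as follows. This is only a minimal illustration under assumptions that do not come from the thesis: the toy English tokens and the labels `algebra` and `geometry` are hypothetical, the input is assumed to be already segmented, and the SVM is a binary linear model without a bias term, trained by sub-gradient descent rather than by whatever solver or kernel the author used. All function names are illustrative.

```python
from collections import Counter

# Hypothetical, pre-segmented toy problems; not data from the thesis.
TRAIN = [
    (["solve", "equation", "x", "root"], "algebra"),
    (["equation", "unknown", "solve", "value"], "algebra"),
    (["quadratic", "equation", "root", "value"], "algebra"),
    (["triangle", "angle", "side", "length"], "geometry"),
    (["circle", "radius", "area", "length"], "geometry"),
    (["triangle", "area", "side", "value"], "geometry"),
]


def chi_square(term, label, docs):
    """Chi-square statistic of a term/class pair from its 2x2 contingency table."""
    a = sum(1 for toks, y in docs if term in toks and y == label)      # class, term present
    b = sum(1 for toks, y in docs if term in toks and y != label)      # other, term present
    c = sum(1 for toks, y in docs if term not in toks and y == label)  # class, term absent
    d = sum(1 for toks, y in docs if term not in toks and y != label)  # other, term absent
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, k):
    """Keep the k terms whose best per-class chi-square score is highest."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    labels = sorted({y for _, y in docs})
    score = {t: max(chi_square(t, y, docs) for y in labels) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def vectorize(tokens, features):
    """Bag-of-words count vector restricted to the selected features."""
    counts = Counter(tokens)
    return [float(counts[f]) for f in features]


def train_linear_svm(xs, ys, lam=0.01, eta=0.1, epochs=50):
    """Bias-free linear SVM: sub-gradient descent on L2-regularized hinge loss.

    ys must contain +1 / -1 labels.
    """
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * lam) for wi in w]  # step on the L2 regularizer
            if margin < 1.0:                           # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(tokens, features, w, positive="algebra", negative="geometry"):
    """Classify by the sign of the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, vectorize(tokens, features)))
    return positive if score >= 0 else negative


features = select_features(TRAIN, 7)
xs = [vectorize(toks, features) for toks, _ in TRAIN]
ys = [1 if label == "algebra" else -1 for _, label in TRAIN]
weights = train_linear_svm(xs, ys)

print(features)
print(predict(["solve", "equation", "value"], features, weights))  # algebra
print(predict(["triangle", "side", "angle"], features, weights))   # geometry
```

On this toy data the uninformative token `value`, which appears in both classes, receives the lowest chi-square score and is dropped, which is the dimensionality reduction the abstract refers to. A setting with many problem types would need a multi-class scheme such as one-vs-rest on top of the binary classifier.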
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1;O12


Article ID: 2298820

Article link: http://sikaile.net/kejilunwen/yysx/2298820.html
