Assessing the Relevance of Implicit Discourse Relations in the Automated Scoring of College English Test Band 4 (CET-4) Writing
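Keywords: CET-4 writing; implicit discourse relations; relevance; latent semantic analysis; singular value decomposition
Source: Hubei University of Technology, 2017 master's thesis
Type: degree thesis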
【Abstract】: A sound automated writing-scoring system should score two aspects of an essay: language quality and content quality. Content scoring is more complex than language scoring, because it requires analyzing the organic connections between chunks (words, phrases, and clauses) within the framework of the whole discourse. The CET-4 writing rubric is holistic, with content as the primary criterion and language as the secondary one; that is, content is the main yardstick of essay quality. This study treats the content of a text as its implicit discourse relations, and this is one of the grounds for the choice of topic. The envisioned scoring system works as follows: the computer calculates the relevance, in terms of implicit discourse relations, between an ungraded essay and essays that have already been graded, and then scores the ungraded essay by reference to the scores of the graded ones. Judging the relevance of implicit discourse relations is therefore the core of the whole system, and it is the central argument of this study.
Two main models are used to study implicit discourse relations: the traditional vector space model (VSM) and the latent semantic analysis (LSA) model. The former treats every term other than stop words as a feature and represents a text by these feature vectors. Its drawback is that it cannot handle polysemy (one word with several meanings) or synonymy (several words with one meaning). The latter also starts from vocabulary, the smallest component of discourse, to analyze implicit discourse relations. However, it additionally takes the philosophy of language as a perspective and examines similarity and generalization in language acquisition and, more broadly, in knowledge acquisition. This is Plato's problem: how do humans acquire so much knowledge from such limited cues? This study takes the latter model as its theoretical basis.
LSA theory holds that the words in a text do not exist in isolation but are closely linked through a latent semantic network. Not every word is directly related to that network, however, so the feature terms that are directly related to it must be extracted. Feature-term extraction takes two steps. The first is coarse extraction, i.e. text preprocessing: case folding, stop-word removal, and stemming (reducing words to their roots). The second applies the singular value decomposition (SVD) function of the mathematical software MATLAB to extract feature terms again, as follows. First, build a term × document matrix from the coarsely extracted feature terms. Next, perform SVD, which represents the original matrix as the product of three matrices. Then inspect the values in each column of the three factor matrices and, depending on the case, keep only the first k columns. Finally, multiply the three matrices, each truncated to k columns, back together (the inverse of the decomposition) to reconstruct a new matrix. The new matrix filters out much of the noise while retaining the important information in the original matrix, which achieves feature extraction in the true sense. This is how the computer simulates the human ability to recognize similarity and to generalize, and it is the theoretical core of this thesis.
The thesis first uses a classic simplified example to demonstrate the important role of LSA in assessing the relevance of implicit discourse relations. It then analyzes CET-4 essays written by non-English-major undergraduates at Hubei University of Technology. It concludes that the correlation coefficients of implicit discourse relations are indeed related, to some extent, to human scores.
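The two-stage feature extraction and the scoring idea described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation.

The thesis used MATLAB, and the abstract does not name the function. MATLAB's standard routine is `svd`, which `numpy.linalg.svd` replaces here. All of the following are hypothetical choices made only for this example:

- the stop-word list;
- the crude suffix-stripping stemmer, a stand-in for a real stemmer such as Porter's;
- the rank k;
- the rule that turns similarities into a score, a similarity-weighted mean of the graded essays' scores.

```python
import re

import numpy as np

# Hypothetical, deliberately tiny stop-word list; a real system would use a full one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "for", "on", "my", "i"}


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter's)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Stage 1, coarse extraction: case folding, stop-word removal, stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs: list[str]) -> tuple[list[str], np.ndarray]:
    """Build a term x document count matrix (rows: terms, columns: documents)."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    row = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(token_lists):
        for t in tokens:
            X[row[t], j] += 1
    return vocab, X


def lsa_reconstruct(X: np.ndarray, k: int) -> np.ndarray:
    """Stage 2: SVD X = U diag(s) Vt, keep the first k components, multiply back."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def estimate_score(new_essay: str, graded: list[str], scores: list[float], k: int = 2):
    """Hypothetical rule: similarity-weighted mean of the graded essays' scores."""
    _, X = term_document_matrix(graded + [new_essay])
    Xk = lsa_reconstruct(X, k)
    # The new essay is the last column; compare it with every graded essay.
    sims = np.array([cosine(Xk[:, -1], Xk[:, j]) for j in range(len(graded))])
    weights = np.clip(sims, 0.0, None)  # ignore essays that point the other way
    if weights.sum() == 0:
        return float(np.mean(scores)), sims
    return float(weights @ np.asarray(scores, dtype=float) / weights.sum()), sims


if __name__ == "__main__":
    graded = [
        "Online learning gives students flexible schedules and rich resources.",
        "Students benefit from online courses because resources are flexible.",
        "My hometown has a beautiful river and many old trees.",
    ]
    scores = [12, 11, 5]  # made-up scores for illustration only
    new = "Flexible online resources help students learn."
    score, sims = estimate_score(new, graded, scores, k=2)
    print(f"similarities: {np.round(sims, 3)}, estimated score: {score:.1f}")
```

Consider the made-up essays in the usage block with k = 2. In the reduced space, the two essays on online learning point in the same direction as the new essay, and the unrelated essay gets a similarity of essentially zero. The estimate therefore lands at 11.5, the mean of the first two scores. On real data k has to be tuned; the abstract describes choosing it by inspecting the decomposed matrices. To compare estimates with human grading, as the thesis does, one could compute a Pearson correlation with `numpy.corrcoef`.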
【Degree-granting institution】: Hubei University of Technology
【Degree level】: Master's
【Year of degree】: 2017
【Classification number】: H319.3