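Off-Topic Essay Detection Based on Document Divergence
Published: 2018-03-01 07:50
Keywords: off-topic detection; document divergence; text similarity. Source: 《中文信息学报》 (Journal of Chinese Information Processing), 2017, Issue 01. Paper type: journal article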
[Abstract]: Off-topic detection is an important module of automated essay scoring systems. Traditional approaches compute a content-relevance score for an essay and compare it with a fixed threshold to decide whether the essay is off-topic. In practice, however, the score depends directly on the prompt: essays written for divergent prompts and for non-divergent prompts score markedly differently, so a single fixed threshold is hard to apply to all essays. This paper proposes an off-topic detection method based on document divergence. Its novelty lies in studying the notion of the divergence of an essay collection, modeling the relationship between divergence and the off-topic threshold, and dynamically selecting a different threshold for each prompt. The authors built an off-topic detection system and tested it on a real data set. The experimental results show that the system based on document divergence can effectively identify off-topic essays.
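The idea of a prompt-dependent threshold can be illustrated with a minimal sketch. The abstract does not give the paper's definition of divergence, its relevance score, or its threshold model, so everything below is an assumption made for illustration: divergence is taken as one minus the mean pairwise cosine similarity of the essays, relevance as an essay's mean similarity to the other essays, and the threshold as a hypothetical linear function of divergence with made-up parameters `base` and `slope`. Whitespace tokenization is also an assumption; Chinese essays would need word segmentation first.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Term-frequency vector over lowercase whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def divergence(essays):
    """Assumed definition: 1 minus the mean pairwise cosine similarity."""
    vecs = [tf_vector(e) for e in essays]
    pairs = list(combinations(vecs, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)


def dynamic_threshold(div, base=0.5, slope=0.4):
    """Hypothetical linear model: more divergent prompts get a lower threshold."""
    return max(0.0, base - slope * div)


def detect_off_topic(essays, base=0.5, slope=0.4):
    """Flag essays whose mean similarity to the others falls below the threshold."""
    vecs = [tf_vector(e) for e in essays]
    n = len(vecs)
    if n < 2:
        return [False] * n  # nothing to compare against
    threshold = dynamic_threshold(divergence(essays), base, slope)
    scores = [
        sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return [score < threshold for score in scores]


essays = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "the cat sat on the rug",
    "stock prices fell sharply today",
]
flags = detect_off_topic(essays)  # one boolean per essay
```

In this sketch a prompt whose essays vary widely yields a high divergence and therefore a lower threshold, so fewer essays are flagged merely for being different from one another. A real system would fit the divergence-to-threshold relationship on labeled data rather than fixing `base` and `slope` by hand.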
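[Author affiliations]: School of Computer Science and Technology, Soochow University; Collaborative Innovation Center of Novel Software Technology and Industrialization
[Funding]: National Natural Science Foundation of China (61572338)
[Classification]: H15; TP391.1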
Article ID: 1551059
Article link: http://sikaile.net/wenyilunwen/yuyanyishu/1551059.html