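Research on a Multi-Strategy Term Extraction Method for Academic Papers

Published: 2018-07-08 12:23

  Topics: multi-strategy + term extraction; Source: Huazhong University of Science and Technology, 2016 master's thesis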


[Abstract]: Extracting terms quickly and accurately is an important problem in natural language processing, and term extraction from academic papers can help advance science and disseminate research results. In an academic paper, terms show different distribution characteristics depending on the text block in which they appear, such as the title, keywords, or abstract. Traditional term extraction methods ignore this positional information, so a method that takes term position into account is needed to make up for the shortcomings of existing approaches.

This thesis proposes TEM, a multi-strategy term extraction method for academic papers. First, based on the different characteristics of titles, abstracts, and keywords, it applies three candidate term extraction strategies respectively: one based on a boundary marker set, one based on Chinese term formation rules, and one based on keywords. Next, it analyzes the candidate extraction results and their error types, and introduces a dictionary of term counterexample rules to improve the results. It then filters the candidate terms with the K-near-frequency substring merging algorithm. Finally, it uses the positional information of terms to build a comprehensive scoring model, in which the analytic hierarchy process (AHP) determines the weights of the title, abstract, and keyword dimensions, and the correct terms are obtained by ranking candidates on their final scores. In addition, for single-word terms, the category frequency CF is introduced on top of the TF-IDF algorithm to improve filtering.

The experiments test how changes in the value of K affect substring merging, and compare term extraction results before and after introducing CF and positional information. The results show that, compared with traditional methods, the TF-IDF-CF method improves precision and recall by 5.73% and 8.43%, respectively; the TEM-SW method improves precision and recall by 7.85% and 11.54%, respectively; and the TEM-MW method improves precision and recall by 11.62% and 9.71%, respectively. The proposed method therefore extracts terms from academic papers more effectively.
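The comprehensive scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the pairwise comparison matrix, the per-position candidate scores, and the function names `ahp_weights` and `score_terms` are all assumptions made for the example, and the AHP weights are approximated with normalized row geometric means rather than an exact principal-eigenvector computation.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights from a pairwise comparison matrix
    using normalized row geometric means (a common approximation)."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def score_terms(candidates, weights):
    """Rank candidate terms by a weighted sum of their per-position scores.

    candidates maps term -> (title_score, abstract_score, keyword_score).
    Returns (term, score) pairs sorted from highest to lowest score.
    """
    scored = {
        term: sum(w * s for w, s in zip(weights, scores))
        for term, scores in candidates.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical pairwise judgments over (title, abstract, keywords).
# These values are illustrative only; the abstract does not give the
# judgments actually used in the thesis.
PAIRWISE = [
    [1.0, 3.0, 1 / 2],
    [1 / 3, 1.0, 1 / 4],
    [2.0, 4.0, 1.0],
]

# Hypothetical per-position scores in [0, 1] for three candidate terms.
candidates = {
    "term extraction": (1.0, 0.8, 1.0),
    "substring merging": (0.0, 0.6, 1.0),
    "this paper": (0.0, 0.9, 0.0),
}

weights = ahp_weights(PAIRWISE)
ranking = score_terms(candidates, weights)

for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

With these illustrative judgments, the keyword dimension receives the largest weight and the abstract dimension the smallest, so a candidate that appears only in the abstract ranks below candidates that also appear in the keywords or title.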
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1

