Research on a Multi-Strategy Term Extraction Method for Academic Papers

Published: 2018-07-08 12:23

  Topic: multi-strategy + term extraction; Source: Huazhong University of Science and Technology, master's thesis, 2016


[Abstract]: Extracting terms quickly and accurately is an important task in natural language processing, and research on term extraction for academic papers can effectively advance science and the dissemination of research results. In an academic paper, terms have different distribution characteristics depending on where they appear, for example in the title, the keywords, or the abstract. Traditional term extraction methods ignore this positional information, so a method that takes term position into account is needed to make up for their shortcomings. This thesis proposes TEM, a multi-strategy term extraction method for academic papers. First, according to the different characteristics of titles, abstracts, and keywords, TEM extracts candidate terms with a strategy based on a boundary marker set, a strategy based on Chinese term-formation rules, and a keyword-based strategy, respectively. Next, it analyzes the candidate extraction results and their error types and introduces a dictionary of term counterexample rules to improve them. The candidates are then filtered with a K-near-frequency substring merging algorithm. Finally, a comprehensive scoring model is built on the positional information of the terms: the analytic hierarchy process (AHP) determines the weights of the three dimensions (title, abstract, and keywords), and the correct terms are obtained by ranking the candidates by their final scores. In addition, for single-word terms, a category frequency (CF) factor is added to the TF-IDF algorithm, which improves filtering. In the experiments, the effect of the value of K on substring merging was tested, and the term extraction results with and without CF and positional information were compared.
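For titles, TEM uses a strategy based on a boundary marker set. The abstract does not list the markers, so the sketch below assumes a pre-segmented title and a small hypothetical marker set (function words and generic head words such as "research"): every marker closes the current run of tokens, and each maximal run of non-marker tokens becomes a candidate term.

```python
from typing import Iterable, List, Set


def split_on_boundary_markers(tokens: Iterable[str], markers: Set[str], sep: str = " ") -> List[str]:
    """Return maximal runs of non-marker tokens as candidate terms.

    `markers` is a hypothetical boundary marker set; any token in it
    ends the candidate currently being built.
    """
    candidates: List[str] = []
    current: List[str] = []
    for tok in tokens:
        if tok in markers:
            if current:  # close the run collected so far
                candidates.append(sep.join(current))
            current = []
        else:
            current.append(tok)
    if current:  # flush the final run
        candidates.append(sep.join(current))
    return candidates


title = "research on multi-strategy term extraction for academic papers".split()
markers = {"research", "on", "for"}  # hypothetical marker set
print(split_on_boundary_markers(title, markers))
# → ['multi-strategy term extraction', 'academic papers']
```

For Chinese titles, the same function can run on word-segmenter output with `sep=""`, so that the words of a run are joined without spaces.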
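For abstracts, candidates come from Chinese term-formation rules. A minimal sketch, assuming the rules are written as part-of-speech tag sequences over segmented, tagged text; the rule set `RULES` and the tags `n`, `v`, `a` (noun, verb, adjective) are illustrative and are not the thesis's actual rules.

```python
from typing import List, Sequence, Set, Tuple

Tagged = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical formation rules: allowed tag sequences for a Chinese term,
# e.g. noun + verb as in 术语/n 抽取/v ("term extraction").
RULES: Set[Tuple[str, ...]] = {("n", "n"), ("n", "v"), ("a", "n"), ("n", "n", "n")}


def match_formation_rules(tagged: Sequence[Tagged], rules: Set[Tuple[str, ...]] = RULES) -> List[str]:
    """Return each contiguous word window whose tag sequence is a rule."""
    lengths = sorted({len(r) for r in rules})
    found: List[str] = []
    for start in range(len(tagged)):
        for n in lengths:
            window = tagged[start:start + n]
            if len(window) < n:  # longer windows cannot fit either
                break
            if tuple(tag for _, tag in window) in rules:
                term = "".join(word for word, _ in window)
                if term not in found:  # de-duplicate, keep first order
                    found.append(term)
    return found


sentence = [("术语", "n"), ("抽取", "v"), ("是", "v"), ("重要", "a"), ("课题", "n")]
print(match_formation_rules(sentence))
# → ['术语抽取', '重要课题']
```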
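Keywords are already chosen by the authors, so the keyword-based strategy can be as simple as splitting the keyword field into candidates. The separators below (ASCII and full-width semicolons and commas) are an assumption about how the field is formatted.

```python
import re
from typing import List

# Assumed keyword separators: ASCII and full-width semicolons and commas.
_KEYWORD_SEPARATORS = re.compile(r"[;；,，]")


def keyword_candidates(field: str) -> List[str]:
    """Split a keyword field into stripped, de-duplicated candidate terms."""
    seen: List[str] = []
    for part in _KEYWORD_SEPARATORS.split(field):
        term = part.strip()
        if term and term not in seen:
            seen.append(term)
    return seen


print(keyword_candidates("多策略；术语抽取; 子串归并,术语抽取"))
# → ['多策略', '术语抽取', '子串归并']
```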
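The term counterexample rule dictionary removes candidates that match known error patterns. The abstract does not give the actual rules, so this sketch assumes a hypothetical dictionary of forbidden prefixes, forbidden suffixes, and exact non-term strings.

```python
from typing import Dict, Iterable, List

# Hypothetical counterexample dictionary: candidates that start with,
# end with, or exactly equal one of these strings are rejected.
COUNTEREXAMPLES: Dict[str, List[str]] = {
    "prefix": ["本文", "一种"],        # e.g. 本文方法 ("the method of this paper")
    "suffix": ["的", "问题"],
    "exact": ["实验结果", "研究方法"],
}


def apply_counterexample_rules(candidates: Iterable[str],
                               rules: Dict[str, List[str]] = COUNTEREXAMPLES) -> List[str]:
    """Keep only the candidates that trigger none of the counterexample rules."""
    kept: List[str] = []
    for c in candidates:
        if c in rules.get("exact", []):
            continue
        if any(c.startswith(p) for p in rules.get("prefix", [])):
            continue
        if any(c.endswith(s) for s in rules.get("suffix", [])):
            continue
        kept.append(c)
    return kept


print(apply_counterexample_rules(["本文方法", "术语抽取", "实验结果", "层次分析法", "抽取的"]))
# → ['术语抽取', '层次分析法']
```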
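K-near-frequency substring merging filters out fragments of longer terms. The abstract does not define the algorithm, so this sketch assumes one plausible reading: a candidate that is a substring of a longer candidate, with a frequency within `k` of that longer candidate, is taken to occur mostly inside it and is dropped. Calling it with different `k` values mirrors the experiment on how K affects merging.

```python
from typing import Dict


def merge_near_frequency_substrings(freq: Dict[str, int], k: int) -> Dict[str, int]:
    """Drop candidates absorbed by a longer candidate (assumed K-near-frequency rule).

    A candidate `short` is absorbed when some other candidate contains it
    and the two frequencies differ by at most `k`.
    """
    kept: Dict[str, int] = {}
    for short, f_short in freq.items():
        absorbed = any(
            short != long and short in long and abs(f_short - f_long) <= k
            for long, f_long in freq.items()
        )
        if not absorbed:
            kept[short] = f_short
    return kept


freq = {"术语抽取": 12, "术语": 14, "抽取方法": 3, "抽取": 20}
print(merge_near_frequency_substrings(freq, k=2))
print(merge_near_frequency_substrings(freq, k=1))
```

With `k=2`, 术语 (14) lies within 2 of 术语抽取 (12) and is merged; with `k=1` it survives. 抽取 is kept in both cases because its frequency is far from every longer candidate that contains it.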
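AHP derives the weights of the three position dimensions from pairwise importance judgments. A minimal sketch using the row geometric-mean approximation of the principal eigenvector, together with Saaty's consistency ratio (CR); the comparison matrix is a hypothetical judgment (keywords twice as important as titles, titles twice as important as abstracts), not the thesis's values.

```python
import math
from typing import List, Sequence, Tuple

# Saaty's random consistency index for matrices of order 1..5.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(matrix: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (weights, consistency_ratio) for a pairwise comparison matrix (order <= 5)."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]  # row geometric means
    total = sum(geo)
    weights = [g / total for g in geo]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    if n <= 2:  # 1x1 and 2x2 reciprocal matrices are always consistent
        return weights, 0.0
    ci = (lambda_max - n) / (n - 1)
    return weights, ci / RANDOM_INDEX[n]


# Hypothetical judgments; order: title, abstract, keywords.
matrix = [
    [1.0, 2.0, 0.5],
    [0.5, 1.0, 0.25],
    [2.0, 4.0, 1.0],
]
weights, cr = ahp_weights(matrix)
print([round(w, 3) for w in weights], cr)
```

A CR below 0.1 is the usual threshold for accepting the judgments. This matrix is perfectly consistent, so the CR is zero up to rounding and the weights are 2/7, 1/7, and 4/7.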
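The comprehensive scoring model combines position information with the AHP weights. A minimal sketch, assuming each candidate carries one feature per text block (here a hypothetical indicator of whether it appears in that block) and that the score is the weighted sum over title, abstract, and keywords; the weights are hypothetical values of the kind AHP produces.

```python
from typing import Dict, List, Tuple

POSITIONS = ("title", "abstract", "keywords")


def rank_by_position(candidates: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank candidates by the weighted sum of their per-position features.

    `candidates` maps term -> {position: feature}; missing positions count as 0.
    """
    scored = [
        (term, sum(weights[p] * feats.get(p, 0.0) for p in POSITIONS))
        for term, feats in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


weights = {"title": 0.286, "abstract": 0.143, "keywords": 0.571}  # hypothetical
candidates = {
    "术语抽取": {"title": 1.0, "abstract": 1.0, "keywords": 1.0},
    "子串归并": {"abstract": 1.0, "keywords": 1.0},
    "实验阶段": {"abstract": 1.0},
}
for term, score in rank_by_position(candidates, weights):
    print(term, round(score, 3))
```

A score cut-off or a top-N rule would then decide which ranked candidates are output as terms; the abstract does not state which one TEM uses.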
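For single-word terms, TEM adds a category frequency (CF) factor to TF-IDF. The abstract does not give the formula, so this sketch assumes score = tf * log(N / df) * cf, where cf is the fraction of documents in the paper's subject category that contain the word. The corpus and the category split are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tf_idf_cf(term: str, doc: List[str], corpus: List[List[str]],
              category_docs: List[List[str]]) -> float:
    """Assumed TF-IDF-CF score of `term` in the tokenized document `doc`."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    # Assumed CF: share of in-category documents that contain the term.
    cf = sum(1 for d in category_docs if term in d) / len(category_docs)
    return tf * idf * cf


corpus = [
    ["算法", "术语", "模型"],  # in-category document 1
    ["算法", "数据", "模型"],  # in-category document 2
    ["细胞", "术语", "蛋白"],  # out-of-category document
    ["细胞", "基因", "蛋白"],  # out-of-category document
]
category_docs = corpus[:2]
for word in ("算法", "术语"):
    print(word, round(tf_idf_cf(word, corpus[0], corpus, category_docs), 4))
```

Here 算法 and 术语 have identical TF-IDF in the first document, but 算法 occurs in both in-category documents and 术语 in only one, so under this assumed definition the CF factor gives 算法 twice the score of 术语.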
The results show that, compared with traditional methods, TF-IDF-CF improves precision and recall by 5.73% and 8.43%, respectively; TEM-SW improves them by 7.85% and 11.54%, and TEM-MW by 11.62% and 9.71%. TEM therefore extracts terms from academic papers more effectively.
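Precision and recall for an extracted term list can be computed as below. This is a minimal sketch assuming exact string matching against a hypothetical gold-standard term list; the abstract does not describe how matches were counted.

```python
from typing import Iterable, Tuple


def precision_recall(extracted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of extracted terms against a gold list (exact match)."""
    extracted_set, gold_set = set(extracted), set(gold)
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


p, r = precision_recall(
    ["术语抽取", "子串归并", "实验阶段"],                  # hypothetical system output
    ["术语抽取", "子串归并", "层次分析法", "边界标记集"],  # hypothetical gold terms
)
print(p, r)
# → 0.6666666666666666 0.5
```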
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1
