Research on Entity Linking of Technical Terms in Science and Technology Reports Based on Multiple Knowledge Bases
[Abstract]: Science and technology reports are an important document resource, and mining and analyzing them is of great value. At present, however, research on science and technology reports still concentrates on basic concepts, attribute definitions, and the construction of the report system. These reports contain a large number of technical term entities, which are their main research subjects and reflect the current state and future trends of science and technology in China. Mining the content of the reports and identifying these term entities is therefore of great significance. Entity recognition, a key technology in natural language processing, can automatically recognize person names, place names, organization names, and other entities in text. This thesis takes science and technology reports as its research object. First, new-term discovery is used to find potential new terms in the reports; then a specialized terminology knowledge base is built as corpus support for term entity recognition and linking. Finally, the Stanford NER entity recognition framework is used to recognize term entities in the reports automatically, and the recognized entities are linked and disambiguated against multiple knowledge bases. The main work is as follows: (1) To address the problems of Chinese word segmentation and the characteristics of terms in science and technology reports, a new-word discovery method based on part-of-speech combinations is proposed.
Part-of-speech combination rules for professional terms are drawn up, and words matching the rules are extracted; new words are then determined according to the support of each string and internal and external word features such as length and mutual information, so that new professional terms are found effectively. To some extent this improves the accuracy of Chinese word segmentation and lays a foundation for term entity recognition. (2) A specialized terminology knowledge base is constructed. Entity recognition needs a large corpus as support: entity features are extracted from the training corpus to achieve automatic recognition. Because public terminology data for science and technology reports is lacking, this thesis uses the technical terminology provided by the China Standard Terminology Network as the data source and uses web crawler, database, and other information technologies to design and build the term knowledge base. (3) The main entity recognition methods are introduced in detail, and the mature open-source Stanford NER framework, which is based on the conditional random field model, is selected to train the term entity model. This realizes automatic recognition of term entities in science and technology reports; multiple knowledge bases are combined with semantic similarity calculation to link and disambiguate the term entities. (4) Science and technology reports published by the National Science and Technology Report Service System are selected as the experimental data, and a prototype system for multi-knowledge-base entity linking of report terms is designed and developed.
The system integrates preprocessing of report data, new-word discovery, entity recognition, and entity linking; it automatically recognizes and disambiguates term entities in science and technology reports, and it verifies the correctness and validity of the method.
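The new-word discovery step described in (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the part-of-speech patterns, the tag names ("n", "a", "v"), the thresholds, and the function name discover_new_terms are all assumptions chosen for illustration, and only two-word candidates are considered. Adjacent words whose tags match a pattern are filtered by support (frequency), by length, and by pointwise mutual information, with token counts used as an approximation for bigram counts when estimating probabilities.

```python
import math
from collections import Counter

# Hypothetical part-of-speech patterns for candidate terms (assumed, not from the thesis).
TERM_POS_PATTERNS = {("n", "n"), ("a", "n"), ("v", "n")}


def discover_new_terms(tagged_sentences, min_support=2, min_pmi=1.0, max_len=8):
    """Return {candidate: pmi} for adjacent word pairs that look like new terms.

    tagged_sentences: list of sentences, each a list of (word, pos) tuples
    produced by any segmenter and tagger. All thresholds are illustrative.
    """
    unigram = Counter()
    bigram = Counter()
    total = 0
    for sent in tagged_sentences:
        for word, _pos in sent:
            unigram[word] += 1
            total += 1
        # Count only adjacent pairs whose tags match a term pattern.
        for (w1, p1), (w2, p2) in zip(sent, sent[1:]):
            if (p1, p2) in TERM_POS_PATTERNS:
                bigram[(w1, w2)] += 1

    results = {}
    for (w1, w2), count in bigram.items():
        candidate = w1 + w2
        # Support and length filters.
        if count < min_support or len(candidate) > max_len:
            continue
        # PMI = log2( p(w1, w2) / (p(w1) * p(w2)) ), all estimated over `total` tokens.
        pmi = math.log2(count * total / (unigram[w1] * unigram[w2]))
        if pmi >= min_pmi:
            results[candidate] = pmi
    return results


if __name__ == "__main__":
    corpus = [
        [("实体", "n"), ("链接", "n"), ("研究", "v")],
        [("实体", "n"), ("链接", "n"), ("方法", "n")],
        [("研究", "v"), ("方法", "n")],
    ]
    print(discover_new_terms(corpus))
```

In this toy corpus only the pair that occurs twice passes the support filter; a real system would also need longer patterns and boundary features, which the abstract mentions but does not specify.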
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: G353.1
[Author]: 陳桂強