Term Recommendation and Visualization Based on Semantic Similarity Computation

Published: 2018-01-07 01:38

Keywords: term recommendation and visualization based on semantic similarity computation　Source: Liaocheng University, 2017 master's thesis　Document type: degree thesis

[Abstract]: The rapid development of the information age is making the world increasingly mobile and intelligent, and the explosive growth of information data is driving terminology science to keep updating and developing. As term data grows in scale and information culture grows richer, the meanings of term concepts are becoming more diverse. This diversity creates great difficulty for the International Organization for Standardization (ISO) when it drafts terminology standard documents. At the same time, international terminology experts urgently need a term recommendation system so that the updating and drafting of term concepts can proceed in step across countries. Methods for computing the semantic similarity of terms play a foundational role in other fields, such as information retrieval, machine translation, and artificial intelligence, and researchers keep refining them to meet new needs. Current methods are mostly classified by how the data is organized, into methods based on structured data and methods based on unstructured data. Structured forms of organization include ontologies, Hownet, and WordNet; unstructured data is usually large in scale and has no fixed structure. For unstructured data, semantic similarity is computed by training a model with machine learning and then calling that model. This thesis studies and applies term semantic similarity methods for ontology-based structured data and for large-scale unstructured data. Its main contents are as follows.

(1) Most term semantic similarity methods based on structured data either fail to take all influencing factors into account or set the factor weights from expert experience, which makes the computation inaccurate. This thesis therefore improves the ontology-based hybrid semantic similarity method, borrowing the idea of fuzzy optimization ranking to determine the weights of the different factors, which improves accuracy. The method is also applied to term recommendation: before a terminology expert recommends a term, its semantic similarity is computed to determine whether a synonym or near-synonym already exists in the terminology standard document; the term is then submitted to the term recommendation system to update the terminology document.

(2) With the arrival of the big data era, computing term semantic similarity over large-scale unstructured corpora has become a research hotspot. Extracting the semantically similar words of a term from massive data and displaying them visually is the other focus of this thesis. For large-scale unstructured data, the thesis uses a word-vector-based term semantic similarity method: Word2vec is used to train a model on the corpus so that the text in the corpus is represented as word vectors, and semantic similarity is computed on those vectors to obtain the semantically similar words of a term. The Prefuse component is then called to visualize the relation network of the semantically similar words, which helps terminology workers uncover latent relations between terms and lays a foundation for later knowledge graph construction.
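The word-vector step described in (2) can be illustrated with a minimal sketch. It assumes that similarity between word vectors is measured by cosine similarity, which is the usual choice for Word2vec embeddings; the abstract does not state the exact measure the thesis uses. The function names and the tiny three-dimensional vectors are hypothetical, made up only for illustration; real vectors would come from a model trained on the corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, top_n=3):
    """Rank every other term in `vectors` by cosine similarity to `term`."""
    target = vectors[term]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors, NOT taken from the thesis or from a trained model.
toy_vectors = {
    "ontology": [0.9, 0.1, 0.0],
    "taxonomy": [0.8, 0.2, 0.1],
    "thesaurus": [0.6, 0.4, 0.1],
    "bookstore": [0.0, 0.1, 0.9],
}

neighbors = most_similar("ontology", toy_vectors, top_n=2)
for word, score in neighbors:
    print(f"{word}\t{score:.3f}")
```

Each (term, neighbor, score) triple is the kind of weighted edge that a graph visualization of the similar-word network could draw, with the score controlling edge weight or node distance.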

[Degree-granting institution]: Liaocheng University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3


Article ID: 1390442


Article link: http://sikaile.net/shoufeilunwen/xixikjs/1390442.html
