Term Recommendation and Visualization Based on Semantic Similarity Computation
Source: Liaocheng University, master's thesis, 2017. Document type: degree thesis.
Keywords: terminology; semantic similarity; structured data; unstructured data; visualization; term recommendation; knowledge graph
[Abstract]: The rapid development of the information age is making the world increasingly mobile and intelligent. The explosive growth of information has also driven the continual updating and development of terminology science. As terminology data grow in volume and information culture becomes richer, the meanings of term concepts are becoming increasingly diverse. This diversity makes it very difficult for the International Organization for Standardization (ISO) to draft terminology standards. At the same time, international terminology experts urgently need a term recommendation system so that the updating and drafting of term concepts in different countries can proceed in step. Methods for computing the semantic similarity of terms underpin other fields such as information retrieval, machine translation, and artificial intelligence, and researchers keep refining them to meet new needs. Current methods are mostly classified by how the data are organized: methods based on structured data and methods based on unstructured data. Structured organizations include ontologies, HowNet, and WordNet. Unstructured data are typically large-scale and have no fixed structure; semantic similarity over such data is computed by training a model with machine learning and then querying the model. This thesis studies and applies term semantic similarity methods for ontology-based structured data and for large-scale unstructured data, covering the following:
(1) Most term semantic similarity methods for structured data either fail to account for all influencing factors or set the factor weights from expert experience, which makes the results inaccurate. This thesis therefore improves an ontology-based hybrid semantic similarity method, borrowing the idea of fuzzy optimal ranking to determine the weights of the different factors, which improves accuracy. The method is also applied to term recommendation: before a terminology expert recommends a term, its semantic similarity is computed to determine whether a synonym or near-synonym already exists in the terminology standard document. The term is then submitted to the term recommendation system, and the terminology document is updated.
(2) With the arrival of the big-data era, term semantic similarity over large-scale unstructured corpora has become a research hotspot. Extracting semantically similar words for terms from massive data and visualizing them is the second focus of this thesis. For large-scale unstructured data, the thesis uses a word-vector-based method: Word2vec is trained on the corpus so that its text is represented as word vectors, and the semantic similarity computed between these vectors yields the semantically similar words of each term. The Prefuse component is then used to visualize the relation network of these similar words. This helps terminology workers discover latent relations between terms and lays the groundwork for later knowledge-graph construction.
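The abstract says the factor weights are set with the idea of fuzzy optimal ranking instead of expert guesswork, but it does not give the formulas. As a hedged stand-in, the sketch below derives weights from a fuzzy complementary judgment matrix. An expert only states pairwise preferences a_ij in [0, 1] with a_ij + a_ji = 1, and the weights are normalized with w_i = (Σ_j a_ij + n/2 - 1) / (n(n - 1)), a formula known from the fuzzy-AHP literature on such matrices. The function name `fuzzy_weights`, the three factors, and the judgment values are hypothetical, and the thesis's own ranking procedure may differ.

```python
from typing import List


def fuzzy_weights(judgments: List[List[float]]) -> List[float]:
    """Turn a fuzzy complementary judgment matrix into factor weights.

    judgments[i][j] in [0, 1] states how much more important factor i is
    than factor j (0.5 means equally important), with
    judgments[i][j] + judgments[j][i] == 1, so the diagonal is 0.5.
    Weights use w_i = (sum_j a_ij + n/2 - 1) / (n * (n - 1)), which sum to 1.
    """
    n = len(judgments)
    if n < 2 or any(len(row) != n for row in judgments):
        raise ValueError("need a square matrix for at least two factors")
    for i in range(n):
        for j in range(n):
            # Complementarity check (this also forces a_ii == 0.5)
            if abs(judgments[i][j] + judgments[j][i] - 1.0) > 1e-9:
                raise ValueError(f"a[{i}][{j}] + a[{j}][{i}] must equal 1")
    return [(sum(row) + n / 2 - 1) / (n * (n - 1)) for row in judgments]


# Hypothetical expert judgments for three factors: distance, depth, overlap
A = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]
print([round(w, 4) for w in fuzzy_weights(A)])  # [0.4167, 0.3167, 0.2667]
```

The weights sum to 1 by construction, because the entries of a complementary n x n matrix always add up to n²/2. The expert ranks factors only pairwise rather than assigning absolute numbers.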
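The thesis improves an ontology-based hybrid (multi-factor) similarity measure, but the abstract does not list its factors. As a hypothetical illustration, the sketch combines three factors often used on an is-a hierarchy as a weighted sum: path distance, depth of the lowest common ancestor (in the style of Wu-Palmer), and overlap of the two ancestor sets. The toy hierarchy, the factor choice, the function names, and the default weights are all assumptions. In the thesis's scheme, the weights would come from the fuzzy-ranking step rather than being fixed by hand.

```python
from typing import Dict, List, Sequence


def ancestors(term: str, parent: Dict[str, str]) -> List[str]:
    """Return the is-a chain [term, parent, grandparent, ..., root].

    Assumes the child -> parent map is acyclic (a tree).
    """
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def hybrid_similarity(a: str, b: str, parent: Dict[str, str],
                      weights: Sequence[float] = (0.4, 0.35, 0.25)) -> float:
    """Weighted sum of distance, depth and ancestor-overlap similarities.

    The three factors and the default weights are illustrative assumptions.
    """
    pa, pb = ancestors(a, parent), ancestors(b, parent)
    common = set(pa) & set(pb)
    if not common:
        return 0.0  # no shared ancestor: treat the terms as unrelated
    # Lowest common ancestor = first node on a's chain that b also reaches
    lca = next(t for t in pa if t in common)
    distance = pa.index(lca) + pb.index(lca)          # edges on the path a-lca-b
    depth_lca = len(ancestors(lca, parent))           # the root has depth 1
    s_distance = 1.0 / (1.0 + distance)
    s_depth = 2.0 * depth_lca / (len(pa) + len(pb))   # Wu-Palmer style
    s_overlap = len(common) / len(set(pa) | set(pb))  # Jaccard of ancestor sets
    w_dist, w_depth, w_overlap = weights
    return w_dist * s_distance + w_depth * s_depth + w_overlap * s_overlap


# Hypothetical toy ontology (child -> parent)
PARENT = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "sedan": "car",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
}
print(round(hybrid_similarity("car", "truck", PARENT), 4))    # 0.4917
print(round(hybrid_similarity("car", "bicycle", PARENT), 4))  # 0.3025
```

Sibling concepts under the same specific parent ("car" and "truck") score higher than concepts that only meet at the root ("car" and "bicycle"), which is the behavior a multi-factor measure is meant to capture.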
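According to the abstract, before an expert recommends a term, its similarity to the terms already in the standard document is computed to detect synonyms or near-synonyms. A minimal sketch of that check follows. It assumes any pairwise similarity function returning values in [0, 1], such as an ontology-based or word-vector measure. The function name `check_candidate`, the two thresholds (0.9 and 0.7), and the toy scores are hypothetical.

```python
from typing import Callable, List, Tuple


def check_candidate(candidate: str,
                    standard_terms: List[str],
                    similarity: Callable[[str, str], float],
                    synonym_at: float = 0.9,
                    near_synonym_at: float = 0.7) -> List[Tuple[str, float, str]]:
    """Find existing standard terms that may duplicate a candidate term.

    Returns (term, score, label) tuples, highest score first. An empty list
    means no synonym or near-synonym was found above the thresholds.
    """
    hits = []
    for term in standard_terms:
        score = similarity(candidate, term)
        if score >= synonym_at:
            hits.append((term, score, "synonym"))
        elif score >= near_synonym_at:
            hits.append((term, score, "near-synonym"))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical precomputed scores standing in for a real similarity measure
SCORES = {
    ("data set", "dataset"): 0.95,
    ("data set", "data collection"): 0.75,
    ("data set", "database"): 0.40,
}


def toy_similarity(a: str, b: str) -> float:
    return SCORES.get((a, b), 0.0)


hits = check_candidate("data set", ["database", "dataset", "data collection"],
                       toy_similarity)
if hits:
    print("flag for expert review:", hits)
else:
    print("no duplicate found; submit to the recommendation system")
```

Keeping the similarity function as a parameter separates the duplicate check from the choice of measure. The same gate can therefore sit in front of the recommendation system whichever similarity method is used.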
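For the unstructured corpus, the thesis trains Word2vec and ranks terms by the similarity of their word vectors. The ranking step is sketched below with cosine similarity over a hypothetical dictionary of vectors, so it runs without a training library. With gensim 4.x, the vectors would typically come from `gensim.models.Word2Vec(sentences, vector_size=100, window=5, min_count=5)`, and `model.wv.most_similar(term, topn=10)` performs a cosine ranking of the same kind. The toy vectors, parameters, and function names here are assumptions, not the thesis's settings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(term: str, vectors: Dict[str, Sequence[float]],
                 topn: int = 10) -> List[Tuple[str, float]]:
    """Rank every other vocabulary word by cosine similarity to `term`."""
    if term not in vectors:
        raise KeyError(f"{term!r} is not in the vocabulary")
    query = vectors[term]
    scored = [(word, cosine(query, vec))
              for word, vec in vectors.items() if word != term]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical 3-dimensional vectors; real Word2vec vectors are much longer
# (gensim's default vector_size is 100)
VECTORS = {
    "ontology": [0.9, 0.1, 0.0],
    "taxonomy": [0.8, 0.2, 0.1],
    "thesaurus": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}
for word, score in most_similar("ontology", VECTORS, topn=2):
    print(word, round(score, 3))
```

A brute-force scan like this is linear in vocabulary size per query. That is fine for a sketch, but large corpora usually rely on the library's vectorized implementation instead.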
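The thesis visualizes the network of similar words with the Prefuse toolkit, which is written in Java. One way to pass the network to Prefuse is as a GraphML file, which Prefuse can load through its `prefuse.data.io.GraphMLReader`. The sketch below builds an undirected, thresholded edge set from per-term similar-word lists and writes it as GraphML. The function name, the 0.5 threshold, the attribute names `name` and `weight`, and the input layout are assumptions rather than details from the thesis.

```python
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape


def similarity_graphml(similar: Dict[str, List[Tuple[str, float]]],
                       min_score: float = 0.5) -> str:
    """Build an undirected GraphML document from per-term similar-word lists.

    `similar` maps a term to (similar_word, score) pairs, e.g. top-k results.
    Edges below min_score are dropped; an edge listed in both directions is
    kept once, with the higher score.
    """
    node_ids: Dict[str, str] = {}
    edges: Dict[Tuple[str, ...], float] = {}

    def node(term: str) -> str:
        if term not in node_ids:
            node_ids[term] = f"n{len(node_ids)}"
        return node_ids[term]

    for term, neighbours in similar.items():
        node(term)  # every queried term becomes a node, even if isolated
        for other, score in neighbours:
            if other == term or score < min_score:
                continue
            key = tuple(sorted((term, other)))
            edges[key] = max(score, edges.get(key, score))
            node(other)

    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
        '  <key id="name" for="node" attr.name="name" attr.type="string"/>',
        '  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>',
        '  <graph edgedefault="undirected">',
    ]
    for term, nid in node_ids.items():
        lines.append(f'    <node id="{nid}"><data key="name">{escape(term)}</data></node>')
    for (a, b), score in sorted(edges.items()):
        lines.append(f'    <edge source="{node_ids[a]}" target="{node_ids[b]}">'
                     f'<data key="weight">{score:.4f}</data></edge>')
    lines += ['  </graph>', '</graphml>']
    return "\n".join(lines)


# Hypothetical top-k results for two terms
SIMILAR = {
    "ontology": [("taxonomy", 0.98), ("banana", 0.10)],
    "taxonomy": [("ontology", 0.98), ("thesaurus", 0.80)],
}
print(similarity_graphml(SIMILAR))
```

On the Java side, the file could then be read with Prefuse's GraphML reader and laid out, for example, with one of its force-directed layouts. That part is not shown here.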
[Degree-granting institution]: Liaocheng University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.3