Research on Computing the Semantic Relatedness of Words

Published: 2018-03-26 12:04

Topic: semantic relatedness. Approach: kernel function. Source: Central China Normal University, master's thesis, 2013

[Abstract]: The semantic relatedness of words expresses how strongly two words are associated: on seeing one word, does the other come to mind? It can be measured by the likelihood that the two words appear together in the same context. Semantic similarity and semantic relatedness are easily confused. Semantic similarity refers to how alike two words are. The two concepts are connected: if two words are semantically similar, they are necessarily semantically related, but the converse does not hold, because related words need not be similar. Semantic similarity can therefore be treated as one component of semantic relatedness computation. Semantic relatedness computation is important for natural language processing tasks such as machine translation, information retrieval, and text analysis, and is a foundational research problem. This thesis surveys existing methods for computing semantic relatedness and then proposes a method based on search engines. The specific work is as follows.

First, existing methods for computing the semantic relatedness of words can be broadly divided into traditional methods and methods based on online encyclopedias. Traditional methods fall into two further categories: methods based on semantic dictionaries (WordNet, HowNet) and methods based on corpora. The thesis describes in detail the semantic resources these methods rely on, then presents several representative methods from each category and analyzes their theoretical foundations and characteristics.

Second, the thesis proposes a method that combines a kernel function with Page Counts, where Page Counts is the number of pages a search engine returns for a query. This opens a new direction for semantic relatedness research by putting rapidly developing web technology to use. The effectiveness of the method is verified in three ways: (1) analyzing its theoretical basis; (2) running experiments on a standard test set and comparing the results with human judgments; (3) evaluating the method in a specific setting. The experiments show that, compared with using the kernel function or Page Counts alone, the proposed method produces results closer to human judgments, which indicates that the method is effective.

Third, the thesis presents one application of semantic relatedness computation, text clustering. Computing the semantic relatedness of texts on top of word-level relatedness results can improve the precision of text clustering.
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1



Article ID: 1667827


Link to this article: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1667827.html
