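Research on Computing the Semantic Relatedness of Words
Published: 2018-03-26 12:04
Topic: semantic relatedness. Approach: kernel function. Source: Central China Normal University, master's thesis, 2013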
[Abstract]: The semantic relatedness of words measures how strongly two words are associated, that is, whether seeing one word brings the other to mind. It can be estimated from the likelihood that the two words occur together in the same context. Semantic similarity and semantic relatedness are easily confused. Semantic similarity refers to how alike two words are in meaning. The two concepts are connected: if two words are semantically similar, they must be semantically related, but two semantically related words are not necessarily similar. Semantic similarity can therefore be treated as one component of semantic relatedness computation.

Computing semantic relatedness matters for natural language processing tasks such as machine translation, information retrieval, and text analysis, and is a foundational research problem. This thesis surveys existing methods for computing semantic relatedness and then proposes a method based on search engines. The specific work is as follows.

First, existing methods for computing the semantic relatedness of words fall roughly into traditional methods and methods based on online encyclopedias. The traditional methods divide further into two classes: methods based on semantic dictionaries (WordNet, HowNet) and methods based on corpora. The thesis describes in detail the semantic resources these methods rely on, then presents several representative methods from each class and analyzes their theoretical basis and characteristics.

Second, the thesis proposes a method that combines a kernel function with Page Counts, where Page Counts is the number of pages a search engine returns for a query. This offers a new direction for research on semantic relatedness, one that takes advantage of rapidly developing web technology. The method's effectiveness is verified in three ways: (1) by analyzing its theoretical basis; (2) by running experiments on a standard test set and comparing the results with human judgments; (3) by evaluating the method in a specific setting. The experiments show that the proposed method produces results closer to human judgments than using the kernel function or Page Counts alone, so the proposed method is effective.

Third, the thesis presents an application of semantic relatedness computation: text clustering. Computing the semantic relatedness of texts on the basis of word-level relatedness results can improve the accuracy of text clustering.
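The abstract does not state the thesis's formula or which kernel function it uses, so the following is only a minimal sketch of the general idea of estimating relatedness from page counts. It shows two widely used co-occurrence measures (a Jaccard coefficient and pointwise mutual information) plus a Pearson correlation of the kind typically used to compare computed scores with human ratings. All function names, the page counts, and the index size `TOTAL_PAGES` are hypothetical and chosen for illustration. None of this is taken from the thesis.

```python
import math


def web_jaccard(count_p, count_q, count_pq):
    """Jaccard coefficient estimated from page counts.

    count_p, count_q: pages returned for each word alone.
    count_pq: pages returned for the query containing both words.
    """
    union = count_p + count_q - count_pq
    if count_pq <= 0 or union <= 0:
        return 0.0
    return count_pq / union


def web_pmi(count_p, count_q, count_pq, total_pages):
    """Pointwise mutual information (base 2) estimated from page counts."""
    if min(count_p, count_q, count_pq) <= 0:
        return 0.0
    p_joint = count_pq / total_pages
    p_p = count_p / total_pages
    p_q = count_q / total_pages
    return math.log2(p_joint / (p_p * p_q))


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical page counts, not real search-engine data.
TOTAL_PAGES = 1_000_000  # assumed size of the search index
print(web_jaccard(1000, 500, 250))
print(web_pmi(1000, 500, 250, TOTAL_PAGES))
```

In an evaluation like the one the abstract describes, the computed scores for a list of word pairs would be passed to `pearson` together with the human ratings for the same pairs. A correlation closer to 1 indicates closer agreement with human judgment.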
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1