Research on Concept-Related Word Groups Based on Chinese Wikipedia

Published: 2018-07-30 06:38

【Abstract】: With the rapid development of the Internet, people's demand for information keeps rising while the volume of information grows explosively, which makes collecting and finding information increasingly difficult. Finding accurate and comprehensive information within a limited time poses a major challenge for search technology research, and adding semantic knowledge to search engine systems is one important way to improve query efficiency. Words are the smallest unit of semantic representation, yet because of polysemy, aliases, and many other complications, a single word is often semantically ambiguous, and traditional word relatedness measures do not handle word disambiguation well. Traditional methods fall roughly into two kinds. The first applies statistical methods to large-scale corpora, but corpora that are both large enough and accurate are scarce in practice. The second relies on manually built knowledge systems, which have their own problems, such as small scale and high maintenance cost. Given these shortcomings and the demand for semantic knowledge in natural language processing, this thesis focuses on word relatedness computation and the mining of concept-related word groups. The specific work is as follows. First, after organizing and processing Chinese Wikipedia resources, an improved WLVM method is used to build a dataset of relatedness between words. The dataset is evaluated and analyzed, and related word groups are compiled for a number of concepts. A concept's word group can serve as a semantic representation of that concept and can also be applied widely elsewhere in natural language processing, for example in text expansion and knowledge base construction. Second, a word relatedness computation method is proposed. Building on an analysis of earlier methods and a comparison of large-scale corpora, manually built knowledge systems, and Wikipedia, this thesis proposes a method for computing semantic relatedness between words that combines semantic knowledge from links, the category system, text resources, and anchor text, and that disambiguates the relatedness results. In the experiments, the proposed method is used to compute word relatedness on text resources and on links and the category system respectively, and comparison with several other methods demonstrates its effectiveness.
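The abstract does not spell out the improved WLVM method, so the following is only a minimal sketch of the general idea behind link-vector relatedness, assuming WLVM refers to the Wikipedia Link Vector Model. Each article is represented by its outgoing links, each link is weighted by how rare its target is, and two articles are compared by cosine similarity. The toy link graph `TOY_LINKS`, the function names, and the exact weighting formula are all illustrative assumptions and are not taken from the thesis.

```python
import math
from collections import Counter

# Toy link graph (hypothetical data): article title -> titles it links to.
TOY_LINKS = {
    "Search engine": ["Information retrieval", "Web", "Index"],
    "Information retrieval": ["Search engine", "Index", "Query"],
    "Panda": ["Bamboo", "China"],
}


def build_link_vectors(links):
    """Return one weighted out-link vector per source article.

    Assumed weighting: link count * log(total articles / articles linking
    to the target), so links to rarely linked targets weigh more.
    """
    # Every title seen as a source or a target counts as an article.
    articles = set(links) | {t for targets in links.values() for t in targets}
    n_articles = len(articles)
    # Number of distinct articles that link to each target.
    in_degree = Counter(t for targets in links.values() for t in set(targets))
    vectors = {}
    for source, targets in links.items():
        counts = Counter(targets)
        vectors[source] = {
            t: c * math.log(n_articles / in_degree[t]) for t, c in counts.items()
        }
    return vectors


def relatedness(a, b, vectors):
    """Cosine similarity between the link vectors of articles a and b."""
    u, v = vectors.get(a, {}), vectors.get(b, {})
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_word_group(concept, vectors, k=10):
    """Rank other articles by relatedness to `concept`; keep the top k nonzero."""
    scored = [(other, relatedness(concept, other, vectors))
              for other in vectors if other != concept]
    scored = [(other, s) for other, s in scored if s > 0.0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Calling `related_word_group("Search engine", build_link_vectors(TOY_LINKS))` ranks the other articles in the toy graph by shared, rarity-weighted links. A method along the lines the abstract describes would presumably also draw on categories, text, and anchor text and add a disambiguation step, none of which this sketch attempts.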
【Degree-granting institution】: Central China Normal University
【Degree level】: Master's
【Year degree granted】: 2012
【Classification number】: TP391.1

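【Citing documents】

Related master's theses (first 2)

1 Luo Chao (駱超); Research on Document Ranking Methods Based on the LDA Model [D]; Central China Normal University; 2013

2 Liu Qiang (劉強); Research on Expansion Filtering and Weight Calculation for Query Statements [D]; Central China Normal University; 2013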


Article ID: 2154149


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2154149.html
