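Application of an Improved WAF Algorithm to Query Expansion Based on Semantic Analysis
Published: 2018-04-27 00:14
Topics: query expansion + word activation force. Source: master's thesis, Beijing University of Posts and Telecommunications, 2012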
[Abstract]: Query expansion is an important technique in information retrieval and an effective way to help users get better results from search engines. However, as Internet content has grown more complex and diverse, particularly with the rapid rise of social platforms such as Weibo and WeChat, traditional query expansion algorithms, which ignore the semantic relationships between words in a document, can no longer recommend effective keywords for informal short texts. The word-independence assumption of traditional retrieval models, combined with the sparse information in short texts, leaves existing query expansion algorithms without enough semantic information, so they cannot handle the synonymy and polysemy problems that are common in user queries.

To address these problems, this thesis surveys classical information retrieval models and query expansion methods and concludes that the root cause of the query expansion problem is the lack of effective semantic analysis. It proposes applying the word activation force (WAF) algorithm to topic-based query expansion, with the aim of improving query expansion through more precise semantic analysis.

Building on a study of WAF theory, the thesis proposes a new WAF-based query expansion algorithm. The main work is as follows:

First, extensive comparative experiments between WAF and traditional word association algorithms on a Weibo corpus show that WAF has substantial advantages in semantic analysis and word network modeling, especially in expanding topic core words and mining high-value words.

Second, to cope with the irregularity and sparse information of short texts, the thesis adjusts how word activation force is calculated in WAF so that it exploits the characteristics of short texts and weakens the influence of noisy features on the analysis of core semantics. To improve the quality of WAF word expansion, it further proposes re-ranking the list of associated words according to the overall distribution of word affinity in the word network model.

Third, the thesis combines WAF semantic analysis with topic clustering to design a fairly complete query expansion algorithm, embeds it in the overall framework of a Weibo monitoring project, and applies it to retrieval over a Weibo corpus. Comparative experiments against query expansion based on the BM25 weighting scheme show that the word network model generated by WAF has considerable potential for query expansion.
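The abstract names WAF without stating its formula. As a rough illustration only, the sketch below uses the commonly cited baseline definition of word activation force, waf(i, j) = (f_ij / f_i) * (f_ij / f_j) / d_ij^2, where f_ij counts how often word i precedes word j within a window and d_ij is their mean distance. The thesis's adjusted calculation and its affinity-based re-ranking are not described here, so they are not reproduced. The function names, the window size, and the toy corpus are all assumptions made for this example.

```python
from collections import defaultdict


def word_activation_forces(docs, window=5):
    """Baseline WAF over tokenized documents (illustrative sketch).

    waf[(i, j)] = (f_ij / f_i) * (f_ij / f_j) / d_ij ** 2
    f_i  : number of occurrences of word i
    f_ij : number of times i precedes j within `window` tokens
    d_ij : mean forward distance from i to j over those co-occurrences
    """
    freq = defaultdict(int)
    co_freq = defaultdict(int)
    dist_sum = defaultdict(int)

    for tokens in docs:
        for pos, word in enumerate(tokens):
            freq[word] += 1
            # Look ahead at most `window` tokens for words that follow `word`.
            for offset in range(1, window + 1):
                if pos + offset >= len(tokens):
                    break
                other = tokens[pos + offset]
                if other == word:
                    continue  # ignore self-links in this sketch
                co_freq[(word, other)] += 1
                dist_sum[(word, other)] += offset

    waf = {}
    for (i, j), f_ij in co_freq.items():
        d_ij = dist_sum[(i, j)] / f_ij
        waf[(i, j)] = (f_ij / freq[i]) * (f_ij / freq[j]) / d_ij ** 2
    return waf


def expand_query(term, waf, top_k=3):
    """Return up to `top_k` words most strongly activated by `term`."""
    scored = [(j, score) for (i, j), score in waf.items() if i == term]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [j for j, _ in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical pre-tokenized short texts standing in for a Weibo corpus.
    corpus = [["a", "b", "c"], ["a", "b"]]
    forces = word_activation_forces(corpus, window=2)
    print(expand_query("a", forces))
```

A real system would tokenize Chinese text with a word segmenter first and would likely prune very weak links before building the word network; both steps are left out to keep the sketch short.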
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.1
Article ID: 1808313
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1808313.html