Research on Open Information Extraction Techniques for Entity Queries
Published: 2018-07-09 16:01
Topics: Wikipedia + entity extraction. Source: North China University of Technology, master's thesis, 2012
[Abstract]: Query recommendation is an important technique in search engine systems: it improves the user's search experience by suggesting more suitable queries. Existing methods can find similar queries that are directly linked through some attribute, but they overlook semantically related queries that are only indirectly associated. To address this problem, this thesis adopts the open knowledge base Wikipedia and, on that basis, proposes a new query expansion system. The method extracts part of Wikipedia's structured information together with its natural-language text to build a hierarchical corpus in which entities form the skeleton and entity features and entity relations form the network. A user query recommendation system is built on this corpus, and an auxiliary query system is designed to improve recommendations when the user's query is not covered by Wikipedia. The main contributions are as follows. (1) A query intent recognition algorithm based on a random walk model, RWM, is proposed. It alleviates some data sparsity problems: the random walk process expands the query to concepts that are not directly associated with it, so that query intent can be identified effectively. (2) A rare query classification algorithm, WWRQ, is proposed that jointly uses Wikipedia's structured knowledge and web knowledge. It obtains retrieval results from a search engine and classifies the query by voting with feature information extracted from Wikipedia. Experimental results show that, compared with traditional query recommendation systems, the random-walk-based query intent recognition algorithm balances precision and recall and significantly improves query accuracy. The rare query algorithm based on Wikipedia and web knowledge effectively addresses the problem that short queries cannot be located accurately.
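The abstract gives no implementation details for RWM or WWRQ, so the following is only a minimal sketch of the two general ideas it names: spreading relevance to indirectly linked concepts with a random walk, and classifying a query by letting retrieved results vote. It substitutes a standard random walk with restart for the thesis's model. The function names, the restart probability, the toy concept graph, and the feature sets are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import Counter


def random_walk_with_restart(graph, seed, restart=0.15, iterations=50):
    """Score every node by its proximity to `seed` (illustrative stand-in for RWM).

    `graph` maps a node to a dict {neighbor: edge weight}. Nodes that are not
    directly linked to the seed still receive probability mass through
    intermediate nodes, which is how indirect associations surface.
    """
    nodes = set(graph) | {seed}
    for neighbors in graph.values():
        nodes.update(neighbors)
    scores = {node: 0.0 for node in nodes}
    scores[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            neighbors = graph.get(node, {})
            total = sum(neighbors.values())
            if total == 0:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += (1 - restart) * mass
                continue
            for neighbor, weight in neighbors.items():
                nxt[neighbor] += (1 - restart) * mass * weight / total
        # With probability `restart` the walker jumps back to the seed.
        nxt[seed] += restart * sum(scores.values())
        scores = nxt
    return scores


def classify_by_vote(result_snippets, category_features):
    """Let each retrieved snippet vote for one category (stand-in for WWRQ).

    A snippet votes for the category whose feature terms it overlaps most;
    snippets with no overlap cast no vote. Returns None if nobody votes.
    """
    votes = Counter()
    for snippet in result_snippets:
        tokens = set(snippet.lower().split())
        best, best_overlap = None, 0
        for category, features in category_features.items():
            overlap = len(tokens & features)
            if overlap > best_overlap:
                best, best_overlap = category, overlap
        if best is not None:
            votes[best] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: "leopard" is linked to "jaguar" only via "big cat".
concept_graph = {
    "jaguar": {"big cat": 1.0, "car brand": 1.0},
    "big cat": {"jaguar": 1.0, "leopard": 1.0},
    "leopard": {"big cat": 1.0},
    "car brand": {"jaguar": 1.0},
}
walk_scores = random_walk_with_restart(concept_graph, "jaguar")

# Hypothetical search results and Wikipedia-derived category features.
snippets = [
    "Jaguar unveils new electric car model",
    "The jaguar is a big cat native to the Americas",
    "Jaguar car dealership opens downtown",
]
features = {
    "automotive": {"car", "engine", "dealership", "model"},
    "animal": {"cat", "species", "native", "habitat"},
}
predicted_category = classify_by_vote(snippets, features)
```

In this toy graph the walk assigns a non-zero score to "leopard" even though it shares no edge with the seed, which is the kind of indirect association the abstract says attribute-based methods miss. A real system would derive the graph and the category features from Wikipedia's structured data instead of hand-written dictionaries.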
[Degree-granting institution]: North China University of Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3