Research on Query Expansion Based on Ontology and User Logs

Published: 2018-05-03 19:12

Topic: ontology + query expansion; Source: Hunan University, master's thesis, 2013

[Abstract]: With the explosive growth of information on the Internet, it is increasingly difficult for users to find the information they actually want. Search engines address this problem to some extent. However, users often cannot express their query intent accurately: poorly chosen or overly short query terms lead to low recall and precision, and the search engine fails to return useful results. Expanding user queries has therefore become an urgent need. Query expansion has been studied for decades, and researchers in China and abroad have proposed many methods. These common methods, however, usually do not interpret user input at the semantic level, and because the source of their expansion terms is uncertain, they easily add terms unrelated to the query, causing the "query drift" problem. This thesis combines a domain ontology with user query logs and proposes a query expansion algorithm based on both. The domain ontology is used to expand the user query at the semantic level into an initial set of expansion concepts; word co-occurrence analysis over the user query logs then filters this initial set a second time. The main contents are as follows:
(1) The research background and significance of the topic are described, the progress and shortcomings of current query expansion techniques are analyzed, and the relevant background knowledge and theory are introduced, laying the theoretical foundation for the rest of the work.
(2) An ontology-based formula for the semantic similarity of concepts is proposed. It is used to compute the semantic similarity of candidate expansion terms and so to expand the user query at the semantic level.
(3) A word co-occurrence formula based on user logs is proposed. It is applied to the initial expansion terms, and the result serves as each term's co-occurrence weight. The semantic similarity weight and the co-occurrence weight of each expansion term are then combined for a second round of filtering, which avoids the "query drift" that the initial expansion is prone to.
(4) Based on the proposed algorithm and the query requirements of an after-sales service tracking system for domestically produced software and hardware, a prototype system is designed and implemented. The overall framework of the system and each of its modules are described. Finally, comparative experiments are carried out on the system. The results show that, compared with traditional query expansion methods, the proposed method effectively improves retrieval accuracy while maintaining good robustness.
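The two-stage filtering described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's actual formulas, so everything below is an assumption made for illustration: Wu-Palmer similarity stands in for the ontology-based similarity formula, the Dice coefficient stands in for the log-based co-occurrence formula, and the toy ontology `PARENT`, the toy log `QUERY_LOG`, the thresholds, and the mixing weight `alpha` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy ontology: child -> parent (a single-rooted is-a tree).
PARENT = {
    "hardware": "product",
    "software": "product",
    "server": "hardware",
    "printer": "hardware",
    "database": "software",
    "operating system": "software",
}

# Hypothetical query log: each entry is the list of terms in one past query.
QUERY_LOG = [
    ["server", "printer"],
    ["server", "printer", "driver"],
    ["server", "hardware"],
    ["database", "backup"],
]


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def semantic_similarity(a, b):
    """Wu-Palmer similarity (stand-in): 2 * depth(lcs) / (depth(a) + depth(b))."""
    path_a, path_b = ancestors(a), ancestors(b)
    on_path_b = set(path_b)
    # Lowest common subsumer: first ancestor of a that is also an ancestor of b.
    lcs = next((c for c in path_a if c in on_path_b), None)
    if lcs is None:
        return 0.0
    # Depth is counted in nodes, so the root has depth 1.
    return 2 * len(ancestors(lcs)) / (len(path_a) + len(path_b))


def cooccurrence_weights(log):
    """Build a Dice-coefficient lookup (stand-in): 2 * n(a, b) / (n(a) + n(b))."""
    term_count, pair_count = Counter(), Counter()
    for query in log:
        terms = sorted(set(query))
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    def weight(a, b):
        total = term_count[a] + term_count[b]
        return 2 * pair_count[tuple(sorted((a, b)))] / total if total else 0.0

    return weight


def expand_query(term, log, sim_threshold=0.5, final_threshold=0.4, alpha=0.5):
    """Stage 1: ontology candidates; stage 2: re-filter with log co-occurrence."""
    cooc = cooccurrence_weights(log)
    concepts = (set(PARENT) | set(PARENT.values())) - {term}
    # Stage 1: keep concepts that are semantically close to the query term.
    initial = {c: semantic_similarity(term, c) for c in concepts}
    initial = {c: s for c, s in initial.items() if s >= sim_threshold}
    # Stage 2: combine both weights and drop candidates that users rarely co-query.
    scored = {c: alpha * s + (1 - alpha) * cooc(term, c) for c, s in initial.items()}
    kept = [(c, round(v, 3)) for c, v in scored.items() if v >= final_threshold]
    return sorted(kept, key=lambda cv: (-cv[1], cv[0]))


print(expand_query("server", QUERY_LOG))
```

In this toy setup, "product" passes the ontology stage for the query term "server" but never co-occurs with it in the log, so the second stage drops it. This is the kind of loosely related term that the abstract associates with query drift. A linear combination of the two weights is only one possible design; the abstract says the weights are combined but not how.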
[Degree-granting institution]: Hunan University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1

Article ID: 1839733

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1839733.html
