Top-N Query Processing for Chinese Keywords over Multi-Table Databases
Published: 2018-05-28 20:24
Topics: relational databases + Chinese keywords; Source: Hebei University, master's thesis, 2013
[Abstract]: Keyword query theory and techniques have been studied extensively and applied widely in information retrieval (IR) and Web search engines. Traditional database management systems support only pattern matching, not free-form keyword queries, so keyword query processing over relational databases has recently become a prominent research topic.

Users operate a traditional relational database system through the Structured Query Language (SQL). This requires them to know both SQL and the database schema, which is difficult for ordinary users. Such a system can also apply only simple sorting to the results it returns, so users struggle to find the information that interests them most.

Existing work on keyword queries mainly targets English keywords. This thesis therefore presents a top-N processing method for Chinese keyword queries over databases with multiple tables. The method works as follows:
- It creates an index table that stores the Chinese characters extracted from database tuples, together with related information.
- It builds an index on that table so that query keywords can be matched quickly.
- It adapts IR similarity formulas into a ranking strategy suited to Chinese keyword queries.

For a given Chinese keyword query, the index matches the query characters against the tuple characters and retrieves the corresponding information. From this information the method builds a linked list for generating candidate tuples, together with the SQL query statements. Executing these statements yields the candidate tuples and their similarity to the query, and the top-N results are returned in order of similarity. The method supports character-level search as well as queries that use Chinese abbreviations.

Finally, experiments on a real data set verify query response time and accuracy, and the results show that the method is effective.
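A minimal sketch of the indexing and candidate-generation steps follows. It assumes SQLite through Python's standard `sqlite3` module and a made-up two-table schema (`university`, `paper`). It also assumes AND semantics, meaning a candidate tuple must contain every query character. The names `SCHEMA`, `char_index`, `build_char_index`, `match_query`, `fetch_candidates` and `make_demo_db` are hypothetical, and a plain dictionary stands in for the thesis's candidate-tuple linked list.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: table name -> text columns to index.
# Table/column names come from this constant, never from user input.
SCHEMA = {"university": ["name", "city"], "paper": ["title"]}


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_char_index(conn):
    """Hypothetical index table: one row per (character, table, rowid, column)."""
    conn.execute("CREATE TABLE char_index (ch TEXT, tbl TEXT, rid INTEGER, col TEXT)")
    for tbl, cols in SCHEMA.items():
        rows = conn.execute(f"SELECT rowid, {', '.join(cols)} FROM {tbl}").fetchall()
        for rid, *values in rows:
            for col, text in zip(cols, values):
                for ch in set(text or ""):  # each distinct character once per cell
                    if is_cjk(ch):
                        conn.execute("INSERT INTO char_index VALUES (?, ?, ?, ?)",
                                     (ch, tbl, rid, col))
    # B-tree index on the character column for fast lookup of query characters.
    conn.execute("CREATE INDEX idx_char ON char_index (ch)")


def match_query(conn, query):
    """Look up each distinct query character and group the hits by tuple."""
    chars = [ch for ch in dict.fromkeys(query) if is_cjk(ch)]
    hits = defaultdict(set)  # (table, rowid) -> matched query characters
    for ch in chars:
        for tbl, rid in conn.execute("SELECT tbl, rid FROM char_index WHERE ch = ?", (ch,)):
            hits[(tbl, rid)].add(ch)
    return chars, hits


def fetch_candidates(conn, chars, hits):
    """Keep tuples containing every query character, then run one
    generated SELECT per table to fetch their text."""
    by_table = defaultdict(list)
    for (tbl, rid), matched in hits.items():
        if chars and len(matched) == len(chars):
            by_table[tbl].append(rid)
    candidates = {}
    for tbl, rids in by_table.items():
        cols = SCHEMA[tbl]
        marks = ", ".join("?" * len(rids))  # "?, ?, ..." placeholders
        sql = f"SELECT rowid, {', '.join(cols)} FROM {tbl} WHERE rowid IN ({marks})"
        for rid, *values in conn.execute(sql, rids):
            candidates[(tbl, rid)] = " ".join(v or "" for v in values)
    return candidates


def make_demo_db():
    """Tiny in-memory database with made-up rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE university (name TEXT, city TEXT)")
    conn.execute("CREATE TABLE paper (title TEXT)")
    conn.executemany("INSERT INTO university VALUES (?, ?)",
                     [("河北大学", "保定"), ("北京大学", "北京")])
    conn.execute("INSERT INTO paper VALUES (?)", ("关系数据库关键词查询",))
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    build_char_index(conn)
    chars, hits = match_query(conn, "河大")  # "河大" abbreviates "河北大学"
    print(fetch_candidates(conn, chars, hits))  # {('university', 1): '河北大学 保定'}
```

Because matching is per character, the abbreviation 河大 retrieves 河北大学. This sketch ignores character order, though, so it would also accept any tuple that merely contains both characters.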
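For the ranking step, the thesis adapts an IR similarity formula whose exact form is not given on this page. The sketch below is purely illustrative. It assumes character-level TF-IDF weights with smoothed IDF and cosine similarity, and uses `heapq.nlargest` to return the top N. The functions `char_counts`, `idf_table`, `cosine`, `top_n` and the sample rows are hypothetical.

```python
import heapq
import math
from collections import Counter


def is_cjk(ch):
    return "\u4e00" <= ch <= "\u9fff"


def char_counts(text):
    """Term frequencies with single Chinese characters as the indexing unit."""
    return Counter(ch for ch in text if is_cjk(ch))


def idf_table(docs):
    """Smoothed IDF per character over the candidate tuples (assumed formula)."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(char_counts(text)))
    return {ch: math.log((n + 1) / (d + 1)) + 1 for ch, d in df.items()}


def cosine(q, d, idf):
    """Cosine similarity of TF-IDF vectors; query characters absent from
    every candidate get weight 0 and so do not affect the score."""
    qw = {ch: tf * idf.get(ch, 0.0) for ch, tf in q.items()}
    dw = {ch: tf * idf.get(ch, 0.0) for ch, tf in d.items()}
    dot = sum(w * dw.get(ch, 0.0) for ch, w in qw.items())
    norm = math.sqrt(sum(w * w for w in qw.values())) * math.sqrt(sum(w * w for w in dw.values()))
    return dot / norm if norm else 0.0


def top_n(query, docs, n):
    """Return the n best (score, key) pairs, highest similarity first."""
    idf = idf_table(docs)
    q = char_counts(query)
    scored = ((cosine(q, char_counts(text), idf), key) for key, text in docs.items())
    return heapq.nlargest(n, scored)


if __name__ == "__main__":
    docs = {"u1": "河北大学", "u2": "北京大学", "p1": "关系数据库关键词查询"}
    for score, key in top_n("河大", docs, 2):
        print(key, round(score, 3))
```

`heapq.nlargest` keeps a heap of only N items. Selecting the top N of m candidates therefore costs O(m log N) instead of a full sort, which suits returning a short result list.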
[Degree-granting institution]: Hebei University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification codes]: TP311.13; TP391.1
Article ID: 1948054
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1948054.html