

Research on Heterogeneous Data Indexing Methods for Dataspaces

Published: 2018-07-11 12:16

  Topics: dataspace + index; Source: master's thesis, Harbin Engineering University, 2013


[Abstract]: The information held by individuals and organizations is growing rapidly, and unstructured data accounts for an ever larger share of it. The massive, distributed, heterogeneous, and coexisting data belonging to one entity constitutes a dataspace. Providing users with efficient, convenient, and diverse search and query services is a major challenge for dataspaces, and building an efficient index over the heterogeneous data in a dataspace is the foundation for meeting it. Studying indexing methods for heterogeneous data in dataspaces is therefore of considerable importance.

The data management research community has studied indexing methods extensively. Past work has usually assumed a single data format and a single query style, for example unstructured data with keyword queries in search engines, or relational tables with SQL queries in relational databases. Data in a dataspace, however, comes from multiple sources and is heterogeneous; it may include structured, semi-structured, and unstructured formats. In addition, the pay-as-you-go nature of a dataspace means it must offer a range of search and query services, from keyword queries to structured queries. Initially, when little information has been extracted and no semantic associations exist between data sources, only basic keyword search can be provided; over time, users and the system gradually establish more schema and semantic association information, and the system can support richer query types. Unlike traditional indexing methods, an indexing method for dataspaces must therefore index data in multiple formats and support several query types, including keyword queries and structured queries.

Based on an analysis of existing data models and queries, this thesis adopts the iMeMex data model as the dataspace data model and defines three query types: keyword queries, predicate queries, and path queries. On this basis it proposes a new indexing method, called the EIBH hybrid index, to improve the efficiency of searching and querying heterogeneous data in a dataspace. The index consists of an extended inverted list and two auxiliary indexes. It extends the keyword column and the linked-list node information of the inverted list to index resource views, which supports the three query types and improves query processing efficiency; the two auxiliary indexes address the inefficiency of index joins. Experimental results show that the method is effective and feasible for indexing and querying heterogeneous data in a dataspace.
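The abstract does not give the internal layout of the EIBH index, so the following is only a rough illustration of the general idea it describes: an inverted list whose postings carry extra information about where a term occurred, plus auxiliary lookup tables that avoid joining separate indexes. The class `HybridIndex`, its method names, the component labels (`name`, `content`, `attr:<key>`), and the `parent`/`name` maps are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class HybridIndex:
    """Toy inverted index whose postings record where each term occurred."""

    def __init__(self):
        # term -> set of (resource_id, component); component is "name",
        # "content", or "attr:<attribute name>" (labels are assumptions)
        self.postings = defaultdict(set)
        # Hypothetical auxiliary tables: resource -> parent, resource -> name
        self.parent = {}
        self.name = {}

    def add(self, rid, name, content="", attrs=None, parent=None):
        """Index one resource view under every term it contains."""
        self.name[rid] = name
        self.parent[rid] = parent
        for term in name.lower().split():
            self.postings[term].add((rid, "name"))
        for term in content.lower().split():
            self.postings[term].add((rid, "content"))
        for key, value in (attrs or {}).items():
            for term in str(value).lower().split():
                self.postings[term].add((rid, "attr:" + key))

    def keyword(self, term):
        """Keyword query: resources containing the term in any component."""
        return {rid for rid, _ in self.postings.get(term.lower(), ())}

    def predicate(self, attr, term):
        """Predicate query: resources whose attribute `attr` contains the term."""
        wanted = "attr:" + attr
        return {rid for rid, comp in self.postings.get(term.lower(), ())
                if comp == wanted}

    def path(self, ancestor_name, term):
        """Path query: keyword matches that lie below a resource with the given name."""
        hits = set()
        for rid in self.keyword(term):
            node = self.parent.get(rid)
            while node is not None:  # walk up using the auxiliary parent table
                if self.name[node] == ancestor_name:
                    hits.add(rid)
                    break
                node = self.parent.get(node)
        return hits


idx = HybridIndex()
idx.add(1, "projects")
idx.add(2, "thesis", parent=1)
idx.add(3, "draft", content="dataspace index survey",
        attrs={"author": "li"}, parent=2)
idx.add(4, "notes", content="index tuning", attrs={"author": "wang"})

print(idx.keyword("index"))            # resources 3 and 4
print(idx.predicate("author", "li"))   # resource 3 only
print(idx.path("projects", "index"))   # resource 3 only
```

Because each posting records the component in which the term appeared, all three query types in this sketch are answered from one term lookup; a design that kept separate per-attribute indexes would instead need to join their results.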
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3

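[References]

Related journal articles (first 3)

1 Li Baoli, Chen Yuzhong, Yu Shiwen; A survey of information extraction research [J]; Computer Engineering and Applications; 2003, No. 10

2 Liu Qian; Jiao Hui; Jia Huibo; Research on the state of development and construction methods of information extraction technology [J]; Application Research of Computers; 2007, No. 07

3 Li Yukun; Meng Xiaofeng; Zhang Xiangyu; Research on dataspace technology [J]; Journal of Software; 2008, No. 08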



Article ID: 2115169

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2115169.html

