

Research on Heterogeneous Data Indexing Methods for Dataspaces

Published: 2018-07-11 12:16

  Topics: dataspace + index; Source: Harbin Engineering University, master's thesis, 2013


[Abstract]: The information held by individuals and organizations is growing rapidly, and the share of unstructured data keeps rising. The massive, distributed, heterogeneous, and coexisting data that belong to one entity form a dataspace. Providing users with efficient, convenient, and diverse search and query services is a major challenge for dataspaces, and an efficient index over the heterogeneous data in a dataspace is the foundation for meeting it. Studying indexing methods for heterogeneous data in dataspaces is therefore of real importance.
The data management research community has studied indexing methods extensively. Past work usually assumed a single data format and a single query style, for example unstructured data with keyword queries in search engines, or relational tables with SQL queries in relational databases. Data in a dataspace, however, comes from multiple sources and is heterogeneous: it may include structured, semi-structured, and unstructured formats. In addition, the pay-as-you-go nature of a dataspace requires a range of search and query services, from keyword queries to structured queries. At first, when little information has been extracted and no semantic associations exist between data sources, the system may offer only basic keyword search; over time, users and the system gradually build up more schema and semantic association information, and the system can support richer query styles. Unlike traditional indexing methods, an indexing method for dataspaces must therefore index data in multiple formats while supporting several query styles, including keyword queries and structured queries.
After analyzing existing data models and queries, this thesis adopts the iMeMex data model as the dataspace data model and defines three query types: keyword queries, predicate queries, and path queries. On this basis it proposes a new indexing method, called the EIBH hybrid index, to improve search and query efficiency over heterogeneous data in a dataspace. The new index consists of an extended inverted list and two auxiliary indexes. It indexes resource views by extending the keyword column and the linked-list node information of the inverted list, which supports the three query types and improves query processing efficiency; the two auxiliary indexes address the low efficiency of index joins. Experimental results show that the method is an effective and feasible solution to the indexing and query efficiency problem for heterogeneous data in a dataspace.
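The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea: one inverted list whose keys are extended with a kind tag, plus two auxiliary maps, answering keyword, predicate, and path queries. It is not the thesis's EIBH index. The `ResourceView` fields, the key layout, and the choice of parent and child maps as the two auxiliary indexes are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceView:
    """Hypothetical, simplified stand-in for an iMeMex resource view."""
    rid: int
    name: str
    attributes: dict = field(default_factory=dict)
    content: str = ""
    parent: Optional[int] = None


class HybridIndex:
    """Illustrative hybrid index (assumed design, not the EIBH structure)."""

    def __init__(self):
        # Inverted list with extended keys: ("name", n), ("content", word),
        # or ("attr", "key=value"), each mapping to a set of view ids.
        self.postings = defaultdict(set)
        # Assumed auxiliary index 1: child id -> parent id.
        self.parent_of = {}
        # Assumed auxiliary index 2: parent id -> set of child ids.
        self.children_of = defaultdict(set)

    def add(self, view):
        self.postings[("name", view.name.lower())].add(view.rid)
        for word in view.content.lower().split():
            self.postings[("content", word)].add(view.rid)
        for key, value in view.attributes.items():
            self.postings[("attr", f"{key}={value}".lower())].add(view.rid)
        if view.parent is not None:
            self.parent_of[view.rid] = view.parent
            self.children_of[view.parent].add(view.rid)

    def _lookup(self, kind, term):
        # Use .get so that queries never create empty posting lists.
        return set(self.postings.get((kind, term.lower()), set()))

    def keyword(self, word):
        """Keyword query: match view content or view name."""
        return self._lookup("content", word) | self._lookup("name", word)

    def predicate(self, key, value):
        """Predicate query: exact match on one attribute."""
        return self._lookup("attr", f"{key}={value}")

    def path(self, names):
        """Path query: a chain of direct parent/child view names."""
        current = self._lookup("name", names[0])
        for name in names[1:]:
            wanted = self._lookup("name", name)
            # Follow child links instead of joining two posting lists.
            current = {c for p in current
                       for c in self.children_of.get(p, ()) if c in wanted}
        return current


index = HybridIndex()
index.add(ResourceView(1, "projects"))
index.add(ResourceView(2, "paper.tex", {"size": 4096},
                       "dataspace index survey", parent=1))
index.add(ResourceView(3, "notes.txt", {"owner": "li"},
                       "index ideas", parent=1))

print(index.keyword("index"))
print(index.predicate("owner", "li"))
print(index.path(["projects", "paper.tex"]))
```

In this sketch the child map lets a path step follow links directly from the current candidates. That is one plausible way an auxiliary structure could avoid a costly join between posting lists, but the abstract does not say how the thesis's two auxiliary indexes actually work.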
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3




Article ID: 2115169


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2115169.html

