面向圖數(shù)據(jù)的Top-k檢索算法研究

發(fā)布時間：2018-08-20 15:44

【摘要】：近年來隨著社交網(wǎng)絡(luò)、知識圖譜等應(yīng)用的飛速發(fā)展,圖數(shù)據(jù)大量出現(xiàn)在學(xué)術(shù)界和工業(yè)界,如何有效管理圖數(shù)據(jù)并從中挖掘有價值的信息已經(jīng)成為當(dāng)前數(shù)據(jù)管理領(lǐng)域的研究熱點(diǎn)。其中面向圖數(shù)據(jù)的Top-k檢索問題廣泛存在于在各類應(yīng)用中,旨在從圖中找出與查詢最相關(guān)的k個匹配結(jié)果,是圖數(shù)據(jù)管理的一個重要研究問題。針對當(dāng)前圖數(shù)據(jù)的數(shù)據(jù)規(guī)模大、節(jié)點(diǎn)和邊的種類多、內(nèi)容動態(tài)變化等特點(diǎn),本研究重點(diǎn)關(guān)注面向圖數(shù)據(jù)的Top-k檢索中三個關(guān)鍵問題:分布式環(huán)境的Top-k子圖檢索、多數(shù)據(jù)源的Top-k子圖檢索和動態(tài)數(shù)據(jù)的持續(xù)相關(guān)Top-k檢索,歸納總結(jié)已有工作的優(yōu)點(diǎn)和不足,提出相應(yīng)改進(jìn)算法。本研究的創(chuàng)新點(diǎn)主要有:·提出基于星形查詢拆分的分布式Top-k子圖檢索算法。算法基于星形子圖拆分策略,包含對原始查詢進(jìn)行子查詢拆分、檢索子查詢匹配結(jié)果、合并子查詢匹配結(jié)果三個主要步驟。本研究基于算法停止條件的相關(guān)特點(diǎn)對算法子查詢執(zhí)行和合并子查詢匹配結(jié)果過程進(jìn)行了三方面的優(yōu)化,進(jìn)一步降低了算法的跨節(jié)點(diǎn)數(shù)據(jù)傳輸開銷。實(shí)驗(yàn)結(jié)果表明:算法具有很好的可擴(kuò)展性,優(yōu)化后算法執(zhí)行時間比已有方法平均低30%-40%。·提出面向多數(shù)據(jù)源Top-k子圖檢索的排序連接算法。算法包含構(gòu)造來自不同數(shù)據(jù)源的匹配結(jié)果列表、連接結(jié)果列表并計(jì)算所得完整結(jié)果的分?jǐn)?shù)和判斷算法停止條件是否滿足三個主要步驟。針對已有方法計(jì)算尚未訪問結(jié)果得分上界松弛的問題,本研究提出優(yōu)化的匹配結(jié)果列表排序函數(shù)和基于最小下降分?jǐn)?shù)的結(jié)果列表讀取策略,并從理論上證明了排序函數(shù)最優(yōu)參數(shù)配置應(yīng)該滿足的條件,以及優(yōu)化讀取策略產(chǎn)生的結(jié)果訪問順序最優(yōu)。實(shí)驗(yàn)結(jié)果表明:本研究提出的排序連接算法比已有工作開銷低50%到70%�！ぬ岢龌趧討B(tài)區(qū)間窗口劃分的持續(xù)相關(guān)Top-k檢索算法。算法基于從時間維度對問題空間進(jìn)行單位時間區(qū)間劃分的思想,在每個區(qū)間執(zhí)行Top-k檢索,再歸并計(jì)算持續(xù)相關(guān)Top-k結(jié)果。針對單位區(qū)間劃分可擴(kuò)展性差的問題,本研究進(jìn)一步提出基于動態(tài)區(qū)間窗口劃分的算法,檢索過程中動態(tài)維護(hù)時間窗口,避免不必要區(qū)間劃分,減少了子問題的數(shù)量,同時合并不同區(qū)間重復(fù)的計(jì)算子過程,提升了算法的空間和時間效率。實(shí)驗(yàn)結(jié)果表明:基于動態(tài)區(qū)間窗口劃分算法在消耗更少內(nèi)存的情況下,執(zhí)行時間比已有方法低50%以上。
[Abstract]:In recent years, with the rapid development of social network, knowledge map and other applications, graph data appear in academia and industry. How to effectively manage graph data and mine valuable information has become a research hotspot in the field of data management. The problem of graph data oriented Top-k retrieval is widely used in all kinds of applications. It is an important research problem in graph data management to find k matching results that are most relevant to query from graph. In view of the characteristics of large scale of graph data, variety of nodes and edges and dynamic change of content, this paper focuses on three key problems in Top-k retrieval for graph data: Top-k subgraph retrieval in distributed environment. The Top-k subgraph retrieval of multiple data sources and the continuous correlated Top-k retrieval of dynamic data are summarized, and the advantages and disadvantages of existing work are summarized, and the corresponding improved algorithm is proposed. The main innovations of this study are as follows: a distributed Top-k subgraph retrieval algorithm based on star query splitting is proposed. The algorithm is based on star subgraph splitting strategy, which consists of three main steps: subquery splitting, subquery matching, merging subquery matching. Based on the characteristics of the algorithm stopping condition, this paper optimizes the performance of the algorithm subquery and the matching process of the merged sub-query, and further reduces the cross-node data transmission overhead of the algorithm. The experimental results show that the algorithm has good scalability and the execution time of the optimized algorithm is 30 to 40 less than that of the existing methods. A sorting join algorithm for multi-data source Top-k subgraph retrieval is proposed. The algorithm consists of constructing a list of matching results from different data sources, connecting the result list, calculating the score of the resulting complete result and judging whether the stopping condition of the algorithm meets the three main steps. To solve the problem of calculating the upper bound relaxation of unvisited results by existing methods, this paper proposes an optimized matching result list sorting function and a result list reading strategy based on the minimum descent score. It is proved theoretically that the optimal parameter configuration of the sort function should satisfy the conditions and the optimal access order of the results generated by the optimized reading strategy. The experimental results show that the proposed sorting connection algorithm is 50% to 70% lower than the existing overhead, and a continuous correlation Top-k retrieval algorithm based on dynamic interval window partition is proposed. Based on the idea of dividing the problem space into unit time intervals from the time dimension, the algorithm performs Top-k retrieval in each interval, and then merges and calculates the sustained correlation Top-k results. In order to solve the problem of poor scalability of unit interval partitioning, this paper proposes an algorithm based on dynamic interval window partitioning, which maintains the time window dynamically in the retrieval process, avoids unnecessary interval partitioning, and reduces the number of sub-problems. At the same time, the computation subprocesses with different interval repetition are combined to improve the space and time efficiency of the algorithm. The experimental results show that the execution time of the algorithm based on dynamic interval window partition is more than 50% lower than that of the existing methods under the condition of less memory consumption.
【學(xué)位授予單位】：清華大學(xué)
【學(xué)位級別】：博士
【學(xué)位授予年份】：2016
【分類號】：TP391.3

【相似文獻(xiàn)】

相關(guān)期刊論文前10條

1 秦天增,王紅玉;二分檢索算法改進(jìn)初探[J];運(yùn)城高等�？茖W(xué)校學(xué)報;2002年03期

2 姜敏;張?jiān)?董道毅;Tzyh-Jong Tarn;;量子二分檢索算法及其實(shí)現(xiàn)線路[J];光電子.激光;2007年08期

3 郭立新;吳，

本文編號：2194158

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會員下載

Download by Member

本文鏈接：http://sikaile.net/shoufeilunwen/xxkjbs/2194158.html

上一篇：面向云計(jì)算的多維數(shù)據(jù)索引研究
下一篇：基于側(cè)邊拋磨光纖的表面等離子體共振折射率傳感器

論文發(fā)表

·知網(wǎng)|萬方|維普|龍?jiān)磡省級|國家級|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

面向圖數(shù)據(jù)的Top-k檢索算法研究