Top-k Query Processing over Uncertain Data Streams

Published: 2019-03-16 10:13

[Abstract]: Uncertain data is pervasive across the information society, in fields including finance, the military, location-based services, medical care, and meteorology. With the rapid spread of the mobile Internet and the steady arrival of new data acquisition technologies, the volume of uncertain data is growing sharply. Uncertain data management has therefore drawn attention from researchers in both academia and industry. Data uncertainty arises in relational data, semi-structured data, data streams, and multidimensional data. This thesis studies Top-k query processing over uncertain data streams. An uncertain data stream is a massive sequence of uncertain tuples arriving at high speed. The main processing difficulties are: (1) the stream arrives extremely fast and must be processed promptly; (2) the data volume is potentially unbounded, so it is often impossible to keep all of the data in memory; (3) because probabilities are involved, efficient optimization algorithms are needed to reduce the computational cost. Although academia has accumulated many research results, existing methods still have limitations in specific scenarios, so new techniques for managing uncertain data streams are urgently needed. This thesis proposes a new approximate query algorithm for uncertain data streams that handles ER-Topk and TTk queries. In addition, to improve both stream throughput and query response, we design a general query processing framework for uncertain data streams. The main contributions are as follows.

Approximate query algorithm for massive data streams: The algorithm addresses the excessive storage consumption that current approaches incur when answering ER-Topk and TTk queries over uncertain data streams. It filters arriving uncertain tuples effectively, reducing the processing load while keeping accuracy under control, and thereby improves overall system performance.

Real-time uncertain data stream processing framework: Building on the approximate algorithm, the thesis proposes a batch-processing framework for ER-Topk and TTk queries over data streams. The framework uses parallel processing to achieve high-throughput handling of continuously and rapidly arriving data.

Data stream error detection: Uncertain data streams often contain erroneous information caused by a variety of factors. To keep erroneous data from seriously distorting query results, the thesis proposes an error detection method that identifies anomalies by analyzing the characteristics of the data.

Validation of the framework: The proposed approximate algorithm and framework target ER-Topk and TTk queries over uncertain data streams. To verify their data throughput, reliability, and query response speed, the thesis designs several experimental strategies and evaluates actual performance on both synthetic and real data.
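The abstract does not define ER-Topk. Assuming it refers to the expected-rank Top-k semantics found in the uncertain-data literature, and assuming a tuple-level model in which each tuple exists independently with a given probability, the query can be illustrated with the minimal sketch below. The names `UncertainTuple`, `expected_ranks`, and `er_topk` are hypothetical. This is an exact computation over a small in-memory batch, not the thesis's streaming approximation algorithm, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UncertainTuple:
    tid: str      # tuple identifier (hypothetical field name)
    score: float  # ranking score, higher is better
    prob: float   # probability that the tuple exists


def expected_ranks(tuples: List[UncertainTuple]) -> List[Tuple[str, float]]:
    """Expected rank of every tuple, assuming tuples exist independently.

    Assumed rank definition: in a possible world, a present tuple's rank is
    the number of present tuples with a strictly higher score; an absent
    tuple's rank is the number of tuples present in that world.
    """
    total = sum(t.prob for t in tuples)
    ordered = sorted(tuples, key=lambda t: t.score, reverse=True)
    result = []
    higher = 0.0  # probability mass of tuples with strictly higher score
    i = 0
    while i < len(ordered):
        # Group tuples that share the same score so ties are not counted
        # as "strictly higher" than each other.
        j = i
        while j < len(ordered) and ordered[j].score == ordered[i].score:
            j += 1
        group = ordered[i:j]
        for t in group:
            present = t.prob * higher                   # tuple exists
            absent = (1.0 - t.prob) * (total - t.prob)  # tuple missing
            result.append((t.tid, present + absent))
        higher += sum(t.prob for t in group)
        i = j
    return result


def er_topk(tuples: List[UncertainTuple], k: int) -> List[Tuple[str, float]]:
    """Return the k tuples with the smallest expected rank."""
    return sorted(expected_ranks(tuples), key=lambda pair: pair[1])[:k]
```

For example, given tuples a (score 10, probability 0.5), b (score 8, probability 1.0), and c (score 5, probability 0.8), the expected ranks are 0.9, 0.5, and 1.5, so `er_topk(..., 2)` returns b followed by a. The certain tuple b outranks the higher-scoring but less likely a, which is the behavior that separates this semantics from ranking by score alone.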
[Degree-granting institution]: East China Normal University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP311.13


Article ID: 2441163


Article link: http://sikaile.net/shoufeilunwen/xixikjs/2441163.html
