

Research and Implementation of a Big Data Analysis and Processing System Based on Fuzzy Queries

Published: 2018-05-07 06:54

Topics: online aggregation + samples; Source: Zhejiang University, master's thesis, 2017


[Abstract]: As big data analysis technology matures, the value contained in big data is receiving more and more attention. Because the data volumes are huge, analyzing big data is generally very time-consuming. In many cases, however, users do not need exact query results: a rough profile of the data satisfies most analysis needs. This thesis studies and implements a big data analysis and processing system based on fuzzy (approximate) queries. The system defines a set of query interfaces through which users can issue various aggregate queries (Group By), and it returns an approximate result for each query. The system can return approximate results over hundreds of gigabytes of data within seconds.

Because online aggregation can quickly produce a profile of the data, this thesis applies online aggregation in the system. Moreover, the result sets of successive queries in the system overlap, so saving the samples collected and the intermediate results computed for queries already processed can speed up later queries. On this basis, the thesis optimizes online aggregation as follows. First, the data set is randomized to produce a random data set, so that scanning it sequentially has the same effect as sampling the original data set at random. Next, user queries are processed with online aggregation. While producing query results, online aggregation stores the samples it has obtained and the intermediate results it has produced in a sample management tree. Accordingly, each user query is first processed against this tree; only when the results found in the tree cannot meet the user's requirements does the system read more data from the data source. In this way, the samples and intermediate results obtained by online aggregation can be reused effectively across multiple queries. The thesis also provides a method for combining multiple intermediate results into the final query result.

Finally, experiments on the TPC-H benchmark verify the effectiveness of the system designed and implemented in this thesis.
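The randomize-then-scan idea and the online aggregation step described in the abstract can be illustrated with a minimal Python sketch. It assumes the whole table fits in a list of rows and estimates SUM(value) GROUP BY key. The function names `randomize` and `online_group_sum`, the batch size, and the CLT-based 95% confidence interval with a finite-population correction (a standard choice in online aggregation, not something the abstract specifies) are all illustrative assumptions, not the thesis's implementation.

```python
import math
import random


def randomize(rows, seed=42):
    """Hypothetical preprocessing: return a uniformly shuffled copy of rows.

    After a uniform shuffle, every prefix of the result is a simple random
    sample without replacement, so a sequential scan doubles as sampling.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def online_group_sum(shuffled, key, value, batch_size=1000, z=1.96):
    """Yield (rows_seen, {group: (estimate, half_width)}) after each batch.

    Estimates SUM(value) GROUP BY key over all N rows by scaling up the
    sample mean of x_i = value(row) if key(row) == g else 0. The half-width
    is a CLT-based interval with finite-population correction (1 - n/N),
    so it shrinks to 0 once the whole table has been scanned.
    """
    n_total = len(shuffled)
    sums, sumsq = {}, {}  # per group: sum of x_i and of x_i**2 over seen rows
    for start in range(0, n_total, batch_size):
        for row in shuffled[start:start + batch_size]:
            g, v = key(row), value(row)
            sums[g] = sums.get(g, 0.0) + v
            sumsq[g] = sumsq.get(g, 0.0) + v * v
        n = min(start + batch_size, n_total)
        estimates = {}
        for g, s in sums.items():
            mean = s / n
            # Sample variance of x_i (rows outside group g contribute x_i = 0).
            var = max((sumsq[g] - n * mean * mean) / (n - 1), 0.0) if n > 1 else 0.0
            fpc = 1.0 - n / n_total
            half_width = z * n_total * math.sqrt(var * fpc / n)
            estimates[g] = (n_total * mean, half_width)
        yield n, estimates
```

Each yielded snapshot could be shown to the user immediately, and the user can stop as soon as the intervals are tight enough. Because the shuffle is done once, offline, every later query pays only for a sequential scan rather than for random I/O.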
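The reuse of samples and intermediate results across queries can be sketched as follows. The thesis keeps them in a sample management tree whose internal structure the abstract does not describe, so this sketch substitutes a flat, hypothetical `SampleStore`: a list of samples already read plus a dictionary of cached partial aggregates keyed by a query signature. It assumes the source data has already been shuffled, so the samples read so far are always a prefix of it, and that partial aggregates built from disjoint runs of samples can be combined by adding their row counts and sums. The names `Partial`, `SampleStore`, and `query` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Partial:
    """Mergeable partial aggregate for SUM(value) GROUP BY key (hypothetical)."""
    rows: int = 0                              # sampled rows this partial covers
    sums: dict = field(default_factory=dict)   # group -> sum of values

    def add(self, group, value):
        self.rows += 1
        self.sums[group] = self.sums.get(group, 0) + value

    def merge(self, other):
        """Combine two partials built from disjoint sample segments."""
        merged = Partial(self.rows + other.rows, dict(self.sums))
        for group, s in other.sums.items():
            merged.sums[group] = merged.sums.get(group, 0) + s
        return merged

    def estimate(self, n_total):
        """Scale the sample sums up to a table of n_total rows."""
        return {g: n_total * s / self.rows for g, s in self.sums.items()}


class SampleStore:
    """Flat stand-in for the sample management tree (hypothetical)."""

    def __init__(self, shuffled_source):
        self.source = shuffled_source  # pre-shuffled data, read sequentially
        self.samples = []              # prefix of the source already read
        self.partials = {}             # query signature -> cached Partial
        self.source_reads = 0          # rows fetched from the data source

    def _ensure_samples(self, n):
        # Go to the data source only for rows not already held as samples.
        n = min(n, len(self.source))
        if n > len(self.samples):
            fresh = self.source[len(self.samples):n]
            self.samples.extend(fresh)
            self.source_reads += len(fresh)

    def query(self, signature, key, value, n_needed):
        """Answer from cached state first; aggregate only the missing samples."""
        self._ensure_samples(n_needed)
        n = min(n_needed, len(self.samples))
        cached = self.partials.get(signature, Partial())
        if cached.rows >= n:
            return cached
        delta = Partial()
        for row in self.samples[cached.rows:n]:
            delta.add(key(row), value(row))
        result = cached.merge(delta)
        self.partials[signature] = result
        return result
```

With this pattern, a repeated query that asks for more samples aggregates only the rows past its cached prefix and merges the two partial results, while a query with a different signature reuses the samples already in memory without touching the data source again.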
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13



Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1855864.html

