Research and Application of Cloud-Computing-Based Storage and Mining Methods for Massive Spatio-Temporal Data
Published: 2018-06-02 13:14
Topics: data mining + cloud computing; Source: Hangzhou Dianzi University, master's thesis, 2014
[Abstract]: In recent years, more and more applications have been collecting large volumes of spatio-temporal data and storing them in distributed databases, which has steadily increased the demand for spatio-temporal data mining. In public security traffic management, traffic flow data is growing sharply and has pronounced spatio-temporal characteristics, which makes processing massive spatio-temporal data a serious challenge. Traditional processing methods can no longer meet user needs in storage space or computing efficiency as the volume of data to be analyzed grows, so a platform that supports storage and analysis of massive data is needed.
Spatio-temporal anomaly detection is an important branch of spatio-temporal data mining. To address the limitations of traditional processing methods in spatio-temporal anomaly detection, this thesis designs and implements a big data storage and analysis platform. The main research content and contributions are as follows:
(1) The thesis analyzes the technical principles of Hadoop, HBase, Hive, and ZooKeeper on a cloud platform. It studies the principles of HDFS and the MapReduce programming model in the Hadoop framework, with a focus on the underlying implementation of the HBase distributed database's storage architecture and the data model of HBase tables. On this basis, it builds a cloud platform based on Hadoop, HBase, Hive, and ZooKeeper, and sets up an extended HBase+Hive system architecture.
(2) The thesis studies spatio-temporal anomaly detection methods in depth and analyzes several existing spatio-temporal anomaly patterns; valuable knowledge is obtained by mining predefined anomaly patterns. It proposes a four-step, cloud-platform-based spatio-temporal anomaly detection method (data preprocessing, distributed anomaly detection, application of knowledge rules, and result verification) for mining predefined spatio-temporal anomaly patterns, and validates the method with a real application on traffic data streams. Experiments show that the method runs efficiently and produces correct results.
(3) The thesis studies HBase row key design and proposes a row-key-based data model. With the design goals made explicit, row keys are used to design a secondary index table and a replica recovery table. This yields a distributed secondary index on HBase, which is applied to traffic flow vehicle-passing data. Experiments show that the indexing mechanism supports efficient queries over massive data.
(4) Building on the work above, the thesis designs and implements a big data storage and analysis platform consisting of the cloud platform, back-end programs, and a front-end display system. The real spatio-temporal anomaly detection application is integrated into the platform so that users can operate it conveniently and view the results.
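The row-key-based secondary index from the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual row key layout, so everything named here is a hypothetical illustration: the `SortedTable` class, the `main_key` and `index_key` formats, and the sample checkpoint and plate values. An in-memory sorted list stands in for HBase's lexicographically ordered row keys; no real HBase client is used.

```python
import bisect


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in lexicographic order
        self._rows = {}  # row key -> dict of column values

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan_prefix(self, prefix):
        """Yield (row_key, columns) for every key that starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # matching keys are contiguous in sorted order
            yield key, self._rows[key]


def main_key(checkpoint_id, timestamp, plate):
    # Hypothetical main-table layout: checkpoint | time | plate
    return f"{checkpoint_id}|{timestamp}|{plate}"


def index_key(plate, timestamp, checkpoint_id):
    # Hypothetical index layout: the plate leads, so one prefix scan
    # finds every passing record of a given vehicle in time order.
    return f"{plate}|{timestamp}|{checkpoint_id}"


def record_passing(main, index, checkpoint_id, timestamp, plate, speed_kmh):
    """Write the data row, then the index row that points back to it."""
    mk = main_key(checkpoint_id, timestamp, plate)
    main.put(mk, {"speed_kmh": speed_kmh})
    index.put(index_key(plate, timestamp, checkpoint_id), {"main_key": mk})


def find_by_plate(main, index, plate):
    """Look up a vehicle via the index table instead of a full table scan."""
    return [
        (entry["main_key"], main.get(entry["main_key"]))
        for _, entry in index.scan_prefix(plate + "|")
    ]


main_table, index_table = SortedTable(), SortedTable()
record_passing(main_table, index_table, "CP001", "20140301083000", "ZJA12345", 62)
record_passing(main_table, index_table, "CP002", "20140301084500", "ZJA12345", 71)
record_passing(main_table, index_table, "CP001", "20140301083010", "ZJB67890", 55)

for key, row in find_by_plate(main_table, index_table, "ZJA12345"):
    print(key, row)
```

Fixed-width timestamps keep lexicographic order equal to chronological order. In this sketch a query on a non-leading attribute of the main key costs one prefix scan on the index table plus point reads on the main table.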
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP333; TP311.13
Article ID: 1968888
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1968888.html