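Research on Distributed Storage and Analysis of Massive Urban Traffic Flow Data Based on Hadoop
Published: 2018-04-07 17:38
Topic: intelligent transportation  Focus: Hadoop  Source: Yangzhou University, master's thesis, 2015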
[Abstract]: With the rapid development of intelligent transportation infrastructure and the steady rise in urban residents' incomes, urban car ownership has increased sharply. Induction loops and checkpoint cross-section systems installed across city roads can collect, record, aggregate, and upload monitoring data in a timely manner. However, because urban road traffic flow data is large in volume and highly real-time, traditional data storage and processing technologies have several problems: data structures and storage capacity cannot be extended flexibly, distributed parallel data mining is difficult, and fault-tolerant recovery is poor. How to upload, aggregate, store, and use massive traffic flow data in real time, and how to perform statistical mining on it, has become a major challenge. Big data technology, represented by Hadoop, has become one of the effective means of addressing these problems. In response to the prominent data storage and analysis problems arising from current urban traffic development, this thesis studies Hadoop-based big data technologies such as MapReduce and HBase and proposes corresponding solutions. The main research work and results are as follows:
(1) An overall architecture for Hadoop-based traffic flow data storage and analysis is proposed. The architecture is divided into five layers: the data acquisition layer, hardware platform layer, data storage and computing layer, mining and analysis layer, and application service layer. An availability scheme is also studied and designed that gives the Hadoop cluster strong fault-tolerant recovery when a node fails or goes down.
(2) A distributed storage scheme for massive traffic flow data based on HBase is proposed. Based on the characteristics of traffic flow data and the needs of processing applications, a row key structure for the traffic flow data table is designed to solve the "hotspot" problem. HBase coprocessors are also studied, and a secondary index table is designed for fast data retrieval in column-based queries.
(3) Based on the relationship between traffic flow and density, a flow and density calculation model is designed, and a parallel MapReduce implementation of the flow and density calculation is proposed, which solves the problem of computing flow and density quickly over massive traffic flow data. In addition, the K-nearest-neighbor (KNN) nonparametric regression algorithm is used to predict short-term traffic flow. Through the selection and study of the KNN state vector, the distance metric, the number of neighbors, and the prediction algorithm, a parallel MapReduce implementation of KNN short-term traffic flow prediction is proposed, which speeds up the nearest-neighbor search and enables scheduled prediction of short-term traffic flow.
(4) Finally, according to the application-layer requirements of the overall architecture, an urban road traffic flow data analysis system is built and implemented on the Hadoop platform. The functional modules of the system are designed in detail, and functions such as real-time monitoring of traffic flow and graphical display of massive data analysis are implemented.
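The MapReduce-based KNN prediction described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the state vector (a few recent flow readings), the Euclidean distance, the unweighted average used as the prediction, and all function names (`map_partition`, `reduce_neighbors`, `predict`) are assumptions chosen for illustration, and plain Python lists stand in for the data splits a real Hadoop job would process.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# One historical sample: (state vector of recent flow readings,
# flow observed in the following interval). Layout is an assumption.
Sample = Tuple[Sequence[float], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_partition(partition: List[Sample], query: Sequence[float],
                  k: int) -> List[Tuple[float, float]]:
    """Map step: the k nearest (distance, next_flow) pairs in one partition."""
    scored = ((euclidean(state, query), next_flow)
              for state, next_flow in partition)
    return heapq.nsmallest(k, scored)


def reduce_neighbors(local_results: List[List[Tuple[float, float]]],
                     k: int) -> float:
    """Reduce step: merge local candidates, keep the global k nearest,
    and predict the next flow as their unweighted mean."""
    merged = [pair for result in local_results for pair in result]
    nearest = heapq.nsmallest(k, merged)
    if not nearest:
        raise ValueError("no historical samples to search")
    return sum(flow for _, flow in nearest) / len(nearest)


def predict(partitions: List[List[Sample]], query: Sequence[float],
            k: int) -> float:
    """Run the map step on every partition, then a single reduce step."""
    return reduce_neighbors([map_partition(p, query, k) for p in partitions], k)


if __name__ == "__main__":
    # Hypothetical data: two partitions of historical samples.
    history = [
        [([10, 12, 14], 16.0), ([50, 52, 54], 56.0)],
        [([11, 13, 15], 17.0), ([100, 100, 100], 100.0)],
    ]
    print(predict(history, [10, 12, 15], k=2))
```

Each partition only forwards its own k best candidates, so the reduce step merges at most k entries per partition instead of scanning the full history. The global k nearest neighbors are always among those local candidates, which is why the split search gives the same result as a single pass.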
[Degree-granting institution]: Yangzhou University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: U495; TP311.13
Article ID: 1720181
Article link: http://sikaile.net/kejilunwen/daoluqiaoliang/1720181.html