An Internet of Vehicles Data Processing System Based on Hadoop and the C4.5 Algorithm

Published: 2018-06-10 00:04

Topic: Hadoop + Internet of Vehicles; Source: master's thesis, Jiangsu University, 2017

[Abstract]: With the growth of the national economy and the acceleration of urbanization in China, the automobile has become a household necessity. Modern vehicles are fitted with Electronic Control Units (ECUs), which collect a variety of sensor data such as vehicle speed, accelerator pedal position signal, and engine speed. These data are transmitted over the Internet of Vehicles (IoV) to a data center and stored there. Sensor data are large in volume and unstructured, which makes them difficult to store and analyze; doing both effectively has become one of the major challenges facing IoV companies. Advances in cloud computing and big data offer an opportunity to store and analyze IoV data at scale. Building on the Hadoop big data platform and its ecosystem, this thesis uses the HBase distributed database to store large volumes of IoV sensor data, and uses MapReduce together with an optimized C4.5 algorithm to analyze them efficiently. The main work is as follows:

1. Design of an HBase-based IoV data management system. HBase stores the vehicle operating-condition parameters collected by the sensors. The work covers the database design; the interface functions for storing and querying data; a secondary index that supports multi-condition queries; integration with Hive to provide a SQL engine; MapReduce-based data migration; and a web-based data management system.

2. Optimization of the C4.5 algorithm. Based on the characteristics of C4.5, Taylor's theorem (the "Taylor mean value theorem" in the original) is used to simplify the attribute selection measure, which avoids logarithm operations, lowers the computational complexity, and improves efficiency. The optimized C4.5 algorithm is then parallelized on MapReduce to further improve its running efficiency. Features are extracted from the IoV data, and the optimized C4.5 algorithm classifies vehicle acceleration performance, producing decision-tree classification rules for judging acceleration performance.

3. Construction and testing of the system platform. A test platform is built on Hadoop and HBase. The data operation performance of HBase is compared with that of SQL Server; the parallel running efficiency of feature extraction is measured; and the efficiency and accuracy of the optimized C4.5 algorithm are verified on the feature-extracted data set.

The test results show that, compared with SQL Server, both the read and the write efficiency of HBase in the system are markedly higher; that the running efficiency of numerical feature extraction increases by multiples as the number of cluster nodes grows; and that, compared with the original C4.5 algorithm, the optimized C4.5 algorithm improves classification efficiency without reducing classification accuracy.
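The log-free attribute selection measure mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which expansion the thesis uses, so the code below makes an assumption: it replaces ln(p) by its first-order Taylor expansion around p = 1, ln(p) ≈ p - 1, which turns the entropy term -Σ p·log2(p) into Σ p·(1 - p) / ln 2 (proportional to the Gini impurity). The function names, the toy rows, and the attribute names `pedal`, `rpm_band`, and `accel` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

LN2 = math.log(2)


def entropy_exact(labels):
    """Shannon entropy in bits, computed with log2 as in standard C4.5."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_approx(labels):
    """Log-free entropy (ASSUMED form, not confirmed by the source).

    Uses ln(p) ~= p - 1, so -sum(p * ln p) / ln 2 ~= sum(p * (1 - p)) / ln 2.
    """
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values()) / LN2


def gain_ratio(rows, attr, target, entropy):
    """C4.5 gain ratio of a categorical attribute under the given entropy."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Weighted entropy of the target after splitting on attr.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    # Split information is the entropy of the attribute's own values.
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attrs, target, entropy):
    """Return the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, a, target, entropy))


# Hypothetical toy data: pedal position fully determines the label,
# while the engine-speed band carries no information about it.
rows = [
    {"pedal": "high", "rpm_band": "a", "accel": "good"},
    {"pedal": "high", "rpm_band": "b", "accel": "good"},
    {"pedal": "low", "rpm_band": "a", "accel": "poor"},
    {"pedal": "low", "rpm_band": "b", "accel": "poor"},
]
attrs = ["pedal", "rpm_band"]

print(best_attribute(rows, attrs, "accel", entropy_exact))
print(best_attribute(rows, attrs, "accel", entropy_approx))
```

On this toy data both measures select `pedal`. The approximate entropy values differ from the exact ones, but the ranking of attributes is what decides the split; whether the ranking is preserved on real data depends on the data and on the expansion actually used in the thesis.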
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13

