An Internet of Vehicles Data Processing System Based on Hadoop and the C4.5 Algorithm
Topic: Hadoop + Internet of Vehicles; Source: Jiangsu University, master's thesis, 2017
[Abstract]: With the growth of the national economy and the acceleration of urbanization in China, the automobile has become an everyday necessity for ordinary households. Modern vehicles are fitted with electronic control units (ECUs), which collect a wide range of sensor data such as vehicle speed, accelerator pedal position, and engine speed. These data are transmitted over the Internet of Vehicles (IoV) to a data center and stored there. Because the sensor data are both voluminous and unstructured, they are difficult to store and analyze, and doing so effectively has become one of the major challenges facing IoV companies. Advances in cloud computing and big data offer an opportunity to address this problem. Building on the Hadoop big data platform and its ecosystem, this thesis uses the HBase distributed database to store large volumes of IoV sensor data, and uses MapReduce together with an optimized C4.5 algorithm to analyze them efficiently. The main work is as follows:
1. Design of an HBase-based IoV data management system. HBase stores the vehicle operating-condition parameters collected by the sensors. The work covers the database design; the design of interface functions for storing and querying data; a secondary index that supports multi-condition queries; integration with Hive to provide an SQL engine; MapReduce-based data migration; and a web-based data management system.
2. Optimization of the C4.5 algorithm. Based on the characteristics of C4.5, the Taylor mean value theorem is used to simplify the algorithm's attribute selection measure. This avoids logarithm operations, reduces computational complexity, and improves efficiency. The optimized algorithm is then parallelized with MapReduce to improve its running efficiency further. Features are extracted from the IoV data, and the optimized C4.5 algorithm classifies vehicle acceleration performance, producing decision-tree classification rules for judging acceleration performance.
3. System construction and testing. A test platform is built on Hadoop and HBase. The data operation performance of HBase is compared with that of SQL Server; the parallel running efficiency of feature extraction is measured; and the efficiency and accuracy of the optimized C4.5 algorithm are verified on the data set obtained after feature extraction.
The test results show that, compared with SQL Server, HBase in this system reads and writes markedly faster; that the running efficiency of numerical feature extraction increases multiplicatively as the number of cluster nodes grows; and that, compared with the original C4.5 algorithm, the optimized C4.5 algorithm improves classification efficiency without reducing classification accuracy.
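The abstract does not give the simplified attribute selection measure itself, so the following is only a minimal sketch of one common way to remove logarithms from it, and it may differ from the thesis's actual derivation. It assumes the first-order Taylor approximation ln(p) ≈ p - 1 near p = 1, which turns the entropy term -Σ p ln p into Σ p(1 - p). The function names and the tiny data set are hypothetical and chosen for illustration.

```python
from collections import Counter
import math


def entropy_exact(items):
    """Shannon entropy in bits, as used by standard C4.5 (requires log2)."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def entropy_log_free(items):
    """Log-free surrogate (an assumption, not the thesis's stated formula).

    With ln(p) ~ p - 1, -sum(p * ln p) ~ sum(p * (1 - p)). The constant
    factor 1/ln(2) is dropped because it cancels in the gain ratio.
    """
    n = len(items)
    return sum((c / n) * (1 - c / n) for c in Counter(items).values())


def gain_ratio(values, labels, impurity):
    """C4.5-style gain ratio of one discrete attribute under a given impurity."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted impurity of the partitions induced by the attribute.
    remainder = sum(len(g) / n * impurity(g) for g in groups.values())
    gain = impurity(labels) - remainder
    # Split information: impurity of the attribute's own value distribution.
    split_info = impurity(values)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy records: acceleration class and two discrete attributes.
    labels = ["fast", "fast", "slow", "slow", "fast", "slow"]
    throttle = ["high", "high", "low", "low", "high", "low"]
    gear = ["a", "b", "a", "b", "a", "b"]
    for impurity in (entropy_exact, entropy_log_free):
        print(impurity.__name__,
              gain_ratio(throttle, labels, impurity),
              gain_ratio(gear, labels, impurity))
```

On this toy data both measures rank `throttle` above `gear`, which is the property such a simplification relies on: the ranking of attributes should be preserved even though the numeric scores differ.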
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
Article ID: 2001290
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2001290.html