Hadoop-Based Quality Analysis and Processing of Taxi Data

Published: 2018-01-22 13:54

  Keywords: Hadoop; data quality; data cleaning; stopping points. Source: Wuhan University of Technology, master's thesis, 2015. Document type: degree thesis


[Abstract]: Through the construction of its Intelligent Transportation System (ITS), Shenzhen has established a public information platform for intelligent transportation. The platform collects massive volumes of traffic data every day, and these data contain rich traffic information. High-quality traffic data is a precondition for ITS to make correct decisions. In practice, however, equipment failures, environmental interference, operator error, and other factors mean that the raw data inevitably suffers from quality problems such as loss and redundancy. Driven by project requirements, this thesis uses a Hadoop-based cloud computing platform to analyze the quality of Shenzhen's massive taxi data and to process the data with quality in mind. The main work covers the following aspects:

(1) A survey of the achievements and shortcomings of domestic and international research on data quality assessment and data cleaning, from which the research content of this thesis is derived.

(2) An evaluation system, designed to the project requirements, that combines the analytic hierarchy process (AHP) from decision science with historical data. AHP is used to compute the weights of the evaluation indicators, and the expectation of the historical data serves as the baseline for a data quality score, so that quality problems are quantified and the state of the data is reflected intuitively.

(3) Quality evaluation schemes for GPS data and operation data, based on the characteristics of Shenzhen taxi data. The main factors affecting data quality are identified first and the corresponding evaluation indicators are determined; then, for the redundant, incomplete, and erroneous data in the data sets, evaluation-rule algorithms are proposed to judge whether records satisfy the conditions.

(4) Data quality improvement guided by the results of the quality analysis. The thesis focuses on duplicate-data cleaning and proposes a MapReduce-based blocked de-duplication algorithm to delete duplicate records. It then proposes Hadoop-based cleaning schemes for the GPS data and the operation data respectively; these schemes target incomplete, redundant, and erroneous data and migrate traditional cleaning techniques to the cloud platform.

(5) Application of the cleaned, high-quality GPS data to the study of taxi stopping points. A DBSCAN-based stopping-point detection algorithm is proposed that finds taxi stopping points in non-passenger-carrying trajectory data. The algorithm has three steps: candidate point acquisition, candidate point filtering, and clustering of the stopping-point candidates. Candidates are obtained with a candidate-point detection algorithm and then filtered by their temporal and spatial attributes; finally, after the strengths and weaknesses of various clustering algorithms are analyzed, DBSCAN is selected to cluster the stopping points.

Using the established evaluation system, the quality of the taxi GPS data and operation data is assessed, yielding a quality score for each of the two data sets. The scores reflect data quality intuitively and provide a basis for the subsequent cleaning tasks. The cleaning schemes developed from the evaluation results effectively improve data quality and support ITS in making correct decisions. The study of taxi stopping points based on the cleaned data helps city managers better understand how taxi drivers operate and offers guidance to drivers looking for passengers.
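
As an illustration of item (2) of the abstract, the following is a minimal sketch of how AHP weights and a baseline-relative quality score might be computed. The abstract does not give the thesis's actual formulas, so everything here is an assumption: the three indicators, the pairwise comparison matrix, the row geometric-mean approximation of the AHP priority vector, and the rule that each indicator is scored as its observed rate divided by its historical mean (capped at 1).

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def quality_score(observed, baseline, weights):
    """Weighted score in [0, 100]; each indicator is capped at its baseline."""
    ratios = [min(obs / base, 1.0) for obs, base in zip(observed, baseline)]
    return 100.0 * sum(w * r for w, r in zip(weights, ratios))

# Hypothetical pairwise comparison matrix for three assumed indicators:
# completeness, uniqueness (non-redundancy), correctness.
# Entry [i][j] states how much more important indicator i is than j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical historical means (baseline) and today's observed rates.
baseline = [0.98, 0.95, 0.99]
observed = [0.49, 0.95, 0.99]
score = quality_score(observed, baseline, weights)
print(weights, score)
```

The geometric-mean method is only one common approximation of the principal eigenvector; for a perfectly consistent matrix such as the one above, both give the same weights.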
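
The blocked de-duplication idea in item (4) can be sketched in plain Python that mimics the map, shuffle, and reduce phases. This is not the thesis's algorithm, whose details the abstract does not give. It assumes that records are blocked by vehicle ID and that two records are duplicates when they share the same vehicle ID and timestamp; the field names and sample records are made up for illustration.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (block_key, record); the assumed block key is the vehicle ID."""
    for rec in records:
        yield rec["vehicle_id"], rec

def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework would do."""
    blocks = defaultdict(list)
    for key, rec in pairs:
        blocks[key].append(rec)
    return blocks

def reduce_phase(recs):
    """Reduce: within one block, keep the first record for each timestamp."""
    seen = set()
    for rec in recs:
        if rec["timestamp"] not in seen:
            seen.add(rec["timestamp"])
            yield rec

def deduplicate(records):
    """Run the three phases in-process and return the surviving records."""
    blocks = shuffle(map_phase(records))
    return [rec for key in sorted(blocks) for rec in reduce_phase(blocks[key])]

# Made-up GPS records; the second one repeats the first.
records = [
    {"vehicle_id": "B0001", "timestamp": "08:00:00", "lat": 22.54, "lon": 114.05},
    {"vehicle_id": "B0001", "timestamp": "08:00:00", "lat": 22.54, "lon": 114.05},
    {"vehicle_id": "B0001", "timestamp": "08:00:30", "lat": 22.55, "lon": 114.06},
    {"vehicle_id": "B0002", "timestamp": "08:00:00", "lat": 22.60, "lon": 114.10},
]
clean = deduplicate(records)
print(len(records), "->", len(clean))
```

Blocking by key means each reducer only compares records that could be duplicates of one another, which is what lets the work spread across a cluster instead of requiring a global comparison.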
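
The three-step stopping-point pipeline in item (5) can be sketched as follows. This is a simplified illustration, not the thesis's implementation: candidate acquisition and filtering are collapsed into a single speed-and-occupancy test, and the field names, the thresholds (`max_speed_kmh`, `eps_m`, `min_pts`), and the sample track are all assumptions. The clustering step is a plain textbook DBSCAN over great-circle distances.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def candidate_points(track, max_speed_kmh=5.0):
    """Steps 1-2 (simplified): keep vacant, near-stationary GPS fixes."""
    return [(r["lat"], r["lon"]) for r in track
            if not r["occupied"] and r["speed_kmh"] <= max_speed_kmh]

def dbscan(points, eps_m, min_pts):
    """Step 3: plain DBSCAN; returns one label per point, -1 meaning noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Includes point i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster   # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

def fix(lat, lon, speed_kmh=0.0, occupied=False):
    """Build one made-up GPS record."""
    return {"lat": lat, "lon": lon, "speed_kmh": speed_kmh, "occupied": occupied}

# Made-up track: two groups of vacant stops, one isolated stop,
# one occupied stop and one moving fix (both filtered out).
track = [
    fix(22.5400, 114.0500), fix(22.5401, 114.0500), fix(22.5400, 114.0501),
    fix(22.5400, 114.0500, occupied=True),
    fix(22.5500, 114.0600, speed_kmh=40.0),
    fix(22.6000, 114.1000), fix(22.6001, 114.1000), fix(22.6000, 114.1001),
    fix(22.7000, 114.2000),
]
points = candidate_points(track)
labels = dbscan(points, eps_m=50.0, min_pts=3)
print(labels)
```

DBSCAN suits this task because it needs no preset number of clusters and marks isolated stops as noise; the quadratic neighbour search here is for clarity only and would need a spatial index at city scale.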
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: U495


Article ID: 1454849

Article link: http://sikaile.net/kejilunwen/daoluqiaoliang/1454849.html

