Research and Application of Big Data Stream Processing Technology for Railway Operation and Maintenance
Topic: big data + railway operation and maintenance; Source: Beijing Jiaotong University, master's thesis, 2017
[Abstract]: We are now in the era of big data, and the railway transport industry is no exception. China has reached a world-leading level in high-speed rail and has mastered many core technologies of high-speed trains. In railway operation and maintenance (O&M), advanced sensor technology, data acquisition equipment, and computer storage have accumulated massive volumes of O&M data. Analyzing and processing this data is of great significance for railway repair and maintenance work. Railway O&M data is large in volume, diverse, and fast-accumulating, and traditional data processing methods struggle to handle it effectively. Their main drawback is long processing time, which makes it difficult to meet the real-time requirements of O&M. This thesis therefore proposes a scheme based on stream processing technology and applies it to railway O&M data processing, addressing the long processing time incurred when handling large, rapidly growing data sets. The thesis surveys the characteristics of current railway O&M data, compares stream processing with traditional processing, and proposes a data processing scheme based on a streaming framework. On this basis, it implements a stream processing system for railway communication optical fiber monitoring log files using the Spark Streaming framework, studies in depth how parameters such as concurrentJobs and batchDuration affect processing performance, and optimizes the system.

The main work of the thesis is as follows:

(1) Based on an analysis of the core technology of stream processing frameworks, and on the data characteristics and processing requirements of current railway O&M, a solution based on a streaming framework is proposed. Streaming data in the railway industry is growing rapidly, yet railway O&M still relies on traditional data processing for application analysis, so results are not timely. The proposed stream processing scheme addresses the long processing time that traditional techniques incur on large, rapidly growing data. Experiments show that stream processing is considerably more timely than the traditional approach.

(2) An optical fiber monitoring log data processing system based on Spark Streaming is designed and implemented. First, a distributed stream processing experimental environment is built. Then the streaming framework is used for in-memory distributed processing of the log files, extracting key fields and storing them in a data warehouse. Finally, an interactive query tool is used to carry out business analysis on the extracted data.

(3) Building on (2), the Spark Streaming based system is optimized to improve its performance. Specifically, the distributed message queue Kafka is integrated into the architecture to parallelize data ingestion, and Spark Streaming parameters such as concurrentJobs and batchDuration are tuned to improve log processing efficiency.

The proposed scheme is validated experimentally using optical fiber monitoring log data accumulated in a production environment. Several experiments are designed and compared against the traditional data processing approach. The results show that the proposed scheme processes log files faster, and that the distributed architecture scales well, with system performance improving further as the number of nodes increases. The stream processing system implemented in this thesis meets the timeliness requirements of O&M, processes accumulated O&M data quickly, and improves the efficiency of data processing in railway operation and maintenance.
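To make the role of batchDuration concrete, the following is a minimal pure-Python sketch of micro-batching applied to monitoring logs. It is an illustration only, not the thesis's system and not Spark code. The log line layout (`<epoch-seconds> <fiber-id> <attenuation-dB>`), the function names, and the per-fiber maximum used as the "key field" summary are all assumptions introduced here, since the abstract does not give the actual log format. For simplicity the sketch buckets records by the timestamp inside each line, whereas Spark Streaming forms batches from data as it arrives during each batchDuration interval.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical log layout (an assumption, not the thesis's format):
#   "<epoch-seconds> <fiber-id> <attenuation-dB>"
LOG_PATTERN = re.compile(r"^(\d+)\s+(\S+)\s+(-?\d+(?:\.\d+)?)$")

Record = Tuple[str, float]  # (fiber id, attenuation in dB)


def parse_line(line: str) -> Optional[Tuple[int, str, float]]:
    """Extract (timestamp, fiber id, attenuation) or None if the line is malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2), float(match.group(3))


def micro_batches(lines: Iterable[str], batch_duration: int) -> Dict[int, List[Record]]:
    """Group parsed records into consecutive windows of batch_duration seconds.

    The dictionary key is the start time of each window.
    """
    batches: Dict[int, List[Record]] = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed layout
        timestamp, fiber_id, attenuation = parsed
        window_start = (timestamp // batch_duration) * batch_duration
        batches[window_start].append((fiber_id, attenuation))
    return dict(sorted(batches.items()))


def max_loss_per_fiber(batch: List[Record]) -> Dict[str, float]:
    """Summarize one batch: highest attenuation seen for each fiber."""
    summary: Dict[str, float] = {}
    for fiber_id, attenuation in batch:
        summary[fiber_id] = max(attenuation, summary.get(fiber_id, float("-inf")))
    return summary


if __name__ == "__main__":
    sample = ["0 F1 0.5", "3 F1 0.9", "5 F2 1.2", "garbage", "9 F2 0.7"]
    for start, batch in micro_batches(sample, batch_duration=5).items():
        print(start, max_loss_per_fiber(batch))
```

A shorter batch_duration yields smaller batches and fresher results at the cost of more per-batch overhead, which is the kind of trade-off the parameter study in the thesis would need to balance.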
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13