Research on Resource-Aware Task Scheduling in the Storm Environment
[Abstract]: As big data applications generate data at ever-increasing rates, large volumes of data must be processed in a timely manner. Apache Storm is a stream processing system that offers real-time, distributed, scalable, and highly reliable data processing, and it has attracted considerable attention in both academia and industry. In a complex stream event processing engine, real-time event streams that must be analyzed and processed quickly arise mainly in big data applications; the generated data streams are processed in preparation for producing further event data streams. To assess whether a resource allocation strategy is successful, three performance metrics are used to check how well resource scheduling adapts to resource fluctuation: processing latency, resource throughput, and user satisfaction. The components involved in scheduling are defined as basic computing components and aggregated into a single topology. Real-time data streams with varying arrival rates and changing operating conditions pose new challenges for data processing. Improving scheduling efficiency is therefore the main problem addressed in this thesis, and the key step is finding the optimal placement of Storm tasks across the active physical nodes. However, like many other big data processing systems, Storm has no intelligent scheduling mechanism. Its default round-robin scheduling mechanism does not fully consider resource requirements and availability, so resources are not fully utilized. Elastic solutions that can cope with sudden fluctuations in the input data stream are a recent research hotspot. Traditional scheduling schemes rely largely on measuring a set of performance metrics and making scheduling decisions by comparing them with a set of predetermined thresholds.
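The weakness of round-robin placement noted above can be seen in a minimal sketch. This is a deliberate simplification and not Storm's actual scheduler code; the function `round_robin`, the task names, and the CPU figures are hypothetical and chosen only for illustration.

```python
from itertools import cycle


def round_robin(tasks, nodes):
    """Assign tasks to nodes in turn, ignoring demand and capacity."""
    assignment = {}
    for task, node in zip(tasks, cycle(nodes)):
        assignment[task] = node
    return assignment


# Hypothetical CPU demand per task, in percent of one node's capacity.
cpu_demand = {"t1": 70, "t2": 10, "t3": 60, "t4": 10}
nodes = ["n1", "n2"]

assignment = round_robin(list(cpu_demand), nodes)

# Sum the demand that lands on each node.
load = {n: 0 for n in nodes}
for task, node in assignment.items():
    load[node] += cpu_demand[task]

print(load)
```

Because the two heavy tasks happen to fall on the same turn of the cycle, one node receives more demand than it can serve while the other stays mostly idle, which is the kind of mismatch a resource-aware scheduler is meant to avoid.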
This thesis proposes a resource-adaptive scheduler for the Storm framework based on CPU, memory, and network bandwidth. It allocates resources more effectively and improves performance, takes into account the data transfer rate and load balance among Storm tasks, and assigns task pairs with heavy communication to the same group. Compared with the default scheduling provided by Storm, the proposed scheduling algorithm yields a significant improvement. It distributes all tasks across the cluster and schedules them by sensing changes in CPU, memory, and network bandwidth. After analyzing the characteristics and performance of Storm's default task scheduling strategy, this thesis designs and implements a resource-aware stream data processing system based on Storm. Compared with default Storm scheduling, the improved Storm scheduling has the following desirable features: (1) it dynamically allocates or reassigns tasks based on runtime state through efficient resource-aware scheduling to speed up data processing, minimizing inter-node and inter-process resource overhead while ensuring that no worker node is overloaded; (2) it consolidates worker-node resources and controls them at a fine granularity, so that the improved Storm achieves better performance with fewer worker nodes; (3) it allows the scheduling algorithm to be managed as modular code and its scheduling parameters to be adjusted; (4) it is transparent to Storm users, and Storm applications can run on the improved Storm scheduling platform.
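The placement idea described above (keep heavily communicating task pairs together while no worker node exceeds its CPU, memory, or bandwidth capacity) can be illustrated with a small greedy sketch. It is not the thesis's algorithm, whose details the abstract does not give; `Task`, `Node`, `place_tasks`, and all demand and traffic figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cpu: float  # CPU demand (hypothetical units)
    mem: float  # memory demand in MB
    bw: float   # network bandwidth demand in Mbit/s


@dataclass
class Node:
    name: str
    cpu: float  # remaining CPU capacity
    mem: float  # remaining memory in MB
    bw: float   # remaining bandwidth in Mbit/s
    tasks: list = field(default_factory=list)

    def fits(self, t: Task) -> bool:
        return t.cpu <= self.cpu and t.mem <= self.mem and t.bw <= self.bw

    def assign(self, t: Task) -> None:
        self.cpu -= t.cpu
        self.mem -= t.mem
        self.bw -= t.bw
        self.tasks.append(t.name)


def place_tasks(tasks, nodes, traffic):
    """Greedy placement; traffic maps (task_a, task_b) to tuples per second."""
    by_name = {t.name: t for t in tasks}
    placement = {}

    def put(task, preferred=None):
        # Try the partner's node first, then nodes with the most free CPU.
        candidates = sorted(nodes, key=lambda n: -n.cpu)
        if preferred is not None:
            candidates = [preferred] + [n for n in candidates if n is not preferred]
        for node in candidates:
            if node.fits(task):
                node.assign(task)
                placement[task.name] = node
                return
        raise RuntimeError(f"no node can host {task.name}")

    # Visit task pairs from heaviest to lightest communication.
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        for first, second in ((a, b), (b, a)):
            if first not in placement:
                put(by_name[first], preferred=placement.get(second))
    # Place any task that appears in no traffic pair.
    for t in tasks:
        if t.name not in placement:
            put(t)
    return {name: node.name for name, node in placement.items()}


tasks = [Task("spout", 20, 256, 10), Task("split", 30, 512, 20),
         Task("count", 40, 512, 20), Task("sink", 10, 128, 5)]
nodes = [Node("n1", 100, 1024, 100), Node("n2", 100, 1024, 100)]
traffic = {("spout", "split"): 900, ("split", "count"): 500, ("count", "sink"): 50}

result = place_tasks(tasks, nodes, traffic)
print(result)
```

In this example the heaviest pair (`spout`, `split`) shares `n1`; `count` would prefer `n1` as well but no longer fits in its remaining memory, so it moves to `n2`, and `sink` follows it there. A greedy pass like this is only one possible design; it trades optimality for simplicity and does not rebalance at runtime.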
Using three benchmark stream processing applications, SOL, RollingSort, and WordCount, this thesis adds monitoring code that senses CPU, memory, and network bandwidth and stores the monitoring information in a database. The scheduler, running the improved algorithm in place of the default scheduling policy, reads this data from the database, and a statistical table of throughput and latency between topology nodes is generated automatically for performance evaluation. Experimental results show that the improved Storm outperforms the Storm default scheduler on SOL, RollingSort, and WordCount.
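The monitor-to-database-to-scheduler flow described above might look roughly like the following sketch. The abstract does not name the database or its schema, so the in-memory SQLite store, the `metrics` table, and the functions `record` and `latest_per_node` are assumptions made purely for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the monitoring store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metrics (
           node TEXT, ts INTEGER, cpu_pct REAL, mem_mb REAL, bw_mbps REAL)"""
)


def record(node, ts, cpu_pct, mem_mb, bw_mbps):
    """Monitor side: store one resource sample for a worker node."""
    conn.execute(
        "INSERT INTO metrics VALUES (?, ?, ?, ?, ?)",
        (node, ts, cpu_pct, mem_mb, bw_mbps),
    )


def latest_per_node():
    """Scheduler side: fetch the most recent sample of every node."""
    return conn.execute(
        """SELECT m.node, m.cpu_pct, m.mem_mb, m.bw_mbps
           FROM metrics AS m
           JOIN (SELECT node, MAX(ts) AS ts FROM metrics GROUP BY node) AS last
             ON m.node = last.node AND m.ts = last.ts
           ORDER BY m.node"""
    ).fetchall()


# Hypothetical samples: (node, timestamp, CPU %, used memory MB, bandwidth Mbit/s).
record("n1", 1, 35.0, 1800.0, 40.0)
record("n1", 2, 80.0, 1900.0, 55.0)
record("n2", 1, 20.0, 1000.0, 10.0)
record("n2", 2, 25.0, 1100.0, 12.0)

snapshot = latest_per_node()
# Pick the node with the lowest current CPU usage as the next placement target.
least_loaded = min(snapshot, key=lambda row: row[1])[0]
print(snapshot, least_loaded)
```

Decoupling the monitor from the scheduler through a database keeps the two components independent, at the cost of the scheduler acting on samples that may be slightly stale.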
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP301.6