Research on Resource-Aware Task Scheduling in the Storm Environment

Published: 2018-07-27 15:58

[Abstract]: As data is created ever faster in big data applications, large volumes of data must be processed in real time. Apache Storm is a stream processing system that offers real-time, distributed, scalable, and highly reliable data processing, and it has attracted wide attention in academia and industry. In a complex event stream processing engine, data arrives as a real-time stream of events that must be analyzed quickly; continuously generated streams are processed, and the results feed the generation of new event streams. To evaluate whether a resource allocation strategy succeeds, three performance metrics are used to check how well it adapts to resource fluctuation during scheduling: processing latency, resource throughput, and user satisfaction. The elements relevant to scheduling, defined as basic computing components, are aggregated into a single topology for execution. Real-time data streams with different arrival rates, together with constantly changing operating conditions, pose new challenges for data processing. Improving scheduling efficiency is therefore the main problem addressed in this thesis, and it is also the key step in finding an optimized placement for Storm across the active physical nodes. However, like many other big data processing systems, Storm has no intelligent scheduling mechanism. Its default round-robin scheduler does not fully consider resource requirements and availability, so resources end up either underutilized or overutilized. Designing elastic solutions that can cope with sudden fluctuations in the input data stream is a recent hot research area. Traditional scheduling schemes rely heavily on measuring a set of performance metrics and comparing them against a set of predetermined thresholds to make scheduling decisions. Such schemes cannot adapt to real-time changes in the amount of available resources.
This thesis proposes a resource-adaptive scheduler for the Storm framework based on CPU, memory, and network bandwidth, which allocates resources more effectively and improves performance. It also takes into account the data transfer rate between Storm tasks and load balancing, assigning highly communicating task pairs to the same group of compute nodes. Compared with the default scheduling provided by Storm, the proposed algorithm is a significant improvement: it distributes the tasks across the cluster and schedules them by sensing changes in CPU, memory, and network bandwidth. By analyzing the characteristics and performance of Storm's default task scheduling strategy, this thesis designs and implements a resource-aware stream data processing system based on Storm. Compared with default Storm scheduling, the improved scheduling has the following desirable features: (1) based on runtime state, it dynamically assigns or reassigns tasks through efficient resource-aware scheduling to speed up data processing, minimizing inter-node and inter-process resource overhead while ensuring that no worker node is overloaded; (2) it can consolidate resources on worker nodes for fine-grained control, so that the improved Storm achieves better performance with fewer worker nodes; (3) it allows the scheduling algorithm to be managed as a module in code and allows scheduling parameters to be tuned; (4) it is transparent to Storm users, and Storm applications can be ported to the platform running the improved Storm scheduler.
Building on three benchmark stream processing applications (SOL, RollingSort, and WordCount), this thesis adds monitoring code that senses CPU, memory, and network bandwidth and stores the monitoring information in a database. The scheduler, following the improved algorithm, reads this data from the database and replaces the default scheduling policy. Statistical tables of topology node throughput and inter-node latency are generated automatically for performance evaluation. Repeated experiments show that the improved Storm outperforms the default Storm scheduler on SOL, RollingSort, and WordCount.
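The contrast the abstract draws between round-robin placement and resource-aware placement can be illustrated with a small Python sketch. This is not the thesis's algorithm and does not use Storm's scheduler API; it is a toy greedy heuristic written under stated assumptions. The names `Node`, `Task`, `round_robin`, `resource_aware`, and `inter_node_traffic`, the capacity units, and the traffic figures are all hypothetical. Network bandwidth is modeled only as pairwise traffic rates between tasks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu: float  # available CPU, in arbitrary units (assumption)
    mem: float  # available memory, in MB (assumption)


@dataclass
class Task:
    name: str
    cpu: float  # CPU demand
    mem: float  # memory demand


def round_robin(tasks, nodes):
    """Assign tasks to nodes in turn, ignoring demand and availability."""
    return {t.name: nodes[i % len(nodes)].name for i, t in enumerate(tasks)}


def resource_aware(tasks, nodes, traffic):
    """Greedy placement that respects capacity and co-locates chatty tasks."""
    free = {n.name: [n.cpu, n.mem] for n in nodes}
    placement = {}

    def rate(a, b):
        # Traffic is treated as symmetric between a pair of tasks.
        return traffic.get((a, b), 0.0) + traffic.get((b, a), 0.0)

    # Place the tasks with the most total traffic first.
    order = sorted(tasks, key=lambda t: -sum(rate(t.name, o.name) for o in tasks))
    for t in order:
        # Only nodes with enough free CPU and memory are candidates.
        fits = [n for n in nodes
                if free[n.name][0] >= t.cpu and free[n.name][1] >= t.mem]
        if not fits:
            raise ValueError(f"no node can host task {t.name}")

        def score(n):
            # Prefer the node already hosting the most traffic with this task,
            # then the node that keeps the most CPU headroom.
            local = sum(rate(t.name, other)
                        for other, host in placement.items() if host == n.name)
            return (local, free[n.name][0] - t.cpu)

        best = max(fits, key=score)
        placement[t.name] = best.name
        free[best.name][0] -= t.cpu
        free[best.name][1] -= t.mem
    return placement


def inter_node_traffic(placement, traffic):
    """Sum the traffic of task pairs placed on different nodes."""
    return sum(r for (a, b), r in traffic.items() if placement[a] != placement[b])


nodes = [Node("n1", 100, 1024), Node("n2", 100, 1024)]
tasks = [Task("spout", 30, 256), Task("split", 30, 256),
         Task("count", 50, 256), Task("sink", 50, 256)]
traffic = {("spout", "split"): 100.0, ("split", "count"): 80.0,
           ("count", "sink"): 5.0}

print(inter_node_traffic(round_robin(tasks, nodes), traffic))
print(inter_node_traffic(resource_aware(tasks, nodes, traffic), traffic))
```

On this toy input, round-robin leaves all three streams crossing between nodes, while the greedy placement keeps the heaviest stream (spout to split) on one node and never exceeds a node's capacity. A real scheduler would also need the runtime monitoring and reassignment that the abstract describes, which this sketch omits.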
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP301.6



Article ID: 2148344


Article link: http://sikaile.net/shoufeilunwen/xixikjs/2148344.html
