數(shù)據(jù)并行程序正確性分析與網(wǎng)絡(luò)流量優(yōu)化
發(fā)布時間:2018-11-20 07:46
【摘要】:數(shù)據(jù)并行編程模型以其簡單的特點在大數(shù)據(jù)計算領(lǐng)域獲得了廣泛的應(yīng)用。但是,編寫用戶自定義函數(shù)的串行思維與實際的并行執(zhí)行之間存在顯著的差異,使得程序員容易忽略并行執(zhí)行中的不確定性和通訊開銷,并引入正確性與性能方面的問題。本文分別從這兩方面問題入手,發(fā)現(xiàn)并改進(jìn)了現(xiàn)有工作的不足。本文的主要工作包括: (1)針對并行執(zhí)行不確定性引起的正確性問題,本文從上萬個真實數(shù)據(jù)并行程序中提取了507個不同的Reduce函數(shù)并進(jìn)行了人工分析,發(fā)現(xiàn)了大量并無正確性問題的不可交換Reduce函數(shù),否定了現(xiàn)有工作關(guān)于不可交換Reduce函數(shù)必為程序缺陷的假設(shè)。此外,本文進(jìn)一步發(fā)現(xiàn)了5種不可交換模式,以及部分模式依賴特定隱含數(shù)據(jù)性質(zhì)保證確定性的特點。通過檢查實際數(shù)據(jù)上的性質(zhì),本文成功發(fā)現(xiàn)了5個長期隱藏在產(chǎn)品環(huán)境中的真實程序缺陷,并討論了改進(jìn)現(xiàn)有測試方法的思路。 (2)本文提出了Cybertron,一種動靜態(tài)結(jié)合的數(shù)據(jù)并行程序網(wǎng)絡(luò)流量優(yōu)化技術(shù)。通過結(jié)合靜態(tài)程序分析結(jié)果和運行時動態(tài)信息,Cybertron克服了現(xiàn)有靜態(tài)技術(shù)在處理實際程序中面臨的限制,,通過在運行時精確跟蹤給定運算對數(shù)據(jù)的使用情況,更細(xì)粒度地過濾無用數(shù)據(jù),并使用數(shù)據(jù)約束編碼的技術(shù)對數(shù)據(jù)進(jìn)行更高效的編碼。它在合理的運行時開銷下顯著提高了現(xiàn)有方法對數(shù)據(jù)并行程序網(wǎng)絡(luò)流量的優(yōu)化效果,并在各種網(wǎng)絡(luò)環(huán)境下有效提高了程序性能。
[Abstract]:Data parallel programming model has been widely used in big data computing field because of its simple characteristics. However, there is a significant difference between the serial thinking of writing user-defined functions and the actual parallel execution, which makes it easy for programmers to ignore the uncertainty and communication overhead in parallel execution, and introduce the problems of correctness and performance. Starting with these two problems, this paper finds out and improves the shortcomings of the existing work. The main work of this paper includes: (1) aiming at the correctness problem caused by the uncertainty of parallel execution, we extract 507 different Reduce functions from tens of thousands of real data parallel programs and carry out manual analysis. A large number of non-commutative Reduce functions with no correctness problems are found, which negates the existing hypothesis that non-commutative Reduce functions must be program defects. In addition, five kinds of non-commutative patterns are further found, and some patterns depend on the nature of the specific implicit data to ensure certainty. By checking the properties of the actual data, this paper successfully finds five real program defects hidden in the product environment for a long time, and discusses the ideas of improving the existing testing methods. (2) in this paper, a dynamic and static data parallel program network traffic optimization technique based on Cybertron, is proposed. By combining the results of static program analysis and runtime dynamic information, Cybertron overcomes the limitations of existing static techniques in dealing with practical programs and accurately tracks the usage of data in given operations at run time. Filter useless data with finer granularity and use data constraint coding technology to encode data more efficiently. With reasonable runtime overhead, it can significantly improve the efficiency of the existing methods in optimizing the network traffic of data parallel programs, and improve the program performance in various network environments.
【學(xué)位授予單位】:清華大學(xué)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2014
【分類號】:TP311.1;TP393.06
[Abstract]:Data parallel programming model has been widely used in big data computing field because of its simple characteristics. However, there is a significant difference between the serial thinking of writing user-defined functions and the actual parallel execution, which makes it easy for programmers to ignore the uncertainty and communication overhead in parallel execution, and introduce the problems of correctness and performance. Starting with these two problems, this paper finds out and improves the shortcomings of the existing work. The main work of this paper includes: (1) aiming at the correctness problem caused by the uncertainty of parallel execution, we extract 507 different Reduce functions from tens of thousands of real data parallel programs and carry out manual analysis. A large number of non-commutative Reduce functions with no correctness problems are found, which negates the existing hypothesis that non-commutative Reduce functions must be program defects. In addition, five kinds of non-commutative patterns are further found, and some patterns depend on the nature of the specific implicit data to ensure certainty. By checking the properties of the actual data, this paper successfully finds five real program defects hidden in the product environment for a long time, and discusses the ideas of improving the existing testing methods. (2) in this paper, a dynamic and static data parallel program network traffic optimization technique based on Cybertron, is proposed. By combining the results of static program analysis and runtime dynamic information, Cybertron overcomes the limitations of existing static techniques in dealing with practical programs and accurately tracks the usage of data in given operations at run time. Filter useless data with finer granularity and use data constraint coding technology to encode data more efficiently. With reasonable runtime overhead, it can significantly improve the efficiency of the existing methods in optimizing the network traffic of data parallel programs, and improve the program performance in various network environments.
【學(xué)位授予單位】:清華大學(xué)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2014
【分類號】:TP311.1;TP393.06
【共引文獻(xiàn)】
相關(guān)期刊論文 前10條
1 李冰雨;呂帥;何麗莉;;改進(jìn)的基于逆向流分析的C程序切片算法[J];吉林大學(xué)學(xué)報(信息科學(xué)版);2014年01期
2 陳剛;;新能源發(fā)電并網(wǎng)控制策略研究[J];裝備制造技術(shù);2014年11期
3 龍慶;王t
本文編號:2344299
本文鏈接:http://sikaile.net/guanlilunwen/ydhl/2344299.html
最近更新
教材專著