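A Parallelization Method for Audio Decoding Programs on Embedded Multi-core Systems
Published: 2018-04-24 11:25
Topics: parallel programming + program analysis; Source: Zhejiang University, master's thesis, 2015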
[Abstract]: With the development of multi-core processors, software multithreading has increasingly become the bottleneck limiting program performance. Because a large class of programs running on embedded multi-core processors are multimedia decoders, this thesis studies methods for parallelizing them.

Program parallelization consists of four steps: 1) parallelism analysis, 2) parallelization planning, 3) code generation, and 4) runtime management. This thesis focuses on parallelism analysis, which comprises two steps: program structure analysis and parallel region detection.

The program structure analysis combines dynamic and static analysis. The code is first precompiled and then analyzed dynamically, and the measured costs are added to the source code as comments. Static analysis then builds a program call graph whose nodes are functions and loops, whose directed edges are the call relationships between them, and whose node values represent their costs. The resulting call graph serves as a reference for the subsequent parallel region detection.

The parallel region detection combines detection at several granularities. For data parallelism, it detects consecutive reads and writes to consecutive addresses; for task parallelism, it detects read/write dependencies between functions; for pipeline parallelism, it detects dependencies between the tasks inside a loop. Because the detection is based on dynamic analysis, it avoids the conservative estimates that static analysis produces.

We used APE and MP3 decoders as test programs and evaluated them on 2-core and 4-core multi-core platforms running in a software simulator, obtaining speedups of 7.28 and 3.97 and power ratios of 0.29 and 0.47, respectively. The method increases speed while also reducing power consumption, which confirms its effectiveness and good scalability.
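The abstract does not give the detection algorithm itself, so the following is only a minimal sketch of the two checks it names, under stated assumptions: the dynamic trace is assumed to have already been reduced to per-task read and write address sets, and `AccessRecord`, `conflicts`, `is_contiguous`, and the task names are hypothetical, not taken from the thesis. Two tasks are treated as candidates for task parallelism when their address sets show no read-after-write, write-after-read, or write-after-write overlap; an access stream is treated as a data-parallel candidate when it walks memory with a fixed element stride.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AccessRecord:
    """Addresses touched by one task (a function call or loop stage) in a trace.

    Hypothetical trace format, assumed for illustration only.
    """
    name: str
    reads: Set[int] = field(default_factory=set)
    writes: Set[int] = field(default_factory=set)


def conflicts(first: AccessRecord, second: AccessRecord) -> Set[str]:
    """Return the dependence kinds that force `second` to run after `first`."""
    kinds = set()
    if first.writes & second.reads:
        kinds.add("RAW")  # second reads a location that first wrote
    if first.reads & second.writes:
        kinds.add("WAR")  # second overwrites a location that first read
    if first.writes & second.writes:
        kinds.add("WAW")  # both write the same location
    return kinds


def is_contiguous(addresses: List[int], element_size: int) -> bool:
    """True if successive accesses advance by exactly `element_size` bytes."""
    return len(addresses) > 1 and all(
        nxt - cur == element_size
        for cur, nxt in zip(addresses, addresses[1:])
    )


# Hypothetical tasks: two channel decoders followed by a mixing step.
decode_left = AccessRecord("decode_left", reads={0x100}, writes={0x200})
decode_right = AccessRecord("decode_right", reads={0x110}, writes={0x210})
mix = AccessRecord("mix", reads={0x200, 0x210}, writes={0x300})

# No overlap: the two decoders are candidates for task parallelism.
independent = conflicts(decode_left, decode_right)
# The mixer reads what a decoder wrote, so it must run afterwards.
ordered = conflicts(decode_left, mix)
```

Because the address sets come from one concrete run, a result of "no conflict" holds only for the inputs that were traced; that is the trade-off of dynamic analysis against the conservative but input-independent answer of static analysis.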
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP332
Article ID: 1796429
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1796429.html