Self-Tuning Data Prefetching for Embedded Systems
Topic: data prefetching; Focus: multi-core processors; Source: Zhejiang University, master's thesis, 2013; Type: degree thesis
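[Abstract]: To address the memory-wall problem in computer systems, modern processors use prefetching. They exploit regular address-access patterns in applications to predict future memory accesses and thereby reduce the number of cache misses. However, current industrial and academic prefetching techniques have three problems: (1) applications contain many linked-list pointer patterns, yet the prefetch engines in mainstream commercial processors predict only linear address patterns; (2) existing pointer-prefetching methods decide whether a loaded value looks like an address, and their accuracy is low, typically below 10%; (3) on multi-core processors, data prefetch engines intensify contention for shared resources, which in turn degrades overall system performance.

This thesis develops a cycle-level software simulator compatible with the MIPS32 instruction set that models the functionality, timing, and cost of embedded single-core and multi-core processors, and uses it as a platform to explore solutions to these problems. Based on an analysis of application characteristics and an exploration of the optimization space, it proposes a multi-mode self-tuning data prefetching scheme for embedded single-core processors. Using runtime information collected by hardware counters, the scheme adaptively adjusts the aggressiveness of two prefetching modes through special prefetch instructions, and it improves prefetch accuracy by distinguishing chain (pointer) patterns from linear patterns. On the single-core simulator, the EEMBC, SPEC CPU2006, and OLDEN benchmarks show average accuracies of 36%, 40%, and 56%, respectively, for the multi-mode prefetch engine, versus 8%, 9%, and 24% for content-directed pointer prefetching (CDP). Compared with stream prefetching, CDP pointer prefetching, and GHB prefetching, performance improves by 7%, 6%, and 9%, respectively.

For multi-core, multi-threaded workloads, the thesis proposes a thread-classification prefetching mechanism that reduces the memory-system resource contention caused by data prefetching. The mechanism consists of (1) a filter that notifies the hardware to discard prefetch requests that would cause data held by other threads to be invalidated, and (2) runtime classification of threads, which switches each thread's prefetch engine on or off and adjusts its aggressiveness, thereby reducing inter-thread resource conflicts. On a modeled 16-core system evaluated with PARSEC, SPLASH-2, and scientific computing programs, the filtering mechanism and the thread-classification policy improve system performance by 2% and 6%, respectively, over the baseline prefetch engine. Compared with applying feedback-directed prefetching (FDP) to the baseline engine, the proposed mechanism improves system performance by 4% and reduces the energy-delay product by 4%.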
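The single-core multi-mode idea can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the thesis's design. It assumes a per-PC table that detects a linear pattern when the same non-zero stride repeats. It treats a load as part of a chain when its address equals the value it returned last time plus a constant offset, which is one simple stand-in for the thesis's chain-pattern check and is stricter than CDP's "looks like an address" test. At each interval boundary, it doubles or halves each mode's prefetch degree based on measured accuracy. The class and method names (`MultiModePrefetcher`, `access`, `end_interval`), the 64-byte line, and the 0.5/0.2 thresholds are hypothetical. The special prefetch instructions the thesis uses to set aggressiveness are not modeled.

```python
from dataclasses import dataclass

LINE = 64  # hypothetical cache-line size in bytes


@dataclass
class Mode:
    degree: int = 1   # aggressiveness: how far ahead this mode prefetches
    issued: int = 0   # prefetches issued in the current interval
    useful: int = 0   # prefetched lines later consumed by a demand load


class MultiModePrefetcher:
    """Toy linear (stride) + chain (pointer) prefetcher with self-tuned degrees."""

    def __init__(self, mem, hi=0.5, lo=0.2, max_degree=8):
        self.mem = mem                  # addr -> stored word, used to follow pointers
        self.hi, self.lo, self.max_degree = hi, lo, max_degree
        self.modes = {"linear": Mode(), "chain": Mode()}
        self.last = {}                  # pc -> (addr, stride, value, offset)
        self.pending = {}               # prefetched line -> mode that issued it

    def _issue(self, mode, addr):
        line = addr // LINE
        if line not in self.pending:    # drop duplicates of in-flight prefetches
            self.pending[line] = mode
            self.modes[mode].issued += 1

    def access(self, pc, addr):
        """Observe one demand load by instruction `pc` from `addr`; return the loaded word."""
        value = self.mem.get(addr, 0)
        mode = self.pending.pop(addr // LINE, None)
        if mode:                        # an earlier prefetch covered this load
            self.modes[mode].useful += 1
        stride = offset = None
        prev = self.last.get(pc)
        if prev:
            p_addr, p_stride, p_value, p_offset = prev
            stride = addr - p_addr
            offset = addr - p_value     # field offset if the last value was a pointer
            if offset == p_offset and 0 <= offset < LINE:
                # Chain mode: address = previously loaded value + same offset, twice running.
                nxt = value
                for _ in range(self.modes["chain"].degree):
                    if not nxt:         # null pointer ends the chain
                        break
                    self._issue("chain", nxt + offset)
                    nxt = self.mem.get(nxt + offset)
            elif stride and stride == p_stride:
                # Linear mode: the same non-zero stride seen twice running.
                for k in range(1, self.modes["linear"].degree + 1):
                    self._issue("linear", addr + k * stride)
        self.last[pc] = (addr, stride, value, offset)
        return value

    def end_interval(self):
        """Retune each mode's degree from this interval's accuracy, then reset counters.

        Accuracy is approximate: a prefetch consumed in a later interval is credited there.
        """
        for m in self.modes.values():
            if m.issued:
                accuracy = m.useful / m.issued
                if accuracy >= self.hi:
                    m.degree = min(m.degree * 2, self.max_degree)
                elif accuracy < self.lo:
                    m.degree = max(m.degree // 2, 1)
            m.issued = m.useful = 0
```

For example, walking a linked list whose `next` field sits at offset 8 confirms the chain pattern on the third load. From then on, each load prefetches the next node's `next` field, and a fully accurate interval doubles the chain degree for the following interval. A regular array walk takes the linear path instead, because its loaded values do not predict its addresses.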
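The two multi-core mechanisms, invalidation filtering and thread classification, can be sketched as a behavioral model in a similar way. The abstract does not say how invalidation-causing prefetches are detected or what the thread classes and thresholds are, so this sketch makes three assumptions. First, a prefetch is predicted to cause an inter-thread invalidation when another core was the last writer of the target line. Second, threads are classified once per interval, by prefetch accuracy alone, into three hypothetical classes (aggressive, moderate, off). Third, the 0.6/0.25 thresholds and the names `InvalidationFilter`, `try_prefetch`, and `retune` are illustrative values and names, not the thesis's.

```python
from dataclasses import dataclass
from enum import Enum


class Cls(Enum):
    AGGRESSIVE = "aggressive"   # accurate prefetcher: raise its degree
    MODERATE = "moderate"       # middling accuracy: keep its prefetcher as is
    OFF = "off"                 # inaccurate: disable it to free shared resources


@dataclass
class ThreadPrefetcher:
    degree: int = 2
    enabled: bool = True
    issued: int = 0             # prefetches issued this interval
    useful: int = 0             # of those, how many a demand access consumed


class InvalidationFilter:
    """Drops prefetches to lines another core is writing (hypothetical policy)."""

    def __init__(self):
        self.owner = {}         # line -> core that last wrote it (stands in for Modified state)
        self.dropped = 0

    def record_write(self, core, line):
        self.owner[line] = core

    def admit(self, core, line):
        owner = self.owner.get(line)
        if owner is not None and owner != core:
            self.dropped += 1   # prefetching would pull the line away from its writer
            return False
        return True


def try_prefetch(core, line, threads, filt):
    """Issue a prefetch only if the thread's engine is on and the filter admits it."""
    t = threads[core]
    if not t.enabled or not filt.admit(core, line):
        return False
    t.issued += 1
    return True


def retune(threads, hi=0.6, lo=0.25, max_degree=8):
    """End of interval: classify each thread by accuracy, then set on/off and degree."""
    classes = {}
    for tid, t in threads.items():
        if t.issued == 0:
            # No samples (e.g. the engine was off): re-enable at the lowest degree to re-measure.
            c, t.enabled, t.degree = Cls.MODERATE, True, 1
        else:
            acc = t.useful / t.issued
            if acc >= hi:
                c, t.enabled, t.degree = Cls.AGGRESSIVE, True, min(t.degree * 2, max_degree)
            elif acc < lo:
                c, t.enabled = Cls.OFF, False
            else:
                c = Cls.MODERATE
        t.issued = t.useful = 0
        classes[tid] = c
    return classes
```

A thread whose engine is switched off issues no prefetches and so yields no accuracy samples. The sketch therefore re-enables such threads at the lowest degree in the next interval so they can be re-measured. A hardware design might instead keep shadow counters while the engine is off.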
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year awarded]: 2013
[CLC classification number]: TP333