Research on Key Techniques for Last-Level Cache Performance Optimization

Published: 2018-08-02 20:31

【Abstract】: Modern processors commonly use a multi-level cache hierarchy to bridge the widening performance gap between the processor and memory. Unlike the first-level cache, which is split into separate instruction and data caches, the shared last-level cache (LLC) sees only the accesses that have been filtered by the inner cache levels, so the accesses that reach it have comparatively poor data locality. As a result, management policies designed for traditional small, private first-level caches struggle to use LLC space effectively, which severely limits improvements in memory access performance. Managing the LLC effectively and reducing LLC misses is therefore important for overall system performance.

The operating system is responsible for allocating physical memory and establishing virtual-to-physical address mappings. Modifying the physical page frame allocation policy changes how data is laid out in the LLC, which can improve data locality and reduce LLC misses. Compared with traditional LLC optimizations based on hardware design or compiler techniques, this approach requires little hardware change and is transparent to applications. However, existing operating systems are not designed with LLC optimization in mind and lack effective means to control and manage the LLC. This thesis studies key techniques for LLC performance optimization from two directions: operating system memory management policy design and hardware/software co-designed LLC management. The main work and results are as follows:

1. A region-based software partitioning method that reduces LLC pollution. Data with poor locality, once brought into the LLC, may evict frequently accessed data, which causes LLC pollution. The method uses a locality profiling feedback mechanism based on memory access traces to detect the polluting data regions with poor locality inside memory-intensive programs. It then modifies the operating system's physical page frame allocation policy so that the polluting data is mapped to a small portion of the LLC. This protects data with good locality in the LLC and raises the LLC hit rate. Experiments show that, compared with the existing Linux operating system, the method reduces LLC misses per thousand instructions (MPKI) by 15.23% on average and improves program performance by 7.01% on average.

2. A shared LLC optimization method for multicore processors that combines inter-process partitioning with pollution region isolation. Data from concurrent processes, and different data regions within a process, compete for shared LLC space on a multicore processor and cause severe access conflicts. The method detects how an application's polluting data regions are distributed under different shared LLC configurations, and sets up a global pollution buffer in the LLC to which the polluting data regions of all concurrent processes are mapped. On top of inter-process partitioning, this further improves shared LLC utilization when multiple processes run concurrently on a multicore processor. Experiments show that, compared with the existing Linux operating system and the inter-process partitioning method RapidMRC, the method improves overall multicore system performance by 26.31% and 5.86%, respectively.

3. A page-granularity, software-controlled LLC insertion policy with lightweight hardware support. Because the access information they can record is limited, purely hardware LLC management policies have difficulty distinguishing the access behavior of different data regions within a program, and cannot effectively detect and locate polluting data with poor locality. The method uses reserved bits in the page table entries of existing processors to build a software control interface for the LLC insertion policy. Guided by profiling information, it controls, page by page, the position at which data from polluting regions is inserted into the LLC. The method has low hardware overhead and further reduces LLC pollution on top of tournament-based hardware insertion policies, improving memory access performance. Experiments show that, compared with the existing LRU, DIP and DRRIP policies, the method reduces LLC MPKI by 14.33%, 9.68% and 6.24% on average, and improves average processor performance by 8.3%, 6.23% and 4.24%, respectively.

4. A hardware/software cooperative LLC management policy oriented to virtual address regions. While a program runs, data in a contiguous virtual address region is often mapped to scattered physical page frames. Existing LLC performance monitors have difficulty collecting statistics for this kind of data distribution and cannot guide run-time optimization. The method first designs a region-based LLC performance monitor oriented to the virtual address space, which records online the LLC access information of different data regions within a program. Second, it designs an online profiling analysis method, supported by the region-based monitor, that captures the access behavior and locality characteristics of different data regions at run time. Finally, it designs a software control interface for the LLC. Guided by the profiling information, the operating system configures suitable bypass and insertion policies for each data region according to its access behavior. The method effectively improves LLC utilization without significantly increasing hardware overhead. Experiments show that, compared with the existing LRU, DIP and DRRIP policies, the method improves average processor performance by 8.05%, 5.94% and 4.01%, respectively.
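The idea in contributions 1 and 2, mapping polluting data to a small part of the LLC by choosing which physical page frames it receives, can be illustrated with a toy simulation. The sketch below is not the thesis's implementation. The cache geometry, the `ColorAllocator` class, the `run` function and the workload (a small reused "hot" region plus streaming data touched once) are all assumptions chosen for illustration. It relies on the common observation, often called page coloring, that in a physically indexed cache the low bits of the page frame number select a group of cache sets (a "color").

```python
from collections import OrderedDict

# Hypothetical toy geometry (not taken from the thesis).
LINE = 64                       # cache line size in bytes
PAGE = 4096                     # page size in bytes
SETS = 256                      # number of cache sets
WAYS = 4                        # associativity -> 64 KiB cache in total
COLORS = SETS * LINE // PAGE    # page colors: 256 * 64 / 4096 = 4


class Cache:
    """Physically indexed, set-associative cache with LRU replacement."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]
        self.misses = 0

    def access(self, paddr):
        line = paddr // LINE
        cache_set = self.sets[line % SETS]
        if line in cache_set:
            cache_set.move_to_end(line)       # mark as most recently used
            return
        self.misses += 1
        if len(cache_set) >= WAYS:
            cache_set.popitem(last=False)     # evict least recently used
        cache_set[line] = True


class ColorAllocator:
    """Hands out page frame numbers restricted to a list of allowed colors."""

    def __init__(self):
        self.next_index = [0] * COLORS
        self.round_robin = 0

    def alloc(self, allowed_colors):
        color = allowed_colors[self.round_robin % len(allowed_colors)]
        self.round_robin += 1
        pfn = self.next_index[color] * COLORS + color   # pfn % COLORS == color
        self.next_index[color] += 1
        return pfn


def touch_page(cache, pfn):
    """Access every cache line of one physical page."""
    for offset in range(0, PAGE, LINE):
        cache.access(pfn * PAGE + offset)


def run(hot_colors, stream_colors, rounds=10, hot_pages=8, stream_pages=16):
    """Return total misses for a reused hot region plus streaming data."""
    allocator, cache = ColorAllocator(), Cache()
    hot = [allocator.alloc(hot_colors) for _ in range(hot_pages)]
    for _ in range(rounds):
        for pfn in hot:                        # good locality: reused each round
            touch_page(cache, pfn)
        for _ in range(stream_pages):          # poor locality: touched only once
            touch_page(cache, allocator.alloc(stream_colors))
    return cache.misses


if __name__ == "__main__":
    all_colors = list(range(COLORS))
    print("default allocation :", run(all_colors, all_colors))
    print("streaming isolated :", run([1, 2, 3], [0]))
```

In this toy setup, confining the streaming pages to one color keeps them from evicting the hot region, so the second run reports fewer misses than the first. Real systems differ in many respects (address hashing in the LLC, large pages, profiling cost), so the sketch only conveys the mechanism, not the magnitude of the gains reported in the abstract.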
【Degree-granting institution】: Peking University
【Degree】: Doctoral
【Year conferred】: 2013
【Classification number】: TP333

Article ID: 2160641

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2160641.html
