Research on Key Technologies for Last-Level Cache Performance Optimization
[Abstract]: Modern processors generally adopt a multi-level cache hierarchy to compensate for the growing performance gap between processor and memory. Unlike the first-level cache, which is split into separate instruction and data caches, the shared last-level cache sees only the accesses that have been filtered by the inner cache levels, so the data accesses that reach it have relatively poor locality. As a result, management policies designed for small, private first-level caches make poor use of the last-level cache space, which seriously limits processor memory performance. Managing the last-level cache effectively and reducing last-level cache misses is therefore of great significance for improving overall system performance.
The operating system is responsible for allocating physical memory and establishing virtual address mappings. By modifying the physical page frame allocation policy, it can influence the layout of data in the last-level cache, improve data locality, and reduce last-level cache misses. Compared with traditional last-level cache optimizations based on hardware design or compiler techniques, this approach requires little hardware modification and is transparent to applications. However, existing operating system designs do not fully consider last-level cache optimization and lack effective means to control and manage the last-level cache. This thesis studies key technologies for last-level cache performance optimization from two angles: the design of operating system memory management policies, and hardware/software cooperative last-level cache design. The main work and results are as follows:
1. A software partitioning method based on data regions is proposed to reduce last-level cache pollution. Data with poor locality may, after entering the last-level cache, evict frequently accessed data, which causes last-level cache pollution. The method uses a locality profile feedback mechanism based on memory access traces to detect the polluting data regions with poor locality in memory-intensive programs, and modifies the operating system's physical page frame allocation policy so that the polluting data are mapped to a smaller portion of the last-level cache. This protects data with good locality in the last-level cache and raises the last-level cache hit rate. Compared with the existing Linux operating system, the average number of misses per thousand instructions (MPKI) is reduced by 15.23% and program performance is improved by 7.01%.
2. An optimization method for the shared last-level cache of multi-core processors is proposed, which combines inter-process partitioning with isolation of polluting regions. The data of concurrent processes, and the different data regions within a process, share the last-level cache space of a multi-core processor, which causes serious access conflicts in the shared last-level cache. The method detects the polluting data regions of applications under different shared last-level cache configurations, and sets aside a global pollution buffer in the last-level cache to which the polluting data regions of every concurrent process are mapped. On top of inter-process partitioning, this further improves the performance of concurrently executing processes on multi-core processors. Experimental results show that, compared with the Linux operating system and the inter-process partitioning method RapidMRC, overall multi-core system performance is improved by 26.31% and 5.86%, respectively.
3. A page-granularity, software-controlled last-level cache insertion policy with lightweight hardware support is proposed. Because the memory access information available to it is limited, a last-level cache management policy implemented in hardware has difficulty distinguishing the access behavior of different data regions in a program, and cannot effectively detect and locate polluting data with poor locality. The method uses the existing page table entries of the processor to design a software control interface for the last-level cache insertion policy and, guided by profile information, controls the position at which data from polluting regions are inserted into the last-level cache. The hardware overhead is small, and the method can build on hardware insertion policies that use a tournament mechanism to further reduce last-level cache pollution and improve processor memory performance. Experimental results show that, compared with the existing LRU, DIP and DRRIP methods, last-level cache MPKI is reduced by 14.33%, 9.68% and 6.24%, and average processor performance is improved by 8.3%, 6.23% and 4.24%, respectively.
4. A hardware/software cooperative last-level cache management policy oriented to virtual address regions is proposed. While a program runs, data in a contiguous virtual address region are often mapped to scattered physical page frames. Existing last-level cache performance monitors have difficulty collecting statistics for such data regions and therefore cannot guide run-time optimization. The method first designs a per-region last-level cache performance monitor for the virtual address space, which records last-level cache access information for the different data regions within a program. Second, an online profile analysis method, supported by the per-region performance monitor, is designed to run while the program executes. Finally, a software control interface for the last-level cache is designed; guided by the profile information, the operating system configures suitable bypass and insertion policies for the different data regions. The method does not significantly increase overhead. Experimental results show that, compared with the existing LRU, DIP and DRRIP methods, average processor performance is improved by 8.05%, 5.94% and 4.01%, respectively.
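Items 1 and 2 above steer data into chosen parts of the last-level cache through physical page frame allocation, a technique commonly known as page coloring. The following is a minimal sketch of that idea, not the thesis's implementation. The cache geometry (2 MiB, 16-way, 64-byte lines, 4 KiB pages), the names `CacheGeometry` and `partition_colors`, and the even split among processes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical physically indexed last-level cache (example values only)."""
    size_bytes: int = 2 * 1024 * 1024
    ways: int = 16
    line_bytes: int = 64
    page_bytes: int = 4096

    @property
    def num_colors(self) -> int:
        # One way covers size/ways bytes; each page-sized slice of it is a color.
        way_bytes = self.size_bytes // self.ways
        return way_bytes // self.page_bytes

    def color_of(self, page_frame_number: int) -> int:
        # Consecutive physical page frames cycle through the colors.
        return page_frame_number % self.num_colors


def partition_colors(num_colors, processes, pollution_colors):
    """Split colors among processes and reserve a shared pollution buffer.

    Returns (per_process, buffer): per_process maps each process name to its
    private colors; buffer lists the colors that hold polluting regions of
    every process. The even split is an illustrative assumption.
    """
    if not processes:
        raise ValueError("at least one process is required")
    usable = num_colors - pollution_colors
    if pollution_colors <= 0 or usable < len(processes):
        raise ValueError("not enough colors for this configuration")
    share, extra = divmod(usable, len(processes))
    per_process = {}
    start = 0
    for index, name in enumerate(processes):
        width = share + (1 if index < extra else 0)
        per_process[name] = list(range(start, start + width))
        start += width
    buffer = list(range(usable, num_colors))
    return per_process, buffer


geometry = CacheGeometry()
per_process, buffer = partition_colors(geometry.num_colors, ["A", "B"], 2)
```

With these example values the cache has 32 colors. An allocator following this plan would give process A frames whose color is in `per_process["A"]`, and would back the polluting regions of both processes with frames whose color is in `buffer`, so that poorly reused data compete only with each other.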
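The effect of the insertion-position control described in item 3 can be illustrated with a toy simulation of a single cache set. This is a sketch under stated assumptions rather than the evaluated design: the set has 4 ways, the per-page attribute that the thesis stores in page table entries is modeled as membership in a Python set named `polluting`, and the access trace (three reused lines interleaved with a stream of never-reused lines) is invented for the example.

```python
def simulate_set(trace, ways, polluting):
    """Simulate one LRU-ordered cache set and return the number of misses.

    trace is a sequence of (page, line) pairs. Lines from pages listed in
    `polluting` are inserted at the LRU end on a miss; all other lines are
    inserted at the MRU end, as in plain LRU.
    """
    stack = []  # index 0 is the MRU position, the last element is LRU
    misses = 0
    for page, line in trace:
        key = (page, line)
        if key in stack:
            stack.remove(key)
            stack.insert(0, key)  # promote to MRU on a hit
            continue
        misses += 1
        if len(stack) == ways:
            stack.pop()  # evict the line in the LRU position
        if page in polluting:
            stack.append(key)  # polluting data enters at the LRU end
        else:
            stack.insert(0, key)
    return misses


def build_trace(rounds):
    """Each round touches 3 reused 'hot' lines, then 4 new 'stream' lines."""
    trace = []
    for r in range(rounds):
        trace += [("hot", i) for i in range(3)]
        trace += [("stream", 4 * r + j) for j in range(4)]
    return trace


trace = build_trace(100)
plain_lru_misses = simulate_set(trace, ways=4, polluting=set())
controlled_misses = simulate_set(trace, ways=4, polluting={"stream"})
print(plain_lru_misses, controlled_misses)  # prints "700 403"
```

Under plain LRU the four streaming lines of each round push all three hot lines out of the set, so every access misses. When the streaming page is marked as polluting, its lines only ever occupy the LRU position and the hot lines hit from the second round onward.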
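Item 4 attributes last-level cache accesses to virtual address regions and then picks a bypass or insertion policy per region. The sketch below models that bookkeeping in software only. The class name `RegionMonitor`, the region boundaries, the policy labels, and the two hit-ratio thresholds are hypothetical and are not taken from the thesis.

```python
from bisect import bisect_right


class RegionMonitor:
    """Per-region access and hit counters keyed by virtual address range."""

    def __init__(self, regions):
        # regions: non-overlapping (start, end, name) tuples, end exclusive.
        self.regions = sorted(regions)
        self.starts = [start for start, _, _ in self.regions]
        self.stats = {name: [0, 0] for _, _, name in self.regions}

    def record(self, vaddr, hit):
        """Attribute one last-level cache access to the region holding vaddr."""
        index = bisect_right(self.starts, vaddr) - 1
        if index < 0:
            return  # address lies below every monitored region
        _, end, name = self.regions[index]
        if vaddr >= end:
            return  # address falls in a gap between regions
        counters = self.stats[name]
        counters[0] += 1
        counters[1] += int(hit)

    def policy(self, name, bypass_below=0.05, lru_insert_below=0.30):
        """Pick a policy from the observed hit ratio (thresholds are assumed)."""
        accesses, hits = self.stats[name]
        if accesses == 0:
            return "default"
        ratio = hits / accesses
        if ratio < bypass_below:
            return "bypass"
        if ratio < lru_insert_below:
            return "insert_lru"
        return "insert_mru"


monitor = RegionMonitor([(0x1000, 0x2000, "heap"), (0x8000, 0x9000, "stream")])
for i in range(10):
    monitor.record(0x1000 + 64 * i, hit=(i >= 2))  # 8 hits out of 10
for i in range(50):
    monitor.record(0x8000 + 64 * i, hit=False)     # never reused
```

Because the counters are indexed by virtual address, a region keeps a single set of statistics even when its pages are backed by scattered physical frames. In this example `monitor.policy("heap")` yields `"insert_mru"` and `monitor.policy("stream")` yields `"bypass"`.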
[Degree-granting institution]: Peking University
[Degree level]: Doctoral
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 2160641
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2160641.html