Research, Design and Implementation of Hardware Data Compression on the FeiTeng Platform
Published: 2018-03-19 12:13
Topic: big data  Entry point: zlib library  Source: National University of Defense Technology, 2013 master's thesis  Thesis type: degree thesis
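【Abstract】: Data compression is a key technology for handling big data efficiently. With the rapid development and spread of Internet technology, the volume of data that computers must store and process is growing exponentially. Compressing large stored datasets efficiently to save storage space, and compressing data sent over the network to reduce network traffic, are now central concerns of large-scale data processing. On the domestically produced FeiTeng processor platform, zlib-based compression and decompression is usually done in software, and the FeiTeng processor has two problems when compressing and decompressing big data: 1) limited by processor performance, software compression and decompression is slow and takes a great deal of time; 2) CPU utilization stays high while large volumes of data are compressed or decompressed, which degrades the performance of the whole system. To address these problems, this thesis studies the zlib programming library and, taking the characteristics of the FeiTeng processor into account, implements hardware-based data compression and decompression on the FeiTeng platform.
First, the thesis studies the principles of data compression and introduces its two implementation approaches, software compression and hardware compression. It then describes the encoding schemes used in hardware compression and three ways of implementing compression in hardware: a coprocessor running firmware, an application-specific integrated circuit (ASIC), and a hybrid that combines a coprocessor with an ASIC. Next, it introduces the common software compression algorithms, analyzes their advantages and disadvantages, and examines them in depth in terms of compression efficiency, compression ratio, adaptability, and difficulty of hardware implementation.
Second, it describes the hardware architecture of the FeiTeng platform and the features of its processor: the FT (FeiTeng) 1000 processor uses the SPARC RISC instruction set with 4 cores and 32 threads and performs particularly well on multithreaded workloads. It also describes the architecture and features of the CN61XX hardware: the CN61XX contains a MIPS processor and a dedicated compression/decompression coprocessor, which makes it well suited to compressing and decompressing large batches of local data. Building on these hardware features, the original zlib library, and the existing CN61XX driver, the thesis designs the FTHC (FT Hardware Compression) system software for the FeiTeng platform, which consists mainly of an application library module, a kernel driver module, and an underlying hardware module. The resulting FTHC system has a simple, sound structure and adapts well to different environments; it can run on Windows, Linux, and SPARC-architecture systems.
Third, the thesis studies in depth the techniques used in the key compression/decompression algorithms and in kernel driver optimization: a sliding-window string-matching strategy, data slicing, CN61XX identification, DMA buffer allocation and management, and address mapping. Drawing on the characteristics of the FeiTeng platform and the CN61XX, it designs and implements the FT-zlib programming library and proposes an efficient DMA transfer mechanism for the CN61XX and a command-ring mechanism based on coherent memory. Compared with the original zlib library, FT-zlib adopts data slicing, adds a mechanism for identifying and probing CN61XX devices, and redesigns the algorithm interface, which improves the library's portability. The efficient data transfer mechanism is intended to raise the efficiency of data transfer between the CN61XX and the host. Gather DMA is normally used, but because a single gather operation moves a large amount of data, individual transfers are hard to control; the proposed mechanism adds direction control for each individual DMA within a gather operation and also improves DMA transfer efficiency. The coherent-memory command ring meets the need for the device driver and the CN61XX device to access the same block of memory. Shared memory of this kind is prone to inconsistency when data is read back after being written, which can cause program errors; the data-consistency management strategy used in the command ring solves the problem of mutually exclusive access between the host and the CN61XX and also improves the real-time performance of the whole system.
Finally, the system's performance was tested with Google's standard Snappy suite, and its functionality was tested with Hadoop tools. The results show that on the FeiTeng platform, hardware-based compression is 30 to 50 times more efficient than the previous software compression, and decompression efficiency improved by a factor of 10, effectively relieving the performance degradation caused by compressing and decompressing big data on the FeiTeng platform.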
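The sliding-window string-matching strategy named in the abstract is the LZ77 idea behind zlib's DEFLATE format. The following is a minimal Python sketch, not the thesis's matcher or zlib's: it assumes DEFLATE's limits (32 KiB window, matches of 3 to 258 bytes), uses a naive scan where zlib uses hash chains, and emits a hypothetical token format in which a literal is an int and a back-reference is a (distance, length) pair.

```python
from typing import List, Tuple, Union

# A token is either a literal byte value (int) or a back-reference
# (distance, length) into the already-processed data.
Token = Union[int, Tuple[int, int]]


def lz77_tokens(data: bytes, window: int = 32768,
                min_match: int = 3, max_match: int = 258) -> List[Token]:
    """Greedy LZ77 tokenizer using a naive sliding-window search."""
    tokens: List[Token] = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        # Only positions inside the window behind i are candidate matches.
        for j in range(max(0, i - window), i):
            length = 0
            # j + length may run past i: matches can overlap the lookahead.
            while (length < max_match and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])  # indexing bytes yields an int
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy supports overlap
        else:
            out.append(tok)
    return bytes(out)


print(lz77_tokens(b"abcabcabcabcx"))  # [97, 98, 99, (3, 9), 120]
```

Because the decoder copies a back-reference one byte at a time, a match may overlap its own output: here the single token `(3, 9)` expands the three bytes `abc` into nine more.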
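Data slicing and device probing, two of the FT-zlib changes the abstract lists, can be illustrated together. This sketch assumes a hypothetical 64 KiB slice size and a hypothetical `probe_hw_compressor()` placeholder standing in for CN61XX detection (it always reports that no device is present); Python's standard `zlib` module serves as the software fallback, and each slice is compressed as an independent stream.

```python
import zlib
from typing import Callable, List, Optional

# Hypothetical slice size; the thesis does not state the value FT-zlib uses.
SLICE_SIZE = 64 * 1024


def probe_hw_compressor() -> Optional[Callable[[bytes], bytes]]:
    """Hypothetical placeholder for CN61XX detection.

    A real probe would ask the kernel driver whether a device is present.
    This stand-in never finds one, so callers always use software zlib.
    """
    return None


def split_slices(data: bytes, size: int = SLICE_SIZE) -> List[bytes]:
    """Cut the input into consecutive slices of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_sliced(data: bytes, size: int = SLICE_SIZE) -> List[bytes]:
    hw = probe_hw_compressor()
    # Software fallback: zlib.compress at its default compression level.
    backend = hw if hw is not None else zlib.compress
    # Each slice becomes an independent zlib stream.
    return [backend(s) for s in split_slices(data, size)]


def decompress_sliced(chunks: List[bytes]) -> bytes:
    return b"".join(zlib.decompress(c) for c in chunks)


data = bytes(range(256)) * 1000       # 256,000 bytes
chunks = compress_sliced(data)
print(len(chunks))                    # 4
assert decompress_sliced(chunks) == data
```

Compressing slices independently resets the match window at every boundary, so the compression ratio drops slightly; in exchange, each slice can be handed to a separate hardware queue or thread.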
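The abstract says that plain gather DMA moves a large amount of data in one operation, which makes individual transfers hard to control, and that the proposed mechanism adds direction control for each DMA. The thesis's actual descriptor format is not given, so the following is only a generic illustration under stated assumptions: a hypothetical `SgEntry` descriptor that carries its own direction, hypothetical bus addresses, and a maximum segment size, with consecutive same-direction entries grouped into separately issued batches.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Direction(Enum):
    TO_DEVICE = 1    # host memory -> device, e.g. data to be compressed
    FROM_DEVICE = 2  # device -> host memory, e.g. compressed output


@dataclass(frozen=True)
class SgEntry:
    addr: int            # bus address of the segment (hypothetical values)
    length: int          # bytes to move
    direction: Direction


def build_sg_list(buffers: List[Tuple[int, int]], direction: Direction,
                  max_seg: int) -> List[SgEntry]:
    """Split each (addr, length) buffer into entries of at most max_seg bytes."""
    if max_seg <= 0:
        raise ValueError("max_seg must be positive")
    entries: List[SgEntry] = []
    for addr, length in buffers:
        off = 0
        while off < length:
            seg = min(max_seg, length - off)
            entries.append(SgEntry(addr + off, seg, direction))
            off += seg
    return entries


def group_by_direction(entries: List[SgEntry]) -> List[List[SgEntry]]:
    """Group consecutive same-direction entries into separately issued batches."""
    groups: List[List[SgEntry]] = []
    for e in entries:
        if groups and groups[-1][-1].direction == e.direction:
            groups[-1].append(e)
        else:
            groups.append([e])
    return groups


ins = build_sg_list([(0x1000, 10000)], Direction.TO_DEVICE, 4096)
outs = build_sg_list([(0x8000, 4096)], Direction.FROM_DEVICE, 4096)
print([e.length for e in ins])                                 # [4096, 4096, 1808]
print([len(g) for g in group_by_direction(ins + outs)])        # [3, 1]
```

Keeping the direction on every entry, rather than on the whole gather operation, is what lets an input batch and an output batch be issued and completed independently.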
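The command ring in coherent memory can be pictured as a single-producer, single-consumer ring buffer: the host driver only advances `head`, and the CN61XX only advances `tail`, so neither side ever writes the other's index. This Python simulation is an assumption about the general technique rather than the thesis's design, and the class and method names are hypothetical; on real hardware each index update would also need a memory barrier, which the comment marks but Python cannot express.

```python
from typing import List, Optional


class CommandRing:
    """Single-producer/single-consumer ring of command slots.

    One slot is always left empty so that "full" and "empty" can be told
    apart using only the two indices.
    """

    def __init__(self, slots: int) -> None:
        if slots < 2:
            raise ValueError("need at least 2 slots")
        self.slots = slots
        self.buf: List[Optional[bytes]] = [None] * slots
        self.head = 0  # next slot the host writes (host-owned)
        self.tail = 0  # next slot the device reads (device-owned)

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        return (self.head + 1) % self.slots == self.tail

    def submit(self, cmd: bytes) -> bool:
        """Host side: write the command, then publish it by moving head."""
        if self.is_full():
            return False
        self.buf[self.head] = cmd
        # On real hardware a write barrier belongs here, so the device never
        # sees the new head before the command bytes are visible.
        self.head = (self.head + 1) % self.slots
        return True

    def fetch(self) -> Optional[bytes]:
        """Device side: read the oldest command, then free its slot."""
        if self.is_empty():
            return None
        cmd = self.buf[self.tail]
        self.buf[self.tail] = None
        self.tail = (self.tail + 1) % self.slots
        return cmd


ring = CommandRing(4)  # capacity is slots - 1 = 3 commands
for c in (b"a", b"b", b"c"):
    ring.submit(c)
print(ring.submit(b"d"))  # False: the ring is full
print(ring.fetch())       # b'a'
```

Because each index has exactly one writer, the two sides coordinate through the indices alone, without a shared lock.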
【Degree-Granting Institution】: National University of Defense Technology
【Degree Level】: Master's
【Year Awarded】: 2013
【Classification Number】: TP333
Article No.: 1634217
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1634217.html