Research on Techniques for Improving Hadoop File Processing Efficiency
Published: 2018-06-04 16:32
Topics: distributed file system + small files. Source: Microelectronics & Computer (《微電子學與計算機》), 2014, Issue 07
[Abstract]: This paper proposes a method for improving file processing efficiency in Hadoop. A small file processing module, SFPM, is added to Hadoop. SFPM builds a two-level index over massive numbers of small files based on their file names and uses preloading to place the index in the cache ahead of time, which improves file lookup and access efficiency. When merging files, it discards the leftover space at the end of a block rather than splitting one file across two blocks, which reduces file access time. Experimental results show that the method effectively reduces the load on the NameNode and improves the read and write efficiency of small files.
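The merge strategy and the two-level index described in the abstract can be illustrated with a minimal in-memory sketch. It is not the SFPM implementation: the abstract does not say how the two index levels are keyed, so the hash-bucket first level, the names `MergedStore`, `Location`, `BLOCK_SIZE`, and `NUM_BUCKETS`, and the tiny block size are all assumptions made for illustration. The in-memory dictionary stands in for an index that has already been preloaded into the cache.

```python
import zlib
from dataclasses import dataclass, field

BLOCK_SIZE = 64   # bytes; deliberately tiny for illustration (assumption)
NUM_BUCKETS = 4   # hypothetical fan-out of the first index level


@dataclass
class Location:
    block: int    # index of the block that holds the file
    offset: int   # byte offset of the file inside that block
    length: int   # file size in bytes


@dataclass
class MergedStore:
    """Packs small files into fixed-size blocks; no file spans two blocks."""
    blocks: list = field(default_factory=lambda: [bytearray()])
    # Two-level index: bucket id -> {file name -> Location}
    index: dict = field(
        default_factory=lambda: {b: {} for b in range(NUM_BUCKETS)}
    )

    @staticmethod
    def bucket_of(name: str) -> int:
        # First level: derive a bucket from the file name (assumed scheme).
        return zlib.crc32(name.encode("utf-8")) % NUM_BUCKETS

    def put(self, name: str, data: bytes) -> Location:
        if len(data) > BLOCK_SIZE:
            raise ValueError("file is larger than one block")
        current = self.blocks[-1]
        if len(current) + len(data) > BLOCK_SIZE:
            # Discard the leftover space and start a fresh block,
            # so the file is never split across two blocks.
            current = bytearray()
            self.blocks.append(current)
        loc = Location(len(self.blocks) - 1, len(current), len(data))
        current.extend(data)
        # Second level: exact file name -> location within the merged data.
        self.index[self.bucket_of(name)][name] = loc
        return loc

    def get(self, name: str) -> bytes:
        loc = self.index[self.bucket_of(name)][name]
        block = self.blocks[loc.block]
        return bytes(block[loc.offset:loc.offset + loc.length])
```

In this sketch, writing two 40-byte files with a 64-byte block size places the second file at the start of a new block and leaves 24 bytes of the first block unused. That is the trade the abstract describes: some wasted space in exchange for reading every file from a single block.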
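[Author affiliation]: School of Computer Science, Guangdong University of Technology
[Funding]: Guangdong Province Strategic Emerging Industries Core Technology Research Project (2012A010701004); Guangdong Natural Science Foundation Key Project (S2012020011071); Guangdong Province and Ministry of Education Industry-University-Research Cooperation Projects (2012B091000037, 2012B091000041)
[Classification number]: TP316.4; TP333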
Article number: 1978055
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1978055.html