Centralized Storage and Retrieval System for Electronic Documents Based on HDFS
Published: 2018-07-15 09:54
【Abstract】: With the advance of government informatization in China, electronic documents have developed rapidly, and the number of electronic documents produced in government work now exceeds the number of paper documents. Compared with paper-document management, electronic-document management is still immature, especially in storage: because electronic documents are easy to transmit and preserve, they no longer need to be stored dispersed by geographic location. Centralized storage of electronic documents can effectively strengthen control over them, improve office efficiency, reduce staffing costs, and address problems such as document loss and leakage. At the same time, how the centralized storage of massive numbers of electronic documents is achieved directly affects the implementation and efficiency of the whole system. Cloud storage is a networked online storage model in which data is kept in virtualized storage pools; hardware permitting, it can provide almost unlimited low-cost storage capacity, and it can efficiently solve the problem of centrally storing massive numbers of electronic documents. The Hadoop Distributed File System (HDFS), an open-source cloud storage file system based on the design of the Google File System (GFS), has become a focus of cloud storage research because of its excellent performance and reliability in handling very large files. However, electronic documents in e-government are mostly small files, and HDFS performs poorly when storing and accessing massive numbers of small files.
To address this weakness of HDFS with small files, this thesis proposes a strategy that uses a storage cache and a read cache to improve the storage and access efficiency of massive numbers of small files. The basic idea is to design and implement an HDFS middleware layer that satisfies storage and access requests while reducing the number of HDFS accesses, thereby improving efficiency. The storage cache strategy sets up multiple buffers; when a small file is stored, the most suitable buffer is chosen among them to raise buffer utilization, which reduces the number of HDFS accesses. The read cache strategy manages the entire fixed-size read cache with a buddy system and sets an efficiency threshold for each cache segment. These thresholds govern the cache update policy so as to maximize cache utilization, allowing file accesses to be served from the read cache as often as possible and reducing the number of HDFS accesses. For security, the thesis uses multi-level encryption to ensure the confidentiality and privacy of electronic documents during centralized storage and access. Finally, a prototype system is implemented, tested, and analyzed to demonstrate the feasibility and usability of these methods.
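The storage-cache idea can be illustrated with a small in-memory simulation. It is a minimal sketch under stated assumptions, not the thesis's implementation: the class names `Buffer` and `StorageCache`, the buffer count and size, and the selection rule (best fit: put each file into the buffer with the least remaining space that can still hold it, and flush the fullest buffer when none can) are all hypothetical, since the abstract only says that the buffer is chosen among several to raise utilization. A list of merged blobs stands in for HDFS, so each append models one HDFS write, and an index maps every small file to its (blob, offset, length).

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """One in-memory write buffer holding small files awaiting a merged HDFS write."""
    capacity: int
    files: list = field(default_factory=list)  # (name, data) pairs in arrival order
    used: int = 0

    def free(self) -> int:
        return self.capacity - self.used


class StorageCache:
    """Hypothetical multi-buffer storage cache (names and policy are illustrative)."""

    def __init__(self, num_buffers: int = 3, buffer_size: int = 64 * 1024):
        self.buffer_size = buffer_size
        self.buffers = [Buffer(buffer_size) for _ in range(num_buffers)]
        self.hdfs_writes = []  # stand-in for HDFS: one entry per merged write
        self.index = {}        # file name -> (blob id, offset, length)

    def put(self, name: str, data: bytes) -> None:
        if len(data) > self.buffer_size:
            self._write_blob([(name, data)])  # too big to buffer: write it on its own
            return
        candidates = [b for b in self.buffers if b.free() >= len(data)]
        if not candidates:
            # No buffer has room: flush the fullest one, which then has room.
            fullest = max(self.buffers, key=lambda b: b.used)
            self._flush(fullest)
            candidates = [fullest]
        # Best fit: the buffer with the least remaining space that still fits.
        target = min(candidates, key=lambda b: b.free())
        target.files.append((name, data))
        target.used += len(data)

    def flush_all(self) -> None:
        for buf in self.buffers:
            self._flush(buf)

    def get(self, name: str) -> bytes:
        for buf in self.buffers:  # the file may still be waiting in a buffer
            for n, d in buf.files:
                if n == name:
                    return d
        blob_id, offset, length = self.index[name]
        return self.hdfs_writes[blob_id][offset:offset + length]

    def _flush(self, buf: Buffer) -> None:
        if buf.files:
            self._write_blob(buf.files)
        buf.files, buf.used = [], 0

    def _write_blob(self, files: list) -> None:
        blob_id, offset, parts = len(self.hdfs_writes), 0, []
        for name, data in files:
            self.index[name] = (blob_id, offset, len(data))
            parts.append(data)
            offset += len(data)
        self.hdfs_writes.append(b"".join(parts))  # a single simulated HDFS write
```

With two 10-byte buffers, storing files of 6, 5, 4, 3 and 7 bytes in that order and then calling `flush_all()` produces three merged writes instead of five. Best fit keeps the emptier buffers available for larger files, which is one simple way to raise buffer utilization.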
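A read cache managed by a buddy system can be sketched in a similar way. The `BuddyAllocator` below is the standard buddy algorithm (power-of-two blocks, split on allocation, merge with the free buddy on release). The rest is assumption, because the abstract does not define the efficiency measure or exactly how the thresholds drive updates: `ReadCache` takes efficiency to be hits per byte of a segment's allocated block, uses one shared `threshold` where the thesis sets one per segment, and on a miss evicts only below-threshold segments (least efficient first) to make room. `fetch` is a hypothetical callable standing in for an HDFS read.

```python
class BuddyAllocator:
    """Textbook buddy-system allocator over a region of 2**max_order bytes."""

    def __init__(self, max_order: int):
        self.max_order = max_order
        self.free_lists = {k: set() for k in range(max_order + 1)}  # order -> free offsets
        self.free_lists[max_order].add(0)  # initially one block spans the whole region

    @staticmethod
    def order_for(size: int) -> int:
        return max(0, (size - 1).bit_length())  # smallest k with 2**k >= size

    def alloc(self, size: int):
        k = self.order_for(size)
        for j in range(k, self.max_order + 1):
            if self.free_lists[j]:
                offset = min(self.free_lists[j])
                self.free_lists[j].remove(offset)
                while j > k:  # split: keep the lower half, free the upper buddy
                    j -= 1
                    self.free_lists[j].add(offset + (1 << j))
                return offset, k
        return None  # no free block is large enough

    def free(self, offset: int, k: int) -> None:
        while k < self.max_order:
            buddy = offset ^ (1 << k)
            if buddy not in self.free_lists[k]:
                break
            self.free_lists[k].remove(buddy)  # merge with the free buddy
            offset = min(offset, buddy)
            k += 1
        self.free_lists[k].add(offset)


class ReadCache:
    """Hypothetical read cache: buddy-managed memory plus threshold-based eviction."""

    def __init__(self, fetch, max_order: int = 16, threshold: float = 0.01):
        self.fetch = fetch                 # name -> bytes; stands in for an HDFS read
        self.buddy = BuddyAllocator(max_order)
        self.memory = bytearray(1 << max_order)
        self.entries = {}                  # name -> [offset, order, length, hits]
        self.threshold = threshold         # assumed: minimum hits per byte to stay cached
        self.hdfs_reads = 0

    def efficiency(self, name: str) -> float:
        _, k, _, hits = self.entries[name]
        return hits / (1 << k)  # hits per byte of allocated segment

    def read(self, name: str) -> bytes:
        entry = self.entries.get(name)
        if entry is not None:
            entry[3] += 1
            return bytes(self.memory[entry[0]:entry[0] + entry[2]])
        data = self.fetch(name)
        self.hdfs_reads += 1
        self._admit(name, data)
        return data

    def _admit(self, name: str, data: bytes) -> None:
        if self.buddy.order_for(len(data)) > self.buddy.max_order:
            return  # larger than the whole cache
        block = self.buddy.alloc(len(data))
        # Only segments below the threshold may be evicted, least efficient first.
        victims = sorted((n for n in self.entries if self.efficiency(n) < self.threshold),
                         key=self.efficiency)
        while block is None and victims:
            offset, k, _, _ = self.entries.pop(victims.pop(0))
            self.buddy.free(offset, k)
            block = self.buddy.alloc(len(data))
        if block is None:
            return  # all cached segments are still efficient: do not cache this file
        offset, k = block
        self.memory[offset:offset + len(data)] = data
        self.entries[name] = [offset, k, len(data), 1]  # the miss counts as one access
```

Because only segments below the threshold can be evicted, a frequently read file is not pushed out by a newly read one; if no segment qualifies, the new file is returned from HDFS without being cached.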
【Degree-granting institution】: Nanjing University
【Degree level】: Master's
【Year degree conferred】: 2012
【Classification number】: TP333; TP391.3
Article ID: 2123688
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2123688.html