Research on Efficient Storage Management of Provenance and Its Application in Security
Published: 2018-09-10 08:28
[Abstract]: New information is being produced at an explosive rate around the world every day. Storage capacity requirements have grown accordingly, from petabytes (PB) and exabytes (EB) to today's mass storage systems built to hold "Big Data". Although new storage devices keep emerging and new storage architectures keep being proposed, the analysis and understanding of the massive data itself has stagnated. For example, when we obtain important data from the cloud, we may ask: where did this data come from, has anyone used it before, and how reliable and secure is it?
Provenance, metadata that records the history of a data object, is well suited to answering such questions: how a data object was created, what modifications it has undergone, and how the ancestors of two data objects differ. In the systems field, the provenance of a piece of data is all the process information and related data that influenced its final state. Because provenance reveals the past of a data object, or the process that produced it, it has broad practical value. Scientists already use provenance to validate important experimental datasets, improve the efficiency of desktop search, and audit important financial accounts, and ongoing research is applying it to areas such as data deduplication and distributed security. However, little research has addressed the characteristics of provenance itself. For example, provenance is characterized by its large volume, yet few good algorithms compress it substantially while still supporting efficient queries over it. Likewise, provenance records how data was generated, yet there are few studies on using it to ensure data reliability or on analyzing system intrusions from this generation history.
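The questions above (how an object was created, and how the ancestors of two objects differ) can be illustrated with a small dependency graph. This is only a sketch under stated assumptions: the `ProvenanceGraph` class, the node names, and the edge direction are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Directed graph; an edge child -> parent means 'child was derived from parent'."""

    def __init__(self):
        self.parents = defaultdict(set)

    def add_dependency(self, child, parent):
        self.parents[child].add(parent)

    def ancestors(self, node):
        """Return every node that directly or indirectly influenced `node`."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical history: two files that both derive from data.csv.
graph = ProvenanceGraph()
graph.add_dependency("report.pdf", "process:latex")
graph.add_dependency("process:latex", "report.tex")
graph.add_dependency("process:latex", "figure.png")
graph.add_dependency("figure.png", "process:plot")
graph.add_dependency("process:plot", "data.csv")
graph.add_dependency("summary.txt", "process:awk")
graph.add_dependency("process:awk", "data.csv")

# Ancestors the two files share, and ancestors unique to report.pdf.
shared = graph.ancestors("report.pdf") & graph.ancestors("summary.txt")
only_report = graph.ancestors("report.pdf") - graph.ancestors("summary.txt")
```

Here `shared` contains only `data.csv`, which is the kind of answer an "ancestor difference" query would return.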
A hybrid compression method based on web graph compression and dictionary encoding is proposed to compress provenance efficiently. By exploiting the similarity between provenance graphs and web graphs, the method makes full use of the locality and similarity among provenance graph nodes and eliminates repeated strings inherent in provenance information. Compared with previous compression methods, it can further compress the information on the edges of the provenance graph, has finer compression granularity, and supports efficient provenance queries. Experiments on a large number of provenance traces show that, among the compression schemes compared, it offers the best trade-off among compression ratio, compression time, and query performance.
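The abstract does not give the algorithm, so the following is only a rough sketch of the two ingredients it names, not the thesis's actual method. Dictionary encoding replaces repeated strings with small integer IDs, and gap encoding of sorted adjacency lists is one common technique from web graph compression. All function names and the sample paths are assumptions.

```python
def build_dictionary(strings):
    """Map each distinct string to a small integer ID (dictionary encoding)."""
    table = {}
    for s in strings:
        if s not in table:
            table[s] = len(table)
    return table


def gap_encode(sorted_ids):
    """Store the first ID, then the differences between consecutive IDs."""
    gaps = []
    previous = 0
    for value in sorted_ids:
        gaps.append(value - previous)
        previous = value
    return gaps


def gap_decode(gaps):
    """Invert gap_encode by accumulating the differences."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def compress(edges):
    """edges: dict mapping a node name to the list of its ancestor names."""
    names = list(edges)
    for targets in edges.values():
        names.extend(targets)
    table = build_dictionary(names)
    adjacency = {
        table[node]: gap_encode(sorted(table[t] for t in targets))
        for node, targets in edges.items()
    }
    return table, adjacency


def query_ancestors(table, adjacency, node):
    """Answer a direct-ancestor query by decoding a single adjacency list."""
    reverse = {i: name for name, i in table.items()}
    return [reverse[i] for i in gap_decode(adjacency[table[node]])]


# Hypothetical provenance edges with repeated path strings.
edges = {
    "/home/u/out.dat": ["/usr/bin/sort", "/home/u/in.dat"],
    "/home/u/out2.dat": ["/usr/bin/sort", "/home/u/out.dat"],
}
table, adjacency = compress(edges)
result = query_ancestors(table, adjacency, "/home/u/out2.dat")
```

Because each adjacency list is decoded independently, a query touches only the list it needs. This is one way a compressed representation can still support efficient queries.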
A provenance-based data reconstruction method is proposed that rebuilds individual data objects, can rebuild them in parallel, and supports reconstruction priorities. By tracing back the process that generated a data file, the method can accurately rebuild lost or damaged files. Compared with earlier solutions for storage reliability that focus on protecting an entire disk or system (for example, log files, snapshots, or backups), its main advantages are that it can rebuild a single data object, rebuild multiple data objects in parallel, and rebuild important data files first. The provenance-based reconstruction system collects a file's provenance while the file is being read normally and automatically rebuilds the file after it is lost or damaged; during reconstruction it can also recover other affected files. Experimental results show that provenance-based reconstruction performs significantly better than log-based reconstruction. Although factors such as the size of the provenance database can affect reconstruction, experiments show that their impact on provenance-based reconstruction performance is small.
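A minimal sketch of rebuilding files by replaying their recorded generation steps is shown below. It assumes each provenance record stores a producing function and its input files; the record layout, the file names, and the priority convention are hypothetical and are not the thesis's design. Parallel rebuilding is omitted for brevity.

```python
import heapq

# Hypothetical provenance records: output file -> (producing function, input files).
provenance = {
    "clean.csv": (lambda raw: raw.strip().lower(), ["raw.csv"]),
    "count.txt": (lambda clean: str(len(clean.split(","))), ["clean.csv"]),
}


def rebuild(name, store, provenance):
    """Re-create `name` in `store` by replaying its recorded generation step."""
    if name in store:
        return store[name]
    if name not in provenance:
        raise KeyError(f"{name} is lost and has no provenance record")
    producer, inputs = provenance[name]
    # Rebuild missing inputs first, so other affected files are recovered too.
    arguments = [rebuild(i, store, provenance) for i in inputs]
    store[name] = producer(*arguments)
    return store[name]


def rebuild_by_priority(lost, store, provenance):
    """lost: list of (priority, name); a lower number is rebuilt earlier."""
    queue = list(lost)
    heapq.heapify(queue)
    order = []
    while queue:
        _, name = heapq.heappop(queue)
        rebuild(name, store, provenance)
        order.append(name)
    return order


# Only the raw input survives; both derived files are lost.
store = {"raw.csv": " A,B,C \n"}
order = rebuild_by_priority([(2, "clean.csv"), (1, "count.txt")], store, provenance)
```

Rebuilding `count.txt` first also restores `clean.csv` as a side effect, which mirrors the claim that other affected files are recovered during reconstruction.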
A method that uses provenance for intrusion detection is proposed. By collecting provenance from the processes that interact with the system, it determines in detail how an intruding process accessed and modified files, which makes it quick and convenient to judge whether the system has been intruded and to locate system vulnerabilities. The method overcomes the complexity and inefficiency of manually analyzing traditional system and network logs. Moreover, logs generally record only part of the information in system events, such as HTTP connections or login records, which makes the whole analysis very difficult. The provenance-based method treats network connections that interact with the system as file objects, collects provenance on the dependencies between system processes and file objects, and then constructs a provenance graph from which administrators can find the intrusion path. Analyzing each event along the intrusion chain identifies the system vulnerability and the source of the attack. Experimental results show that, compared with traditional methods, the provenance-based intrusion detection mechanism has a lower false alarm rate and a higher detection rate, incurs only a small space overhead, and has almost no impact on system performance.
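Finding an intrusion path in a provenance graph can be sketched as a backward search from a suspicious file to a network connection, with the connection modeled as just another object. The node naming scheme, the sample edges, and the `intrusion_path` function are assumptions made for illustration; the thesis's actual detection logic is not described in the abstract.

```python
from collections import deque

# Hypothetical dependency edges: node -> the nodes it was derived from.
depends_on = {
    "file:/etc/passwd": ["proc:sh"],
    "proc:sh": ["proc:httpd"],
    "proc:httpd": ["socket:203.0.113.7:4444", "file:/etc/httpd.conf"],
    "file:/var/log/app.log": ["proc:app"],
}


def intrusion_path(start, depends_on, is_entry_point):
    """Breadth-first search backwards from a suspicious object to an entry point."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if is_entry_point(node):
            return path
        for source in depends_on.get(node, []):
            if source not in visited:
                visited.add(source)
                queue.append(path + [source])
    return None  # no network entry point influenced this object


def is_socket(node):
    return node.startswith("socket:")


path = intrusion_path("file:/etc/passwd", depends_on, is_socket)
clean = intrusion_path("file:/var/log/app.log", depends_on, is_socket)
```

Each hop in the returned `path` corresponds to one event in the intrusion chain that an administrator would then inspect.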
An approach based on object-based active storage is proposed to significantly improve the performance of provenance processing and of provenance transmission over the network. Because provenance is generated continuously and in large volumes, transmitting it becomes a significant network bottleneck. Object-based active storage addresses this problem well. On the one hand, active storage moves provenance processing from the host down to the storage device, greatly reducing the amount of provenance data the storage device sends over the network; on the other hand, object-based storage devices have more processing power than traditional block devices and can process provenance more intelligently and automatically. Inside the object storage device, ordinary data files and provenance database records are treated as user objects, while data processing tasks are treated as function objects that are flexibly scheduled to carry out the tasks the system requires, such as provenance compression, provenance queries, and data reconstruction. Evaluation shows that object-based active storage can significantly improve the performance of provenance-based data reconstruction.
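The split between user objects and function objects can be pictured with a toy model. Everything here is hypothetical (the `ObjectStorageDevice` class, its methods, and the sample records); it only illustrates why running a query on the device returns less data than shipping the whole provenance database to the host.

```python
class ObjectStorageDevice:
    """Toy storage device holding user objects and callable function objects."""

    def __init__(self):
        self.user_objects = {}
        self.function_objects = {}

    def put(self, object_id, data):
        self.user_objects[object_id] = data

    def register(self, name, function):
        self.function_objects[name] = function

    def read(self, object_id):
        # Conventional path: the whole object crosses the network.
        return self.user_objects[object_id]

    def execute(self, name, object_id, *args):
        # Active path: run the function on the device, return only the result.
        return self.function_objects[name](self.user_objects[object_id], *args)


def records_for(records, target):
    """Select the provenance records that belong to one file."""
    return [r for r in records if r[0] == target]


device = ObjectStorageDevice()
device.put("prov-db", [("a.txt", "proc:cp"), ("b.txt", "proc:sort"), ("a.txt", "proc:vi")])
device.register("query", records_for)

host_side = records_for(device.read("prov-db"), "a.txt")   # transfers 3 records
device_side = device.execute("query", "prov-db", "a.txt")  # transfers 2 records
```

Both paths give the same answer; the difference is how many records leave the device.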
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year degree granted]: 2013
[Classification number]: TP333
Article No.: 2233914
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2233914.html