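Research on Provenance Data Reduction Methods
Keywords: data provenance; data reduction; centrality analysis; graph clustering. Source: Shandong University, 2017 master's thesis. Document type: degree thesis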
[Abstract]: Data provenance is the tracing, reproduction, and presentation of the original data from which a target data item was derived, together with the process by which it evolved. Because it offers distinct advantages in monitoring data loss, reconstructing data, and verifying the security and trustworthiness of data, it has broad application prospects in big data engineering and information security. However, ever since provenance systems first appeared, the size of provenance data has been a bottleneck limiting their application. To keep target data traceable, provenance data is often far larger than the target data itself, and the problem is more acute for provenance systems aimed at big data engineering. Massive provenance data severely lowers the efficiency of provenance queries and drives up storage, computation, and management costs. Because the associations among the data are overly complex and fine-grained, it also makes provenance results harder to understand, greatly degrades the quality of data provenance, and directly hinders the adoption of provenance technology.

The approaches to provenance data reduction currently used in China and abroad, mainly redundancy-removing compression and noise filtering, cannot fundamentally solve the scale problem. Starting from the characteristics of provenance data and the structure of the provenance graph, this thesis coarsens large-scale provenance data by separating out cold data and fine-grained association data, and proposes effective methods for reducing the size of provenance data.

The main work of this thesis includes:
1. A study of a type-based layered reduction method for provenance data. The method uses the transitivity of dependencies between data items to reconstruct the dependency associations between data objects, partitions provenance data into layers by type, and strips away the "cold data" layers that are finer-grained and less frequently used, thereby simplifying the provenance data and improving provenance efficiency.
2. A study of a provenance data reduction method based on centrality difference. The method partitions task-layer data at boundaries determined by the difference in centrality between data nodes, and extracts the high-influence boundary data nodes within each task as key provenance, thereby reducing the size of the provenance data.
3. A study of a provenance data reduction method based on correlation clustering. The method clusters data at coarse granularity according to correlation, and moves the non-boundary data that describe task details to tiered storage or prunes them, thereby reducing provenance data from the perspective of coarse-grained clustering.

The innovations of this thesis are:
1. A type-based layered reduction method for provenance data, which partitions provenance data into layers by object type and then strips away the less frequently used "cold data" layers to reduce the size of the provenance data.
2. A provenance data reduction method based on centrality difference, which uses centrality difference to identify coarse-grained task boundaries and extracts the high-influence boundary data nodes within each task as key provenance.
3. A provenance data reduction method based on correlation clustering, which clusters provenance data according to the correlation among them and strips away the intra-cluster association data after clustering.

Each proposed method was evaluated experimentally on the Harvard University PASSv2 standard provenance trace dataset, and the results confirm that the methods are feasible and effective.
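The first method in the abstract (strip a cold layer by type, then reconnect the remaining objects through transitive dependencies) can be illustrated with a small sketch. This is not the thesis's implementation, which the abstract does not give. The function name `strip_cold_layer`, the edge representation, and the type labels `FILE`, `PROCESS`, and `PIPE` are all assumptions made for illustration.

```python
from collections import defaultdict


def strip_cold_layer(edges, node_type, cold_types):
    """Drop nodes whose type is in cold_types and reconnect the rest.

    edges:      iterable of (child, parent) pairs, "child depends on parent".
    node_type:  dict mapping every node to a type label (hypothetical labels).
    cold_types: set of type labels treated as the cold layer.

    Returns a set of (child, parent) pairs over the kept nodes only. A kept
    node depends on a kept ancestor if the original graph had a path between
    them that passes only through cold nodes (transitivity of dependency).
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def kept_ancestors(node):
        # Walk upward, passing through cold nodes and stopping at the
        # first kept node on each path.
        found, seen = set(), set()
        stack = list(parents[node])
        while stack:
            p = stack.pop()
            if p in seen:
                continue
            seen.add(p)
            if node_type[p] in cold_types:
                stack.extend(parents[p])
            else:
                found.add(p)
        return found

    kept = [n for n in node_type if node_type[n] not in cold_types]
    return {(n, p) for n in kept for p in kept_ancestors(n)}


# Illustrative data: out.txt <- sort <- pipe1 <- cat <- in.txt
edges = [("out.txt", "sort"), ("sort", "pipe1"),
         ("pipe1", "cat"), ("cat", "in.txt")]
node_type = {"out.txt": "FILE", "in.txt": "FILE",
             "sort": "PROCESS", "cat": "PROCESS", "pipe1": "PIPE"}

file_level = strip_cold_layer(edges, node_type, {"PIPE", "PROCESS"})
# file_level == {("out.txt", "in.txt")}
```

Treating both `PIPE` and `PROCESS` as cold collapses the chain to a single file-to-file dependency, which is the kind of coarsening the abstract describes. Which types count as cold in practice would depend on observed usage frequency, which this sketch does not model.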
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13; TP309
Related master's theses (top 1)
1 密鴻吉 (Mi Hongji); Research on Provenance Data Reduction Methods [D]; Shandong University; 2017
Article ID: 1510292
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1510292.html