Design and Implementation of a Data Quality Management Tool for Data Centers
Topics: data center + data quality. Source: Huazhong University of Science and Technology, master's thesis, 2013
[Abstract]: As information technology has spread across industries, each industry has accumulated large volumes of business data, and data centers have been built to put that data to effective use. To ensure that data entering a data center meets data quality requirements, various data cleaning tools have emerged. Even so, logic errors or differing priorities during cleaning mean that data may still have quality problems after it enters the data center, so the data must also be checked for quality after it has been loaded.
To analyze and handle the quality of data already in the data center, a data quality management tool for the data center is designed; the work covers a study of data quality models and an analysis of the tool's architecture. The implementation consists of a data source management module, a normalization management module, a data detection management module, and a data quality attribute analysis and visualization module. The data source management module handles information about the heterogeneous data sources in the data center. The normalization management module manages the analysis and implementation of normalization meta-rules, associates data sources with the corresponding normalization rules, and normalizes data sources according to those associations. The data detection management module implements four categories of data detection rules derived from data quality attributes, and manages the detection workflow in which datasets from a data source, or normalized datasets, are processed with the corresponding detection rules. The data quality attribute analysis and visualization module quantitatively analyzes data quality attributes, uses the output of the data detection module to assess the overall state of the examined dataset with respect to those attributes, and gives recommendations based on the results.
Testing the tool and analyzing the results shows that it is functionally usable and can effectively analyze and process data in the data center.
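The normalize, detect, and quantify flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the four rule categories or any interfaces, so the two rule types shown (completeness and validity), the `DetectionRule` class, and the `normalize` and `run_detection` functions are hypothetical and are not taken from the thesis.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


@dataclass
class DetectionRule:
    """Hypothetical detection rule: a name plus a per-record check."""
    name: str
    check: Callable[[Record], bool]


def normalize(record: Record) -> Record:
    """Hypothetical normalization rule: trim whitespace around string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def run_detection(records: List[Record], rules: List[DetectionRule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that pass it."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(rule.check(r) for r in records) / len(records)
        for rule in rules
    }


# Two illustrative rule types (assumed, not the thesis's four categories).
rules = [
    # Completeness: the "name" field must be present and non-empty.
    DetectionRule("completeness", lambda r: bool(r.get("name"))),
    # Validity: the "age" field must consist of digits only.
    DetectionRule(
        "validity",
        lambda r: r.get("age") is not None
        and re.fullmatch(r"\d+", r["age"]) is not None,
    ),
]

raw = [
    {"name": " Alice ", "age": "30"},
    {"name": "", "age": "41"},
    {"name": "Bob", "age": "n/a"},
    {"name": "Carol", "age": None},
]
records = [normalize(r) for r in raw]
scores = run_detection(records, rules)
print(scores)
```

Each score is a pass ratio between 0 and 1, which is one simple way to express a quality attribute quantitatively; a real tool would likely weight rules and report results per data source.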
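[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP308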
Article ID: 2117992
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2117992.html