Design and Implementation of a Data Quality Management Tool for Data Centers
Topics: data center + data quality; Source: Huazhong University of Science and Technology, master's thesis, 2013
[Abstract]: As information technology has spread across industries, organizations have accumulated large volumes of business data, and data centers have been built to make effective use of it. To ensure that data entering a data center meets quality requirements, a variety of data cleaning tools have emerged to handle data quality problems. Even so, data may still have quality problems after it enters the data center, for reasons such as logic errors or differing priorities during cleaning. The quality of the data therefore needs to be checked and handled after it has entered the data center.
To analyze and handle the quality of data after it has entered the data center, a data quality management tool for data centers was designed. The design work covers a study of data quality models and an analysis of the tool's architecture. The implementation consists of a data source management module, a normalization management module, a data detection management module, and a data quality attribute analysis and visualization module. The data source management module handles information about the data center's heterogeneous data sources. The normalization management module manages the analysis and implementation of normalization meta-rules, associates data sources with the corresponding normalization rules, and normalizes data sources according to those associations. The data detection management module implements four categories of data detection rules derived from the data quality attributes, and manages the detection workflow in which data sets from a data source, or normalized data sets, are processed with the corresponding detection rules. The data quality attribute analysis and visualization module quantitatively analyzes the data quality attributes, uses the output of the data detection module to assess the overall state of the inspected data set with respect to those attributes, and gives recommendations based on the analysis results.
The tool was tested and the results were analyzed. They show that the tool is functionally usable and can effectively analyze and process the data in a data center.
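The flow described in the abstract (associate normalization rules with a data source, run detection rules over the normalized data set, then score the result and give recommendations) can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not name its four rule categories, so the two rules here, along with `DetectionRule`, `normalize`, `detect`, `advise`, the `phone` column, and the 0.9 threshold, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


@dataclass
class DetectionRule:
    """A named check on one column (hypothetical structure, not from the thesis)."""
    name: str
    column: str
    check: Callable[[Optional[str]], bool]  # returns True when the value passes


def normalize(records: List[Record],
              normalizers: Dict[str, Callable[[str], str]]) -> List[Record]:
    """Apply the normalizer associated with each column; other values pass through."""
    return [
        {col: (normalizers[col](val) if col in normalizers and val is not None else val)
         for col, val in rec.items()}
        for rec in records
    ]


def detect(records: List[Record], rules: List[DetectionRule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that pass it."""
    total = len(records)
    if total == 0:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(1 for rec in records if rule.check(rec.get(rule.column))) / total
        for rule in rules
    }


def advise(scores: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Produce one recommendation per rule whose pass ratio is below the threshold."""
    return [
        f"{name}: pass ratio {ratio:.2f} is below {threshold:.2f}; review the source data"
        for name, ratio in scores.items()
        if ratio < threshold
    ]


# Assumed sample data set and rule associations, for illustration only.
records: List[Record] = [
    {"id": "1", "phone": " 027-8754 0000 "},
    {"id": "2", "phone": None},
    {"id": "3", "phone": "abc"},
    {"id": "4", "phone": "02787540001"},
]
normalizers = {"phone": lambda v: v.strip().replace(" ", "").replace("-", "")}
rules = [
    DetectionRule("phone_present", "phone", lambda v: v is not None and v != ""),
    DetectionRule("phone_digits", "phone", lambda v: v is not None and v.isdigit()),
]

cleaned = normalize(records, normalizers)
scores = detect(cleaned, rules)
for line in advise(scores):
    print(line)
```

Keeping normalization and detection as separate steps mirrors the abstract's statement that detection may run either on a raw data set or on a normalized one: `detect` can be called on `records` directly when no normalization rules are associated with the source.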
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP308
Article ID: 2117992
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2117992.html