Research on Data Center Data Quality Based on Business Rules
[Abstract]: Factors affecting data quality and methods for improving it have been studied extensively in China and abroad. This research has focused mainly on data quality problems in data warehouses and has proposed data quality metrics and methods for calculating them. Current research on data quality still has three problems. First, there is no systematic set of data quality evaluation indicators, so no complete data quality system has been formed. Second, there is not yet an authoritative data quality reference model, and existing studies each target a single problem. Finally, the definition of what data quality covers changes over time, so a data quality model must be extensible to meet changing requirements.
To address these problems, this thesis carries out the following work. First, a complete data quality evaluation system is proposed and constructed. Seven kinds of data quality elements are defined, including accuracy, consistency, non-empty constraints, and range constraints. These elements are used to describe data quality, and the data quality constraint rules reflect specific business rules and domain knowledge. Definitions and algorithms for the data quality evaluation indicators are given, and the architecture and process of data quality analysis and evaluation are presented. The architecture is divided into a data layer and an application layer. The data layer comprises an instance layer, a schema layer, a data quality layer, and a data quality extension layer. The data quality layer is the data quality metadata model, and the data quality extension layer provides extensions to that model. The application layer comprises a data quality analysis and evaluation layer and a presentation layer.
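As an illustration of how constraint rules such as the non-empty and range constraints could feed an evaluation indicator, here is a minimal Python sketch. The abstract does not give the thesis's actual indicator definitions, so the per-rule pass rate used here is an assumption, and the names `Rule`, `pass_rates`, `well_id`, and `depth_m` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """A hypothetical data quality constraint rule bound to one field."""
    name: str
    field: str
    check: Callable[[Any], bool]


def non_empty(value: Any) -> bool:
    """Non-empty constraint: the value is present and not blank."""
    return value is not None and str(value).strip() != ""


def in_range(low: float, high: float) -> Callable[[Any], bool]:
    """Range constraint: build a check that accepts numbers in [low, high]."""
    def check(value: Any) -> bool:
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        return is_number and low <= value <= high
    return check


def pass_rates(records: List[Record], rules: List[Rule]) -> Dict[str, float]:
    """Return, for each rule, the fraction of records that satisfy it."""
    rates: Dict[str, float] = {}
    for rule in rules:
        passed = sum(1 for r in records if rule.check(r.get(rule.field)))
        # Assumed convention: an empty data set trivially passes every rule.
        rates[rule.name] = passed / len(records) if records else 1.0
    return rates


# Made-up sample records; the field names are illustrative only.
records = [
    {"well_id": "W-001", "depth_m": 1520.0},
    {"well_id": "", "depth_m": 1830.5},
    {"well_id": "W-003", "depth_m": -40.0},
    {"well_id": "W-004", "depth_m": None},
]
rules = [
    Rule("well_id_non_empty", "well_id", non_empty),
    Rule("depth_in_range", "depth_m", in_range(0.0, 10000.0)),
]
rates = pass_rates(records, rules)
```

Keeping each rule as data (a name, a field, and a check) rather than hard-coding it is one way to get the kind of extensibility the abstract asks for: a new business rule becomes a new `Rule` entry and the evaluation code stays unchanged.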
In addition, for the problem of approximately duplicate records in the data center, the traditional "sort and merge" approach is adopted and an improved detection method based on inner-code order-value clustering is proposed. The string matching algorithm borrows from sequence alignment algorithms in bioinformatics. The improved method raises detection efficiency and achieves good results in practical use. Finally, a data quality detection and evaluation system is designed and implemented for the data center of the Downhole Operation Branch of Daqing Oilfield. The system supports the management and maintenance of business rules and evaluates the data quality indicators. It is running in the downhole operation data center, where it plays an important role in improving data quality.
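The general "sort and merge" idea with an alignment-based string comparison can be sketched as follows. This is not the thesis's inner-code order-value clustering, which the abstract does not specify. It substitutes a plain sorted-neighbourhood pass and uses the Needleman-Wunsch global alignment score as the similarity measure. The scoring values, window size, threshold, and sample strings are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, computed one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + step,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the alignment score into [0, 1]; identical strings give 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, alignment_score(a, b) / longest)


def find_duplicates(records: List[str], window: int = 3,
                    threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Sort the records, then compare each one with its next window-1 neighbours."""
    ordered = sorted(records)
    pairs: List[Tuple[str, str]] = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1:i + window]:
            if similarity(rec, other) >= threshold:
                pairs.append((rec, other))
    return pairs


# Made-up sample values containing near-duplicate spellings.
names = ["Daqing Oilfield", "Daqing Oilfeld", "Downhole Branch", "Daqing Oilfields"]
pairs = find_duplicates(names)
```

Sorting first places likely duplicates next to each other, so each record is compared only with a few neighbours instead of with every other record. That is the efficiency argument behind the sort-and-merge family of methods.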
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP308