Research on Data Quality in Data Centers Based on Business Rules

Published: 2019-01-10 13:21

[Abstract]: Extensive research, in China and abroad, has examined the factors that affect data quality and the methods for improving it. Most of this work focuses on data quality in data warehouses and proposes data quality metrics together with methods for computing them. Current research has three main shortcomings. First, no systematic set of data quality evaluation indicators has emerged, so a complete data quality system cannot be formed. Second, there is no authoritative data quality reference model, and existing studies each address a single problem. Third, the definition of data quality content changes over time, so a data quality model must be extensible enough to accommodate such change. This thesis addresses these problems as follows. First, it proposes and builds a complete data quality evaluation system. It defines seven classes of data quality elements, such as accuracy and consistency, and rules in fifteen dimensions, such as non-null constraints and value-range constraints. The data quality elements describe data quality, while the data quality constraint rules capture specific business rules and domain knowledge. The thesis gives definitions and concrete algorithms for the data quality evaluation indicators and presents the architecture and workflow of data quality analysis and evaluation. The architecture is divided into a data layer and an application layer. The data layer comprises an instance layer, a schema layer, a data quality layer, and a data quality extension layer; the data quality layer is the data quality metamodel, and the extension layer provides extensions to that metamodel. The application layer comprises a data quality analysis and evaluation layer and a presentation layer.
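To make the idea of constraint rules concrete, the following is a minimal Python sketch of how non-null and value-range constraints might be expressed and scored. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the indicator formulas, so the pass-rate metric, the `Rule` type, the helper names, and the sample well records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A hypothetical constraint rule: a name plus a per-record predicate."""
    name: str
    check: Callable[[Record], bool]


def not_null(field: str) -> Rule:
    """Non-null constraint: the field must be present and not None or empty."""
    return Rule(f"not_null({field})", lambda r: r.get(field) not in (None, ""))


def in_range(field: str, low: float, high: float) -> Rule:
    """Value-range constraint: a present numeric field must lie in [low, high]."""
    def check(r: Record) -> bool:
        value = r.get(field)
        if value is None:
            return True  # missing values are left to the non-null rule
        return isinstance(value, (int, float)) and low <= value <= high
    return Rule(f"in_range({field}, {low}, {high})", check)


def pass_rates(records: list[Record], rules: list[Rule]) -> dict[str, float]:
    """Assumed metric: per rule, the fraction of records that satisfy it."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(rule.check(r) for r in records) / len(records)
        for rule in rules
    }


# Invented sample data, for illustration only.
wells = [
    {"well_id": "W-001", "depth_m": 1520.0},
    {"well_id": "", "depth_m": 980.5},
    {"well_id": "W-003", "depth_m": -12.0},
    {"well_id": "W-004", "depth_m": None},
]
rules = [not_null("well_id"), in_range("depth_m", 0, 10000)]
scores = pass_rates(wells, rules)
```

Keeping each rule as data (a name plus a predicate) means new business rules can be added without changing the scoring function, which is in the spirit of the extensibility the abstract calls for.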
Next, for the problem of approximately duplicate records in the data center, the thesis starts from the traditional "sort-merge" method and proposes an improved detection method based on clustering by internal-code ordinal values; its string matching algorithm borrows from sequence alignment algorithms in bioinformatics. The improved method raises detection efficiency and has performed well in practice. Finally, taking data quality detection and evaluation for the data center of the Daqing Oilfield Downhole Operation Branch as the setting, the thesis designs and implements the proposed detection and evaluation system, which manages and maintains business rules and evaluates data quality indicators. The system is running in the downhole operation data center and has played an important role in improving the quality of its data.
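The general shape of sort-based duplicate detection with alignment-based string matching can be sketched as follows. This is a hedged illustration only: the abstract does not describe the internal-code ordinal-value clustering, so a plain lexicographic sort stands in for it, and Needleman-Wunsch global alignment is used as one well-known sequence alignment algorithm, which may differ from the one the thesis adopts. The function names, scoring parameters, window size, threshold, and sample records are all assumptions.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, computed row by row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the alignment score to [0, 1] relative to the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0, alignment_score(a, b)) / longest


def find_duplicates(records: list[str], window: int = 3,
                    threshold: float = 0.8) -> list[tuple[str, str]]:
    """Sort records, then compare each one with its next window-1 neighbours."""
    ordered = sorted(records)  # stand-in for the thesis's sort key
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1 : i + window]:
            if similarity(left, right) >= threshold:
                pairs.append((left, right))
    return pairs


# Invented sample records, for illustration only.
records = [
    "Zhang Wei, Daqing",
    "Li Na, Harbin",
    "Zhang Wei, Daqing.",
    "Wang Fang, Daqing",
]
duplicates = find_duplicates(records)
```

Sorting first brings likely duplicates close together, so each record is compared only with a few neighbours rather than with every other record; the alignment score then tolerates small insertions, deletions, and substitutions between the compared strings.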
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP308

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2406365.html
