

Research on Data Quality in Data Centers Based on Business Rules

Published: 2019-01-10 13:21
[Abstract]: To improve data quality, researchers in China and abroad have extensively studied the factors that affect data quality and the methods for improving it. This work has concentrated on data quality problems in data warehouses and has proposed data quality metrics together with methods for computing them. Current research nevertheless has three main shortcomings. First, no systematic set of data quality evaluation indicators has been established, so a complete data quality system cannot be formed. Second, there is not yet an authoritative data quality reference model, and existing studies each address a single problem. Third, the definition of what data quality covers changes over time, so a data quality model must be extensible enough to accommodate such changes. This thesis focuses on the following research in response to these problems.

First, a complete data quality evaluation system is proposed and constructed. Seven categories of data quality elements, such as accuracy and consistency, and fifteen dimensions of rules, such as non-null constraints and value-range constraints, are defined. The data quality elements describe data quality, while the data quality constraint rules reflect specific business rules and domain knowledge. Definitions and concrete algorithms are given for the data quality evaluation indicators, and an architecture and workflow for data quality analysis and evaluation are proposed. The architecture is divided into a data layer and an application layer. The data layer comprises an instance layer, a schema layer, a data quality layer, and a data quality extension layer; the data quality layer is the data quality metamodel, and the extension layer provides extensions to that metamodel. The application layer comprises a data quality analysis and evaluation layer and a presentation layer.
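As a rough illustration of what a non-null constraint and a value-range constraint might look like when expressed as checkable rules, here is a minimal Python sketch. The abstract does not give the thesis's indicator algorithms, so the `Rule` class, the `pass_ratio` indicator (the share of records that satisfy a rule), and the field names `well_id` and `depth_m` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """A named business rule that a single record either passes or fails."""
    name: str
    check: Callable[[Record], bool]


def not_null(field: str) -> Rule:
    # Non-null constraint: the field must be present and not an empty string.
    return Rule(f"not_null({field})", lambda r: r.get(field) not in (None, ""))


def in_range(field: str, low: float, high: float) -> Rule:
    # Value-range constraint: the field must be a number within [low, high].
    def check(r: Record) -> bool:
        value = r.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return Rule(f"in_range({field}, {low}, {high})", check)


def pass_ratio(records: List[Record], rule: Rule) -> float:
    # Assumed indicator: fraction of records that satisfy the rule.
    if not records:
        return 1.0
    return sum(rule.check(r) for r in records) / len(records)


# Hypothetical sample data; the field names are illustrative only.
records = [
    {"well_id": "W-001", "depth_m": 1200.0},
    {"well_id": "", "depth_m": 950.0},
    {"well_id": "W-003", "depth_m": -5.0},
    {"well_id": "W-004", "depth_m": None},
]

for rule in (not_null("well_id"), in_range("depth_m", 0, 10000)):
    print(rule.name, pass_ratio(records, rule))
```

Keeping each rule as data (a name plus a predicate) is one way to let rules be added or maintained without changing the evaluation code, which is the kind of extensibility the abstract asks of a data quality model.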
Next, to address the similar-duplicate records found in the data center, for which the traditional approach is "sort and merge", this thesis proposes an improved detection method based on clustering by internal-code ordinal value; its string matching borrows the sequence alignment algorithms of bioinformatics. The improved method raises detection efficiency and has performed well in practice. Finally, taking data quality detection and evaluation for the data center of the Daqing Oilfield Downhole Operation Branch as the application background, the proposed data quality detection and evaluation system was designed and implemented. The system supports the management and maintenance of business rules and the evaluation of data quality indicators. It is already running in the downhole operation data center and has played an important role in improving the quality of the data held there.
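The general idea of sorting records and then comparing near neighbours with an alignment-based string score can be sketched as follows. This is not the thesis's algorithm: the abstract gives no details of the internal-code ordinal clustering, so the sketch substitutes a plain sorted-neighbourhood pass (Python's default string ordering) and a Needleman-Wunsch global alignment score. The scoring values, `window`, `threshold`, and the sample names are all assumptions.

```python
from typing import List, Tuple


def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, keeping one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    # Normalise the score by the longer length and clamp negatives to 0.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, alignment_score(a, b) / longest)


def find_similar_duplicates(records: List[str], window: int = 3,
                            threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Sort the records, then compare each one with the next window-1 records."""
    order = sorted(range(len(records)), key=lambda i: records[i])
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + window]:
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


# Hypothetical sample values, not taken from the thesis.
names = ["Daqing Oilfield", "Daqing Oil field", "Harbin Station", "Daqing Oilfeld"]
print(find_similar_duplicates(names))  # [(0, 1), (0, 3)]
```

Sorting first means each record is compared only with a few neighbours instead of with every other record, which is where the efficiency of sort-and-merge style methods comes from; the cost is that duplicates sorted far apart are missed.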
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP308



Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2406365.html

