Research on Data Quality Validation Rule Extraction Technology

Published: 2018-06-11 17:16

  Topic: data quality + rule extraction; Source: Northeast Petroleum University, 2017 master's thesis


[Abstract]: With the development of the information industry, data has permeated production and operations in every sector, and data volumes keep growing. At the same time, the phenomenon of being "rich in data, poor in information" has become increasingly prominent. There are two main causes. First, powerful data integration and data analysis technology is still lacking. Second, the emergence of dirty data seriously degrades data quality, so that industries cannot make effective use of the data they already hold. Data quality is the premise and foundation of data analysis, mining, and decision-making. Improving it not only lets data reflect the real world accurately but also supports enterprise operations and decision-making efficiently. Data quality has therefore become a hot research topic in data management.
Data quality management relies mainly on data quality validation rules to judge whether data is valid and to assess data quality levels. These rules are closely tied to the business domain, and at present they are usually written by hand by domain experts and data management experts. Manual rule writing is laborious, inefficient, and time-consuming, and the completeness of the resulting rule set is hard to guarantee. This thesis therefore adopts the "reverse engineering" idea from software engineering and, with the help of machine learning techniques, studies the automatic generation of data quality validation rules, which can give domain experts more candidate rules and make rule authoring more efficient. To check comprehensively for quality problems in a database, the thesis studies evaluation criteria for data quality dimensions and, focusing on rule constraints, investigates text data formats, value ranges, and functional dependencies in Oracle and Excel data sources. It designs three learning algorithms for extracting data quality validation rules and develops a rule extraction system that is highly general and not restricted to a particular domain.
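The abstract names three kinds of rules to be learned from existing data: text format, value range, and functional dependency. The thesis's own algorithms are not given on this page, so the following is only a minimal sketch of what each kind of rule might look like when derived from sample data. All function names, column names, and sample values are hypothetical, and the techniques used here (most-common character-class pattern, min/max bounds, exhaustive single-column dependency check) are simple stand-ins rather than the author's method.

```python
import re
import string
from collections import Counter


def infer_format_rule(values):
    """Return a regex for the most common character-class shape in `values`.

    Assumes `values` is a non-empty list of strings.
    """
    def shape(text):
        parts = []
        for ch in text:
            if ch in string.digits:
                parts.append(r"\d")
            elif ch in string.ascii_letters:
                parts.append("[A-Za-z]")
            else:
                parts.append(re.escape(ch))  # keep punctuation literally
        return "".join(parts)

    pattern, _count = Counter(shape(v) for v in values).most_common(1)[0]
    return pattern


def infer_range_rule(values):
    """Return (lower, upper) bounds observed in a non-empty numeric column."""
    return min(values), max(values)


def holds_fd(rows, lhs, rhs):
    """Check whether column `lhs` determines column `rhs` in every row."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def discover_fds(rows, columns):
    """List all single-column dependencies (a, b) meaning a -> b."""
    return [(a, b) for a in columns for b in columns
            if a != b and holds_fd(rows, a, b)]


# Hypothetical sample data, including one dirty date value.
dates = ["2017-06-11", "2018-01-02", "n/a"]
depths = [1200, 1350, 990]
rows = [
    {"zip": "163318", "city": "Daqing"},
    {"zip": "163318", "city": "Daqing"},
    {"zip": "150001", "city": "Harbin"},
    {"zip": "150002", "city": "Harbin"},
]

date_rule = infer_format_rule(dates)
depth_rule = infer_range_rule(depths)
fds = discover_fds(rows, ["zip", "city"])
```

Rules produced this way are candidates only: a value that fails `re.fullmatch(date_rule, value)` or falls outside `depth_rule` would be flagged for review, which matches the abstract's framing of offering domain experts candidate rules rather than replacing them.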
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F49;TP311.13

