Research and Application of an Unconstrained Bayesian Classifier Based on the KDB Model

Published: 2018-05-05 18:43

  Topics: data mining + unconstrained Bayesian classification; Source: Jilin University, 2017 master's thesis


[Abstract]: In recent years, the rapid development of the information age has produced massive amounts of data, and people processing that data often cannot tell which parts matter for a decision. Discovering knowledge and extracting useful information from large volumes of data is the problem that data mining sets out to solve. Classification is one of the data mining tasks that researchers increasingly focus on, and progress on it helps in making key decisions from latent information. Bayesian networks, grounded in Bayesian theory and graph theory, are regarded as one of the most promising research methods in knowledge discovery, artificial intelligence and data mining. Bayesian classification models are a branch of Bayesian networks and fall into two types: constrained and unconstrained. Naive Bayes is the most classical constrained Bayesian classification model and the basic framework from which other constrained Bayesian classifiers are built. It is simple, fast and efficient, but its conditional independence assumption usually does not hold in practice. The KDB classifier (k-dependence Bayesian classifier) further relaxes the independence assumption among attributes on top of naive Bayes: it can construct network structures of arbitrary k-order structural complexity while retaining the computational efficiency of naive Bayes. However, although KDB has shown strong classification performance, its constrained network structure prevents it from expressing the Markov blanket of the class variable that corresponds to the optimal structure. In addition, the useful information contained in the test instances is not fully exploited, so the final decision may be biased and classification accuracy may suffer. To address these problems, this thesis combines a Markov blanket analysis of the class variable with local learning and proposes UKDB, an unconstrained k-dependence classifier. In terms of structural complexity, classification performance and computational efficiency, the main contributions are as follows:
1. UKDB can express attribute dependencies of arbitrary k-order structural complexity and outputs two sub-classifiers: a global model that describes the causal relationships implicit in the training set, and a local model that describes the causal relationships implicit in the test set. The local model can be viewed as a complement to the global model.
2. Experiments on 50 data sets from the UCI machine learning repository show that UKDB outperforms KDB overall in terms of 0-1 loss, bias and variance, while requiring only relatively small computational complexity.
3. Applying the UKDB model to medical diagnosis is of considerable practical value. Experiments on the Wisconsin breast cancer data set show that UKDB reduces classification error by 54.1% relative to KDB.
Overall, compared with KDB, UKDB strikes a better trade-off between structural complexity and performance.
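For readers unfamiliar with the KDB baseline, the sketch below shows the standard KDB structure-learning step as it is usually described in the literature: rank attributes by mutual information with the class, then give each attribute up to k parents chosen from the higher-ranked attributes by conditional mutual information given the class. It illustrates KDB only, not the UKDB method proposed in the thesis, whose details the abstract does not give. The function names, the column-wise data layout and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, cs):
    """Empirical I(X; C) in nats from two parallel lists of discrete values."""
    n = len(xs)
    n_xc = Counter(zip(xs, cs))
    n_x = Counter(xs)
    n_c = Counter(cs)
    return sum((cnt / n) * math.log(cnt * n / (n_x[x] * n_c[c]))
               for (x, c), cnt in n_xc.items())


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats from three parallel lists."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    return sum((cnt / n) * math.log(cnt * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
               for (x, y, c), cnt in n_xyc.items())


def kdb_structure(columns, labels, k):
    """Return {attribute index: list of parent attribute indices}.

    The class variable is an implicit extra parent of every attribute.
    With k = 0 the structure reduces to naive Bayes.
    """
    # Step 1: rank attributes by how informative they are about the class.
    order = sorted(range(len(columns)),
                   key=lambda i: mutual_information(columns[i], labels),
                   reverse=True)
    parents = {}
    for pos, i in enumerate(order):
        # Step 2: candidate parents are the attributes ranked before i.
        earlier = order[:pos]
        ranked = sorted(
            earlier,
            key=lambda j: conditional_mutual_information(columns[i], columns[j], labels),
            reverse=True)
        # Step 3: keep at most k parents with the highest I(Xi; Xj | C).
        parents[i] = ranked[:k]
    return parents


# Toy data (hypothetical): three binary attributes stored column-wise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # attribute 0: identical to the class
    [0, 0, 0, 1, 1, 1, 1, 0],  # attribute 1: partly predictive
    [0, 1, 0, 1, 0, 1, 0, 1],  # attribute 2: independent of the class
]
structure = kdb_structure(columns, labels, k=1)
```

In this sketch the parameter k is the only knob on structural complexity, which is the constraint the abstract says UKDB is designed to relax.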
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18



