

Research on Decision Tree Classification Methods for Discrete Attributes

Published: 2018-05-21 05:16

Topic: data mining; decision tree. Source: Dalian Maritime University, master's thesis, 2017.


【Abstract】: Data mining is the process of discovering regularities in large volumes of existing data. In recent years, intelligent knowledge extraction from massive data has attracted wide attention in industry. The field encompasses classification, clustering, association analysis, and other mining methods. Decision tree algorithms occupy an irreplaceable position in data mining because they extract knowledge simply and efficiently and yield models that are easy to understand. In most existing decision tree algorithms, the criterion for selecting a split node is based on Shannon's information entropy, which requires repeated logarithmic computations, so classification efficiency is low. Moreover, because existing algorithms choose randomly among candidate nodes, the classifier cannot discriminate further when several attributes have identical split-criterion values, which lowers prediction accuracy. To address these shortcomings, this thesis proposes the following improvements. (1) To avoid costly logarithmic operations and improve CPU utilization, an optimized attribute-evaluation function is proposed; comparative experiments show that it effectively improves classification efficiency and CPU utilization. (2) To address the low precision of the resulting classifier, and to avoid picking a node at random when two or more attributes' criterion values are equal or nearly so, a heap-based attribute-selection method is introduced; experiments confirm that it effectively improves classification precision on certain data sets. (3) To further address low classification precision and overfitting, a method based on classification rules is introduced: the improved decision tree algorithm is applied to N random samples to generate N decision tree classifiers, the optimal classification rules are selected from these classifiers, and the final decision tree model is built from them. Experiments show that, compared with existing algorithms, the proposed algorithm improves both classification efficiency and classification accuracy.
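The two core ideas in contributions (1) and (2) can be sketched in code. The abstract does not give the thesis's actual optimization function or its heap-based selection rule, so the sketch below makes two labeled assumptions: it uses Gini impurity as a stand-in for a log-free split criterion, and it breaks near-ties among attributes by preferring the one with fewer branches.

```python
import heapq
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence (one log call per class)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: a log-free criterion, used here as an assumed
    stand-in for the thesis's optimized evaluation function."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_score(rows, labels, attr, criterion):
    """Weighted impurity after splitting on discrete attribute `attr`
    (lower is better). Rows are dicts of attribute -> value."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    return sum(len(g) / n * criterion(g) for g in groups.values())

def best_attribute(rows, labels, attrs, criterion=gini, tol=1e-9):
    """Choose the split attribute via a min-heap so that near-ties
    (scores within `tol`) are resolved deterministically rather than
    arbitrarily, in the spirit of the heap-based method the abstract
    describes (the thesis's exact tie-breaking rule is not given)."""
    heap = [(split_score(rows, labels, a, criterion), a) for a in attrs]
    heapq.heapify(heap)
    best_score, best_attr = heapq.heappop(heap)
    # Among near-tied attributes, prefer the one producing fewer
    # branches (one plausible secondary criterion, assumed here).
    while heap and heap[0][0] - best_score <= tol:
        score, a = heapq.heappop(heap)
        if len({r[a] for r in rows}) < len({r[best_attr] for r in rows}):
            best_score, best_attr = score, a
    return best_attr
```

On a toy table where attribute `a` perfectly separates the classes and `b` does not, `best_attribute` selects `a`; swapping `criterion=entropy` back in reproduces the conventional, logarithm-heavy behaviour for comparison.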
【Degree-granting institution】: Dalian Maritime University
【Degree level】: Master
【Year conferred】: 2017
【CLC number】: TP311.13


Article ID: 1917892



Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1917892.html


