Research and Application of a Decision Tree Algorithm Incorporating Attribute Correlation

Published: 2019-03-17 17:16

[Abstract]: In the new century, challenges and opportunities coexist, and the ability to exploit and manage the massive volumes of data now being generated bears on the future development of every industry. In-depth work on big data makes it possible to analyze data patterns at a broader scale, discover latent regularities, and make reasonable predictions about future trends, yielding deeper, more effective, and more comprehensive information. Research on data mining algorithms therefore has both scientific and practical value. Building on the classical C4.5 decision tree algorithm, this thesis uses the Apriori association rule algorithm to incorporate the degree of correlation among the attributes of the data source into the subsequent decision tree computation. When selecting a splitting attribute, traditional C4.5 considers only the correlation between the candidate attribute and the class attribute and ignores the degree of association among non-class attributes, yet that association determines how redundant the attributes are. To reduce the impact of this redundancy, the thesis applies the idea of information gain to measure the relationship between the candidate attribute and the other non-class attributes and adds this measure to the original algorithm, producing more reliable splitting attributes. In addition, for cases where the attributes carry too little information during construction of the decision tree model, the thesis uses Apriori to generate a set of strong rules and, according to a proposed new attribute selection criterion, selects new attributes from these rules and adds them to the original attribute set. This expands the available information and in turn improves the prediction accuracy of C4.5. The information contained in a single sample is often rich and varied. A traditional decision tree algorithm reveals the degree of "attribute-class" association, whereas the association between attributes is a horizontal analysis of the data set; analyzing the relationship between pairs of attributes makes the analytical framework more multidimensional and the results more usable. Finally, the method is applied to a real case: historical data are used to identify the primary and secondary factors that influence whether customers of a fitness venue become members, a model is built from the relevant attributes and used for prediction, and customer groups that are both interested and of high value are identified, further demonstrating the algorithm's practical value in real scenarios.
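The abstract does not give the formula that combines the class-oriented measure with the inter-attribute measure, so the following is only an illustrative sketch under stated assumptions. It computes the standard C4.5 gain ratio, uses information gain between a candidate attribute and each other non-class attribute as a redundancy measure, and subtracts the mean redundancy weighted by a hypothetical parameter `alpha`. The function name `adjusted_score`, the `alpha` weighting, and the toy column names are assumptions made for illustration, not the thesis's actual criterion.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(xs, ys):
    """Information gain of ys from splitting on xs: H(ys) - H(ys | xs)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ys) - conditional


def gain_ratio(xs, ys):
    """C4.5 gain ratio: information gain divided by split information."""
    split_info = entropy(xs)
    return info_gain(xs, ys) / split_info if split_info > 0 else 0.0


def adjusted_score(data, attr, target, alpha=0.5):
    """Hypothetical criterion (an assumption, not the thesis's formula):
    gain ratio against the class attribute, minus alpha times the mean
    information gain between attr and the other non-class attributes."""
    others = [a for a in data if a not in (attr, target)]
    redundancy = (
        sum(info_gain(data[attr], data[o]) for o in others) / len(others)
        if others
        else 0.0
    )
    return gain_ratio(data[attr], data[target]) - alpha * redundancy


# Toy data with made-up columns: "age_band_copy" duplicates "age_band",
# so "age_band" is penalized for redundancy despite predicting "joined".
data = {
    "age_band": [0, 0, 1, 1],
    "age_band_copy": [0, 0, 1, 1],
    "visits": [0, 1, 0, 1],
    "joined": [0, 0, 1, 1],
}
scores = {a: adjusted_score(data, a, "joined") for a in data if a != "joined"}
best = max(scores, key=scores.get)
```

With `alpha=0`, the score reduces to the plain C4.5 gain ratio, which makes it easy to compare the two criteria on the same data.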
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13


Article ID: 2442507

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2442507.html
