Research and Application of a Decision Tree Algorithm Incorporating Attribute Correlation
[Abstract]: In the new century, challenges and opportunities coexist, and the ability to use and manage massive data bears on the future development of every industry. In-depth work on big data makes it possible to analyze data models at a broader scale, discover latent patterns, and predict future trends reasonably, yielding deeper, more effective, and more comprehensive information. Research on data mining algorithms therefore has both scientific and practical value. Building on the classical C4.5 decision tree algorithm, this paper uses the Apriori association rule algorithm to incorporate the degree of correlation among the attributes of the data source into the construction of the decision tree. When selecting a split attribute, the traditional C4.5 algorithm considers only the correlation between the candidate attribute and the class attribute; it ignores the degree of association among the non-class attributes, which determines how redundant the attributes are. To reduce the influence of redundancy, this paper uses the idea of information gain to measure the association between the candidate attribute and the other non-class attributes, and adds this measure to the original algorithm to produce more reliable split attributes. In addition, to address insufficient information during construction of the decision tree model, the paper uses the Apriori algorithm to generate a set of strong rules and, according to the proposed new attribute selection criterion, screens new attributes out of these rules and adds them to the original attribute set. This expands the amount of available information so that the prediction accuracy of the C4.5 algorithm can be improved.
The information contained in a sample is often rich and diverse. The traditional decision tree algorithm reveals the degree of "attribute-class" association, whereas the degree of correlation between attributes provides a horizontal analysis of the data set. Analyzing the relationships between pairs of attributes makes the analysis framework more multidimensional and the results more usable. Finally, this paper applies the method to a practical case: it uses historical data to identify the primary and secondary factors that influence whether customers join a gym, builds a model to predict the relevant attributes, and identifies interested and valuable customer groups. This further demonstrates the practical value of the algorithm in a real-world setting.
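The redundancy idea in the abstract can be illustrated with a minimal Python sketch. It computes the standard C4.5 gain ratio of a candidate attribute against the class attribute, then uses information gain again to measure how strongly the candidate is associated with the other non-class attributes. The abstract does not give the thesis's actual formula, so the way the two quantities are combined here (`adjusted_score`, with a hypothetical weight `lam`) is an assumption made for illustration only, as are the toy column names.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain(target, attr):
    """Information gain of `attr` about `target`: H(target) - H(target | attr)."""
    n = len(target)
    groups = {}
    for a, t in zip(attr, target):
        groups.setdefault(a, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def gain_ratio(target, attr):
    """Standard C4.5 criterion: information gain divided by split information."""
    split_info = entropy(attr)
    return info_gain(target, attr) / split_info if split_info > 0 else 0.0


def redundancy(data, name, class_name):
    """Mean information gain between `name` and every other non-class attribute."""
    others = [a for a in data if a not in (name, class_name)]
    if not others:
        return 0.0
    return sum(info_gain(data[o], data[name]) for o in others) / len(others)


def adjusted_score(data, name, class_name, lam=0.5):
    """Hypothetical combination (an assumption, not the thesis's formula):
    penalise the gain ratio by the attribute's redundancy with its peers."""
    return (gain_ratio(data[class_name], data[name])
            - lam * redundancy(data, name, class_name))


# Toy data set: column names and values are invented for illustration.
data = {
    "age":      ["young", "young", "old", "old"],
    "age_copy": ["young", "young", "old", "old"],  # fully redundant with "age"
    "noise":    ["a", "b", "a", "b"],              # unrelated to the class
    "joined":   ["yes", "yes", "no", "no"],        # class attribute
}

for attr in ("age", "age_copy", "noise"):
    print(attr,
          round(gain_ratio(data["joined"], data[attr]), 3),
          round(adjusted_score(data, attr, "joined"), 3))
```

In this toy example, "age" predicts the class perfectly, but its exact duplicate "age_copy" lowers its adjusted score, which is the kind of redundancy effect that plain C4.5 attribute selection does not account for.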
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13