基于頻繁模式樹的關(guān)聯(lián)規(guī)則算法研究
發(fā)布時(shí)間:2018-11-02 08:34
【摘要】: 數(shù)據(jù)挖掘是近年來(lái)迅速發(fā)展的信息處理技術(shù),從大量的、不完全的、有噪聲的、模糊的、隨機(jī)的數(shù)據(jù)中,提取隱含在其中的、人們事先不知道的、但又是潛在有用的信息和知識(shí)的過(guò)程。它涉及數(shù)據(jù)庫(kù)、人工智能、機(jī)器學(xué)習(xí)、模式識(shí)別、知識(shí)工程、面向?qū)ο、信息檢索和可視化等一系列技術(shù)。關(guān)聯(lián)規(guī)則挖掘作為數(shù)據(jù)挖掘領(lǐng)域的一個(gè)重要研究分支,它的任務(wù)是發(fā)現(xiàn)所有滿足支持度閾值和置信度閾值的強(qiáng)關(guān)聯(lián)規(guī)則。關(guān)聯(lián)規(guī)則挖掘算法是關(guān)聯(lián)規(guī)則挖掘研究的主要內(nèi)容,迄今為止已經(jīng)提出了許多高效的關(guān)聯(lián)規(guī)則挖掘算法。 本文對(duì)經(jīng)典的Apriori和AprioriTid算法以及不產(chǎn)生候選集的FP-Growth算法進(jìn)行了分析和研究。FP-Growth算法比Apriori算法在性能上有了很大提高,它僅需要掃描數(shù)據(jù)庫(kù)兩次,并且避免了產(chǎn)生大量的候選項(xiàng)集。但FP-Growth算法主要的瓶頸之一就是空間開銷大。為了節(jié)省空間,提高頻繁項(xiàng)的發(fā)現(xiàn)效率,本文對(duì)傳統(tǒng)的頻繁模式樹和項(xiàng)頭表進(jìn)行了優(yōu)化,采用動(dòng)態(tài)構(gòu)造哈希鏈地址的方法來(lái)構(gòu)造項(xiàng)頭表,FP-Tree的每個(gè)結(jié)點(diǎn)只存儲(chǔ)該項(xiàng)在項(xiàng)頭表中的地址,避免了在地址上出現(xiàn)空指針,節(jié)省了存儲(chǔ)空間的開銷,同時(shí)增加樹結(jié)點(diǎn)的域?qū)崿F(xiàn)了方便的雙向遍歷。此外還通過(guò)對(duì)事務(wù)數(shù)據(jù)庫(kù)按一定的規(guī)則進(jìn)行了劃分,得到若干個(gè)數(shù)據(jù)庫(kù)子集,然后分別對(duì)每個(gè)數(shù)據(jù)庫(kù)子集進(jìn)行數(shù)據(jù)挖掘,因而占用內(nèi)存小,解決了內(nèi)存無(wú)法裝入頻繁模式樹的問(wèn)題,使數(shù)據(jù)挖掘得以順利進(jìn)行。 最后通過(guò)實(shí)驗(yàn)對(duì)基于頻繁模式樹的關(guān)聯(lián)規(guī)則挖掘的優(yōu)化算法與傳統(tǒng)的頻繁模式樹的FP-Growth算法進(jìn)行了比較,實(shí)驗(yàn)結(jié)果表明在挖掘大量數(shù)據(jù)信息時(shí)更有效。
[Abstract]:Data mining is a rapid development of information processing technology in recent years, from a large number of, incomplete, noisy, fuzzy, random data, extract hidden in them, people do not know in advance, But it is also a process of potentially useful information and knowledge. It involves a series of technologies, such as database, artificial intelligence, machine learning, pattern recognition, knowledge engineering, object oriented, information retrieval and visualization. As an important research branch in the field of data mining, the task of association rule mining is to find all strong association rules that satisfy the support threshold and confidence threshold. Association rule mining algorithm is the main content of association rule mining research. So far, many efficient association rules mining algorithms have been proposed. In this paper, the classical Apriori and AprioriTid algorithms and the FP-Growth algorithm without candidate set are analyzed and studied. The performance of the FP-Growth algorithm is greatly improved than that of the Apriori algorithm, and it only needs to scan the database twice. And avoid producing a large number of candidate items. However, one of the main bottlenecks of FP-Growth algorithm is the large space overhead. In order to save space and improve the efficiency of frequent item discovery, this paper optimizes the traditional frequent pattern tree and item header table, and constructs the item header table by dynamically constructing the hash chain address. Each node of FP-Tree only stores the address of the item in the item header table, avoids the null pointer on the address, saves the cost of storage space, and increases the domain of tree node to realize convenient two-way traversal. In addition, by dividing the transaction database according to certain rules, several subsets of the database are obtained, and then each subset of the database is mined separately, thus occupying less memory. It solves the problem that the memory can not load the frequent pattern tree, so that the data mining can proceed smoothly. Finally, the optimization algorithm of association rule mining based on frequent pattern tree is compared with the FP-Growth algorithm of traditional frequent pattern tree. The experimental results show that the algorithm is more effective in mining a large amount of data information.
【學(xué)位授予單位】:哈爾濱工程大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2008
【分類號(hào)】:TP311.13
本文編號(hào):2305499
[Abstract]:Data mining is a rapid development of information processing technology in recent years, from a large number of, incomplete, noisy, fuzzy, random data, extract hidden in them, people do not know in advance, But it is also a process of potentially useful information and knowledge. It involves a series of technologies, such as database, artificial intelligence, machine learning, pattern recognition, knowledge engineering, object oriented, information retrieval and visualization. As an important research branch in the field of data mining, the task of association rule mining is to find all strong association rules that satisfy the support threshold and confidence threshold. Association rule mining algorithm is the main content of association rule mining research. So far, many efficient association rules mining algorithms have been proposed. In this paper, the classical Apriori and AprioriTid algorithms and the FP-Growth algorithm without candidate set are analyzed and studied. The performance of the FP-Growth algorithm is greatly improved than that of the Apriori algorithm, and it only needs to scan the database twice. And avoid producing a large number of candidate items. However, one of the main bottlenecks of FP-Growth algorithm is the large space overhead. In order to save space and improve the efficiency of frequent item discovery, this paper optimizes the traditional frequent pattern tree and item header table, and constructs the item header table by dynamically constructing the hash chain address. Each node of FP-Tree only stores the address of the item in the item header table, avoids the null pointer on the address, saves the cost of storage space, and increases the domain of tree node to realize convenient two-way traversal. In addition, by dividing the transaction database according to certain rules, several subsets of the database are obtained, and then each subset of the database is mined separately, thus occupying less memory. It solves the problem that the memory can not load the frequent pattern tree, so that the data mining can proceed smoothly. Finally, the optimization algorithm of association rule mining based on frequent pattern tree is compared with the FP-Growth algorithm of traditional frequent pattern tree. The experimental results show that the algorithm is more effective in mining a large amount of data information.
【學(xué)位授予單位】:哈爾濱工程大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2008
【分類號(hào)】:TP311.13
【參考文獻(xiàn)】
相關(guān)期刊論文 前4條
1 宋余慶,朱玉全,孫志揮,陳耿;基于FP-Tree的最大頻繁項(xiàng)目集挖掘及更新算法[J];軟件學(xué)報(bào);2003年09期
2 馮玉才,馮劍琳;關(guān)聯(lián)規(guī)則的增量式更新算法[J];軟件學(xué)報(bào);1998年04期
3 陳濤;張瑋;;一個(gè)改進(jìn)的并行關(guān)聯(lián)規(guī)則算法研究[J];計(jì)算機(jī)技術(shù)與發(fā)展;2007年01期
4 李超;余昭平;;基于最大模式的關(guān)聯(lián)規(guī)則挖掘算法研究[J];微計(jì)算機(jī)信息;2006年06期
,本文編號(hào):2305499
本文鏈接:http://sikaile.net/jingjilunwen/dianzishangwulunwen/2305499.html
最近更新
教材專著