Research on Mining Algorithms for Uncertain Data
Posted: 2018-01-23 18:27
Keywords: data mining; uncertain data; maximal patterns; frequent patterns. Source: Shanghai Jiao Tong University, master's thesis, 2015. Type: degree thesis
【Abstract】: With the rapid development of information technology, massive amounts of data are continuously generated and recorded in many fields, such as finance, logistics, and astronomical research. In most cases, however, these data contain errors or are only partially complete, and this uncertainty means that traditional data mining methods no longer apply. This thesis studies mining algorithms for uncertain data. It analyzes frequent pattern mining and maximal pattern mining on uncertain data and proposes a new algorithm for each, broadening the available data-processing techniques and improving mining efficiency. Frequent pattern mining is a core problem in data mining. The thesis proposes ProEclat, a frequent pattern mining algorithm for uncertain data based on a vertical data structure. ProEclat represents the dataset in vertical format, which avoids repeated scans of the dataset, and decides whether an itemset is frequent using a two-stage model, which greatly improves computational efficiency. Experiments show that ProEclat scales well and outperforms comparable algorithms. Maximal pattern mining is an important branch of frequent itemset mining. The thesis proposes U-GenMax, a depth-first maximal pattern mining algorithm for uncertain data. U-GenMax reduces running time through pruning and optimization techniques that include a multi-step backtracking mechanism, an item-ordering strategy, and local projection. Experiments and analysis show that U-GenMax performs well, particularly on sparse datasets and on dense datasets with relatively high support thresholds.
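The abstract does not describe ProEclat's internals, so what follows is only a minimal Python sketch of the general idea it names: mining frequent itemsets from uncertain data over a vertical layout. It assumes the common expected-support model, in which each item in a transaction carries an independent existential probability and an itemset's expected support is the sum, over transactions, of the product of its items' probabilities. The names `to_vertical`, `intersect`, `expected_support`, `mine_frequent`, and the threshold `min_esup` are hypothetical, and ProEclat's two-stage frequent-itemset test is not reproduced here.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed vertical layout: item -> {transaction id: existential probability}
VerticalDB = Dict[str, Dict[int, float]]


def to_vertical(db: List[Dict[str, float]]) -> VerticalDB:
    """Convert horizontal uncertain transactions into per-item tid -> probability maps."""
    vertical: VerticalDB = {}
    for tid, transaction in enumerate(db):
        for item, prob in transaction.items():
            vertical.setdefault(item, {})[tid] = prob
    return vertical


def intersect(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Join two probability maps on shared tids; independence is assumed, so probabilities multiply."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller map
    return {tid: p * b[tid] for tid, p in a.items() if tid in b}


def expected_support(tids: Dict[int, float]) -> float:
    """Expected support: sum of the itemset's probability over the transactions containing it."""
    return sum(tids.values())


def mine_frequent(db: List[Dict[str, float]], min_esup: float) -> Dict[FrozenSet[str], float]:
    """Eclat-style depth-first search for itemsets with expected support >= min_esup."""
    vertical = to_vertical(db)  # the only pass over the horizontal data
    result: Dict[FrozenSet[str], float] = {}

    def search(prefix_class: List[Tuple[Tuple[str, ...], Dict[int, float]]]) -> None:
        # Every member of prefix_class shares all items except its last one.
        for i, (items, tids) in enumerate(prefix_class):
            result[frozenset(items)] = expected_support(tids)
            next_class = []
            for other_items, _ in prefix_class[i + 1:]:
                new_item = other_items[-1]
                # Multiply in only the new item's probabilities; joining two itemset
                # maps directly would count the shared prefix's probabilities twice.
                joined = intersect(tids, vertical[new_item])
                if expected_support(joined) >= min_esup:  # expected support is anti-monotone
                    next_class.append((items + (new_item,), joined))
            if next_class:
                search(next_class)

    roots = [
        ((item,), vertical[item])
        for item in sorted(vertical)
        if expected_support(vertical[item]) >= min_esup
    ]
    search(roots)
    return result


if __name__ == "__main__":
    # Toy data for illustration only: each transaction maps items to existential probabilities.
    db = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "c": 1.0}, {"a": 1.0, "b": 0.5, "c": 0.4}]
    for itemset, esup in sorted(mine_frequent(db, 1.0).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(esup, 3))
```

Each item's probability map is built once, so every later itemset is evaluated by intersecting maps instead of rescanning the transactions, which is the benefit the abstract attributes to the vertical format.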
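The abstract does not specify how U-GenMax's multi-step backtracking or local projection work. As a hedged illustration only, the Python sketch below shows a basic depth-first search for maximal itemsets in the spirit of backtracking miners such as GenMax, under an assumed expected-support model (independent existential probabilities per item, expected support as the sum of per-transaction products). Items are ordered by ascending expected support (an assumed ordering heuristic), a subtree is skipped when the current itemset plus its remaining candidates is already contained in a known maximal itemset, and a node with no frequent extension is recorded unless an earlier maximal itemset contains it. All names (`mine_maximal`, `backtrack`, `min_esup`, and the helpers) are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed vertical layout: item -> {transaction id: existential probability}
VerticalDB = Dict[str, Dict[int, float]]


def to_vertical(db: List[Dict[str, float]]) -> VerticalDB:
    """Per-item map of transaction id -> existential probability."""
    vertical: VerticalDB = {}
    for tid, transaction in enumerate(db):
        for item, prob in transaction.items():
            vertical.setdefault(item, {})[tid] = prob
    return vertical


def esup(tids: Dict[int, float]) -> float:
    """Expected support under the assumed independence model."""
    return sum(tids.values())


def intersect(a: Dict[int, float], b: Dict[int, float]) -> Dict[int, float]:
    """Join on shared tids, multiplying probabilities (independence assumed)."""
    if len(a) > len(b):
        a, b = b, a
    return {tid: p * b[tid] for tid, p in a.items() if tid in b}


def mine_maximal(db: List[Dict[str, float]], min_esup: float) -> List[FrozenSet[str]]:
    """Depth-first search for maximal itemsets with expected support >= min_esup."""
    vertical = to_vertical(db)
    # Assumed ordering heuristic: ascending expected support, ties broken by item name.
    order = sorted(
        (item for item, tids in vertical.items() if esup(tids) >= min_esup),
        key=lambda item: (esup(vertical[item]), item),
    )
    maximal: List[FrozenSet[str]] = []

    def subsumed(candidate: FrozenSet[str]) -> bool:
        return any(candidate <= m for m in maximal)

    def backtrack(itemset: FrozenSet[str], tids: Dict[int, float], tail: List[str]) -> None:
        # Superset pruning: every itemset below this node is a subset of itemset | tail.
        if subsumed(itemset | frozenset(tail)):
            return
        # Keep only the tail items that leave the extended itemset frequent.
        extensions: List[Tuple[str, Dict[int, float]]] = []
        for item in tail:
            joined = intersect(tids, vertical[item])
            if esup(joined) >= min_esup:
                extensions.append((item, joined))
        if not extensions:
            # Leaf: record it unless an earlier maximal itemset already contains it.
            if itemset and not subsumed(itemset):
                maximal.append(itemset)
            return
        for i, (item, joined) in enumerate(extensions):
            backtrack(itemset | frozenset([item]), joined, [e for e, _ in extensions[i + 1:]])

    # The empty itemset occurs in every transaction with probability 1.
    backtrack(frozenset(), {tid: 1.0 for tid in range(len(db))}, order)
    return maximal


if __name__ == "__main__":
    # Toy data for illustration only.
    db = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "c": 1.0}, {"a": 1.0, "b": 0.5, "c": 0.4}]
    print([sorted(s) for s in mine_maximal(db, 1.0)])
```

The fixed item order is what makes the leaf check sufficient: any frequent superset that adds an item earlier in the order lies in a subtree explored before the current node, so it would already have produced a containing maximal itemset, while any frequent superset that adds a later item would have appeared as an extension.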
【Degree-granting institution】: Shanghai Jiao Tong University
【Degree level】: Master's
【Year degree awarded】: 2015
【CLC classification number】: TP311.13
Article no.: 1457946
Article link: http://sikaile.net/guanlilunwen/wuliuguanlilunwen/1457946.html