Research on Constraint-Based Frequent Pattern Mining Methods and Applications
[Abstract]: Constraint-based frequent pattern mining is one of the most fundamental problems in data mining research and has a wide range of practical applications. However, three challenges remain in this field. (1) How can it be extended to new applications? Specifically, beyond the "support" of a pattern, how can new pattern indicators be designed to better measure the interestingness of a pattern and meet the needs of new applications? (2) Unlike support, which is anti-monotone, newly proposed pattern indicators usually have more complex properties; for example, an indicator may satisfy none of monotonicity, anti-monotonicity, convertibility, or succinctness. For a given pattern, how can the upper/lower bounds of the indicator over all of its super-patterns be computed quickly, and how can the properties of the new indicator be exploited to design an efficient algorithm? (3) Different applications usually call for different new pattern indicators, each with its own method for computing upper/lower bounds. Is there a general method for computing the upper/lower bounds of an arbitrary pattern indicator? To address these problems and challenges, this thesis studies methods and applications of constraint-based frequent pattern mining. The main results and contributions are as follows:
First, a web content recommendation method based on pattern mining is proposed. Web content recommendation finds important combinations of content blocks in a web page and recommends them to users; it has many applications (such as intelligent web page printing and electronic reading on mobile devices). Many methods address this problem, but they are either designed for specific types of web pages (such as news pages or blogs) or semi-automated (users must perform additional operations to select the content blocks of a page). How to automatically extract useful content from a web page of any type has not been solved well. By using the selections that previous users made on similar web pages, this thesis formulates the task as a pattern-mining-based recommendation problem and proposes a web content recommendation method based on pattern mining, which can provide more accurate recommendations for any type of web page. Specifically, a content block combination (pattern) recommended to the user should not only be frequently selected by other users but also be as complete as possible. To this end, this thesis introduces a new pattern interestingness indicator, occupancy, which measures the completeness of a pattern with respect to its supporting transactions. Experimental results on real data sets show that the proposed method achieves more satisfactory recommendation quality and running efficiency.
Second, a general and efficient algorithm for occupancy-based frequent pattern mining is proposed. This chapter extends occupancy at three levels: its definition, its bound estimation methods, and its applications. Specifically, two definitions of occupancy are proposed based on two different weighted means (the arithmetic mean and the harmonic mean), namely arithmetic occupancy and harmonic occupancy. Unlike support, which is anti-monotone, occupancy satisfies neither monotonicity nor anti-monotonicity, and it is neither convertible nor succinct. How, then, can an upper bound on the occupancy of all super-patterns of a given pattern be computed quickly? For each definition, three upper bounds are proposed: an efficient upper bound, a tightest upper bound, and an intermediate upper bound. The efficient upper bound is cheap to compute for a single node, but it is looser, so many nodes must be searched. The tightest upper bound prunes more, so fewer nodes are searched, but computing it for a single node is more time-consuming. The intermediate upper bound strikes a balance between tightness and computational cost, which gives the algorithm its best overall performance. Occupancy matters not only for applications on transaction databases (such as recommending web page content for printing) but also for applications on sequence databases (such as tourist attraction recommendation). For this reason, this thesis proposes a general algorithm, DOFRA, that can handle applications on both types of databases. Finally, the effectiveness of DOFRA is verified in two real-world applications, and its efficiency is verified on a large amount of synthetic data.
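The two occupancy definitions can be illustrated with a small sketch. The abstract does not give the formulas, so the code below assumes that the per-transaction ratio is |X| / |t| (pattern size over transaction size) and that all supporting transactions are weighted equally; the function names and the toy database are hypothetical and are not taken from the thesis or from DOFRA.

```python
def supporting_transactions(pattern, database):
    """Return the transactions that contain every item of the pattern."""
    items = set(pattern)
    return [set(t) for t in database if items <= set(t)]


def occupancy_ratios(pattern, database):
    """Assumed per-transaction ratio: |pattern| / |transaction|."""
    size = len(set(pattern))
    return [size / len(t) for t in supporting_transactions(pattern, database)]


def arithmetic_occupancy(pattern, database):
    """Arithmetic mean of the ratios (0.0 if nothing supports the pattern)."""
    ratios = occupancy_ratios(pattern, database)
    return sum(ratios) / len(ratios) if ratios else 0.0


def harmonic_occupancy(pattern, database):
    """Harmonic mean of the ratios (0.0 if undefined)."""
    ratios = occupancy_ratios(pattern, database)
    if not ratios or any(r == 0 for r in ratios):
        return 0.0
    return len(ratios) / sum(1.0 / r for r in ratios)


# Toy transaction database: each transaction is a set of content blocks.
database = [{"a", "b"}, {"a", "b", "c", "d"}, {"c"}]
pattern = {"a", "b"}

arith = arithmetic_occupancy(pattern, database)  # ratios are 1.0 and 0.5
harm = harmonic_occupancy(pattern, database)
print(arith, harm)
```

In this toy example the pattern fills one supporting transaction completely and half of the other, so the arithmetic value is 0.75 while the harmonic value is lower (2/3), since the harmonic mean penalizes low ratios more strongly.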
Finally, a general model is proposed to efficiently estimate the upper/lower bounds of an arbitrary pattern indicator. Constraint-based mining not only helps capture more semantic information about patterns but can also improve mining efficiency by exploiting the properties of the constraints. Existing work typically defines a specific indicator to measure pattern interestingness and then estimates upper/lower bounds for that indicator; a unified framework suitable for any pattern indicator is lacking. Therefore, this thesis formally considers the bound estimation problem with only item labels and proposes a general model to solve it efficiently. To demonstrate the effectiveness of the framework, two typical pattern indicators are studied as cases, namely pattern utility and pattern occupancy. In addition, to meet different application requirements, this thesis extends the traditional SQL-based pattern indicators, such as min, max, avg, and var. Experimental analysis on data shows the generality and effectiveness of the proposed scheme.
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree awarded]: 2014
[Classification number]: TP311.13; TP393.092