Research on Constraint-Based Frequent Pattern Mining Methods and Applications
[Abstract]: Constraint-based frequent pattern mining is one of the most fundamental problems in data mining research and has a wide range of practical applications. However, three challenges remain in this field. (1) How can new applications be supported? Specifically, beyond the "support" of a pattern, how can new pattern measures be designed that better capture the interestingness of a pattern and meet the needs of new applications? (2) Unlike support, which is anti-monotone, newly proposed measures usually have more complex properties: they may be neither monotone nor anti-monotone, and neither convertible nor succinct. For a given pattern, how can the upper/lower bounds of the measure over all of its super-patterns be computed quickly, and how can the characteristics of the new measure be exploited to design an efficient algorithm? (3) Different applications usually call for different new measures, and each comes with its own method for computing upper/lower bounds. Is there a general method for computing the upper/lower bounds of an arbitrary pattern measure? To address these problems and challenges, this thesis studies methods and applications of constraint-based frequent pattern mining. The main results and contributions are as follows:
First, a web content recommendation method based on pattern mining is proposed. Web content recommendation identifies important combinations of content blocks in a web page and recommends them to users; it has many applications (such as intelligent web page printing and electronic reading on mobile devices). Many approaches to this problem exist, but they are either designed for specific kinds of web pages (such as news pages or blogs) or are semi-automatic (users must perform additional operations to select the content blocks of a page). How to automatically extract useful content from a web page of any type has not been well solved. By drawing on the selections that previous users made on similar web pages, this thesis formulates the task as a pattern mining recommendation problem and proposes a web content recommendation method based on pattern mining, which can provide more accurate content recommendations for any type of web page. Specifically, a content block combination (pattern) recommended to the user should not only be frequently chosen by other users but also be as complete as possible. To this end, this thesis introduces a new pattern interestingness measure, occupancy, which measures the completeness of a pattern with respect to its supporting transactions. Experimental results on real data sets show that the proposed method achieves more satisfactory recommendation quality and running efficiency.
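The idea of combining support with occupancy can be illustrated with a small sketch. The abstract does not give the formula, so the code below assumes that the occupancy of a pattern is the average share |pattern| / |transaction| over the transactions that contain it. The function names, thresholds, and the brute-force enumeration are hypothetical and are not the method of the thesis.

```python
from itertools import combinations


def support_and_occupancy(transactions, pattern):
    """Return (support, occupancy) of `pattern` over `transactions`.

    Assumption (not stated in the abstract): occupancy is the mean of
    len(pattern) / len(t) over the transactions t that contain the pattern.
    """
    pattern = frozenset(pattern)
    covering = [t for t in transactions if pattern <= t]
    if not covering:
        return 0.0, 0.0
    support = len(covering) / len(transactions)
    occupancy = sum(len(pattern) / len(t) for t in covering) / len(covering)
    return support, occupancy


def recommend(transactions, min_support, min_occupancy):
    """Brute-force search for patterns that pass both thresholds.

    Exponential in the number of items; only meant for tiny examples.
    """
    items = sorted(set().union(*transactions))
    results = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s, o = support_and_occupancy(transactions, candidate)
            if s >= min_support and o >= min_occupancy:
                results.append((candidate, s, o))
    return results


# Each transaction is the set of content blocks one earlier user selected.
selections = [
    frozenset({"title", "body"}),
    frozenset({"title", "body", "image"}),
    frozenset({"title", "body"}),
    frozenset({"title", "ads"}),
]
for blocks, s, o in recommend(selections, min_support=0.5, min_occupancy=0.8):
    print(blocks, round(s, 3), round(o, 3))
```

In this toy data the single block "title" is the most frequent pattern, yet it covers less than half of what users selected on average, so a frequency-only criterion would recommend an incomplete combination. Under the assumed definition, the occupancy threshold filters it out.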
Second, a general and efficient algorithm for occupancy-based frequent pattern mining is proposed. This part of the thesis extends occupancy at three levels: its definition, its bound estimation methods, and its applications. Specifically, two definitions of occupancy are proposed based on two different means (the arithmetic mean and the harmonic mean), namely arithmetic occupancy and harmonic occupancy. Unlike support, which is anti-monotone, occupancy is neither monotone nor anti-monotone, and it is neither convertible nor succinct. How, then, can an upper bound on the occupancy of all super-patterns of a given pattern be computed quickly? For each definition, three upper bounds are proposed: an efficient upper bound, a tightest upper bound, and an intermediate upper bound. The efficient upper bound is cheap to compute at a single node, but it is loose, so many nodes must be searched. The tightest upper bound prunes more, so fewer nodes are searched, but computing it at a single node is more time-consuming. The intermediate upper bound strikes a balance between tightness and computational cost, which gives the algorithm its best overall performance. Occupancy is important not only for applications on transaction databases (such as recommending web page content for printing) but also for applications on sequence databases (such as tourist attraction recommendation). This thesis therefore proposes a general algorithm, DOFRA, that can handle applications on different types of databases. Finally, the effectiveness of DOFRA is verified in two practical applications, and its efficiency is verified on a large amount of synthetic data.
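The two definitions and the role of an upper bound can be sketched as follows. The formulas are assumptions, since the abstract gives none: arithmetic occupancy is taken as the arithmetic mean, and harmonic occupancy as the harmonic mean, of |pattern| / |transaction| over the supporting transactions. The function `occupancy_upper_bound` is one simple valid bound derived under those assumptions; it is not claimed to be any of the three bounds of the thesis, and it is not the DOFRA algorithm.

```python
from fractions import Fraction


def supporting(db, pattern):
    """Transactions of `db` that contain every item of `pattern`."""
    p = frozenset(pattern)
    return [t for t in db if p <= t]


def arithmetic_occupancy(db, pattern):
    """Arithmetic mean of |pattern| / |t| over supporting transactions t."""
    p = frozenset(pattern)
    sup = supporting(db, p)
    if not sup:
        return Fraction(0)
    return sum(Fraction(len(p), len(t)) for t in sup) / len(sup)


def harmonic_occupancy(db, pattern):
    """Harmonic mean of |pattern| / |t|, i.e. n * |pattern| / sum(|t|)."""
    p = frozenset(pattern)
    sup = supporting(db, p)
    if not sup or not p:
        return Fraction(0)
    return Fraction(len(sup) * len(p), sum(len(t) for t in sup))


def occupancy_upper_bound(db, pattern, min_count):
    """Upper bound on the arithmetic occupancy of every super-pattern of
    `pattern` (the pattern itself included) that is contained in at least
    `min_count` >= 1 transactions.

    Reasoning: a super-pattern Y is contained in each of its supporting
    transactions, so |Y| <= l, where l is the shortest supporting length.
    Its supporting transactions are among those of `pattern`, and the mean
    of l / |t| over at least `min_count` of them is largest when exactly the
    `min_count` shortest transactions of length >= l are used.
    """
    lengths = sorted(len(t) for t in supporting(db, pattern))
    best = Fraction(0)
    for i in range(len(lengths) - min_count + 1):
        shortest = lengths[i]
        window = lengths[i:i + min_count]
        value = sum(Fraction(shortest, n) for n in window) / min_count
        best = max(best, value)
    return best


db = [frozenset("ab"), frozenset("abc"), frozenset("abcd"),
      frozenset("acdef"), frozenset("bc")]
bound = occupancy_upper_bound(db, "a", min_count=2)
# If `bound` falls below the occupancy threshold, no frequent super-pattern
# of {"a"} can reach the threshold, so that whole branch can be skipped.
print(bound, arithmetic_occupancy(db, "abc"), harmonic_occupancy(db, "ab"))
```

The bound only needs the sorted lengths of the supporting transactions, so it is cheap to evaluate at each search node. A tighter bound would also look at which items the transactions actually share, at a higher cost per node, which is the trade-off the paragraph above describes.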
Finally, a general model is proposed to efficiently estimate the upper/lower bounds of any pattern measure. Constraint-based mining not only helps capture richer semantic information about patterns but can also further improve mining efficiency by exploiting the properties of the constraints. Existing work estimates upper/lower bounds for one specific interestingness measure at a time, and a unified framework suitable for any pattern measure is lacking. Therefore, this thesis formally studies the bound estimation problem in the setting where only items carry labels, and proposes a general model to solve the problem efficiently. To demonstrate the effectiveness of the framework, this thesis presents two typical pattern measures as case studies, namely pattern utility and pattern occupancy. In addition, to meet different application requirements, the framework is extended to traditional SQL-based pattern measures such as min, max, avg, and var. Experimental analysis on data shows the generality and effectiveness of the proposed scheme.
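For the SQL-style measures, bounds over super-patterns can be illustrated with standard facts about min, max, and avg. The sketch below is not the general model of the thesis. It assumes that each item carries one integer label (for example a price), that the labels of the current pattern and of the items that may still be added to it are given as two lists, and that the pattern is non-empty. All function names are hypothetical, and var is left out.

```python
from fractions import Fraction


def min_bounds(pattern_vals, ext_vals):
    """(lower, upper) bounds of min over all super-patterns.

    Adding items can only lower the minimum, so the current minimum is the
    upper bound and the minimum over all available labels is the lower bound.
    """
    return min(pattern_vals + ext_vals), min(pattern_vals)


def max_bounds(pattern_vals, ext_vals):
    """(lower, upper) bounds of max over all super-patterns."""
    return max(pattern_vals), max(pattern_vals + ext_vals)


def _greedy_avg(pattern_vals, ext_vals, want_max):
    """Extreme average reachable by adding any subset of `ext_vals`.

    Labels are tried from the most helpful one onward; once a label no
    longer improves the running mean, no later label can improve it either.
    """
    total, count = sum(pattern_vals), len(pattern_vals)
    for v in sorted(ext_vals, reverse=want_max):
        mean = Fraction(total, count)
        improves = v > mean if want_max else v < mean
        if not improves:
            break
        total += v
        count += 1
    return Fraction(total, count)


def avg_bounds(pattern_vals, ext_vals):
    """(lower, upper) bounds of avg over all super-patterns."""
    return (_greedy_avg(pattern_vals, ext_vals, want_max=False),
            _greedy_avg(pattern_vals, ext_vals, want_max=True))


pattern_labels = [4, 6]           # labels of the items already in the pattern
extension_labels = [1, 9, 5, 10]  # labels of items that could still be added
print(min_bounds(pattern_labels, extension_labels))
print(max_bounds(pattern_labels, extension_labels))
print(avg_bounds(pattern_labels, extension_labels))
```

A constraint such as avg >= c can then prune a whole branch whenever the upper bound returned by `avg_bounds` is already below c. These bounds ignore which extensions remain frequent, so they are valid but may be loose.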
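【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctoral
【Year degree awarded】: 2014
【Classification number】: TP311.13; TP393.092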