Research and Application of Incremental Association Rule Mining Algorithms
Topic: incremental association rule mining + FUP; Source: Anhui University, master's thesis, 2013
[Abstract]: How to extract personalized information from large volumes of data is an active research topic in information retrieval. Work in this area centers on meta-search engines and query expansion. A meta-search engine merges the results returned by several search engines and aims to give users more query results; query expansion extends the short query a user submits with additional keywords so that the results better match the user's needs.

Association rule mining is an important research direction in data mining and a key method for query expansion. This thesis proposes an improved incremental association rule mining algorithm and, by combining a meta-search engine with query expansion based on the mined association rules, introduces the concept of a personalized meta-search engine.

The thesis first discusses the incremental association rule mining algorithm used for query expansion. It analyzes the factors that affect efficiency when incremental mining is performed on FP-Tree-based structures, as well as the strategy FUFP uses to update the FP-Tree quickly for incremental mining. It then brings the idea of FUP, a typical Apriori-based incremental mining algorithm, into the fast update of the TD-FP-Tree used by the TD-FP-Growth algorithm, and proposes a fast TD-FP-Tree update algorithm (PFU-TDFP). By classifying all affected items and handling each class separately, the algorithm reduces both the likelihood and the number of scans of the original transaction database; when an item changes from infrequent to frequent, it reduces the number of transactions whose items must be re-sorted; and it uses parallel processing in some steps to improve efficiency further. Experiments show that the proposed algorithm not only updates the TD-FP-Tree quickly but also improves overall mining efficiency compared with incremental mining based on the FP-Tree structure.
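The item classification that FUP-style incremental mining relies on can be illustrated with a minimal Python sketch. It is not the thesis's PFU-TDFP algorithm: it only shows the usual four-way split for single items, assuming that the counts of previously frequent items were kept from the earlier mining run. The function name `classify_items`, its parameters, and the result keys are hypothetical names chosen for this illustration.

```python
from collections import Counter


def classify_items(old_frequent_counts, old_size, new_transactions, min_support):
    """Split items into four FUP-style cases after an increment arrives.

    old_frequent_counts: stored counts of items frequent in the original DB
    old_size: number of transactions in the original DB
    new_transactions: list of item lists forming the increment
    min_support: relative minimum support, e.g. 0.5
    """
    new_size = len(new_transactions)
    # Count each item at most once per incremental transaction.
    new_counts = Counter(item for t in new_transactions for item in set(t))
    new_threshold = min_support * new_size
    total_threshold = min_support * (old_size + new_size)

    result = {
        "still_frequent": {},       # frequent before, frequent after (new count)
        "dropped": set(),           # frequent before, infrequent after
        "needs_rescan": set(),      # infrequent before, frequent in increment
        "still_infrequent": set(),  # infrequent before and in increment
    }
    for item in set(old_frequent_counts) | set(new_counts):
        if item in old_frequent_counts:
            # Old count is known, so the updated count needs no rescan.
            total = old_frequent_counts[item] + new_counts[item]
            if total >= total_threshold:
                result["still_frequent"][item] = total
            else:
                result["dropped"].add(item)
        elif new_counts[item] >= new_threshold:
            # Old count is unknown: only these items can force a scan
            # of the original database.
            result["needs_rescan"].add(item)
        else:
            # Below threshold in both parts, hence below it in the union.
            result["still_infrequent"].add(item)
    return result


# Hypothetical data: 4 original transactions, 4 incremental ones.
increment = [["a", "c"], ["a", "c"], ["c", "d"], ["a"]]
result = classify_items({"a": 3, "b": 2}, 4, increment, 0.5)
print(result)
```

In this example only item `c` would require going back to the original database, which is the sense in which classifying items can reduce the likelihood and number of full scans.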
The PFU-TDFP algorithm is then used to mine users' habits in browsing search results for query expansion, so that the expanded keyword groups reflect a user's professional background and interests; combined with a meta-search engine, this leads to the concept of a personalized meta-search engine. For result fusion in the meta-search engine, the thesis proposes a new scoring model based on local similarity features of the search results, such as rank, title, and summary. Finally, a system prototype is implemented. Experiments on the prototype show that PFU-TDFP can quickly update the mined user search and browsing habits, and that, evaluated with P@N, the proposed result fusion scoring formula gives users more personalized search results.
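For readers unfamiliar with the P@N measure mentioned above, the following Python sketch shows how precision at N is commonly computed. The function name `precision_at_n` and the sample URLs are assumptions made for illustration; the thesis's own test data and relevance judgments are not given in the abstract.

```python
def precision_at_n(ranked_results, relevant, n):
    """Return the fraction of the top-n ranked results that are relevant.

    ranked_results: result identifiers in ranked order, best first
    relevant: set of identifiers judged relevant for the user
    n: cutoff rank; the score is divided by n even if fewer results exist
    """
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for r in ranked_results[:n] if r in relevant)
    return hits / n


# Hypothetical fused result list and relevance judgments.
ranked = ["u1", "u2", "u3", "u4", "u5"]
relevant = {"u1", "u3", "u6"}
print(precision_at_n(ranked, relevant, 3))  # 2 of the top 3 are relevant
print(precision_at_n(ranked, relevant, 5))  # 2 of the top 5 are relevant
```

A higher P@N for the fused list than for a baseline ranking would indicate that more of the results a user sees first are relevant to that user.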
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP311.13
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1948299.html