
An Improved Frequent 1-Itemset Generation Method and Experimental Study

Published: 2018-01-24 03:42

Keywords: data mining; association analysis; frequent 1-itemsets; incremental data pattern; time saving. Source: Jilin University, 2017 master's thesis. Type: degree thesis.


[Abstract]: People have collected and analyzed data since the dawn of civilization. Ancient weather summaries and forecasts, for example, were conclusions drawn from collecting and analyzing observations of how the weather changed in daily life. Planting times and planting methods for crops were likewise distilled from years of cultivation experience, which is also a process of collecting and analyzing data. The same holds for architecture, water conservancy, commerce, and other fields: the collection and use of data has always touched every aspect of life. Before the Internet, the use of data was mostly confined to a single region: local weather, regional crops, and building styles adapted to the local climate. With the emergence and growth of the Internet and the global integration of information, people can obtain more, and more useful, data far more easily, which means more value can be extracted from it. This is what is now called data mining. Data mining aims to discover the value in data and mainly comprises cluster analysis, classification, association analysis, prediction, and deviation analysis. Association analysis, the direction this thesis addresses, summarizes related items in the data as a basis for further analysis. Many association analysis algorithms have been proposed to find strongly associated data items. Most association rule algorithms first generate the frequent 1-itemsets and then carry out the subsequent work on that basis. For an association rule analysis that is run only once, generating the frequent 1-itemsets requires one scan of the database. When the data keeps growing and association analysis is run repeatedly, however, every run scans the database again to generate the frequent 1-itemsets, so later runs rescan old data, which wastes a great deal of time. This thesis improves frequent 1-itemset generation for this situation in order to save unnecessary database reading and scanning time. For incremental data, the improvement works mainly by storing the candidate 1-itemset data produced while generating the frequent 1-itemsets. It relies on the fact that the number of data records is far greater than the number of distinct item types, which saves the time spent generating frequent 1-itemsets in subsequent association rule analyses and therefore reduces the running time of the whole algorithm.
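The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the JSON file used as the store, and the use of a relative minimum support are all assumptions made for illustration. The sketch keeps the support counts of all candidate 1-itemsets between runs, so a later run scans only the newly added transactions.

```python
import json
from collections import Counter
from pathlib import Path


def update_counts(counts, n_transactions, new_transactions):
    """Fold a batch of new transactions into the stored candidate counts."""
    for transaction in new_transactions:
        # An item counts once per transaction, however often it repeats in it.
        counts.update(set(transaction))
        n_transactions += 1
    return counts, n_transactions


def frequent_1_itemsets(counts, n_transactions, min_support):
    """Return items whose support count reaches min_support (a fraction)."""
    threshold = min_support * n_transactions
    return {item: c for item, c in counts.items() if c >= threshold}


def save_state(path, counts, n_transactions):
    """Persist candidate counts (hypothetical JSON layout; items must be strings)."""
    Path(path).write_text(json.dumps({"n": n_transactions, "counts": counts}))


def load_state(path):
    """Load stored counts, or start empty if nothing has been stored yet."""
    p = Path(path)
    if not p.exists():
        return Counter(), 0
    state = json.loads(p.read_text())
    return Counter(state["counts"]), state["n"]


if __name__ == "__main__":
    # First run: scan the initial data once and store every candidate count.
    counts, n = update_counts(Counter(), 0, [["a", "b"], ["a", "c"], ["a", "b"]])
    print(frequent_1_itemsets(counts, n, 0.5))
    # Later run: only the new batch is scanned; old data is never reread.
    counts, n = update_counts(counts, n, [["c"], ["c"], ["c"]])
    print(frequent_1_itemsets(counts, n, 0.5))
```

Counts are stored for every candidate item, not only the frequent ones, because an item that is infrequent now may become frequent after new data arrives. The stored table has one entry per distinct item, which is small when records far outnumber item types.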
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13

