天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

網(wǎng)絡(luò)流量測(cè)量中基于計(jì)數(shù)的頻繁項(xiàng)挖掘算法研究

發(fā)布時(shí)間:2018-03-12 20:51

  本文選題:數(shù)據(jù)流 切入點(diǎn):頻繁項(xiàng) 出處:《燕山大學(xué)》2014年碩士論文 論文類型:學(xué)位論文


【摘要】:大流識(shí)別是在網(wǎng)絡(luò)流量測(cè)量數(shù)據(jù)流中查詢頻繁項(xiàng)的過程,近些年成為了數(shù)據(jù)流挖掘領(lǐng)域的一個(gè)研究熱點(diǎn)。網(wǎng)絡(luò)流量數(shù)據(jù)流具有高速、不確定、時(shí)變等特征,并且大流識(shí)別對(duì)查詢結(jié)果的精度要求較高,對(duì)算法的時(shí)空開銷也有一定限制。故目前的數(shù)據(jù)流頻繁項(xiàng)挖掘算法仍存在識(shí)別效率低、處理實(shí)時(shí)性差、性能亟待改進(jìn)等問題。 首先,針對(duì)PLC(Probabilistic Lossy Counting)算法的計(jì)算公式和清除策略,進(jìn)一步優(yōu)化時(shí)空開銷,提出了一種基于計(jì)數(shù)的空間優(yōu)化的大流識(shí)別算法DPLC(Delta Probabilistic Lossy Counting)。該算法引入變化度和穩(wěn)態(tài)值兩個(gè)概念,來(lái)衡量計(jì)算出的窗口誤差值是否達(dá)到穩(wěn)態(tài),達(dá)到穩(wěn)態(tài)后可以省去計(jì)算公式的時(shí)間,從而降低時(shí)間開銷。此外,邊界刪除時(shí)采用更嚴(yán)格的誤差值,區(qū)分對(duì)待是否使用計(jì)算出的窗口誤差值替換表?xiàng)l目的估計(jì)值,,可以有效降低存儲(chǔ)開銷。 其次,針對(duì)LC(Lossy Counting)算法存儲(chǔ)開銷大且識(shí)別精度低,而PLC算法識(shí)別精度較高但可能出現(xiàn)漏報(bào)的問題,進(jìn)一步優(yōu)化算法性能,提出了一種非線性有損計(jì)數(shù)的大流識(shí)別算法NLC(Nonlinear Lossy Counting)。分別采用非線性函數(shù)作為統(tǒng)計(jì)值和刪除值,通過有效控制算法的存儲(chǔ)開銷來(lái)降低輸出結(jié)果的誤報(bào)和避免漏報(bào)。值得注意的是,非線性函數(shù)的參數(shù)值選取是否合適對(duì)NLC算法的性能發(fā)揮有影響,實(shí)驗(yàn)分析總結(jié)了非線性函數(shù)的參數(shù)值與包分布、支持度之間的變化規(guī)律。設(shè)置不同參數(shù)值的非線性函數(shù)可以改變NLC算法的存儲(chǔ)開銷,進(jìn)而改變了識(shí)別結(jié)果的誤報(bào)率和漏報(bào)率。 最后,通過實(shí)驗(yàn)使用人工合成數(shù)據(jù)集和真實(shí)網(wǎng)絡(luò)流量數(shù)據(jù)集評(píng)估算法的性能。實(shí)驗(yàn)結(jié)果展示了本文提出的算法在識(shí)別精度、時(shí)間開銷、存儲(chǔ)開銷等方面的優(yōu)勢(shì)。本文實(shí)驗(yàn)均在MyEclipse平臺(tái)下開發(fā),用java語(yǔ)言實(shí)現(xiàn)。
[Abstract]:Large flow identification is the process of querying frequent items in the network traffic measurement data stream. In recent years, it has become a research hotspot in the field of data stream mining. The network traffic data stream has the characteristics of high speed, uncertainty, time-varying and so on. Large stream recognition requires high precision of query results and also has some limitations on the space-time overhead of the algorithm. Therefore, the current algorithms for frequent item mining of data streams still have some problems such as low recognition efficiency, poor real-time processing, and the performance needs to be improved. First of all, aiming at the calculation formula and clearing strategy of PLC(Probabilistic Lossy counting algorithm, and further optimizing space-time overhead, a large flow recognition algorithm DPLC(Delta Probabilistic Lossy counting algorithm based on counting is proposed, which introduces the concepts of degree of variation and steady-state value. To measure whether the calculated window error is steady-state, the time of the formula can be saved and the time cost can be reduced. In addition, a more strict error is used when the boundary is deleted. It can effectively reduce the storage overhead by distinguishing whether to replace the estimated values of table entries with the calculated window error values. Secondly, aiming at the problem that the LC(Lossy counting algorithm has high storage cost and low recognition accuracy, but the PLC algorithm has high recognition accuracy but may be underreported, the performance of the algorithm is further optimized. In this paper, a large flow recognition algorithm for nonlinear lossy counting, NLC(Nonlinear Lossy counting algorithm, is proposed. The nonlinear function is used as the statistical value and the deletion value respectively. By effectively controlling the storage cost of the algorithm, the false positives of the output results are reduced and the false positives are avoided. Whether the parameter value of the nonlinear function is suitable or not has an effect on the performance of the NLC algorithm. The parameter value and the packet distribution of the nonlinear function are analyzed and summarized in the experiment. The nonlinear function with different parameter values can change the storage cost of NLC algorithm and then change the false alarm rate and false alarm rate of the recognition results. Finally, the performance of the algorithm is evaluated by using artificial data sets and real network traffic datasets. The experimental results show that the proposed algorithm has high recognition accuracy and time overhead. The experiments in this paper are developed on MyEclipse platform and realized by java language.
【學(xué)位授予單位】:燕山大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2014
【分類號(hào)】:TP393.06

【參考文獻(xiàn)】

相關(guān)期刊論文 前5條

1 ;Identifying heavy hitters in high-speed network monitoring[J];Science China(Information Sciences);2010年03期

2 李臻;楊雅輝;謝高崗;覃光成;;一種基于數(shù)據(jù)流計(jì)數(shù)的概率衰落大業(yè)務(wù)流識(shí)別方法[J];計(jì)算機(jī)研究與發(fā)展;2011年06期

3 祝然威;王鵬;劉馬金;;基于計(jì)數(shù)的數(shù)據(jù)流頻繁項(xiàng)挖掘算法[J];計(jì)算機(jī)研究與發(fā)展;2011年10期

4 王秀坤;王鐵存;周國(guó)能;馮維;;挖掘數(shù)據(jù)流近似頻繁項(xiàng)的改進(jìn)算法[J];計(jì)算機(jī)工程與應(yīng)用;2008年13期

5 王偉平;李建中;張冬冬;郭龍江;;一種有效的挖掘數(shù)據(jù)流近似頻繁項(xiàng)算法[J];軟件學(xué)報(bào);2007年04期



本文編號(hào):1603253

資料下載
論文發(fā)表

本文鏈接:http://sikaile.net/guanlilunwen/ydhl/1603253.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶42299***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請(qǐng)E-mail郵箱bigeng88@qq.com
黄色国产精品一区二区三区| 成人区人妻精品一区二区三区| 日韩欧美中文字幕人妻| 精品久久av一二三区| 久久精品亚洲精品一区| 亚洲国产av在线视频| 国产视频福利一区二区| 免费在线成人午夜视频| 亚洲中文在线中文字幕91| 亚洲国产欧美久久精品| 午夜免费精品视频在线看| 国产一区二区三区午夜精品| 久久精品久久精品中文字幕| 大尺度剧情国产在线视频| 中文字幕乱码一区二区三区四区| 欧美精品日韩精品一区| 日本高清加勒比免费在线| 91免费精品国自产拍偷拍| 99久热只有精品视频最新| 亚洲欧美黑人一区二区| 国语对白刺激高潮在线视频| 中文字幕日韩精品人一妻| 国产伦精品一区二区三区高清版| 亚洲免费观看一区二区三区| 欧美日本道一区二区三区| 欧美成人精品一区二区久久| 在线观看欧美视频一区| 国产精品福利一级久久| 国产综合香蕉五月婷在线| 午夜视频成人在线免费| 好吊妞视频免费在线观看| 国产亚洲神马午夜福利| 精品久久综合日本欧美| 办公室丝袜高跟秘书国产| 日韩精品一级一区二区| 国产级别精品一区二区视频| 国产又猛又黄又粗又爽无遮挡 | 中文字幕日韩欧美理伦片| 欧美日韩亚洲精品内裤| 欧美激情区一区二区三区| 精品欧美一区二区三久久|