
Research on a File Access Behavior Prediction Method Based on a Nearest-Neighbor Decision Tree

Published: 2018-03-23 01:24

  Topic: large-scale storage systems. Focus: metadata. Source: Huazhong University of Science and Technology, 2012 master's thesis. Type: degree thesis.


[Abstract]: Rapid data growth keeps pushing storage demand upward. Storage systems hold ever more files of widely varying types, file management grows more complex, and new storage media are continually added, so that different media are used side by side in one system. As a result, the need for classified file storage management keeps growing. An important prerequisite for such management is accurate prediction of a file's future access behavior, that is, its access frequency. Existing storage systems do not effectively provide file access behavior prediction, and therefore struggle to meet the needs of classified file storage management.

This thesis designs and implements a new file access classification and prediction method. It predicts the class of a file's future access behavior and can find the K files most similar to any given file. This helps the storage system anticipate future access behavior, optimize the physical layout of files, and support decisions such as file caching.

The main idea is to build a classification and prediction model from a file's static metadata and its early access records. The method first uses metadata to build a decision partition tree, then builds a K-nearest-neighbor (KNN) classification model at each leaf node of the tree, and uses this hybrid model to predict a file's future access behavior. The decision partition tree is a highly balanced multiway tree. Its main role is to partition the original training set intelligently by file metadata, which both removes noisy data and reduces subsequent classification time. A new file passes through the decision partition tree and is assigned to the corresponding subset. A max-heap scan over that subset then finds the K files most similar to it, and those K files vote to decide its future access behavior.

Experiments on data extracted from the log records of a real file system show that the system predicts future file access frequency with accuracy as high as 90%, and that its classification time is nearly 20 times shorter than that of the traditional KNN algorithm.
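The hybrid model described in the abstract (partition by metadata, then KNN with a max-heap scan and voting inside the matching leaf) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation. The metadata attributes (`ext`, `owner`), the two-component feature vectors, the `hot`/`cold` labels, and all function names are assumptions made for the example. The decision partition tree is also simplified to a single multiway split keyed on metadata.

```python
import heapq
from collections import Counter, defaultdict


def leaf_key(meta):
    """Route a file to a leaf by static metadata (hypothetical split attributes)."""
    return (meta["ext"], meta["owner"])


def build_leaves(training):
    """Group (meta, features, label) training records by the leaf they fall into."""
    leaves = defaultdict(list)
    for meta, features, label in training:
        leaves[leaf_key(meta)].append((features, label))
    return leaves


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(records, query, k):
    """Scan the records once, keeping the k closest in a bounded max-heap."""
    heap = []  # entries are (-distance, index, label); heap[0] is the farthest one kept
    for i, (features, label) in enumerate(records):
        d = squared_distance(features, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i, label))
        elif d < -heap[0][0]:
            # Closer than the current farthest neighbor: swap it in.
            heapq.heapreplace(heap, (-d, i, label))
    # Return (distance, label) pairs, nearest first.
    return [(-neg_d, label) for neg_d, _, label in sorted(heap, reverse=True)]


def predict(leaves, meta, query, k=3):
    """Majority vote among the k nearest files in the query's leaf."""
    records = leaves.get(leaf_key(meta), [])
    if not records:
        return None  # no training files share this metadata
    votes = Counter(label for _, label in k_nearest(records, query, k))
    return votes.most_common(1)[0][0]


# Toy training set: (static metadata, early-access features, future access class).
training = [
    ({"ext": "log", "owner": "svc"}, (9.0, 8.0), "hot"),
    ({"ext": "log", "owner": "svc"}, (8.0, 9.0), "hot"),
    ({"ext": "log", "owner": "svc"}, (1.0, 0.0), "cold"),
    ({"ext": "log", "owner": "svc"}, (7.0, 7.0), "hot"),
    ({"ext": "iso", "owner": "ops"}, (0.0, 1.0), "cold"),
    ({"ext": "iso", "owner": "ops"}, (1.0, 1.0), "cold"),
]
leaves = build_leaves(training)
print(predict(leaves, {"ext": "log", "owner": "svc"}, (8.0, 8.0), k=3))  # prints "hot"
```

Because the heap never holds more than K entries, each query costs one pass over the leaf's records, not over the whole training set. That is the intuition behind the reported speedup over plain KNN.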
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333





