
Research on a Hybrid Storage Architecture for Search Engines Based on Solid-State Drives

Published: 2018-03-08 11:45

  Topic: full-text retrieval; Entry point: search engines; Source: Huazhong University of Science and Technology, 2012 master's thesis; Type: degree thesis


[Abstract]: Large search engines index hundreds of millions of documents and must process millions of query requests per second. Many of them currently store their massive index data on hard disk drives (HDDs), whose slow I/O is the main performance bottleneck. Unlike traditional mechanical disks, solid-state drives (SSDs) offer many advantages, above all fast random data access. They also have potential drawbacks: a high storage cost per unit of capacity, asymmetric read and write speeds, and a limited number of block erasures. For these reasons, large search engines cannot yet replace disks entirely with SSDs.

A search engine is a typical I/O-intensive application whose I/O pattern has distinctive features: it is read-dominated, shows locality, and consists largely of skip reads and random reads. The SSD-based hybrid storage architecture for search engines is a trade-off among retrieval performance, hardware cost, and system reliability. It takes into account both the read/write characteristics of SSDs and the I/O features of search engine workloads, caching hot data in memory and on the SSD so as to minimize the number of disk accesses and improve the system's I/O performance.

The data management strategy for this hybrid storage organizes the data on the SSD in a log-based fashion. Its goal is to improve retrieval performance while reducing block erasures on the SSD. It has three parts. The first is a data selection policy, which decides, according to the characteristics of the cached data, whether to keep it in memory or on the SSD. The second is a data placement policy, which uses an improved log-based scheme to organize and manage the data on the SSD so that reads and writes stay efficient. The third is a data replacement policy, which applies different overwrite policies to cached query results and cached inverted lists on the SSD, so as to avoid expensive random writes as far as possible and to reduce block erasures. Experimental results confirm the effectiveness of this strategy: the cache hit rate increased by 13.31%, retrieval performance improved by 41.05%, the average Flash access time within the SSD fell by 43.83%, and the number of SSD block erasures fell by 71.52%.
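The abstract describes the architecture only at a high level, so the following is a minimal sketch of the general idea rather than the thesis's actual algorithm. It assumes a three-tier read path (memory, then SSD, then HDD) and models the SSD cache as an append-only log of fixed-size blocks that are erased and reused one whole block at a time. The class names `LogStructuredSSDCache` and `HybridStore`, the plain LRU memory tier, and the oldest-block-first reuse rule are all illustrative assumptions standing in for the thesis's selection, placement, and replacement policies.

```python
from collections import OrderedDict


class LogStructuredSSDCache:
    """Toy SSD cache: entries are appended to a log of fixed-size blocks.

    Nothing is overwritten in place. When the current block is full, the
    next block in the ring is erased as a whole and reused.
    """

    def __init__(self, num_blocks, entries_per_block):
        self.num_blocks = num_blocks
        self.entries_per_block = entries_per_block
        self.blocks = [[] for _ in range(num_blocks)]  # keys held by each block
        self.index = {}       # key -> cached value (in-memory lookup table)
        self.head = 0         # block currently being filled
        self.erase_count = 0  # number of whole-block erasures so far

    def get(self, key):
        return self.index.get(key)

    def put(self, key, value):
        if key in self.index:
            return  # already cached: skip the write instead of rewriting
        if len(self.blocks[self.head]) == self.entries_per_block:
            self._advance_head()
        self.blocks[self.head].append(key)
        self.index[key] = value

    def _advance_head(self):
        # Move to the next block in the ring; erase it if it holds data.
        self.head = (self.head + 1) % self.num_blocks
        victim = self.blocks[self.head]
        if victim:
            for key in victim:
                del self.index[key]
            self.blocks[self.head] = []
            self.erase_count += 1


class HybridStore:
    """Read path: memory (LRU) first, then the SSD log, then the HDD."""

    def __init__(self, hdd, mem_capacity, ssd):
        self.hdd = hdd                # backing store; values assumed non-None
        self.mem = OrderedDict()      # LRU order: oldest entry first
        self.mem_capacity = mem_capacity
        self.ssd = ssd
        self.hdd_reads = 0

    def read(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        value = self.ssd.get(key)
        if value is None:
            value = self.hdd[key]     # slowest path: count the disk access
            self.hdd_reads += 1
        self.mem[key] = value
        if len(self.mem) > self.mem_capacity:
            old_key, old_value = self.mem.popitem(last=False)
            self.ssd.put(old_key, old_value)  # demote evicted entry to SSD
        return value
```

In this sketch, entries evicted from memory are demoted to the SSD log, so a later read of the same key is served from the SSD instead of the HDD. Because every SSD write is an append and space is reclaimed a whole block at a time, the model avoids random writes; the `erase_count` and `hdd_reads` counters correspond loosely to the block-erasure and disk-access metrics the abstract reports.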
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP391.3;TP333


Article ID: 1583762


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1583762.html

