Research on Bitmap-Index-Based Distributed Storage and Indexing Technology for FITS Files
Published: 2018-01-08 20:15
Source: Kunming University of Science and Technology, master's thesis, 2014. Thesis type: degree thesis.
Keywords: FITS files; distributed storage; bitmap index; FITS file retrieval
[Abstract]: Most data produced by astronomical observations is stored as FITS (Flexible Image Transport System) files, a format used worldwide to store and exchange data. Because many large multi-channel, multi-band astronomical telescopes are now in operation, the number of FITS files produced by observations has surged, which raises the challenge of how to store such an enormous number of files and retrieve them quickly. In the past, these FITS files were not indexed; they were simply written to hard disks or other storage media. When a disk filled up, it was replaced with a new one, and the old disk was kept in a warehouse dedicated to used disks. All of this disk swapping had to be done by hand, wasting human resources. Moreover, the replaced disks are naturally offline, so querying the files stored on them is difficult: results can be obtained fairly easily only when the query condition is a date or a time range, while complex conditions such as a cone search are hard to satisfy. This problem, caused by the surge in the number of FITS files, was once solved with database management systems (DBMS) such as MySQL and Oracle. However, as the number of files grows ever faster, traditional DBMSs cannot keep pace, and indexing and querying take longer and longer.
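A date or time-range query reduces to simple comparisons, whereas a cone search asks for every observation whose pointing lies within some angular radius of a sky position, which needs a spherical distance. The minimal Python sketch below illustrates that predicate under stated assumptions: each file's header has already been reduced to a record with hypothetical `name`, `ra` and `dec` fields (degrees). It scans every record linearly, which is precisely the cost an index is meant to avoid.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two sky positions (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp against rounding so asin never sees a value slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(records, ra0, dec0, radius_deg):
    """Return the names of records whose pointing is within radius_deg of (ra0, dec0)."""
    return [r["name"] for r in records
            if angular_separation_deg(r["ra"], r["dec"], ra0, dec0) <= radius_deg]


if __name__ == "__main__":
    # Hypothetical header summaries for three FITS files.
    sample = [
        {"name": "a.fits", "ra": 10.0, "dec": 20.0},
        {"name": "b.fits", "ra": 10.5, "dec": 20.0},
        {"name": "c.fits", "ra": 200.0, "dec": -30.0},
    ]
    print(cone_search(sample, 10.0, 20.0, 1.0))  # ['a.fits', 'b.fits']
```

The haversine form is used because it stays numerically stable for the small separations typical of cone searches, and it handles right-ascension wrap-around (for example 359.9° versus 0.1°) without special cases.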
This thesis describes how a distributed storage system can solve the FITS file storage problem, and introduces and experimentally compares several distributed file systems. Analysis of the experimental results shows that distributed file systems with good file-write performance, such as GlusterFS and Lustre, are better suited to storing the massive numbers of FITS files that continuous astronomical observation keeps producing. GlusterFS was ultimately chosen as the distributed file system for the FITS file distributed storage system.
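GlusterFS does not rely on a central metadata server; its distribute (DHT) translator decides which brick stores a file by hashing the file name into hash ranges that the directory's layout assigns to each brick. Because any client can compute a file's location locally, new files can plausibly be written without a lookup round-trip, which fits a steady stream of incoming observations. The sketch below only imitates that idea: `zlib.crc32` stands in for GlusterFS's own hash function, the equal split of the 32-bit space is a simplifying assumption (real layouts can be rebalanced and weighted), and the brick names are hypothetical.

```python
import zlib
from collections import Counter

HASH_SPACE = 2 ** 32  # size of the 32-bit hash space split among bricks


def make_layout(bricks):
    """Split the hash space into equal, contiguous (start, end, brick) ranges."""
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs the remainder so the ranges cover the whole space.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def place(filename, layout):
    """Return the brick whose hash range contains the hash of the file name."""
    h = zlib.crc32(filename.encode("utf-8"))  # unsigned 32-bit value in Python 3
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise ValueError("layout does not cover the hash space")


if __name__ == "__main__":
    layout = make_layout(["node1:/brick", "node2:/brick", "node3:/brick"])
    names = [f"obs_{i:05d}.fits" for i in range(3000)]  # hypothetical file names
    # Every client computes the same placement, so no lookup service is needed.
    print(Counter(place(n, layout) for n in names))
```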
For FITS file retrieval, this thesis proposes using bitmap indexes to speed up searches. By applying the FastBit bitmap indexing technology to a distributed system, it develops a distributed FITS file indexing system that provides fast indexing and querying of massive numbers of FITS files. Experiments show that FastBit bitmap indexing has a performance advantage for indexing massive numbers of FITS files, and that, when FITS files are stored in a distributed manner, the FastBit-based FITS indexing and query system makes good use of multi-machine cooperation and substantially improves retrieval speed for massive FITS collections.
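The sketch below is not FastBit's API; it is a minimal Python illustration, under stated assumptions, of the ideas the approach rests on. Each indexed attribute gets one bitmap per value bin; a range query ORs the bins that fall entirely inside the range and checks the raw values only in the edge bins (a "candidate check"); predicates on several attributes are combined with bitwise AND; and in the distributed setting every storage node searches its own local index before the partial results are merged. The attribute names (`ra`, `dec`), bin edges and file names are hypothetical, threads stand in for separate machines, and the bitmaps are uncompressed Python integers, whereas FastBit compresses them (Word-Aligned Hybrid encoding).

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def iter_rows(bitmap):
    """Yield the row numbers whose bits are set in an integer bitmap."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


class BinnedBitmapIndex:
    """Binned bitmap index over one numeric attribute.

    Bin k covers [edges[k], edges[k + 1]); values outside [edges[0], edges[-1])
    are not indexed. Bit i of a bin's bitmap is set when row i falls in that bin.
    """

    def __init__(self, values, edges):
        self.values = list(values)  # raw values, needed for candidate checks
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for row, v in enumerate(self.values):
            k = bisect_right(self.edges, v) - 1
            if 0 <= k < len(self.bitmaps):
                self.bitmaps[k] |= 1 << row

    def range_query(self, lo, hi):
        """Return a bitmap of the rows with lo <= value < hi."""
        result = 0
        for k, bm in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[k], self.edges[k + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin lies entirely outside the range
            if lo <= b_lo and b_hi <= hi:
                result |= bm  # bin lies entirely inside: take the whole bitmap
            else:
                # Edge bin: check the raw values of its rows (candidate check).
                for row in iter_rows(bm):
                    if lo <= self.values[row] < hi:
                        result |= 1 << row
        return result


def build_node(records, ra_edges, dec_edges):
    """Index one storage node's local files; records are (name, ra, dec) tuples."""
    return {
        "files": [name for name, _, _ in records],
        "ra": BinnedBitmapIndex([ra for _, ra, _ in records], ra_edges),
        "dec": BinnedBitmapIndex([dec for _, _, dec in records], dec_edges),
    }


def query_node(node, ra_range, dec_range):
    """Answer a box query on one node by ANDing the per-attribute bitmaps."""
    hits = node["ra"].range_query(*ra_range) & node["dec"].range_query(*dec_range)
    return [node["files"][row] for row in iter_rows(hits)]


def distributed_query(nodes, ra_range, dec_range):
    """Send the query to every node in parallel and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda n: query_node(n, ra_range, dec_range), nodes)
        return sorted(name for part in parts for name in part)


if __name__ == "__main__":
    ra_edges, dec_edges = [0, 90, 180, 270, 360], [-90, -45, 0, 45, 90]
    node_a = build_node([("a1.fits", 10.0, 20.0), ("a2.fits", 100.0, -10.0)], ra_edges, dec_edges)
    node_b = build_node([("b1.fits", 50.0, 30.0), ("b2.fits", 300.0, 60.0)], ra_edges, dec_edges)
    print(distributed_query([node_a, node_b], (0, 60), (0, 45)))  # ['a1.fits', 'b1.fits']
```

Binning keeps the number of bitmaps small for continuous attributes such as coordinates, at the price of the candidate check on edge bins; finer bins shrink that check but enlarge the index.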
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP333
Article ID: 1398518
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1398518.html