Research on Distributed Storage of Massive Astronomical Images Based on NoSQL Databases
Published: 2019-04-23 14:42
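【Abstract】: With the rapid development of computer and network technology and the continual replacement of hardware and software, data volumes are now growing exponentially. Data on this scale is called massive data, or even big data, and its arrival marks the beginning of the big data era. Unlike earlier data, a growing share of it is unstructured, such as audio, image, and video files. In astronomy, the design and manufacture of observing instruments and terminal equipment keep improving. Observatories are being built in more places and existing ones are expanding. Together these changes have multiplied observing capacity, and the field has moved from traditional optical observation to research across all wavelength bands. Astronomical data grow at a remarkable rate every hour, even every second, so the field faces the challenge of storing massive data.
For massive astronomical data, traditional relational databases are no longer an ideal solution, and their inherent characteristics can even become a limitation. Cloud computing and cloud storage, by contrast, bring new ideas about storage and computation that are transforming the IT field. Against this background, this thesis studies the prospects of NoSQL databases on cloud storage platforms for storing massive collections of astronomical images.
Using MongoDB, the thesis studies cloud storage technology and the application of NoSQL databases in astronomy in depth. First, it surveys the underlying theory. Second, it studies how to build a MongoDB-based massive data storage system and implements its key techniques. Third, it carries out experimental analysis on this astronomical data storage system. Four groups of experiments are run: experimental data are collected by storing a large number of astronomical FITS files and then compared.

The experiments lead to the following conclusions:

- First, in distributed storage such as a NoSQL database, splitting data into chunks greatly improves storage and retrieval performance.
- Second, chunk size also affects storage and retrieval performance, so finding the optimal chunk size is critical for distributed storage. For 4 MB FITS files, storage efficiency was highest at a chunk size of 512 KB among the sizes tested.
- Third, memory-mapped databases such as MongoDB show some blocking during both storage and retrieval. The experiments show that this blocking has no clear relationship to chunking.
- Fourth, the optimal chunk size depends on file size. Seven chunk sizes were tested. For FITS files smaller than 16 MB, the optimal chunk size was 1/8 of the file size (a 1:8 ratio). For files of 16 MB or larger, the optimal chunk size did not grow with the file, and storage efficiency was generally highest at 1 MB.

In addition, using only two ordinary servers, the experimental data show access speeds of up to 80 MB/s. With better cluster conditions (more memory, higher bandwidth, multiple network cards, more data nodes, and so on), both storage capacity and speed would improve substantially, enabling efficient storage of massive astronomical data. Cloud storage is precisely a platform that integrates networked storage resources across many nodes. The thesis therefore infers that cloud storage is the inevitable direction for massive astronomical data storage.
Finally, it summarizes the research, draws conclusions, and outlines directions for future work.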
[Abstract]: With the rapid development of computer and network technology and the continual upgrading of software and hardware, data is increasing exponentially. Data at this scale is called massive data, or even big data, and it marks the arrival of the big data era. Unlike earlier data, more and more data is unstructured, such as audio, image, and video files. In astronomy, observing instruments and terminal equipment are steadily becoming better designed and manufactured. Observatories keep being established in more places and growing in scale. As a result, astronomical observing capability has multiplied, and the field has changed from traditional optical observation to full-band research. Astronomical data are increasing at an alarming rate, every hour and even every second, and the field faces the challenge of massive data storage. For these storage requirements, the traditional relational database is no longer an ideal solution, and its inherent characteristics can even become a limitation. Cloud computing and cloud storage bring new ideas about storage and computation that are changing the IT field. Against this background, this thesis studies the application prospects of NoSQL databases on cloud storage platforms for storing massive numbers of astronomical images. Using MongoDB, it studies cloud storage technology and the application of NoSQL databases in astronomy in depth. First, it surveys the basic theory. Second, it studies the construction of a MongoDB-based massive data storage system and the implementation of its key techniques. Third, it performs experimental analysis on this astronomical data storage system. Four groups of experiments store a large number of astronomical FITS files, and the resulting measurements are compared and analyzed.

These experiments lead to the following conclusions:

- Firstly, in distributed storage such as a NoSQL database, chunking greatly improves the performance of data storage and retrieval.
- Secondly, chunk size also affects storage and retrieval performance, and finding the optimal chunk size is very important for distributed storage. For 4 MB FITS files, storage efficiency is highest when the chunk size is 512 KB.
- Thirdly, memory-mapped databases such as MongoDB show some blocking when storing and retrieving data. The experiments show that this blocking has no obvious relation to chunking.
- Fourthly, the optimal chunk size differs with file size. Among the seven chunk sizes tested, for FITS files smaller than 16 MB the ratio of optimal chunk size to file size is 1:8. For FITS files of 16 MB or larger, the optimal chunk size does not increase with file size, and storage efficiency is highest at about 1 MB.

In addition, with only two ordinary servers, the experimental data show an access speed of up to 80 MB/s. If cluster conditions are improved (more memory, higher bandwidth, multiple network cards, more data nodes, etc.), storage capacity and speed will be greatly improved, so that massive astronomical data can be stored efficiently. Cloud storage is exactly such a platform: it integrates network storage resources across many nodes. It can therefore be inferred that cloud storage is the inevitable trend for massive astronomical data storage. Finally, the thesis summarizes the research work, draws its conclusions, and outlines future work.
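As a rough illustration of the fourth conclusion, the sketch below turns the reported rule of thumb into a function. It also shows how a file is split into fixed-size chunks in the style of MongoDB GridFS, where chunk n holds a contiguous slice of the file and only the last chunk may be shorter. Some parts are assumptions:

- The helper names `suggested_chunk_size` and `split_into_chunks` are hypothetical.
- Reading "M" as a binary megabyte (1024 × 1024 bytes) is an assumption.
- The thresholds only restate what the abstract reports for FITS files on a two-server test cluster. They are not general MongoDB guidance.

```python
import math
from typing import List, Tuple

KIB = 1024
MIB = 1024 * KIB  # assumption: the thesis's "M" is read as a binary megabyte


def suggested_chunk_size(file_size: int) -> int:
    """Hypothetical helper restating the abstract's empirical finding for FITS
    files on a two-server cluster: below 16 MB the best chunk size was about
    1/8 of the file size; from 16 MB upward it stayed at roughly 1 MB."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if file_size < 16 * MIB:
        return max(1, file_size // 8)  # 1:8 ratio for smaller files
    return MIB  # plateau observed for files of 16 MB or more


def split_into_chunks(data: bytes, chunk_size: int) -> List[Tuple[int, bytes]]:
    """Hypothetical helper: split data GridFS-style into numbered pieces.
    Piece n covers bytes [n * chunk_size, (n + 1) * chunk_size); only the
    last piece may be shorter than chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    count = math.ceil(len(data) / chunk_size)
    return [(n, data[n * chunk_size:(n + 1) * chunk_size]) for n in range(count)]


if __name__ == "__main__":
    size = 4 * MIB
    chunk = suggested_chunk_size(size)
    pieces = split_into_chunks(bytes(size), chunk)
    print(chunk, len(pieces))  # prints: 524288 8
```

For a 4 MB file, the rule gives a 512 KB chunk size and eight chunks, which matches the 512 KB result reported for 4 MB files. In PyMongo, a chosen value could be passed as `chunk_size_bytes` to `gridfs.GridFSBucket` or to its `upload_from_stream` method. The abstract does not list the seven chunk sizes that were tested, so a real deployment would still need its own measurements.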
【Degree-granting institution】: Kunming University of Science and Technology
【Degree level】: Master's
【Year degree awarded】: 2012
【Classification number】: TP391.41; TP333
Document number: 2463561
【Citing literature】
Related master's theses (1 entry)
1 Ma Wenjie; Research and Application of Massive Data Storage Based on CAP Theory [D]; Soochow University; 2013.
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2463561.html