Research and Application of Massive Data Retrieval Technology for Cloud Computing
Source: University of Electronic Science and Technology of China, 2013 master's thesis. Document type: degree thesis
Keywords: cloud computing, distributed storage, Hadoop, HBase
[Abstract]: Over the past decade or so, the Internet industry has grown rapidly. It has rewarded the venture capitalists who invested in it and produced a number of large Internet companies with market values in the tens of billions of US dollars. The industry has also given emerging economic entities strong momentum. Hundreds of millions of Internet users, enterprises, public institutions, and government departments worldwide now use the Internet to obtain information, exchange data, make purchases, and promote and deliver their services. This is a large and stratified user base.
Cloud computing is a new banner of the Internet industry. It builds on the strengths of the Internet to deliver a large number of enterprise applications and personal services to users as Internet-based services, for example desktop cloud and cloud storage. Applications and services built for cloud computing will be the most valuable and most promising technology of the twenty-first century.
This thesis takes cloud computing as its foundation and focuses on retrieval of unstructured data held in distributed data storage. It then applies that technology to design a SaaS service: a news-lead aggregation platform for the broadcasting industry. The cloud platform uses Hadoop as its distributed data storage platform, on which the basic cluster environment is built, and uses HBase, a distributed column-oriented database for unstructured data, as the retrieval engine. The research gives equal weight to practicality and scientific rigor. The work has three parts:
1. Analyze the distributed data storage architecture of Hadoop and use it to build the basic distributed technology platform for retrieval.
2. Study the MapReduce job mechanism and use it, together with the column-oriented distributed database HBase, to design and build a retrieval engine for unstructured data.
3. Design and build a news-lead aggregation business platform on top of the technical platform above.
The system follows a cloud-service architecture, supports a massive volume of news leads, and provides fast retrieval as well as classification and clustering functions.
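The abstract does not describe the internals of the retrieval engine, so the following is only a minimal sketch of the general MapReduce pattern that such an engine could rely on: building an inverted index from documents. It runs in plain Python without Hadoop or HBase. All names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`) and the sample news texts are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct lower-cased token."""
    for term in sorted(set(text.lower().split())):
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse the document ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over all documents and return term -> posting list."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Made-up sample documents, for illustration only.
docs = {
    "n1": "flood warning issued for river district",
    "n2": "river festival opens downtown",
    "n3": "flood relief funds approved",
}
index = build_index(docs)
```

With this index, `search(index, "river flood")` intersects the posting lists of both terms. In a column-oriented store such as HBase, one possible layout (an assumption, not the thesis's design) would use each term as a row key and store the matching document ids as columns of that row.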
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP333
Article ID: 1430584
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1430584.html