Design and Implementation of a Hadoop-Based Mobile Internet Data Import System
Topic: mobile Internet. Focus: network data. Source: Beijing University of Posts and Telecommunications, 2014 master's thesis. Author: Ke Zhengxiang (柯正祥). Document type: degree thesis.
[Abstract]: As mobile Internet infrastructure is built out at an accelerating pace, smartphones develop rapidly, and network applications become ever more widespread, the number of mobile Internet users is growing quickly. The mobile Internet is becoming the main channel for obtaining information, and mobile Internet data traffic is growing explosively as a result. This places new demands on mobile network operators' ability to plan and manage their network platforms. Mobile Internet user behavior is also showing new patterns. It has therefore become highly necessary to understand how mobile Internet network resources are used, what services make up the traffic, and what characterizes user behavior. In recent years the demand for massive data processing has grown steadily, and distributed computing has developed alongside it as an effective way to pool the storage and computing capacity of many machines. Hadoop, an open-source and effective distributed programming framework, is becoming increasingly popular in research and in projects.
Taking into account both the characteristics of mobile Internet data and the monitoring requirements, this thesis proposes two Hadoop-based ways of managing mobile Internet data: an offline data import system (DataLoader) and a real-time data import system (LogUploader). The first addresses the cleaning of large data sets during upload to the laboratory cluster; the second addresses the import of massive volumes of call detail records (CDRs) into an operator-facing CDR query system. Both systems connect raw data to Hadoop. The offline system cleans and otherwise processes data that already exists as files and uploads it to HDFS, preparing it for later analysis. It achieves fast upload and processing of traditional CDR data into HDFS, and it gives the laboratory cluster a guiding programming framework that makes future data import requirements quick to implement.
The real-time system is deployed mainly on network monitoring devices. It processes the raw data generated by the network in real time and uploads it to HDFS, producing file splits and a BloomFilter index structure that support later analysis and queries. It relies on the stability of Hadoop and on certain control mechanisms to ensure data integrity from data collection through data upload. Finally, the thesis describes the testing of both systems in detail.
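The abstract does not describe how the BloomFilter index over file splits is laid out, so the following is only a minimal sketch of the general idea under stated assumptions: one Bloom filter is kept per split, records are keyed by a subscriber number, and a query checks the filters to decide which splits are worth scanning. The names `BloomFilter`, `build_split_index`, and `candidate_splits`, the sample split names, and the in-memory dictionaries standing in for HDFS files are all hypothetical and are not taken from the thesis.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_split_index(splits):
    """Build one Bloom filter per split from its (key, payload) records."""
    index = {}
    for name, records in splits.items():
        bloom = BloomFilter(expected_items=len(records))
        for key, _payload in records:
            bloom.add(key)
        index[name] = bloom
    return index


def candidate_splits(index, key):
    """Return the splits that may hold the key; all others can be skipped."""
    return [name for name, bloom in index.items() if bloom.might_contain(key)]


# Hypothetical sample data: split name -> list of (subscriber number, record).
sample_splits = {
    "part-00000": [("13800000001", "record-a"), ("13800000002", "record-b")],
    "part-00001": [("13900000001", "record-c")],
}
sample_index = build_split_index(sample_splits)
```

A query for a subscriber number would then scan only the splits returned by `candidate_splits(sample_index, "13900000001")`. A Bloom filter never reports a stored key as absent, so the split that really holds the key is always among the candidates, while an occasional false positive only costs an unnecessary scan.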
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.01; TN929.5
Article ID: 1589908
Link to this article: http://sikaile.net/guanlilunwen/ydhl/1589908.html