Design and Implementation of a Cloud-Computing-Based Log Mining System
Published: 2018-10-07 19:54
[Abstract]: As society's informatization accelerates, the volume of information is inevitably growing explosively. Cloud computing has become an important means of meeting the resulting challenges of massive data storage and computation. The cloud-computing-based log mining system uses cloud computing methods to analyze and mine the massive user logs of a search engine, applies complex multidimensional mapping and cross calculation to them, and converts them into per-dimension statistics in a data warehouse, thereby building a platform for data mining. The resulting thirteen specific traffic indicators for the search engine website provide, through changes in site traffic, a basis for analyzing website operations and support for product, business, and decision-making work.
Following software engineering methods, the thesis first analyzes the system's business and requirements and identifies four functional requirements for the log mining system. It then presents the overall design and the system's workflow framework, and proposes dividing the system into three modules for design and implementation: log preprocessing, log analysis and statistics jobs, and online analytical processing (OLAP). The design part describes in detail the data models, the XML configuration, the dimension and fact tables, and the dimension mapping and cross rules. The implementation part covers the log data loading process, the ETL process, the dimension parser and the individual indicator algorithms, and the data warehouse's solution for multidimensional cross analysis. In particular, it gives a detailed implementation workflow for the indicator algorithms based on Hadoop cloud computing.
By applying cloud computing technology, Hadoop's Map/Reduce programming framework, data mining, and data warehouse OLAP, the thesis presents a development example of a cloud-computing-based log mining system.
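The abstract does not reproduce the thesis's indicator algorithms, so the following is only a minimal in-memory sketch of the general idea: a map step that emits one key per dimension cross, and a reduce step that aggregates per key. The tab-separated log layout, the dimension names (`province`, `browser`), and the two indicators (page views `pv` and unique visitors `uv`) are all assumptions made for illustration, not details from the thesis. Plain Python stands in for Hadoop here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log layout (an assumption): user_id, province, browser, query.
SAMPLE_LOG = [
    "u1\tHubei\tChrome\thadoop",
    "u2\tHubei\tFirefox\tolap",
    "u1\tBeijing\tChrome\tetl",
    "u3\tHubei\tChrome\thadoop",
]

DIMENSIONS = ("province", "browser")  # assumed dimension names


def map_record(line):
    """Map step: emit one (key, user_id) pair per dimension cross."""
    user_id, province, browser, _query = line.split("\t")
    values = {"province": province, "browser": browser}
    # Every non-empty subset of dimensions is one "cross" to aggregate on.
    for size in range(1, len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            key = tuple((d, values[d]) for d in dims)
            yield key, user_id


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(user_ids):
    """Reduce step: pv counts records, uv counts distinct users."""
    return {"pv": len(user_ids), "uv": len(set(user_ids))}


def run_job(lines):
    """Run map, shuffle, and reduce over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_record(line))
    return {key: reduce_group(vals) for key, vals in shuffle(pairs).items()}


results = run_job(SAMPLE_LOG)
```

Each key in `results` is a tuple of `(dimension, value)` pairs, so a single-dimension total and a two-dimension cross share the same output structure and could be loaded into fact-table rows in the same way.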
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
Article ID: 2255474
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2255474.html