Research and Application of Web Visualization Technology in Data Mining
Published: 2018-08-03 08:09
【Abstract】: In recent years, with the rapid development of computer hardware and software and the spread of the Internet, the volume of data that people generate through networks and mobile devices has grown explosively; we can fairly be said to live in a boundless ocean of data. Quickly analyzing massive data and extracting useful information features has therefore become especially important, and the vigorous development of Internet technology offers an effective route to solving this problem. Data mining arose in response: it extracts latent, valuable information features from large data sets. Visualization is the process of converting the resulting information features into a visual form of expression. Combining data mining theory and applications with visualization technology has produced another important research direction, visual data mining. By using visualization technology together with the characteristics of human vision, the information produced by data mining is presented to users in an intuitive form, which makes mining results more valuable and easier to understand. Most data produced in the networked information age is generated by network access and by the behavior of network users, and most of it is stored in the large resource databases of network information platforms. Such data often has variable fields and diverse formats, including text, images, audio, and video, and analyzing and applying these database log files and data files is the key to converting e-commerce data flows into information flows. The thesis first introduces the background of big data, on that basis presents Web visualization and data mining technology, and reviews the research status and significance of visualization technology in data mining, laying the foundation for studying its application in data mining. Second, it examines the technologies related to Web visualization and data mining in depth, covering the basic visualization workflow, commonly used Web front-end visualization libraries, the multidimensional scaling algorithm, and the Hadoop distributed processing system, which provide technical support for the overall design of the thesis. Finally, it analyzes the application of visualization technology in data mining through concrete examples. For traditional data with multidimensional attribute variables, the multidimensional scaling (MDS) algorithm is used to reduce the attribute variables to a low-dimensional space for positioning and analysis, uncovering the information features of the data and presenting the results visually. For the log data of a commercial forum, the Hadoop massive data processing system is used together with the Hadoop Distributed File System (HDFS) and the Map/Reduce distributed computing model; starting from analysis perspectives such as the forum's page views, number of registered users, number of unique IPs, and bounce count, a visual data model is built, and an overall technical scheme is designed that runs from back-end statistical mining of the data to front-end visual display.
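The abstract does not say which MDS variant the thesis uses. As an illustration only, the following is a minimal sketch of classical (Torgerson) MDS, assuming Euclidean distances and a symmetric distance matrix. The function names are hypothetical. To stay dependency-free, it finds the leading eigenvectors by power iteration with deflation; a practical implementation would more likely call a linear algebra library.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # D is assumed symmetric, so column means equal row means.
    return [
        [-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]


def multiply(matrix, vec):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def top_eigenpair(matrix, iterations=500):
    """Approximate the dominant eigenpair of a symmetric PSD matrix."""
    n = len(matrix)
    # Deterministic start vector; assumed not orthogonal to the top eigenvector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        product = multiply(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in product]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(v * x for v, x in zip(vec, multiply(matrix, vec)))
    return eigenvalue, vec


def classical_mds(dist, dims=2):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    b = double_center(dist)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        eigenvalue, vec = top_eigenpair(b)
        scale = math.sqrt(max(eigenvalue, 0.0))  # clamp tiny negative values
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate: remove the component just extracted.
        b = [[b[i][j] - eigenvalue * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


if __name__ == "__main__":
    # Four 3-D points that lie in a plane, so a 2-D embedding loses nothing.
    sample = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (3.0, 4.0, 0.0)]
    for row in classical_mds(pairwise_distances(sample), dims=2):
        print([round(x, 3) for x in row])
```

The resulting two-column coordinates can be handed to any Web front-end charting library as a scatter plot. The embedding is unique only up to rotation and reflection, so signs and axis order may differ between implementations.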
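The forum log statistics can be illustrated with a small in-memory imitation of the Map/Reduce model. This is a sketch under stated assumptions, not the thesis's Hadoop job: the log format (`<ip> <timestamp> <url>`), the sample records, and all function names are hypothetical, and a "bounce" is simplified to an IP that requested exactly one page. Registered-user counts are omitted because the abstract does not say how registrations appear in the logs.

```python
from collections import defaultdict

# Hypothetical sample records in an assumed '<ip> <timestamp> <url>' format.
SAMPLE_LOG = [
    "192.0.2.1 2017-01-01T08:00:00 /forum.php",
    "192.0.2.1 2017-01-01T08:01:10 /thread-1.html",
    "192.0.2.2 2017-01-01T08:02:00 /forum.php",
    "192.0.2.3 2017-01-01T08:03:30 /thread-2.html",
    "192.0.2.3 2017-01-01T08:04:00 /thread-3.html",
    "malformed",
]


def parse_line(line):
    """Split one log line into (ip, timestamp, url); return None if malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    return tuple(parts)


def map_phase(lines):
    """Map: emit an (ip, 1) pair for every well-formed request."""
    for line in lines:
        record = parse_line(line)
        if record is not None:
            yield record[0], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the hit counts for each IP."""
    return {ip: sum(values) for ip, values in grouped.items()}


def summarize(hits_per_ip):
    """Derive site-level metrics from the per-IP totals."""
    return {
        "page_views": sum(hits_per_ip.values()),
        "unique_ips": len(hits_per_ip),
        # Simplification: an IP with a single request counts as one bounce.
        "bounces": sum(1 for hits in hits_per_ip.values() if hits == 1),
    }


if __name__ == "__main__":
    totals = reduce_phase(shuffle(map_phase(SAMPLE_LOG)))
    print(summarize(totals))
```

In a real Hadoop deployment the map and reduce functions would run as distributed tasks over files stored in HDFS, and the summary would be exported (for example as JSON) for the front-end charts to consume.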
【Degree-granting institution】: Xiangtan University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP311.13
Article ID: 2161134
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2161134.html