Research on Big Data Text Analysis Based on Hadoop Architecture

Published: 2021-02-15 00:04
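  We are in the era of "big data," and the emergence of big data has brought new opportunities and challenges for processing massive data sets. Big data plays an important role in modern society, and finding useful information in large volumes of data requires analysis. Data analysis must extract information from unstructured data that appears on the web, such as text, images, video, or social media posts. This thesis outlines the advantages and research scope of big data, introduces big data text analysis in the Hadoop architecture and its components, and focuses on the application of big data in data mining. Text analysis is one of the most complex kinds of data analysis in industrial analytics. The reason is that text mining must handle unstructured data (email, Facebook, Twitter, and LinkedIn feeds) in which observations and variables (rows and columns) are not clearly defined. Any kind of data analysis therefore first requires converting this unstructured data into a structured data set, after which an ordinary modeling framework can be applied. The additional step of converting unstructured data into a structured format is facilitated by a word dictionary. A dictionary is needed for any kind of information extraction, and sentiment analysis dictionaries can be found online. For some specific analyses, however, users need to create their own dictionaries. This thesis describes two conceptual parts of text analysis using the Hadoop ecosystem, along with MapReduce specifically. The first approach collects a large text file (a CSV file) from tweets posted in 2013...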
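
The dictionary-based conversion described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code, which is not reproduced on this page. It uses plain Python to imitate a map step and a reduce step rather than the Hadoop API, and the names `SENTIMENT`, `map_tweet`, `reduce_counts`, and `analyze`, as well as the sample tweets, are hypothetical.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical toy sentiment dictionary (an assumption for illustration);
# a real analysis would load a published lexicon or a user-built one.
SENTIMENT = {"good": 1, "great": 1, "love": 1, "bad": -1, "slow": -1, "hate": -1}


def map_tweet(text):
    """Map step: emit a (label, 1) pair for each dictionary word in one tweet."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = []
    for token in tokens:
        score = SENTIMENT.get(token)
        if score is not None:
            pairs.append(("positive" if score > 0 else "negative", 1))
    return pairs


def reduce_counts(acc, pair):
    """Reduce step: sum the emitted counts per label."""
    label, n = pair
    acc[label] += n
    return acc


def analyze(tweets):
    """Run the map step over every tweet, then fold the pairs into totals."""
    pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return reduce(reduce_counts, pairs, Counter())


# Made-up sample input, standing in for rows of a tweet CSV file.
tweets = [
    "I love this phone, great battery",
    "Bad signal and slow updates",
    "Nothing to report",
]
counts = analyze(tweets)
print(dict(counts))
```

The unstructured tweets become a small structured result (one count per sentiment label), which is the kind of table an ordinary modeling step could then consume. On a Hadoop cluster the same two functions would correspond to a mapper and a reducer running over file splits in HDFS.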

[Source]: Lanzhou University of Technology, Gansu Province

[Pages]: 76

[Degree level]: Master's

[Table of contents]:
Chinese Abstract
Abstract
Chapter 1 Introduction
    1.1 Background and Motivation
        1.1.1 Background
        1.1.2 Motivation
    1.2 Research Status at Home and Abroad
    1.3 Objective of Research Work
    1.4 Methodology
    1.5 Required Resources
    1.6 Structure of Thesis
Chapter 2 Overview of Big Data Text Analysis Based on Hadoop Architecture
    2.1 Big Data Overview
    2.2 Big Data Characteristics
        2.2.1 Volume
        2.2.2 Velocity
        2.2.3 Variety
        2.2.4 Veracity
        2.2.5 Value
    2.3 Different Types of Data
        2.3.1 Structured Data
        2.3.2 Unstructured Data
        2.3.3 Semi-Structured Data
        2.3.4 Metadata
    2.4 Data Analysis
    2.5 Big Data Adoption and Planning Considerations
        2.5.1 Data Procurement
        2.5.2 Privacy
        2.5.3 Security
        2.5.4 Provenance
        2.5.5 Limited Realtime Support
        2.5.6 Distinct Performance Challenges
    2.6 Hadoop Overview
        2.6.1 Hadoop Architecture
        2.6.2 MapReduce
        2.6.3 Hadoop Distributed File System
Chapter 3 Implementation plan
    3.1 Big Data Analysis Techniques
    3.2 Quantitative Analysis
    3.3 Qualitative Analysis
    3.4 Data Mining
    3.5 Statistical Analysis
        3.5.1 A/B Testing
        3.5.2 Correlation
        3.5.3 Regression
        3.5.4 Conclusions
Chapter 4 Experimental result
    4.1 Text Analysis Within Hadoop
    4.2 Test Cases
    4.3 Dataservices vs Hadoop: Comparing the Results
    4.4 Transferring Text Data Processing Libraries To The Hadoop Cluster
    4.5 Optimizing Text Data Processing For Use In The Hadoop Framework
    4.6 HDFS Source File Formats
    4.7 Text Data Processing Pushed Down to Hadoop
    4.8 Problem Tracking
    4.9 Other Type of Errors to Watch Out
    4.10 University Mobile App for Collecting Big Data
    4.11 Using GPS Location Coordinate for Text Analysis
    4.12 Results Comparison
Chapter 5 Summary and future work
    5.1 Summary
    5.2 Future Work
References
Acknowledgements
Appendix A. Mobile Apps Developed during the master's degree program
Appendix B. Key Codes used in this thesis



Article ID: 3034074


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/3034074.html

