A Study of Diabetes Tweets Based on Text Mining Techniques
Published: 2022-05-08 20:44
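The number of people with diabetes worldwide is rising steadily, and treating the disease quickly and successfully is a major health challenge. With the rapid development of information technology, researchers have put more effort into the functionality and security of computer systems, aiming to build safer and more convenient care programs for diabetes patients. Most earlier studies relied on patient data stored in electronic medical devices or systems. Recent work, however, has found that diabetes-related text on social media also has significant application value. How to use this large volume of unstructured data effectively to design and develop support systems for diabetes patients is currently both a focus and a difficulty of research. This thesis studies diabetes discussions on Twitter, Google and Baidu. It applies several text mining techniques, LDA topic modeling and the SVM algorithm to mine diabetes-related text, and it also offers an effective means of diabetes prediction. The main research contents are as follows:
1. Downloading and quantifying tweet text, and selecting features and labels. Tweets are downloaded from Twitter with Python Twitter API functions and stored locally in CSV format. The spaCy library tokenizes the text, and the TF-IDF algorithm computes the weights of the feature words. Principal component analysis (PCA) then reduces the dimensionality of the data matrix, which lowers the complexity of the dataset. To ensure that only diabetes-related tweets are analyzed, …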
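The quantification step described above goes from CSV-stored tweets to tokens, then to a TF-IDF matrix, and finally to a PCA projection. The following is a minimal, dependency-free sketch of that pipeline under stated assumptions, not the thesis's implementation.

The assumptions are as follows. A simple regex tokenizer with a tiny illustrative stop-word list stands in for spaCy. The CSV column name `text` is hypothetical. TF is raw count divided by tweet length, and IDF uses the plain `ln(N / df)` variant. PCA is computed by power iteration with deflation rather than by a library such as scikit-learn. The filtering of non-diabetes tweets is not reproduced here, because the source text is truncated at that point.

```python
import csv
import io
import math
import random
import re
from collections import Counter

# Illustrative stop-word list (assumption); the thesis's actual spaCy settings are not given.
STOP_WORDS = {"a", "an", "and", "are", "for", "i", "in", "is", "my", "of", "the", "to", "with"}


def load_tweets_csv(file_obj, column="text"):
    """Read tweet texts from a CSV file object; the column name "text" is hypothetical."""
    return [row[column] for row in csv.DictReader(file_obj)]


def tokenize(text):
    """Regex stand-in for spaCy tokenization: lowercase alphanumeric runs, stop words removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (vocab, rows) with rows[d][j] = tf(vocab[j], d) * ln(N / df(vocab[j]))."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency per term
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid dividing by zero for empty tweets
        rows.append([counts[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def pca(rows, k, iters=500, seed=0):
    """Project rows onto their top-k principal components via power iteration with deflation."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix C = X^T X / (n - 1) of the centered data.
    cov = [[sum(x[a] * x[b] for x in centered) / max(n - 1, 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)  # deterministic, strictly positive starting vector
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            v = [x / norm for x in w]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m)) for a in range(m))
        components.append(v)
        # Deflate: subtract the found component so the next pass finds the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(m)] for a in range(m)]
    # Coordinates of each centered row along each principal component.
    return [[sum(x[j] * c[j] for j in range(m)) for c in components] for x in centered]


if __name__ == "__main__":
    sample = io.StringIO(
        "id,text\n"
        "1,My blood sugar is high today #diabetes\n"
        "2,Insulin prices are too high\n"
        "3,Salad for lunch to keep my sugar steady\n"
    )
    vocab, X = tfidf_matrix(load_tweets_csv(sample))
    print(len(vocab), "terms ->", pca(X, 2))
```

A term that occurs in every tweet gets an IDF of zero under this variant, so it carries no weight. Libraries often use a smoothed IDF instead. On real data, spaCy's tokenizer and scikit-learn's `TfidfVectorizer` and `PCA` would be the more practical choice. The hand-written version here only makes each step visible.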
【Pages】: 75
【Degree】: Master's
【Table of Contents】:
Abstract
Chinese Abstract
Chapter 1 Introduction
1.1 Background
1.2 Research Significance
1.3 Literature review
1.3.1 The Diabetes Challenge
1.3.2 The Internet as a Source of Health Information
1.4 Motivation and contributions
1.4.1 Motivation
1.4.2 Contributions
1.5 Research contents
1.6 Thesis Structure
Chapter 2 Overview of Related Technology
2.1 Natural Language Processing
2.1.1 Application and Challenges of NLP
2.2 Text Mining Techniques
2.3 Topic Modeling Techniques
2.3.1 Topic Modeling Algorithms
2.4 Twitter as a Data Source
2.5 The Datasets
2.5.1 Twitter Data Set
2.5.2 Google Search Data
2.5.3 Baidu Search Data
2.6 Hashtag Selection
2.7 Text to Vector Transformation
2.8 Data Pre-processing Technique
2.9 Support Vector Machines (SVM)
2.10 Python Programming Language
2.11 Coherence Measures
2.12 Summary
Chapter 3 Tweet Analysis and Identification of Insights
3.1 Analyzing Diabetes Discussion for Depression Related Insights
3.1.1 Data Set
3.1.2 Feature Engineering
3.1.3 Annotation
3.1.4 Experimental Analysis
3.2 Topic Analysis of Food Mentions in Tweets
3.2.1 System design
3.2.2 Experiment setup
3.2.3 Experimental Analysis
3.3 Summary
Chapter 4 Association of Topics of Discussion Topics
4.1 Topic Association between Twitter Communication, Google and Baidu Web Searches
4.1.1 The LDA Algorithm
4.1.2 Word Relevance ratio/Similarity task
4.1.3 Topic Labeling
4.2 Association of Topics with Google and Baidu Web Search Results
4.2.1 Association of Twitter and Google data
4.2.2 Association of Twitter and Baidu data
4.3 Determining significance
4.4 Summary
Conclusion and Future Work
References
Acknowledgement
Publication and Awards
Document ID: 3652344
Article link: http://sikaile.net/kejilunwen/shengwushengchang/3652344.html