Research on the Collection, Analysis, and Visualization of Weibo Content
Published: 2018-07-22 10:35
[Abstract]: With the growth of social media such as Weibo and WeChat and the continual emergence of smart devices, these new technologies have changed how people live and have also produced information data that is massive in volume, multidimensional, and unstructured. Most researchers regard this data as a treasure bestowed by the era, and research in data science is attracting growing attention. This thesis describes research on Sina Weibo data in three parts: first, the collection of Weibo data; second, the discovery of new sentiment words from users' Weibo posts; and third, the visualization of propagation networks built from Weibo repost data.
(1) On collecting Sina Weibo data, the thesis first compares two different methods of simulating Sina Weibo login authentication and discusses the advantages and disadvantages of each. It then describes how, once authentication has been obtained, four kinds of Sina Weibo data are collected: user profile information, users' posts, users' following lists, and the reposts and comments on a single post. This provides the corpus for the subsequent research.
(2) Because users' Sina Weibo posts are colloquial and informal, they often contain many out-of-vocabulary new sentiment words. Based on users' Weibo data, the thesis studies word-level sentiment orientation judgment. It first uses a statistics-based method to identify new words in the Weibo corpus, then uses a neural network to train word vectors for the words in the corpus and capture the intrinsic relations between words, and finally proposes a word-vector-based method for discovering new sentiment words. The experimental results suggest that the method has some practical value.
(3) For Sina Weibo repost data, the thesis presents a web-based visual analysis of how a single post propagates. It first constructs the propagation network from the repost data. Then, drawing on the reposters' profile data, it describes the implementation of the visualization in three respects: node filtering, hierarchical information display, and the design of interactive functions. Visual analysis makes it simple and quick to find the critical nodes in a post's propagation and to judge the scope of the message's influence.
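The abstract does not state the repost data format or how critical nodes are ranked, so the following is only a minimal sketch of the idea in part (3) under stated assumptions: repost records are hypothetical `(reposter, source)` pairs, each user reposts at most once (so the network is a tree), and a node's importance is approximated by the number of reposts downstream of it. The function names are illustrative and not taken from the thesis.

```python
from collections import defaultdict, deque


def build_propagation_tree(reposts):
    """Map each user to the users who reposted directly from them.

    `reposts` is an iterable of (reposter, source) pairs; this record
    format is an assumption made for illustration.
    """
    children = defaultdict(list)
    for reposter, source in reposts:
        children[source].append(reposter)
    return children


def cascade_sizes(children, root):
    """Return {node: number of downstream reposts} for nodes reachable from root.

    Assumes the network is a tree (each user reposts at most once).
    """
    # Visit nodes breadth-first, then accumulate counts from the leaves upward.
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    sizes = {}
    for node in reversed(order):
        sizes[node] = sum(1 + sizes[child] for child in children.get(node, []))
    return sizes


def key_nodes(sizes, root, top_k=3):
    """Rank non-root nodes by downstream reposts (ties broken by name)."""
    ranked = sorted((n for n in sizes if n != root), key=lambda n: (-sizes[n], n))
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical cascade: "a" is the original poster.
    reposts = [("b", "a"), ("c", "a"), ("d", "b"), ("e", "b"), ("f", "d")]
    tree = build_propagation_tree(reposts)
    sizes = cascade_sizes(tree, "a")
    print(sizes["a"])                       # total reposts in the cascade: 5
    print(key_nodes(sizes, "a", top_k=2))   # ['b', 'd']
```

Here `sizes[root]` gives a rough measure of the message's overall reach, and the ranked list is one possible basis for the node filtering step. A real implementation might weight nodes by follower counts or other profile data instead.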
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.092; TP391.1
Article ID: 2137157
Article link: http://sikaile.net/guanlilunwen/ydhl/2137157.html