Extracting Education-Domain Weibo Accounts Based on Topic Models

Published: 2018-06-22 05:13

  Topics of this paper: information redundancy + object screening. Reference: master's thesis, Central China Normal University, 2017


[Abstract]: The rapid development and wide adoption of the Internet are profoundly shaping how society develops and how information spreads. More and more people are accustomed to sharing news, events, policies, and other information through Weibo, forums, online communities, and similar channels. Education is also evolving rapidly in this new era, and the growth of information platforms gives us a major shortcut to educational information. Yet abundant information brings information redundancy with it. In a fast-paced life, we therefore want to capture frontier content in education as quickly and comprehensively as possible. This thesis studies Weibo accounts that publish education-related content. It seeks a way to screen a small set of bloggers out of a large pool of candidates, so that by following the posts of these influential verified ("big V") accounts we can extract the latest and reasonably broad information about developments in education.

To address this problem, we first analyze existing research and methods and then focus on topic models, which are comparatively effective. Considering the characteristics of the education domain and of Weibo text, we set criteria for a preliminary selection of candidates and find suitable samples. We then collect their text data, convert and preprocess it with the word segmentation tool from the Chinese Academy of Sciences, and compile a vocabulary with corresponding word numbers. This yields triples of the form "blogger index-word number-word frequency", so that the data can be fed directly to the model.

To analyze and solve the problem, we run three progressive layers of experiments on the data. First, we draw a small sample and analyze it with both the AT model and multiple rounds of manual review, look for the screening method whose results agree most closely, and adopt it as the screening mechanism of this thesis: use the AT model to partition the topics, then summarize the topics from their keywords, rank them by their ratios, and give priority to the bloggers who appear most often at the same rank. Finally, we apply this procedure to the samples collected at two scales and find the optimal set of users to follow within the specified topics.

This research provides an example for handling similar problems. Some points remain to be explored further, such as updating the candidate blogger pool in real time to accommodate topic shifts, analyzing the correlation between bloggers, and considering bloggers' backgrounds. These require continued study, and the results of this work can lay a foundation for subsequent research.
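The "blogger index-word number-word frequency" triples described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual code: the function name `build_triples`, the 1-based numbering, and the toy English tokens are assumptions made here for illustration, and the input is assumed to be already segmented into words.

```python
from collections import Counter


def build_triples(tokens_by_blogger):
    """Turn per-blogger token lists into (blogger_index, word_id, frequency) triples.

    tokens_by_blogger: list of token lists, one list per blogger, assumed to be
    the output of a word segmenter. Indices and ids are 1-based (an assumption).
    """
    vocab = {}    # word -> word id, assigned in first-seen order
    triples = []  # (blogger index, word id, frequency of that word for that blogger)
    for blogger_idx, tokens in enumerate(tokens_by_blogger, start=1):
        counts = Counter(tokens)
        for word, freq in counts.items():
            # Give each unseen word the next free id; reuse the id otherwise.
            word_id = vocab.setdefault(word, len(vocab) + 1)
            triples.append((blogger_idx, word_id, freq))
    return vocab, triples


# Toy input: two bloggers, posts already segmented into words.
segmented = [
    ["education", "reform", "education"],
    ["exam", "education"],
]
vocab, triples = build_triples(segmented)
for row in triples:
    print(*row)
```

Each printed row is one triple. A sparse representation of this kind is a common input format for topic-model implementations, although the exact file layout the thesis used is not stated in the abstract.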
【Degree-granting institution】: Central China Normal University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP393.092



Link to this article: http://sikaile.net/guanlilunwen/ydhl/2051730.html

