Extracting Education-Domain Weibo Accounts Based on a Topic Model

Published: 2018-06-22 05:13

  Thesis topics: information redundancy + object screening. Source: Central China Normal University, master's thesis, 2017


[Abstract]: The rapid growth and wide adoption of the Internet are profoundly shaping both society and the way information spreads. More and more people share news, events, and policy information through Weibo, forums, online communities, and similar channels. Education is also changing quickly, and these platforms offer a shortcut to education-related information. Abundant information brings information redundancy with it, however, so in a fast-paced life we want to capture the latest developments in education as quickly and as comprehensively as possible.

This thesis studies Weibo accounts that publish education-related content. The goal is to find a way to select, from a large pool of candidate accounts, a small set of bloggers such that following the posts of these influential verified ("big V") accounts yields education information that is current and broad in coverage. We first review existing research and methods, then focus on topic models, which are comparatively effective. Taking into account the characteristics of the education domain and of Weibo text, we define criteria for an initial selection of candidates and identify suitable samples. We then collect their text data, convert and preprocess it with the word segmentation tool from the Chinese Academy of Sciences, and build a vocabulary with corresponding word IDs, producing triples of the form "blogger index - word ID - word frequency" that can be fed directly into the model.

The experiments proceed in three progressive stages. First, a small sample is analyzed both with the AT model and by multiple rounds of manual review, and the screening procedure whose results agree most closely between the two is adopted as the screening mechanism of this thesis: the AT model first partitions the topics; the topics are then summarized from their keywords and sorted by their proportions; and priority goes to the bloggers who appear most often at the same rank. Finally, this mechanism is applied to the samples collected at two scales to find the best set of accounts to follow within the selected topics. This work provides a worked example for similar problems. Several directions remain open, such as updating the candidate blogger pool in real time to accommodate topic drift, analyzing the correlation between bloggers, and taking bloggers' backgrounds into account. These call for continued study, and the results here can serve as a foundation for it.
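The triple format mentioned in the abstract can be illustrated with a short Python sketch. It is not the thesis's code: the function name `build_triples`, the input layout (one list of already-segmented tokens per blogger), and the rule of assigning word IDs in order of first appearance are all assumptions made for illustration.

```python
from collections import Counter


def build_triples(tokens_by_blogger):
    """Turn segmented text into (blogger index, word ID, word frequency) triples.

    tokens_by_blogger: one list of tokens per blogger, assumed to be
    already segmented (the segmentation step itself is not shown here).
    """
    vocab = {}    # word -> word ID, assigned in order of first appearance (assumption)
    triples = []  # (blogger index, word ID, word frequency)
    for blogger_idx, tokens in enumerate(tokens_by_blogger):
        # Count how often each word occurs in this blogger's text.
        for word, freq in Counter(tokens).items():
            # Reuse the existing ID, or assign the next free one to a new word.
            word_id = vocab.setdefault(word, len(vocab))
            triples.append((blogger_idx, word_id, freq))
    return vocab, triples


# Hypothetical toy input: two bloggers with a handful of tokens each.
vocab, triples = build_triples([
    ["education", "reform", "education"],
    ["exam", "education"],
])
print(vocab)    # {'education': 0, 'reform': 1, 'exam': 2}
print(triples)  # [(0, 0, 2), (0, 1, 1), (1, 2, 1), (1, 0, 1)]
```

A sparse triple list like this keeps only the words each blogger actually used, which is why it is a common input layout for topic-model tooling; whether the thesis stored IDs zero-based or one-based is not stated.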
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.092


Article ID: 2051730


Article link: http://sikaile.net/guanlilunwen/ydhl/2051730.html

