Weibo Feature Discovery and Topic Keyword Extraction Based on Language Networks
Topics: Weibo + complex networks. Source: master's thesis, Hangzhou Dianzi University, 2014
[Abstract]: Weibo (microblogging) is a form of online new media that has emerged in recent years, valued for its rapid dissemination and ease of use. With the rapid development of Internet technology, and especially the fast growth in mobile Internet users, the volume of Weibo content generated each day keeps increasing, and research on that content is becoming increasingly important. This thesis first builds a word co-occurrence network from a massive corpus of Weibo content to discover the stylistic features of Weibo language, and then builds a topic keyword extraction network from topic-related Weibo corpora. By analyzing these language networks, it proposes new methods for studying Weibo content and for extracting topic keywords, and obtains satisfactory experimental results.

First, the thesis briefly reviews the state and development of research on language networks and on Weibo content. It analyzes the background and related techniques of language network research, then summarizes the methods used to study Weibo content, which follow two main directions: analyzing the stylistic characteristics of Weibo from a linguistic perspective, and obtaining information from Weibo through text mining.

Second, the thesis proposes a Weibo feature discovery method based on language networks. Language network analysis generally uses the quantitative study of language forms to understand the topological structure that language networks share and the general laws of their evolution. The thesis applies this analysis to Weibo, a form of Internet language: by analyzing the complex-network properties of the language network built from Weibo content, it identifies the linguistic features of Weibo content as a whole.

Third, after summarizing the strengths and weaknesses of existing Weibo keyword extraction methods, the thesis proposes a keyword extraction method based on a topic language network. The method first builds a language network from the Weibo content related to a topic. It then combines two centrality measures associated with the small-world property of complex networks, betweenness centrality and closeness centrality, with degree centrality to form the feature weight of each word. Next, it computes this feature weight for every word node, and finally it selects the topic keywords according to the magnitude of the node values.

Finally, the proposed feature discovery and topic keyword extraction algorithms are tested on a large-scale Weibo corpus and on topic-related corpora, and the test results are analyzed. The experimental results show that the algorithms are of some practical use for studying Weibo content and extracting Weibo topic keywords. The thesis closes by summarizing and commenting on the work done, identifying several open problems in Weibo language networks and topic keyword extraction that merit further study, and pointing out directions for future research.
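The keyword extraction steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how the co-occurrence window is defined or how the three centrality measures are combined, so the sketch assumes that two words co-occur when they appear in the same post and that the measures are averaged with equal weights. All function names, the `weights` parameter, and the toy posts are hypothetical, and the posts are assumed to be already segmented into words.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence_graph(posts):
    """Link every pair of distinct words that appear in the same post (assumed window)."""
    graph = defaultdict(set)
    for words in posts:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Inverse average BFS distance, scaled by the share of reachable nodes."""
    n = len(graph)
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total, reachable = sum(dist.values()), len(dist) - 1
        result[source] = (reachable / total) * (reachable / (n - 1)) if total else 0.0
    return result


def betweenness_centrality(graph):
    """Normalized betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(graph)
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


def rank_keywords(posts, top_k=3, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank words by a weighted sum of the three centralities (equal weights assumed)."""
    graph = build_cooccurrence_graph(posts)
    measures = (
        betweenness_centrality(graph),
        closeness_centrality(graph),
        degree_centrality(graph),
    )
    combined = {v: sum(w * m[v] for w, m in zip(weights, measures)) for v in graph}
    return sorted(combined, key=lambda v: (-combined[v], v))[:top_k]


# Toy, pre-segmented posts about one topic (hypothetical data).
posts = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "donation"],
    ["earthquake", "rescue", "volunteer"],
    ["weather", "earthquake"],
]
print(rank_keywords(posts, top_k=2))
```

On this toy input the word that appears in every post is linked to all other words and therefore ranks first. With real Weibo text, a word segmenter and stop-word filtering would be needed before the graph is built, and the weighting of the three measures would have to be tuned rather than assumed equal.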
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification numbers]: TP393.092; TP391.1