Discovery and Analysis of Hot Topics in the Financial Domain Based on Weibo

Posted: 2018-05-17 22:04

Topic: TF-IWF-IDF + Word2Vec; Source: Beijing University of Posts and Telecommunications, master's thesis, 2016


[Abstract]: Weibo is a social platform that combines social entertainment, news, and information publishing, and it has a very large user base; at the same time, the number of people who trade stocks and manage their finances online has grown substantially. Weibo produces a large volume of posts every day that span many industries and a wide range of subjects, and its information is timely and comparatively authoritative, which is an important reason retail investors and personal-finance users follow it closely. Discovering the financial hot topics these users care about from such a volume of posts has therefore become a concern for securities firms and wealth-management companies. This thesis addresses that problem: extracting hot topics in the financial domain from Weibo. It first reviews techniques for topic detection and tracking and the clustering algorithms used for topic discovery. After analyzing these algorithms, it selects Single-Pass as the text clustering algorithm and proposes improvements to it. Because the IDF in TF-IDF is a fixed value that does not change as the data set grows, it proposes an incremental TF-IWF-IDF weighting method based on part of speech and word position. Because traditional feature vectors ignore the semantics and context of feature terms, it proposes an incremental TF-IWF-IDF feature-vector representation based on Word2Vec. To address the shortcomings of the Single-Pass algorithm, it proposes a secondary clustering algorithm based on multiple topic centers. In comparative experiments on Weibo data, hot-topic discovery with the proposed method performs about 10% better than with the unmodified Single-Pass algorithm. Finally, building on these clustering algorithms, the thesis designs and implements a prototype system for financial hot topics: starting from an analysis of the functional requirements, it describes the system architecture and the design and implementation of each functional module in detail, and presents screenshots of the prototype system.
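The abstract names its building blocks but gives no formulas, so the following is only a rough, illustrative sketch of the baseline the thesis starts from: Single-Pass clustering over TF-IDF vectors whose IDF is recomputed as each post arrives, so the weights change with the data set instead of staying fixed. It is not the thesis's method. The part-of-speech and position weighting, the IWF term, the Word2Vec semantics, and the multi-center secondary clustering are all omitted. The smoothed IDF formula, the cosine threshold of 0.3, the running-mean centroid update, the helper names, and the assumption that posts arrive already word-segmented are hypothetical choices made for this example, not details taken from the thesis.

```python
import math
from collections import Counter


class IncrementalIDF:
    """Document frequencies that grow as posts arrive, so IDF is not a fixed value."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def add(self, tokens):
        # Each distinct term counts once per post.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))

    def idf(self, term):
        # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
        return math.log((1 + self.num_docs) / (1 + self.doc_freq[term])) + 1.0


def tfidf_vector(tokens, idf_table):
    """Sparse TF-IDF vector using the IDF values known at this moment."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf_table.idf(t) for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(posts, threshold=0.3):
    """Assign each post to its most similar existing cluster, or open a new one.

    `posts` is a list of already-segmented token lists (hypothetical input format);
    returns the member indices of each cluster in creation order.
    """
    idf_table = IncrementalIDF()
    clusters = []  # each cluster: {"centroid": sparse vector, "members": [post indices]}
    for i, tokens in enumerate(posts):
        idf_table.add(tokens)
        vec = tfidf_vector(tokens, idf_table)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            n = len(best["members"])
            centroid = best["centroid"]
            # Running-mean update of the centroid with the new post's vector.
            for t in set(centroid) | set(vec):
                old = centroid.get(t, 0.0)
                centroid[t] = old + (vec.get(t, 0.0) - old) / n
        else:
            clusters.append({"centroid": dict(vec), "members": [i]})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    # Toy, pre-segmented posts (hypothetical data, English tokens for readability).
    posts = [
        ["central_bank", "rate_cut", "stocks"],
        ["rate_cut", "stocks", "rally"],
        ["bitcoin", "price", "crash"],
        ["bitcoin", "crash", "exchange"],
    ]
    print(single_pass(posts, threshold=0.3))  # [[0, 1], [2, 3]]
```

Because each post is compared once against the current centroids and never revisited, the result depends on arrival order, and each centroid mixes vectors weighted under different IDF snapshots. Those are the kinds of weaknesses a second clustering pass over the topic centers could be meant to correct, though how the thesis does this is not stated in the abstract.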
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year awarded]: 2016
[Classification number]: TP391.1; TP393.092
