
Research and Design of Microblog Topic Tracking Methods

Published: 2018-09-14 13:47
[Abstract]: The Internet plays an increasingly important role in daily life, and both work and leisure now depend on it. As Internet technology has developed, information platforms such as Twitter have appeared in the United States, while Sina Weibo and Tencent Weibo have appeared in China. On a microblog platform, users publish short messages of up to 140 characters and can repost and comment on posts that interest them. Such a platform lets a valuable news report spread across the whole network within minutes, which greatly improves how quickly users obtain the latest news. In an era of information explosion, however, people can be overwhelmed by the sheer volume of information. A method is therefore urgently needed to consolidate and process this information so that people can obtain what they want according to their own needs.

This thesis first studies text representation for microblogs. Microblog posts are short, real-time, colloquial, and original, so a text representation suited to them is proposed on the basis of the existing vector space model. Before processing, the method filters out posts with fewer than N characters; after word segmentation, it takes all content words as feature words. A T-TFIDF weighting method is also proposed for microblogs, which increases the weight of words that appear in a post's short title. With these improvements, the vector space model represents microblog content better and assigns each word a weight that reflects its importance in the post.

With microblog text mapped into the vector space, the thesis then proposes an adaptive microblog topic tracking method based on K-means clustering. Given one to four seed posts supplied by the user, the method tracks a microblog corpus collected in real time. It compares the similarity between each post and the set of subtopic vectors to decide whether the post belongs to the topic. The subtopic vector set is adjusted dynamically during tracking. Specifically, when a post is judged to belong to the topic, candidate words are selected and their frequencies are counted. If a word's frequency exceeds a threshold, a new subtopic is judged to have emerged; the tracked posts are then clustered with K-means, and the subtopic vector set is adjusted according to the clustering result. In this way the subtopic vector set follows the tracked posts, and tracking of the topic can continue more accurately.

The thesis also studies the application of automatic summarization to microblogs. The tracked posts are first clustered using the subtopic vector set as the initial cluster centers. Sentence weights are then computed, and the highest-weighted sentence in each cluster is selected as that cluster's summary sentence. Finally, these sentences are sorted chronologically to produce the topic summary.

This work was supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), the Doctoral Fund for Higher Education (No. W11C100030), the Beijing Science and Technology Program (No. Z121100000312024), and the Beijing Municipal Education Commission's Discipline Construction and Graduate Education Construction Project.
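The weighting and topic-membership steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the T-TFIDF formula, the similarity measure, or any threshold values, so everything here is an assumption made for illustration: the title boost factor `TITLE_BOOST`, the smoothed IDF, counting title words towards term frequency, cosine similarity, the cutoff `SIM_THRESHOLD`, and the function names. Posts are assumed to be segmented into words already, with the short title extracted separately. This is not the thesis's implementation.

```python
import math
from collections import Counter

TITLE_BOOST = 2.0    # assumed multiplier for title words (not from the thesis)
SIM_THRESHOLD = 0.2  # assumed cosine-similarity cutoff (not from the thesis)


def title_boosted_tfidf(body_tokens, title_tokens, doc_freq, n_docs):
    """Build a sparse {term: weight} vector, boosting terms found in the title."""
    # Assumption: title words also count towards term frequency.
    tf = Counter(body_tokens) + Counter(title_tokens)
    title_terms = set(title_tokens)
    vector = {}
    for term, count in tf.items():
        # Smoothed IDF so that unseen terms never cause a division by zero.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        boost = TITLE_BOOST if term in title_terms else 1.0
        vector[term] = count * idf * boost
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def belongs_to_topic(post_vector, subtopic_vectors, threshold=SIM_THRESHOLD):
    """A post is on-topic if it is close enough to at least one subtopic vector."""
    best = max((cosine(post_vector, s) for s in subtopic_vectors), default=0.0)
    return best >= threshold


# Toy usage with made-up tokens and document frequencies.
doc_freq = {"earthquake": 2, "rescue": 1, "concert": 1}
n_docs = 3
seed = title_boosted_tfidf(["earthquake", "rescue"], ["earthquake"], doc_freq, n_docs)
on_topic = title_boosted_tfidf(["earthquake", "rescue", "team"], [], doc_freq, n_docs)
off_topic = title_boosted_tfidf(["concert", "tickets"], [], doc_freq, n_docs)
print(belongs_to_topic(on_topic, [seed]))
print(belongs_to_topic(off_topic, [seed]))
```

In this toy run the first post shares weighted terms with the seed and is accepted, while the second shares none and is rejected. The adaptive step from the abstract (re-clustering tracked posts with K-means when a candidate word's frequency passes a threshold) is left out of the sketch.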
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092; TP391.1


Article ID: 2242884


Article link: http://sikaile.net/guanlilunwen/ydhl/2242884.html

