Topic Mining of Social Media and Literature Impact Prediction Based on Topic Models

Published: 2018-04-08 14:11

  Topic: topic models  Entry point: social media  Source: Southwest University, 2017 master's thesis


[Abstract]: The maturation of Web 2.0 and Internet technologies has made user-generated content a new way of using the Internet. Users are now both consumers and creators of Internet resources, which has changed how people interact with the Internet. People tend to share original, personalized advice on online platforms, and opinion leaders and experts are likewise willing to share professional content and contribute their expertise to their fields. For example, ordinary users typically share their lives on social media platforms such as Twitter, while experts publish scientific literature on academic platforms for study and reading. Both kinds of content are text, yet they differ greatly in the text mining methods and applications they call for. The research challenge in both cases is how to find, efficiently and accurately, the information that different users need in massive amounts of data.

The main work of this thesis is to use topic models for topic mining of short social media texts and for predicting the future impact of literature. The main idea of a topic model is to use the latent topics of text content to uncover the relationships between documents and topics and between topics and words, or to use those relationships to guide the model's results. Constructing a suitable topic model for each scenario can serve different purposes. In earlier methods, because Twitter texts are short, sparse, and informal in language, traditional LDA and PLSA cannot perform effective topic analysis on this kind of text. Moreover, compared with traditional methods that evaluate literature impact from citation statistics, applying the semantic analysis of topic models to predicting the future impact of literature is a novel and challenging idea.

In view of the shortcomings of traditional methods, the particularities of the different application scenarios, and the effectiveness of topic models for text mining, this thesis focuses on two studies: (1) topic mining analysis of short social media texts; (2) literature impact prediction based on topic semantic analysis. For short social media texts, it uses the time and hashtag attributes of Twitter to improve and extend the LDA model. For long literature texts, it defines feature words and phrases by reading the literature and combines an article's novelty with the importance obtained from LDA analysis to predict impact.

To study topic mining in the short-text environment of social media, the thesis proposes a new topic model, HTTM. The model uses the time and hashtag information in Twitter messages (tweets) in turn to add a new "hashtag-time" level to traditional LDA, with the aim of improving the expressiveness of topics, the clustering of tweets, and the tracking of topic evolution over time. The experimental results demonstrate the effectiveness of HTTM in these respects.

For literature impact prediction, the thesis proposes the TTRM model to predict the future impact of literature. Using an article's feature words and word pairs as links, the model derives novelty from the publication time of the literature and importance from the content of the article itself. The importance modeling makes novel use of a topic model to analyze how important an article is within the current literature collection. Experiments on a literature data set confirm the effectiveness of TTRM in literature ranking and in fitting impact predictions. Compared with the citation-based PageRank model and with the MRR-ranking model, which uses TF-IDF to model article importance, TTRM shows certain advantages in both literature ranking and literature impact prediction. The experiments also support our hypothesis that certain words in a document's content contribute to the article's novelty and help in discovering new literature.
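The abstract's description of a topic model, which relates documents to topics and topics to words, can be illustrated with a minimal sketch of plain LDA fitted by collapsed Gibbs sampling. This is not the thesis's HTTM or TTRM model: it uses no time or hashtag information, and the function name `lda_gibbs`, the hyperparameter defaults, and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (illustrative; not HTTM/TTRM).

    docs is a list of token lists. Returns (vocab, theta, phi), where
    theta[d][k] estimates P(topic k | document d) and
    phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k
    assign = []                               # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic with weight proportional to
                # (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [
        [(n + alpha) / (len(doc) + K * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
        for k in range(K)
    ]
    return vocab, theta, phi


if __name__ == "__main__":
    # Hypothetical toy corpus: two "social media" and two "citation" documents.
    corpus = [
        "tweet hashtag tweet social media".split(),
        "hashtag social tweet media".split(),
        "citation paper impact citation ranking".split(),
        "paper ranking impact citation".split(),
    ]
    vocab, theta, phi = lda_gibbs(corpus, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(zip(row, vocab), reverse=True)[:3]
        print("topic", k, [w for _, w in top])
```

Each row of `theta` is a document's topic mixture and each row of `phi` is a topic's word distribution. The abstract's point about short, sparse tweets corresponds to documents with very few tokens, for which the per-document counts carry little evidence about the topic mixture.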

[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1



Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1721960.html

