
Hadoop-Based Weibo Text Classification and Commercial Keyword Extraction

Published: 2019-02-19 18:49
[Abstract]: With the rapid development of computer and network technology, Weibo (microblogging) has become a major new medium in China. The rapid growth of the user base, together with the level-by-level reposting of information, has pushed the volume of Weibo data to an unprecedented scale. Faced with this mass of text, accurately and efficiently finding and extracting information of high commercial value, and thereby improving advertising precision, has become a major goal of data processing on Weibo platforms. This thesis studies how to discover and extract commercial keywords effectively from massive Weibo text data. To make the extraction more targeted, the Weibo data is first classified by text category, which reduces the volume of data handled in a single pass and lets each category be processed in a way suited to it. The keywords of the texts in each category are then reweighted using their search weights in Internet search engines, which improves the precision of commercial keyword extraction.

Weibo text is large in total volume, short per post, and informal in content, so traditional classification methods and commercial information extraction algorithms have limitations when applied to it. Because a single post is short, contains few effective features, and is colloquial, the thesis expands the feature words of each text with similar words and collocated words to reduce the chance of losing features. Given the large number of posts and the informality of their content, it proposes a Weibo text classification method based on the dispersion of feature words across categories and the degree of that dispersion.

Taking into account Weibo-specific factors such as repost counts, comment counts, and the sheer scale of the data, the thesis modifies the traditional TF-IDF algorithm. The modified TF-IDF is run on the Hadoop cloud computing platform with all posts of a single user as the unit of computation, and the resulting weights are then adjusted by each word's search weight in Internet search engines, so that commercially valuable keywords can be extracted from massive data. Experiments show that the classification method performs well on Weibo posts, and that the keyword extraction algorithm combining the modified TF-IDF weight with the search weight is well suited to Weibo data processing and commercially effective. Using the cloud computing platform improves processing efficiency to some extent and makes processing massive Weibo datasets feasible.
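The abstract does not give the modified TF-IDF formula, so the following is only a minimal in-memory sketch of the general idea, under stated assumptions. It treats all posts of one user as a single document, weights each term occurrence by a hypothetical engagement factor `1 + log(1 + reposts + comments)`, uses a smoothed IDF, and multiplies the result by a per-term search weight. The function names, the engagement formula, the `search_weight` table, and the sample data are all illustrative assumptions, not the thesis's actual method. On Hadoop, the grouping by user would correspond to a map and shuffle stage and the document-frequency count to a reduce stage.

```python
import math
from collections import Counter, defaultdict


def user_documents(posts):
    """Merge each user's tokenized posts into one weighted term-count document."""
    docs = defaultdict(Counter)
    for post in posts:
        # Assumed engagement weight: log-damped sum of reposts and comments.
        weight = 1.0 + math.log1p(post["reposts"] + post["comments"])
        for token in post["tokens"]:
            docs[post["user"]][token] += weight
    return docs


def tfidf(docs):
    """Compute TF-IDF per user document, with smoothed IDF."""
    n_docs = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())  # each term counted once per user document
    scores = {}
    for user, counts in docs.items():
        total = sum(counts.values())
        scores[user] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        }
    return scores


def adjust_by_search_weight(term_scores, search_weight):
    """Scale each term's score by an externally supplied search weight (default 1.0)."""
    return {t: s * search_weight.get(t, 1.0) for t, s in term_scores.items()}


def top_keywords(term_scores, k):
    """Return the k highest-scoring terms."""
    return [t for t, _ in sorted(term_scores.items(), key=lambda kv: -kv[1])[:k]]


# Hypothetical sample data for illustration only.
posts = [
    {"user": "a", "tokens": ["phone", "sale"], "reposts": 10, "comments": 5},
    {"user": "a", "tokens": ["phone", "weather"], "reposts": 0, "comments": 0},
    {"user": "b", "tokens": ["weather", "lunch"], "reposts": 0, "comments": 0},
]
scores = tfidf(user_documents(posts))
adjusted = adjust_by_search_weight(scores["a"], {"sale": 2.0})
```

Using one document per user rather than per post gives each document enough terms for the term frequency to be meaningful, which matters because single posts are very short.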
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP393.092;TP391.1




Article ID: 2426761


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2426761.html

