Research and Application of Weibo User Attribute Authentication
Published: 2018-10-22 19:19
[Abstract]: With the rapid development of the Internet, the Internet has permeated everyone's daily life. Social network platforms such as Weibo have become widely popular and are now a major channel for chatting, interacting and obtaining information. Mainstream microblogging platforms in China and abroad have accumulated hundreds of millions of users and hold rich user information of considerable commercial value. Using these data correctly to discover important latent knowledge, and thereby to understand users better and create value, is therefore an important problem. Against this background, this thesis studies attribute authentication for Weibo users, that is, algorithms for authenticating several important user attributes. First, it proposes a new occupational-attribute authentication algorithm based on word-vector distance. The algorithm predicts a user's occupation by measuring the distance between the vocabulary of the user's posts and occupation-related vocabulary, and it uses Word2vec, a neural-network-based word-embedding tool, to improve prediction accuracy. Experiments on real Weibo user data show that the algorithm reaches an accuracy of nearly 80%. Second, it studies algorithms for analyzing another social attribute, the user's role. It proposes U-Score, a comprehensive evaluation index for user role analysis that is built from several types of hierarchical indicators. U-Score takes five factors into account: influence, activity, centrality, credibility and importance. The Analytic Hierarchy Process (AHP) is used to compute the weights of the different features. Experimental results show that this method is feasible for analyzing Weibo user roles and can quantify multiple user indicators.
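The abstract does not give the thesis's pairwise judgments or the layout of its indicator hierarchy. This Python sketch therefore only illustrates the standard AHP step the abstract names. It derives factor weights as the principal eigenvector of a pairwise comparison matrix and checks the consistency ratio (CR). It then combines five factor scores into a U-Score-style weighted sum. The comparison matrix, the example user's scores, and the assumption that each factor is pre-normalized to [0, 1] are all hypothetical. The flat five-factor sum is also a simplification of the hierarchical index described above.

```python
import math

# The five U-Score factors named in the abstract.
FACTORS = ["influence", "activity", "centrality", "credibility", "importance"]

# Saaty's random consistency index (RI) by matrix order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

# Hypothetical pairwise judgments: entry [i][j] says how much more important
# factor i is than factor j, with reciprocals below the diagonal.
# Illustrative only, and deliberately perfectly consistent.
PAIRWISE = [
    [1.0, 2.0, 2.0, 4.0, 4.0],    # influence
    [0.5, 1.0, 1.0, 2.0, 2.0],    # activity
    [0.5, 1.0, 1.0, 2.0, 2.0],    # centrality
    [0.25, 0.5, 0.5, 1.0, 1.0],   # credibility
    [0.25, 0.5, 0.5, 1.0, 1.0],   # importance
]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix.

    Weights are the principal eigenvector, found by power iteration and
    normalized to sum to 1.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = mat_vec(matrix, w)
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max from A w ~= lambda_max w, averaged over components.
    aw = mat_vec(matrix, w)
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX.get(n, 0.0)
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


def u_score(factor_scores, weights):
    """Weighted sum of per-factor scores (each assumed pre-normalized to [0, 1])."""
    return sum(wt * factor_scores[f] for f, wt in zip(FACTORS, weights))


weights, cr = ahp_weights(PAIRWISE)
example_user = {"influence": 0.8, "activity": 0.5, "centrality": 0.3,
                "credibility": 0.9, "importance": 0.4}
print([round(x, 3) for x in weights], abs(round(cr, 6)),
      round(u_score(example_user, weights), 3))
```

The example matrix is perfectly consistent, so its CR is essentially zero. Real expert judgments usually are not, and a CR below 0.1 is the conventional threshold for accepting them. A fully hierarchical version would repeat this weighting at each level of the indicator tree and multiply weights down the hierarchy.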
The thesis also studies gender-attribute authentication. Based on the characteristics of the collected Weibo user data, it combines three different types of user features to classify users' gender, achieving a classification accuracy above 90%. Finally, using the attribute authentication algorithms above, the thesis develops a Weibo user attribute authentication system. The system consists of three modules: data acquisition, data storage, and data-mining authentication. The data-mining authentication module implements the three attribute authentication algorithms described above and thus achieves the goal of authenticating user attributes.
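The abstract does not specify the exact distance measure, the occupation vocabularies, or how post text is tokenized. This Python sketch shows one simple reading of the word-vector distance idea behind the occupation step of the data-mining authentication module. It averages the vectors of a user's post words and averages the vectors of each occupation's keyword list. It then picks the occupation with the highest cosine similarity, which is the same as the smallest cosine distance. The tiny three-dimensional embeddings and the keyword lists are hypothetical stand-ins for vectors from a trained Word2vec model. Real Weibo text would first need Chinese word segmentation.

```python
import math

# Toy 3-dimensional vectors standing in for a trained Word2vec model's output.
# All words and values here are hypothetical.
EMBEDDINGS = {
    "code": [0.9, 0.1, 0.0],
    "bug": [0.8, 0.2, 0.1],
    "server": [0.85, 0.05, 0.1],
    "patient": [0.1, 0.9, 0.1],
    "surgery": [0.05, 0.95, 0.0],
    "hospital": [0.1, 0.85, 0.2],
    "lesson": [0.1, 0.1, 0.9],
    "student": [0.0, 0.2, 0.95],
    "exam": [0.15, 0.1, 0.85],
}

# Hypothetical occupation vocabularies; the thesis's actual lists are not given.
OCCUPATION_WORDS = {
    "programmer": ["code", "bug", "server"],
    "doctor": ["patient", "surgery", "hospital"],
    "teacher": ["lesson", "student", "exam"],
}


def mean_vector(words, embeddings):
    """Average the vectors of words that have an embedding; None if none do."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine_similarity(a, b):
    """Cosine similarity; 1 - similarity is the cosine distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_occupation(post_tokens, embeddings=EMBEDDINGS,
                       occupations=OCCUPATION_WORDS):
    """Return the occupation whose vocabulary centroid is closest to the
    centroid of the user's post tokens, or None if no token is known."""
    user_vec = mean_vector(post_tokens, embeddings)
    if user_vec is None:
        return None
    best, best_sim = None, -math.inf
    for occupation, vocab in occupations.items():
        occ_vec = mean_vector(vocab, embeddings)
        if occ_vec is None:
            continue
        sim = cosine_similarity(user_vec, occ_vec)
        if sim > best_sim:
            best, best_sim = occupation, sim
    return best


print(predict_occupation(["fixed", "a", "bug", "in", "the", "server", "code"]))
print(predict_occupation(["grading", "the", "student", "exam"]))
```

Comparing centroids is the simplest choice. An equally plausible reading of the abstract would compare every post word with every occupation word and aggregate the pairwise distances.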
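[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP393.092; TP311.13
Document No.: 2288076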