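Research on Methods for Identifying Weibo User Attributes
Published: 2018-06-27 17:26
Topics: Weibo analysis + user attribute identification. Reference: Soochow University, master's thesis, 2015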
【Abstract】: With the rapid growth of social networks, automatically analyzing the useful information they contain has become an important research topic in natural language processing, social network analysis, and related fields. Identifying the attributes of Weibo users is a basic task in this area: it aims to automatically infer a user's individual attributes (for example, gender and age) from the data that user generates. Accurate attribute identification supports work such as intelligent marketing, personalized prediction, and sentiment analysis. This thesis makes three contributions.
First, for the personal versus non-personal attribute of Weibo users, it proposes a classification method that combines two kinds of information: the user name and the Weibo text. A separate classifier is trained on each kind of text, and a classifier-fusion method then uses both kinds of information at once. Experiments show that the method reaches high recognition accuracy and that classifier fusion clearly outperforms a single classifier that uses only the user name or only the Weibo text.
Second, for the gender attribute, it proposes a semi-supervised gender classification method based on interactive information. Traditional gender classification relies on large numbers of labeled samples, and manual labeling is usually slow and laborious. As a social networking platform, Weibo offers several mechanisms for user interaction, so it holds both non-interactive information, such as the posts a user publishes, and interactive information, such as replies. The method treats these two kinds of information as the two views of a co-training algorithm and makes full use of unlabeled samples. Experiments show that co-training on the non-interactive and interactive views effectively uses unlabeled samples to improve gender classification performance.
Finally, for the age attribute, it proposes a semi-supervised age regression method based on text and social information. The method uses co-training to combine a user's text and social information and makes full use of unlabeled samples. It also introduces a QBC-based method to address the difficulty of measuring sample confidence in regression problems. Experiments show that the method effectively uses unlabeled samples to improve age regression performance on both balanced and imbalanced data.
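The co-training scheme described for the second contribution can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the tiny Naive Bayes classifier, the function names (`train_views`, `co_train`, `predict`), the `rounds` and `per_round` parameters, and the toy tokens and labels are all assumptions made for the example. Each sample carries two views (standing in for non-interactive and interactive text). In every round, each view's classifier pseudo-labels the unlabeled samples it is most confident about and adds them to the shared training set.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over token lists (add-one smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.counts[y].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict_proba(self, tokens):
        n = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / n)
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / total)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def train_views(labeled):
    """Train one classifier per view on samples shaped ((view_a, view_b), label)."""
    labels = [y for _, y in labeled]
    return [NaiveBayes().fit([x[v] for x, _ in labeled], labels) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, per_round=2):
    """Grow the labeled set with each view's most confident pseudo-labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for view, model in enumerate(train_views(labeled)):
            scored = []
            for i, sample in enumerate(pool):
                proba = model.predict_proba(sample[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            for _, i, label in sorted(scored, reverse=True)[:per_round]:
                picked.setdefault(i, label)  # on conflict, the first view wins
        labeled += [(pool[i], y) for i, y in picked.items()]
        pool = [s for i, s in enumerate(pool) if i not in picked]
    return train_views(labeled)


def predict(models, sample):
    """Fuse the two views by multiplying their class probabilities."""
    probs = [m.predict_proba(sample[v]) for v, m in enumerate(models)]
    joint = {c: probs[0][c] * probs[1][c] for c in models[0].classes}
    return max(joint, key=joint.get)


# Toy data (hypothetical): two seed samples plus four unlabeled ones.
labeled = [((["a1", "a2"], ["r1"]), 0), ((["b1", "b2"], ["s1"]), 1)]
unlabeled = [(["a1", "a3"], ["r1", "r2"]), (["b1", "b3"], ["s1", "s2"]),
             (["a3"], ["r2"]), (["b3"], ["s2"])]
models = co_train(labeled, unlabeled)
print(predict(models, (["a3"], ["r2"])), predict(models, (["b3"], ["s2"])))
```

The tokens `a3`, `r2`, `b3`, and `s2` never occur in the two seed samples, so the final classifiers can only have learned them from pseudo-labeled data.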
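For age regression there is no class probability to serve as a confidence score, which is the difficulty the QBC-based method is said to address. One plausible reading, assuming QBC stands for query by committee, is to train several regressors and treat low disagreement among their predictions as high confidence. The sketch below illustrates that idea only: the leave-one-out committee, the single numeric feature, the function names, and the toy ages are assumptions, not details taken from the thesis.

```python
from statistics import mean, pstdev


def fit_line(points):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def build_committee(labeled):
    """One member per left-out sample (an assumed, deterministic source of diversity)."""
    return [fit_line(labeled[:i] + labeled[i + 1:]) for i in range(len(labeled))]


def committee_vote(committee, x):
    """Return (mean prediction, disagreement) for one unlabeled feature value."""
    preds = [slope * x + intercept for slope, intercept in committee]
    return mean(preds), pstdev(preds)


def rank_by_confidence(committee, pool):
    """Sort unlabeled values so the lowest-disagreement ones come first."""
    return sorted(pool, key=lambda x: committee_vote(committee, x)[1])


# Toy data (hypothetical): (feature value, age) pairs and an unlabeled pool.
labeled = [(1.0, 20.0), (2.0, 23.0), (3.0, 24.0), (4.0, 29.0), (5.0, 30.0)]
pool = [20.0, 3.0, 8.0]
committee = build_committee(labeled)
ranked = rank_by_confidence(committee, pool)
print(ranked)
```

In a co-training loop, the samples at the front of `ranked` would be the candidates to pseudo-label, with the committee's mean prediction as the pseudo-label. Values far from the labeled data, such as `20.0` here, produce more disagreement and are left in the pool.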
【Degree-granting institution】: Soochow University
【Degree level】: Master's
【Year degree granted】: 2015
【Classification number】: TP391.1
Article ID: 2074667
Article link: http://sikaile.net/guanlilunwen/yingxiaoguanlilunwen/2074667.html