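Analysis of User Personality and Behavior Based on Social Networks
Published: 2018-04-13 17:23
Topic: social networks + text mining. Source: Beijing University of Posts and Telecommunications, master's thesis, 2014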
[Abstract]: Social networks have proliferated over the past two years. Well-known examples in China include Renren, Weibo, and QQ Zone; abroad there are Facebook and Twitter. Social networks are increasingly changing how people live and socialize, and people have gradually become accustomed to posting photos, writing blog entries, and updating their status on them. At the same time, user behavior on social networks is diverging: some users prefer to browse without posting anything, while others write blog entries but rarely post photos. These behaviors are not random but follow certain patterns. Analyzing user behavior effectively, uncovering the deeper patterns behind it, and using them to provide personalized services has become a major challenge. Current social-network user behavior analysis concentrates on behavioral data and does not fully mine the text that users publish, such as status updates and blog entries. It also does not involve a personality model of the user. If an intrinsic link between a user's personality and behavior can be found, it would give social-network user analysis new theoretical support. The work of this thesis mainly covers the following:

1. Determining the analysis method. The thesis first discusses the development of social networks in China and how user data can be obtained from them. It then takes Renren as the research object and chooses to collect user data by building an in-site Renren application, in the form of an online personality test based on Renren.

2. Building the in-site application. The test items are taken from the Big Five personality inventory. Using the normal distribution, the scores for each personality component are divided evenly into five bands, and users receive feedback according to the band they fall into. The application is implemented with a Flex front end, a Java back end, and a MySQL database. It obtains the user's authorization through OAuth and then reads the user's profile and UGC (user-generated content) data through the API.

3. Processing the user data. The profiles and UGC recorded by the application are quantified to produce behavioral statistics, mainly the frequency with which a user posts status updates and blog entries or shares blog entries, photo albums, and so on. In addition, semantic analysis is applied to the UGC: status updates and blog entries are first segmented into words and word frequencies are counted, the weights of different words are then adjusted, and finally principal component analysis (PCA) is used to reduce the resulting data.

Based on the behavioral and semantic data obtained in these steps, linear regression and decision tree algorithms are applied to predict each user's gender, age, and personality components, and the predictions are compared with the known records to verify the effectiveness of the algorithms.
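The five-band scoring step mentioned in the abstract could be sketched as follows. This is a minimal illustration, not the thesis implementation (which the abstract says was written in Java). It assumes that "divided evenly" means five equal-probability bands of a normal distribution fitted to the observed scores; the abstract does not give the actual cut points. The function names and the sample scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def band_cutoffs(scores, n_bands=5):
    """Return the n_bands - 1 thresholds that split a normal distribution
    fitted to `scores` into equal-probability bands (an assumption)."""
    fitted = NormalDist(mean(scores), stdev(scores))
    return [fitted.inv_cdf(k / n_bands) for k in range(1, n_bands)]


def band_of(score, cutoffs):
    """Map a score to a band number from 1 (lowest) to len(cutoffs) + 1."""
    return 1 + sum(score >= c for c in cutoffs)


# Hypothetical raw scores for one personality component (e.g. extraversion).
sample_scores = [22, 25, 27, 28, 30, 30, 31, 33, 34, 36, 38, 41]
cutoffs = band_cutoffs(sample_scores)

for s in (20, 31, 45):
    print(s, "-> band", band_of(s, cutoffs))
```

Fitting the distribution to the collected scores keeps the bands relative to the tested population. Fixed cut points taken from published norms for the inventory would be an equally plausible reading of the abstract.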
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP391.1; TP393.09
Article ID: 1745473
Article link: http://sikaile.net/guanlilunwen/ydhl/1745473.html