Research and Analysis of User Attribute Inference on Web-Based Chinese Social Networking Sites
[Abstract]: With the development of the Internet, social networking sites have become increasingly popular. Their users generate huge amounts of data every day, and this data holds considerable latent value. User data often touches on personal privacy, so users frequently leave profile fields blank or enter false information to hide their personal details. As a result, many valuable attributes cannot be obtained directly, and inferring users' attributes has become an active research topic. This thesis takes Sina Weibo users as its research subjects and infers three of their attributes: gender, age distribution, and education-level distribution. The main contributions are as follows.

1) Gender inference for Chinese users. Four text-based gender inference algorithms are proposed:
- a nickname-based algorithm (GIABON);
- a tag-based algorithm (GIABOL);
- a Weibo-post-based algorithm (GIABOWT);
- a mean-based algorithm (GIABOM).

The first three each consider only a single attribute, which limits their accuracy. GIABOM instead combines the evidence from all of these text types. Experiments show that GIABOM reaches an accuracy of 85.55%, much higher than the other three algorithms. This indicates that combining multiple attributes is the more effective approach to gender inference.

2) Age-distribution estimation for Chinese users. This thesis proposes an age-distribution estimation algorithm that uses a genetic algorithm to jointly optimize the parameters and the feature subset of a support vector machine (SVM).
Three SVM configurations are compared:
- a linear kernel;
- a radial basis function (RBF) kernel;
- an RBF kernel whose parameters and features are optimized by the genetic algorithm.

Experiments show that the SVM reaches 75.38% accuracy with the linear kernel and 86.14% with the RBF kernel. The genetic-algorithm-optimized SVM reaches 89.11%. These results confirm that the proposed genetic optimization of SVM parameters and features is valid and well founded.

3) Education-level distribution estimation for Chinese users. An algorithm that uses a genetic algorithm to jointly optimize SVM parameters and features is likewise proposed to predict users' education-level distribution. The approach mirrors the age-distribution algorithm. Experiments show the following accuracies:
- 81.38% for the linear-kernel SVM;
- 92.14% for the RBF-kernel SVM;
- 93.03% for the genetic-algorithm-optimized SVM.

This indicates that the method also performs well for education-level prediction.
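The abstract does not say exactly how GIABOM merges its sources. One plausible reading of "mean-based" is sketched below as a minimal illustration, not the thesis's model.

- Each single-source scorer (nickname, tags, posts) returns an estimate of the probability that the user is male.
- The combined step averages whichever sources produced any evidence.
- The lexicons `MALE_LEX` and `FEMALE_LEX`, the Laplace-smoothed scorer `source_score`, and the function `infer_gender` are hypothetical stand-ins.
- The input is assumed to be Chinese text that has already been word-segmented.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical gender-indicative lexicons (illustrative only, not from the thesis).
MALE_LEX = {"哥", "兄弟", "球迷"}
FEMALE_LEX = {"姐", "闺蜜", "美妆"}


def source_score(tokens: List[str], male_lex=MALE_LEX, female_lex=FEMALE_LEX) -> Optional[float]:
    """Single-source estimate of P(male) from Laplace-smoothed lexicon hit counts.
    Returns None when the source has no lexicon hits, i.e. carries no evidence."""
    m = sum(1 for t in tokens if t in male_lex)
    f = sum(1 for t in tokens if t in female_lex)
    if m + f == 0:
        return None
    return (m + 1) / (m + f + 2)


def infer_gender(sources: Dict[str, List[str]], threshold: float = 0.5) -> Tuple[str, Optional[float]]:
    """Mean-based combination (assumed reading of GIABOM): average the
    per-source P(male) over the sources that have evidence, then threshold."""
    scores = [s for s in (source_score(toks) for toks in sources.values()) if s is not None]
    if not scores:
        return "unknown", None
    p_male = mean(scores)
    return ("male" if p_male >= threshold else "female"), p_male


if __name__ == "__main__":
    # Pre-segmented nickname, tags, and post text for one hypothetical user.
    user = {
        "nickname": ["小", "哥"],
        "tags": ["球迷", "旅游"],
        "weibo": ["今天", "和", "闺蜜", "逛街"],
    }
    print(infer_gender(user))  # label "male", p_male about 0.556
```

In this example, the nickname and tags each lean male (2/3), while the post leans female (1/3). The mean of about 0.556 therefore tips the label to male. A user with no lexicon hits in any field is reported as unknown rather than guessed.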
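Items 2) and 3) share one idea: a genetic algorithm searches jointly over SVM hyperparameters and a feature subset. The minimal sketch below shows such a search. The abstract does not give the thesis's actual encoding, operators, or GA settings, so this sketch assumes the following:

- A chromosome is a (feature mask, log2 C, log2 gamma) triple.
- The search ranges follow the grid commonly recommended for LIBSVM.
- Parents are chosen by tournament selection, and the best individual of each generation survives unchanged (elitism).
- The fitness function is pluggable.

`toy_fitness` is a synthetic placeholder, not real Weibo data. All function names here are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# Chromosome = (feature mask, log2(C), log2(gamma)); this encoding is an assumption.
Chromosome = Tuple[Tuple[bool, ...], float, float]

# Assumed search ranges, taken from the grid commonly recommended for LIBSVM.
LOG2_C_RANGE = (-5.0, 15.0)
LOG2_GAMMA_RANGE = (-15.0, 3.0)


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def repair(mask: Tuple[bool, ...], rng: random.Random) -> Tuple[bool, ...]:
    """Ensure at least one feature stays selected."""
    if any(mask):
        return mask
    i = rng.randrange(len(mask))
    return mask[:i] + (True,) + mask[i + 1:]


def random_chromosome(n_features: int, rng: random.Random) -> Chromosome:
    mask = tuple(rng.random() < 0.5 for _ in range(n_features))
    return (repair(mask, rng), rng.uniform(*LOG2_C_RANGE), rng.uniform(*LOG2_GAMMA_RANGE))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # Uniform crossover on mask bits; arithmetic (blend) crossover on the real genes,
    # which keeps the child's log2 C and log2 gamma between its parents' values.
    mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0]))
    w = rng.random()
    return (repair(mask, rng), w * a[1] + (1 - w) * b[1], w * a[2] + (1 - w) * b[2])


def mutate(ch: Chromosome, rng: random.Random, p_bit: float = 0.05,
           p_real: float = 0.3, sigma: float = 1.0) -> Chromosome:
    # Flip each mask bit with probability p_bit; Gaussian-perturb real genes, then clamp.
    mask = tuple((not x) if rng.random() < p_bit else x for x in ch[0])
    c, g = ch[1], ch[2]
    if rng.random() < p_real:
        c = clamp(c + rng.gauss(0.0, sigma), *LOG2_C_RANGE)
    if rng.random() < p_real:
        g = clamp(g + rng.gauss(0.0, sigma), *LOG2_GAMMA_RANGE)
    return (repair(mask, rng), c, g)


def tournament(pop: List[Chromosome], fits: List[float], rng: random.Random, k: int = 3) -> Chromosome:
    idx = rng.sample(range(len(pop)), k)
    return pop[max(idx, key=lambda i: fits[i])]


def ga_optimize(fitness: Callable[[Chromosome], float], n_features: int,
                pop_size: int = 30, generations: int = 40, seed: int = 0):
    """Return (best chromosome, best fitness, best-so-far fitness per generation)."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_features, rng) for _ in range(pop_size)]
    best, best_fit, history = pop[0], float("-inf"), []
    for _ in range(generations):
        fits = [fitness(ch) for ch in pop]
        for ch, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = ch, f
        history.append(best_fit)
        # Elitism: the generation's best individual survives unchanged.
        children = [pop[max(range(pop_size), key=lambda i: fits[i])]]
        while len(children) < pop_size:
            parent_a = tournament(pop, fits, rng)
            parent_b = tournament(pop, fits, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return best, best_fit, history


def toy_fitness(ch: Chromosome) -> float:
    """Synthetic placeholder for cross-validated SVM accuracy (not real data):
    features 0 and 1 are treated as informative, the rest as noise."""
    mask, log2_c, log2_g = ch
    return sum(mask[:2]) - 0.2 * sum(mask[2:]) - 0.01 * ((log2_c - 3) ** 2 + (log2_g + 5) ** 2)


if __name__ == "__main__":
    best, fit, hist = ga_optimize(toy_fitness, n_features=8, seed=1)
    print("selected features:", [i for i, on in enumerate(best[0]) if on])
    print("C = 2^%.2f, gamma = 2^%.2f, fitness = %.3f" % (best[1], best[2], fit))
```

In a real pipeline, the fitness would be the cross-validated accuracy of an RBF SVM trained only on the selected columns. With scikit-learn, for example, that could be `cross_val_score(SVC(kernel="rbf", C=2**lc, gamma=2**lg), X[:, list(mask)], y, cv=5).mean()`, where `X` holds the user features and `y` holds the age-bracket or education-level labels. Encoding C and gamma on a log2 scale keeps a single mutation step size meaningful across several orders of magnitude.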
[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree level]: Master's
[Year conferred]: 2017
[CLC classification]: TP18; TP393.09