Design and Implementation of a Forum User Trace Analysis System
Published: 2018-05-19 04:09
Topics: online forums + user traces; Source: Harbin Institute of Technology, 2017 master's thesis
[Abstract]: As the Internet becomes further integrated with daily life, a wide variety of network applications have emerged, such as online forums, e-commerce, social software, and online games. While the Internet makes life more convenient, its virtual nature also brings many problems. In recent years, the Internet's expansion into finance has accelerated the adoption of real-name registration and advanced the construction of a trusted cyberspace. Online forums, however, are positioned as places for discussion and exchange and are often not maintained by corporate legal entities, so their users still act under virtual identities, which gives many illegal online activities a place to hide. Tracing the real-world person behind a virtual identity has therefore become a problem of concern.

To address this problem, this thesis relies on two observations, the habitual naming of network users and the homogeneity of users of small and medium-sized forums, and proposes a method that discovers a user's activity traces across Internet forums and then mines information about the person behind the virtual identity. Habitual naming means that a user registers accounts with the same ID across multiple network applications or sites. Homogeneity of small and medium-sized forum users means that the users who gather in such a forum share the same characteristics. The virtual-identity identifiers considered in this thesis are the email address and the user name.

First, the thesis uses link expansion and site-type identification to discover the Chinese-language forum sites currently on the Internet, building a "map" on which users' forum activity can be traced. Second, it simulates each site's duplicate-check interface for registration to find the set of forums in which a user is active, and then uses forum content crawler records to obtain the user's posts and replies in each forum, yielding the user's forum activity trace. The discovered traces are then analyzed: personal information such as email addresses and mobile phone numbers is matched in the user's posts and replies; the user's areas of focus are located at a coarse granularity based on site category; and the user's domain influence and domain interest are quantified from the number of registrations on sites in the same domain, site size, the user's activity level on each individual site, and the user's influence on each individual site. Finally, the thesis offers user trace analysis as a service, building a trace analysis platform on Web Service to expose a service access interface, and implements the system front end by wrapping the inputs and outputs of that Web interface in a visual layer.
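The analysis step described in the abstract (matching email addresses and mobile numbers in a user's posts, then locating areas of focus by site category) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the `Post` record, the function names, and both regular expressions are hypothetical, and the phone pattern assumes 11-digit mainland-China mobile numbers.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed patterns (not taken from the thesis): a simple email matcher and
# an 11-digit mainland-China mobile number not embedded in a longer digit run.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


@dataclass
class Post:
    """Hypothetical record for one crawled post or reply."""
    site: str       # forum host name
    category: str   # coarse site category, e.g. "photography"
    text: str       # post body


def extract_identifiers(posts):
    """Collect distinct email addresses and phone numbers found in post text."""
    emails, phones = set(), set()
    for post in posts:
        emails.update(m.lower() for m in EMAIL_RE.findall(post.text))
        phones.update(PHONE_RE.findall(post.text))
    return {"emails": sorted(emails), "phones": sorted(phones)}


def focus_domains(posts):
    """Rank site categories by how many of the user's posts fall in each."""
    return Counter(post.category for post in posts).most_common()
```

Counting posts per category is the coarsest possible signal; the influence and interest measures mentioned in the abstract additionally draw on registration counts, site size, and per-site activity, which this sketch does not attempt to model.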
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP393.09
Article ID: 1908644
Article link: http://sikaile.net/guanlilunwen/ydhl/1908644.html