A Study on the Usefulness of User Reviews in Online Communities
Published: 2018-05-05 02:22
Topics: Web mining + user reviews; Source: Shandong University, master's thesis, 2014
[Abstract]: In recent years, the rapid development of the Internet and Web 2.0, and especially the rise of social networks and Weibo, has greatly changed how the web is used. Every user is now both a consumer and a creator of content, and the Internet's enormous user base has led to explosive growth in user-generated content (UGC). Because anyone can create content freely, the traditional Internet model has been reshaped to some extent, but content quality has also become uneven. How to extract valuable information from this content effectively has gradually attracted the attention of many researchers.
This thesis takes book reviews on Douban as its object of study and examines the usefulness of user reviews from two perspectives: classification and ranking. Compared with user reviews on e-commerce sites such as Amazon, book reviews differ in two clear ways. First, Douban book reviews are generally long, averaging more than a thousand characters, and long reviews imply diverse and complex content. Second, books are experiential goods and book reviews have the character of personal writing, so style matters as much as content. The thesis therefore attempts to extract useful features from review content, writing style, and information about the reviewer.
The thesis makes two main contributions. First, taking user votes as the measure of review usefulness, it builds a usable review data set and, on that basis, analyzes the characteristics of user reviews and votes. Second, it extracts content-related and style-related features, together with reviewer-related features, from the data and models review usefulness with both a classification method and a ranking method. For lexical features, it proposes a weighting scheme tied to the topic of the review; experiments show that this scheme outperforms plain term frequency and TF-IDF weighting in both the classification model and the ranking model. The experimental results also indicate that, for mining review usefulness, a ranking model is the more reasonable approach.
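For readers unfamiliar with the two baseline weightings named in the abstract, the following is a minimal Python sketch of plain term frequency and TF-IDF. It is not the thesis's topic-related weighting scheme, which the abstract does not specify. The toy tokenized reviews, the function names, and the smoothed IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each term within a single review."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(corpus):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (one common variant, assumed here)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per review
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }


def tfidf(tokens, idf):
    """TF-IDF weight of each term in one review, given corpus-level IDF values."""
    return {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}


# Hypothetical, already-tokenized reviews (not data from the thesis).
reviews = [
    ["plot", "slow", "but", "prose", "beautiful"],
    ["plot", "fast", "plot", "thin"],
    ["prose", "beautiful", "characters", "vivid"],
]

idf = inverse_document_frequency(reviews)
weights = tfidf(reviews[1], idf)
```

Under plain term frequency, "plot" dominates the second review simply because it is repeated; TF-IDF scales that down because "plot" also appears in another review, while a rarer term such as "thin" gets a higher IDF. A topic-related scheme like the one the abstract mentions would presumably adjust such weights further using information about the reviewed book, but how it does so is not stated in the source.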
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.09
Article ID: 1845758
Article link: http://sikaile.net/guanlilunwen/ydhl/1845758.html