
Research on Quality Evaluation of Online Encyclopedia Entries

Published: 2018-03-18 11:11

Topic: Wikipedia; Focus: quality features; Source: National University of Defense Technology, master's thesis, 2014; Type: degree thesis

【Abstract】: In the twenty-first century, the spread and use of the Internet have brought a rapid growth in the amount of information. People benefit from convenient access to large amounts of information, yet find it difficult to judge how good that information is. Wikipedia is a typical case. Every day a large number of Wikipedia articles are created, modified, and edited, but quality assessment cannot keep pace with these changes. Because Wikipedia relies on manual quality assessment, its assessment throughput is severely limited, and users cannot tell whether a considerable share of articles is of good or poor quality. Wikipedia has also introduced user ratings, but their results suffer from the raters' subjectivity. To address these problems of efficiency and objectivity, this thesis studies the quality assessment of Wikipedia articles and develops automatic assessment methods.

We first study the quality-related features of Wikipedia articles as a foundation for the later assessment work. On the one hand, high-quality Wikipedia articles should share the general characteristics of traditional encyclopedia entries, such as comprehensive content and accurate information. On the other hand, Wikipedia articles are written in a fundamentally different way from traditional encyclopedias: Wikipedia follows a crowdsourcing model in which articles are written not by a small board of expert editors but by many contributors together. Article quality is therefore closely tied to editing history, so we also mine features from it. After selecting and analyzing features from both article content and editing history, we identify a subset that can be evaluated by machine and use it as the object of our study.

We approach quality assessment from two perspectives, classification and ranking. For classification, we develop an SVM-based method that distinguishes high-quality article candidates from ordinary articles, so that high-quality articles can be picked out of Wikipedia's vast collection as candidates, overcoming the shortcomings of manual selection.
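The abstract does not give the SVM's feature set, kernel, or training setup. As a minimal sketch under stated assumptions, the Python below trains a linear SVM with Pegasos (stochastic subgradient descent on the L2-regularized hinge loss), written by hand instead of calling a library solver. The four per-article features (log of article length, number of references, number of distinct editors, number of revisions) and the eight-article training set are hypothetical; labels are +1 for a high-quality candidate and -1 for an ordinary article.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical features: [ln(length in chars), references, distinct editors, revisions]
RAW_FEATURES: List[Vector] = [
    [8.5, 40, 120, 900], [8.2, 35, 95, 700], [8.8, 55, 150, 1200], [8.0, 30, 80, 650],
    [6.0, 2, 5, 20], [5.5, 0, 3, 12], [6.5, 5, 10, 40], [6.2, 3, 6, 25],
]
LABELS: List[int] = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = high-quality candidate


def fit_scaler(rows: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-feature mean and standard deviation for z-score scaling."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return means, stds


def transform(row: Vector, means: Vector, stds: Vector) -> Vector:
    """Z-score one feature vector and append a constant 1.0 that acts as the bias."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)] + [1.0]


def train_linear_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Pegasos: stochastic subgradient descent on the regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization term
            if margin < 1.0:  # hinge loss active: push x_i toward its correct side
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w: Vector, x: Vector) -> int:
    """Sign of the decision function: +1 candidate, -1 ordinary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


MEANS, STDS = fit_scaler(RAW_FEATURES)
XS = [transform(r, MEANS, STDS) for r in RAW_FEATURES]
W = train_linear_svm(XS, LABELS)

if __name__ == "__main__":
    print("training predictions:", [predict(W, x) for x in XS])
    print("new article:", predict(W, transform([8.4, 45, 110, 800], MEANS, STDS)))
```

In practice a library solver such as scikit-learn's `SVC` would normally replace the hand-written trainer; the sketch only shows the shape of the pipeline: extract content and editing-history features, scale them, and learn a separating hyperplane.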
We also studied whether high-quality article candidates are promoted at the review stage. The machine's classification results for this task were unsatisfactory, so automatic classification cannot fully replace manual review. Next, we studied how to rank the quality of articles within a domain. We first modeled the bipartite graph formed by articles and their editors with the PageRank model. However, when rankings were computed directly from the converged PageRank scores, high-quality articles ranked relatively low, so PageRank alone gave a poor quality ranking. We therefore took a different approach: we used high-quality articles to measure the skill level of editors and then ranked articles on that basis. Building on this, a weighting algorithm we developed from editing-history features significantly improves ranking performance. Although this study takes Wikipedia as the representative online encyclopedia, it also offers a useful reference for assessing the quality of content on other online encyclopedias and crowdsourcing websites.
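The two ranking ideas can be contrasted in a small sketch. It is illustrative only: the editing records, the set of known high-quality articles, the edit-count edge weights, and the smoothed leave-one-out definition of editor skill are all hypothetical choices, and weighting by edit count merely stands in for the thesis's editing-history weighting algorithm, whose details the abstract does not give. `bipartite_pagerank` runs plain PageRank by power iteration on the undirected editor-article graph; `editor_skill_ranking` scores each article by the edit-weighted mean skill of its editors, where skill is the smoothed share of an editor's other articles that are known to be high quality.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edit = Tuple[str, str, int]  # (editor, article, number of edits), hypothetical records

EDITS: List[Edit] = [
    ("alice", "A1", 12), ("alice", "A2", 8), ("alice", "A3", 5),
    ("bob", "A1", 3), ("bob", "A4", 2),
    ("carol", "A2", 9), ("carol", "A3", 7), ("carol", "A5", 1),
    ("dave", "A4", 1), ("dave", "A5", 2), ("dave", "A6", 4),
    ("erin", "A6", 1),
]
HIGH_QUALITY: Set[str] = {"A1", "A2"}  # hypothetical already-recognized articles


def bipartite_pagerank(edits: List[Edit], damping: float = 0.85,
                       iters: int = 100) -> Dict[str, float]:
    """Plain PageRank on the undirected editor-article graph (edge weight = edits)."""
    nbrs: Dict[str, Dict[str, int]] = defaultdict(dict)
    for editor, article, n in edits:
        u, v = "E:" + editor, "A:" + article
        nbrs[u][v] = nbrs[u].get(v, 0) + n
        nbrs[v][u] = nbrs[v].get(u, 0) + n
    nodes = list(nbrs)
    size = len(nodes)
    out_w = {x: sum(nbrs[x].values()) for x in nodes}
    rank = {x: 1.0 / size for x in nodes}
    for _ in range(iters):  # power iteration; every node has an edge, so no dangling mass
        new = {x: (1 - damping) / size for x in nodes}
        for x in nodes:
            share = damping * rank[x] / out_w[x]
            for y, w in nbrs[x].items():
                new[y] += share * w
        rank = new
    return {x[2:]: r for x, r in rank.items() if x.startswith("A:")}


def editor_skill_ranking(edits: List[Edit], high_quality: Set[str],
                         prior: float = 1.0) -> Dict[str, float]:
    """Edit-weighted mean editor skill per article, with leave-one-out skill."""
    by_editor: Dict[str, Dict[str, int]] = defaultdict(dict)
    by_article: Dict[str, Dict[str, int]] = defaultdict(dict)
    for editor, article, n in edits:
        by_editor[editor][article] = by_editor[editor].get(article, 0) + n
        by_article[article][editor] = by_article[article].get(editor, 0) + n
    base = len(high_quality) / len(by_article)  # overall high-quality rate, used as prior
    scores: Dict[str, float] = {}
    for article, contributors in by_article.items():
        weighted, total = 0.0, 0
        for editor, n in contributors.items():
            others = [a for a in by_editor[editor] if a != article]  # exclude own label
            hits = sum(1 for a in others if a in high_quality)
            skill = (hits + prior * base) / (len(others) + prior)
            weighted += n * skill
            total += n
        scores[article] = weighted / total
    return scores


if __name__ == "__main__":
    pr = bipartite_pagerank(EDITS)
    sk = editor_skill_ranking(EDITS, HIGH_QUALITY)
    print("PageRank order:    ", sorted(pr, key=pr.get, reverse=True))
    print("Editor-skill order:", sorted(sk, key=sk.get, reverse=True))
```

Leaving the scored article out of its editors' skill estimates keeps an article's own label from inflating its score; without that step, every known high-quality article would trivially rank near the top.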
【Degree-granting institution】: National University of Defense Technology
【Degree level】: Master's
【Year of award】: 2014
【Classification number】: TP393.092; TP391.1

Article No.: 1629332

Link to this article: http://sikaile.net/guanlilunwen/ydhl/1629332.html
