Research on Quality Evaluation of Online Encyclopedia Entries

Published: 2018-03-18 11:11

Topic: Wikipedia; Focus: quality features; Source: 2014 master's thesis, National University of Defense Technology; Type: degree thesis


[Abstract]: In the twenty-first century, the spread of the Internet has brought rapid growth in the amount of available information. People benefit from easy access to vast amounts of information, yet find it hard to judge its quality. Wikipedia is a typical case. Every day a large number of its entries are created, modified, and edited, but quality assessment cannot keep pace with these changes. Wikipedia relies on manual quality assessment, which severely limits its throughput, so for a considerable share of entries users cannot tell whether the quality is good or poor. Wikipedia has also introduced user ratings, but these are affected by raters' subjectivity and do not work well. To address these problems of efficiency and objectivity, this thesis studies the quality evaluation of Wikipedia entries and develops automatic evaluation methods.

We first study the quality-related features of Wikipedia entries, laying the groundwork for the evaluation that follows. On the one hand, high-quality Wikipedia entries should have the general characteristics of traditional encyclopedia entries, such as comprehensive content and accurate information. On the other hand, Wikipedia entries are written in a fundamentally different way from traditional encyclopedias. Wikipedia uses a crowdsourced model: its entries are written not by a small board of expert editors but jointly by many contributors. Entry quality is therefore closely tied to editing history, so we also mine features from the editing history. After selecting and analyzing features from entry content and editing history, we identified a subset that machines can evaluate and made it the object of our study.

We evaluate entry quality from two perspectives: classification and ranking. For classification, we developed an SVM-based method that distinguishes high-quality entry candidates from ordinary entries.
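The classification step can be illustrated with a minimal sketch. The abstract does not list the thesis's features, kernel, or training setup, so everything here is an assumption: four hypothetical features (word count, reference count, edit count, distinct editors), a linear SVM trained with the Pegasos sub-gradient method in plain Python rather than an SVM library, and made-up toy data. It shows the technique, not the author's implementation.

```python
import math
import random
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Entry:
    # Hypothetical features: two from the article text, two from its edit history.
    words: int
    references: int
    edits: int
    editors: int


def raw_features(e):
    # log1p compresses heavy-tailed counts such as total edits.
    return [math.log1p(v) for v in (e.words, e.references, e.edits, e.editors)]


class Standardizer:
    """Z-scores each feature with training statistics and appends a bias term."""

    def __init__(self, rows):
        cols = list(zip(*rows))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero

    def __call__(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.mu, self.sd)] + [1.0]


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def train_linear_svm(xs, ys, lam=0.1, epochs=500, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    ys are +1 (high-quality candidate) or -1 (ordinary entry). The bias is the
    last feature, so it is regularised too (a simplification).
    """
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            inside_margin = ys[i] * dot(w, xs[i]) < 1  # uses w before the update
            w = [(1 - eta * lam) * wj for wj in w]  # regulariser step
            if inside_margin:  # hinge-loss step
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Made-up toy entries, only to exercise the code (not data from the thesis).
train = [
    (Entry(9000, 120, 1500, 300), 1),
    (Entry(7000, 90, 900, 200), 1),
    (Entry(300, 1, 12, 3), -1),
    (Entry(800, 4, 40, 10), -1),
]
scale = Standardizer([raw_features(e) for e, _ in train])
xs = [scale(raw_features(e)) for e, _ in train]
ys = [y for _, y in train]
w = train_linear_svm(xs, ys)
print(predict(w, scale(raw_features(Entry(8000, 100, 1200, 250)))))  # prints 1
```

In practice a library SVM (for example scikit-learn's `SVC`) would replace the hand-written trainer; the pure-Python version keeps the hinge-loss update visible.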
This allows high-quality entries to be selected from Wikipedia's vast collection as candidates, overcoming the limitations of manual selection. We also studied whether high-quality entry candidates are promoted during the review stage and found that machine classification performs unsatisfactorily there, so it cannot fully replace manual review. Next, we studied methods for ranking the quality of entries within a domain. We first modeled the bipartite network of entries and editors with PageRank. When entries were ranked directly by their converged PageRank scores, high-quality entries ranked relatively low, so the quality ranking was poor. We therefore took a different approach: we use high-quality entries to measure each editor's skill, and rank entries on that basis. Building on this, our weighting algorithm based on editing-history features significantly improves the ranking. Although this thesis takes Wikipedia as the representative online encyclopedia, the work also offers a useful reference for evaluating content quality on other online encyclopedias and crowdsourcing websites.
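The editor-based ranking idea can be sketched as follows. The abstract does not say how editor skill is computed or which editing-history features serve as weights, so this is a hypothetical reconstruction: an editor's skill is the smoothed share of their contribution (bytes added, an assumed feature) that went into known high-quality entries, and each remaining entry is scored by the contribution-weighted mean of its editors' skills. The PageRank baseline is not shown.

```python
from collections import defaultdict


def editor_skill(edits, high_quality, prior=1.0):
    """Smoothed share of each editor's contribution that went into high-quality entries.

    edits: list of (editor, article, weight) with weight > 0, e.g. bytes added
    (a hypothetical edit-history feature). `prior` adds pseudo-counts so that
    editors with little history are pulled toward 0.5.
    """
    in_good = defaultdict(float)
    total = defaultdict(float)
    for editor, article, weight in edits:
        total[editor] += weight
        if article in high_quality:
            in_good[editor] += weight
    return {e: (in_good[e] + prior) / (total[e] + 2 * prior) for e in total}


def rank_articles(edits, high_quality, weighted=True):
    """Rank the remaining articles by the skill of the editors who wrote them.

    weighted=True averages skills by each edit's contribution weight;
    weighted=False gives every edit record the same weight.
    """
    skill = editor_skill(edits, high_quality)
    num = defaultdict(float)
    den = defaultdict(float)
    for editor, article, weight in edits:
        if article in high_quality:
            continue  # the known high-quality entries are the yardstick, not ranked
        w = weight if weighted else 1.0
        num[article] += w * skill[editor]
        den[article] += w
    scores = {a: num[a] / den[a] for a in num}
    return sorted(scores, key=lambda a: (-scores[a], a))


# Made-up toy edit log: (editor, article, bytes added). FA1/FA2 are high-quality.
edits = [
    ("alice", "FA1", 900), ("alice", "A", 800),
    ("bob", "FA1", 100), ("bob", "B", 50), ("bob", "A", 10),
    ("carol", "B", 700), ("carol", "C", 600),
    ("dave", "C", 20), ("dave", "FA2", 500),
]
high_quality = {"FA1", "FA2"}
print(rank_articles(edits, high_quality))                  # ['A', 'B', 'C']
print(rank_articles(edits, high_quality, weighted=False))  # ['A', 'C', 'B']
```

In this toy log the weighting changes the order of B and C: unweighted, C ranks above B because dave has the highest skill, but weighted by bytes, dave's 20-byte edit barely counts and C falls below B.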
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year awarded]: 2014
[Classification numbers]: TP393.092; TP391.1

Article ID: 1629332

Link: http://sikaile.net/guanlilunwen/ydhl/1629332.html

