Analysis of the Results of the "Chinese Web Page Automatic Classification Competition"
Posted: 2018-06-29 11:15
Topics: computer applications + Chinese information processing; Source: Journal of Chinese Information Processing (《中文信息学报》), 2003, Issue 5
[Abstract]: At the recent "National Symposium on Search Engines and Web Information Mining", a "Chinese Web Page Automatic Classification Competition" was held, with 10 teams from across the country taking part. After describing the rules and procedure of the competition, this paper analyzes its results in detail. The analysis gives a concrete picture of the current state of Chinese web page automatic classification technology: the existing classifiers show no clear performance gap among themselves, and classifying Chinese web pages is considerably harder than classifying ordinary text. The paper also proposes a standard sample set for Chinese web page classification, in the hope that, through continued refinement, it will eventually serve as a baseline corpus for research on Chinese web page classification.
[Author affiliation]: Department of Computer Science and Technology, Peking University
[Funding]: Supported by the National 973 Key Basic Research Program (G1999032706)
[CLC classification number]: TP393.09
Article ID: 2081934
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2081934.html