A Topic-Adaptive Academic Conference Search System
Published: 2018-06-22 02:56
Topics: academic conference search + support vector machine; Source: Huazhong University of Science and Technology, 2013 master's thesis
[Abstract]: According to incomplete statistics, more than 10,000 international academic conferences are held around the world every year, with attendance in the millions, and academic exchange is becoming ever more frequent. Academic conferences are also highly varied: some are one-off events, while others belong to a recurring series. Existing academic search engines and digital libraries focus mainly on literature retrieval, so they struggle to meet the pressing demand from a large number of researchers for retrieving conference information.
Acrost is a topic-adaptive academic conference search system oriented toward CFPs (Calls for Papers). It supports topic-based retrieval and, in addition to conference search, offers a submission recommendation service. To obtain sufficient source data, the system uses two approaches: (1) a method built on general-purpose search engines, which saves considerable resource overhead and uses a support vector machine classifier to filter out noise; and (2) a focused crawler based on the vector space model, which crawls academic conference web pages in a targeted way. Once the raw conference pages have been collected, regular expressions are applied to semi-structured pages and conditional random fields to unstructured pages for information extraction and entity recognition, yielding the conference metadata. Lucene is then used to build an inverted index over the metadata. A topic discovery method based on an incremental hierarchical clustering algorithm is also proposed; it parses PDF documents uploaded by users and automatically identifies the topic area they belong to. In addition, the system includes a conference evaluation model based on an academic impact factor, whose indicators include the average number of citations per paper and the paper acceptance rate.
Experimental results show that the recall, precision, and F-measure of Acrost's conference retrieval service are 84.8%, 90.5%, and 87.6%, respectively, and those of the submission recommendation service are 60.8%, 68.7%, and 64.5%, respectively. Acrost also responds quickly to user requests. These results indicate that Acrost performs well in both relevance judgment and running speed.
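The reported figures can be cross-checked with a short sketch. It assumes that the F-measure in the abstract is the balanced F1 score (the harmonic mean of precision and recall) and that the abstract's "accuracy rate" denotes precision; neither is stated explicitly in the source. The function name `f_measure` and the dictionary `reported` are illustrative, not taken from the thesis.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "conference retrieval": (0.905, 0.848),
    "submission recommendation": (0.687, 0.608),
}

for service, (p, r) in reported.items():
    print(f"{service}: F1 = {f_measure(p, r):.1%}")
```

Under these assumptions the computed values round to 87.6% and 64.5%, which agrees with the F-measures quoted in the abstract.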
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3