Research on Key Issues of Deep Web Semantic Annotation Based on Ontology Learning
Topic: Deep Web; Source: Soochow University, master's thesis, 2012
[Abstract]: With the rapid growth of the Internet and the maturing of Web technologies, the Deep Web has become an important source of information. To let users obtain the Deep Web resources they need quickly, accurately, and conveniently, Deep Web information integration has become a hot research topic in this field. Semantic annotation of Deep Web query results is an important stage in a Deep Web information integration system, and accurate extraction of Deep Web query interface schemas is in turn the basis for semantic annotation. This thesis therefore studies both query interface schema extraction and semantic annotation in depth, introduces ontologies into the annotation process, and on this basis designs and builds a prototype search engine for the Deep Web. The main work of the thesis is as follows:
(1) It introduces the framework of Deep Web information integration systems and the state of research on Deep Web semantic annotation in China and abroad, and analyzes the shortcomings of traditional semantic annotation methods. It also briefly introduces the concept and role of ontologies, together with the construction principles and learning method of the Deep Web domain ontology used in the thesis.
(2) It proposes a Deep Web query interface schema extraction method based on a hierarchical model, addressing the fact that existing extraction methods ignore the internal structure and semantic relationships of the query interface. The method first mines the page layout features of the interface elements and extracts an interface schema tree using an extended hierarchical clustering method. It then uses the positional and semantic relationships between controls and labels to match a semantic description label to each node of the schema tree.
(3) It proposes an ontology-based Deep Web semantic annotation method, addressing the limited annotation capability and inconsistent annotation results of traditional methods. The data units are first aligned and grouped, and the groups are then annotated by a combination of several basic annotators. Next, a mapping between the result schema and the ontology is established to obtain complete and uniform annotation results. Finally, different data sources in the same domain are cross-annotated for validation.
(4) It designs and implements a prototype Deep Web search engine for the book e-commerce domain.
The proposed methods are evaluated experimentally on the datasets provided by UIUC, and analysis of the results indicates that they are feasible and effective.
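The layout-driven schema extraction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's extended clustering method: it is plain greedy agglomerative clustering on element positions, and the `Node` class, the coordinates, and the midpoint used as a merged group's position are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import hypot


@dataclass
class Node:
    """A schema-tree node: a leaf form element or a group of nodes."""
    label: str
    x: float  # assumed horizontal position on the rendered page
    y: float  # assumed vertical position on the rendered page
    children: list = field(default_factory=list)


def distance(a, b):
    """Euclidean distance between the positions of two nodes."""
    return hypot(a.x - b.x, a.y - b.y)


def build_schema_tree(elements):
    """Repeatedly merge the two closest clusters until one root remains."""
    clusters = list(elements)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        # Simplification: a group sits at the midpoint of its two children.
        clusters.append(Node("group", (a.x + b.x) / 2, (a.y + b.y) / 2, [a, b]))
    return clusters[0]


def leaves(node):
    """Collect the labels of all leaf elements under a node."""
    if not node.children:
        return [node.label]
    return [label for child in node.children for label in leaves(child)]


# Hypothetical book-search form: two text fields and a price range.
root = build_schema_tree([
    Node("Title", 0, 0),
    Node("Author", 0, 30),
    Node("Price from", 0, 200),
    Node("Price to", 80, 200),
])
for child in root.children:
    print(leaves(child))
```

In this toy layout the two price fields end up in one subtree and the title and author fields in another, which is the kind of internal structure a flat list of form fields would lose.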
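The combined annotation and ontology mapping steps can be sketched in the same spirit. The sketch assumes the data units have already been aligned into groups (one list of values per column). The three annotators, the majority vote, and the toy `ONTOLOGY` dictionary are hypothetical stand-ins, not the annotators or the ontology used in the thesis.

```python
import re
from collections import Counter

# Hypothetical toy ontology for the book domain: concept -> known surface labels.
ONTOLOGY = {
    "Title": {"title", "book name"},
    "Author": {"author", "by", "writer"},
    "Price": {"price", "our price", "cost"},
}


def prefix_annotator(values):
    """Use a shared in-text prefix such as 'by: A. Chen' as the label."""
    if not values or not all(":" in v for v in values):
        return None
    prefixes = {v.split(":", 1)[0].strip().lower() for v in values}
    return prefixes.pop() if len(prefixes) == 1 else None


def format_annotator(values):
    """Recognise a value format; this sketch only knows currency amounts."""
    pattern = r"[$¥€]\s?\d+(\.\d{2})?"
    if values and all(re.fullmatch(pattern, v.strip()) for v in values):
        return "price"
    return None


def query_annotator(values, query_terms):
    """If every value contains a submitted query term, reuse that field's name."""
    for field_name, term in query_terms.items():
        if values and all(term.lower() in v.lower() for v in values):
            return field_name
    return None


def to_concept(label):
    """Map a raw label onto an ontology concept so labels are uniform across sources."""
    for concept, names in ONTOLOGY.items():
        if label in names:
            return concept
    return None


def annotate(groups, query_terms):
    """Run every annotator on every group and keep the most voted concept."""
    result = []
    for values in groups:
        labels = (
            prefix_annotator(values),
            format_annotator(values),
            query_annotator(values, query_terms),
        )
        votes = Counter(to_concept(label) for label in labels if label)
        votes.pop(None, None)  # discard labels the ontology does not know
        result.append(votes.most_common(1)[0][0] if votes else None)
    return result


# Hypothetical aligned result columns for the query title = "deep web".
groups = [
    ["Deep Web Mining", "Mining the Deep Web"],
    ["by: A. Chen", "by: B. Li"],
    ["$35.00", "$12.50"],
    ["In stock", "2 left"],
]
print(annotate(groups, {"title": "deep web"}))
```

Routing every raw label through `to_concept` is what makes the output uniform: a source that writes "by" and one that writes "writer" both yield `Author`. A group that no annotator recognises is left as `None` instead of being guessed.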
[Degree-granting institution]: Soochow University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1; TP393.4
Article ID: 1877162
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1877162.html