Research on Query Planning Strategies for the Deep Web

Published: 2018-01-21 08:18

Keywords: Web database; query capability; executable query plan
Source: Harbin Engineering University, 2012 master's thesis
Document type: degree thesis


[Abstract]: Online data sources, also known as Web databases, are increasingly common. They hide their data behind query forms and together form the so-called deep Web. In the surface Web, HTML pages are static and the data is stored in the documents themselves; in the deep Web, the data is stored in a back-end database, and a dynamic HTML page is generated only after a user submits a query through the form. According to statistics from BrightPlanet, the deep Web contains 500 times as much information as the surface Web and continues to grow rapidly each year, so studying it is both necessary and significant. Web databases are large-scale, autonomous, heterogeneous, and dynamic, and different data sources have different, limited query capabilities. These properties make query processing in deep Web data integration more challenging than query processing in a traditional distributed environment. To address the autonomy and heterogeneity of data sources, this thesis proposes a method for describing data sources.

To measure the size of the attribute vocabulary in each domain, the thesis conducted a survey: using search engines (e.g., Google and Bing) and Web directories (e.g., invisibleweb.com), it collected 200 data sources in four domains (movies, book sales, car sales, and music), 50 per domain. The results show that as the number of data sources grows, their total vocabulary converges to a relatively small size. Motivated by this finding, an inverted index is built for each attribute term. The thesis also proposes a modular approach to generating executable query plans for a target query, in which five modules work together: query expansion, preprocessing, query rewriting, relevant data source discovery, and plan generation. It further designs an algorithm that uses the inverted index to generate logical plans efficiently, and an algorithm that finds an executable order for a logical plan.

Because data sources have access restrictions, sources that do not appear in the logical plan may supply useful attribute bindings and thereby help produce an executable query plan. The thesis identifies the conditions under which such off-query accesses are unnecessary, so that an executable plan can be generated from the sources in the logical plan alone. It also identifies the conditions under which they are necessary, and for those cases proposes an algorithm that finds the data sources relevant to the logical plan. Finally, experiments show that the proposed algorithms have good efficiency, accuracy, and scalability.
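
The abstract does not give the thesis's algorithms, so the following is only a minimal sketch of the two ideas it names: an inverted index from attribute terms to data sources, and an executable ordering of a logical plan under access restrictions. The source names, the (required inputs, outputs) description format, and the greedy ordering rule are all assumptions made for illustration, not the author's method.

```python
from collections import defaultdict

# Hypothetical source descriptions (not from the thesis):
# name -> (attributes that must be bound before the call, attributes returned)
SOURCES = {
    "movie_by_title": ({"title"}, {"title", "director", "year"}),
    "films_by_director": ({"director"}, {"title", "director"}),
    "reviews_by_title": ({"title"}, {"title", "rating"}),
}


def build_inverted_index(sources):
    """Map each attribute term to the set of sources that mention it."""
    index = defaultdict(set)
    for name, (inputs, outputs) in sources.items():
        for attr in inputs | outputs:
            index[attr].add(name)
    return index


def candidate_sources(index, query_attrs):
    """Return every source that mentions at least one query attribute."""
    found = set()
    for attr in query_attrs:
        found |= index.get(attr, set())
    return found


def executable_order(sources, plan, bound):
    """Greedily order the sources in `plan` so that each one's required
    inputs are bound when it is called. Returns None if the plan's own
    sources cannot be ordered this way."""
    bound = set(bound)
    remaining = list(plan)
    order = []
    while remaining:
        ready = [s for s in remaining if sources[s][0] <= bound]
        if not ready:
            return None  # some required input can never be bound
        chosen = ready[0]
        order.append(chosen)
        bound |= sources[chosen][1]  # outputs become available bindings
        remaining.remove(chosen)
    return order


index = build_inverted_index(SOURCES)
print(candidate_sources(index, {"rating"}))
print(executable_order(SOURCES, ["reviews_by_title", "films_by_director"], {"director"}))
print(executable_order(SOURCES, ["reviews_by_title"], {"director"}))
```

In this sketch the last call returns None: with only `director` bound, `reviews_by_title` cannot be called on its own. That is the kind of situation in which a source outside the logical plan (here the hypothetical `films_by_director`) would have to be accessed to supply the missing `title` binding.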
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP311.13
