Research on Query Processing Techniques for Missing Data

Published: 2018-05-14 18:59

  Topics: missing data + incomplete/uncertain data; Source: Zhejiang University, 2017 doctoral dissertation


[Abstract]: With the rapid development of computing, the Internet, communication, and positioning technologies, data volumes are growing explosively. Such data is massive, high-dimensional, multi-source, heterogeneous, and uncertain/incomplete, which makes it hard for current techniques to support the complex and diverse intelligent processing that people need. Missing data is uncertain/incomplete by nature and is found across many areas of scientific research and everyday life, such as communications, transportation, and economics. Querying is a fundamental problem in computer science and arises in almost every computer application. How to query data efficiently and intelligently, extract valuable information, and serve society has become a major challenge for information technology. However, most existing query processing techniques consider only complete data. Because the properties of missing data differ substantially from those of complete data, queries over missing data cannot (directly) make effective use of traditional query processing methods for complete data. This dissertation therefore studies query processing techniques for missing data in depth. The main contributions are as follows.

1) Incomplete data queries. Similarity queries have broad application prospects in geographic information systems, multimedia retrieval, and related areas. Most existing similarity query methods target complete data only, but in many real applications the objects in a data set may have missing information (for example, some attribute values of an object are missing). The proximity between such objects cannot be measured with traditional similarity computations, so a new proximity measure suited to objects with missing data is needed. This dissertation develops two effective indexing methods for incomplete data and proposes exact and approximate query algorithms to support k-nearest-neighbor queries over incomplete data. In addition, preference queries are valuable in decision support, personalized services, and recommender systems. For example, in a movie recommender system, given a movie rating data set with missing information, the system can use a top-k dominating query to select the k best-rated movies and recommend them to movie fans. This dissertation proposes skyband pruning, upper-bound pruning, bitmap indexing, and compression techniques for top-k dominating queries over incomplete data, and designs several effective query algorithms for this problem. It also analyzes how bitmap index compression affects query efficiency and index space cost, and further examines the trade-off between the two.

2) Uncertain graph queries. Although many researchers have worked on query processing for uncertain graphs and have produced a large body of results, a gap remains between those results and the complex, diverse query needs that keep emerging, and further study is required. For example, the reverse k-nearest-neighbor query over uncertain graphs has significant application prospects in scenarios such as optimized resource allocation and essential protein identification, yet it had not been studied before. This dissertation therefore investigates the reverse k-nearest-neighbor query over uncertain graphs. It introduces graph-size pruning, equivalent-graph pruning, and probability-bound pruning to shrink the graph, reduce the number of possible graphs, and reduce the number of candidate data nodes, respectively. Besides an exact algorithm for the query, it presents a novel sampling algorithm that uses graph-size pruning and adaptive stratified sampling. The sampling method is proved to be unbiased, with variance no worse than that of random sampling. Compared with existing algorithms, the proposed exact algorithm is on average five orders of magnitude more efficient, and the sampling algorithm is at least three orders of magnitude more efficient than the exact algorithm.

3) Probabilistic query quality optimization. In many real applications data is uncertain, so queries over such data (that is, probabilistic queries) often return a set of uncertain results. In other words, the results of a probabilistic query contain a large amount of noise, which lowers query quality. This dissertation proposes a novel framework for optimizing probabilistic query quality: within a given budget, it selects the set of uncertain data objects that most affect query quality and cleans them, so as to maximize query quality. It uses a newly developed ASI structure to speed up the computation of query quality and, building on two effective pruning strategies, proposes a branch-and-bound algorithm, a greedy algorithm, and a sampling algorithm for the object selection problem. The ASI structure is shown to improve performance by two to three orders of magnitude over quality computation without it.

4) A missing data query processing system. Based on the results above, this dissertation develops a restaurant recommender system built on preference queries over incomplete data. The system takes the incomplete rating information of restaurants into account and integrates skyline queries and top-k dominating queries over incomplete data into the PostgreSQL database. It supports three main functional modules: a friendly and convenient query submission module, a flexible and practical result explanation module, and an incremental module for instant interaction with the data set.
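To make the top-k dominating query from the movie example concrete, here is a minimal sketch. It is not the dissertation's algorithm: it uses none of the skyband pruning, upper-bound pruning, or bitmap indexing mentioned in the abstract, and simply compares every pair of objects. It assumes a dominance definition that is commonly used for incomplete data, where two objects are compared only on the dimensions observed in both. The abstract does not state its own definition, so this is an assumption. The names `dominates` and `top_k_dominating`, and the sample ratings, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rating = Optional[float]  # None marks a missing value


def dominates(a: Sequence[Rating], b: Sequence[Rating]) -> bool:
    """Assumed definition: a dominates b if, on every dimension observed in
    both, a is at least as good (higher is better), and strictly better on
    at least one such dimension."""
    strictly_better = False
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip dimensions missing in either object
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def top_k_dominating(
    objects: Dict[str, Sequence[Rating]], k: int
) -> List[Tuple[str, int]]:
    """Score each object by how many others it dominates; return the k best.
    Naive O(n^2 * d) baseline, ties broken by name."""
    scores = {
        name: sum(
            dominates(vec, other_vec)
            for other, other_vec in objects.items()
            if other != name
        )
        for name, vec in objects.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


# Hypothetical ratings from three viewers; None means "did not rate".
movies = {
    "A": [5, None, 4],
    "B": [4, 3, None],
    "C": [None, 2, 3],
    "D": [3, 1, 2],
}
print(top_k_dominating(movies, 2))  # [('A', 3), ('B', 2)]
```

Under this assumed definition dominance is not necessarily transitive, which is one reason pruning and indexing for incomplete data need more care than in the complete-data case.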

[Degree-granting institution]: Zhejiang University
[Degree level]: Doctorate
[Year conferred]: 2017
[Classification number]: TP311.13



Article link: http://sikaile.net/shoufeilunwen/xxkjbs/1889083.html

