異質(zhì)網(wǎng)絡(luò)上的自相似性連接算法研究與實(shí)現(xiàn)
發(fā)布時(shí)間:2018-05-30 22:19
本文選題:相似性連接 + 異質(zhì)網(wǎng)絡(luò)。 參考:《復(fù)旦大學(xué)》2013年碩士論文
【摘要】:相似性連接是很多研究問題的基礎(chǔ),不少實(shí)際問題也都可以歸結(jié)為相似性連接。盡管近年來,越來越多的學(xué)者將注意力集中到網(wǎng)絡(luò)數(shù)據(jù)和圖數(shù)據(jù)上,如鏈接預(yù)測、社區(qū)檢測和在線廣告等,目前對于異質(zhì)網(wǎng)絡(luò)中的相似性連接卻鮮有研究。特別地,已有工作都忽略了異質(zhì)網(wǎng)絡(luò)中節(jié)點(diǎn)和關(guān)系的類型以及由此帶來的不同路徑背后隱藏的不同語義信息,所以不能直接用于解決異質(zhì)網(wǎng)絡(luò)的相似性連接問題。 針對異質(zhì)網(wǎng)絡(luò)的異質(zhì)性特點(diǎn),本文提出了一套基于路徑的異質(zhì)網(wǎng)絡(luò)自相似性連接解決方案。給定一條連接路徑,該方案返回目標(biāo)類型中前k個(gè)最相似的節(jié)點(diǎn)對象對。在這一框架下,我們分別研究了精確自相似性連接和近似自相似性連接兩個(gè)具體的問題。在精確自相似性連接中,我們對空間數(shù)據(jù)庫最近點(diǎn)對查找算法PSR進(jìn)行改進(jìn),提出了一種基于R*樹的算法ePS-join。雖然改進(jìn)有效,但由于索引結(jié)構(gòu)R*樹本身的局限,該算法并不能很好地處理高維時(shí)的情況。于是,為了適應(yīng)異質(zhì)網(wǎng)絡(luò)的另兩個(gè)特點(diǎn),即高維性和海量性,我們將精確問題轉(zhuǎn)化成近似問題,以在準(zhǔn)確率和效率之間做下折中。在近似自相似性連接中,我們引入LSH為數(shù)據(jù)集建立索引,提出了基于NBLSH的aPS-join,并利用相似性度量的性質(zhì)設(shè)計(jì)了一種剪枝優(yōu)化策略來進(jìn)一步減小候選集的大小,改進(jìn)后的算法稱為基于BPLSH的aPS-join。 相比同質(zhì)網(wǎng)絡(luò)中的相似性連接算法LS-join,aPS-join能夠結(jié)合異質(zhì)網(wǎng)絡(luò)中的語義信息。真實(shí)數(shù)據(jù)集上的實(shí)驗(yàn)表明,我們的算法不僅具有良好的效率,而且能保證很高的準(zhǔn)確性。
[Abstract]:Similarity join is the foundation of many research problems, and many practical problems can be attributed to similarity join. Although in recent years, more and more scholars focus on network data and graph data, such as link prediction, community detection and online advertising, there is little research on similarity connection in heterogeneous networks. In particular, the existing work has neglected the types of nodes and relationships in heterogeneous networks and the different semantic information hidden behind different paths, so they can not be directly used to solve the problem of similarity connection in heterogeneous networks. According to the heterogeneity characteristics of heterogeneous networks, this paper proposes a path based solution for self-similarity connection of heterogeneous networks. Given a connection path, the scheme returns the first k most similar node object pairs in the target type. In this framework, we study the exact self-similarity connection and the approximate self-similarity connection respectively. In the accurate self-similarity connection, we improve the nearest point search algorithm (PSR) of spatial database, and propose an algorithm based on R* tree (ePS-jointing). Although the improvement is effective, due to the limitation of the index structure R* tree itself, the algorithm can not deal with the high dimensional time very well. Therefore, in order to adapt to the other two characteristics of heterogeneous networks, that is, high-dimensional and magnanimity, we transform the exact problem into an approximate one, so as to make a compromise between accuracy and efficiency. In approximate self-similarity join, we introduce LSH to index data sets, propose aPS-join based on NBLSH, and design a pruning optimization strategy to further reduce the size of candidate set by using the property of similarity measure. The improved algorithm is called aPS-join based on BPLSH. Compared with the similarity join algorithm in homogeneous networks, LS-joinaPS-join can combine semantic information in heterogeneous networks. Experiments on real data sets show that our algorithm not only has good efficiency, but also ensures high accuracy.
【學(xué)位授予單位】:復(fù)旦大學(xué)
【學(xué)位級別】:碩士
【學(xué)位授予年份】:2013
【分類號】:TP311.13
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 郝建柏;陳賢富;黃雙福;楊俊;;一種基于模糊近鄰標(biāo)簽傳遞的半監(jiān)督分類算法[J];微電子學(xué)與計(jì)算機(jī);2010年02期
2 余海洋;林琛;陳珂;江弋;鄒權(quán);;Pass-Join-K:多分段匹配的相似性連接算法[J];計(jì)算機(jī)科學(xué)與探索;2013年10期
3 劉雪莉;王宏志;李建中;高宏;;實(shí)體數(shù)據(jù)庫中多相似連接順序選擇策略[J];計(jì)算機(jī)科學(xué)與探索;2012年10期
4 ;[J];;年期
5 ;[J];;年期
6 ;[J];;年期
7 ;[J];;年期
8 ;[J];;年期
9 ;[J];;年期
10 ;[J];;年期
相關(guān)碩士學(xué)位論文 前3條
1 劉雪莉;基于實(shí)體的相似性連接操作的研究[D];哈爾濱工業(yè)大學(xué);2012年
2 周健雯;異質(zhì)網(wǎng)絡(luò)上的自相似性連接算法研究與實(shí)現(xiàn)[D];復(fù)旦大學(xué);2013年
3 徐媛媛;基于MapReduce的相似性連接研究[D];寧波大學(xué);2014年
,本文編號:1957056
本文鏈接:http://sikaile.net/wenyilunwen/guanggaoshejilunwen/1957056.html
最近更新
教材專著