大規(guī)模網(wǎng)絡(luò)中抽樣策略與應(yīng)用研究
發(fā)布時間:2018-06-21 06:03
本文選題:復(fù)雜網(wǎng)絡(luò) + 網(wǎng)絡(luò)抽樣; 參考:《電子科技大學(xué)》2015年碩士論文
【摘要】:在復(fù)雜網(wǎng)絡(luò)科學(xué)領(lǐng)域,人們通過對復(fù)雜網(wǎng)絡(luò)的靜態(tài)性質(zhì)和動態(tài)特征的研究,提出了很多刻畫復(fù)雜網(wǎng)絡(luò)的指標(biāo)和分析計(jì)算方法。然而,隨著網(wǎng)絡(luò)規(guī)模的不斷增長,這其中有很多算法由于時空復(fù)雜度過高而不再適用于大規(guī)模網(wǎng)絡(luò)上的計(jì)算。解決這個問題的方法之一便是利用網(wǎng)絡(luò)抽樣,在抽樣數(shù)據(jù)集上以較低的資源消耗獲得近似解。然而,現(xiàn)有的抽樣算法大部分是基于單階段抽樣框架,往往難以精確刻畫具有多重分布特性的復(fù)雜網(wǎng)絡(luò)樣本,無法滿足具體領(lǐng)域研究的需要。本文在國內(nèi)外相關(guān)研究基礎(chǔ)上,針對大規(guī)模網(wǎng)絡(luò)中的最短路徑計(jì)算和大規(guī)模社交網(wǎng)絡(luò)中的二元關(guān)系預(yù)測提出了網(wǎng)絡(luò)抽樣策略并設(shè)計(jì)了相應(yīng)的快速算法。最短路徑是很多網(wǎng)絡(luò)分析算法的基礎(chǔ),然而經(jīng)典算法由于復(fù)雜度過高,往往無法處理大規(guī)模的網(wǎng)絡(luò)。我們在大規(guī)模網(wǎng)絡(luò)的最短路徑計(jì)算中引入了中心點(diǎn)抽樣,提出了有效計(jì)算最短路徑的近似算法。本算法利用復(fù)雜網(wǎng)絡(luò)中普遍存在的等級結(jié)構(gòu),抽取網(wǎng)絡(luò)中的若干中心節(jié)點(diǎn)并與周圍的鄰居節(jié)點(diǎn)進(jìn)行凝聚形成超級節(jié)點(diǎn),構(gòu)成新的上層網(wǎng)絡(luò),并在上層網(wǎng)絡(luò)上重復(fù)中心節(jié)點(diǎn)與鄰居節(jié)點(diǎn)的凝聚過程,直到最上層網(wǎng)絡(luò)中的節(jié)點(diǎn)個數(shù)低于某一閾值時結(jié)束。最后利用Dijkstra算法精確地計(jì)算最上層網(wǎng)絡(luò)中節(jié)點(diǎn)間的最短路徑,并根據(jù)凝聚得到的等級網(wǎng)絡(luò)結(jié)構(gòu)來估計(jì)原始網(wǎng)絡(luò)中節(jié)點(diǎn)之間的最短距離。實(shí)驗(yàn)結(jié)果表明我們的算法在精度和效率上都明顯優(yōu)于當(dāng)前現(xiàn)有的算法。在大規(guī)模社交網(wǎng)絡(luò)中,用戶之間的二元關(guān)系對社交網(wǎng)絡(luò)分析有著很高價值。然而,隨著網(wǎng)絡(luò)規(guī)模的不斷增加,用戶之間的關(guān)系越來越復(fù)雜,也越來越隱蔽,如何高效準(zhǔn)確地推斷用戶之間的二元關(guān)系逐漸成為現(xiàn)在的研究熱點(diǎn)。在本文中,我們基于社會心理學(xué)理論提出了一種適合于大規(guī)模網(wǎng)絡(luò)二元關(guān)系預(yù)測的算法。我們設(shè)計(jì)了分段訓(xùn)練框架將訓(xùn)練集分為兩個子集,并在每個子集上分別訓(xùn)練SVM分類器。為了應(yīng)對大規(guī)模訓(xùn)練數(shù)據(jù)帶來的算法效率下降問題,我們引入了有效的抽樣策略,利用不到原網(wǎng)絡(luò)1%的數(shù)據(jù)樣本,構(gòu)建了具有高預(yù)測準(zhǔn)確率的模型。我們與現(xiàn)有的算法以及它們的集成算法(AdaBoost)進(jìn)行了比較,在公共標(biāo)準(zhǔn)數(shù)據(jù)集上的實(shí)驗(yàn)結(jié)果表明,我們的算法在大規(guī)模社交網(wǎng)絡(luò)上的預(yù)測準(zhǔn)確率更高。
[Abstract]:In the field of complex network science, through the study of static and dynamic characteristics of complex networks, many indexes and analytical methods are proposed to describe complex networks. However, with the increasing of network size, many of these algorithms are no longer suitable for large-scale networks due to their high space-time complexity. One of the methods to solve this problem is to use network sampling to obtain approximate solution on the sample data set with lower resource consumption. However, most of the existing sampling algorithms are based on the single-stage sampling frame, and it is often difficult to accurately depict complex network samples with multiple distribution characteristics, which can not meet the needs of the research in specific fields. On the basis of domestic and foreign research, this paper proposes a network sampling strategy and designs a fast algorithm for computing the shortest path in large-scale networks and predicting binary relationships in large-scale social networks. The shortest path is the basis of many network analysis algorithms, but the classical algorithms are often unable to deal with large-scale networks because of their high complexity. The center sampling is introduced into the calculation of the shortest path of the large-scale network, and an approximate algorithm for calculating the shortest path is proposed. Based on the hierarchical structure of complex network, this algorithm extracts some central nodes from the network and condenses with the neighboring nodes to form a super node to form a new upper layer network. The condensation process between central node and neighbor node is repeated on the upper layer network until the number of nodes in the upper layer network is lower than a certain threshold. Finally, Dijkstra algorithm is used to accurately calculate the shortest path between nodes in the upper layer of the network, and the shortest distance between nodes in the original network is estimated according to the hierarchical network structure. The experimental results show that our algorithm is superior to the existing algorithms in accuracy and efficiency. In large-scale social networks, the binary relationship between users is of great value to the analysis of social networks. However, with the increasing of network scale, the relationship between users is becoming more and more complex and covert. How to infer the binary relationship between users efficiently and accurately has gradually become a hot research topic. In this paper, based on the theory of social psychology, we propose an algorithm for predicting binary relationships in large-scale networks. We design a segmented training framework that divides the training set into two subsets and trains SVM classifiers on each subset. In order to deal with the problem of reduced algorithm efficiency brought about by large-scale training data, we introduce an effective sampling strategy and construct a model with high prediction accuracy by using less than 1% of the data samples from the original network. Compared with the existing algorithms and their ensemble algorithms AdaBoost. the experimental results on common standard datasets show that our algorithm has higher prediction accuracy on large-scale social networks.
【學(xué)位授予單位】:電子科技大學(xué)
【學(xué)位級別】:碩士
【學(xué)位授予年份】:2015
【分類號】:O157.5
【參考文獻(xiàn)】
相關(guān)期刊論文 前1條
1 唐晉韜;王挺;王戟;;適合復(fù)雜網(wǎng)絡(luò)分析的最短路徑近似算法[J];軟件學(xué)報;2011年10期
,本文編號:2047520
本文鏈接:http://sikaile.net/kejilunwen/yysx/2047520.html
最近更新
教材專著