大規(guī)模社會(huì)網(wǎng)絡(luò)中影響最大化問題高效處理技術(shù)研究

發(fā)布時(shí)間：2019-06-22 16:25

【摘要】：近年來，隨著互聯(lián)網(wǎng)和Web2.0技術(shù)的飛速發(fā)展，社會(huì)網(wǎng)絡(luò)作為溝通現(xiàn)實(shí)人類世界的橋梁，已經(jīng)成為交互溝通、知識(shí)共享和信息傳播的重要媒介和平臺(tái)。其中，影響最大化問題旨在發(fā)現(xiàn)社會(huì)網(wǎng)絡(luò)中最有影響力的節(jié)點(diǎn)集合，是社會(huì)網(wǎng)絡(luò)分析領(lǐng)域的關(guān)鍵問題，在許多重要場景中有著廣泛的應(yīng)用，例如市場營銷、廣告發(fā)布、輿情預(yù)警、水質(zhì)監(jiān)測、疫情監(jiān)控等，因此具有很高的研究價(jià)值和應(yīng)用價(jià)值。許多影響最大化應(yīng)用策略的制定和部署對(duì)于算法求解時(shí)間十分敏感，因此，高效的求解算法是當(dāng)前學(xué)術(shù)界和工業(yè)界研究影響最大化算法的核心目標(biāo)。已有的研究成果主要集中于一些貪心算法和啟發(fā)式算法，存在求解速度慢、計(jì)算效率低的問題。另一方面，當(dāng)今社會(huì)網(wǎng)絡(luò)的數(shù)據(jù)規(guī)模海量、數(shù)據(jù)耦合度高、網(wǎng)絡(luò)結(jié)構(gòu)動(dòng)態(tài)變化。當(dāng)面對(duì)大規(guī)模社會(huì)網(wǎng)絡(luò)時(shí)，已有算法暴露出許多難以克服的問題：第一，社會(huì)網(wǎng)絡(luò)中節(jié)點(diǎn)影響值計(jì)算的可并行性問題。已有工作專注于降低算法的復(fù)雜度，沒有充分利用已有的并行計(jì)算架構(gòu)來加速問題求解。而實(shí)際社會(huì)網(wǎng)絡(luò)中存在大量的節(jié)點(diǎn)影響值計(jì)算可由并行計(jì)算架構(gòu)并發(fā)執(zhí)行。因此，在挖掘算法的并行性方面，影響最大化算法的執(zhí)行速度仍有很大的提升空間。第二，已有影響最大化算法未充分考慮社會(huì)網(wǎng)絡(luò)節(jié)點(diǎn)分布特性。社會(huì)網(wǎng)絡(luò)中節(jié)點(diǎn)的度分布服從典型的冪律分布。然而，現(xiàn)有貪心算法大多采用精確計(jì)算的方式來計(jì)算所有節(jié)點(diǎn)的影響值，導(dǎo)致大度數(shù)節(jié)點(diǎn)的計(jì)算復(fù)雜度十分高，成為算法執(zhí)行的瓶頸。第三，社會(huì)網(wǎng)絡(luò)拓?fù)浣Y(jié)構(gòu)的動(dòng)態(tài)變化問題�，F(xiàn)有影響最大化問題研究專注于靜態(tài)網(wǎng)絡(luò)；當(dāng)網(wǎng)絡(luò)動(dòng)態(tài)變化時(shí)，大都需要針對(duì)全網(wǎng)進(jìn)行重新計(jì)算節(jié)點(diǎn)的影響值，會(huì)造成大量冗余計(jì)算，導(dǎo)致性能無法滿足大規(guī)模社會(huì)網(wǎng)絡(luò)的需求。針對(duì)上述技術(shù)瓶頸，本文系統(tǒng)地研究了社會(huì)網(wǎng)絡(luò)影響最大化問題的高效處理技術(shù)，從以下幾個(gè)方面展開研究：針對(duì)現(xiàn)有方法并行度差、算法復(fù)雜度高，從而導(dǎo)致運(yùn)行時(shí)間過長的問題，本文基于CPU+GPU的異構(gòu)并行計(jì)算框架，設(shè)計(jì)和實(shí)現(xiàn)了一種具有高并行度的影響最大化算法BUTA，并針對(duì)GPU體系結(jié)構(gòu)做了進(jìn)一步算法優(yōu)化。本文通過深入分析社會(huì)網(wǎng)絡(luò)中節(jié)點(diǎn)之間的層次依賴關(guān)系，發(fā)現(xiàn)了節(jié)點(diǎn)影響值計(jì)算的可并行性。在此基礎(chǔ)上，設(shè)計(jì)了一種自底向上的逐層掃描方法BUTA。BUTA算法一方面可以在保證算法精度的同時(shí)大幅度降低算法復(fù)雜度，另一方面BUTA充分利用了節(jié)點(diǎn)的層次分布，以高并行度計(jì)算節(jié)點(diǎn)的影響值。為了使BUTA算法更加適配CPU+GPU的異構(gòu)并行計(jì)算框架，本文設(shè)計(jì)了三種優(yōu)化方法：K層合并、數(shù)據(jù)重組和合并訪存，分別用于降低運(yùn)行時(shí)分支，減少訪存次數(shù)和提高算法并行度。針對(duì)已有影響最大化算法未充分利用社會(huì)網(wǎng)絡(luò)節(jié)點(diǎn)分布特性的問題，本文提出了一種基于蒙特卡洛理論的采樣估計(jì)算法ESMCE，大幅度提升了計(jì)算效率。本文對(duì)社會(huì)網(wǎng)絡(luò)中節(jié)點(diǎn)的分布特性進(jìn)行了建模和挖掘；針對(duì)大度數(shù)節(jié)點(diǎn)計(jì)算時(shí)間長的問題，本文引入蒙特卡洛理論，設(shè)計(jì)了一種節(jié)點(diǎn)影響值估計(jì)方法ESMCE。在采樣過程中，ESMCE算法設(shè)計(jì)了一種由冪律指數(shù)指導(dǎo)的采樣節(jié)點(diǎn)個(gè)數(shù)計(jì)算方法。之后，根據(jù)估計(jì)誤差同精度要求之間的差距，，本文提出了一種基于灰度預(yù)測模型的后續(xù)采樣節(jié)點(diǎn)個(gè)數(shù)預(yù)測方法，以通過多次迭代采樣來提高算法精度直至采樣誤差滿足設(shè)定的精度要求。針對(duì)社會(huì)網(wǎng)絡(luò)拓?fù)鋭?dòng)態(tài)變化造成的已有算法計(jì)算效率低的問題，本文設(shè)計(jì)了一種增量式的影響最大化算法IncInf。本文深入分析了社會(huì)網(wǎng)絡(luò)拓?fù)浣Y(jié)構(gòu)的演化特征，發(fā)現(xiàn)社會(huì)網(wǎng)絡(luò)的拓?fù)渥兓瘽M足優(yōu)先連接原則，同時(shí)最有影響力節(jié)點(diǎn)的度數(shù)要明顯大于普通節(jié)點(diǎn)。基于上述發(fā)現(xiàn)，本文設(shè)計(jì)了一種基于局部化理論的影響變化量高效計(jì)算方法�；诠�(jié)點(diǎn)的影響變化量和原有網(wǎng)絡(luò)對(duì)應(yīng)的最有影響力節(jié)點(diǎn)信息，設(shè)計(jì)了一種剪枝策略，將候選節(jié)點(diǎn)范圍有效縮小到影響值增長迅速、度排序靠前的節(jié)點(diǎn)集合，從而大幅度降低了動(dòng)態(tài)社會(huì)網(wǎng)絡(luò)影響最大化求解的復(fù)雜度，減少了程序運(yùn)行時(shí)間。針對(duì)當(dāng)前內(nèi)容分發(fā)方法忽略了社會(huì)網(wǎng)絡(luò)中的用戶關(guān)聯(lián)關(guān)系、地理位置等社會(huì)信息，從而導(dǎo)致用戶訪問延遲高的問題，本文設(shè)計(jì)了一種基于影響最大化的內(nèi)容分發(fā)方法SCORE。同已有的內(nèi)容分發(fā)方法不同，SCORE方法充分利用了社會(huì)網(wǎng)絡(luò)中的用戶信息，提出了一種基于影響最大化算法的緩存內(nèi)容選擇策略以快速準(zhǔn)確地定位未來訪問頻率較高的關(guān)鍵內(nèi)容。為了最小化訪問延遲，SCORE方法通過挖掘用戶之間的關(guān)聯(lián)關(guān)系和地理位置信息，設(shè)計(jì)了一種基于K-MEANS聚類算法和加權(quán)球面平均計(jì)算方法的邊緣服務(wù)器選擇策略，從而將關(guān)鍵內(nèi)容預(yù)先分發(fā)到離潛在訪問用戶最近的邊緣服務(wù)器，以便于就近響應(yīng)用戶請(qǐng)求。實(shí)驗(yàn)結(jié)果表明，SCORE方法可以大幅度降低用戶訪問延遲，提升用戶體驗(yàn)質(zhì)量。綜上所述，本文針對(duì)社會(huì)網(wǎng)絡(luò)影響最大化問題的高效處理技術(shù)提出了有效的解決方案，并通過在真實(shí)數(shù)據(jù)集上進(jìn)行實(shí)驗(yàn)驗(yàn)證了所提算法的有效性，對(duì)于推進(jìn)社會(huì)網(wǎng)絡(luò)影響最大化問題的研究和實(shí)用化具有一定的理論意義和應(yīng)用價(jià)值。
[Abstract]:In recent years, with the rapid development of the Internet and the Web2.0 technology, the social network, as a bridge for communication with the real human world, has become an important medium and platform for interactive communication, knowledge sharing and information dissemination. Among them, the purpose of maximizing the problem is to find the most influential set of nodes in the social network, which is the key problem in the field of social network analysis, and has wide application in many important scenarios, such as marketing, advertisement release, public opinion warning, water quality monitoring, epidemic monitoring and the like. And therefore has high research value and application value. The development and deployment of many impact-maximizing application strategies are very sensitive to the algorithm's solution time, so the efficient solution algorithm is the core of the current research and influence maximizing algorithm in the academic and industry. The existing research results are mainly focused on some greedy algorithms and heuristic algorithms. On the other hand, the data of the present social network is massive, the data coupling degree is high, the network structure changes dynamically, In the face of the large-scale social network, the existing algorithms have exposed many problems which are difficult to overcome: the parallelizable question of the calculation of the influence value of the nodes in the first and the social networks The existing work is focused on reducing the complexity of the algorithm, and the existing parallel computing architecture is not fully utilized to speed up the problem. A large number of node-affected values in the actual social network can be executed by the parallel computing architecture. So, in the aspect of the parallelism of the mining algorithm, the execution speed of the influence maximizing algorithm is still greatly improved. the second, the existing effect maximization algorithm does not take full account of the social network node distribution The distribution of the nodes in the social network obeys the typical power law. However, most of the existing greedy algorithms are used to calculate the influence value of all the nodes in a precise way, which leads to the high computational complexity of the large-degree nodes and becomes a bottle to be executed by the algorithm. The Dynamic Change of the Structure of the Third and the Social Network Problem: The existing research on maximizing the problem is focused on the static network; when the network changes dynamically, it is necessary to re-compute the influence value of the node for the whole network, resulting in a large amount of redundant calculation, which can lead to the performance of the large-scale social network In view of the above-mentioned technical bottleneck, this paper systematically studies the high-efficiency processing technology of the problem of maximizing the influence of social network, from the following aspects: Open study: The algorithm complexity is high for the existing method, resulting in the running time In this paper, based on the heterogeneous parallel computing framework of CPU and GPU, this paper designs and implements a high-degree-of-parallelism impact-maximizing algorithm, BUTA, and advances in to the GPU architecture. In this paper, the hierarchical dependency of nodes in the social network is analyzed, and the calculation of the influence of the nodes is found. On the basis of this, a self-base layer-by-layer scanning method, BUTA. BUTA, is designed. On the one hand, the algorithm complexity can be greatly reduced while the accuracy of the algorithm is guaranteed. On the other hand, BUTA makes full use of the hierarchical distribution of the nodes and calculates the nodes with high degree of parallelism. In order to make the BUTA algorithm more suitable for the heterogeneous parallel computing framework of the CPU + GPU, the paper designs three optimization methods: K-layer combination, data recombination and combined access, which are used to reduce the running-time branch, reduce the number of visits and the increase. This paper presents a sampling estimation algorithm based on Monte-Carlo theory, which is based on the Monte-Carlo theory, which has not fully utilized the distribution characteristics of the social network nodes. The calculation efficiency is improved. The distribution characteristics of the nodes in the social network are modeled and mined, and the problem of the long-time calculation time of the large-degree nodes is solved. In this paper, the Monte-Carlo theory is introduced, and a kind of node influence value estimation is designed. Method ESMCE. In the process of sampling, the ESMCE algorithm designs a sampling section that is guided by the power law index. In this paper, based on the difference between the estimation error and the precision requirement, a method for predicting the number of subsequent sampling nodes based on the gray-scale prediction model is presented in this paper to improve the accuracy of the algorithm until the sampling error is satisfied by multi-iterative sampling. In order to solve the problem of the low efficiency of the existing algorithm caused by the dynamic change of the social network topology, this paper designs an incremental influence to the maximum. In this paper, the evolution of the social network topology is deeply analyzed, and the topology change of the social network is found to meet the principle of priority and the degree of most influential nodes It is obviously larger than the common node. Based on the above-mentioned findings, this paper designs a kind of effect based on the local theory the invention designs a pruning strategy based on the influence change amount of the node and the most influential node information corresponding to the original network, the pre-ordering node set is sorted, so that the complexity of the dynamic social network influence maximization solution is greatly reduced, The time of program execution is reduced. In view of the fact that the current content distribution method ignores the social information such as the user-related relation and the geographical position in the social network, the problem that the user's access delay is high is caused, and a kind of problem based on the influence is designed in this paper. The content distribution method SCORE is different from the existing content distribution method. The SCORE method makes full use of the user information in the social network, and proposes a cache content selection strategy based on the influence maximization algorithm to quickly and accurately locate the future. In order to minimize the access delay, the SCORE method designs a K-MEANS clustering algorithm and a weighted spherical average computing party based on the association relationship and the geographic location information between the users in order to minimize the access delay the edge server selection policy of the method thereby pre-distributing the key content to the edge server closest to the potential access user, so as to respond to the user's request nearby. The experiment results show that the SCORE method can greatly reduce the user access To sum up, this paper puts forward a valid solution for the efficient processing technology of the problem of maximizing the influence of social network, and it is made on the real data set In this paper, the effectiveness of the proposed algorithm is verified, and the research and practical tools for maximizing the influence of the social network
【學(xué)位授予單位】：國防科學(xué)技術(shù)大學(xué)
【學(xué)位級(jí)別】：博士
【學(xué)位授予年份】：2013
【分類號(hào)】：TP393.02;TP301.6

【參考文獻(xiàn)】

相關(guān)期刊論文前4條

1 郎君;秦兵;宋巍;劉龍;劉挺;李生;;基于社會(huì)網(wǎng)絡(luò)的人名檢索結(jié)果重名消解[J];計(jì)算機(jī)學(xué)報(bào);2009年07期

2 田家堂;王軼彤;馮小軍;;一種新型的社會(huì)網(wǎng)絡(luò)影響最大化算法[J];計(jì)算機(jī)學(xué)報(bào);2011年10期

3 劉思峰;灰色系統(tǒng)理論的產(chǎn)生與發(fā)展[J];南京航空航天大學(xué)學(xué)報(bào);2004年02期

4 何清華;張世琦;;社會(huì)網(wǎng)絡(luò)分析發(fā)展與工程應(yīng)用研究綜述[J];建設(shè)監(jiān)理;2012年02期

相關(guān)博士學(xué)位論文前5條

1 韓毅;社會(huì)網(wǎng)絡(luò)分析與挖掘的若干關(guān)鍵問題研究[D];國防科學(xué)技術(shù)大學(xué);2011年

2 謝乃明;灰色系統(tǒng)建模技術(shù)研究[D];南京航空航天大學(xué);2008年

3 高霖;社會(huì)網(wǎng)絡(luò)動(dòng)態(tài)性及網(wǎng)絡(luò)環(huán)境中的分布式搜索策略研究[D];中國科學(xué)技術(shù)大學(xué);2009年

4 萬懷宇;社會(huì)網(wǎng)絡(luò)中基于鏈接的分類問題研究[D];北京交通大學(xué);2012年

5 曾波;灰色預(yù)測建模技術(shù)研究[D];南京航空航天大學(xué);2012年

本文編號(hào)：2504761

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會(huì)員下載

Download by Member

本文鏈接：http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2504761.html

上一篇：論我國廣告審查法律制度的缺欠與完善
下一篇：走中國當(dāng)代工業(yè)設(shè)計(jì)之路

論文發(fā)表

·知網(wǎng)|萬方|維普|龍?jiān)磡省級(jí)|國家級(jí)|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

大規(guī)模社會(huì)網(wǎng)絡(luò)中影響最大化問題高效處理技術(shù)研究