基于圖論的復(fù)雜網(wǎng)絡(luò)社團(tuán)挖掘與結(jié)構(gòu)分析
發(fā)布時(shí)間:2017-12-26 20:35
本文關(guān)鍵詞:基于圖論的復(fù)雜網(wǎng)絡(luò)社團(tuán)挖掘與結(jié)構(gòu)分析 出處:《西安電子科技大學(xué)》2016年博士論文 論文類型:學(xué)位論文
更多相關(guān)文章: 圖理論 社團(tuán)挖掘 結(jié)構(gòu)分析 復(fù)雜網(wǎng)絡(luò) 算法
【摘要】:網(wǎng)絡(luò)科學(xué)是當(dāng)前研究現(xiàn)實(shí)世界事物之間關(guān)系的有力工具之一。將復(fù)雜系統(tǒng)建模為復(fù)雜網(wǎng)絡(luò)是進(jìn)一步分析和研究現(xiàn)實(shí)世界對象之間關(guān)系的前提,其中點(diǎn)表示復(fù)雜系統(tǒng)中的對象,邊表示對象之間的關(guān)系。面對建模成的復(fù)雜網(wǎng)絡(luò),迫切的問題是探索網(wǎng)絡(luò)中蘊(yùn)含的信息,從中等尺度角度揭示網(wǎng)絡(luò)的組成及其物理意義,即如何從復(fù)雜網(wǎng)絡(luò)中提取元數(shù)據(jù)組。元數(shù)據(jù)組是一組具有真實(shí)物理意義的節(jié)點(diǎn),可通過實(shí)驗(yàn)驗(yàn)證節(jié)點(diǎn)的各種子集是否具有相同或相似的功能提取元數(shù)據(jù)組。由于材料成本、人工成本、時(shí)間成本等客觀限制,面對當(dāng)前的海量數(shù)據(jù)這種枚舉式的驗(yàn)證是不可能完成的任務(wù)。因此人們將目光轉(zhuǎn)向計(jì)算的方法,具體的流程是先將要研究的復(fù)雜系統(tǒng)建模成復(fù)雜網(wǎng)絡(luò),對其中的元數(shù)據(jù)組進(jìn)行建模如通常建模為社團(tuán)或模塊,并設(shè)計(jì)相應(yīng)的社團(tuán)挖掘算法。通過計(jì)算方法,可方便、快速、低成本地提取復(fù)雜系統(tǒng)中的元數(shù)據(jù)組。本質(zhì)上計(jì)算方法包含兩個(gè)核心科學(xué)問題:首先是如何定義社團(tuán),即根據(jù)網(wǎng)絡(luò)中元數(shù)據(jù)組的特性怎樣描述社團(tuán)的問題,如傳統(tǒng)地將社團(tuán)定義為內(nèi)部邊連接緊密外部邊稀疏的稠密子圖;其次是如何設(shè)計(jì)出高效的社團(tuán)挖掘算法。本文主要基于圖論圍繞社團(tuán)挖掘的兩個(gè)核心科學(xué)問題開展研究,提出了邊反三角中心性并基于此設(shè)計(jì)了社團(tuán)挖掘算法EACH,定義了兩種新型社團(tuán),即Cograph社團(tuán)和2-club社團(tuán),并設(shè)計(jì)了相應(yīng)的挖掘算法EPCA和DIVANC,最后給出2-club社團(tuán)在蛋白質(zhì)相互作用網(wǎng)絡(luò)中功能模塊結(jié)構(gòu)分析中的應(yīng)用。主要內(nèi)容詳述如下:1.針對當(dāng)前經(jīng)典的社團(tuán)挖掘算法主要基于內(nèi)緊外松的社團(tuán)設(shè)計(jì),且較多的考慮如何設(shè)計(jì)衡量社團(tuán)‘內(nèi)緊’的相關(guān)指標(biāo)而對‘外松’關(guān)注不夠的問題。本文從路徑外切的角度提出用來衡量社團(tuán)外松程度的邊反三角中心性,進(jìn)一步基于此設(shè)計(jì)劃分的社團(tuán)挖掘算法EACH。邊反三角中心性定義為該邊參與形成的P4(由4個(gè)節(jié)點(diǎn),3條邊組成的簡單路徑)數(shù)目與該邊參與的可能的P4數(shù)目之比。EACH迭代地刪去邊反三角中心性最高的邊直到當(dāng)前所有邊的邊反三角中心性全部為0,并基于孤立點(diǎn)處理策略將刪邊過程中的孤立點(diǎn)加入到當(dāng)前合適的連通分支中,輸出的連通分支即為社團(tuán)。該算法簡單高效,挖掘的社團(tuán)緊湊,物理意義顯著。2.以稠密子圖為中尺度結(jié)構(gòu)研究復(fù)雜網(wǎng)絡(luò)中元數(shù)據(jù)組時(shí)忽略了元數(shù)據(jù)組內(nèi)部的結(jié)構(gòu),然而其對理解該元數(shù)據(jù)組的功能形成機(jī)制及運(yùn)作模式至關(guān)重要。為了探索元數(shù)據(jù)組各成員之間的內(nèi)部拓?fù)浣Y(jié)構(gòu),定義了具有圖論特征的Cograph社團(tuán),其結(jié)構(gòu)唯一地對應(yīng)著Cotree。基于Cotree,可進(jìn)一步清晰地揭示Cograph社團(tuán)中規(guī)模和層次可變且結(jié)構(gòu)等價(jià)的子結(jié)構(gòu),為用戶提供從微觀尺度到中觀尺度連續(xù)觀察社團(tuán)內(nèi)部結(jié)構(gòu)構(gòu)造的獨(dú)特視角。本文提出邊P4中心性并基于此設(shè)計(jì)了挖掘Cograph社團(tuán)的劃分算法EPCA。邊P4中心性定義為該邊參與形成P4的數(shù)目,由于P4是由4個(gè)節(jié)點(diǎn)3條邊順序相連組成的簡單無環(huán)路徑,所以若邊P4中心性較大,這條邊可能是連接社團(tuán)之間的邊。EPCA迭代地刪去邊P4中心性最高的邊,刪到當(dāng)前網(wǎng)絡(luò)中所有邊的邊P4中心性為0停時(shí)結(jié)束,當(dāng)前連通分支全為Cograph,輸出其作為Cograph社團(tuán)。源于邊P4中心性,EPCA計(jì)算成本低、無參數(shù)、不依賴任何外部的度量。3.面對當(dāng)前復(fù)雜網(wǎng)絡(luò)社團(tuán)挖掘研究中遇到的挑戰(zhàn):內(nèi)緊外松的稠密子圖并不能刻畫元數(shù)據(jù)組的本質(zhì)特征,以及受蛋白質(zhì)相互作用網(wǎng)絡(luò)中的元數(shù)據(jù)組由稠密的相互作用三元組模體組成啟發(fā),本文定義了富含相互作用三元組的2-club社團(tuán)刻畫復(fù)雜網(wǎng)絡(luò)中的元數(shù)據(jù)組。2-club社團(tuán)是連通的2-club,而2-club是直徑為2的子圖;谶呅∩持行男栽O(shè)計(jì)了算法DIVANC挖掘2-club社團(tuán),進(jìn)一步提出簡單的兩步重疊策略將DIVANC擴(kuò)展成可挖掘重疊的2-club社團(tuán)。構(gòu)造邊小生境中心性時(shí)考慮其參與的P4數(shù)目和其參與形成的三角形數(shù)目。邊小生境中心性較大,該邊可能是社團(tuán)之間的邊。DIVANC迭代地刪去邊小生境中心性最高的邊,刪到當(dāng)前網(wǎng)絡(luò)中所有邊的邊小生境中心性為0停時(shí)結(jié)束,當(dāng)前連通分支全為2-club,輸出非重疊2-club社團(tuán),若需要重疊2-club社團(tuán),執(zhí)行兩步重疊策略,得到重疊2-club社團(tuán)。4.透徹地理解PPI網(wǎng)絡(luò)中的功能模塊仍然是當(dāng)前系統(tǒng)生物學(xué)研究中的一個(gè)重要挑戰(zhàn)。針對當(dāng)前大多數(shù)關(guān)于模塊的分析主要集中在如何從整個(gè)PPI網(wǎng)絡(luò)中挖掘具有生物意義的功能模塊,而較少關(guān)注其內(nèi)部結(jié)構(gòu),沒有給出其內(nèi)部結(jié)構(gòu)與功能的關(guān)聯(lián)分析的問題,本文基于圖論分析功能模塊的內(nèi)部結(jié)構(gòu),揭示功能模塊的相關(guān)特征情況,并進(jìn)一步推斷其在整個(gè)PPI網(wǎng)絡(luò)中的功能,給出了理解PPI網(wǎng)絡(luò)中功能模塊組織結(jié)構(gòu)和特性分布的新視角。
[Abstract]:Network science is one of the powerful tools to study the relationship between things in the real world. Modeling complex systems as complex networks is a prerequisite for further analysis and study of relationships between real-world objects, where points represent objects in complex systems, and edges represent relations between objects. Faced with the complex network modeled, the urgent problem is to explore the information contained in the network, and reveal the composition and physical meaning of the network from a moderate scale. That is to say, how to extract metadata from complex networks. Metadata group is a set of nodes that have real physical meaning. It can verify whether all kinds of subset of nodes have the same or similar functions to extract metadata group by experiment. Because of the objective limitations of material cost, labor cost and time and cost, it is impossible to complete the enumerated verification in the face of the current mass data. Therefore, people turn their attention to the calculation method. The specific process is to first model the complex system to be complex network, and model the metadata group, such as usually modeling as a community or module, and design the corresponding community mining algorithm. Through the calculation method, the metadata group in the complex system can be extracted conveniently, fast and low. The essence of calculation method consists of two core scientific problems: the first is how to define the community, according to the characteristics of the network in data set to describe associations, such as the traditional community is defined as the internal side is tightly connected with the external side of the sparse dense sub graph; second is how to design an efficient algorithm for mining association. In this paper, two key scientific problems in graph theory on Community Mining Based on the research, put forward and reverse triangle edge center is designed based on this association mining algorithm EACH, defines two new clubs, namely the Cograph community and the 2-club community, and designed the corresponding mining algorithm EPCA and DIVANC application are presented in the 2-club community the function in the protein interaction network module in structural analysis. The main contents are as follows: 1., aiming at the current classical community mining algorithm, mainly based on tight and loose community design, and much consideration is given to how to design related indicators to measure the "tightness" of the society. From the angle of path cutting, this paper puts forward the edge anti triangle centrality which is used to measure the degree of community loosening, and further is based on the association mining algorithm EACH of this design. The side inverse triangle centrality is defined as the ratio of the number of P4 (a simple path composed of 4 nodes, 3 sides) to the possible number of P4 that the side participates in. EACH iteratively cuts out the top edge of the reverse triangle center, until the edges of all the edges are all 0, and based on the outlier processing strategy, we add the outliers in the edge deletion process to the suitable connected branches, and the output connected components are the communities. The algorithm is simple and efficient, the mining community is compact and the physical meaning is significant. 2., the dense subgraph as a mesoscale structure is used to study the meta data set in complex network. It ignores the internal structure of metadata group. However, it is very important for understanding the function mechanism and operation mode of the metadata group. In order to explore the internal topology between the members of the metadata group, the Cograph community with graph theory features is defined, whose structure is uniquely corresponding to the Cotree. Based on Cotree, we can further reveal clearly the variable and equivalent sub structure of Cograph community in terms of scale and hierarchy, and provide users with a unique perspective to continuously observe the internal structure of communities from microscale to mesoscale. In this paper, the edge P4 centrality is proposed and based on this, the partition algorithm for mining Cograph community is designed, EPCA. The edge P4 centrality is defined as the number of P4 that the side participates in, because P4 is a simple acyclic path consisting of 4 nodes and 3 edges sequentially, so if the center of the edge P4 is large, the edge may connect the edges between the communities. EPCA iteratively eliminates the edges with the highest P4 edge, and deletes the edges of all the edges in the current network. The P4 centrality is 0 at the end of the stopping time, and the current connected branches are all Cograph, and output them as Cograph communities. Due to the edge P4 centrality, EPCA has low computing cost, no parameter, and no external measurement. 3. in the face of the current complex network community mining research challenges: the essential characteristics of the tight loose dense subgraph can not describe the metadata group, and metadata group in protein interaction networks by the interaction of dense three tuple module body inspiration, this paper in the interaction of three tuple meaning 2-club Association describe the set of metadata in complex networks. The 2-club community is a connected 2-club, and 2-club is a subgraph with a diameter of 2. Based on the niche niche design, we design algorithm DIVANC to mine 2-club communities, and further propose a simple two step overlap strategy to extend DIVANC to 2-club communities that can be mined. The number of P4 and the number of triangles involved in the formation of the niche centrality are considered. The side niche is centrality, which may be the edge of the community. DIVANC iteratively deleting edges of the highest niche center, to delete all the edges in the network side of the center of the end of the 0 niche to stop, the connectedness of all 2-club, the output of non overlapping 2-club community, if need to overlap the 2-club community, the implementation of the two step strategy to overlap, overlapping the 2-club community. 4. a thorough understanding of the functional modules in the PPI network is still an important challenge in the current research of system biology. For most of the current on the module of the analysis focused on how to dig out from the entire PPI network function module has a biological significance, and less concerned about its internal structure, correlation analysis has given its internal structure and function of the problem, this paper analyze the internal structure of graph theory based on function module, function module is revealed
【學(xué)位授予單位】:西安電子科技大學(xué)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2016
【分類號】:TP311.13;O157.5
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 鐘柯;肖昱;許s,
本文編號:1338770
本文鏈接:http://sikaile.net/shoufeilunwen/jckxbs/1338770.html
最近更新
教材專著