Research on Multi-Label Classification Algorithms for Multi-Relational Graphs Based on Label Dependency
Published: 2018-01-14 05:04
Source: Harbin Institute of Technology, master's thesis, 2016. Document type: degree thesis
Keywords: label dependency; multi-relational graph; multi-label; classification
[Abstract]: With the rapid development of the mobile Internet, mobile Internet applications have become people's most important platforms for information exchange, and these platforms link people into a wide variety of virtual social networks. Multiple social networks can be combined into a multi-relational social network by mapping their nodes or relations onto one another, and such a network is usually represented as a multi-relational graph. Multi-label classification of the nodes in a multi-relational graph has important applications in precision online marketing, social network analysis, social search, and related fields.

In multi-label classification, making effective use of label dependency information is crucial to the performance of the classifier. For multi-label classification on multi-relational graphs, label dependency information comes from two sources: the label dependency implicit in node content attributes and the label dependency implicit in the relational topology. This thesis focuses on how to mine these two kinds of label dependency effectively and, on that basis, design targeted multi-label classification algorithms.

Starting from content-attribute label dependency, the thesis uses label co-occurrence information to compute label dependency and proposes MRML-C, a multi-label classification algorithm for multi-relational graphs based on content-attribute label dependency. The algorithm incorporates a clustering-based partition of the label space, which decomposes the multi-label classification problem into several smaller subproblems and reduces the complexity of the algorithm. Comparative experiments show that partitioning the label space by label dependency effectively addresses the label explosion problem, and that MRML-C achieves good classification performance on most of the datasets.

Starting from relational-topology label dependency, the thesis computes label dependency from label co-occurrence information together with relational topology information and proposes MRML-R, a multi-label classification algorithm for multi-relational graphs based on relational-topology label dependency. The algorithm first partitions the label space by clustering, then applies a problem transformation method to convert each multi-label subproblem into a single-label classification problem. During model training it uses a random forest algorithm with random-walk-based sample sampling, which incorporates the relational topology information. Finally, a majority voting strategy integrates the predictions from the individual subspaces. Comparative experiments show that MRML-R performs better on the binary evaluation metrics.
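The first steps described in the abstract (computing label dependency from co-occurrence, partitioning the label space, and transforming each subproblem into a single-label one) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state which dependency measure or clustering method is used, so the Jaccard similarity, the threshold-based grouping by connected components, the label-powerset style transformation, and all function names and example labels below are assumptions chosen for illustration.

```python
from itertools import combinations


def cooccurrence_dependency(label_sets, labels):
    """Pairwise label dependency from co-occurrence counts.

    Assumption: Jaccard similarity stands in for the (unspecified)
    dependency measure of the thesis.
    """
    count = {label: 0 for label in labels}
    pair_count = {}
    for node_labels in label_sets:
        for label in node_labels:
            count[label] += 1
        for a, b in combinations(sorted(node_labels), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    dependency = {}
    for a, b in combinations(sorted(labels), 2):
        both = pair_count.get((a, b), 0)
        union = count[a] + count[b] - both
        dependency[(a, b)] = both / union if union else 0.0
    return dependency


def partition_labels(labels, dependency, threshold):
    """Partition the label space into subspaces.

    Assumption: labels whose dependency is >= threshold are linked, and
    each connected component (found with union-find) is one subspace.
    This is a simple stand-in for the clustering step.
    """
    parent = {label: label for label in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), value in dependency.items():
        if value >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for label in labels:
        groups.setdefault(find(label), set()).add(label)
    return sorted(sorted(group) for group in groups.values())


def to_single_label(label_sets, subspace):
    """Turn one multi-label subproblem into a single-label problem.

    Each node's labels are restricted to the subspace, and every distinct
    subset is treated as one class (label-powerset style transformation).
    """
    allowed = set(subspace)
    return [tuple(sorted(node_labels & allowed)) for node_labels in label_sets]


# Hypothetical toy data: one label set per node.
label_sets = [
    {"sports", "health"},
    {"sports", "health"},
    {"tech", "finance"},
    {"tech"},
    {"sports"},
]
labels = ["finance", "health", "sports", "tech"]

dependency = cooccurrence_dependency(label_sets, labels)
subspaces = partition_labels(labels, dependency, threshold=0.5)
targets = [to_single_label(label_sets, subspace) for subspace in subspaces]
```

With this toy data, strongly co-occurring labels end up in the same subspace, and each subspace yields one single-label target per node that any ordinary classifier could be trained on. The random-walk sampling and majority voting stages mentioned in the abstract are not covered by this sketch.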
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP301.6
Article number: 1422115
Article link: http://sikaile.net/guanlilunwen/yingxiaoguanlilunwen/1422115.html