Research on Mining and Analysis of Entity Relationships in Email Communication
Topic: community detection; Entry point: entity profiling; Source: University of Electronic Science and Technology of China, 2014 doctoral dissertation; Document type: degree thesis
[Abstract]: Email networks are among the most widely used communication networks. Because they are strongly social, have a huge user base, and implicitly encode real-world relationships, social network analysis of email data has become increasingly active, driven by the need to mine entity relationships from rapidly growing network data. Partitioning and presenting the social structure of email network data and predicting unknown links are important parts of social network analysis for entity relationship mining. They have broad application prospects in commercial settings such as e-commerce and social recommendation, and in operational settings such as counter-terrorism and criminal investigation. Community detection and link prediction have long been active research directions. When mining entity relationships from large volumes of email communication data, the efficiency and accuracy of community detection and the recall and precision of link prediction become obstacles in practice. Starting from existing social network analysis algorithms, this thesis studies in depth the accuracy and computational efficiency of community structure detection algorithms, and the recall and precision of link prediction algorithms, in email entity relationship mining. The main contributions are as follows.

(1) A new measure model for community structure detection. To address the unstable partition results of modularity-based methods, a new measure model is proposed based on the idea of information centrality. By weighting the association between nodes together with node degree, the model both identifies cluster centers accurately and provides a basis for computing similarity between nodes in the network. Building on it, a new community detection algorithm (the BSM algorithm) is proposed. Experiments on simulated and real network datasets show that it is more stable and more accurate than the modularity method, which also confirms the effectiveness of the measure model.

(2) A fast algorithm model for community detection in large-scale complex networks. The work proceeds in two steps. First, to address the low efficiency of the first iteration of the Louvain fast algorithm, a pruning strategy is introduced, giving an improved algorithm (the FLA algorithm). Second, because the Louvain fast algorithm is based on modularity optimization and tends to converge to a local optimum, the optimization objective function is modified to incorporate information such as node degree and edge weight, yielding a new algorithm, CDDW, built on FLA. Experiments on simulated and real network datasets show that the new algorithm model substantially reduces computational cost and also improves the accuracy of the community partition of the whole network.

(3) A new ensemble learning model for link prediction. Mainstream link prediction algorithms generally suffer from low recall and precision. To address this, link prediction is treated as a binary classification problem, and the error feedback mechanism of the Boosting framework is used to design and implement a new link prediction model, AdaPred. To further improve precision and recall, a new link prediction algorithm is proposed and integrated into AdaPred. Empirical studies on real data, including a paper co-authorship network and an email network, show that AdaPred achieves clearly better precision and recall than the other algorithms.

(4) A visual analysis system for entity relationships in email communication networks. Visualization helps bring social network analysis into practical use and will strongly influence its adoption. Taking entity relationship mining in email networks as the entry point, the thesis develops an application-oriented visual analysis platform. The platform's data analysis capabilities are in line with the international state of the art, and it has good generality and extensibility. The prototype system passed third-party testing and acceptance review under a National 863 Program project, with an acceptance rating of excellent.

In summary, this thesis studies several important challenges that social network analysis faces on its way to practical application and, on that basis, designs and implements a prototype visual analysis system, offering an efficient and feasible solution for the wider adoption of social network analysis. The analysis techniques used rely on network topology rather than on additional contextual information, so they extend well to a broader range of social network data analysis scenarios.
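As a point of reference for the modularity baseline named in contributions (1) and (2), the following is a minimal sketch of the standard Newman-Girvan modularity score for a fixed partition of an undirected, unweighted graph. It is not the thesis's BSM, FLA, or CDDW algorithm, whose details the abstract does not give. The function name `modularity`, the variable names, and the toy graph are assumptions made purely for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition (illustrative helper, not from the thesis).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum over communities of (internal edge share) - (degree share)^2
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (an assumption): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}

print(round(modularity(edges, split), 4))   # 0.3571 (exactly 5/14)
print(round(modularity(edges, merged), 4))  # 0.0
```

Splitting the toy graph into its two triangles scores higher than placing every node in one community. Modularity-optimizing methods such as the Louvain algorithm search greedily for partitions with a high score of this kind, which is why they can settle on a local optimum.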
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TP393.098
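Related doctoral dissertations (top 1)
1 Wu Zufeng; Research on Mining and Analysis of Entity Relationships in Email Communication [D]; University of Electronic Science and Technology of China; 2014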
Article ID: 1573521
Article link: http://sikaile.net/guanlilunwen/ydhl/1573521.html