Research on Link-Analysis-Based Web Community Discovery Algorithms for Tibetan-Language Websites
Topics: Web community + link analysis. Source: Northwest University for Nationalities, master's thesis, 2012
[Abstract]: As Internet technology develops, more and more Tibetan cultural information is published as web pages, and the number of Tibetan websites has grown rapidly in recent years. Tibetan-language Web data is large in volume and unorganized, so finding useful information in it quickly and effectively has become an active research topic. Studies show that this large and complex Web contains many communities, which matter for research on topics such as current social issues. Communities can give users timely and valuable information, and they reflect the clustering and hierarchical relationships that are common on the Web. Tracking Tibetan-language communities in depth also makes it possible to follow cultural development and social trends in Tibetan regions as they happen. Applying Web community discovery algorithms to search engine development helps improve the precision of Web information retrieval and lays a foundation for building better search engines.
The link relationships among Web pages provide rich clues for Web community discovery, and link analysis is one of its key techniques.
After examining the characteristics of the current Tibetan-language Web and its link data, this thesis reviews the underlying theory of Web communities and link analysis and surveys the two families of link-analysis-based community discovery techniques: agglomerative link algorithms and divisive link algorithms. It focuses on one divisive method, community discovery based on extremal optimization, and identifies its weaknesses: the result depends on the initial partition of the network, it is uncertain whether modularity has reached its maximum, and a local optimum is not necessarily the global optimum. The thesis proposes an improved extremal optimization algorithm based on divergence points. The algorithm further separates web pages whose community membership is in dispute between communities, removes the uncertainty about whether the modularity maximum has been reached, and can partition a network whose number of communities is not known in advance. The algorithm is then applied to Tibetan-language websites. Extensive experiments indicate that the improved algorithm raises the quality of the discovered Web communities and has both theoretical and practical value.
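The abstract names modularity and extremal optimization without giving formulas. The following is a minimal sketch, assuming that "modularity" means the standard Newman-Girvan measure and that the extremal optimization step ranks nodes by the commonly used per-node fitness (the share of a node's links that stay inside its community, minus the community's share of all link ends). The function names and the toy graph are hypothetical, and the sketch does not implement the thesis's divergence-point improvement.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> label.
    """
    m = len(edges)
    intra = defaultdict(int)        # edges with both ends in the same community
    degree_sum = defaultdict(int)   # total degree of each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def node_fitness(edges, community):
    """Per-node fitness: (links inside own community / degree) - community share.

    Extremal optimization repeatedly moves the node with the lowest fitness.
    """
    m = len(edges)
    degree = defaultdict(int)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[u] += 1
            inside[v] += 1
    return {n: inside[n] / degree[n] - degree_sum[community[n]] / (2 * m)
            for n in degree}


# Toy graph (assumption): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
bad = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}  # node 2 misplaced

fitness = node_fitness(edges, bad)
worst = min(fitness, key=fitness.get)  # candidate node to move next
print(modularity(edges, good), modularity(edges, bad), worst)
```

In this toy case the misplaced node has the lowest fitness, so moving it to the other community raises modularity. That is the local move extremal optimization relies on, and it also shows why the outcome depends on the starting partition.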
[Degree-granting institution]: Northwest University for Nationalities
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3