Research on Web Search Based on Social Relationships
Topics: search engines + social search. Source: Hangzhou Dianzi University, 2017 master's thesis
[Abstract]: Online search is now one of the main ways Internet users obtain information. With the rise of Web 2.0 and social networking sites, however, a major shortcoming of traditional search engines has become increasingly apparent: Web 2.0 and social networking sites hold extremely rich user data, yet traditional search engines cannot search it. Users differ in occupation, hobbies, education and social relationships, so they expect different search results and have a pressing need for personalized ones. Enterprises, for their part, want new search methods that involve user participation to become widespread, so that they can obtain more user information, develop customer relationships and provide personalized services. This thesis therefore proposes a new search system, named PERSO, that aims at personalized search based on users' social behavior. The system crawls open online social network data, analyzes the rich user features and social relationships it contains, and places the results a user is most interested in at the top of the list, thereby improving on the results of traditional search engines. User modeling is the key to personalized social search. Based on the data characteristics of Sina Weibo, the largest and most open social networking site in China, the thesis proposes a multi-level, multi-dimensional user model made up of three social relevance models: level 1 (the user's own social behavior), level 2 (friends' social behavior) and level 3 (social expansion). Each level supplements the one before it, and together they give a comprehensive description of user characteristics in the social network.
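The three-level idea can be illustrated with a small sketch. The abstract does not give the model's actual features or formulas, so everything here is an assumption made for illustration: each level is represented as a bag of words built from posts, relevance is cosine similarity, and the function name `social_relevance` and the weights `(0.6, 0.3, 0.1)` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def social_relevance(doc_tokens, own_posts, friend_posts, extended_posts,
                     weights=(0.6, 0.3, 0.1)):
    """Weighted sum of a document's similarity to three profile levels.

    Level 1: the user's own posts; level 2: friends' posts;
    level 3: posts from the wider social neighbourhood.
    The weights are illustrative assumptions, not values from the thesis.
    """
    doc = Counter(doc_tokens)
    levels = (own_posts, friend_posts, extended_posts)
    # Flatten each level's tokenized posts into one term-count profile.
    profiles = [Counter(tok for post in posts for tok in post) for posts in levels]
    return sum(w * cosine(doc, p) for w, p in zip(weights, profiles))
```

With this layout, a user who has posted little still receives a non-zero score when friends' posts match the document, which is one way to read the statement that each level supplements the one before it.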
Building on this user model, the thesis proposes three ways of integrating the social relevance model into web text search, that is, three page ranking mechanisms: two-step TP ranking, which filters by text features and then ranks by social features; two-step PT ranking, which filters by social features and then ranks by text features; and one-step HB ranking, which ranks by social and text features jointly. Finally, using 10 million Baidu Baike documents and the social data of 20 real Sina Weibo users as the data source, and F1 and nDCG@K as evaluation metrics, four groups of experiments were designed and carried out: an evaluation of the three levels of the social relevance model; an evaluation of the three ranking mechanisms; a comparison of how the user's own information and friends' information affect search quality; and an evaluation of how the number of friends affects search quality. The experiments demonstrate the effectiveness of the proposed ranking mechanisms and show how much each level of the model improves the search results.
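The difference between the three ranking mechanisms can be sketched as follows. This is a minimal illustration, not the thesis implementation: the toy `text_score`, the use of a top-`k` cut as the "filter" step, the precomputed `social` score dictionary, and the mixing weight `alpha` are all assumptions introduced here.

```python
def text_score(query, doc):
    """Toy text relevance: fraction of query terms found in the document."""
    terms = query.split()
    words = set(doc["text"].split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def tp_rank(query, docs, social, k):
    """TP: keep the top-k documents by text score, then order them by social score."""
    kept = sorted(docs, key=lambda d: text_score(query, d), reverse=True)[:k]
    return sorted(kept, key=lambda d: social[d["id"]], reverse=True)


def pt_rank(query, docs, social, k):
    """PT: keep the top-k documents by social score, then order them by text score."""
    kept = sorted(docs, key=lambda d: social[d["id"]], reverse=True)[:k]
    return sorted(kept, key=lambda d: text_score(query, d), reverse=True)


def hb_rank(query, docs, social, alpha=0.5):
    """HB: a single pass that orders documents by a blend of both scores."""
    def blended(d):
        return alpha * text_score(query, d) + (1 - alpha) * social[d["id"]]
    return sorted(docs, key=blended, reverse=True)


docs = [
    {"id": "a", "text": "social search engine"},
    {"id": "b", "text": "search engine index"},
    {"id": "c", "text": "cooking recipes"},
]
social = {"a": 0.2, "b": 0.9, "c": 0.95}  # hypothetical per-user social scores

print([d["id"] for d in tp_rank("search engine", docs, social, k=2)])  # ['b', 'a']
print([d["id"] for d in pt_rank("search engine", docs, social, k=2)])  # ['b', 'c']
print([d["id"] for d in hb_rank("search engine", docs, social)])       # ['b', 'a', 'c']
```

In this toy data the PT ordering lets a socially favored but textually irrelevant document ("c") through the filter, while TP never shows it. That trade-off between the two filter orders is the kind of effect the ranking-mechanism experiments would be expected to measure.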
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.09; TP391.3