A Study Combining Web Information Structure and User Behavior Analysis
Published: 2018-11-07 08:58
[Abstract]: As people's information needs grow more diverse, community question answering (CQA) websites have emerged as a service model. Broad user participation has made their information volume grow rapidly, and these large repositories are also a good information source for search engines, so more and more Web users obtain information from them through search. However, the long-term accumulation of information leaves a large amount of "outdated" information in them, which inconveniences users. Web browsing logs record the details of a user's browsing and reflect the user's behavior, intent, and usage habits, which makes them important for analyzing how CQA query users use these sites and how timely the information is. This thesis proposes methods for processing user web browsing logs, including extracting and normalizing query keywords from URLs and partitioning the query process. Based on that partitioning, it statistically analyzes the browsing habits of a large number of real users. The results show that a user spends an average of 6.28 minutes and visits 8 web pages per query; that some queries are carried out in an interleaved, concurrent fashion; and that users use the internal search engines of individual sites relatively often. Combining the browsing-behavior analysis with the inherent characteristics of CQA information, the thesis builds a framework for judging the satisfaction of CQA query users. The results show that both the accuracy and the recall of the classification framework improve clearly once browsing-behavior features are added. An analysis of user satisfaction rate and information timeliness in the Yahoo Chiebukuro Q&A community finds that both differ markedly across question categories.
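The log-processing steps named in the abstract (extracting and normalizing URL query keywords, then partitioning the query process) can be sketched as follows. This is a minimal illustration, not the thesis's actual method, which the abstract does not specify. The parameter-name list `QUERY_PARAMS`, the helper names, and the 30-minute idle-gap threshold are all assumptions introduced here. A plain idle-gap split also does not handle the interleaved, concurrent queries the abstract reports.

```python
import unicodedata
from urllib.parse import urlparse, parse_qs

# Hypothetical list of query-parameter names; real sites vary.
QUERY_PARAMS = ("q", "p", "query", "wd", "keyword")


def normalize(text):
    """Fold full-width characters, ignore case, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.casefold().split())


def extract_keyword(url):
    """Return the normalized search keyword found in a URL, or None."""
    # parse_qs percent-decodes values and turns '+' into a space.
    params = parse_qs(urlparse(url).query)
    for name in QUERY_PARAMS:
        if name in params:
            return normalize(params[name][0])
    return None


def split_sessions(records, max_gap_seconds=1800):
    """Group (timestamp_seconds, url) records into sessions by idle gap.

    The 30-minute default is an assumed threshold, not one from the thesis.
    """
    sessions = []
    for ts, url in sorted(records):
        # Continue the current session if the gap since its last record is small.
        if sessions and ts - sessions[-1][-1][0] <= max_gap_seconds:
            sessions[-1].append((ts, url))
        else:
            sessions.append([(ts, url)])
    return sessions
```

Normalizing before comparison lets keywords that differ only in case, character width, or spacing count as the same query, which matters when consecutive log entries are grouped into one query process.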
[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092
Article ID: 2315842
Article link: http://sikaile.net/guanlilunwen/ydhl/2315842.html