Research on Identifying User Information Needs Based on Web Log Mining
Published: 2019-05-08 04:40
[Abstract]: Information explosion and information disorientation are among the conditions every information user faces today. On the Internet, we hope search engines will find the information we actually need among a vast amount of material. Because of factors such as a user's knowledge, background, and environment, the query terms submitted to a search engine often fail to express the user's information need accurately. Some scholars have studied the patterns of search-engine-based information behavior on their own, hoping to discover user interests from that behavior; others have gathered users' information needs through online questionnaires. This thesis differs in that it combines users' information behavior characteristics with data mining techniques to build a model that identifies user information needs, so that those needs can be obtained automatically, with the expectation that the model can help improve search engine efficiency. The focus is on mining users' query logs through their information behavior characteristics to build an automatic classification model of user information needs.
The thesis first reviews and analyzes the theory of Web log mining and of user information needs, setting out the theoretical basis of the work and posing the research questions. Second, it describes the data preprocessing stage of log mining in detail: the data source, the data format, the cleaning and transformation of log data, user identification, and other preprocessing operations. Third, it classifies users' information search behavior, concentrating on latent search behavior, and uses simple statistical methods to summarize basic behavioral characteristics and patterns of search engine users. Finally, it divides search-engine-based user information needs into navigational needs and informational-transactional needs, and uses users' information behavior characteristics to build an automatic classification model of those needs.
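As a rough illustration of the final step, the sketch below derives two behavioral features per query from click records and applies a threshold rule to separate navigational from informational-transactional needs. The abstract does not state which features or classifier the thesis uses, so the record layout, the feature definitions (`top_click_share`, `clicks_per_user`), the cut-off values, and the sample data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical, already-cleaned log records: (user_id, query, clicked_url).
# This layout and the sample rows are assumptions, not the thesis's data.
LOG = [
    ("u1", "example university homepage", "http://example.edu/"),
    ("u2", "example university homepage", "http://example.edu/"),
    ("u3", "example university homepage", "http://example.edu/"),
    ("u1", "web log mining", "http://example.org/a"),
    ("u1", "web log mining", "http://example.org/b"),
    ("u2", "web log mining", "http://example.org/c"),
    ("u3", "web log mining", "http://example.org/a"),
]


def behaviour_features(records):
    """Return {query: (top_click_share, clicks_per_user)} from click records."""
    clicks = defaultdict(Counter)  # query -> counts of each clicked URL
    users = defaultdict(set)       # query -> users who issued the query
    for user_id, query, url in records:
        clicks[query][url] += 1
        users[query].add(user_id)
    features = {}
    for query, url_counts in clicks.items():
        total = sum(url_counts.values())
        # Share of all clicks that went to the single most-clicked URL.
        top_share = max(url_counts.values()) / total
        features[query] = (top_share, total / len(users[query]))
    return features


def classify(top_share, clicks_per_user, share_cut=0.8, clicks_cut=1.5):
    """Threshold rule; both cut-offs are illustrative assumptions."""
    if top_share >= share_cut and clicks_per_user <= clicks_cut:
        return "navigational"
    return "informational-transactional"


labels = {q: classify(*f) for q, f in behaviour_features(LOG).items()}
```

The intuition behind this sketch is that users with a navigational need tend to click one dominant result and stop, while informational or transactional needs spread clicks over several results. A classifier trained on labeled queries could replace the fixed thresholds.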
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: G350
Article ID: 2471626
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2471626.html