Research on Web Log Mining Techniques for a Teaching Resource Search Platform
Published: 2018-05-19 02:10
Topics: Web log mining + data preprocessing; Source: Guangxi University, master's thesis, 2014
[Abstract]: As Web applications multiply, Web databases keep growing in both scale and data volume. Web log mining applies data mining techniques to Web server logs in order to uncover latent rules and patterns, which are then applied to areas such as site architecture design and personalized services. The process is usually divided into three stages: data preprocessing, pattern discovery, and pattern analysis. Of these, data preprocessing is the most important, because it directly affects the performance and the results of the algorithms used in the subsequent pattern recognition and pattern analysis. Session identification is the main part of data preprocessing and also its most basic and critical step. The main work of this thesis is as follows.
(1) A Web session identification method based on a dynamic time threshold is presented. Several commonly used session identification methods are described in detail, and the strengths and weaknesses of each are analyzed. Building on time-based heuristic identification, an improved method is proposed that treats a request for the site's home page as the start of a new session and uses a dynamic time threshold to decide session boundaries. The algorithm flowchart and implementation details are given. Experimental results show that the improved method not only identifies more real user sessions but also effectively improves the precision and recall of session identification.
(2) A teaching resource search platform based on Web log mining is designed. The platform processes the IIS logs of the Guangxi University of Chinese Medicine website, using the log records from one day in July 2013 as the analysis data. The overall system architecture is designed, the functions of the main modules are described in detail, the data table structures and the flowchart for each step are given, and a prototype system is implemented.
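The session identification idea in point (1) can be illustrated with a minimal Python sketch. The abstract does not say how the dynamic threshold is computed, so the rule used here (a multiple of the mean gap observed so far in the current session, clamped between a lower and an upper bound) is purely an assumption for illustration and is not the thesis's formula. The `Request` type, the `HOME_PAGES` set, and all constants are likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Request:
    user: str        # hypothetical user key, e.g. client IP plus user agent
    time: datetime   # request timestamp parsed from the log
    url: str         # requested path


# All values below are assumptions for illustration only.
HOME_PAGES = {"/", "/index.html"}      # URLs treated as the site home page
MIN_TIMEOUT = timedelta(minutes=10)    # lower bound on the threshold
MAX_TIMEOUT = timedelta(minutes=30)    # upper bound on the threshold
FACTOR = 3.0                           # multiple of the mean in-session gap


def dynamic_threshold(gaps: List[timedelta]) -> timedelta:
    """Assumed rule: FACTOR times the mean gap, clamped to [MIN, MAX]."""
    if not gaps:
        return MIN_TIMEOUT
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    return min(MAX_TIMEOUT, max(MIN_TIMEOUT, mean_gap * FACTOR))


def identify_sessions(requests: List[Request]) -> Dict[str, List[List[Request]]]:
    """Split each user's requests into sessions.

    A new session starts when the home page is requested or when the gap
    since the previous request exceeds the current dynamic threshold.
    """
    by_user: Dict[str, List[Request]] = {}
    for r in sorted(requests, key=lambda r: r.time):
        by_user.setdefault(r.user, []).append(r)

    result: Dict[str, List[List[Request]]] = {}
    for user, reqs in by_user.items():
        sessions: List[List[Request]] = []
        current: List[Request] = []
        gaps: List[timedelta] = []
        for r in reqs:
            if current:
                gap = r.time - current[-1].time
                if r.url in HOME_PAGES or gap > dynamic_threshold(gaps):
                    sessions.append(current)   # close the current session
                    current, gaps = [], []
                else:
                    gaps.append(gap)           # gap belongs to this session
            current.append(r)
        if current:
            sessions.append(current)
        result[user] = sessions
    return result


def _at(minutes: int) -> datetime:
    """Helper for building sample timestamps on a hypothetical log day."""
    return datetime(2013, 7, 1, 9, 0) + timedelta(minutes=minutes)


sample = [
    Request("u1", _at(0), "/"),
    Request("u1", _at(1), "/a"),
    Request("u1", _at(2), "/b"),
    Request("u1", _at(20), "/c"),   # long gap: starts a new session
    Request("u1", _at(21), "/"),    # home page: starts a new session
    Request("u1", _at(22), "/d"),
    Request("u2", _at(0), "/x"),
    Request("u2", _at(5), "/y"),
    Request("u2", _at(10), "/z"),
    Request("u2", _at(22), "/w"),   # 12-minute gap, under the adapted threshold
]
sessions = identify_sessions(sample)
```

Under these assumed constants, a slow reader such as `u2` keeps a single session across a 12-minute pause, which a fixed 10-minute timeout would have split. That is the kind of behavior a dynamic threshold is meant to capture, although the thesis may define the threshold quite differently.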
[Degree-granting institution]: Guangxi University
[Degree level]: Master's
[Year conferred]: 2014
[Classification numbers]: TP391.1; TP393.09
Article ID: 1908236
Article link: http://sikaile.net/guanlilunwen/ydhl/1908236.html