Research on a Personalized Recommendation System Model Based on Web Log Mining and Association Rules

Published: 2018-08-17 09:30

[Abstract]: With the rapid development of science and technology, the wealth of information available on the Internet has helped industries upgrade, but it has also created problems. The explosive growth of information leads to "information overload". At the same time, to offer Internet users more comprehensive resources, website operators and administrators keep adding content to their Web sites, which makes site topology increasingly complex. Because newly added resources may not match users' real needs, users can easily become lost while browsing a site ("resource disorientation"). Finding the information people actually care about in massive amounts of data is therefore a pressing problem, and it motivated the application of data mining to Web site analysis, known as Web mining.

Web mining is an interdisciplinary technology that draws on Web technology, data mining, informatics, and computational linguistics. It is useful in many areas, such as mining the structure of search engines, identifying authoritative pages, Web document classification, Web usage mining, intelligent querying, and building Metaweb data warehouses. Web usage mining discovers user behavior characteristics and navigation patterns from server logs. This thesis systematically describes the full workflow of data mining, Web mining, and Web usage mining, and focuses on three topics: Web log preprocessing, the association rule mining model, and the sliding-window recommendation model.

First, Web log preprocessing consists of data cleaning, user identification, session identification, path completion, and transaction identification. Preprocessing removes a large amount of irrelevant data from the user access records.
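The first three preprocessing steps can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual heuristics, so everything here is an assumption: the record layout, the list of static-file suffixes, identifying a user by the (IP, user agent) pair, and the 30-minute session timeout. Path completion and transaction identification are left out.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (ip, user_agent, timestamp, url, status).
RAW_LOG = [
    ("10.0.0.1", "Firefox", datetime(2014, 1, 1, 9, 0), "/index.html", 200),
    ("10.0.0.1", "Firefox", datetime(2014, 1, 1, 9, 0), "/logo.gif", 200),
    ("10.0.0.1", "Firefox", datetime(2014, 1, 1, 9, 5), "/news.html", 200),
    ("10.0.0.1", "Firefox", datetime(2014, 1, 1, 10, 30), "/index.html", 200),
    ("10.0.0.2", "Chrome", datetime(2014, 1, 1, 9, 1), "/index.html", 200),
    ("10.0.0.2", "Chrome", datetime(2014, 1, 1, 9, 2), "/missing.html", 404),
]

STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")  # assumed list
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold


def clean(records):
    """Data cleaning: drop failed requests and embedded static resources."""
    return [
        r for r in records
        if 200 <= r[4] < 300 and not r[3].lower().endswith(STATIC_SUFFIXES)
    ]


def identify_users(records):
    """User identification: treat each (IP, user agent) pair as one user."""
    by_user = defaultdict(list)
    for ip, agent, ts, url, _status in records:
        by_user[(ip, agent)].append((ts, url))
    return by_user


def identify_sessions(by_user, timeout=SESSION_TIMEOUT):
    """Session identification: start a new session after a gap longer than timeout."""
    sessions = []
    for user, visits in by_user.items():
        ordered = sorted(visits)  # chronological order
        current = [ordered[0][1]]
        for (prev_ts, _), (ts, url) in zip(ordered, ordered[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


sessions = identify_sessions(identify_users(clean(RAW_LOG)))
for user, pages in sessions:
    print(user, pages)
```

In this toy log the image request and the 404 response are discarded, and the 85-minute gap splits the first user's visits into two sessions.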
Preprocessing also structures the user access records and stores them in a relational database as transactions or sessions.

Next, the thesis mines the preprocessed data with weighted association rules. The classical association rule mining algorithm Apriori can discover relationships between visited Web pages and also plays an important role in discovering users' preferred navigation patterns. However, applying Apriori to Web log mining has limitations of its own. Apriori implicitly assumes that all pages are equally important and ignores the differences between them, so the mined results may miss pages that users are actually interested in. To address this shortcoming, the thesis introduces the concept of "page weight", which reflects a user's real preference for a page. The page weight combines two factors, the time a user spends viewing a page and how often the page is visited, and on this basis the thesis proposes the W-Apriori algorithm. The algorithm represents the transaction database as an extended Boolean matrix, which helps compress the database. The weights also help distinguish pages from one another and effectively address the problem of important pages being missed during mining.

Finally, the thesis collects the mined rules into a rule base and, combining it with a sliding-window technique, designs and implements a Web log recommendation model based on association rule mining. The model helps address "information overload" and "resource disorientation", and it recommends pages of interest to the relevant Web users, making the recommendations personalized.
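The following Python sketch illustrates the general idea of weighted rules over a Boolean matrix combined with a sliding-window recommender. It is not the thesis's W-Apriori algorithm, whose formulas the abstract does not state. The assumptions are: a plain 0/1 session-by-page matrix (not the "extended" form), a page weight equal to the mean of max-normalized viewing time and visit count, weighted support defined as mean item weight times plain support, hypothetical thresholds, and brute-force enumeration of single-page rules with no candidate pruning.

```python
from itertools import combinations

PAGES = ["A", "B", "C", "D"]
# Boolean matrix: one row per session, one column per page (1 = page visited).
MATRIX = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]
# Hypothetical per-page totals: seconds viewed and number of visits.
VIEW_SECONDS = {"A": 120, "B": 300, "C": 60, "D": 30}
VISITS = {"A": 4, "B": 4, "C": 4, "D": 1}


def page_weights(seconds, visits):
    """Assumed weight: mean of max-normalized viewing time and visit count."""
    max_s, max_v = max(seconds.values()), max(visits.values())
    return {p: 0.5 * seconds[p] / max_s + 0.5 * visits[p] / max_v for p in seconds}


def support(itemset):
    """Fraction of sessions (rows) that contain every page in itemset."""
    cols = [PAGES.index(p) for p in itemset]
    return sum(all(row[c] for c in cols) for row in MATRIX) / len(MATRIX)


def weighted_support(itemset, weights):
    """Assumed definition: mean page weight times plain support."""
    return sum(weights[p] for p in itemset) / len(itemset) * support(itemset)


def mine_rules(weights, min_wsup=0.4, min_conf=0.6):
    """Enumerate page pairs and keep rules x -> y that pass both thresholds."""
    rules = []
    for a, b in combinations(PAGES, 2):
        pair_sup = support((a, b))
        if pair_sup == 0 or weighted_support((a, b), weights) < min_wsup:
            continue
        for x, y in ((a, b), (b, a)):
            conf = pair_sup / support((x,))
            if conf >= min_conf:
                rules.append(((x,), y, conf))
    return rules


def recommend(session, rules, window=2, top_n=2):
    """Match rules against the last `window` pages; skip pages already seen."""
    recent, seen = set(session[-window:]), set(session)
    scores = {}
    for antecedent, consequent, conf in rules:
        if set(antecedent) <= recent and consequent not in seen:
            scores[consequent] = max(conf, scores.get(consequent, 0.0))
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


WEIGHTS = page_weights(VIEW_SECONDS, VISITS)
RULES = mine_rules(WEIGHTS)
print(recommend(["A", "B"], RULES))
print(recommend(["D", "A"], RULES))
```

In this toy data the pairs {A, B}, {A, C}, and {B, C} all have the same plain support, but {A, C} falls below the weighted threshold because both pages have short viewing times, so no rule links A and C. Under this particular definition weighted support is not guaranteed to be anti-monotone, which is why the sketch enumerates pairs directly instead of claiming Apriori-style pruning.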
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.3

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2187191.html
