

Research on a Personalized Recommendation System Model Based on Web Log Mining and Association Rules

Published: 2018-08-17 09:30
[Abstract]: With the rapid development of science and technology, the wealth of information available on the Internet has helped industries upgrade, but it has also brought problems: the explosive growth of information leads to "information overload". Meanwhile, to offer Internet users more comprehensive information resources, website operators and administrators keep adding content to their Web sites, which makes site topology increasingly complex. Because newly added resources may not match users' real needs, users can easily lose their way among them while browsing, a problem referred to here as "resource disorientation". How to find the information people are interested in within massive data is therefore the problem we face, and it motivated the application of data mining to Web site analysis, namely Web mining.

Web mining is an interdisciplinary technology that draws on Web technology, data mining, informatics, computational linguistics and other fields. It is useful in many areas, such as mining the structure of search engines, identifying authoritative pages, Web document classification, Web usage mining, intelligent query, and building Metaweb data warehouses. Web usage mining discovers user behavior characteristics and navigation patterns from server logs. This thesis systematically describes the whole process of data mining, Web mining and Web usage mining, and focuses on three topics: Web log preprocessing, an association rule mining model, and a sliding-window recommendation model.

First, Web log preprocessing comprises data cleaning, user identification, session identification, path completion and transaction identification. Preprocessing removes a large amount of irrelevant data from the user access records, structures the access information, and stores it in a relational database as transactions or sessions.

Next, the preprocessed data is mined with weighted association rules. The classical association rule mining algorithm Apriori can discover relationships among visited Web pages and plays an important role in discovering users' preferred navigation patterns. However, applying Apriori to Web log mining has a limitation: the algorithm implicitly assumes that all pages are equally important and ignores the differences between pages, so the rules it mines may omit some pages that users are interested in.

To address this shortcoming, the thesis introduces the concept of "page weight", which reflects users' real preference for a page. The page weight combines two factors, the time users spend browsing a page and how often they visit it, and on this basis the W-Apriori algorithm is proposed. The algorithm represents the transaction database as an extended Boolean matrix, which helps compress the database. The weights also help distinguish pages from one another and effectively address the problem of important pages being missed during mining.

Finally, the mined rules are collected into a rule base and combined with a sliding-window technique to design and implement a Web log recommendation model based on association rule mining. The model not only effectively addresses "information overload" and "resource disorientation" but also recommends pages of interest to the relevant Web users, achieving personalized recommendation.
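The abstract says the page weight combines browsing time and visit frequency but does not give the formula. The sketch below is one possible reading, not the thesis's W-Apriori definition: it blends the two factors linearly and scales an itemset's ordinary support by the mean weight of its pages. The sample `sessions`, the mixing parameter `alpha`, and both function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical preprocessed sessions: lists of (page, seconds_viewed) pairs.
sessions = [
    [("/home", 5), ("/products", 40), ("/cart", 20)],
    [("/home", 3), ("/products", 60)],
    [("/home", 4), ("/about", 90)],
]


def page_weights(sessions, alpha=0.5):
    """Blend normalized visit frequency and normalized dwell time per page.

    alpha is an assumed mixing parameter; the thesis abstract does not
    specify how the two factors are combined.
    """
    visits = defaultdict(int)
    dwell = defaultdict(float)
    for session in sessions:
        for page, seconds in session:
            visits[page] += 1
            dwell[page] += seconds
    max_visits = max(visits.values())
    max_dwell = max(dwell.values())
    return {
        page: alpha * visits[page] / max_visits
        + (1 - alpha) * dwell[page] / max_dwell
        for page in visits
    }


def weighted_support(itemset, sessions, weights):
    """Ordinary support of the itemset, scaled by the mean weight of its pages.

    This is one common form of weighted support, assumed here for illustration.
    """
    itemset = set(itemset)
    hits = sum(1 for s in sessions if itemset <= {page for page, _ in s})
    support = hits / len(sessions)
    mean_weight = sum(weights[p] for p in itemset) / len(itemset)
    return support * mean_weight


weights = page_weights(sessions)
home_products = weighted_support({"/home", "/products"}, sessions, weights)
```

With this sample data, `/about` is visited only once but held attention for a long time, so it ends up with a higher weight than the frequently visited `/home`. That is the kind of page a frequency-only Apriori pass would tend to overlook.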
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP391.3



Article ID: 2187191


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2187191.html

