
Research and Application of a Click-Stream Data Warehouse in E-Commerce

Published: 2018-11-18 13:25

[Abstract]: Advances in database technology have greatly improved the office efficiency of enterprises. Because databases are now so widely used, the volume of business data that enterprises store has grown rapidly. Much of this data cannot be turned into useful information. The result is a situation of "rich data, poor information", in which an enterprise's investment in databases fails to produce returns. Data warehouses can store large volumes of historical data, and their emergence has gone a long way toward solving this problem. A traditional data warehouse, however, loads data only from business databases. As the Internet has grown, Web data has become an increasingly important data source. Among Web data, Web logs are an especially valuable kind of behavioral data: they help decision makers understand user habits and then act in a targeted way. Against this background, this thesis builds a click-stream data warehouse and implements a user clustering algorithm based on implicitly associated pages. It also describes how the clustering algorithm can be applied in e-commerce.

The click-stream data warehouse built in this thesis targets an e-commerce environment and uses Web logs as a major data source. Its design follows the architecture advocated by Inmon: a central data warehouse plus dependent data marts. The data warehouse is built with a relational model, and the data marts are built with a dimensional model. The warehouse is the data foundation for management decisions. It stores large volumes of detailed, low-granularity historical business data in third normal form. The dependent data marts, in contrast, are built to meet specific user requirements. This architecture balances access efficiency against the flexibility to adjust the structure later. On top of the click-stream data warehouse, the thesis presents a vector-based click-stream user clustering algorithm.
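The warehouse-plus-dependent-mart split can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module and an in-memory database. The schema (`page`, `visitor`, `click`, `dim_page`, `fact_daily_page_views`) and the sample rows are hypothetical and far simpler than a real click-stream design. The warehouse layer keeps normalized, click-level rows. The dependent mart is then derived from the warehouse rather than loaded from the raw logs.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Build a toy 3NF click-stream warehouse and derive a dependent star-schema mart.

    All table and column names are hypothetical and heavily simplified.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Warehouse layer: normalized (3NF); lowest grain is one row per logged click.
    cur.executescript("""
        CREATE TABLE page    (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
        CREATE TABLE visitor (visitor_id INTEGER PRIMARY KEY, ip TEXT NOT NULL);
        CREATE TABLE click (
            click_id   INTEGER PRIMARY KEY,
            visitor_id INTEGER NOT NULL REFERENCES visitor(visitor_id),
            page_id    INTEGER NOT NULL REFERENCES page(page_id),
            ts         TEXT NOT NULL  -- ISO-8601 timestamp parsed from the Web log
        );
    """)
    cur.executemany("INSERT INTO page VALUES (?, ?)",
                    [(1, "/index"), (2, "/item/42"), (3, "/cart")])
    cur.executemany("INSERT INTO visitor VALUES (?, ?)",
                    [(1, "10.0.0.1"), (2, "10.0.0.2")])
    cur.executemany(
        "INSERT INTO click (visitor_id, page_id, ts) VALUES (?, ?, ?)",
        [(1, 1, "2014-03-01T10:00:00"), (1, 2, "2014-03-01T10:01:10"),
         (1, 3, "2014-03-01T10:02:30"), (2, 1, "2014-03-01T11:00:00"),
         (2, 2, "2014-03-02T09:15:00")],
    )

    # Dependent data mart: a dimensional model loaded only from the warehouse
    # tables above, never directly from the raw logs.
    cur.executescript("""
        CREATE TABLE dim_page AS SELECT page_id, url FROM page;
        CREATE TABLE fact_daily_page_views AS
            SELECT date(ts) AS view_date, page_id,
                   COUNT(*) AS views,
                   COUNT(DISTINCT visitor_id) AS visitors
            FROM click
            GROUP BY date(ts), page_id;
    """)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    query = """
        SELECT f.view_date, d.url, f.views, f.visitors
        FROM fact_daily_page_views AS f JOIN dim_page AS d USING (page_id)
        ORDER BY f.view_date, d.url
    """
    for row in conn.execute(query):
        print(row)
```

With these sample rows, the mart holds one row per (day, page) pair. For example, `/index` on 2014-03-01 has 2 views from 2 distinct visitors. Because a mart is rebuilt from the warehouse, changing reporting needs means reshaping a mart, not reloading the logs. That is one way to read the flexibility claimed for this architecture.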
The proposed algorithm maps each user's click-stream data to a vector. It then judges how similar two users are by the size of the angle between their vectors. The dimensions of these vectors are the associated page groups produced by an implicit association page mining algorithm. Implicitly associated pages reflect users' browsing habits well, so they bring out the topics users are interested in more clearly. The algorithm was verified on an experimental data warehouse built for this thesis. The experiments show that the algorithm can effectively identify users' target pages and can discover implicit associations involving more than two pages. The resulting user clustering also adapts better to the complex Internet environment.
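The vector model described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation:

- The page groups are given as input. They stand in for the output of the implicit association page mining step, which is not reproduced here.
- Each vector component is assumed to be the fraction of a group's pages that the user visited.
- Similarity is the cosine of the angle θ between two user vectors u and v, cos θ = (u · v) / (‖u‖ ‖v‖). A smaller angle therefore means more similar users.

The abstract does not say which clustering procedure the thesis uses. A simple single-pass "leader" clustering with a hypothetical cosine threshold of 0.8 is swapped in here. All function names are illustrative.

```python
import math
from typing import Dict, List, Set


def to_vector(visited: Set[str], page_groups: List[Set[str]]) -> List[float]:
    """Map one user's visited pages to a vector with one dimension per page group.

    Assumed weighting: the fraction of the group's pages that the user visited.
    """
    return [len(visited & group) / len(group) for group in page_groups]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (1.0 means the same direction).

    A zero vector (the user touched no page group) is treated as similar to nothing.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_users(sessions: Dict[str, Set[str]],
                  page_groups: List[Set[str]],
                  threshold: float = 0.8) -> List[List[str]]:
    """Single-pass "leader" clustering on cosine similarity (an illustrative choice).

    A user joins the first cluster whose leader vector has cosine >= threshold
    (i.e. angle <= arccos(threshold), about 36.9 degrees for 0.8); otherwise
    the user starts a new cluster and becomes its leader.
    """
    leaders: List[List[float]] = []
    clusters: List[List[str]] = []
    for user, pages in sessions.items():
        vec = to_vector(pages, page_groups)
        for i, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[i].append(user)
                break
        else:
            leaders.append(vec)
            clusters.append([user])
    return clusters


if __name__ == "__main__":
    # Hypothetical page groups standing in for the output of an
    # implicit association page mining step.
    groups = [{"/item/42", "/review/42", "/cart"}, {"/help", "/faq"}]
    sessions = {
        "u1": {"/index", "/item/42", "/cart"},
        "u2": {"/item/42", "/review/42", "/cart"},
        "u3": {"/help", "/faq"},
        "u4": {"/index", "/faq"},
    }
    print(cluster_users(sessions, groups))  # [['u1', 'u2'], ['u3', 'u4']]
```

Note that pages outside every group (here `/index`) do not affect a user's vector. That is the point of using page groups rather than raw pages as dimensions. Leader clustering is order-dependent and compares each user only with a cluster's first member. A procedure such as k-means on unit-normalized vectors, or hierarchical clustering, could be substituted without changing the cosine similarity itself.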
[Degree-granting institution]: Liaoning University of Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP311.13


