
Design and Implementation of a Search Log Analysis System for an E-commerce Website

Published: 2018-04-25 09:32

Topic: log analysis + keywords; Source: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology, Chinese Academy of Sciences), 2017 master's thesis

[Abstract]: With the rapid development of the Internet and the sharp increase in the number of websites, competition among websites for users has become increasingly fierce. To attract and retain users, a website needs a better understanding of how its users behave, and analyzing search engine logs has become the main way to extract useful user-behavior data from massive data sets. To better capture the actual needs of website users and understand their intentions, this thesis designs and implements a website search log analysis system that helps the website serve its customers better and grow quickly. Search engines on different websites serve different target groups. The object of study here is the search log of a website in the e-commerce industry; a log analysis system is built to understand the behavior patterns of the site's users and to mine their latent needs. The greatest difficulty in designing the system is how to search massive log data both quickly and accurately. The main research content is as follows:

1. Search logs are collected in the NCSA extended log format, and each analysis item on the site's pages is recorded with a tag. Logs are collected with the open-source Apache server and the Flume massive-log collection system, which makes log collection efficient, accurate, and timely, reduces the burden of development and testing, and lowers risk. Tagging each statistical item on a page keeps log analysis simple and accurate and reduces its workload.

2. Logs are analyzed on the distributed processing platform Hadoop. The thesis examines the key technologies of distributed processing based on HDFS file storage and Map/Reduce, and describes the implementation of the log analysis in detail. Using Hadoop addresses the timeliness and accuracy of analyzing massive logs; it also keeps the code simple, greatly reduces development difficulty, and noticeably improves project efficiency.

3. An analysis model of user behavior and a scoring model of user information quality are designed and implemented. These two models yield users' browsing preferences on the site, the quality of user information, and keyword relevance. A user browsing-preference model and an information clustering model are built on them, providing data support for information aggregation and personalized search.

Finally, the results from two weeks of operation of the deployed system were analyzed, and the search ranking settings and cluster display were readjusted according to that analysis. This improved the effectiveness of the search in use, and the system met its expected goals.
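The abstract names the NCSA extended log format for collection and Map/Reduce for analysis. The following is a minimal single-process sketch of that idea in Python, not the thesis's implementation: it parses lines in the NCSA extended (combined) format and counts search keywords with a map step and a reduce step. The sample log lines, the `/search` path, and the query parameter name `q` are assumptions made for illustration; the thesis does not specify them, and its actual jobs run on Hadoop rather than in one process.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# NCSA extended (combined) log format:
# host ident authuser [date] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def map_line(line, param="q"):
    """Map step: emit (keyword, 1) for each search keyword in one log line.

    `param` is the hypothetical name of the search query parameter.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return []  # skip malformed lines
    parts = match.group("request").split()
    if len(parts) < 2:
        return []  # request line has no URL
    query = parse_qs(urlsplit(parts[1]).query)
    return [(kw.strip().lower(), 1) for kw in query.get(param, []) if kw.strip()]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per keyword."""
    totals = Counter()
    for keyword, count in pairs:
        totals[keyword] += count
    return totals


# Hypothetical sample log lines, for illustration only.
sample_logs = [
    '10.0.0.1 - - [25/Apr/2017:09:32:00 +0800] "GET /search?q=phone HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [25/Apr/2017:09:33:10 +0800] "GET /search?q=Phone+case HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [25/Apr/2017:09:34:45 +0800] "GET /search?q=phone HTTP/1.1" 200 498 "-" "Mozilla/5.0"',
    'malformed line',
]

pairs = [pair for line in sample_logs for pair in map_line(line)]
keyword_counts = reduce_counts(pairs)
print(keyword_counts.most_common())
```

On a cluster, the map and reduce functions would run as separate distributed tasks over log files stored in HDFS; the split into `map_line` and `reduce_counts` here only mirrors that structure.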
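[Degree-granting institution]: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology, Chinese Academy of Sciences)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP393.092;TP391.3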


Article ID: 1800770


Article link: http://sikaile.net/guanlilunwen/ydhl/1800770.html
