Design and Implementation of a Web Content-Based Business Insight System

Published: 2018-04-29 17:23

  Topics: URL analysis + web page classification; Source: Master's thesis, Beijing University of Posts and Telecommunications, 2017


[Abstract]: The Internet era is an era of information explosion. Users browse a wide variety of network resources and develop their own browsing habits, so the set of resources a single user accesses reflects, to some extent, that user's browsing habits and interests. Current processing of these access logs generally relies on DPI (deep packet inspection) for routine field statistics and does not analyze the actual content carried in the packets; where content is analyzed at all, the analysis is limited to the target text of the page a URL points to. This ignores factors such as the structural characteristics of URL resources and ultimately lowers the accuracy of content analysis. Methods that also take background knowledge about URL resources as input, and that exploit the multi-level structure of URLs and the characteristics of different page types to extract and analyze information from Web content (Web pages and URLs), have therefore become a research focus.

Against the background of network operators' need to gain business insight into their users, this thesis studies the technical solutions required to implement business insight based on Web content, and designs and develops a Web content-based business insight system. The main research contents are:

1. Content extraction from different types of web pages (news, video, and e-commerce). The thesis analyzes the structural characteristics of each page type, designs and implements a corresponding extraction method, and applies these methods in the URL analysis and Web content analysis modules.
2. Acquisition of URL tag information. The thesis analyzes the structural characteristics and background knowledge of URLs and derives a method that recognizes URL information and manages it automatically in a unified way.
3. The platform architecture of the system. Starting from the requirements, the thesis integrates the separate techniques as functional modules and turns them into a complete system.

Based on the solutions obtained from this research, the thesis implements a method for obtaining multi-level tags for web page information; a method that splits a URL into several fields and classifies and parses the content of each field; and a method for matching and identifying information by searching network resources. Tests verify the effectiveness of these methods. Building on these key techniques, the thesis completes the development of the Web content-based business insight system. Taking the set of request-URL fields in users' network access logs as input, the system provides URL analysis, web page classification, Web content analysis, and rule management. It converts the URL field set into behavioral feature information about users, which provides a basis for user feature extraction and a prerequisite for service providers such as network operators to gain business insight into their users.
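The abstract describes splitting each requested URL into fields, classifying each field, attaching multi-level tags, and turning one user's URL set into behavioral features. The thesis record does not give its actual rules or code, so the following is only a minimal Python sketch under stated assumptions: the rule tables, the example domains (`news.example.com` and so on), and the function names `split_url`, `match_host`, `label_url`, and `behavior_profile` are all hypothetical illustrations, not the author's method.

```python
from collections import Counter
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical first-level rules: host -> page category.
# Illustrative entries only; not taken from the thesis.
HOST_RULES: Dict[str, str] = {
    "news.example.com": "news",
    "video.example.org": "video",
    "shop.example.net": "e-commerce",
}

# Hypothetical second-level rules: (category, first path segment) -> sub-tag.
PATH_RULES: Dict[tuple, str] = {
    ("news", "sports"): "news/sports",
    ("news", "finance"): "news/finance",
    ("video", "movie"): "video/movie",
    ("e-commerce", "item"): "e-commerce/product-page",
}


def split_url(url: str) -> dict:
    """Split a URL into fields: host, non-empty path segments, query parameters."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname or "",  # urlsplit already lowercases the hostname
        "segments": [s for s in parts.path.split("/") if s],
        "query": parse_qs(parts.query),
    }


def match_host(host: str) -> Optional[str]:
    """Return the category whose rule host equals `host` or is a parent domain of it."""
    for rule_host, category in HOST_RULES.items():
        if host == rule_host or host.endswith("." + rule_host):
            return category
    return None


def label_url(url: str) -> List[str]:
    """Return multi-level tags for one URL, most general tag first."""
    fields = split_url(url)
    category = match_host(fields["host"])
    if category is None:
        return ["unknown"]
    tags = [category]
    if fields["segments"]:
        sub_tag = PATH_RULES.get((category, fields["segments"][0]))
        if sub_tag:
            tags.append(sub_tag)
    return tags


def behavior_profile(urls: List[str]) -> Counter:
    """Count tags over one user's requested URLs as a simple behavior feature vector."""
    counts: Counter = Counter()
    for url in urls:
        counts.update(label_url(url))
    return counts
```

For example, `behavior_profile(["https://news.example.com/sports/a.html", "https://news.example.com/finance/b.html"])` counts `news` twice and `news/sports` and `news/finance` once each. Keeping the rules as data tables rather than hard-coded branches is one plausible way to support the rule management function the abstract lists, since rules can then be added or edited without code changes; this too is an assumption about design, not a description of the thesis system.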
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP393.09
