Research on Web Mining Technology and Its Applications on the Internet

Published: 2018-10-26 19:57
[Abstract]: With the continuous development of information technology, computer and communication technologies not only drive the informatization of modern society but also influence and change people's daily lives. Information technology has also brought explosive growth in data, and people urgently need a way to use and process massive data effectively. Data mining emerged against this big-data background. Web mining, a branch of that field, targets the effective organization and use of the massive data on the World Wide Web. Because Internet technology changes rapidly while Web mining developed relatively late, this thesis takes Web mining as its research core and analyzes its applications in the Internet domain in depth.

The thesis first introduces the research background, current state, technical difficulties, and future directions of Web technology, and explains related concepts such as data mining and machine learning. It then turns to the implementation process and application scenarios of Web mining, covering the core steps of text preprocessing and the technical background of two applications: topic detection and tracking, and user behavior analysis.

As an important application of Web content mining, topic detection and dynamic tracking aims to detect previously unknown topics and to track the subsequent development of existing ones.

For topic detection and dynamic tracking over news event reports in online media, the thesis implements a hybrid clustering solution. The solution adjusts the topic model hierarchically based on a "contribution degree", which makes it better suited to building Internet news topics and greatly improves efficiency. Experiments on real Internet news data show that, compared with the K-Means algorithm, the solution significantly improves precision and recall, and that the resulting topic tree model has a clearly hierarchical structure.

For topic detection and dynamic tracking over Chinese microblog (Weibo) text, the thesis proposes an incremental fuzzy clustering solution based on topic words. The solution first proposes a spam filtering scheme tailored to the textual characteristics of microblogs. It then uses two factors, timeliness and word frequency, to assign topic words weights adapted to microblog characteristics. Finally, it uses incremental fuzzy clustering to detect bursty topics. Experiments on real microblog data show that the solution effectively detects emergencies, hot topics, and similar events, with reasonably good time efficiency.

As an important application of Web usage mining, user behavior analysis aims to understand user habits and interests and to evaluate users' satisfaction with a product, in order to improve the product and the user experience.

For evaluating user satisfaction with search engines, the thesis describes an automated solution based on user behavior. The solution first preprocesses the raw web logs, deriving concrete user action data from the log data and extracting features. It then proposes a recommendation technique based on the CURE algorithm for selecting samples, which are labeled manually. Finally, it uses dynamic modeling to build the user satisfaction model. Experiments on real search engine data show that the automated, machine-learning-based evaluation approaches the level of manual evaluation and meets the requirements of practical use, and that the dynamic model, through multi-model construction, automatic updating, and feedback correction, can effectively extend its life cycle and improve the continuity of learning.
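The abstract names the ingredients of the microblog approach (topic-word weights built from timeliness and word frequency, followed by incremental fuzzy clustering) but gives neither the weighting formula nor the clustering rule. The Python sketch below is therefore only an illustration under stated assumptions, not the thesis's method: the exponential half-life decay, the cosine similarity, the threshold value, and every name (`topic_word_weights`, `IncrementalFuzzyClusterer`, `half_life_hours`) are hypothetical choices made for this example.

```python
import math
from collections import Counter


def topic_word_weights(tokens, age_hours, half_life_hours=6.0):
    """Weight words by relative frequency, damped by the post's age.

    Assumption: timeliness is modeled as an exponential half-life decay.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    decay = 0.5 ** (age_hours / half_life_hours)  # 1.0 for a brand-new post
    return {word: (count / total) * decay for word, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse word -> weight vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalFuzzyClusterer:
    """Processes posts one at a time; a post may belong to several topics."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.centroids = []         # one word -> weight dict per topic

    def add(self, vector):
        """Return {topic_index: membership} for one incoming post."""
        sims = {i: cosine(vector, c) for i, c in enumerate(self.centroids)}
        close = {i: s for i, s in sims.items() if s >= self.threshold}
        if not close:
            # Nothing similar enough: treat the post as a new topic.
            self.centroids.append(dict(vector))
            return {len(self.centroids) - 1: 1.0}
        # Fuzzy step: memberships are similarities normalized to sum to 1.
        total = sum(close.values())
        memberships = {i: s / total for i, s in close.items()}
        # Cosine ignores overall scale, so the age decay takes effect here:
        # an older post pulls each centroid less than a fresh one.
        for i, membership in memberships.items():
            centroid = self.centroids[i]
            for word, value in vector.items():
                centroid[word] = centroid.get(word, 0.0) + membership * value
        return memberships
```

In use, each tokenized post would be passed through `topic_word_weights` and then to `IncrementalFuzzyClusterer.add`. A post that matches no existing topic opens a new one, while a post that overlaps several topics receives a partial membership in each. A real system would also need the spam filtering step the abstract mentions and a rule for deciding when a topic counts as bursty, both of which are omitted here.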
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP311.13; TP391.1

Article ID: 2296793



Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2296793.html

