Research on a Web Advertisement Clustering Method Based on Multi-Feature Information Fusion

Published: 2018-04-20 14:51

  Topic: Web advertising + multi-feature; Source: Harbin Institute of Technology, master's thesis, 2014


[Abstract]: With the rapid development of the Internet, Web advertising has become an important source of revenue for Internet service providers and an effective channel through which many traditional industries promote their brands and products. Massive Web advertising data conceals high-value information and knowledge, so mining it effectively has become a key problem for many Internet applications. In Web advertising data mining, cluster analysis is an important basic technique: it can be used to analyze competitors, and it can help governments and assessment agencies evaluate and forecast economic development. Web advertising data contains many kinds of features, but no single feature describes a Web advertisement completely; fusing multiple features makes a comprehensive description possible. This thesis therefore studies a Web advertisement clustering method based on multi-feature information fusion. The main work is as follows:
(1) The characteristics of Web advertisements are analyzed, and relevant datasets are collected and constructed. Feature extraction methods for Web advertising data are studied, and one text feature extraction method based on fuzzy matching and four image feature extraction methods are implemented.
(2) The feature space of Web advertising data is high-dimensional and sparse, and the separation of two clusters is often determined by a very small number of features. To distinguish the importance of these few features, the thesis improves the objective function of EW-kmeans so that it accounts for the effect of both between-cluster and within-cluster distances on clustering quality, proposes a third-order tensor weighted k-means method based on a discriminative subspace (Dkmeans), and gives the related theoretical proofs. Experiments show that Dkmeans achieves better clustering results than recent related clustering algorithms on six public datasets.
(3) Fusion experiments are carried out on different combinations of the Web advertisement features. The experiments show that every combination improves clustering to some degree, and that fusing all features gives the best result, which confirms that multi-feature fusion can improve Web advertisement clustering.
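The two ideas in the abstract, fusing several feature blocks per advertisement and letting each cluster weight the few features that matter, can be illustrated with a minimal sketch. This is not the thesis's Dkmeans: it has no between-cluster term and no tensor-valued weights. It assumes that "EW-kmeans" refers to an entropy-weighting style of k-means, in which each cluster's feature weights are proportional to exp(-dispersion / gamma). The function names `fuse` and `weighted_kmeans`, the parameter `gamma`, and the toy data are all hypothetical.

```python
import math

def fuse(*blocks):
    """Early fusion: L2-normalize each feature block, then concatenate."""
    fused = []
    for block in blocks:
        norm = math.sqrt(sum(v * v for v in block)) or 1.0
        fused.extend(v / norm for v in block)
    return fused

def weighted_kmeans(points, init_centers, gamma=1.0, iters=20):
    """k-means with per-cluster feature weights (entropy-weighting style sketch)."""
    k, dim = len(init_centers), len(points[0])
    centers = [list(c) for c in init_centers]
    weights = [[1.0 / dim] * dim for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under that cluster's weighted distance.
        for i, p in enumerate(points):
            dists = [
                sum(w * (a - b) ** 2 for w, a, b in zip(weights[c], p, centers[c]))
                for c in range(k)
            ]
            labels[i] = dists.index(min(dists))
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                continue  # keep the old center and weights for an empty cluster
            # Center update: per-feature mean of the cluster members.
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
            # Weight update: features with low within-cluster dispersion get high weight.
            disp = [sum((p[j] - centers[c][j]) ** 2 for p in members) for j in range(dim)]
            low = min(disp)  # shift for numerical stability; cancels after normalization
            raw = [math.exp(-(d - low) / gamma) for d in disp]
            total = sum(raw)
            weights[c] = [r / total for r in raw]
    return labels, centers, weights

# Hypothetical toy ads: (text features, image features).
# The text block separates the two groups; the image block is mostly noise.
ads = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.0, 1.0], [0.8, 0.2]),
    ([0.1, 0.9], [0.1, 0.9]),
]
points = [fuse(text, image) for text, image in ads]
labels, centers, weights = weighted_kmeans(
    points, init_centers=[points[0], points[2]], gamma=0.1
)
print(labels)
print([[round(w, 3) for w in row] for row in weights])
```

In this toy run the text features end up with most of the weight in both clusters, because the image features vary widely inside each cluster. A smaller `gamma` concentrates the weight on fewer features, while a larger one approaches plain k-means with uniform weights.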
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification numbers]: TP393.09; TP391.41; TP391.1


Article ID: 1778216


Link to this article: http://sikaile.net/guanlilunwen/ydhl/1778216.html

