

Application of the Continuous Hidden Markov Model to Click Fraud Identification

Published: 2018-05-31 23:01

  Topic: click fraud + continuous hidden Markov model; Reference: Shanghai Jiao Tong University, master's thesis, 2013


[Abstract]: With the rapid growth of search-engine keyword advertising, fraudulent clicks have become a major problem for advertisers and search-engine companies, and the identification and prevention of click fraud has become a focus for researchers in China and abroad. This thesis analyzes the click fraud problem in online keyword advertising on search engines and its behavioral characteristics. Because the click behavior associated with keyword advertisements fits the basic assumptions of the hidden Markov model (HMM) reasonably well, the thesis attempts to apply the HMM framework to click fraud identification. The main work is as follows:
(1) The HMM is only a theoretical framework. The thesis analyzes the behavior patterns of keyword clicks, builds a continuous hidden Markov model (CHMM) for search-engine keyword advertisements, and establishes a method for identifying fraudulent clicks.
(2) A CHMM is trained on observed data (parameter estimation), and its identification performance is validated. The statistical results show that the CHMM identifies click fraud with relatively high accuracy.
(3) The thesis examines how identification accuracy changes with the model parameters, namely the number of hidden states N, the sequence length R, and the threshold, in order to determine the best number of hidden states (a fixed value), threshold, and other parameters.
(4) Factors such as time of day and unexpected events can noticeably raise the click rate of an online keyword advertisement without any fraud being involved. The thesis uses a dynamic CHMM that continually updates the time-series data used for training and re-estimates the parameters, which reduces the effect of such factors on identification accuracy.
(5) Parameter estimation determines whether an HMM can reach high accuracy on an identification problem. The traditional Baum-Welch algorithm has a number of shortcomings. Compared with Baum-Welch, a training algorithm based on Segmental K-Means (SKM) has lower computational complexity and converges faster, and it places more emphasis on automatically classifying the model's output patterns. SKM is therefore better targeted and more applicable to click fraud identification, and the empirical analysis also shows that SKM training gives better identification results. In addition, the thesis gives a preliminary discussion of HMM parameter estimation by MCMC-based Gibbs sampling.
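The identification idea summarized in the abstract (score a click sequence under a trained CHMM, then compare the score against a threshold) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the thesis: univariate Gaussian emissions, the per-observation normalization of the score, the function names, the two-state parameter values, and the threshold of -6.0. The score is computed with the standard forward algorithm in log space.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_log_likelihood(obs, start, trans, means, variances):
    """Forward algorithm in log space: returns log P(obs | model).

    Assumes all start and transition probabilities are strictly positive.
    """
    n = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [
        math.log(start[i]) + log_gauss(obs[0], means[i], variances[i])
        for i in range(n)
    ]
    # Recursion: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    # Termination: sum over the final hidden state
    return logsumexp(alpha)


def is_suspicious(obs, model, threshold):
    """Flag a sequence whose average log-likelihood per observation is below threshold."""
    score = sequence_log_likelihood(obs, *model) / len(obs)
    return score < threshold


# Hypothetical two-state model of clicks per time interval (illustrative values only).
model = (
    [0.6, 0.4],                # start probabilities
    [[0.9, 0.1], [0.2, 0.8]],  # transition matrix
    [5.0, 20.0],               # emission means: quiet vs. busy traffic
    [4.0, 25.0],               # emission variances
)
THRESHOLD = -6.0  # hypothetical; the thesis tunes its threshold empirically

typical = [4, 6, 5, 7, 19, 22, 18, 6]
burst = [4, 5, 90, 95, 100, 92, 6, 5]
print(is_suspicious(typical, model, THRESHOLD))
print(is_suspicious(burst, model, THRESHOLD))
```

In a dynamic variant along the lines of item (4), the model tuple would be re-estimated periodically from a sliding window of recent click data before scoring new sequences; the training step itself (Baum-Welch or Segmental K-Means) is outside this sketch.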
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3



