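Advertisement Click-Through Rate Prediction Based on Recurrent Neural Networks
Published: 2018-01-19 04:04
Keywords: online advertising; ad click-through rate; logistic regression; random forest; recurrent neural network; LSTM. Source: Zhejiang Sci-Tech University, master's thesis, 2016. Document type: degree thesis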
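[Summary]: Online advertising, born alongside the Internet, has grown rapidly over the past few decades. Ad click-through rate (CTR), a central topic in computational advertising, is attracting increasing attention. The main approach today is to use machine learning to predict CTR from historical data; accurate CTR prediction makes ad placement more precise, raises the actual click-through rate, and increases revenue. A linear model can predict CTR simply, but its learning capacity is limited: it cannot learn effectively from an ever-growing number of data features, and it is prone to overfitting during training, which impairs what the model learns from the features. Models based on neural networks use nonlinear activation functions and a multi-layer node structure, so they can better learn the complex relationships among large numbers of nonlinear features and thereby improve predictive ability. Among them, the recurrent neural network (RNN) is a neural network that contains cycles, can store a neuron's output from the previous time step, and has strong optimization and computation capability. The main work of this thesis covers three aspects. (1) Features are processed differently for each model. For the logistic regression model, explicit features are concatenated into combinations to extract hidden user attributes, and a hash mapping then converts feature values of different original types into values of a single type. For the random forest model, a feature dictionary is built, entries whose frequency is too low are filtered out, and the features are then one-hot encoded. For the neural-network-based models, the frequency of each feature is computed first and a feature-frequency dictionary is built to convert string features into integer features; the converted features are then min-max normalized so that every feature's values lie in [0,1]. (2) RNNs have already been applied to CTR prediction, but an RNN is trained by gradient descent, and as training approaches the minimum the gradients may explode or vanish, which degrades the prediction. This thesis predicts CTR with an RNN improved by LSTM (long short-term memory), using LSTM to correct the RNN and prevent exploding or vanishing gradients. Experimental results show that the LSTM-improved RNN model performs well in predicting CTR. (3) A logistic regression model, a random forest model, a BP (back propagation) neural network model, an RNN model, and the LSTM-improved RNN model are implemented in Python. The RNN is trained with the sigmoid function and with the ReLU function in turn; experiments show that ReLU converges faster and yields better predictions. Models are evaluated with logloss, which, compared with AUC (area under the ROC curve), better reflects the accuracy of a model's CTR predictions.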
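The neural-network preprocessing described in item (1) can be sketched roughly as follows. This is a minimal illustration rather than the thesis's own code: the function names and the sample column are hypothetical, and it assumes that each string value is replaced by its occurrence count, which is one plausible reading of "feature-frequency dictionary".

```python
from collections import Counter

def build_frequency_dict(values):
    """Count how often each categorical value occurs in one feature column."""
    return Counter(values)

def encode_by_frequency(values, freq):
    """Replace each string value with its integer count (0 if unseen)."""
    return [freq.get(v, 0) for v in values]

def min_max_scale(numbers):
    """Min-max normalization to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(numbers), max(numbers)
    if hi == lo:
        return [0.0 for _ in numbers]
    return [(x - lo) / (hi - lo) for x in numbers]

# Hypothetical categorical column (for example, a device type).
column = ["phone", "tablet", "phone", "desktop", "phone", "tablet"]
freq = build_frequency_dict(column)          # phone: 3, tablet: 2, desktop: 1
encoded = encode_by_frequency(column, freq)  # string values -> integer counts
scaled = min_max_scale(encoded)              # integer counts -> values in [0, 1]
```

In practice the dictionary would be built on the training split only and reused on the test split, which is why unseen values fall back to 0 here.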
[Abstract]: Online advertising emerged with the Internet and has developed rapidly over the past few decades. As an important research topic in computational advertising, ad click-through rate (CTR) has received more and more attention. Estimating CTR from historical data with the help of machine learning is currently the main method. With an accurate CTR estimate, advertisements can be placed more precisely, the real click-through rate can be improved, and revenue can be increased. Although a linear model can estimate CTR simply, its learning ability is limited: it cannot learn effectively as the number of data features keeps growing, and it overfits easily during learning, which affects how well the model learns the features. A model based on a neural network algorithm uses nonlinear activation functions and a multi-layer node structure to better learn the complex relationships among a large number of nonlinear features, thereby improving the prediction ability of the model. Among such models, the recurrent neural network (RNN) is a neural network with a cyclic structure that can store the output of a neuron at the previous time step and has a strong ability to optimize computation. The main work of this thesis includes the following three aspects. (1) Feature processing is tailored to each model. The logistic regression model concatenates explicit features into combinations to extract hidden user attributes and then applies a hash mapping, so that feature values of different original types are converted to the same type. The random forest model builds a feature dictionary, filters out entries whose frequency is too low, and then processes the features with one-hot encoding. For the neural-network-based models, the frequency of each feature is computed first and a feature-frequency dictionary is built to transform string features into integer features. The converted features are then min-max normalized, so that the values of each feature lie in the range [0,1]. (2) Although the RNN has been applied to CTR prediction, the RNN model is trained with gradient descent, and the gradient may explode or vanish as the minimum is approached, which affects the prediction. This thesis uses an RNN improved with LSTM (long short-term memory) to estimate CTR, using LSTM to correct the RNN and prevent the gradient from exploding or vanishing. The experimental results show that the LSTM-improved RNN model is effective in predicting CTR. (3) Python is used to implement the logistic regression model, the random forest model, the BP (back propagation) neural network model, the RNN model, and the LSTM-improved RNN model. The sigmoid function and the ReLU function are each used to train the RNN; experimental results show that ReLU converges faster and gives better predictions. The logloss method is used to evaluate the models. Compared with AUC (area under the ROC curve), logloss better reflects the accuracy of a model's CTR predictions.
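The closing claim about logloss and AUC can be illustrated with a small example. The sketch below is an assumption-laden illustration, not the thesis's evaluation code: the two helper functions are written from the standard definitions, and the labels and predicted probabilities are invented. Both prediction sets rank the clicked ad highest, so their AUC is identical, while logloss penalizes the set that overestimates every probability.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Needs at least one positive and one negative label.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented data: one click among four impressions.
labels = [0, 0, 0, 1]
calibrated = [0.1, 0.2, 0.3, 0.6]  # probabilities close to the labels
inflated = [0.6, 0.7, 0.8, 0.9]    # same ordering, every value too high

auc_calibrated = auc(labels, calibrated)
auc_inflated = auc(labels, inflated)
loss_calibrated = log_loss(labels, calibrated)
loss_inflated = log_loss(labels, inflated)
```

Because AUC depends only on the ordering of the scores, it cannot separate the two sets; logloss depends on the predicted values themselves, which matters when the predicted CTR is used directly, for instance in bidding.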
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: F713.8;TP183
[Citing literature]
Related master's theses: 1 item
1 Zhu Jingyang (朱靜陽); Research on a Heart Disease Onset Risk Model Based on LDBN [D]; Zhengzhou University; 2017
Article ID: 1442526
Article link: http://sikaile.net/jingjilunwen/guojimaoyilunwen/1442526.html