Analysis and Prediction of the Path to Success in Bank Telemarketing
Topics: bank deposits + telephone sales; Source: Central China Normal University, 2017 master's thesis
[Abstract]: With today's well-developed communications industry, telemarketing has become ubiquitous, yet public acceptance of it keeps falling, and campaigns often leave marketing staff exhausted for little return. The findings of this thesis have theoretical and practical value for commercial banks in managing customers, identifying valuable customers, and maintaining customer loyalty. With the rise of big data, data mining is being applied to precision marketing in a growing number of fields. This thesis proposes using data mining to predict whether a telemarketing call will succeed in selling a bank long-term deposit. It collects 41,188 telemarketing records from a foreign bank and analyzes 150 feature variables related to bank customers, products, and socioeconomic attributes, which are then reduced to 21 variables through manual, semi-automated selection.

The resulting dataset is imbalanced: only 11.3% of the records are successful sales. To clarify how the imbalance affects the models, missing values are preprocessed first, and the SMOTE algorithm proposed by Chawla is then used to generate a new, balanced dataset. Comparing models trained on the balanced and imbalanced datasets shows that models trained on the imbalanced data skew their predictions toward the majority class, so the balanced dataset is used for model training and evaluation.

Three classifiers are considered: logistic regression, decision tree, and support vector machine (SVM). Classification quality is measured by precision and by the area under the ROC curve (AUC). Logistic regression and decision tree models are easy to interpret and also predict new data fairly well. The SVM is more complex, but it learns both linear and nonlinear problems well, and that complexity is why it can often provide accurate predictions. After the parameters or structure of each model are fixed through comparative training, the test set yields precisions of 47.3%, 73.1%, and 52.6% and AUC values of 0.921, 0.985, and 0.938 for logistic regression, decision tree, and SVM, respectively. In marketing, managers want to identify high-value customers and avoid wasting resources on low-value ones in order to raise the return on investment, so they want predictions to be as accurate as possible. Because the AUC values differ little, the C5.0 decision tree algorithm is chosen for prediction on the principle of highest precision.
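The oversampling and evaluation steps described in the abstract can be illustrated with a short, dependency-free Python sketch. It is not the thesis's implementation: the function names (`smote`, `precision`, `auc`), the neighbour count `k`, and the toy data are all assumptions made for illustration. `smote` follows the basic idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours), and `auc` uses the rank interpretation of the area under the ROC curve.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four minority points in two dimensions
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)

# Toy labels and classifier scores (hypothetical), thresholded at 0.5
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision(y_true, y_pred), auc(y_true, scores))
```

Because each synthetic point lies on a segment between two existing minority samples, oversampling this way fills in the minority region instead of merely duplicating records. Precision depends on the chosen decision threshold while AUC does not, which is one reason the two measures can rank models differently.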
【學位授予單位】:華中師范大學
【學位級別】:碩士
【學位授予年份】:2017
【分類號】:F274
Article ID: 2042140
Article link: http://sikaile.net/jingjilunwen/xmjj/2042140.html