A User Fault-Report Prediction System Based on IPTV Set-Top Box KPI Data
Published: 2018-06-24 00:40
Topic: IPTV + imbalanced data sets; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017
[Abstract]: With the rapid development of Internet technology, IPTV has become an indispensable part of daily life, and IPTV users expect a better service experience. To serve these users better and make full use of available data, operators want to build a user fault-report prediction model from IPTV set-top box KPI data, analyzing that data to identify users who are about to report a fault. Operators can then contact these users proactively, so that problems are found and resolved in time, which strengthens customer stickiness. This thesis is organized into three parts: data analysis, predictive modeling, and system deployment. First, the IPTV set-top box KPI data is preprocessed through data cleaning and correlation analysis to obtain a data set suitable for modeling. Because the data set is imbalanced, the fault-report prediction model is built at both the algorithm level and the data level. At the algorithm level, the thesis proposes an unbiased decision tree algorithm based on the traditional decision tree algorithm; it modifies the feature selection criterion and the leaf node decision criterion so that imbalanced data sets can be handled directly. At the data level, the thesis proposes a new oversampling algorithm derived from traditional oversampling: an adaptive synthetic oversampling method based on average distance. It uses the average distance between a minority-class sample and the majority-class samples around it as a one-dimensional parameter to adaptively generate synthetic samples. Once the data set is balanced, a random forest is used for modeling. Experiments show that the data-level fault-report model performs comparatively well. For deployment, the system runs in Spark on YARN mode to process the data and build the model, and the results are presented to IPTV operations and maintenance staff through a web page visualization.
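The abstract only states that the average distance between a minority-class sample and its surrounding majority-class samples drives how synthetic samples are generated; it does not give the formula. The following is a minimal Python sketch of that idea under stated assumptions, not the thesis's actual algorithm. The function names, the inverse-distance weighting, the choice of `k`, and the SMOTE-style interpolation between two minority samples are all assumptions made for illustration.

```python
import math
import random


def avg_distance_to_majority(point, majority, k):
    """Mean Euclidean distance from `point` to its k nearest majority samples."""
    dists = sorted(math.dist(point, m) for m in majority)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def adaptive_oversample(minority, majority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic minority points.

    Assumption: a minority point that sits closer to majority points
    (small average distance) is harder to classify, so it is picked
    more often as the base of a synthetic sample. The thesis may use
    a different weighting.
    """
    rng = random.Random(seed)
    # Inverse average distance serves as an unnormalised sampling weight.
    weights = [
        1.0 / (avg_distance_to_majority(p, majority, k) + 1e-12)
        for p in minority
    ]
    synthetic = []
    for _ in range(n_new):
        base = rng.choices(minority, weights=weights, k=1)[0]
        partner = rng.choice(minority)
        t = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and partner.
        synthetic.append(tuple(b + t * (q - b) for b, q in zip(base, partner)))
    return synthetic


# Toy data: three minority points, four majority points (made up for the demo).
minority = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
majority = [(0.5, 0.5), (0.6, 0.4), (0.4, 0.6), (9.0, 9.0)]
new_points = adaptive_oversample(minority, majority, n_new=4)
```

In this sketch the minority points near the cluster of majority points receive larger weights, so more synthetic samples start from them. The balanced set (original plus synthetic minority samples) would then be passed to a classifier such as a random forest, as the abstract describes.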
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN949.292
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2059178.html