An Improved K-Means-Boosting-BP Model for Imbalanced Police-Incident Data
Published: 2018-04-10 23:23
Topics: imbalanced data + Synthetic; Source: Journal of Image and Graphics (《中國圖象圖形學報》), 2017, No. 09
[Abstract]: Objective: Understanding the spatio-temporal distribution of police incidents, building spatio-temporal prediction models with machine learning algorithms, and using them to design evidence-based policing and prevention plans that effectively curb crime are central concerns of crime geography. Previous studies show that police incidents concentrate in central urban districts and densely populated residential areas, so the data are imbalanced in both space and time. This imbalance typically turns a model trained on such data into a weak learner with low prediction accuracy. To address regression on imbalanced data, a Boosting algorithm based on K-means clustering is proposed. Methods: The algorithm builds on Boosting ensemble learning: GA-BP neural networks generate the base classifiers, and K-means clustering is used to combine them, so that weak learners are boosted into a strong learner. Results: Compared with Synthetic Minority Oversampling Technique Boosting (SMOTEBoosting), a commonly used approach to regression on imbalanced data, the algorithm has two advantages. 1) It lowers the mean squared error (MSE) on the minority class while also lowering the overall MSE: the overall MSE is 2.14E-04 for SMOTEBoosting and 9.85E-05 for KMeans-Boosting. 2) It strikes a better balance between precision and recall when identifying minority-class samples: the recall of KMeans-Boosting is about 52% against about 91% for SMOTEBoosting, but its precision is 85%, far above the 19% of SMOTEBoosting. Conclusion: KMeans-Boosting significantly reduces the overall MSE on imbalanced data and improves the precision and recall of minority-class identification. It is an effective algorithm for both regression and classification on imbalanced data and can be extended to other fields that need to handle imbalanced data.
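The "better balance" claim in the Results can be checked with a little arithmetic. The sketch below is an illustration only, not part of the paper: it assumes the abstract's 準確率 is precision in the usual precision/recall sense, and it introduces the F1 score (the harmonic mean of precision and recall) as one possible single-number summary of that balance. The function and variable names are hypothetical; the numbers are the ones quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract; the recall values are stated as approximate.
# F1 is NOT reported in the abstract; it is computed here as an assumption-based summary.
reported = {
    "KMeans-Boosting": {"precision": 0.85, "recall": 0.52, "mse": 9.85e-05},
    "SMOTEBoosting": {"precision": 0.19, "recall": 0.91, "mse": 2.14e-04},
}

f1 = {name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()}

# Relative reduction in overall MSE when moving from SMOTEBoosting to KMeans-Boosting.
mse_reduction = 1 - reported["KMeans-Boosting"]["mse"] / reported["SMOTEBoosting"]["mse"]

for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"Relative reduction in overall MSE: {mse_reduction:.1%}")
```

Under these assumptions the F1 score comes out at roughly 0.65 for KMeans-Boosting against roughly 0.31 for SMOTEBoosting, and the overall MSE is about 54% lower, which is consistent with the abstract's reading that the lower recall is more than offset by the much higher precision.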
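[Affiliations]: South China Normal University; Guangdong Jingyi Planning Information Technology Co., Ltd. (廣東精一規劃信息科技股份有限公司)
[Funding]: Ministry of Public Security special program for basic work on strengthening policing through science and technology (2016GABJC47)
[Classification codes]: D035.3; TP311.13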
Article ID: 1733403
Article link: http://sikaile.net/falvlunwen/fanzuizhian/1733403.html