A Local-Distance Anomaly Detection Method Based on Density-Biased Sampling
Published: 2018-08-03 17:12
[Abstract]: Anomaly detection is an important research area in data mining. Existing anomaly detection methods based on distance or on the nearest-neighbor concept take too long to run on massive, high-dimensional data. Many improved methods raise computational efficiency, but their detection quality suffers. To address this, a local-distance anomaly detection algorithm based on density-biased sampling is proposed. First, a density-biased probability sampling method is used to draw a sample from the data set to be examined. Next, a local anomaly detection method based on local distance computes a local anomaly coefficient over the sampled set; the resulting coefficient serves both as the local anomaly coefficient of the sampled data and as an approximate global anomaly coefficient for the full data set. Finally, the data points are sorted by their local anomaly coefficients: the larger a point's coefficient, the more likely the point is an anomaly. Experimental results show that, compared with existing algorithms, the proposed algorithm achieves higher detection accuracy with less computation time, performs well across data of various dimensionalities and sizes, and scales well.
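The three stages named in the abstract (density-biased sampling, distance-based scoring against the sample, ranking) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the grid-based sampling rule, the mean k-nearest-neighbor distance used as a stand-in for the local anomaly coefficient, and every function name and parameter (`density_biased_sample`, `knn_distance_scores`, `n_bins`, `exponent`, `target`, `k`) are assumptions made purely for illustration.

```python
import numpy as np


def density_biased_sample(X, n_bins=8, exponent=0.5, target=100, rng=None):
    """Grid-based density-biased sampling (illustrative assumption).

    A point in a grid cell holding n points is kept with probability
    proportional to n ** (-exponent), so sparse regions are over-represented
    compared with uniform sampling. Returns indices of the kept points.
    """
    rng = np.random.default_rng(rng)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Integer cell coordinates per dimension, clipped into [0, n_bins - 1].
    cells = np.minimum((n_bins * (X - lo) / span).astype(int), n_bins - 1)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    weights = counts[inverse].astype(float) ** (-exponent)
    # Scale so the expected sample size is about `target` (less if clipped).
    probs = np.minimum(1.0, target * weights / weights.sum())
    return np.flatnonzero(rng.random(len(X)) < probs)


def knn_distance_scores(X, sample_idx, k=5):
    """Score each point by its mean distance to its k nearest sampled points.

    This is a simple stand-in for a local anomaly coefficient, not the
    coefficient defined in the paper. Requires more than k sampled points.
    """
    S = X[sample_idx]
    # Pairwise Euclidean distances, shape (len(X), len(S)).
    d = np.sqrt(((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=2))
    # A sampled point must not count itself as its own neighbor.
    d[sample_idx, np.arange(len(sample_idx))] = np.inf
    nearest = np.partition(d, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Hypothetical data: one dense cluster plus three far-away points.
demo_rng = np.random.default_rng(0)
X = np.vstack([
    demo_rng.normal(size=(400, 2)),
    [[25.0, 25.0], [-25.0, 25.0], [25.0, -25.0]],
])
sample_idx = density_biased_sample(X, target=80, rng=0)
scores = knn_distance_scores(X, sample_idx, k=5)
ranking = np.argsort(-scores)  # indices, most anomalous first
```

Scoring against the sample rather than the full data set is what reduces the cost: the distance matrix has one column per sampled point instead of one per data point. Biasing the sample toward sparse regions is intended to keep isolated points from being lost when the sample is drawn.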
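[Affiliations]: University of Chinese Academy of Sciences; Key Laboratory of Space-Based Integrated Information Systems (Institute of Software, Chinese Academy of Sciences)
[Funding]: National Natural Science Foundation of China (U1435220); National High Technology Research and Development Program of China (863 Program) (2012AA011206)
[Classification number]: TP311.13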
Article ID: 2162421
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2162421.html