Research on Outlier Detection Algorithms for Uncertain Data Oriented to Online Fraud Discovery

Published: 2019-05-27 00:03
[Abstract]: With the rapid development of the Internet, people's daily lives have become inseparable from the network. At the same time, frequent online fraud has become a major factor disrupting normal online life. Outlier detection is an important data mining technique and an important means of anomaly detection, and distance-based outlier detection is currently one of the most commonly used outlier detection techniques. This thesis studies outlier detection algorithms for uncertain data, aimed at discovering online fraud. Online fraud mostly occurs during online transactions and is accompanied by abnormal transaction behavior. The thesis treats each user's online transaction behavior as a data object and maps it into a multi-dimensional space in which each attribute of the transaction behavior is one dimension. An abnormal transaction typically appears as one of a small number of data objects that deviate from the majority, so detecting such data amounts to outlier detection in that multi-dimensional space. At the same time, online transaction data are often uncertain because of incomplete data, noise, operator error and similar causes. The thesis studies distance-based outlier detection algorithms on uncertain data sets, with the goal of detecting uncertain outliers efficiently and reasonably and thereby helping to discover abnormal online transactions and online fraud. The thesis first describes an uncertain data set using the x-tuple model and possible-world semantics. Each uncertain data object is represented as an x-tuple, each of its possible data instances is represented as a tuple, and a set of tuples drawn from different x-tuples forms a possible world. A possible world is one instance of the uncertain data set.
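The x-tuple and possible-world description above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy data, the name `possible_worlds`, the convention that alternatives within one x-tuple are mutually exclusive while different x-tuples are independent, and the convention that probabilities summing to less than 1 mean the object may be absent from a world.

```python
from itertools import product

# Hypothetical toy data: x-tuple id -> list of (instance, probability).
# Probabilities inside one x-tuple may sum to less than 1; the remainder is
# treated here as "the object does not appear" (an assumed way to model
# incomplete data).
xtuples = {
    "u1": [((1.0, 1.0), 0.6), ((1.2, 0.9), 0.4)],
    "u2": [((1.1, 1.1), 1.0)],
    "u3": [((5.0, 5.0), 0.7)],  # absent with probability 0.3
}

def possible_worlds(xtuples):
    """Yield (world, probability) pairs.

    A world maps each present x-tuple id to the single instance chosen for it.
    X-tuples are assumed independent, so probabilities multiply.
    """
    ids = list(xtuples)
    options = []
    for xid in ids:
        alts = list(xtuples[xid])
        missing = 1.0 - sum(p for _, p in alts)
        if missing > 1e-12:
            alts.append((None, missing))  # None marks "no tuple appears"
        options.append(alts)
    for combo in product(*options):
        prob = 1.0
        world = {}
        for xid, (inst, p) in zip(ids, combo):
            prob *= p
            if inst is not None:
                world[xid] = inst
        yield world, prob

worlds = list(possible_worlds(xtuples))
for world, prob in worlds:
    print(round(prob, 3), world)
```

With this toy input there are 2 x 1 x 2 = 4 possible worlds, and their probabilities sum to 1. The number of worlds grows exponentially with the number of x-tuples, so full enumeration is only practical for tiny examples.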
The thesis then treats outlier detection on an uncertain data set as a query process and, for different data characteristics, proposes four new concepts: expected outlier detection, semi-expected outlier detection, full-probability outlier detection and relative outlier detection. Expected outlier detection is the simplest of them: it computes an expected outlier degree for every tuple and every x-tuple, and queries the whole data set for the K x-tuples with the highest expected outlier degree. Semi-expected outlier detection improves on expected outlier detection by addressing the latter's sensitivity to data incompleteness. It computes the expected outlier degree of each tuple only, and no longer that of each x-tuple, hence the name semi-expected outlier degree. Relative outlier detection addresses the sensitivity of the previous two concepts to bursty data and noise. Instead of computing expected outlier degrees for tuples and x-tuples, it compares x-tuples pairwise to find the K x-tuples most likely to be outliers. It also avoids having to set some parameter thresholds, which lowers the barrier to applying outlier detection and makes it particularly suitable for ordinary users who are not experts in the application domain. Finally, the thesis proposes full-probability outlier detection. Borrowing the idea of global top-K queries on uncertain data sets, it computes the probability that each x-tuple is a top-k1 outlier in any possible world; the k2 x-tuples with the highest probability are the outliers of the uncertain data set. The thesis gives formal definitions of these four kinds of uncertain-data outliers and proposes algorithm frameworks for them.
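The abstract does not say how the expected outlier degree is defined, so the following is only a rough sketch of the general idea under assumed definitions. It takes the outlier degree of a tuple to be the negated expected number of other objects within a fixed radius, and the outlier degree of an x-tuple to be the probability-weighted sum over its tuples. The radius parameter, the function names and the toy data are all hypothetical.

```python
import math

# Hypothetical toy data: x-tuple id -> list of (instance, probability).
xtuples = {
    "u1": [((1.0, 1.0), 0.6), ((1.2, 0.9), 0.4)],
    "u2": [((1.1, 1.1), 1.0)],
    "u3": [((5.0, 5.0), 1.0)],
    "u4": [((0.9, 1.0), 1.0)],
}

def expected_neighbors(xid, inst, xtuples, radius):
    """Expected number of other objects within `radius` of `inst`.

    By linearity of expectation this is the sum, over the other x-tuples,
    of the probability mass of their instances that fall inside the radius.
    """
    total = 0.0
    for other_id, alts in xtuples.items():
        if other_id == xid:
            continue
        total += sum(p for o_inst, p in alts
                     if math.dist(inst, o_inst) <= radius)
    return total

def top_k_expected_outliers(xtuples, radius, k):
    """Return the k x-tuple ids with the highest assumed outlier degree."""
    degree = {}
    for xid, alts in xtuples.items():
        # Fewer expected neighbors means a higher outlier degree,
        # so the weighted neighbor count is negated.
        degree[xid] = -sum(p * expected_neighbors(xid, inst, xtuples, radius)
                           for inst, p in alts)
    return sorted(degree, key=degree.get, reverse=True)[:k]

print(top_k_expected_outliers(xtuples, radius=1.0, k=1))
```

In this toy data set `u3` has no instance of any other x-tuple within the radius, so it ranks first. The sketch avoids enumerating possible worlds, which is why an expectation-based score is cheap to compute.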
On this basis, the thesis designs pruning strategies and derives efficient optimized algorithms, and finally verifies the algorithms' accuracy, efficiency and scalability, as well as the effectiveness of the pruning strategies, through experiments on real and synthetic data sets. Existing research on distance-based outlier detection over uncertain data sets tends to have two shortcomings. First, it assumes that uncertain data follow a known distribution, in particular distributions such as the normal distribution whose probability density function has an analytical expression. This is often hard to satisfy in practice, which limits the applicability of such work. Second, some studies also use the x-tuple model and possible-world semantics to describe uncertain data sets, but they ignore data diversity: an uncertain data object is not represented as multiple possible instances. The new uncertain-data outlier detection concepts proposed in this thesis apply to arbitrary probability distributions, take both data incompleteness and data diversity into account, and achieve outlier detection efficiently and reasonably.
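The full-probability concept described in the abstract (the probability that an x-tuple is a top-k1 outlier across possible worlds, followed by picking the k2 most probable x-tuples) can be sketched by brute force. This is not the thesis's algorithm. The per-world outlier score (distance to the nearest other object), the treatment of missing probability mass as absence, the independence of x-tuples, the function names and the toy data are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical toy data: x-tuple id -> list of (instance, probability).
xtuples = {
    "u1": [((1.0, 1.0), 0.6), ((1.2, 0.9), 0.4)],
    "u2": [((1.1, 1.1), 1.0)],
    "u3": [((5.0, 5.0), 0.7)],  # absent with probability 0.3
}

def possible_worlds(xtuples):
    """Yield (world, probability); x-tuples are assumed independent."""
    ids = list(xtuples)
    options = []
    for xid in ids:
        alts = list(xtuples[xid])
        missing = 1.0 - sum(p for _, p in alts)
        if missing > 1e-12:
            alts.append((None, missing))  # None marks "object absent"
        options.append(alts)
    for combo in product(*options):
        prob = math.prod(p for _, p in combo)
        world = {xid: inst for xid, (inst, _) in zip(ids, combo)
                 if inst is not None}
        yield world, prob

def nn_distance(xid, world):
    """Assumed per-world outlier score: distance to the nearest other object."""
    others = [inst for oid, inst in world.items() if oid != xid]
    return min(math.dist(world[xid], o) for o in others) if others else 0.0

def full_probability_outliers(xtuples, k1, k2):
    """Return (top-k2 ids, probability of being a top-k1 outlier per id)."""
    prob_top = dict.fromkeys(xtuples, 0.0)
    for world, prob in possible_worlds(xtuples):
        # Ties are broken by insertion order because sorted() is stable.
        ranked = sorted(world, key=lambda xid: nn_distance(xid, world),
                        reverse=True)
        for xid in ranked[:k1]:
            prob_top[xid] += prob
    best = sorted(prob_top, key=prob_top.get, reverse=True)[:k2]
    return best, prob_top

best, prob_top = full_probability_outliers(xtuples, k1=1, k2=1)
print(best, prob_top)
```

Because the loop visits every possible world, its cost is exponential in the number of x-tuples. That cost is a plausible reason a practical method would need pruning, although the abstract gives no details of the pruning strategies actually used.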
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Doctorate
[Year degree granted]: 2016
[Classification number]: TP311.13


Article ID: 2485740


Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2485740.html

