Application of Distance-Based Outlier Mining in Computer Forensics
Topics: outliers + computer forensics; Source: Shandong Normal University, master's thesis, 2014
[Abstract]: With the development of information technology, we have entered the era of big data. Data of every kind is produced each day, and with it come a variety of network security problems. Current research on these problems concentrates mainly on security defense, but the techniques of computer network crime keep advancing, so defensive methods alone cannot combat computer crime effectively. Social and legal means are also needed, and computer forensics emerged in response.
Data mining can extract latent, valuable knowledge from massive data, but finding the very small number of abnormal behaviors in such data and deriving meaningful knowledge from them is challenging. Real-world data sets often contain data objects that are inconsistent with the general behavior or general model of the data; these are outliers. Although normal behavior is far more common than abnormal behavior, abnormal behavior may carry very interesting knowledge, so the study of outliers has both theoretical and practical significance.
This thesis studies the k-nearest-neighbor outlier detection algorithm in detail and improves it to raise its efficiency and accuracy. Because network operation logs are large and costly to process, it adopts a distributed algorithm based on the MapReduce architecture to detect outliers quickly on a Hadoop cluster. It analyzes in detail the research on and applications of anomaly detection methods in China and abroad, designs an anomaly detection model based on outlier mining, and finally applies the outlier detection method to computer forensics.
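To make the distance-based idea concrete, the following is a minimal sketch of k-nearest-neighbor outlier scoring, not the thesis's improved algorithm. It assumes Euclidean distance and a brute-force neighbor search; the function names `knn_distance_scores` and `top_outliers` and the sample points are hypothetical and chosen only for illustration.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    A larger score means the point lies farther from its neighbors.
    Brute force, O(n^2) distance computations (an illustrative assumption).
    """
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Four points in a tight cluster and one far away (made-up sample data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(points, k=2, n=1))  # [4]
```

The quadratic cost of the brute-force search is one reason a distributed approach becomes attractive for large log data sets.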
The main work of this thesis is as follows:
(1) It systematically reviews the current state of research on outlier mining algorithms in China and abroad, analyzes application examples, covers the concepts and workflow of outlier mining, and summarizes the performance and implementation mechanisms of the algorithms. It also studies computer forensics in depth, summarizes its key technologies, and presents the forensic process.
(2) It studies the distance-based reverse k-nearest-neighbor outlier detection algorithm in depth, identifies its shortcomings, and improves it: after a pruning step removes redundant data, a mechanism for determining parameters adaptively is added, which avoids the data deviation caused by excessive manual intervention and improves the accuracy and efficiency of the algorithm. A MapReduce-based outlier detection algorithm is designed for the Hadoop cluster architecture to detect outliers quickly in a distributed environment.
(3) It constructs a log analysis model based on the outlier mining algorithm. After the log data is preprocessed, the improved outlier detection algorithm is applied in the model. A worked example shows that the model can effectively analyze the outliers the algorithm finds and yield preliminary evidence, making forensic services more efficient and intelligent.
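The reverse k-nearest-neighbor idea can be sketched as follows. This is an illustrative baseline under stated assumptions, not the improved algorithm of the thesis: the pruning step and the adaptive parameter mechanism are not specified in the abstract and are therefore omitted, `k` and the threshold `max_count` are fixed by hand, and all function names and sample points are hypothetical. A point is flagged when few other points count it among their own k nearest neighbors.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def reverse_knn_counts(points, k):
    """For each point, count how many other points list it as a k-NN."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts


def rknn_outliers(points, k, max_count=0):
    """Flag points whose reverse k-NN count is at most max_count."""
    counts = reverse_knn_counts(points, k)
    return [i for i, c in enumerate(counts) if c <= max_count]


# Four points in a tight cluster and one far away (made-up sample data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(rknn_outliers(points, k=2))  # [4]
```

In this sample no clustered point has the distant point among its two nearest neighbors, so its reverse count is zero and it alone is flagged. Choosing `k` and `max_count` by hand is exactly the kind of manual intervention that an adaptive parameter mechanism is meant to reduce.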
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP311.13; TP393.08
Article ID: 1929595
Article link: http://sikaile.net/guanlilunwen/ydhl/1929595.html