
Research on Hadoop-Based Sensitive Word Detection Technology in Network Security Auditing

Published: 2018-05-07 05:42

  Topics: content auditing + XML; Source: Donghua University, master's thesis, 2015


[Abstract]: As the Internet has spread, the information resources available online have grown ever richer. At the same time, a growing volume of illegal, harmful, and sensitive information has flooded the network, which has become a main channel for spreading superstition, pornography and violence, reactionary speech, and rumors. Against these threats to network security, security auditing offers strong protection because it is real-time, dynamic, and an active form of defense.

Drawing on a real network security audit system project at a company, this thesis focuses on sensitive word detection in content auditing. It first introduces the concepts of sensitive word detection and network security auditing, the state of research, and the technologies related to the topic. Based on an analysis of the system's functional requirements, it then gives an overall implementation model of the system. The project's log data is stored in XML format and therefore carries both semantic and structural information. Combining the double-array Trie with Dewey coding, the thesis studies sensitive word detection in XML documents, proposes the concept of sensitivity, and gives a method for calculating it. Finally, a prototype sensitive word detection system is designed and implemented to verify the effectiveness of the methods and techniques studied. The main work of the thesis is as follows.

The functional requirements of the network information security audit system are analyzed, and the overall implementation model of the system is designed. For content auditing, the log-based audit workflow is analyzed, the format of the log data is given, and the object of the sensitive word detection research is defined.

The data to be checked for sensitive words is log data in XML format. To obtain its structural information and to detect sensitive words with complex structure, the thesis studies an XML document coding scheme based on Dewey coding, in which the code of a parent node in the XML document tree is used directly as the prefix of each child node's code. A node's level and the structural relationships between nodes can therefore be read directly from the codes, which makes the structural sensitivity of a log simple to calculate.

To make sensitive word detection more efficient, the sensitive word lexicon needs an index. The thesis builds this index with a double-array Trie and studies detection algorithms based on semantic information and on semantic information combined with structural information. On the one hand, semantic sensitivity is calculated from the weights of nodes and the frequency with which sensitive words occur, and a formula for sensitivity is given. On the other hand, when a sensitive word carries structural information, detection must combine semantic and structural information: using the computed distance between sensitive words, semantic matching is performed first and structural similarity matching second, which achieves detection of sensitive words that contain structural information.

Building on the detection techniques studied, the thesis designs and implements a prototype sensitive word detection system for network security auditing. The system is divided into four subsystems: user interface, information preparation, detection engine, and audit policy. The overall architecture of the system is designed, and the interaction between the user and the system is analyzed. On this basis, the design and implementation of each subsystem are described in detail.
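
The Dewey coding idea described above, where a parent's code is the prefix of each child's code, can be illustrated with a short Python sketch. This is not the thesis's generation algorithm. The sample log layout, the tag names, and the helper names `dewey_labels`, `level`, `is_ancestor`, and `common_ancestor` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical log record; the real log schema is not given in the abstract.
SAMPLE_LOG = (
    "<log>"
    "<session><user>alice</user><url>http://example.com</url></session>"
    "<content><title>hello</title><body>some text</body></content>"
    "</log>"
)


def dewey_labels(root):
    """Return (dewey_code, element) pairs in document order.

    The root gets code "1"; the k-th child of a node with code c
    gets code c + "." + str(k), so a parent's code prefixes its children's.
    """
    labelled = []
    stack = [("1", root)]
    while stack:
        code, node = stack.pop()
        labelled.append((code, node))
        children = list(node)
        # Push children in reverse so they are popped in document order.
        for k in range(len(children), 0, -1):
            stack.append((f"{code}.{k}", children[k - 1]))
    return labelled


def level(code):
    """Depth of a node: the number of components in its code."""
    return code.count(".") + 1


def is_ancestor(a, b):
    """True if the node coded a is a proper ancestor of the node coded b."""
    return b.startswith(a + ".")


def common_ancestor(a, b):
    """Code of the lowest common ancestor: the longest shared component prefix."""
    shared = []
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        shared.append(x)
    return ".".join(shared)


codes = {node.tag: code for code, node in dewey_labels(ET.fromstring(SAMPLE_LOG))}
print(codes)
```

Because level and ancestry are read from the codes alone, structural comparisons between two matched nodes need no further tree traversal, which is the property the abstract relies on for structural sensitivity.
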
The Dewey coding generation algorithm and the detection algorithm based on the double-array Trie index are decomposed into suitable parts and run on an experimental Hadoop cluster, which improves the scalability of the system to some extent.
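
One way such a decomposition might look is a map step that scans each log record against the lexicon index and a reduce step that sums scores per record. The sketch below makes several assumptions that are not from the thesis. A plain nested-dict trie stands in for the double-array Trie. The score "node weight times hit count, summed over fields" is an assumed stand-in for the sensitivity formula, which the abstract does not give. The map and reduce steps are ordinary Python functions, not Hadoop API calls. All names and sample data are hypothetical.

```python
from collections import defaultdict

END = None  # End-of-word marker key; cannot collide with a character key.


def build_trie(words):
    """Build a nested-dict trie (a simple stand-in for a double-array Trie)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_words(text, trie):
    """Yield every lexicon word found in text, trying each start position."""
    for start in range(len(text)):
        node = trie
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                yield node[END]


def map_record(record_id, fields, trie, weights, default_weight=1.0):
    """Map step: emit (record_id, weight * hits) for each field with hits.

    fields is an iterable of (tag, text) pairs; weights maps tag -> weight.
    The scoring rule is an assumption, not the thesis's formula.
    """
    for tag, text in fields:
        hits = sum(1 for _ in find_words(text, trie))
        if hits:
            yield record_id, weights.get(tag, default_weight) * hits


def reduce_scores(pairs):
    """Reduce step: sum the emitted scores for each record."""
    totals = defaultdict(float)
    for record_id, score in pairs:
        totals[record_id] += score
    return dict(totals)


# Hypothetical lexicon, weights, and records.
trie = build_trie(["spam", "scam"])
weights = {"title": 2.0, "body": 1.0}
records = {
    "r1": [("title", "spam offer"), ("body", "a scam and more spam")],
    "r2": [("title", "hello"), ("body", "nothing here")],
}
emitted = [
    pair
    for record_id, fields in records.items()
    for pair in map_record(record_id, fields, trie, weights)
]
scores = reduce_scores(emitted)
print(scores)
```

Each record is scored independently in the map step, so records can be split across workers and only the small (record, score) pairs need to be combined afterwards.
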
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP393.08



