
Research on Monitoring Scheduling and Deduplication of Text-Based Sensitive Information

Published: 2019-01-10 14:57
[Abstract]: The growth of the Internet has made daily life far more convenient and has greatly advanced society. At the same time, some offenders exploit the ease and speed of online communication to spread sensitive information containing pornographic, violent or terrorist, reactionary, and other harmful content, which damages national security, social development, and people's lives. Finding this sensitive information promptly within the vast Internet and monitoring it has become an active research topic in network security. To discover sensitive information in a timely manner, this thesis studies monitoring and scheduling strategies for sensitive information and the deduplication of sensitive web pages. The main work is as follows:
1. A classified monitoring strategy for sensitive web pages based on page sensitivity is proposed. Sensitive keywords are matched against each web page to obtain the keywords and their positions in the page. An algorithm for computing page sensitivity is then given that combines the sensitivity of each keyword with an influence factor for its position in the page. Once page sensitivity is computed, sensitive pages are grouped by sensitivity level and monitored at different frequencies, which optimizes monitoring and improves the timeliness of discovery. Experiments show that the strategy effectively improves both the timeliness with which the system discovers sensitive information and the proportion of high-priority sensitive information found.
2. A supplementary discovery strategy for sensitive information is proposed, based on predicting when non-sensitive web pages will change. Using the number of changes and the time intervals observed over a page's most recent changes, the strategy predicts the page's next change time and crawls pages that satisfy the time condition. This raises the crawl frequency for pages that change often and lowers it for pages that do not change, which speeds up the discovery of sensitive information on frequently changing pages and increases the total number of new sensitive pages found. Experimental results show that the strategy complements the sensitivity-based monitoring strategy well and further improves the discovery rate of sensitive information.
3. A deduplication strategy based on sensitive-information summaries is proposed. Keyword matching yields the positions of the sensitive keywords in a page, the context surrounding each keyword is extracted, and the contexts of all sensitive keywords in the page are merged to form the page's sensitive-information summary. The similarity of two summaries is computed from their edit distance, and sensitive pages are deduplicated by comparing these similarities. Experiments show that the strategy improves the removal of duplicate pages.
4. Building on the strategies and methods above, a system for sensitive-information monitoring and the removal of duplicate results was designed and implemented. Some websites of the author's university were scanned and monitored to test the effectiveness and stability of the system. The test runs show that the proposed discovery and deduplication strategies can find sensitive information in a reasonably timely manner.
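The three strategies in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the keyword weights, the position factors, the weighted-sum scoring rule, the mean-interval predictor, the context window size, the similarity normalization, and the duplicate threshold are all hypothetical and are not taken from the thesis.

```python
# Hypothetical keyword weights and position factors; the abstract gives no values.
KEYWORD_WEIGHTS = {"bomb": 0.9, "casino": 0.5}
POSITION_FACTORS = {"title": 2.0, "body": 1.0}


def page_sensitivity(fields: dict[str, str]) -> float:
    """Assumed rule: sum keyword weight x position factor over every occurrence."""
    score = 0.0
    for position, text in fields.items():
        factor = POSITION_FACTORS.get(position, 1.0)
        lowered = text.lower()
        for keyword, weight in KEYWORD_WEIGHTS.items():
            score += lowered.count(keyword) * weight * factor
    return score


def predict_next_change(change_times: list[float]) -> float:
    """Assumed rule: last observed change plus the mean of the recent intervals."""
    if len(change_times) < 2:
        raise ValueError("need at least two observed change times")
    intervals = [b - a for a, b in zip(change_times, change_times[1:])]
    return change_times[-1] + sum(intervals) / len(intervals)


def sensitive_summary(text: str, window: int = 10) -> str:
    """Join the context around each keyword hit, in page order."""
    lowered = text.lower()
    hits = []
    for keyword in KEYWORD_WEIGHTS:
        start = lowered.find(keyword)
        while start != -1:
            hits.append((start, start + len(keyword)))
            start = lowered.find(keyword, start + 1)
    hits.sort()
    return " ".join(text[max(0, s - window): e + window] for s, e in hits)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def is_duplicate(page_a: str, page_b: str, threshold: float = 0.8) -> bool:
    """Compare the sensitive summaries only, not the full page text."""
    summary_a = sensitive_summary(page_a)
    summary_b = sensitive_summary(page_b)
    return similarity(summary_a, summary_b) >= threshold
```

Comparing summaries rather than whole pages means two pages that wrap the same sensitive passage in different navigation or advertising text can still be treated as duplicates, which appears to be the motivation for the summary-based approach described above.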
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP393.092; TP391.1


Article ID: 2406451



Article link: http://sikaile.net/guanlilunwen/ydhl/2406451.html

