
Design and Implementation of a Nutch-Based Vertical Search Engine for Security Vulnerabilities

Published: 2018-05-09 00:35

Topic: Nutch + vertical search engine; Source: Beijing University of Posts and Telecommunications, 2017 master's thesis


[Abstract]: More and more people obtain information through the Internet, and faced with a massive volume of online content, they rely on search engines to find what they need quickly. Traditional search engines crawl the entire Internet. Their coverage is broad, but the results contain a large amount of information the user does not need, which makes for a poor user experience. A vertical search engine retrieves only information from one specific professional domain that the user cares about. Its scope is narrower, but its results are more precise and better match domain-specific retrieval needs. Study, work, and daily life now all depend on the Internet, leaks of personal and corporate information are common, and Internet security is receiving growing attention. The large number of security vulnerabilities on the Internet is a major source of network security threats: problems such as large-scale DDoS attacks that crash corporate hosts and leaks of users' personal information are mostly triggered by security vulnerabilities. The risks these vulnerabilities create are substantial, so to keep people informed of the latest vulnerability information it is necessary to build a vertical search engine for security vulnerability information. Based on a study of vertical search engine technology and the open-source search engine framework Nutch, this thesis designs and implements a Nutch-based vertical search engine system for security vulnerabilities. The system's main functional modules are the web crawler, topic-specific information filtering, indexing, retrieval ranking, and a third-party Chinese word segmenter. The main work of the thesis is as follows:
1. It surveys the development of search engines and the current state of research on vertical search engines, studies the technology behind each module of a vertical search engine, and examines the working principles and plugin mechanism of the open-source Nutch framework.
2. It focuses on the topic filtering module of the vertical search engine and introduces a classifier to categorize information, enabling search restricted to a specific domain. Because the naive Bayes classifier is inherently limited by its conditional independence assumption, the thesis studies the second-order AODE classifier and, building on it, implements an improved WAODE classification algorithm weighted by the mutual information between attribute variables and the class variable. The WAODE algorithm is combined with Nutch's plugin mechanism to implement the topic filtering module.
3. It improves the Nutch retrieval ranking model by considering three aspects: content relevance, page authority derived from hyperlink analysis, and a time factor. The resulting page ranking score model is verified experimentally.
4. It adds the third-party Chinese word segmenter mmseg4j to Nutch to provide Chinese word segmentation.
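The abstract does not give the WAODE formulas, so the following is only a minimal sketch of the general idea it names: an averaged one-dependence estimator in which each attribute acts as a "super-parent" and its contribution is weighted by the mutual information between that attribute and the class. The class name `WAODE`, the Laplace smoothing constant `alpha`, the toy binary features, and the labels `"vuln"` and `"other"` are all illustrative assumptions, not the thesis's implementation or its Nutch plugin code.

```python
import math
from collections import Counter


class WAODE:
    """Mutual-information-weighted AODE for discrete features (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, X, y):
        self.n, self.d = len(X), len(X[0])
        self.classes = sorted(set(y))
        self.values = [sorted({row[i] for row in X}) for i in range(self.d)]
        self.c_count = Counter(y)
        self.ci = Counter()   # counts of (class, i, x_i)
        self.cij = Counter()  # counts of (class, i, x_i, j, x_j)
        for row, c in zip(X, y):
            for i, xi in enumerate(row):
                self.ci[(c, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.cij[(c, i, xi, j, xj)] += 1
        # One weight per super-parent attribute: I(A_i; C), clamped at zero.
        self.weights = [max(0.0, self._mutual_information(i)) for i in range(self.d)]
        if sum(self.weights) <= 0:  # degenerate case: fall back to plain AODE
            self.weights = [1.0] * self.d
        return self

    def _mutual_information(self, i):
        mi = 0.0
        for c in self.classes:
            p_c = self.c_count[c] / self.n
            for v in self.values[i]:
                n_cv = self.ci[(c, i, v)]
                if n_cv == 0:
                    continue
                p_cv = n_cv / self.n
                p_v = sum(self.ci[(k, i, v)] for k in self.classes) / self.n
                mi += p_cv * math.log(p_cv / (p_c * p_v))
        return mi

    def _joint(self, c, i, xi):
        # Smoothed estimate of P(c, x_i).
        cells = len(self.classes) * len(self.values[i])
        return (self.ci[(c, i, xi)] + self.alpha) / (self.n + self.alpha * cells)

    def _cond(self, c, i, xi, j, xj):
        # Smoothed estimate of P(x_j | c, x_i).
        num = self.cij[(c, i, xi, j, xj)] + self.alpha
        return num / (self.ci[(c, i, xi)] + self.alpha * len(self.values[j]))

    def predict_proba(self, row):
        scores = {}
        for c in self.classes:
            total = 0.0
            for i, xi in enumerate(row):
                term = self.weights[i] * self._joint(c, i, xi)
                for j, xj in enumerate(row):
                    if j != i:
                        term *= self._cond(c, i, xi, j, xj)
                total += term
            scores[c] = total
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Toy data (hypothetical): features are (mentions_cve, mentions_exploit, mentions_recipe).
X = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
     (0, 0, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
y = ["vuln"] * 4 + ["other"] * 4

model = WAODE().fit(X, y)
on_topic = model.predict((1, 1, 0))
off_topic = model.predict((0, 0, 1))
proba = model.predict_proba((1, 1, 0))
```

In a topic filter of the kind the abstract describes, a page would presumably be kept only when the classifier assigns it to the target class; how pages are turned into discrete attributes is not stated in the source and is left open here.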
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.3


Article ID: 1863811



Link to this article: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1863811.html

