Research and Implementation of Secure Index Technology Based on a Full-Text Retrieval System
Topic: full-text retrieval + secure index; Source: Huazhong University of Science and Technology, 2012 master's thesis
[Abstract]: With the rise and rapid growth of the Internet, information retrieval has become an indispensable tool in everyday life. In some settings user information must be protected, which motivates the concept of a secure index: an index that supports information retrieval while ensuring that user information is not leaked.
To address the existing problems, two full-text retrieval systems with secure indexes are implemented. The first is a secure index scheme based on the inverted file. It uses an inverted file as its index structure, encrypts text during both indexing and querying, and applies a second round of encryption when writing to the index. It thereby achieves a secure index while retaining the speed and accuracy of the inverted file structure, but it cannot yet effectively defend against attacks such as chosen-plaintext attacks and frequency analysis. To address this weakness, a second scheme is designed and implemented: a secure index based on a trapdoor one-way function. The trapdoor one-way function is combined with a pseudorandom function, and its irreversibility and pseudorandomness largely compensate for the security shortcomings of the inverted file approach. The scheme resists attacks such as chosen-plaintext attacks and so protects user information more securely. However, its computational complexity and its incompatibility with the inverted structure lead to low retrieval efficiency. It also occupies more space and has a certain probability of false matches.
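The first scheme can be illustrated with a minimal sketch. The abstract does not name the cipher or describe the second encryption round, so the code below makes an assumption: HMAC-SHA256 stands in as a deterministic keyed transform of each term, and the second round is omitted. The names `term_token`, `build_index`, and `search` are hypothetical and are not taken from S-Lucene.

```python
import hashlib
import hmac
from collections import defaultdict

def term_token(key: bytes, term: str) -> str:
    """Deterministic keyed token for a term (assumed stand-in for the thesis's cipher)."""
    return hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, docs: dict) -> dict:
    """Inverted index whose dictionary keys are tokens instead of plaintext terms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term_token(key, term)].add(doc_id)
    return dict(index)

def search(key: bytes, index: dict, term: str) -> list:
    """The query term is transformed with the same key, then looked up directly."""
    return sorted(index.get(term_token(key, term), set()))

KEY = b"demo-key-not-for-production"  # hypothetical key for illustration only
DOCS = {
    1: "secure index search",
    2: "inverted index lucene",
    3: "bloom filter index",
}
INDEX = build_index(KEY, DOCS)

# Posting-list lengths stay visible to anyone holding the index, which is
# the kind of information a frequency-analysis attack exploits.
POSTING_SIZES = sorted(len(postings) for postings in INDEX.values())
```

Because the transform is deterministic, lookups stay as fast as in a plain inverted file, but equal terms always map to equal tokens. That property is what leaves such a design open to frequency analysis and chosen-plaintext attacks, as the abstract notes.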
The thesis analyzes the retrieval principle and the security of the two indexing methods and implements each as a system: S-Lucene and BF-Index. S-Lucene adds security customizations on top of the open-source search engine Lucene, while BF-Index uses a Bloom filter as its index storage structure. Experiments compare the two systems on index construction time, query time, accuracy, and other measures. The experimental data are used to analyze the strengths and weaknesses of each system and to identify directions for future improvement.
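A Bloom-filter secure index of the kind BF-Index suggests might look like the sketch below. This is an illustration under stated assumptions, not the thesis's construction: HMAC-SHA256 plays the role of the pseudorandom function, each document gets its own filter, and the parameters `M_BITS` and `NUM_KEYS` and all function names are hypothetical.

```python
import hashlib
import hmac

M_BITS = 1024   # bits per document filter (assumed parameter)
NUM_KEYS = 4    # number of keyed hash functions (assumed parameter)

def prf(key: bytes, msg: bytes) -> bytes:
    """Pseudorandom function; HMAC-SHA256 is an assumption of this sketch."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def make_keys(master: bytes) -> list:
    """Derive NUM_KEYS sub-keys from one master key."""
    return [prf(master, bytes([i])) for i in range(NUM_KEYS)]

def trapdoor(keys: list, word: str) -> list:
    """Trapdoor for a word: one PRF output per sub-key; it does not reveal the word."""
    return [prf(k, word.lower().encode("utf-8")) for k in keys]

def positions(td: list, doc_id: int) -> list:
    """Bit positions are bound to the document id, so filters are not comparable across documents."""
    return [int.from_bytes(prf(t, str(doc_id).encode("utf-8")), "big") % M_BITS for t in td]

def build_filter(keys: list, doc_id: int, text: str) -> int:
    """Bloom filter for one document, stored as an integer bitmask."""
    bits = 0
    for word in set(text.split()):
        for pos in positions(trapdoor(keys, word), doc_id):
            bits |= 1 << pos
    return bits

def matches(bits: int, td: list, doc_id: int) -> bool:
    """True if every position derived from the trapdoor is set (may be a false positive)."""
    return all((bits >> pos) & 1 for pos in positions(td, doc_id))

def search(filters: dict, td: list) -> list:
    """Every document filter must be tested: there is no inverted structure to jump into."""
    return sorted(d for d, bits in filters.items() if matches(bits, td, d))

KEYS = make_keys(b"demo-master-key")  # hypothetical key for illustration only
DOCS = {1: "secure index search", 2: "inverted index lucene", 3: "bloom filter index"}
FILTERS = {d: build_filter(KEYS, d, text) for d, text in DOCS.items()}
```

The sketch shows the trade-offs named in the abstract. A query scans every document's filter, so retrieval is slower than an inverted lookup. Each document carries a fixed-size bit array regardless of its length. A Bloom filter can also report a match for a word that was never inserted, which corresponds to the misjudgment probability the thesis mentions.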
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3;TP309.7