Design and Implementation of a Heterogeneous Data Joint Retrieval System

Published: 2018-10-26 13:06
[Abstract]: As computers and networks have become widespread, more and more enterprises, government agencies, and schools process their documents electronically, and their day-to-day management inevitably produces large numbers of electronic documents. Retrieving the information a user needs from such a collection quickly and accurately has become a major challenge. The enterprise studied here faces this problem: it currently manages documents in a directory structure and has no retrieval system that covers all of them, so employees spend a great deal of time looking for a given piece of information, and what they find is often incomplete. The enterprise therefore urgently needs a search engine that covers all of its documents and serves the needs of different users. Based on these requirements, this project studies the index-building and search mechanisms of a heterogeneous data joint retrieval system. The system offers several retrieval modes, including retrieval by document type, by publisher, and by publication date. Because the enterprise's data volume is large and retrieval results must be accurate, the index-building process, the retrieval process, and the Paoding (Pao Ding Jie Niu) Chinese word segmenter were all extensively optimized. The system is developed in Java and is built mainly on Lucene, a Java-based full-text indexing toolkit. In view of the data volume and future system upgrades, the database is GreenPlum, which is designed for large-scale data processing. The project uses the SSH framework, the POI and PDFBox toolkits for document parsing, and the Paoding segmenter for Chinese word segmentation. The development tool is MyEclipse 10. The system runs well, and in terms of retrieval efficiency and quality it basically meets the original design requirements.
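The retrieval modes named in the abstract (by document type, by publisher, by publication date) can be illustrated with a minimal sketch. It is not the thesis implementation: it uses a plain in-memory inverted index instead of Lucene, whitespace splitting instead of the Paoding segmenter, and the names `FieldedIndex`, `Doc`, `add`, and `search` are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index; the thesis itself uses Lucene, which is not shown here.
class FieldedIndex {

    // Metadata fields assumed from the abstract: type, publisher, publication date.
    static final class Doc {
        final String type;
        final String publisher;
        final LocalDate published;
        final String text;

        Doc(String type, String publisher, LocalDate published, String text) {
            this.type = type;
            this.publisher = publisher;
            this.published = published;
            this.text = text;
        }
    }

    private final List<Doc> docs = new ArrayList<>();
    // Inverted index: term -> sorted ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Adds a document and returns its id. Whitespace splitting stands in for
    // a real Chinese word segmenter (an assumption made for brevity).
    int add(Doc doc) {
        int id = docs.size();
        docs.add(doc);
        for (String term : doc.text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Returns ids of documents containing the term, in ascending id order.
    // Each filter is optional: null means "do not filter on this field".
    // The date range [from, to] is inclusive at both ends.
    List<Integer> search(String term, String type, String publisher,
                         LocalDate from, LocalDate to) {
        List<Integer> hits = new ArrayList<>();
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return hits;
        }
        for (int id : ids) {
            Doc d = docs.get(id);
            if (type != null && !type.equals(d.type)) continue;
            if (publisher != null && !publisher.equals(d.publisher)) continue;
            if (from != null && d.published.isBefore(from)) continue;
            if (to != null && d.published.isAfter(to)) continue;
            hits.add(id);
        }
        return hits;
    }
}
```

A call such as `search("budget", "pdf", null, null, null)` restricts the hits for "budget" to PDF documents. In a Lucene-based system the same effect would typically come from indexing the metadata as separate fields and combining the text query with field filters.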
[Degree-granting institution]: Northeastern University (東北大學)
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3


Article ID: 2295815


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2295815.html

