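Design and Implementation of a Lucene-Based In-Site Full-Text Search System for the Wireless City
Topic: wireless city + full-text search; Source: master's thesis, Beijing University of Posts and Telecommunications, 2013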
[Abstract]: With the rapid development of the mobile Internet and wireless access technology, wireless cities give citizens, enterprises, and governments a shared wireless information platform. Access to wireless network services anytime, anywhere, and on demand has become an important part of modern life and work. As a product of technological and social development, the wireless city will strongly influence how people live and work and will promote further economic and social development. A wireless city coordinates information resources of all kinds and integrates diverse services and applications, giving users a convenient platform for finding resources among the massive information and applications of the Internet. However, the comprehensive wireless city portals already in pilot operation in various provinces and cities lack a full-text search entry point that lets users quickly find what they need, so they cannot meet users' demand for deep, fast, and accurate queries. The existing in-site search functions only retrieve news items within the site, which does not amount to full-text search.
Against this background, this thesis proposes a design and implementation of an in-site full-text search system for the wireless city, which helps users quickly and accurately find the information or application entry they need among massive information and application resources. Because a wireless city is a comprehensive city portal whose news and information cover a very wide range, the system refines keyword-based full-text search by automatically classifying the results into applications and information items. With a simple keyword search, users can go directly to the service application they are looking for or view the specific content containing the keyword. This can greatly improve the experience of wireless city users and, in turn, productivity across industries.
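As a rough illustration of presenting results by category, the sketch below groups an already ranked hit list into applications and information items. It assumes each hit already carries a category label; the field names `kind` and `title` and the function `group_hits` are hypothetical and do not come from the thesis, which derives categories from Term Vector information inside Lucene.

```python
from collections import defaultdict


def group_hits(hits):
    """Group ranked hits by their 'kind' label, keeping rank order within each group.

    Each hit is a dict with hypothetical keys 'title' and 'kind'.
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["kind"]].append(hit["title"])
    return dict(groups)


ranked_hits = [
    {"title": "Bus query", "kind": "application"},
    {"title": "Bus route news", "kind": "information"},
    {"title": "Bus card top-up", "kind": "application"},
]
grouped = group_hits(ranked_hits)
```

Grouping after ranking keeps the relevance order inside each category, so the best-matching application and the best-matching information item both appear first in their own lists.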
The full-text search system built in this thesis is a secondary development based on the full-text search engine Lucene. After surveying the wireless cities already online and summarizing the shortcomings of their in-site search, the thesis sets out its design goals and completes the overall system architecture. On this basis, it carries out detailed requirements and functional analysis module by module and gives functional flowcharts and the implementation process. Because the wireless city covers many domains and new words appear frequently, the thesis builds on a study of existing Chinese word segmentation algorithms, combines mechanical (dictionary-based) segmentation with statistical segmentation, and proposes a Chinese word segmentation architecture with dynamic lexicon updating. It improves Lucene's index-building process to provide a configurable indexing mechanism, and designs and implements incremental index updating to keep the index synchronized and consistent with the wireless city business database. During search, it uses the information in the Term Vector to classify search results automatically. The presentation layer gives users a simple search interface with a good user experience, and a search-term suggestion function offers related hints as users search.
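The idea of pairing mechanical segmentation with a statistically updated lexicon can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture from the thesis: forward maximum matching stands in for the mechanical step, and a simple frequency count over runs of unknown single characters stands in for the statistical step. The class `DynamicSegmenter` and the parameters `max_len` and `promote_after` are hypothetical names.

```python
from collections import Counter


class DynamicSegmenter:
    """Forward maximum matching with a frequency-based lexicon update (illustrative only)."""

    def __init__(self, lexicon, max_len=4, promote_after=2):
        self.lexicon = set(lexicon)
        self.max_len = max_len              # longest word length to try
        self.promote_after = promote_after  # sightings needed before a candidate joins the lexicon
        self.candidates = Counter()

    def segment(self, text):
        tokens, i = [], 0
        while i < len(text):
            # Mechanical step: take the longest lexicon entry starting at i,
            # falling back to a single character when nothing matches.
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in self.lexicon:
                    tokens.append(piece)
                    i += size
                    break
        return tokens

    def learn(self, text):
        # Statistical step: count runs of adjacent unknown single characters
        # and promote a run to the lexicon once it has been seen often enough.
        run = ""
        for tok in self.segment(text) + [""]:  # empty sentinel flushes the final run
            if len(tok) == 1 and tok not in self.lexicon:
                run += tok
                continue
            if 2 <= len(run) <= self.max_len:
                self.candidates[run] += 1
                if self.candidates[run] >= self.promote_after:
                    self.lexicon.add(run)
            run = ""


seg = DynamicSegmenter({"无线", "城市", "搜索"})
before = seg.segment("无线城市微博搜索")
seg.learn("无线城市微博搜索")
seg.learn("微博搜索")
after = seg.segment("无线城市微博搜索")
```

In this example the unknown word "微博" is first split into single characters, then segmented as one token after it has been observed twice. A production system would use a stronger statistical criterion than raw frequency; the threshold here only shows where the dynamic update plugs in.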
Finally, the thesis presents the overall running results of the Lucene-based in-site full-text search system for the wireless city, summarizes the completed research work, and proposes directions for the further development of the wireless city and goals for improving the system.
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3