Research on the Application of Sphinx Site Search Technology in Forum Products
Topic: site full-text search technology + phpwind forum; Source: Fudan University, master's thesis, 2012
[Abstract]: With the rapid growth of the Internet, news sites, social portals, and forums have become the mainstream carriers of information. Market demand for portal and forum products is growing daily; popular products currently include Discuz and phpwind. Phpwind is a software product promoted by Alibaba in recent years that combines applications, forums, social networking, and portal features. The author took part in the development and maintenance of several versions of this forum product. In older versions of phpwind, site search performed very poorly under high load and heavy concurrency: when a site built on phpwind met a sudden event such as a surge in daily posts, the server often went down. These weaknesses became a major bottleneck for the product and for a time left it at a disadvantage in a fiercely competitive market.

One key problem this thesis addresses is how to let the forum product deliver fast, accurate search over large data volumes and under heavy access load, so that users can quickly locate the content they need while the site is fully relieved of the load that site search generates. A second key problem is that earlier versions of the forum product could not perform precisely targeted queries within the site.

1. Main research results: This thesis integrates the Sphinx full-text search technology into the phpwind forum product and, together with a search system architecture design, resolves the poor performance of the old phpwind product in vertical search. A built-in multi-condition linked query algorithm for classified information resolves the old product's inability to locate results precisely under the load of large volumes of product listing data.

2. Innovations: Integrating distributed Sphinx search engine technology into the phpwind forum product addresses the difficulty of serving search under high load and guards against the sudden failure of a single server. The thesis also designs an in-memory index processing mechanism and a queue control technique for the forum, which are of some value for improving index processing and response time.

3. Results achieved by the requirements analysis and design: With a single Sphinx node combined with the new search architecture, site queries over a forum with millions of records in MySQL respond and return results within milliseconds. With distributed Sphinx search engine technology, queries over the same million-record data respond and return results at the microsecond level. Finally, the new linked query algorithm for classified information enables precise queries under multiple conditions and also resolves the old phpwind product's limitation of offering only a single form of search.
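The abstract names an in-memory index fed through a queue and a multi-condition query over classified information, but gives no implementation detail. The Python sketch below is only a toy illustration of those two ideas under stated assumptions: it is not the thesis's code, it does not use Sphinx, and the class name `QueuedInMemoryIndex`, the `batch_size` parameter, and the sample `category` and `city` attributes are all hypothetical.

```python
from collections import defaultdict, deque


class QueuedInMemoryIndex:
    """Toy inverted index whose writes are buffered in a queue (hypothetical design)."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.pending = deque()            # posts waiting to be indexed
        self.postings = defaultdict(set)  # term -> ids of posts containing it
        self.attrs = {}                   # post id -> attribute dict

    def enqueue(self, post_id, text, **attrs):
        """Queue a new post; index it only once a full batch has accumulated."""
        self.pending.append((post_id, text, attrs))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Drain the queue into the in-memory index."""
        while self.pending:
            post_id, text, attrs = self.pending.popleft()
            for term in text.lower().split():
                self.postings[term].add(post_id)
            self.attrs[post_id] = attrs

    def search(self, query, **filters):
        """Return sorted ids of indexed posts matching every term and every filter."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(
            i for i in ids
            if all(self.attrs[i].get(k) == v for k, v in filters.items())
        )


index = QueuedInMemoryIndex(batch_size=2)
index.enqueue(1, "used laptop for sale", category="electronics", city="Shanghai")
print(index.search("laptop"))  # [] because post 1 is still waiting in the queue
index.enqueue(2, "new laptop bag", category="accessories", city="Shanghai")
print(index.search("laptop"))  # [1, 2] after the batch is flushed
print(index.search("laptop", category="electronics", city="Shanghai"))  # [1]
```

Batching writes through a queue trades a short indexing delay for fewer index updates under bursts of new posts, which is one plausible reading of the queue control the abstract mentions; the attribute filters stand in for the linked category conditions.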
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3