Research on Focused Search Engines and Their Application in Community Informatization
Published: 2018-11-01 20:02
[Abstract]: Cloud computing, as a brand-new business model, was put forward by Google in 2006, and it gave industry and academia a new line of thinking. The team led by Professor Yuan Dongfeng at the School of Information Science and Engineering, Shandong University, quickly seized this opportunity, carried out in-depth research on a new cloud-computing-based informatization model, and has achieved interim results. The team has been supported by two Shandong Province major special projects for the transformation of independent innovation achievements. The work in this thesis stems from the second of these, "Low-Cost, Low-Power, High-Reliability Embedded Terminal and Information Service Platform" (2010ZHZX1A1001).
As China pursues urbanization, the conversion of rural villages into communities with large-scale operation and collective economies has begun. Rural reconstruction in Shandong Province has progressed rapidly, and the pilot area chosen by the major special project behind this work is a typical example of a village converted into a community. Community informatization has also become a very important part of informatization as a whole: the National Informatization Development Strategy (2006-2020) lists the advancement of community informatization as one of the strategic priorities of China's informatization development. Against this background, the project team studied key informatization technologies and proposed a "cloud computing server + broadband network + thin client" model that dispenses with the PC entirely. The team developed and mass-produced thin clients based on an embedded architecture, bringing both cost and power consumption down to a very low level. It also developed a cloud computing server cluster and, based on a survey of community users, built the applications and information services those users care about. This model replaced the traditional PC-centric approach in a large-scale pilot demonstration, with good results.
To meet the target users' requirements and fit the characteristics of the new community informatization model, this thesis designs and implements a focused search engine for Taobao shopping that gives community users convenient, fast shopping search and recommendation. Because Taobao carries a very wide variety of goods, a generic product model is designed and implemented so that adding new kinds of goods does not require large-scale updates to the database tables. The system consists of two main modules, a web crawler and an information search module. The crawler module fetches product information from Taobao, builds the index files, and stores detailed product information in the database. The search module provides the user keyword query interface, index-file lookup, and database queries, and presents users with a search result list, detailed product information, and recommendations.
In the crawler module, to cope with the efficiency of crawling massive amounts of data, a Hadoop-based distributed web crawler is implemented in Java. A Hadoop distributed environment is first set up on Ubuntu 9.10, and a distributed crawler program designed for Hadoop then crawls the Taobao data. A data storage strategy is designed to build the index files; the caching strategy is optimized to reduce physical storage usage; an information extraction method tailored to the characteristics of Taobao's data is designed and used to store detailed product information in the database; log storage rules are designed to handle runtime exceptions that network conditions may cause; and a user interface allows the crawling rules to be configured.
In the search module, browser-based information search is implemented. The core of the search program is a J2EE project that performs the index-file and database queries. The system first provides a runtime environment configuration function for setting its operating parameters. A front-end page provides the user query interface; the keyword is looked up in the index files to obtain the set of matching products, and the database entry information in that set is then used to query the database for the result set. Because the target users are price-sensitive, the result set can be sorted by price. A product detail query shows the price, title, description, and price curve of a product, and recommends products in a similar price range.
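The price ordering and similar-price recommendation described in the abstract can be illustrated with a minimal Java sketch. Everything named here is an assumption made for illustration: the class `PriceSearchSketch`, the `Product` record with a free-form attribute map (standing in for the generic product model), the methods `sortByPrice` and `recommendSimilarPrice`, and the percentage tolerance that defines a "similar" price. The abstract does not state the thesis's actual data structures or recommendation rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PriceSearchSketch {

    /**
     * Hypothetical generic product record: a few fixed core fields plus a
     * free-form attribute map, so a new product category needs no new columns.
     */
    public static final class Product {
        public final String id;
        public final String title;
        public final long priceCents; // price in the smallest currency unit
        public final Map<String, String> attributes = new LinkedHashMap<>();

        public Product(String id, String title, long priceCents) {
            this.id = id;
            this.title = title;
            this.priceCents = priceCents;
        }
    }

    /** Returns a new list ordered by ascending price; the input is not modified. */
    public static List<Product> sortByPrice(List<Product> hits) {
        List<Product> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Product p) -> p.priceCents));
        return sorted;
    }

    /**
     * Returns products whose price lies within +/- tolerance (e.g. 0.2 = 20%)
     * of the target's price, excluding the target itself, cheapest first.
     */
    public static List<Product> recommendSimilarPrice(
            Product target, List<Product> catalog, double tolerance) {
        long low = Math.round(target.priceCents * (1.0 - tolerance));
        long high = Math.round(target.priceCents * (1.0 + tolerance));
        List<Product> similar = new ArrayList<>();
        for (Product p : catalog) {
            boolean inBand = p.priceCents >= low && p.priceCents <= high;
            if (inBand && !p.id.equals(target.id)) {
                similar.add(p);
            }
        }
        return sortByPrice(similar);
    }

    public static void main(String[] args) {
        Product kettle = new Product("a", "Kettle", 1000L);
        kettle.attributes.put("capacity", "1.5 L"); // category-specific attribute
        List<Product> catalog = new ArrayList<>();
        catalog.add(kettle);
        catalog.add(new Product("b", "Mug", 500L));
        catalog.add(new Product("c", "Teapot", 1100L));
        catalog.add(new Product("d", "Rice cooker", 3000L));

        for (Product p : sortByPrice(catalog)) {
            System.out.println(p.title + " " + p.priceCents);
        }
        for (Product p : recommendSimilarPrice(kettle, catalog, 0.2)) {
            System.out.println("similar to Kettle: " + p.title);
        }
    }
}
```

In the system the abstract describes, the input list would come from the index lookup followed by the database query; here it is built in memory so the sketch stays self-contained.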
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
Article ID: 2304952
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2304952.html