Research on a Vertical Search Engine for Mobile Phone Product Information
Published: 2018-04-14 17:31
Topics: topic relevance + web crawler; Source: Hunan University of Technology, master's thesis, 2013
[Abstract]: With the rapid progress of Internet technology, the boom in e-commerce, and the rise of forums and blogs, more and more people post comments on the various attributes of products, expressing their attitudes toward, opinions of, and experience with a particular product. As a result, a massive volume of product reviews has appeared online. Reading these reviews helps potential buyers understand a product's characteristics and decide whether to purchase it. Merchants can likewise mine the reviews to learn, in a timely and effective way, about supply and demand and about a product's popularity, which greatly aids their sales decisions. Browsing and collecting this information manually, however, is time-consuming and laborious, and the information obtained is not sufficiently comprehensive, timely, or effective, so people increasingly rely on search engines when looking for information. For a specific domain, the shortcomings of general-purpose search engines are obvious, so it is necessary to build a vertical search engine for a specific product domain.

Based on an analysis of the state of research on vertical search engines and sentiment classification in China and abroad, this thesis takes the construction of a vertical search engine for mobile phone product information as its guiding thread. The main work is as follows:

(1) A topic crawler framework for the mobile phone product domain is designed. For the crawler's search strategy, the traditional content-based and link-based search strategies are studied in depth, and an improved strategy that combines the two is proposed. It substantially increases the topic relevance of the pages the crawler fetches, which simplifies the later steps of building the vertical search engine. Experiments comparing it with the HITS algorithm, the breadth-first algorithm, and the PageRank algorithm show the advantages of the proposed algorithm.

(2) After mobile phone product attribute words and sentiment words are obtained, a method for identifying collocations of attribute words and sentiment words is proposed. A classifier trained with SVM is used to obtain the sentiment orientation score for a given product attribute in a review, and all reviews of a given phone model are then combined to give an overall satisfaction rating. Experimental comparison verifies the effectiveness of the collocation method.

(3) A vertical search engine for mobile phone product information is designed and implemented. The design framework is given, the implementation of each module is described, and the system interface is presented.
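The abstract does not give the formula behind the combined search strategy in item (1), so the following is only a minimal sketch of how a content-based score and a link-based score might be blended to order a crawler's URL frontier. The linear weighting `alpha`, the cosine similarity over term counts, the precomputed `link_score` in [0, 1], and the names `cosine_relevance`, `combined_priority`, and `Frontier` are all illustrative assumptions, not the thesis's actual method.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(text_tokens, topic_tokens):
    """Cosine similarity between the term-count vectors of a text and a topic."""
    a, b = Counter(text_tokens), Counter(topic_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def combined_priority(text_tokens, topic_tokens, link_score, alpha=0.7):
    """Assumed blend: alpha * content relevance + (1 - alpha) * link score.

    link_score is expected to be precomputed and scaled to [0, 1],
    for example from a link-analysis pass over the pages seen so far.
    """
    content = cosine_relevance(text_tokens, topic_tokens)
    return alpha * content + (1 - alpha) * link_score


class Frontier:
    """URL frontier that returns the highest-priority unseen URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, priority):
        # Skip URLs that were already queued once.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)
```

With a topic such as `["phone", "battery", "screen", "camera"]`, a link whose anchor text shares terms with the topic is popped before an off-topic link, even if the off-topic link has a higher link score, as long as `alpha` favors content. Tuning `alpha` trades off the two signals.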
[Degree-granting institution]: Hunan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3
Article ID: 1750349
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1750349.html