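Design and Implementation of a Taobao Product Competitiveness Analysis System Based on Data Mining
Published: 2018-01-29 16:27
Keywords: web crawler; data mining; e-commerce; decision support. Source: Shandong University, master's thesis, 2015. Document type: degree thesis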
[Abstract]: The spread of Internet technology has given new momentum to e-commerce, reducing labor and material costs while generating substantial commercial profit. Because customers who shop online can only browse the pictures and descriptions on a web page and cannot inspect the goods in person, they readily doubt the authenticity of e-commerce transactions. Addressing this requires both the constraint of online policies and regulations and careful comparison and review of the goods being traded. Merchants, for their part, need to analyze the goods they sell, understand the strengths, weaknesses, and price trends of competing products in the same sector, and make business decisions based on market information. However, the variety and volume of goods online are so large that obtaining product information is not easy. This thesis uses a web crawler to collect information on products on sale at Tmall, a representative online marketplace, and analyzes that information to provide decision support that meets the needs of both buyers and sellers. The system applies web crawler technology to collect online product information from Tmall, filters the collected pages, extracts the product category, product tags, brand, detailed description, review information, and shop description, processes these data, and stores them in a database. Data are collected with a topic (focused) crawler, which simplifies page analysis and the location of URL links, reduces the number of extraction passes, and improves extraction efficiency. The starting category and level of the crawl can be set flexibly, which effectively ensures that the collected data are authentic and timely.
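The collection step described above (start from a chosen category, follow links down to a set level, extract product fields, store them in a database) might look roughly like the following sketch. It is not the thesis's implementation: the in-memory `PAGES` site, the URLs, the `data-field` markup, and the `product` table are all hypothetical stand-ins so that the example runs without network access, and only two of the extracted fields are shown.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "site" (URL -> HTML). A real crawler would fetch
# pages over HTTP; these URLs and this markup are invented for illustration.
PAGES = {
    "/cat/phones": '<a href="/item/1">A</a><a href="/item/2">B</a>'
                   '<a href="/cat/cases">Cases</a>',
    "/cat/cases": '<a href="/item/3">C</a>',
    "/item/1": '<span data-field="category">phones</span>'
               '<span data-field="brand">BrandA</span>',
    "/item/2": '<span data-field="category">phones</span>'
               '<span data-field="brand">BrandB</span>',
    "/item/3": '<span data-field="category">cases</span>'
               '<span data-field="brand">BrandC</span>',
}


class PageParser(HTMLParser):
    """Collects outgoing links and the text of <span data-field="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.fields = {}
        self._current = None  # name of the field whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "span" and "data-field" in attrs:
            self._current = attrs["data-field"]
            self.fields[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def crawl(start_url, max_depth, conn):
    """Breadth-first crawl from start_url, following links up to max_depth."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product "
        "(url TEXT PRIMARY KEY, category TEXT, brand TEXT)"
    )
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # unknown page: skip it
        parser = PageParser()
        parser.feed(html)
        parser.close()
        if parser.fields:  # pages with product fields are stored
            conn.execute(
                "INSERT OR REPLACE INTO product VALUES (?, ?, ?)",
                (url, parser.fields.get("category"), parser.fields.get("brand")),
            )
        if depth < max_depth:  # the depth limit keeps the crawl on topic
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    crawl("/cat/phones", max_depth=2, conn=db)
    for row in db.execute("SELECT * FROM product ORDER BY url"):
        print(row)
```

Choosing the start URL and `max_depth` corresponds to setting the starting category and level of the crawl; a lower depth limit visits fewer pages and keeps the collected data closer to the chosen category.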
The collected data are analyzed mainly through data mining. By organizing and analyzing product descriptions, user reviews, shop descriptions, and related information, the system ranks products by sales, price, shipping, and other aspects, and from these rankings it analyzes a product's potential value and derives its competitiveness ranking, which can give buyers a theoretical basis for trusting a product. At a deeper level, the transaction behavior of buyers and sellers can be mined, helping merchants adjust their strategies to increase sales or launch new products at the right time. The system currently runs normally, but its efficiency still has room for improvement as the volume of online data grows. The future direction is multithreaded concurrency and large-scale data collection and processing. The next stage of work will focus on multithreaded data collection, improve the targeted extraction of updated data, and broaden and deepen the analysis and data mining.
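The abstract does not say how the per-aspect rankings are combined into a competitiveness ranking. As one possible illustration, the sketch below ranks each product separately by sales, price, and shipping time and then orders products by their mean rank. The `Product` fields, the equal weighting, and the assumption that a lower price and faster shipping are better are all assumptions made for this example, not details taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str              # assumed unique per product
    monthly_sales: int     # higher is treated as better
    price: float           # lower is assumed to be better
    shipping_hours: float  # lower is treated as better


def rank_positions(products, key, reverse=False):
    """Return {name: rank} with rank 1 as best; tied values share a rank."""
    ordered = sorted(products, key=key, reverse=reverse)
    ranks = {}
    for i, product in enumerate(ordered):
        if i > 0 and key(product) == key(ordered[i - 1]):
            ranks[product.name] = ranks[ordered[i - 1].name]
        else:
            ranks[product.name] = i + 1
    return ranks


def competitiveness_ranking(products):
    """Order products by the mean of their sales, price, and shipping ranks."""
    sales = rank_positions(products, key=lambda p: p.monthly_sales, reverse=True)
    price = rank_positions(products, key=lambda p: p.price)
    shipping = rank_positions(products, key=lambda p: p.shipping_hours)
    mean_rank = {
        p.name: (sales[p.name] + price[p.name] + shipping[p.name]) / 3
        for p in products
    }
    # Best (lowest) mean rank first; the name breaks ties deterministically.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    catalog = [
        Product("A", monthly_sales=500, price=100.0, shipping_hours=24),
        Product("B", monthly_sales=300, price=80.0, shipping_hours=48),
        Product("C", monthly_sales=100, price=120.0, shipping_hours=72),
    ]
    for name, score in competitiveness_ranking(catalog):
        print(name, round(score, 2))
```

A weighted mean, or scores derived from review text, could replace the equal-weight average if some aspects should matter more than others.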
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.1; F724.6