An Empirical Study of Statistical Feature Extraction and K-means Clustering of the Sales Data of a Pharmaceutical Distribution Enterprise
Source: South China University of Technology, master's thesis, 2015. Document type: degree thesis.
Keywords: sales data; statistical features; indicator system; cluster analysis; R language
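【Abstract】: A pharmaceutical distribution company (hereafter Company A) is a large pharmaceutical distribution enterprise that combines research, production and sales and handles more than 6,000 products each year. Its internal management is neither unified nor scientific: purchase quantities are set mainly by experience and intuition, and inventory backlogs have become serious in recent years. Company A's database holds a large amount of operational data, but very little of it is currently used ("rich in data, poor in information"). The wide range of products and the high incidence of abnormal sales volumes make sales forecasting and unified management more difficult. This thesis takes the weekly sales volume of every product Company A handled in 2012 as its research object. First, it carries out an exploratory analysis of the statistical characteristics of all 6,837 products along the dimensions of central tendency, volatility, distribution, shape of the sales curve, profitability and seasonality. The analysis shows that average sales differ greatly between products, the proportion of large outliers is high, the proportion of small outliers is low, the proportion of weeks with zero sales is high, product life cycles are hard to delimit, profitability varies from product to product, and some products have strongly seasonal sales. Based on the central tendency, distribution, outliers, missing data, seasonal factors and profitability of Company A's sales data, statistical indicators are then selected and constructed, and a feature indicator system for sales data is established; this system can help managers quickly grasp the sales characteristics of each product.
Next, following the idea of time-series clustering, K-means clustering is applied to part of the statistical indicators in this system and the resulting clusters are analyzed. The results show that the selected indicators explain the clustering results well; the sales characteristics reflected by each cluster can give Company A a reference and data support for product management and suggest a research direction for category-based product management. Building on the box plot and the coefficient of variation, the thesis also introduces a box coefficient, which removes the effect of differing box-plot sizes and units of measurement across products; combining the mean with the box coefficient shows both the sales level and the sales distribution of each product. In the K-means algorithm, k is chosen as follows: the range of k is first bounded by n, where n is the number of samples; then, for each candidate k, the ratio of the between-cluster sum of squares to the total sum of squares of the resulting clusters is computed, and the k with the largest ratio is taken as the number of clusters. This method yields clusters that are compact internally and well separated from one another.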
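The box-plot-based indicators described in the abstract can be sketched in Python as follows. This is an illustration, not the author's code (the thesis lists R among its keywords). The helper `sales_indicators` and its field names are hypothetical, and the abstract does not give a formula for the box coefficient: here it is ASSUMED to be the interquartile range divided by the mean, by analogy with the coefficient of variation (standard deviation divided by the mean). Outliers are counted with the usual Tukey fences (1.5 × IQR beyond the quartiles), and quartiles use `method="inclusive"`, which matches R's default `quantile()` rule (type 7); R's `boxplot()` hinges can differ slightly.

```python
import statistics


def sales_indicators(weekly_sales):
    """Hypothetical per-product indicators for one weekly sales series (>= 2 weeks).

    ASSUMPTION: box_coef = IQR / mean, by analogy with cv = stdev / mean.
    """
    n = len(weekly_sales)
    mean = statistics.fmean(weekly_sales)
    # Quartiles by linear interpolation (same rule as R's default quantile type 7).
    q1, _, q3 = statistics.quantiles(weekly_sales, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey fences, as drawn by a standard box plot.
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    nan = float("nan")
    return {
        "mean": mean,
        # Sample standard deviation, like R's sd(); undefined when the mean is 0.
        "cv": statistics.stdev(weekly_sales) / mean if mean else nan,
        # Scale-free spread taken from the box plot (assumed definition).
        "box_coef": iqr / mean if mean else nan,
        "high_outlier_share": sum(x > upper_fence for x in weekly_sales) / n,
        "low_outlier_share": sum(x < lower_fence for x in weekly_sales) / n,
        "zero_week_share": sum(x == 0 for x in weekly_sales) / n,
    }


# Hypothetical example: 8 weeks with two zero-sales weeks and one spike.
example = sales_indicators([0, 0, 2, 3, 3, 4, 5, 40])
print(example["high_outlier_share"], example["zero_week_share"])  # 0.125 0.25
```

Applying the function to each product's weekly series gives one row of an indicator table per product. Multiplying a series by 10 leaves `cv` and `box_coef` unchanged (up to floating-point rounding) while `mean` grows tenfold, which is the kind of scale invariance the abstract attributes to the box coefficient.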
[Abstract]: A pharmaceutical distribution company (referred to as "Company A") is a large enterprise that integrates scientific research, production and sales and handles more than 6,000 products every year. It lacks unified, scientific internal management: purchase quantities depend mainly on experience and intuition, and inventory backlogs have become serious in recent years. Company A's database holds a large amount of operational data, but data utilization is currently very low: the company is "rich in data, poor in information". Its wide product range and high share of abnormal sales volumes make sales forecasting and unified management difficult. This paper takes the weekly sales volume of every product in operation at Company A in 2012 as its research object. First, it explores the statistical characteristics of the sales data of Company A's 6,837 products along the dimensions of central tendency, volatility, distribution, sales-curve shape, profitability and seasonality. It finds that average sales volumes differ greatly between products, the proportion of large outliers is high, the proportion of small outliers is low, the proportion of weeks with zero sales is high, product life cycles are difficult to define, profitability varies, and some products show strong seasonality. Drawing on the central tendency, distribution, outliers, missing data, seasonal factors and profitability of the sales data, statistical indicators are selected and constructed, and a characteristic indicator system for sales data is established. The system can help managers quickly grasp the sales characteristics of each product. Finally, based on the idea of time-series clustering, K-means clustering is applied to selected statistical indicators from the system and the clusters are analyzed.
The results show that the selected statistical indicators explain the clustering results well. The sales characteristics reflected by each cluster can provide reference and data support for Company A's product management and suggest a research direction for category-based product management. Based on the box plot and the coefficient of variation, this paper introduces a box coefficient, which eliminates the influence of box-plot size and units of measurement when products are compared; used together with the mean, it reveals both the sales level and the sales distribution of each product. In this paper's K-means algorithm, k is selected by first bounding it by n, the number of samples. Then, for each candidate k, the ratio of the between-cluster sum of squares to the total sum of squares of the resulting clusters is computed, and the k at which this ratio is largest is taken as the number of clusters. This method produces clusters that are compact internally and well separated from one another.
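The abstract lists seasonal factors among the indicator dimensions but does not say how seasonality is measured. A minimal Python sketch, assuming a simple ratio-to-average seasonal index: each quarter's mean weekly sales divided by the year's mean weekly sales, with the 2012 series assumed to contain exactly 52 weeks split into four 13-week quarters. The functions `quarterly_seasonal_index` and `seasonal_strength` (the gap between the strongest and weakest quarter) are hypothetical, not the thesis's own indicators.

```python
import statistics


def quarterly_seasonal_index(weekly_sales, weeks_per_quarter=13):
    """Ratio-to-average index per quarter (hypothetical indicator).

    Assumes the series starts in week 1 and has 4 * weeks_per_quarter values.
    An index of 1.0 means the quarter sells at the annual average rate.
    """
    if len(weekly_sales) != 4 * weeks_per_quarter:
        raise ValueError("expected 4 * weeks_per_quarter weekly values")
    overall = statistics.fmean(weekly_sales)
    if overall == 0:
        return [float("nan")] * 4  # no sales all year: seasonality undefined
    return [
        statistics.fmean(weekly_sales[q * weeks_per_quarter:(q + 1) * weeks_per_quarter]) / overall
        for q in range(4)
    ]


def seasonal_strength(index):
    """Hypothetical one-number summary: strongest minus weakest quarter index."""
    return max(index) - min(index)


# Example: flat sales for three quarters, five times that rate in Q4.
idx = quarterly_seasonal_index([1] * 39 + [5] * 13)
print(idx, seasonal_strength(idx))  # [0.5, 0.5, 0.5, 2.5] 2.0
```

A flat series gives an index of 1.0 in every quarter and a strength of 0, so larger strengths flag the strongly seasonal products the abstract mentions.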
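The k-selection rule described above can be sketched in plain Python: Lloyd's K-means with random restarts, plus a loop that scores each candidate k by between-cluster SS / total SS (the quantities R's `kmeans()` reports as `betweenss` and `totss`). This is not the author's R code; `kmeans`, `choose_k` and their parameters are hypothetical. The abstract only says that k is bounded in terms of the sample size n, so the default upper bound here is ASSUMED to be ⌊√n⌋, a common heuristic. Indicators on different scales would normally be standardized before clustering; that step is omitted.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, n_start=10, max_iter=100, seed=0):
    """Lloyd's algorithm with random restarts.

    Returns (labels, centers, total within-cluster sum of squares) of the best run.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_start):
        centers = [list(p) for p in rng.sample(points, k)]  # k distinct data points
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest center.
            new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            if new_labels == labels:
                break  # converged: assignments no longer change
            labels = new_labels
            # Update step: move each center to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an emptied cluster keeps its old center
                    centers[c] = mean_point(members)
        within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or within < best[2]:
            best = (labels, centers, within)
    return best


def choose_k(points, k_max=None, seed=0):
    """Pick k in 2..k_max maximizing between-cluster SS / total SS.

    ASSUMPTION: k_max defaults to floor(sqrt(n)). Returns (best_k, {k: ratio}).
    """
    n = len(points)
    if k_max is None:
        k_max = math.isqrt(n)
    if k_max < 2:
        raise ValueError("need k_max >= 2")
    grand = mean_point(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    if total_ss == 0:
        raise ValueError("all points identical; ratio undefined")
    ratios = {}
    for k in range(2, min(k_max, n) + 1):
        _, _, within = kmeans(points, k, seed=seed)
        ratios[k] = (total_ss - within) / total_ss  # between SS / total SS
    best_k = max(ratios, key=ratios.get)
    return best_k, ratios


# Two well-separated groups of hypothetical 2-D indicator vectors.
pts = [[0, 0], [0, 1], [1, 0], [100, 100], [100, 101], [101, 100]]
best_k, ratios = choose_k(pts)  # k_max = isqrt(6) = 2
print(best_k, round(ratios[2], 4))  # 2 0.9999
```

Because the best achievable within-cluster sum of squares cannot increase as k grows, this ratio tends to rise with k, so the upper bound placed on k strongly influences which k is selected; inspecting the whole ratio curve is a useful check alongside the maximum.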
【Degree-granting institution】: South China University of Technology
【Degree level】: Master's
【Year awarded】: 2015
【Classification number】: F426.72; O212.1