

Research on Improvement and Application of the k-means Clustering Algorithm

Published: 2018-06-25 01:02

  Topic: improved k-means algorithm + BWP index value; Source: Lanzhou Jiaotong University, master's thesis, 2017


[Abstract]: Data mining is the process of extracting deep, valuable information from large volumes of disordered data. Its applications draw on a range of techniques, chiefly clustering, classification, association and predictive control. Cluster analysis is an important branch of data mining: it partitions the objects of a data set into disjoint subsets. It is now widely used in fields such as Web search, artificial intelligence, information retrieval, image pattern recognition, spatial database technology and marketing. The well-known and widely used clustering methods are partitioning methods, hierarchical methods, density-based methods, grid-based methods and probabilistic-model-based methods [1].

The k-means algorithm is a commonly used partitioning clustering algorithm. Its advantages include a simple principle, ease of understanding and implementation, and the ability to handle large data sets. Given a training data set and the number of clusters, the algorithm clusters the data iteratively according to a criterion function until the function no longer changes or a preset threshold is reached. Its main drawbacks are that the number of clusters must be specified in advance, that the result is sensitive to the choice of initial centers and to noise points in the data set, and that the result may be only a local optimum.

This thesis addresses three of these drawbacks: the need to specify the number of clusters in advance, the strong influence of the initial centers on the result, and the sensitivity of the result to outliers. It proposes an improved k-means clustering algorithm based on the maximum-minimum distance method. When applying the maximum-minimum distance method, the algorithm first uses the divide-and-conquer idea to split the theoretical interval of the parameter θ into smaller subintervals, picks one value of θ from each subinterval, clusters the data set with each of these values, and discards the subintervals that cluster poorly. It then discretizes the remaining subintervals, following the idea of continuous-attribute discretization, lets θ take each endpoint of the discretized subintervals, and clusters the data set for each value. Each clustering result is scored by the mean of 95% of the ordered BWP index values: the larger the mean, the better the clustering, and the largest mean corresponds to the best result. The improved algorithm thus removes the need to specify the number of clusters in advance and reduces the sensitivity to the initial centers and to outliers.

To verify the improved algorithm, the thesis selects three data sets from the UCI repository and analyzes them with several clustering algorithms. The results show that the improved algorithm achieves higher accuracy and better clustering. Finally, the thesis takes a data set of telecom users in Hangzhou, Zhejiang Province as its object of study. On the one hand, it clusters the data with the traditional k-means algorithm, the k-means algorithm based on maximum-minimum distance and the improved k-means algorithm; the results show that the improved algorithm clusters better and yields more distinct clusters. The characteristics of each resulting group are then summarized, the groups are named, and differentiated marketing plans are drawn up to improve service quality in the industry. On the other hand, following the steps and methods of logistic modeling, the thesis trains a logistic classification model on historical data to predict the churn rate of each customer segment, so that the company can prepare retention measures for users at risk of churning in advance.
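The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation, and it rests on several assumptions that the abstract does not confirm: the first sample seeds the maximum-minimum distance search; the BWP index of a sample is taken as (b - w) / (b + w), where w is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster; "95% of the ordered BWP values" is read as dropping the lowest 5% before averaging; and a plain list of candidate θ values stands in for the divide-and-conquer and discretization search. All function names are hypothetical.

```python
import numpy as np


def max_min_centers(X, theta):
    """Choose initial centers by the max-min distance rule; theta is a ratio in (0, 1)."""
    centers = [0]                                  # assumption: seed with the first sample
    d = np.linalg.norm(X - X[0], axis=1)           # distance of each sample to its nearest center
    d12 = d.max()                                  # distance between the first two centers
    while True:
        cand = int(d.argmax())                     # sample farthest from all current centers
        if d[cand] <= theta * d12:                 # not far enough to justify a new cluster
            break
        centers.append(cand)
        d = np.minimum(d, np.linalg.norm(X - X[cand], axis=1))
    return X[centers].astype(float)


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def bwp_scores(X, labels):
    """Per-sample BWP value under the assumed (b - w) / (b + w) definition."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)   # pairwise distances
    ks = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2 or len(ks) < 2:
            continue                               # convention here: singletons score 0
        w = D[i, own].sum() / (own.sum() - 1)      # mean distance within own cluster
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        scores[i] = (b - w) / (b + w) if b + w > 0 else 0.0
    return scores


def trimmed_bwp_mean(scores, keep=0.95):
    """Mean of the highest `keep` fraction of the sorted BWP values."""
    s = np.sort(scores)[::-1]
    return s[:max(1, int(np.ceil(keep * len(s))))].mean()


def best_theta(X, thetas):
    """Cluster once per candidate theta and return (score, theta, labels) of the best run."""
    results = []
    for t in thetas:
        labels, _ = kmeans(X, max_min_centers(X, t))
        if len(np.unique(labels)) >= 2:            # BWP needs at least two clusters
            results.append((trimmed_bwp_mean(bwp_scores(X, labels)), t, labels))
    if not results:
        raise ValueError("no candidate theta produced at least two clusters")
    return max(results, key=lambda r: r[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in ([0, 0], [10, 0], [0, 10])])
    score, theta, labels = best_theta(demo, [0.4, 0.5, 0.6, 0.7])
    print("theta:", theta, "clusters:", len(np.unique(labels)), "score:", round(score, 3))
```

In this sketch the number of clusters is never passed in: it falls out of how many centers the max-min rule accepts for a given θ, which is why the choice of θ is what gets searched and scored.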
[Degree-granting institution]: Lanzhou Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13

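[References]

Related journal articles (top 10)

1 田琿; Applied research on a value assessment model for group customers in the mobile industry [J]; 現代工業經濟和信息化; 2016, No. 23

2 蹤鋒, 程林; Application of the K-means algorithm to customer segmentation in logistics and express delivery enterprises [J]; 中國市場; 2016, No. 36

3 魏瑾; Research on telecom clustering marketing strategies based on customer segmentation [J]; 中國市場; 2016, No. 31

4 梁霄波; Research on clustering-based data mining technology in telecom customer segmentation [J]; 現代電子技術; 2016, No. 15

5 左倪娜; K-means clustering method based on an improved genetic algorithm [J]; 軟件導刊; 2016, No. 04

6 方匡南, 范新妍, 馬雙鴿; Enterprise credit risk early warning based on a network-structured Logistic model [J]; 統計研究; 2016, No. 04

7 楊曉斌, 毛雪岷; Application of cluster analysis to telecom customer segmentation [J]; 鄂州大學學報; 2015, No. 07

8 何坤金; Discussion and application of the divide-and-conquer algorithm [J]; 福建電腦; 2015, No. 04

9 方方, 王子英; Application of K-means cluster analysis to human body shape classification [J]; 東華大學學報(自然科學版); 2014, No. 05

10 曹樹國; Research on an improved divide-and-conquer shuffle algorithm for examination room arrangement [J]; 計算機應用與軟件; 2014, No. 06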

Related doctoral dissertations (top 1)

1 周世兵; Research and application of methods for determining the optimal number of clusters in cluster analysis [D]; Jiangnan University; 2011

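Related master's theses (top 5)

1 宋建林; Research on improving the K-means clustering algorithm [D]; Anhui University; 2016

2 王帥宇; Applied research on the K-Means algorithm in user segmentation [D]; Beijing Institute of Technology; 2015

3 董騏瑞; Improvement and implementation of the k-means clustering algorithm [D]; Jilin University; 2015

4 劉鳳芹; Research on improving the K-means clustering algorithm [D]; Shandong Normal University; 2013

5 吳曉蓉; Research on issues related to initial center selection in the K-means clustering algorithm [D]; Hunan University; 2008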


Article ID: 2063796


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2063796.html

