Research on Outlier Detection and Electricity Consumption Behavior Analysis for Power Big Data Based on the Density Peak Clustering Algorithm
Source: China Electric Power Research Institute, master's thesis, 2017. Thesis type: degree thesis.
Keywords: power big data; feature extraction; outlier detection; cluster analysis; electricity consumption behavior analysis
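[Abstract]: With the construction and development of the smart grid, every link of the power grid produces data that are huge in volume, complex in structure and intricately correlated. These data are the main source of power big data. The value of data arises from analysis. Analyzing massive power data can play an important role in three major areas: grid planning and operation, asset operation and maintenance management, and customer and social services.

Feature extraction and cluster analysis are foundational steps in power big data analysis, and they largely determine the quality of the results. Besides business-domain knowledge, they require a solid background in statistical and machine-learning modeling. For feature extraction, this thesis compares two common methods, the discrete wavelet transform (DWT) and the Gaussian mixture model (GMM), and explains why DWT is adopted for electricity consumption behavior analysis. For clustering, it compares K-Means, DBSCAN and the fast density peak clustering algorithm and analyzes their strengths and weaknesses. It then explains why an improved fast density peak clustering algorithm is chosen for the subsequent outlier detection and user behavior analysis.

Abnormal data arise from problems with data sources, statistical definitions, manual data entry and abnormal behavior, compounded by the lack of a data quality control system. Abnormal data carry information about abnormal conditions in the system. Outliers also reduce the accuracy of feature extraction and clustering. Abnormal data are therefore of great research value.

This thesis proposes a KNN-based fast density peak outlier detection algorithm. When used for outlier detection, the fast density peak clustering algorithm ignores the local characteristics of the data, and its local density depends on the choice of the cutoff distance. To overcome these shortcomings, the local density and distance are redefined using the K-nearest neighbors (KNN) idea, and rules for identifying outliers are designed, giving more accurate outlier detection. Simulation experiments on daily load data from distribution transformers in one province demonstrate the effectiveness of the algorithm.

Electricity consumption behavior analysis is an important part of power big data research. It underpins research and work on load forecasting, demand-side response, grid planning, economic operation, tariff design and energy-efficiency improvement. The thesis builds on the KNN-based redefinition of local density and distance. The original algorithm relies on manually identifying candidate cluster centers in the decision graph; the thesis instead selects cluster centers automatically with an outward statistical test. The DWT is used to extract multi-time-scale features from customer load data. The load data at each time scale are then clustered and typical load curves are reconstructed, which enables analysis of electricity consumption behavior. The method gives good results on real data sets for both individual customers and customers in different industries.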
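The abstract names the ingredients of the KNN-based density peak method but gives no formulas. The following is a minimal Python sketch under stated assumptions and is not the thesis's implementation.

In the original fast density peak algorithm (Rodriguez and Laio), the local density rho_i counts the points closer than a cutoff distance d_c. That is the dependence the abstract criticizes. The sketch makes four choices of its own:

- **Local density.** It replaces the cutoff count with rho_i = exp(-(1/K) * sum of d_ij^2 over the K nearest neighbors). This form is borrowed from published DPC-KNN variants and is an assumption.
- **Distance.** delta_i is the standard density peak distance: the distance to the nearest denser point.
- **Outlier rule.** A point is flagged if its density is far below average and its delta is above the median. This rule is hypothetical.
- **Automatic center selection.** An outward test on gamma_i = rho_i * delta_i scans upward from the bulk of the values. It is a hypothetical stand-in for the thesis's outward statistical test.

All function names are illustrative. Features are assumed to be normalized, because the exponential kernel is scale-sensitive.

```python
import math
import statistics


def distance_matrix(points):
    """Euclidean distance matrix for equal-length feature vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_density(dist, k):
    """Assumed KNN local density (DPC-KNN form): rho_i = exp(-mean squared
    distance to the k nearest neighbors). No cutoff distance is involved."""
    rho = []
    for i, row in enumerate(dist):
        nearest = sorted(d for j, d in enumerate(row) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in nearest) / k))
    return rho


def delta_and_parent(dist, rho):
    """delta_i = distance to the nearest denser point (ties broken by sort order);
    the densest point gets its largest distance. parent_i is that nearest denser
    point, or -1 for the densest. Also returns the points in decreasing density."""
    n = len(rho)
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta, parent = [0.0] * n, [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:pos], key=lambda m: dist[i][m])
        delta[i], parent[i] = dist[i][j], j
    return delta, parent, order


def flag_outliers(rho, delta, z=2.0):
    """Hypothetical rule: density far below average AND delta above the median,
    i.e. a sparse point that is also far from any denser point."""
    rho_thr = statistics.mean(rho) - z * statistics.pstdev(rho)
    delta_med = statistics.median(delta)
    return [i for i in range(len(rho)) if rho[i] < rho_thr and delta[i] > delta_med]


def select_centers(rho, delta, z=3.0):
    """Assumed outward test on gamma = rho * delta: scan gamma upward from the
    bulk and stop at the first value above mean + z * std of all smaller ones;
    that value and every larger one become cluster centers."""
    gamma = [r * d for r, d in zip(rho, delta)]
    idx = sorted(range(len(gamma)), key=lambda i: gamma[i])
    for pos in range(max(3, len(idx) // 2), len(idx)):
        base = [gamma[i] for i in idx[:pos]]
        if gamma[idx[pos]] > statistics.mean(base) + z * statistics.pstdev(base):
            return idx[pos:]
    return idx[-1:]  # no significant jump: fall back to a single center


def assign(parent, order, centers):
    """Standard density peak assignment: in decreasing density, each non-center
    point takes the label of its nearest denser point."""
    label = {c: n for n, c in enumerate(centers)}
    for i in order:
        if i not in label:
            # The densest point has no parent; if it is not a center it
            # starts its own cluster (it is visited first, so the id is new).
            label[i] = label[parent[i]] if parent[i] != -1 else len(label)
    return [label[i] for i in range(len(parent))]
```

Scanning gamma outward from the bulk, rather than testing the largest value first, keeps several similar centers from masking one another. The full distance matrix keeps the sketch short. For large sets of daily load curves, the KNN step would need a spatial index or an approximate nearest-neighbor search.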
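The multi-time-scale feature extraction and typical load curve reconstruction described in the abstract can be sketched with a discrete wavelet transform. The abstract does not name the wavelet. This sketch assumes the orthonormal Haar wavelet, because its level-j approximation coefficients are scaled sums of 2^j consecutive samples, which makes the time scale of each feature explicit. The function names and the 24-point hourly curve are illustrative assumptions. The averaging of a cluster's approximation coefficients is one plausible reading of "typical load curve reconstruction", not necessarily the thesis's procedure.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) halves."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * SQRT_HALF for a, b in pairs]
    detail = [(a - b) * SQRT_HALF for a, b in pairs]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT. Returns the coarsest approximation and the detail
    coefficients of every level, finest level first."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from coarsest to finest level."""
    current = list(approx)
    for d in reversed(details):
        nxt = []
        for a, b in zip(current, d):
            nxt.extend(((a + b) * SQRT_HALF, (a - b) * SQRT_HALF))
        current = nxt
    return current


def typical_curve(curves, levels):
    """Average a cluster's level-`levels` approximations and invert with all
    details set to zero: a typical curve smoothed to blocks of 2**levels samples."""
    approxs = [haar_dwt(c, levels)[0] for c in curves]
    mean_approx = [sum(col) / len(col) for col in zip(*approxs)]
    n = len(curves[0])
    zero_details = [[0.0] * (n >> (lev + 1)) for lev in range(levels)]
    return haar_idwt(mean_approx, zero_details)
```

For a 24-point hourly daily curve, `haar_dwt(curve, 3)` returns 3 approximation coefficients, each summarizing an 8-hour block. Levels 1 and 2 give 2-hour and 4-hour views. Clustering the approximation vectors of one level groups customers by their behavior at that time scale. The inverse transform with the details zeroed then turns a cluster's averaged coefficients back into a typical load curve.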
[Degree-granting institution]: China Electric Power Research Institute
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13; TM73; TM76