云環(huán)境下具有隱私保護(hù)的K-means聚類算法研究與設(shè)計(jì)

發(fā)布時(shí)間：2018-05-07 15:39

本文選題：隱私保護(hù) + 數(shù)據(jù)挖掘��；參考：《哈爾濱工業(yè)大學(xué)》2017年碩士論文

【摘要】：眾所周知,K-means聚類是數(shù)據(jù)挖掘中非常經(jīng)典和常用的方法之一,它通過計(jì)算數(shù)據(jù)項(xiàng)之間的距離可以把相似的數(shù)據(jù)項(xiàng)聚集在一起。隨著信息化、數(shù)字化、網(wǎng)絡(luò)化進(jìn)程加速,經(jīng)濟(jì)全球化已成為一種不可逆的趨勢,聚類算法中的數(shù)據(jù)來源越來越多樣化,數(shù)據(jù)安全越來越重要。考慮到數(shù)據(jù)會來自多個(gè)參與方,在這些數(shù)據(jù)中可能包含關(guān)于參與方的敏感信息或私人信息,如果這些信息在多個(gè)參與方之間共享,那么數(shù)據(jù)的隱私性將不能得到保證。具有隱私保護(hù)的聯(lián)合數(shù)據(jù)挖掘可以在保護(hù)用戶數(shù)據(jù)和挖掘結(jié)果隱私性的同時(shí),對多個(gè)參與方的聯(lián)合數(shù)據(jù)庫進(jìn)行數(shù)據(jù)挖掘,進(jìn)一步提取出有用的信息。因此,如何設(shè)計(jì)出具有隱私保護(hù)的聯(lián)合數(shù)據(jù)挖掘算法成為一個(gè)需要解決的難題。半誠實(shí)模型在許多情況下是符合實(shí)際場景的,該模型下數(shù)據(jù)的隱私性是通過各個(gè)參與方始終遵循協(xié)議來保證的。但是為保證數(shù)據(jù)的隱私性,該模型下的解決方案通常因?yàn)橛?jì)算消耗和通信消耗較高,所以實(shí)際中并不可行。如今,隨著科學(xué)技術(shù)的進(jìn)步,越來越多的企業(yè)將數(shù)據(jù)存儲在云平臺,同時(shí)分布式云計(jì)算框架也為處理大數(shù)據(jù)提供強(qiáng)大的計(jì)算能力。本論文將借助云計(jì)算強(qiáng)大的計(jì)算能力提升算法的效率,保證算法的可行性。針對具有隱私保護(hù)的數(shù)據(jù)挖掘中存在的性能問題,本論文開展了對現(xiàn)有具有隱私保護(hù)的數(shù)據(jù)挖掘算法的深入研究,進(jìn)而在水平劃分的數(shù)據(jù)集上提出一種高效的具有隱私保護(hù)的K-means聚類算法,該算法支持有兩個(gè)數(shù)據(jù)擁有者和云平臺同時(shí)存在的存儲外包和計(jì)算外包。數(shù)據(jù)以密文形式存儲在云端,云平臺通過與兩個(gè)數(shù)據(jù)擁有者交互,完成在雙方的聯(lián)合數(shù)據(jù)集上K-means聚類數(shù)據(jù)挖掘的任務(wù)。本論文分別設(shè)計(jì)不同的安全協(xié)議解決具有隱私保護(hù)的K-means聚類算法中的三個(gè)技術(shù)難題:解決密文距離計(jì)算問題的安全距離計(jì)算協(xié)議、解決密文比較問題的安全比較協(xié)議和解決密文除法問題的安全電路協(xié)議。進(jìn)而將這些安全協(xié)議應(yīng)用到聚類算法框架中,實(shí)現(xiàn)具有隱私保護(hù)的K-means聚類算法。本論文從理論上分析了該算法的時(shí)間復(fù)雜度、空間復(fù)雜度和通訊復(fù)雜度,給出該算法在半誠實(shí)模型下的安全性證明,并且證明該算法在重計(jì)算質(zhì)心點(diǎn)階段允許參與方中最多有一個(gè)方為惡意方的安全性,最后通過實(shí)驗(yàn)計(jì)算加密數(shù)據(jù)的時(shí)間消耗和一次迭代過程中各參與方的時(shí)間消耗,驗(yàn)證了算法的可行性。
[Abstract]:It is well known that K-means clustering is one of the most classical and commonly used methods in data mining. It can gather similar data items together by calculating the distance between data items. With the acceleration of information, digitization and networking, economic globalization has become an irreversible trend. Data sources in clustering algorithms are becoming more and more diverse, and data security is becoming more and more important. Considering that the data will come from more than one participant, sensitive information or private information about the participants may be included in the data. If the information is shared among the participants, the privacy of the data will not be guaranteed. The joint data mining with privacy protection can not only protect the privacy of user data and mining results, but also mine the federated database of multiple participants to extract useful information. Therefore, how to design a joint data mining algorithm with privacy protection becomes a difficult problem to be solved. In many cases, the semi-honest model is in line with the actual scenario, and the privacy of the data in the model is guaranteed by the agreement that each participant always follows. However, in order to ensure the privacy of the data, the solution under the model is usually not feasible because of the high computation consumption and communication consumption. Nowadays, with the development of science and technology, more and more enterprises store data on cloud platform, and distributed cloud computing framework also provides powerful computing power for big data. This paper will improve the efficiency of the algorithm with the help of the powerful computing power of cloud computing, and ensure the feasibility of the algorithm. Aiming at the performance problems in data mining with privacy protection, this paper has carried out in-depth research on existing data mining algorithms with privacy protection. Furthermore, an efficient K-means clustering algorithm with privacy protection on horizontally partitioned data sets is proposed, which supports both storage outsourcing and computing outsourcing with two data owners and cloud platforms. The data is stored in the cloud in the form of ciphertext. By interacting with two data owners, the cloud platform completes the task of K-means clustering data mining on the joint data sets of both parties. In this paper, we design different security protocols to solve the three technical problems in K-means clustering algorithm with privacy protection: the secure distance computing protocol to solve the problem of ciphertext distance calculation. Security comparison protocol to solve ciphertext comparison problem and secure circuit protocol to solve ciphertext division problem. Then, these security protocols are applied to the clustering algorithm framework to implement the K-means clustering algorithm with privacy protection. In this paper, the time complexity, space complexity and communication complexity of the algorithm are analyzed theoretically, and the security proof of the algorithm under semi-honest model is given. It is proved that the algorithm allows the security of up to one of the participants to be malicious in the phase of recalculating the centroid point. Finally, the time consumption of encrypted data and the time consumption of each participant in an iterative process are calculated experimentally. The feasibility of the algorithm is verified.
【學(xué)位授予單位】：哈爾濱工業(yè)大學(xué)
【學(xué)位級別】：碩士
【學(xué)位授予年份】：2017
【分類號】：TP309;TP311.13

【參考文獻(xiàn)】

相關(guān)期刊論文前10條

1 袁武;任勛益;;水平分割數(shù)據(jù)的保護(hù)隱私聚類挖掘方法研究[J];計(jì)算機(jī)技術(shù)與發(fā)展;2015年05期

2 薛安榮;劉彬;聞丹丹;;基于小波變換的分布式隱私保護(hù)聚類算法[J];計(jì)算機(jī)應(yīng)用;2014年04期

3 張海濤;黃慧慧;徐亮;高莎莎;;隱私保護(hù)數(shù)據(jù)挖掘研究進(jìn)展[J];計(jì)算機(jī)應(yīng)用研究;2013年12期

4 方煒煒;楊炳儒;夏紅科;;基于SMC的隱私保護(hù)聚類模型[J];系統(tǒng)工程與電子技術(shù);2012年07期

5 王利興;梁建勇;;隱私保護(hù)關(guān)聯(lián)規(guī)則挖掘的研究[J];硅谷;2011年19期

6 方煒煒;楊炳儒;楊君;周長勝;;基于隱私保護(hù)的決策樹模型[J];模式識別與人工智能;2010年06期

7 姚瑤;吉根林;;一種基于隱私保護(hù)的分布式聚類算法[J];計(jì)算機(jī)科學(xué);2009年03期

8 張鋒;孫雪冬;常會友;趙淦森;;兩方參與的隱私保護(hù)協(xié)同過濾推薦研究[J];電子學(xué)報(bào);2009年01期

9 陳曉明;李軍懷;彭軍;劉海玲;張t，

本文編號：1857491

資料下載