Research on Privacy Protection for Continuous Data Publishing
Published: 2019-07-06 13:17
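[Abstract]: With the rapid development of the Internet, the era of big data has arrived and publishing data has become commonplace. Continuously publishing large volumes of data inevitably risks disclosing personal private information, so privacy protection for continuous data publishing is drawing increasing attention from researchers. The key goals of privacy-preserving data publishing are that the published data offer strong privacy protection, low information loss, and good utility. Research on privacy protection for continuous data publishing is still at an early stage, and more effective privacy-preserving publishing algorithms are urgently needed.
First, the strengths and weaknesses of the LDMICA algorithm are analyzed, and LDICA, a privacy-preserving algorithm for static dataset updates, is proposed. LDICA adopts the clustering approach of LDMICA. It uses variance to compute a weight for each attribute and then computes a composite value for each record. Equivalence classes are formed according to the composite values, and the remaining records are assigned to suitable equivalence classes. As a result, every equivalence class in the published data satisfies l-diversity and contains similar records. LDICA was evaluated experimentally. It does not modify attribute values. Instead, it splits the partitioned equivalence classes and relies on a lossy join to protect privacy, so it incurs no information loss. Setting the diversity parameter l to half the total number of distinct sensitive attribute values gives the best computational performance.
Next, LDACA is proposed, a privacy-preserving algorithm for continuously publishing dynamically updated datasets. It combines the clustering idea from static data publishing with a permutation-based anonymization and splitting technique. LDACA also forms equivalence classes by composite value and supports full updates of the data. The published table is updated accordingly, while the equivalence classes keep the same signatures they had before the update. LDACA processes a full update step by step through a record-deletion module, a record-modification module, a pseudo-record table, and an incremental-table module. Records in the incremental table are handled with LDICA, so that newly added equivalence classes satisfy l-diversity while the signatures of the original equivalence classes remain unchanged. In experiments, LDACA's privacy leakage rate was 20% lower than that of the M-distinct algorithm and far below 1/l. It also reduced information loss and effectively prevented linking attacks. Its execution time is low, completing within 5 s, and it provides effective privacy protection.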
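The abstract describes LDICA only at a high level. It does not give the weighting formula, any normalisation, or the exact grouping rule. The Python sketch below is one possible reading under stated assumptions, not the thesis's implementation, and rests on the following assumptions:

- Quasi-identifiers are numeric and min-max normalised.
- Each attribute's weight is its share of the total variance.
- Records are sorted by composite value and grouped greedily until a class holds l distinct sensitive values.
- Leftover records join the class with the closest mean composite value.
- The result is published Anatomy-style as a quasi-identifier table and a sensitive-value table. The two tables are linked only by a group id, so joining them is lossy.

The function names `composite_values`, `partition_l_diverse` and `anatomize`, and the toy table, are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def composite_values(rows, qi_cols):
    """Variance-weighted composite value per record (assumed weighting scheme)."""
    # Min-max normalise each quasi-identifier to [0, 1] so attributes are comparable (assumption).
    norm_cols = []
    for c in qi_cols:
        col = [r[c] for r in rows]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1
        norm_cols.append([(v - lo) / span for v in col])
    # Weight each attribute by its share of the total variance (assumption).
    variances = [pvariance(col) for col in norm_cols]
    total = sum(variances) or 1
    weights = [v / total for v in variances]
    # Composite value = weighted sum of the record's normalised attributes.
    return [sum(w * col[i] for w, col in zip(weights, norm_cols))
            for i in range(len(rows))]


def partition_l_diverse(rows, qi_cols, sens_col, l):
    """Group records with similar composite values into classes with >= l distinct sensitive values."""
    scores = composite_values(rows, qi_cols)
    order = sorted(range(len(rows)), key=lambda i: scores[i])
    classes, current = [], []
    for i in order:
        current.append(i)
        if len({rows[j][sens_col] for j in current}) >= l:
            classes.append(current)  # close the class as soon as it is l-diverse
            current = []
    if not classes:
        raise ValueError("fewer than l distinct sensitive values in the table")
    # Remaining records join the class with the closest mean composite value.
    # Adding records never reduces the number of distinct values, so l-diversity still holds.
    for i in current:
        best = min(classes,
                   key=lambda cls: abs(sum(scores[j] for j in cls) / len(cls) - scores[i]))
        best.append(i)
    return classes


def anatomize(rows, classes, qi_cols, sens_col):
    """Publish unmodified QI values and sensitive values in two tables linked only by gid."""
    qit, st = [], []
    for gid, cls in enumerate(classes, start=1):
        for i in cls:
            qit.append({**{c: rows[i][c] for c in qi_cols}, "gid": gid})
        for value, n in sorted(Counter(rows[i][sens_col] for i in cls).items()):
            st.append({"gid": gid, sens_col: value, "count": n})
    return qit, st


# Toy data for illustration only.
rows = [
    {"age": 23, "zip": 47677, "disease": "flu"},
    {"age": 27, "zip": 47602, "disease": "hiv"},
    {"age": 35, "zip": 47678, "disease": "flu"},
    {"age": 59, "zip": 47905, "disease": "cancer"},
    {"age": 61, "zip": 47909, "disease": "hiv"},
    {"age": 65, "zip": 47906, "disease": "flu"},
]
classes = partition_l_diverse(rows, ["age", "zip"], "disease", l=2)
qit, st = anatomize(rows, classes, ["age", "zip"], "disease")
```

The quasi-identifier values are copied unchanged, so this kind of split loses no precision in either table. Privacy comes from the adversary not knowing which sensitive value in a group belongs to which row.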
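For LDACA, the abstract names the update steps (deletion, modification, pseudo-record table, incremental table) but not their rules. The sketch below is a hypothetical illustration, not the thesis's algorithm, and rests on the following assumptions:

- A class's signature is its set of distinct sensitive values, as in m-invariance-style approaches.
- A deletion that removes a value from a class's signature is compensated by a pseudo record carrying that value.
- A modification is treated as a deletion followed by an insertion.
- Incremental records are grouped into new classes by a simple greedy l-distinct rule that stands in for LDICA.

`max_breach` computes count(s)/|class|, one common linkage measure; it is not necessarily the leakage rate used in the thesis. All names and the toy data are hypothetical.

```python
from collections import Counter


def signature(records, sens):
    """Signature of a class: its set of distinct sensitive values (assumed definition)."""
    return frozenset(r[sens] for r in records)


def delete_records(classes, deleted_ids, sens):
    """Drop deleted records; add pseudo records so every class keeps its old signature."""
    pseudo_table = []
    for gid, recs in list(classes.items()):
        old_sig = signature(recs, sens)
        kept = [r for r in recs if r["id"] not in deleted_ids]
        # Each sensitive value that vanished from the class gets one pseudo record.
        for value in sorted(old_sig - signature(kept, sens)):
            fake = {"id": f"pseudo-{gid}-{value}", sens: value, "pseudo": True}
            kept.append(fake)
            pseudo_table.append(fake)
        classes[gid] = kept
    return pseudo_table


def modify_records(classes, changes, sens):
    """Assumption: a modification = delete the old version, queue the new one for insertion."""
    pseudo = delete_records(classes, set(changes), sens)
    return pseudo, list(changes.values())


def add_increment(classes, new_records, sens, l):
    """Stand-in for LDICA on the incremental table: new records only form NEW classes,
    so existing classes and their signatures are never touched."""
    next_gid = max(classes, default=0) + 1
    created, current = [], []
    for r in new_records:
        current.append(r)
        if len(signature(current, sens)) >= l:
            classes[next_gid] = current
            created.append(next_gid)
            next_gid += 1
            current = []
    if created:
        # Leftovers join the last new class; adding records cannot break l-diversity.
        classes[created[-1]].extend(current)
        current = []
    return created, current  # a non-empty `current` holds records deferred to a later release


def max_breach(classes, sens):
    """Largest count(s)/|class| over all classes (pseudo records count as ordinary rows)."""
    return max(max(Counter(r[sens] for r in recs).values()) / len(recs)
               for recs in classes.values())


# Toy release for illustration only.
classes = {
    1: [{"id": 1, "disease": "flu"}, {"id": 2, "disease": "hiv"}],
    2: [{"id": 3, "disease": "cancer"}, {"id": 4, "disease": "flu"}],
}
old_sigs = {gid: signature(recs, "disease") for gid, recs in classes.items()}

pseudo = delete_records(classes, {2}, "disease")  # class 1 loses its only "hiv" record
more_pseudo, queued = modify_records(classes, {4: {"id": 4, "disease": "hiv"}}, "disease")
pseudo += more_pseudo
created, deferred = add_increment(
    classes, queued + [{"id": 5, "disease": "cancer"}], "disease", l=2)
```

Keeping the old classes' signatures fixed means an adversary who compares consecutive releases sees the same candidate sensitive values for each returning group. The cost is the pseudo records, which distort the published counts.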
Article ID: 2511039
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2511039.html