Research and Implementation of Key Technologies for Storage Management of Massive Virtual Identity Data
Published: 2018-08-03 20:06
[Abstract]: With the rapid development of computer networks and their applications, more and more network platforms and applications have emerged, and users may hold a large amount of virtual identity information across them. This information includes both static data, such as registered accounts, and user interaction data, such as messages, and its total stored volume reaches the terabyte or even petabyte scale. In the Web 2.0 era, Internet applications must also handle large amounts of user-created or user-shared data, such as pictures, videos, and blog posts, which vary widely in type, format, and size. This combination of large volume, diverse types, and uneven sizes poses a severe challenge for massive data storage and management.
This thesis is part of the 863 major project ***Network Identity Management and Application Technology, within its subproject ***Virtual Identity Management. The subproject's main function is to obtain virtual identity data from different platforms by various means, manage the data uniformly, and provide interfaces to real network platforms and applications for lookup, tracing, and similar tasks. The thesis studies the key technologies for storing virtual identity data. It addresses and implements the storage data model, data partitioning and data replication in a distributed environment, and multi-dimensional indexing and caching to improve query efficiency. These components are tested through simulated operation in the virtual identity tracing system and provide the storage foundation required by the project. The work is built on the Cassandra database and includes the following:
(1) Storage. Because virtual identity data is large in volume and must support fuzzy queries, the thesis proposes a data model that combines a MySQL database with a Cassandra database. For the distributed environment, it considers data partitioning and data backup, and designs and implements a data partitioning method based on a weighted, improved consistent hashing algorithm, together with a data replica strategy that combines data scale with changes in hot spots.
(2) Query. To handle virtual identity queries that do not specify a column, and to locate the responsible machine nodes quickly and accurately, the thesis designs and implements a multi-dimensional index that combines Cassandra indexes, an inverted index, and a node index. Taking the locality of request access into account, it also designs and implements a semantic caching technique tailored to the characteristics of virtual identity data.
(3) System implementation. Using the virtual identity tracing system as the test bed, the thesis runs performance tests on the storage-side data model, data partitioning method, and data replica strategy, and on the query-side multi-dimensional index and semantic cache. The tests show that these methods improve system efficiency.
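The abstract names a weighted, improved consistent hashing algorithm for data partitioning but does not describe it. As a minimal sketch of the general idea only, the following assumes the common approach of giving each node a number of virtual nodes proportional to its weight. The class name `WeightedHashRing`, the `vnodes_per_weight` parameter, and the node names are hypothetical and do not come from the thesis.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (the 128-bit MD5 digest as an integer)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class WeightedHashRing:
    """Consistent-hash ring where a node's weight sets its virtual-node count."""

    def __init__(self, vnodes_per_weight: int = 100):
        self.vnodes_per_weight = vnodes_per_weight  # hypothetical tuning knob
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name

    def add_node(self, node: str, weight: int = 1) -> None:
        # A heavier node places more virtual nodes, so it owns more of the ring.
        for i in range(weight * self.vnodes_per_weight):
            pos = _hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # skip the (extremely unlikely) position collision
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = WeightedHashRing()
ring.add_node("node-a", weight=1)
ring.add_node("node-b", weight=3)  # assumed to have three times the capacity
print(ring.get_node("account:12345"))
```

With this scheme, adding or removing a node only moves the keys that fall on that node's virtual nodes, which is the property that makes consistent hashing attractive for rebalancing a cluster.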
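For queries that do not specify a column, the abstract mentions an inverted index as one part of the multi-dimensional index. The sketch below illustrates the general technique, not the thesis's actual design: it maps each stored value back to the rows and columns where it occurs. The class name `InvertedIndex`, the row keys, and the column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class InvertedIndex:
    """Maps a stored value to every (row_key, column) pair that contains it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # normalized value -> {(row_key, column)}

    def add_row(self, row_key: str, columns: Dict[str, str]) -> None:
        for column, value in columns.items():
            self._postings[value.lower()].add((row_key, column))

    def search(self, value: str) -> List[Tuple[str, str]]:
        # The caller supplies only a value; the index reports where it lives.
        return sorted(self._postings.get(value.lower(), set()))


index = InvertedIndex()
index.add_row("row1", {"nickname": "Alice", "email": "alice@example.com"})
index.add_row("row2", {"nickname": "Bob", "signature": "alice"})
print(index.search("alice"))
```

A lookup on this index returns candidate row keys and columns, which can then be fetched from the underlying store by key.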
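The semantic cache is also only named in the abstract. One common form of semantic caching, assumed here for illustration, stores results together with the query conditions that produced them, so that a later, narrower query can be answered by filtering a cached result instead of going back to the database. The class `SemanticCache` and the column names are hypothetical, and the sketch omits eviction and invalidation.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Row = Dict[str, str]
Conditions = FrozenSet[Tuple[str, str]]  # set of (column, value) equality tests


class SemanticCache:
    """Caches query results keyed by the equality conditions that produced them."""

    def __init__(self) -> None:
        self._entries: Dict[Conditions, List[Row]] = {}

    def put(self, conditions: Dict[str, str], rows: List[Row]) -> None:
        self._entries[frozenset(conditions.items())] = list(rows)

    def get(self, conditions: Dict[str, str]) -> Optional[List[Row]]:
        wanted = frozenset(conditions.items())
        for cached, rows in self._entries.items():
            # A cached query whose conditions are a subset of the new ones is
            # more general, so its result contains every row the new query needs.
            if cached <= wanted:
                extra = wanted - cached
                return [r for r in rows if all(r.get(c) == v for c, v in extra)]
        return None  # cache miss: the caller must query the database


cache = SemanticCache()
cache.put({"platform": "forum"}, [
    {"platform": "forum", "nickname": "alice"},
    {"platform": "forum", "nickname": "bob"},
])
print(cache.get({"platform": "forum", "nickname": "bob"}))
```

This relies on cached rows carrying every column that later conditions may test; a production cache would also need a size limit and a way to invalidate entries when the underlying data changes.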
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
Article ID: 2162866
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2162866.html