

Benchmark-Oriented Social Stream Data Generation

Published: 2018-05-06 12:11

  Topic: social stream + data generator. Source: Yu Chengcheng, doctoral dissertation, East China Normal University, 2016


[Abstract]: Social stream data is a data stream that records the dynamic changes in the states of social entities and in the relationships between them. It can represent entity state changes in many applications, such as users posting and forwarding messages in social media, citations among scientific publications, and data transfers between nodes of a distributed system. Social stream data differs from traditional network data and stream data: it is both a series of entity state streams and a dynamically changing network, so it combines graph data with stream data. Because of this composite nature, social stream data has great commercial and research value, and managing and mining it effectively is a shared concern of academia and industry. Many technologies can now be used to manage or process social stream data, and how to select a suitable data generator for a given application is a problem that benchmarks need to solve. However, because of privacy concerns and the difficulty of moving very large data sets, benchmarks usually cannot provide real data for evaluating systems. A generator that can flexibly and efficiently produce large-scale "realistic" synthetic data is therefore of great importance. This thesis studies methods for generating social stream data to meet the needs of benchmarks. The proposed methods can produce, for different types of social streams, data whose characteristics are consistent with those of "real" social stream data. To generate large-scale data at high throughput, the thesis designs and implements a distributed system for generating social stream data. In addition, it designs a benchmark for social media analytical queries on top of the social stream data generator. The whole thesis centers on the problem of social stream generation, and its main contributions fall into the following four areas.

1. A method for generating single-link social stream data based on a human dynamics model and a temporal growth network model. In a single-link social stream, a social item can link to at most one historical item. The method generates the stream sequentially by iteratively updating two buffers. The first is the future-item buffer, which stores the social items that producers will publish in the future. The second is the recent-item buffer pool, which holds the recent historical items within a specified window size. While the two buffers are updated, the method uses the human dynamics model to generate, for each producer, social items without link information, and uses the temporal growth network model to determine each item's link. Users can configure parameters to produce data sets of a specified scale, data distribution, and type. Experiments show that the method continuously generates "realistic" single-link social stream data with stable throughput and memory usage.

2. A method for generating multi-link social stream data based on a human dynamics model and network generation models. In a multi-link social stream, a social item can link to several historical items, which places new requirements on the link-generation step. Building on the single-link method, this method also generates the stream sequentially by iteratively updating two buffer pools. Both an extended temporal growth model and an edge-copying model can be used to generate the link information. Experimental analysis shows that data generated with the extended temporal growth model matches the real data distributions better. The method based on the extended temporal growth model continuously generates "realistic" multi-link social stream data with stable throughput and memory usage.

3. A distributed system for generating social stream data, built on a master-slave architecture. To generate large-scale social stream data at high throughput, the system produces single-link and multi-link social stream data in a distributed manner, using a single master and multiple worker nodes. Each worker node applies the single-link and multi-link generation methods, with the temporal growth model producing the link information, to generate the social stream data for its assigned partition of producers. The master merges the partial streams from the workers into the final global social stream. The system relies on a distributed link-generation method, an asynchronous model, and a delayed-update strategy to implement distributed generation. Experiments show that, without distorting the generated data, the system's generation throughput grows linearly as nodes are added.

4. A benchmark for social media analytical queries, built on the social stream data generator. Social media services have become some of the most popular services on the Internet, and social media data is a typical kind of social stream data. The thesis designs BSMA, a benchmark for social media data analysis, which consists of data support, a workload generator, and a performance testing tool. The workload generator defines a data model for social media, defines on that basis 24 query templates in 4 classes, and provides a parameter generator that produces different parameter values for query tasks as required. The data support component provides a real Sina Weibo data set as well as BSMA-Gen, a generator of social stream data that uses the generation methods proposed in the thesis. The 24 query templates include several queries over the temporal and link-relationship networks in social stream data, and BSMA-Gen can supply the data for such queries. Users can use the testing tool to connect to the system under test, configure and run test tasks, and obtain results reported against three defined evaluation metrics.

In summary, the thesis formally defines the social stream model and its related characteristics, and proposes an architecture, models, and generation algorithms for single-link and multi-link social stream data. Users can configure the generator to produce data of a specified distribution and type. To generate large-scale social stream data at high throughput, the thesis designs and implements a distributed generation system. On top of the social stream data generator, it designs a benchmark for social media queries.
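
The two-buffer procedure for single-link streams described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the concrete models, so the Pareto-distributed gaps between a producer's items (standing in for the human dynamics model), the preferential choice of link target by `in_links + 1` (standing in for the temporal growth network model), and every name and parameter below (`generate_single_link_stream`, `window`, `link_prob`, `seed`) are assumptions made for illustration.

```python
import heapq
import random
from collections import deque
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Item:
    item_id: int
    producer: int
    timestamp: float
    parent: Optional[int] = None  # id of the one linked historical item, if any
    in_links: int = 0             # number of later items that link to this one


def generate_single_link_stream(
    num_producers: int,
    num_items: int,
    window: int = 100,
    link_prob: float = 0.5,
    seed: int = 0,
) -> Iterator[Item]:
    """Yield items in timestamp order; each item links to at most one recent item."""
    rng = random.Random(seed)

    def next_gap() -> float:
        # Assumption: heavy-tailed gap between a producer's consecutive items.
        return rng.paretovariate(1.5)

    # Future-item buffer: one pending (timestamp, producer) entry per producer,
    # kept as a min-heap so the earliest future item is popped first.
    future = [(next_gap(), p) for p in range(num_producers)]
    heapq.heapify(future)

    # Recent-item buffer: only the last `window` emitted items are link targets.
    recent: deque = deque(maxlen=window)

    for item_id in range(num_items):
        timestamp, producer = heapq.heappop(future)
        item = Item(item_id, producer, timestamp)

        if recent and rng.random() < link_prob:
            # Assumption: targets that already have more links are more likely
            # to be chosen again.
            weights = [r.in_links + 1 for r in recent]
            target = rng.choices(list(recent), weights=weights, k=1)[0]
            target.in_links += 1
            item.parent = target.item_id

        # Update both buffers: the new item becomes recent history, and the
        # producer's next future item is scheduled.
        recent.append(item)
        heapq.heappush(future, (timestamp + next_gap(), producer))
        yield item
```

Calling `list(generate_single_link_stream(num_producers=5, num_items=1000))` returns 1000 items in non-decreasing timestamp order. Memory stays bounded by the number of producers plus the window size, which is one plausible reading of the stable memory usage that the abstract reports.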

[Degree-granting institution]: East China Normal University
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP311.13



Article link: http://sikaile.net/shoufeilunwen/xxkjbs/1852266.html

