Optimization of Crowd Computing Task Assignment and Association Analysis Algorithms in a Big Data Environment

Published: 2019-05-19 08:41
[Abstract]: With the arrival of the big data era, data volumes have grown dramatically. Big data brings rich information and knowledge, but its scale and complexity, rapid growth, diversity of forms, and low value density also pose serious challenges to traditional data processing techniques. Processing techniques suited to the big data environment are therefore urgently needed. These techniques can be divided into human-machine collaborative crowd computing and data processing algorithms. This thesis carries out research in both areas and reports the following two results.

(1) Human-machine collaborative crowd computing. Big data tasks depend on complex cognitive reasoning, a problem addressed here mainly by optimizing crowd computing. A reasonable assignment strategy is an important stage of the computation, and this thesis proposes a big data crowd task assignment algorithm based on accurate perception of user topics. To improve accuracy, the topics of published crowd tasks are first extracted by combining adaptive fuzzy clustering with a topic extraction model. A task model and a user model are then built for the specific crowd tasks, and the degree of association between them is computed. Next, historical tasks for which high-quality answers have been submitted are used to iteratively detect the real topics of new users and to compute their initial accuracy. After that, logistic regression (LR) is used to predict how likely a user is to participate in a given class of tasks, which yields a candidate sequence of participating users. Tasks are then assigned precisely, based on each user's real topics, the user's accuracy on those topics, and the user's credibility. Experiments show that the proposed algorithm is more accurate, is particularly suited to the big data environment, and to some extent saves the cost that random assignment incurs by repeating assignments several times to ensure accuracy.

(2) Data processing algorithms. To meet the efficiency requirements of massive data processing, this thesis proposes a cloud-based method for parallelizing an improved algorithm. Traditional algorithms can no longer meet the processing needs of big data, and association analysis is one of the research hotspots in data processing. The improvement of the Apriori association analysis algorithm consists of two parts. First, a matrix-based improvement of Apriori (M_Apriori) is proposed. Its novelty lies in how the matrix is constructed and in the changed computation steps. The algorithm stores and processes data in a matrix-based structure and needs to scan the database only once, which reduces database I/O overhead. By constructing a support count matrix, it uses the logical AND operation to improve the core steps of the algorithm (join and prune), and the improvement is verified and analyzed theoretically. Second, a Spark-based parallelization of M_Apriori (SPM_Apriori) is proposed. It adopts data parallelism and a strategy of replacing global computation with local computation, and it takes full advantage of Spark's in-memory computing and RDD storage of data items. M_Apriori is given a parallel design and ported to the Spark platform for a parallel implementation, enriching Spark MLlib. Finally, experiments show that the proposed algorithms achieve good results.
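The abstract gives no pseudocode for M_Apriori, so the following is only a rough sketch of the general idea it describes: scan the transactions once into a boolean item-by-transaction structure, then obtain candidate supports with a logical AND instead of rescanning the database. It is not the thesis's algorithm. The function names `build_item_bitmaps` and `frequent_itemsets`, the use of Python integers as bit rows, and the absolute `min_support` count are all assumptions made for illustration.

```python
from itertools import combinations


def build_item_bitmaps(transactions):
    """Scan the transactions once and return {item: bitmask}.

    Bit t of an item's mask is 1 when transaction t contains the item,
    so each mask is one row of a boolean item-by-transaction matrix.
    """
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(mask):
    """Number of set bits, i.e. the support count of an itemset."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    bitmaps = build_item_bitmaps(transactions)
    # Level 1: frequent single items.
    current = {
        frozenset([item]): mask
        for item, mask in bitmaps.items()
        if popcount(mask) >= min_support
    }
    result = {}
    k = 1
    while current:
        result.update({s: popcount(m) for s, m in current.items()})
        k += 1
        candidates = {}
        for a, b in combinations(list(current), 2):
            union = a | b
            # Join: keep only unions that grow the itemset by exactly one item.
            if len(union) != k or union in candidates:
                continue
            # Prune: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in current
                   for s in combinations(union, k - 1)):
                continue
            # Support via logical AND of the two rows; no database rescan.
            mask = current[a] & current[b]
            if popcount(mask) >= min_support:
                candidates[union] = mask
        current = candidates
    return result
```

For example, `frequent_itemsets([["a", "b"], ["a", "b", "c"], ["a", "c"]], 2)` touches the transaction list only inside `build_item_bitmaps`; every later support count comes from AND-ing stored rows. A Spark version along the lines of SPM_Apriori would presumably partition the transactions and merge per-partition counts, but the abstract does not give enough detail to sketch that faithfully.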

Article ID: 2480574

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2480574.html

