Research on Spark-Based Book Recommendation Technology on a Campus Resource Cloud

Published: 2018-10-31 17:04

[Abstract]: As universities push further into digitalization, building a campus cloud platform has become a focus for many institutions. A campus resource cloud platform can meet and safeguard a school's needs across the board, and it provides an efficient, reliable compute and storage platform for campus big-data analysis. This research is built on such a platform and therefore benefits from strong IT infrastructure support. At the same time, the widespread use of business management information systems means that data keeps accumulating. The library management system in particular has accumulated a large volume of book circulation history, which continues to grow and which conceals a great deal of valuable information. To make fuller use of library circulation data and improve the digital experience of teachers and students, this thesis analyzes that data in depth in order to offer personalized book recommendations.

The thesis first designs the compute resources, storage resources, and platform functions of the campus resource cloud. It then uses the cloud as the test and runtime platform for book recommendation: a Spark cluster is built on it, with HDFS as the storage system and Spark as the compute platform. To handle missing data and inconsistent data formats, the raw data is preprocessed and a user-book rating matrix is constructed. To address data sparsity, the thesis adopts collaborative filtering based on ALS matrix factorization, and then integrates K-Means clustering into the ALS algorithm to address the user cold-start problem. To address attribute weighting and initial-value selection in K-Means, the algorithm is optimized with a weighted Euclidean distance and the max-min algorithm. Finally, the algorithms are implemented on Spark and validated experimentally, producing personalized book recommendations for different users. The experiments determine the optimal parameters of the ALS matrix factorization algorithm and show that the proposed hybrid recommendation algorithm can address both data sparsity and cold start, that the K-Means optimizations improve clustering quality, and that integrating clustering improves prediction accuracy and computation speed. The parallel speedup measured on the Spark platform confirms the advantage of the Spark cluster.
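The two K-Means refinements named in the abstract, a weighted Euclidean distance and max-min selection of the initial centers, can be illustrated with a minimal single-machine sketch. This is not the thesis's Spark implementation: the abstract does not state which user attributes are clustered, how the weights are chosen, or which point seeds the max-min selection, so the function names, the choice of the first point as seed, and all sample values below are assumptions made for illustration only.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance with a per-attribute weight applied to each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def max_min_init(points, k, weights):
    """Choose k initial centers by the max-min rule.

    Assumption: the first point seeds the selection. Each further center is
    the point whose distance to its nearest already-chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(
            points,
            key=lambda p: min(weighted_euclidean(p, c, weights) for c in centers),
        )
        centers.append(farthest)
    return centers


def kmeans(points, k, weights, max_iter=100):
    """Plain K-Means using the weighted distance and max-min initial centers."""
    centers = max_min_init(points, k, weights)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: weighted_euclidean(p, centers[j], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers no longer move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    # Hypothetical two-attribute user vectors; the values carry no real meaning.
    users = [(0, 0), (0, 1), (10, 10), (10, 11)]
    centers, labels = kmeans(users, k=2, weights=(1.0, 1.0))
    print(centers, labels)
```

In a hybrid scheme of the kind the abstract describes, a new user with no borrowing history could be assigned to the nearest cluster and served recommendations derived from that cluster's members; how the thesis actually combines the clusters with ALS is not specified in the abstract.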
[Degree-granting institution]: Xi'an University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2302976.html
