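Design and Implementation of a Spark-Based College Entrance Examination Recommendation System
Topic: big data  Focus: recommender systems  Source: Shandong Normal University, 2017 master's thesis  Type: degree thesis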
[Abstract]: Recommender systems were proposed to address a two-sided problem: users cannot find valuable information, and information cannot reach the users who need it. With the arrival of the big data era, recommender systems themselves struggle to process massive volumes of data, and combining them with big data processing technology is the inevitable way forward. Spark, a leading big data processing framework, introduced the RDD data model and an in-memory computing model, and is now widely used in e-commerce, video, social networking, and other domains. In education, however, both recommender systems and big data processing technology have seen little use. The college entrance examination (gaokao) is a major event in education, and the university application step that follows it is the focus of candidates' attention. Historical application and admission records are an important reference for candidates filling in their applications, but because the data is large and complex, very little use is made of it. This thesis combines a recommender system with the Spark big data processing framework and applies it to education, a field where both are rarely used, to help candidates choose which universities and majors to apply for. The work completed is as follows:

(1) Using HTML, CSS (Cascading Style Sheets), and JSP as front-end technologies, a web front end for gaokao application recommendation was designed and developed. It includes a user registration page, a user login page, a page displaying the recommendation results, and pages for browsing related gaokao information (policies, news, university information, and major information). The aim is a system that is practical and easy to use while giving users a good interactive experience.

(2) With the web front end acting as the producer of user logs, a well-performing log collection module was designed. First, the Flume log collection tool collects the log data; second, a Sink component forwards the collected data to the Kafka message middleware, which distributes the log messages in a unified way; finally, the Spark Streaming stream-processing framework cleans and extracts the log data received from Kafka and stores it in the HDFS file system.

(3) A recommendation engine for the gaokao application scenario was designed. First, after a review of a large body of literature on gaokao applications, suitable user attributes are selected, similarity is computed, a similarity matrix is built, and similar users are identified; second, several of the most common recommendation algorithms are analyzed and, in light of the real application scenario, user-based collaborative filtering is chosen as the system's recommendation algorithm; finally, the final recommendation list is generated using the parallel computation of the Spark framework.

(4) A distributed Spark cluster development environment was built, and the whole system was developed and tested. First, following the relevant documentation, a three-node Spark cluster was set up in the laboratory; second, the system was implemented in Scala; finally, after development, the log collection tool and the Spark components were performance-tested to ensure that the system runs correctly and efficiently, and the accuracy of the recommendations and overall satisfaction with the system were tested to ensure a good user experience.
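The recommendation step described in the abstract (select user attributes, compute similarity, build a similarity matrix, find similar users, then apply user-based collaborative filtering) can be illustrated with a small sketch. This is not the thesis's implementation, which was written in Scala and run on Spark; it is plain Python for readability. The attribute vectors, user ids, school names, the choice of cosine similarity, and the parameters `k` and `n` are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(profiles):
    """Pairwise similarity between all past candidates, as a dict of dicts."""
    ids = list(profiles)
    return {u: {v: cosine(profiles[u], profiles[v]) for v in ids} for u in ids}


def recommend(target, profiles, admissions, k=2, n=3):
    """User-based collaborative filtering: take the k past candidates most
    similar to the target, then rank the schools that admitted them by the
    summed similarity of those neighbours."""
    neighbours = sorted(
        profiles, key=lambda u: cosine(target, profiles[u]), reverse=True
    )[:k]
    votes = defaultdict(float)
    for u in neighbours:
        sim = cosine(target, profiles[u])
        for school in admissions[u]:
            votes[school] += sim
    return sorted(votes, key=votes.get, reverse=True)[:n]


# Hypothetical data: each vector is (normalised score, normalised rank,
# science-track flag); none of these values come from the thesis.
PROFILES = {
    "u1": [0.90, 0.10, 1.0],
    "u2": [0.88, 0.12, 1.0],
    "u3": [0.40, 0.60, 0.0],
}
ADMISSIONS = {
    "u1": ["School A", "School B"],
    "u2": ["School A", "School C"],
    "u3": ["School D"],
}
TARGET = [0.89, 0.11, 1.0]

if __name__ == "__main__":
    print(recommend(TARGET, PROFILES, ADMISSIONS))
```

Each pairwise similarity is independent of the others, which is what makes this step a natural fit for the parallel computation the abstract attributes to Spark.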
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3