
Research and Implementation of Campus Card Data Mining Based on Hadoop

Published: 2018-06-24 01:43

  Topic: campus card + hadoop; Source: Nanchang Hangkong University, 2017 master's thesis


[Abstract]: As the number of business systems in colleges and universities keeps growing, the volume of data accumulated about teachers and students has increased sharply, forming a typical big data environment. As part of the digital campus, the campus card stores records of the on-campus activities of all teachers and students, including canteen dining records, hot-water consumption records, supermarket purchase records, library entry and exit records, electricity payment records, book borrowing records, and sports venue usage records. These records hide a large amount of valuable information that is hard to discover by intuition alone and has to be extracted with data mining methods. By mining the data in depth, school administrators can gain a more rational and clearer understanding of the consumption patterns and study habits of teachers and students, which provides a valuable reference for allocating university resources, planning and building the campus, and managing teachers and students.

Based on the large volume of data generated by campus cards in recent years, this thesis uses mainstream big data processing frameworks from the Hadoop ecosystem to clean, analyze, and mine campus card data. First, it analyzes the importance of mining campus card data and the state of research on the related technologies. It then introduces the Hadoop-related technologies used in the mining work (the HDFS file system, the Hive data warehouse, and the MapReduce distributed computing framework), together with the FP-Growth algorithm and the decision tree algorithm. Finally, using sqoop, Hive, and related tools, it builds a data warehouse for campus card data with campus consumption as its subject.

Three pieces of work were carried out on this data warehouse. First, the number of diners in each canteen was counted for each time period, which revealed periodic changes in the number of people eating on campus and gave a more direct picture of the morning, midday, and evening dining peaks. Second, each student's spending was totaled by category and the C4.5 decision tree algorithm was used to build a model that predicts a student's degree of financial hardship; after pruning and other optimizations the evaluated accuracy reached 85.4%, which gives the school a useful reference when assessing students in financial need. Third, the merchants that students visit most often were tallied and the FP-Growth algorithm was used to mine a large number of frequent patterns, yielding many association rules between students and merchants and between merchants and merchants, so that the school and the merchants gain a clearer understanding of students' consumption habits.

At present, the information platforms of most colleges and universities still focus only on building transaction management systems, and applications of data mining remain rare. As big data, machine learning, and related technologies continue to develop, the analysis and mining of campus data is expected to play an increasingly important role in supporting school management.
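The first task, counting diners per canteen and time period, can be illustrated with a small in-memory sketch. This is not the thesis's Hive implementation: the record layout `(card_id, canteen, timestamp)`, the `MEAL_PERIODS` hour windows, and the function names are all assumptions made for illustration, since the abstract does not state the schema or the time cut-offs.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical meal windows as [start_hour, end_hour); the abstract gives no cut-offs.
MEAL_PERIODS = {"breakfast": (6, 10), "lunch": (10, 15), "dinner": (16, 21)}


def meal_period(ts):
    """Return the name of the meal window containing ts, or None if outside all windows."""
    for name, (start, end) in MEAL_PERIODS.items():
        if start <= ts.hour < end:
            return name
    return None


def count_diners(records):
    """Count distinct cards per (canteen, date, meal period).

    records: iterable of (card_id, canteen, "YYYY-MM-DD HH:MM:SS") tuples
    (an assumed layout, not the real campus card schema).
    """
    seen = defaultdict(set)
    for card_id, canteen, ts_text in records:
        ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
        period = meal_period(ts)
        if period is None:
            continue  # swipe falls outside every meal window
        # A set per key means several swipes by one card count as one diner.
        seen[(canteen, ts.date().isoformat(), period)].add(card_id)
    return {key: len(cards) for key, cards in seen.items()}
```

Counting distinct cards rather than swipes is a design choice here: a student who pays at two windows during one meal is treated as a single diner. Grouping the resulting counts by weekday would be one way to look for the periodic pattern the abstract mentions.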
[Degree-granting institution]: Nanchang Hangkong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: G647;TP311.13



Article ID: 2059412


Article link: http://sikaile.net/jiaoyulunwen/gaodengjiaoyulunwen/2059412.html

