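Research on Collaborative Filtering Algorithms for Student Performance Prediction in Educational Data Mining
Published: 2018-04-08 17:42
Topic: educational data mining   Entry point: K-nearest neighbors   Source: Kunming University of Science and Technology, 2016 master's thesis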
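【Abstract】: Schools have accumulated large amounts of data in the course of educating students. Examples include enrollment rates, dropout rates, and students' scores in each subject, and, at the classroom level, the rate at which students answer questions correctly and how well they have mastered individual knowledge points. These educational data keep changing and grow as informatization advances, so extracting useful information from such complex and voluminous data is a valuable research problem. Because data analysis in education resembles data analysis in e-commerce and other fields where collaborative filtering is used, this thesis applies collaborative filtering algorithms to educational data, focusing on student performance prediction. It takes 8.9 million records drawn from an intelligent tutoring system (ITS) in the KDD Cup 2010 competition as the experimental dataset for an exploratory study of educational data mining for student performance prediction. The dataset has many features with wide value ranges, consists mostly of text-type data, and is partly sparse. To address these problems, the thesis carries out the following work: (1) Progressive sampling is used to determine the optimal training-set sample size, which substantially reduces the number of training records; exploiting the temporal nature of the dataset, the latest N records of the training set are extracted; features that implicitly encode the answer outcome and have a high proportion of null values are removed, and some attributes with complex structure are split apart. (2) The single classification algorithm K-nearest neighbors (KNN) and the singular value decomposition (SVD) model are applied to the educational dataset for validation. Both predict the Correct First Attempt (CFA) attribute of the test set, which serves as the evaluation target, and the prediction performance of the two algorithms is compared. (3) Exploiting the complementary characteristics of the two base algorithms, the thesis also combines SVD dimensionality reduction with KNN to predict student performance. Experiments show that the combined algorithm alleviates data sparsity to some extent; however, it retains only the essential features of the data, and the partial information loss caused by dimensionality reduction slightly affects the evaluation results.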
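Step (1) leaves the sampling schedule, the stopping rule, and the null-value threshold unspecified. The Python sketch below shows one way the three preprocessing operations could look, under assumed choices: records are dictionaries of string fields with a sortable timestamp, the sample grows geometrically from `start` by a factor `growth`, and growth stops once the validation error (for instance RMSE) improves by less than `min_gain`. All function and parameter names (`drop_sparse_features`, `latest_n`, `progressive_sample_size`, `max_null_ratio`, `error_of`) are hypothetical and not taken from the thesis.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical sketch of preprocessing step (1); names and thresholds are assumptions.
Row = Dict[str, Optional[str]]


def drop_sparse_features(rows: Sequence[Row], max_null_ratio: float) -> List[str]:
    """Return feature names whose share of missing values is at most max_null_ratio."""
    keys = sorted({k for r in rows for k in r})
    kept = []
    for k in keys:
        missing = sum(1 for r in rows if r.get(k) in (None, ""))
        if missing / len(rows) <= max_null_ratio:
            kept.append(k)
    return kept


def latest_n(rows: Sequence[Row], n: int, time_key: str) -> List[Row]:
    """Keep the n most recent records; assumes time_key holds sortable timestamp strings."""
    if n <= 0:
        return []
    return sorted(rows, key=lambda r: r[time_key])[-n:]


def progressive_sample_size(
    rows: Sequence[Row],
    error_of: Callable[[Sequence[Row]], float],
    start: int = 1000,
    growth: float = 2.0,
    min_gain: float = 1e-3,
    seed: int = 0,
) -> int:
    """Grow a random sample geometrically; stop once the error (e.g. validation
    RMSE of a model trained on the sample) improves by less than min_gain."""
    pool = list(rows)
    random.Random(seed).shuffle(pool)
    size = min(start, len(pool))
    best = error_of(pool[:size])
    while size < len(pool):
        nxt = min(max(size + 1, int(size * growth)), len(pool))
        err = error_of(pool[:nxt])
        if best - err < min_gain:
            break  # learning curve has flattened: the smaller sample is enough
        size, best = nxt, err
    return size
```

In practice `error_of` would train the chosen model on the sample and return its error on a fixed validation split. The geometric schedule follows the usual progressive-sampling idea of stopping where the learning curve flattens, which is the intuition behind shrinking the training set.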
[Abstract]: Schools have accumulated large amounts of data in the course of educating students, such as enrollment rates, dropout rates, and students' scores in various subjects, and, at the classroom level, the rate at which students answer questions correctly and their degree of mastery of knowledge points. Educational data of all kinds keep changing and accumulate as informatization develops, so extracting useful information from such complex and voluminous data is of considerable research value. Building on the similarity between educational data analysis and data analysis in e-commerce and other fields where collaborative filtering is used, this thesis applies collaborative filtering to educational data and focuses on predicting student performance. It uses 8.9 million records selected from an intelligent tutoring system (ITS) in the KDD Cup 2010 competition as the experimental dataset to explore educational data mining for student performance prediction. The dataset has a large number of features with wide value ranges, consists mostly of text data, and is partly sparse. To address these problems, the thesis does the following: (1) it adopts progressive sampling to determine the optimal training-set sample size and reduce the number of training records, extracts the latest N records of the training set according to its temporal characteristics, removes features that implicitly encode answer results and have a large proportion of null values, and separates some attributes with complex structure. (2) It applies the single classifier K-nearest neighbors and the singular value decomposition (SVD) model to the educational dataset for validation, predicting the Correct First Attempt (CFA) values in the test set, using them as the evaluation target, and comparing the prediction results of the two algorithms. (3) It combines SVD dimensionality reduction with K-nearest neighbors to predict students' scores. Experimental results show that the combined algorithm alleviates data sparsity to a certain extent, but it retains only the basic characteristics of the data, and the data loss caused by dimensionality reduction has some impact on the evaluation results.
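The abstract does not state the similarity measure, neighborhood size, SVD rank, or imputation used. As a rough illustration of comparing plain KNN with KNN in an SVD-reduced space, the Python sketch below works on a student-by-step CFA matrix (1 = correct on first attempt, 0 = incorrect, `None` = not attempted). Assumptions, all hypothetical: each step's observed CFA is centered on its mean so unattempted cells become 0, similarity is cosine, the rank-`r` SVD basis is found by power iteration on A^T A so that only the standard library is needed, and predictions are scored with RMSE, the metric used to score KDD Cup 2010 submissions. The names `centre`, `knn_predict`, `plain_knn`, `svd_knn`, and `rmse` are illustrative, not the thesis's code.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical sketch of steps (2) and (3): imputation, similarity, k and rank are assumptions.
Matrix = List[List[Optional[float]]]  # students x steps; CFA 1/0, None = not attempted


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def centre(R: Matrix) -> List[List[float]]:
    """Subtract each step's mean CFA; unattempted cells become 0, i.e. 'average'."""
    means = []
    for j in range(len(R[0])):
        col = [row[j] for row in R if row[j] is not None]
        means.append(sum(col) / len(col) if col else 0.0)
    return [[x - means[j] if x is not None else 0.0 for j, x in enumerate(row)] for row in R]


def top_right_singular_vectors(A: List[List[float]], r: int, iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Top-r right singular vectors of A via power iteration with deflation on A^T A."""
    n = len(A[0])
    C = [[sum(row[p] * row[q] for row in A) for q in range(n)] for p in range(n)]
    rng = random.Random(seed)
    basis = []
    for _ in range(r):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(C[p][q] * v[q] for q in range(n)) for p in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[p] * C[p][q] * v[q] for p in range(n) for q in range(n))  # eigenvalue estimate
        C = [[C[p][q] - lam * v[p] * v[q] for q in range(n)] for p in range(n)]  # deflate
        basis.append(v)
    return basis


def knn_predict(R: Matrix, sim: Callable[[int, int], float], u: int, i: int, k: int) -> float:
    """Similarity-weighted mean CFA of the k most similar students who attempted step i."""
    cands = [(sim(u, v), R[v][i]) for v in range(len(R)) if v != u and R[v][i] is not None]
    top = sorted(cands, reverse=True)[:k]
    weight = sum(max(s, 0.0) for s, _ in top)  # ignore negatively correlated neighbours
    if weight == 0:
        seen = [x for row in R for x in row if x is not None]
        return sum(seen) / len(seen) if seen else 0.0  # fall back to the global mean CFA
    return sum(max(s, 0.0) * y for s, y in top) / weight


def plain_knn(R: Matrix, u: int, i: int, k: int = 3) -> float:
    A = centre(R)

    def sim(a: int, b: int) -> float:  # cosine over the steps both students attempted
        both = [j for j in range(len(R[0])) if R[a][j] is not None and R[b][j] is not None]
        return cosine([A[a][j] for j in both], [A[b][j] for j in both])

    return knn_predict(R, sim, u, i, k)


def svd_knn(R: Matrix, u: int, i: int, k: int = 3, rank: int = 2) -> float:
    A = centre(R)
    basis = top_right_singular_vectors(A, rank)
    # Dense rank-r vector per student: projection of its centred row onto the SVD basis.
    Z = [[sum(x * y for x, y in zip(row, vec)) for vec in basis] for row in A]
    return knn_predict(R, lambda a, b: cosine(Z[a], Z[b]), u, i, k)


def rmse(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root-mean-square error over (predicted, actual) CFA pairs."""
    return math.sqrt(sum((p - y) ** 2 for p, y in pairs) / len(pairs))
```

Centering by step mean means unattempted cells contribute nothing to either similarity. In the SVD variant every student gets a dense rank-r vector, so students with few commonly attempted steps can still be compared, which is one way dimensionality reduction can ease sparsity; the components beyond rank r are discarded, which mirrors the information loss the abstract reports.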
【Degree-granting institution】: Kunming University of Science and Technology
【Degree level】: Master's
【Year degree awarded】: 2016
【Classification number】: TP311.13
Article number: 1722657
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1722657.html