Label Distribution Learning Using the k-means Algorithm
Published: 2018-07-28 16:36
[Abstract]: Label distribution learning is a recently proposed machine learning paradigm that handles certain label-ambiguity problems well. Existing label distribution learning algorithms build parametric models from conditional probabilities but do not make full use of the relationship between features and labels. Starting from the observation that samples with similar features should have similar label distributions, this paper clusters the training samples with k-means, a prototype-based clustering algorithm, and proposes label distribution learning based on the k-means algorithm (LDLKM). First, k-means yields the mean vector of each cluster; next, the mean vector of the label distributions belonging to each cluster is computed. Finally, the distances between the test samples and the training-set mean vectors are used as weights when predicting the label distributions of the test set. Experiments on 6 public datasets compare LDLKM with 3 existing label distribution learning algorithms on 5 evaluation measures, and the results show that the proposed LDLKM algorithm is effective.
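The three steps named in the abstract (cluster the training features, average the label distributions inside each cluster, weight those averages by distance at prediction time) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not state how distances are turned into weights, so the inverse-distance weighting below is an assumption, and the function names (`kmeans`, `fit`, `predict`) and the `eps` parameter are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd-style k-means; returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]
    assign = None
    for _ in range(iters):
        # Assign every sample to its nearest center.
        new_assign = [
            min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in X
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centers are too
        assign = new_assign
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_vector(members)
    return centers, assign


def fit(X, D, k, seed=0):
    """Pair each cluster's feature mean with its mean label distribution.

    X: list of feature vectors; D: list of label distributions (rows sum to 1).
    """
    centers, assign = kmeans(X, k, seed=seed)
    model = []
    for j in range(k):
        members = [d for d, a in zip(D, assign) if a == j]
        if members:  # skip clusters that ended up empty
            model.append((centers[j], mean_vector(members)))
    return model


def predict(model, x, eps=1e-12):
    """Predict a label distribution for one sample x.

    ASSUMPTION: weights are inverse Euclidean distances to the cluster
    means; the source only says distances are "used as weights".
    """
    weights = [1.0 / (math.sqrt(sq_dist(x, c)) + eps) for c, _ in model]
    total = sum(weights)
    n_labels = len(model[0][1])
    # A convex combination of distributions is itself a distribution.
    return [
        sum(w * d[i] for w, (_, d) in zip(weights, model)) / total
        for i in range(n_labels)
    ]
```

Calling `fit(X, D, k)` on training features `X` and label distributions `D`, then `predict(model, x)` on a test sample, returns a vector that sums to 1 because it is a normalized weighted average of the per-cluster mean distributions. Any other decreasing function of distance could be substituted for the inverse-distance weights.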
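[Author affiliation]: Key Laboratory of Granular Computing, Minnan Normal University
[Funding]: National Natural Science Foundation of China (61379049, 61379089)
[Classification number]: TP181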
Article ID: 2150903
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2150903.html