A Transfer Learning Algorithm Based on Bipartite Graphs
Published: 2018-05-12 19:00
Topic: text classification + transfer learning; Source: Master's thesis, Guangdong University of Foreign Studies, 2017
[Abstract]: Text is one of the most common forms of data on the Internet. However, the rapid growth of the Internet has produced large volumes of redundant data, which places a heavy burden on data producers, managers, and consumers alike. To address this, researchers have proposed machine-learning-based automatic text classification for managing online text, reducing the labor wasted on redundant data. Internet text, however, is highly time-sensitive, and old and new texts often come from very different domains. As a result, previously labeled texts and newly generated texts are not independent and identically distributed (i.i.d.) in the feature space; that is, a model trained directly on the old labeled data cannot simply be applied to classify the new data. Transfer learning addresses this by transferring knowledge, allowing different but related domains or tasks to reuse existing knowledge. Even so, current transfer learning algorithms still have limitations, such as poor interpretability and low efficiency.

Against this background, and after reviewing the key techniques commonly used in automatic text classification and transfer learning, this thesis proposes a bipartite-graph-based transfer learning algorithm. The algorithm works as follows. First, features are extracted and selected from the text data, and the documents and features of the source and target domains are combined into a single document-feature bipartite graph. Next, based on this graph, the transfer relation between every pair of features in the joint domain is computed; these feature-to-feature transfer relations serve as the bridge for knowledge transfer, mapping the feature space of target-domain documents into the feature space of the source domain. Then, a classical machine learning classifier is trained on the labeled source-domain texts. Finally, the source-domain model is used to classify the target-domain documents automatically. Parameter experiments, classifier experiments, comparative experiments, and interpretability experiments show that the proposed algorithm effectively addresses the interpretability and efficiency problems of transfer learning.
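The abstract does not give the formula for the feature-to-feature transfer relation or name the classifier, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes: (1) documents are already tokenized and feature-selected; (2) edge weights in the joint document-feature bipartite graph are raw term counts; (3) the transfer relation T[f][g] is the probability of a two-step random walk feature f -> document -> feature g; and (4) multinomial naive Bayes stands in for the "classical classifier". All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def build_bipartite_graph(docs):
    """Joint document-feature bipartite graph; edge weight = term count.

    `docs` holds the tokenized source AND target documents together.
    """
    doc_to_feat = [defaultdict(float) for _ in docs]
    feat_to_doc = defaultdict(lambda: defaultdict(float))
    for d, tokens in enumerate(docs):
        for tok in tokens:
            doc_to_feat[d][tok] += 1.0
            feat_to_doc[tok][d] += 1.0
    return doc_to_feat, feat_to_doc


def _normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def feature_transfer_matrix(doc_to_feat, feat_to_doc):
    """Assumed transfer relation: T[f][g] = P(two-step walk f -> doc -> g)."""
    transfer = {}
    for f, docs in feat_to_doc.items():
        row = defaultdict(float)
        for d, p_fd in _normalize(docs).items():
            for g, p_dg in _normalize(doc_to_feat[d]).items():
                row[g] += p_fd * p_dg
        transfer[f] = dict(row)
    return transfer


def map_to_source_space(tokens, transfer, source_vocab):
    """Project a target document onto the source feature space via T."""
    mapped = defaultdict(float)
    for tok in tokens:
        for g, w in transfer.get(tok, {}).items():
            if g in source_vocab:
                mapped[g] += w
    return dict(mapped)


def train_multinomial_nb(docs, labels, vocab, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing (stand-in classifier)."""
    counts = defaultdict(lambda: defaultdict(float))
    n_docs = defaultdict(int)
    for tokens, y in zip(docs, labels):
        n_docs[y] += 1
        for tok in tokens:
            counts[y][tok] += 1.0
    model = {}
    for y in n_docs:
        total = sum(counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs[y] / len(docs))
        log_lik = {w: math.log((counts[y][w] + alpha) / total) for w in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, weighted_features):
    """Score a (possibly fractional) feature-weight vector; return best label."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[g] for g, w in weighted_features.items())
    return max(model, key=score)


# Hypothetical toy data: labeled source domain, unlabeled target domain.
source = [["football", "goal", "match"], ["goal", "match", "team"],
          ["software", "code", "computer"], ["code", "program", "computer"]]
labels = ["sports", "sports", "tech", "tech"]
target = [["soccer", "goal", "striker"], ["soccer", "striker", "team"],
          ["laptop", "code", "app"], ["laptop", "app", "program"]]

doc_to_feat, feat_to_doc = build_bipartite_graph(source + target)
T = feature_transfer_matrix(doc_to_feat, feat_to_doc)
source_vocab = {tok for doc in source for tok in doc}
model = train_multinomial_nb(source, labels, source_vocab)
preds = [predict_nb(model, map_to_source_space(doc, T, source_vocab))
         for doc in target]
print(preds)  # ['sports', 'sports', 'tech', 'tech']
```

Row-normalizing both walk steps keeps every row of T a probability distribution, so a target-only word such as `soccer` still passes its weight to source words it co-occurs with in the joint graph (here `goal` and `team`). Each entry of T is also directly inspectable, which is one plausible reading of the interpretability the abstract claims.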
[Degree-granting institution]: Guangdong University of Foreign Studies
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number (CLC)]: TP181