Research on the Optimization of Polysemous Word Vectors
Published: 2018-05-06 09:54
Topic: representation learning + multi-feature fusion; Source: Master's thesis, Beijing University of Posts and Telecommunications (北京郵電大學(xué)), 2016
【Abstract】: With the rapid development of neural network algorithms and distributed parallel computing, text representation has returned to the forefront of research. As a fundamental problem in natural language processing, effectively representing abstract and complex human language has always been unavoidable, and the exponential growth of Internet data in recent years has made it only more pressing. Neural-network-based word representation learning addresses the problem with the word as the smallest unit: such models not only exploit the information in large corpora but also use various optimizations to reduce training time complexity, making it easy to obtain representation vectors that preserve semantic and syntactic information and laying a solid feature foundation for other natural language processing tasks. Word vectors have achieved good results in information retrieval, sentiment analysis, machine translation, and other tasks, but room for improvement remains. Against this background, this thesis carries out the following work. First, it studies word representation learning methods and their optimization strategies, and proposes a multi-feature-fusion optimization of word vectors that combines prior part-of-speech information, position weight factors, and paragraph vectors, raising accuracy on the word analogy test by two percentage points over the original model. Second, it identifies the weakness of word vectors in distinguishing antonyms, investigates the distinguishing factors, and verifies the model's discrimination ability on synonym and antonym sets. Third, building on the Skip-gram model, it proposes and implements an online-learning model for polysemous words that learns a separate vector for each sense of a word, and again fuses multiple features to further improve the polysemy model, achieving results on par with the current state of the art.
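The word analogy test cited above evaluates vectors by checking relations of the form "a is to b as c is to ?" via vector arithmetic. As a minimal sketch of how such a test is scored, here is a toy example with a hand-built embedding table; the words and vector values are illustrative assumptions, not data from the thesis:

```python
import numpy as np

# Toy embedding table; a real evaluation would use trained Skip-gram
# vectors. Values are hand-picked so the classic relation holds.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def analogy(a, b, c, emb):
    """Answer 'a is to b as c is to ?' by taking the nearest cosine
    neighbour of vec(b) - vec(a) + vec(c), excluding the query words."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "king", "woman", emb))  # prints "queen"
```

An analogy benchmark runs this query over thousands of such quadruples and reports the fraction answered correctly, which is the accuracy figure the abstract's "two percentage points" refers to.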
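The online multi-sense model is described only at a high level in the abstract. As one way to picture the idea of learning several vectors per word online, here is a hedged sketch of nearest-sense assignment with sense spawning; the class name, threshold, update rule, and random toy context vectors are all assumptions for illustration, not the thesis's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16                 # toy embedding dimension (illustrative)
NEW_SENSE_THRESH = 0.4   # assumed threshold for spawning a new sense
LR = 0.1                 # assumed sense-update rate

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

class OnlineMultiSense:
    """Each word keeps a list of sense vectors; an occurrence is routed
    to the sense most similar to its averaged context vector, and a new
    sense is spawned when no existing sense fits well enough."""

    def __init__(self):
        self.senses = {}  # word -> list of sense vectors
        self.ctx = {}     # context word -> vector (stand-in for the
                          # Skip-gram output embeddings)

    def _ctx_vec(self, w):
        if w not in self.ctx:
            self.ctx[w] = rng.normal(scale=0.1, size=DIM)
        return self.ctx[w]

    def observe(self, word, context_words):
        """Assign this occurrence of `word` to a sense; return its index."""
        c = np.mean([self._ctx_vec(w) for w in context_words], axis=0)
        senses = self.senses.setdefault(word, [])
        if senses:
            sims = [cosine(s, c) for s in senses]
            best = int(np.argmax(sims))
            if sims[best] >= NEW_SENSE_THRESH:
                senses[best] += LR * (c - senses[best])  # move toward context
                return best
        senses.append(c.copy())  # no close match: spawn a new sense
        return len(senses) - 1

model = OnlineMultiSense()
s_fin = model.observe("bank", ["money", "loan", "deposit"])
s_geo = model.observe("bank", ["river", "water", "shore"])
s_fin2 = model.observe("bank", ["money", "loan", "deposit"])
```

Repeated occurrences with the same context are routed back to the same sense, which is the property that lets "bank (finance)" and "bank (river)" accumulate separate vectors as text streams in.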
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year conferred】: 2016
【CLC number】: TP391.1; TP18