Research on Chinese Semantic Role Labeling Based on LSTM

Published: 2018-01-21 17:15

Keywords: deep learning; LSTM; backpropagation algorithm; semantic role labeling　Source: Tibet University, 2017 master's thesis　Document type: degree thesis

[Abstract]: With the rapid development and spread of computer networking and communication technology, demand for natural language processing applications has grown sharply. People urgently need practical natural language processing technology to help break down language barriers and to provide convenient, effective, user-friendly services for information exchange between people and between people and machines. Chinese information processing, a branch of natural language processing, has developed quickly in recent years and has made notable progress in basic theoretical research as well as in technology development and industrialization. Semantic role labeling is one way of implementing shallow semantic analysis and has attracted considerable attention from researchers in recent years. Deep learning is a technique that lets computers learn features automatically. Following its major successes in image recognition, speech recognition and other fields, researchers have gradually begun to apply it to natural language processing, where it has become a major research focus. Among current deep learning models, recurrent neural network (RNN) models built on long short-term memory (LSTM) units are considered especially well suited to text sequence data because they can make effective use of long-distance dependency information in a sequence. This thesis therefore proposes an LSTM-based model for Chinese semantic role labeling. The method avoids complex feature extraction and selection and removes the dependence of semantic role labeling on syntactic parsing. Its best labeling result reaches an F-score of 70.34%. The main work of this thesis is as follows:

(1) The corpus and label set used in the experiments were determined. Starting from the Chinese Proposition Bank (CPB) annotated corpus, 19 classes of semantic roles were selected. To suit the model, the IOBES sequence labeling scheme was chosen, which yields 77 labels. The experiments use the files chtb_0001.onf to chtb_0399.onf from OntoNotes 5.0, split 3:1 into training and test corpora.

(2) An LSTM-based semantic role labeling model was built and trained. The model takes the word as its basic labeling unit and uses word vectors trained with Word2Vec as input. Network layers built from standard LSTM units learn feature representations relevant to semantic roles, and the resulting feature vectors pass through a softmax function and a post-processing step to produce the semantic role label of each word. The model is trained with the backpropagation algorithm, and the effect of each model parameter is analyzed experimentally.

(3) Part-of-speech vectors were learned with an LSTM model and combined with word vectors for semantic role labeling. An LSTM network layer is first built to learn part-of-speech vector representations. These part-of-speech vectors are then combined with the word vectors, and LSTM network layers are built and trained to produce the semantic role label of each word. Finally, the model parameters are examined experimentally and the results are compared with those of the model in (2).

The experiments show that part-of-speech information helps both the identification and the classification of semantic roles, and that the model can perform automatic semantic role labeling effectively. Although the model does not yet match the best results obtained with hand-crafted features, it achieves good results and demonstrates the strength of LSTM on the semantic role labeling task.
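The label count in the abstract can be checked with a little arithmetic. One reading consistent with the figures is that each of the 19 roles receives four positional tags (B, I, E, S) and a single O tag covers words outside any argument, giving 19 × 4 + 1 = 77. The sketch below builds such a tag set and converts argument spans into per-word IOBES tags. The role names `ROLE1` to `ROLE19`, the function names and the span format are illustrative assumptions, since the abstract gives only the number of roles and not the inventory or the thesis's own encoding code.

```python
from typing import List, Tuple


def build_iobes_tagset(roles: List[str]) -> List[str]:
    """Return an IOBES tag inventory: B-, I-, E-, S- per role, plus one O tag."""
    tags = ["O"]
    for role in roles:
        for prefix in ("B", "I", "E", "S"):
            tags.append(f"{prefix}-{role}")
    return tags


def spans_to_iobes(length: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_inclusive, role) argument spans to per-word IOBES tags.

    Words outside every span are tagged O. Spans are assumed not to overlap.
    """
    tags = ["O"] * length
    for start, end, role in spans:
        if start == end:
            # A one-word argument gets the Single tag.
            tags[start] = f"S-{role}"
        else:
            tags[start] = f"B-{role}"          # Begin
            for i in range(start + 1, end):
                tags[i] = f"I-{role}"          # Inside
            tags[end] = f"E-{role}"            # End
    return tags


# Placeholder role names (assumption): the abstract states only that there are 19.
ROLES = [f"ROLE{i}" for i in range(1, 20)]
TAGSET = build_iobes_tagset(ROLES)
print(len(TAGSET))  # 77

# A six-word sentence with a two-word argument and a one-word argument.
example = spans_to_iobes(6, [(0, 1, "ROLE1"), (3, 3, "ROLE2")])
print(example)  # ['B-ROLE1', 'E-ROLE1', 'O', 'S-ROLE2', 'O', 'O']
```

With a tag set of this shape, the softmax layer described in (2) would output a distribution over 77 classes for every word, and a post-processing step could repair sequences that are invalid under IOBES, for example an `I-` tag with no preceding `B-` tag.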
[Degree-granting institution]: Tibet University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1


Article ID: 1452087


Article link: http://sikaile.net/shoufeilunwen/xixikjs/1452087.html
