Research on Lexical Analysis Based on Neural Networks

Published: 2018-03-08 01:22

  Topic: Chinese word segmentation. Focus: part-of-speech tagging. Source: Nanjing University, 2017 master's thesis. Type: degree thesis.


[Abstract]: Lexical analysis is an important basic task in natural language processing. It consists of two subtasks: Chinese word segmentation and part-of-speech tagging. Word segmentation converts a string of Chinese characters into a sequence of Chinese words, and almost every Chinese text analysis task depends on it. Part-of-speech tagging assigns a part-of-speech category to each word in a sentence. For higher-level tasks such as syntactic and semantic analysis, part-of-speech information helps resolve ambiguity and alleviates the sparsity of word features. Although lexical analysis is a basic task, it has broad demand and application prospects, and it remains an active topic in natural language processing.

Early Chinese word segmentation generally relied on dictionary-based rule methods, because computing resources were limited and annotated corpora were lacking. As computing power grew and annotated corpora became available, segmentation gradually moved from rule methods to machine learning methods, among which character tagging is currently the most commonly used approach. After the rise of deep learning, some researchers also tried neural networks for segmentation and made some progress. Part-of-speech tagging has followed a similar research path.

In this thesis, we first address the problem that traditional character-tagging segmentation models extract local features from a window and therefore cannot capture long-distance dependencies. We propose replacing the original feature extraction module with a bidirectional long short-term memory (LSTM) network, which both preserves long-distance information and simplifies feature extraction. Second, we design a greedy model and a structured model based on the bidirectional LSTM network. Finally, to address the mismatch between general-purpose word embeddings and specific tasks, we design task-specific embedding models for word segmentation and for part-of-speech tagging.

Experimental results show that the bidirectional LSTM segmentation model performs comparably to traditional models, and that the simple, fast greedy model performs comparably to the structured model. After adding character embeddings pre-trained with the WCC (Word-context Character Embedding) model, the segmenter achieves state-of-the-art or comparable performance on standard datasets and also performs well in domain transfer experiments. For part-of-speech tagging, adding word embeddings pre-trained with the PCS (POS Sensitive Embedding) model improves the tagging system, and the PCS model can quickly exploit heterogeneous data to improve model performance.
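The abstract contrasts a greedy model with a structured model for character-tagging segmentation but gives no implementation details. The following is a minimal sketch of that contrast, not the thesis's method. It assumes a BMES tag set (Begin, Middle, End, Single), which the abstract does not name, and it uses hand-written per-character scores in place of the output of a bidirectional LSTM. The names `tags_to_words`, `greedy_decode`, `viterbi_decode`, and `example_scores` are hypothetical.

```python
from typing import Dict, List

# Assumed tag set (not stated in the abstract): Begin, Middle, End, Single.
TAGS = ["B", "M", "E", "S"]

# Tags that may follow each tag in a well-formed BMES sequence.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Group characters into words; a word closes on an E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed sequence that ends mid-word
        words.append(current)
    return words


def greedy_decode(scores: List[Dict[str, float]]) -> List[str]:
    """Pick the highest-scoring tag for each character independently."""
    return [max(TAGS, key=lambda t: s[t]) for s in scores]


def viterbi_decode(scores: List[Dict[str, float]]) -> List[str]:
    """Best-scoring tag sequence that respects ALLOWED (non-empty input)."""
    neg_inf = float("-inf")
    # A sentence can only start with B or S.
    best = {t: (scores[0][t] if t in ("B", "S") else neg_inf) for t in TAGS}
    back = []
    for s in scores[1:]:
        new_best, pointers = {}, {}
        for t in TAGS:
            candidates = [p for p in TAGS if t in ALLOWED[p]]
            prev = max(candidates, key=lambda p: best[p])
            new_best[t] = best[prev] + s[t]
            pointers[t] = prev
        best = new_best
        back.append(pointers)
    # A sentence can only end with E or S.
    last = max(("E", "S"), key=lambda t: best[t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hand-written stand-in scores for the four characters of "南京大学".
example_scores = [
    {"B": 0.6, "M": 0.1, "E": 0.1, "S": 0.2},
    {"B": 0.1, "M": 0.1, "E": 0.5, "S": 0.3},
    {"B": 0.4, "M": 0.1, "E": 0.0, "S": 0.5},
    {"B": 0.0, "M": 0.1, "E": 0.6, "S": 0.3},
]

greedy_tags = greedy_decode(example_scores)       # ['B', 'E', 'S', 'E']
structured_tags = viterbi_decode(example_scores)  # ['B', 'E', 'B', 'E']
print(tags_to_words("南京大学", greedy_tags))      # ['南京', '大', '学']
print(tags_to_words("南京大学", structured_tags))  # ['南京', '大学']
```

With these scores the greedy decoder emits an ill-formed sequence (S followed by E), while the constrained decoder recovers a consistent segmentation. In the thesis's experiments the two kinds of model are reported to perform comparably, which suggests the learned scores already encode much of this structure.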
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1; TP183

