Improving the Sentence Segmentation Model in Machine Translation
Published: 2018-09-05 07:19
[Abstract]: As the training corpora of statistical machine translation (SMT) systems keep growing, they contain more and more long sentences. How to use the information in these long sentences effectively to improve translation quality is one of the main problems SMT systems face. Building on Xu's sentence segmentation model, this paper proposes a method for segmenting long sentences during the training stage. The method uses automatically acquired boundary-word probabilities and the length ratio of the resulting sub-sentence pairs to guide the segmentation process, so that the segmentation results conform better to the semantics of the sentence. Experimental results on the NIST test sets show that the method achieves an improvement of up to 0.5 BLEU points.
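The abstract names two signals that guide segmentation: a boundary-word probability and the length ratio of the resulting sub-sentence pairs. The sketch below illustrates how two such signals could be combined to choose a single split point in a sentence pair. It is not the paper's model: the scoring formula, the monotone (order-preserving) split, and every name in it (`length_ratio_score`, `best_split`, `src_boundary_prob`, `tgt_boundary_prob`, `expected_ratio`) are assumptions made for illustration only.

```python
def length_ratio_score(src_len, tgt_len, expected_ratio):
    """Return a value in [0, 1]; 1.0 when tgt_len / src_len equals expected_ratio."""
    if src_len == 0 or tgt_len == 0:
        return 0.0
    r = (tgt_len / src_len) / expected_ratio
    return min(r, 1.0 / r)


def best_split(src, tgt, src_boundary_prob, tgt_boundary_prob, expected_ratio=1.0):
    """Choose (i, j) so that src[:i]/tgt[:j] and src[i:]/tgt[j:] form two sub-sentence pairs.

    Hypothetical scoring: the product of the boundary probabilities of the
    tokens that end the first source and target sub-sentences, multiplied by
    the length-ratio scores of both resulting pairs.
    """
    best, best_score = None, 0.0
    for i in range(1, len(src)):
        for j in range(1, len(tgt)):
            boundary = (src_boundary_prob.get(src[i - 1], 0.0)
                        * tgt_boundary_prob.get(tgt[j - 1], 0.0))
            ratio = (length_ratio_score(i, j, expected_ratio)
                     * length_ratio_score(len(src) - i, len(tgt) - j, expected_ratio))
            score = boundary * ratio
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


if __name__ == "__main__":
    # Toy data: the comma is the only token with a non-zero boundary probability.
    src = ["a", ",", "b", "c"]
    tgt = ["x", ",", "y", "z"]
    print(best_split(src, tgt, {",": 0.9}, {",": 0.9}))
```

Applying such a function recursively to each half would split a long pair into several shorter ones; a real system would also need a threshold below which a pair is left unsplit, which this sketch omits.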
[Author affiliation]: Toshiba (China) Research and Development Center
[Classification number]: H085
Article ID: 2223576
Article link: http://sikaile.net/wenyilunwen/yuyanyishu/2223576.html