Research on Information Extraction and Assisted Translation Methods for SWIFT Message Generation

Published: 2018-07-11 12:07

  Topic: SWIFT + message; Source: Harbin Institute of Technology, master's thesis, 2016


[Abstract]: SWIFT (Society for Worldwide Interbank Financial Telecommunication) operates the information exchange network used by most of the world's financial institutions, supporting fast, accurate, and high-quality financial transactions worldwide. Shenzhen Securities Information Co., Ltd. joined SWIFT in 2008 and provides Corporate Action (CA) message services to SWIFT and its members. Generating a SWIFT message involves three steps: extracting corporate action data from the text of listed-company announcements, translating that data from Chinese into English, and filling a SWIFT template to produce the message file. At present both extraction and translation are done manually, which is inefficient and makes data consistency hard to guarantee. To address these problems, this thesis studies information extraction and computer-aided translation methods for the automatic generation of SWIFT messages. The main contributions are as follows. First, based on an in-depth analysis of the characteristics of shareholders' general meeting notices, the thesis designs a method for extracting basic meeting information based on a random forest classifier. The method first removes irrelevant paragraphs, then uses text classification to find the paragraphs that contain basic meeting information, and finally applies regular-expression matching to those paragraphs to obtain the attributes and attribute values. On a dataset built from 1,000 shareholders' general meeting notices from 2014-2015, the method achieves an F-score of 0.92 for extracting attributes and attribute values.
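The three extraction stages described above (drop irrelevant paragraphs, classify the rest, then apply regular expressions) can be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute names, the patterns, the irrelevance markers, and the sample notice are all made up, and the `keyword_classifier` function is a dependency-free stand-in for the random forest classifier that the thesis actually trains.

```python
import re

# Hypothetical attribute patterns; the thesis does not list its actual rules.
ATTRIBUTE_PATTERNS = {
    "meeting_date": re.compile(r"Meeting date:\s*(\d{4}-\d{2}-\d{2})"),
    "record_date": re.compile(r"Record date:\s*(\d{4}-\d{2}-\d{2})"),
    "venue": re.compile(r"Venue:\s*(.+?)(?:;|$)"),
}

# Hypothetical markers of paragraphs that never carry meeting information.
IRRELEVANT_MARKERS = ("Disclaimer", "Contact", "Appendix")


def drop_irrelevant(paragraphs):
    """Stage 1: discard empty paragraphs and obviously unrelated ones."""
    return [p for p in paragraphs
            if p.strip() and not p.startswith(IRRELEVANT_MARKERS)]


def keyword_classifier(paragraph):
    """Stage 2 stand-in. The thesis uses a trained random forest classifier
    here; this keyword rule only keeps the sketch free of dependencies."""
    cues = ("Meeting date", "Record date", "Venue")
    return any(cue in paragraph for cue in cues)


def extract_attributes(paragraphs, is_basic_info=keyword_classifier):
    """Stage 3: regex matching on paragraphs that passed stages 1 and 2."""
    result = {}
    for paragraph in drop_irrelevant(paragraphs):
        if not is_basic_info(paragraph):
            continue
        for name, pattern in ATTRIBUTE_PATTERNS.items():
            match = pattern.search(paragraph)
            if match and name not in result:
                result[name] = match.group(1).strip()
    return result


# Made-up notice text, used only to exercise the pipeline.
announcement = [
    "Disclaimer: the board guarantees the accuracy of this notice.",
    "Meeting date: 2015-05-20; Venue: Conference Room 3, Shenzhen",
    "Record date: 2015-05-13",
    "The company thanks all shareholders for their support.",
]
info = extract_attributes(announcement)
```

Passing the classifier in as the `is_basic_info` argument keeps the regex stage independent of how paragraphs are selected, so a trained model could replace the keyword rule without touching the rest of the sketch.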
Second, because the basic information text of shareholders' general meetings is regular in structure and wording, the thesis designs a computer-aided translation method based on named entity recognition and text similarity. The method first uses a conditional random field (CRF) model to recognize organization names, person names, and numbers in the text and abstracts the recognized entities. A text similarity model then retrieves from a parallel corpus the translation sample most similar to the input motion text, and the final translation is obtained by replacing the person names and numbers in the English side of that sample. With a parallel corpus built from 660,251 translation pairs of shareholders' general meeting motions from 2010-2015, the method was tested on 10,000 SWIFT motion texts and scored 0.83 under the BLEU* evaluation method, with an exact-match score of 0.69. Building on these techniques, the thesis constructs an information extraction and computer-aided translation system for SWIFT message generation. The system is already in production use, where it automatically generates and visualizes SWIFT messages for shareholders' general meetings. It speeds up SWIFT message production, reduces reliance on manual work and its cost, and improves the consistency of the generated messages.
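The retrieve-and-substitute idea behind the translation method can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the name glossary replaces the CRF entity recognizer and covers person names only, `difflib.SequenceMatcher` replaces the unspecified text similarity model, and the two-entry parallel corpus and its slot syntax (`{NAME0}`, `{NUM0}`) are invented.

```python
import re
from difflib import SequenceMatcher

# Hypothetical name glossary; it stands in for the CRF recognizer in the thesis.
NAME_GLOSSARY = {"张伟": "Zhang Wei", "李娜": "Li Na"}

# One pattern for both entity types so slots are numbered in reading order.
ENTITY = re.compile(
    "(?P<NAME>" + "|".join(map(re.escape, NAME_GLOSSARY)) + r")"
    r"|(?P<NUM>\d+(?:\.\d+)?)"
)

# Hypothetical parallel corpus of (Chinese template, English template) pairs.
PARALLEL_CORPUS = [
    ("关于选举{NAME0}为公司董事的议案",
     "Proposal on electing {NAME0} as a director of the company"),
    ("关于公司{NUM0}年度利润分配方案的议案",
     "Proposal on the company's profit distribution plan for {NUM0}"),
]


class KeepMissing(dict):
    """Leave a slot untouched when the input supplied no value for it."""

    def __missing__(self, key):
        return "{" + key + "}"


def abstract_entities(text):
    """Replace names and numbers with typed, indexed slots."""
    values = {"NAME": [], "NUM": []}

    def to_slot(match):
        kind = match.lastgroup  # "NAME" or "NUM"
        values[kind].append(match.group())
        return "{%s%d}" % (kind, len(values[kind]) - 1)

    return ENTITY.sub(to_slot, text), values


def translate(text):
    """Return (translation, similarity) using the closest corpus template."""
    template, values = abstract_entities(text)
    score, target = max(
        (SequenceMatcher(None, template, source).ratio(), target)
        for source, target in PARALLEL_CORPUS
    )
    fills = KeepMissing()
    for i, name in enumerate(values["NAME"]):
        fills["NAME%d" % i] = NAME_GLOSSARY[name]
    for i, number in enumerate(values["NUM"]):
        fills["NUM%d" % i] = number
    return target.format_map(fills), score


translation, similarity = translate("关于选举李娜为公司董事的议案")
```

Returning the similarity alongside the translation lets a human reviewer focus on low-scoring matches, which fits the assisted rather than fully automatic nature of the method.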
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1

