Design and Development of a Corpus Phrase Sequence Extraction System

發(fā)布時(shí)間：2018-05-08 12:32

Topics: corpus-driven; phrase sequences. Source: 《外語電化教學》 (Technology Enhanced Foreign Language Education), 2017, No. 4

[Abstract]: Extracting phrase sequences from corpora has long been a key technical step in phraseology research. Because of the computational and operational complexity involved, earlier studies mostly measured and extracted phrase sequences with a single statistical measure, so the extracted data contain a large amount of noise. Using state-of-the-art big-data processing and computing techniques, this paper implements a phrase sequence extraction method that combines several statistical measures, including frequency, mutual information and boundary entropy, and develops a corresponding system. Experimental results show that the system can process large corpora on the scale of ten million words on an ordinary computer and significantly improves the quality of the extracted phrase sequences.
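The abstract names the three measures but not their exact formulas. The Python below is a minimal sketch of how they might be combined. It assumes that "mutual information" means pointwise mutual information (PMI), taken as the minimum over every binary split of an n-gram, and that "boundary entropy" means the smaller of the left-neighbour and right-neighbour entropies. All probabilities are estimated against the total token count, which is a simplification. The function names, default thresholds and the `<s>`/`</s>` boundary markers are hypothetical. This in-memory toy is not the authors' big-data pipeline for ten-million-word corpora.

```python
import math
from collections import Counter, defaultdict


def ngram_counts(tokens, n_max):
    """Count every contiguous n-gram of length 1..n_max (stored as tuples)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def entropy(counter):
    """Shannon entropy (bits) of a neighbour-frequency distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p); always >= 0
    return sum(c / total * math.log2(total / c) for c in counter.values())


def extract_phrase_sequences(tokens, n_max=4, min_freq=2,
                             min_pmi=1.0, min_entropy=0.5):
    """Hypothetical filter: keep n-grams (n >= 2) that pass frequency,
    minimum-split PMI and minimum boundary-entropy thresholds."""
    counts = ngram_counts(tokens, n_max)
    total = len(tokens)

    # Collect left/right neighbours of every n-gram with n >= 2.
    left_nb = defaultdict(Counter)
    right_nb = defaultdict(Counter)
    for n in range(2, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            left_nb[gram][tokens[i - 1] if i > 0 else "<s>"] += 1
            right_nb[gram][tokens[i + n] if i + n < total else "</s>"] += 1

    results = []
    for gram, freq in counts.items():
        if len(gram) < 2 or freq < min_freq:
            continue  # frequency filter
        p = freq / total
        # Internal cohesion: weakest PMI over all split points.
        pmi = min(
            math.log2(p / ((counts[gram[:k]] / total) * (counts[gram[k:]] / total)))
            for k in range(1, len(gram))
        )
        # External freedom: the less varied of the two boundaries.
        h = min(entropy(left_nb[gram]), entropy(right_nb[gram]))
        if pmi >= min_pmi and h >= min_entropy:
            results.append((" ".join(gram), freq, round(pmi, 3), round(h, 3)))

    results.sort(key=lambda r: (-r[1], r[0]))
    return results


if __name__ == "__main__":
    corpus = ("we talk in terms of cost . they think in terms of time . "
              "she writes in terms of style .").split()
    for row in extract_phrase_sequences(corpus):
        print(row)
```

On this toy corpus, "in terms" and "terms of" pass the frequency and PMI thresholds. They fail the boundary-entropy filter because "in terms" is always followed by "of" and "terms of" is always preceded by "in". Only "in terms of" survives, with frequency 3, PMI of about 2.807 and boundary entropy of about 1.585. This illustrates, in miniature, why combining measures can remove fragments that a frequency-only or PMI-only method would keep.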
[Affiliations]: Beihang University; Logistics Science Research Institute of the Chinese People's Liberation Army; Donghua University
【基金】：國家社會(huì)科學(xué)基金項(xiàng)目(項(xiàng)目編號(hào):13BYY074;14CYY049) 北京市社會(huì)科學(xué)基金項(xiàng)目(項(xiàng)目編號(hào):16JDYYA001)的部分研究成果
【分類號(hào)】：H314.3;TP311.52

本文編號(hào)：1861420

Article link: http://sikaile.net/waiyulunwen/yingyulunwen/1861420.html