

Design and Development of a Corpus Phrase Sequence Extraction System

Published: 2018-05-08 12:32

  Topics of this paper: corpus-driven + phrase sequences; Reference: 《外語電化教學》, 2017, Issue 04


[Abstract]: Extracting phrase sequences from corpora has long been a key technical step in phraseology research. Because of the complexity of the computation and operation involved, previous studies mostly relied on a single statistical method to measure and extract phrase sequences, which left a large amount of noise in the extracted data. Using current big-data processing and computing techniques, this paper implements a phrase sequence extraction method based on several statistical measures, including frequency, mutual information and boundary entropy, and develops a corresponding system. Experimental results show that the system can process large corpora on the scale of ten million words on an ordinary computer and significantly improves the quality of the extracted phrase sequences.
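The three measures named in the abstract can be illustrated on a toy example. The sketch below is not the authors' system: it is a minimal illustration, assuming whitespace-tokenized text, two-word sequences only, pointwise mutual information as the mutual-information measure, and Shannon entropy over the words adjacent to a sequence as the boundary entropy. The function names `ngrams`, `entropy` and `score_bigrams` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def entropy(counter):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def score_bigrams(tokens):
    """Score each bigram by frequency, PMI and left/right boundary entropy.

    Assumption: probabilities are plain relative frequencies, with no
    smoothing and no sentence-boundary handling.
    """
    unigram_freq = Counter(tokens)
    bigram_list = ngrams(tokens, 2)
    bigram_freq = Counter(bigram_list)
    n_uni, n_bi = len(tokens), len(bigram_list)

    # Collect the words seen immediately to the left and right of each bigram.
    left_ctx = defaultdict(Counter)
    right_ctx = defaultdict(Counter)
    for i, bg in enumerate(bigram_list):
        if i > 0:
            left_ctx[bg][tokens[i - 1]] += 1
        if i + 2 < len(tokens):
            right_ctx[bg][tokens[i + 2]] += 1

    scores = {}
    for bg, freq in bigram_freq.items():
        p_xy = freq / n_bi
        p_x = unigram_freq[bg[0]] / n_uni
        p_y = unigram_freq[bg[1]] / n_uni
        scores[bg] = {
            "freq": freq,
            "pmi": math.log2(p_xy / (p_x * p_y)),
            "left_entropy": entropy(left_ctx[bg]),
            "right_entropy": entropy(right_ctx[bg]),
        }
    return scores


if __name__ == "__main__":
    text = "in terms of cost and in terms of time and in terms of scale"
    for bigram, s in score_bigrams(text.split()).items():
        print(bigram, s)
```

In a combined approach of this kind, a candidate is typically kept only when it passes thresholds on all measures: frequent enough, internally cohesive (high mutual information), and followed or preceded by varied words (high boundary entropy). Any threshold values would be tuning choices, and none are given in the abstract.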
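[Author affiliations]: Beihang University; Logistics Science Research Institute of the Chinese People's Liberation Army; Donghua University
[Funding]: Part of the research output of National Social Science Fund of China projects (Nos. 13BYY074; 14CYY049) and a Beijing Social Science Fund project (No. 16JDYYA001)
[Classification number]: H314.3; TP311.52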

Article ID: 1861420


Link to this article: http://sikaile.net/waiyulunwen/yingyulunwen/1861420.html


