Research on a Cross-Language Document Learning-to-Rank Method Based on Bilingual Document Similarity

Published: 2018-03-30 22:33

  Topic: information retrieval. Focus: bilingual document similarity. Source: Kunming University of Science and Technology, master's thesis, 2017.


[Abstract]: Cross-language information retrieval is an active research topic and plays an important role in areas such as cross-language document analysis and cross-language news acquisition. Current research on cross-language information retrieval concentrates on query-translation and document-translation methods. These depend heavily on statistical machine translation and therefore face problems such as training corpora that are hard to obtain and low translation accuracy. Meanwhile, learning-to-rank research for information retrieval concentrates on monolingual document ranking, and cross-language learning to rank has received little attention. This thesis proposes a cross-language document learning-to-rank model based on bilingual document similarity: a ranking function is trained with machine learning, and bilingual document similarity is incorporated to rank cross-language documents. In building the model, the thesis addresses two main problems:

1. A method for computing the similarity between bilingual documents. Documents in different languages are difficult to represent in a unified space, so the thesis proposes a similarity method based on bilingual word embeddings. Keywords are first extracted from the bilingual documents and then mapped into a shared semantic space, and the distances between these keywords are used to represent the similarity between the documents. Experimental results show that the proposed method computes the similarity between bilingual documents well.

2. A cross-language document learning-to-rank model based on bilingual document similarity. Pointwise and pairwise learning-to-rank loss functions cannot accurately represent ranking loss, so the thesis builds the model from a listwise loss function, the cross-entropy between probability distributions, and a ranking function based on an artificial neural network. It proposes a method that incorporates bilingual document similarity as a feature to rank cross-language documents uniformly, using that similarity as the basis for scoring documents in the target language. Experimental results show that the proposed model ranks well on both English-Chinese and English-Vietnamese corpora.
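The two ideas in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the toy two-dimensional `embeddings` table, the `doc_similarity` aggregation (average best cosine match per source keyword), and the `listnet_loss` function, which uses a softmax-based top-one probability in the style of ListNet as one common form of listwise cross-entropy. The abstract does not specify the exact distance measure, keyword extractor, or loss formulation, and the neural ranking function is omitted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def doc_similarity(src_keywords, tgt_keywords, embeddings):
    """Hypothetical aggregation: for each source keyword, take its best
    cosine match among the target keywords, then average those scores."""
    src = [embeddings[w] for w in src_keywords if w in embeddings]
    tgt = [embeddings[w] for w in tgt_keywords if w in embeddings]
    if not src or not tgt:
        return 0.0
    return sum(max(cosine(s, t) for t in tgt) for s in src) / len(src)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted_scores, relevance_labels):
    """Listwise loss: cross-entropy between the top-one probability
    distribution of the labels and that of the predicted scores."""
    p_true = softmax(relevance_labels)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


# Toy shared semantic space (assumed values): translation pairs share a vector.
embeddings = {
    "cat": [1.0, 0.0], "猫": [1.0, 0.0],
    "dog": [0.0, 1.0], "狗": [0.0, 1.0],
}

similarity = doc_similarity(["cat", "dog"], ["猫", "狗"], embeddings)
good_loss = listnet_loss([2.0, 1.0, 0.0], [2.0, 1.0, 0.0])
bad_loss = listnet_loss([0.0, 1.0, 2.0], [2.0, 1.0, 0.0])
```

In this sketch the similarity value would serve as one input feature to the ranking function, and the loss is lower when the predicted scores order the list the same way as the relevance labels than when they reverse it.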
[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
