Development of Machine Learning Algorithms for Comparing RAMS Standards

Published: 2022-09-30 16:18
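  Language is the tool humans use to communicate. Although everyone is familiar with it, our knowledge and culture directly shape how we communicate with others, so different sentences can carry the same meaning. Natural language processing is the field that studies the interaction between computers and language. Over the past few decades, as information tools have grown in importance, analysing text fragments has become easier and faster, and the field has attracted increasing attention. More precisely, text comparison is a key task in many applications, such as machine translation, information retrieval, and question answering. The main difficulty of this task is ensuring that a computer program can process text fragments or large corpora effectively enough to truly understand the meaning of a sentence. In this research, we focus on applying the paraphrase identification task (deciding whether a pair of sentences are paraphrases of each other) to the comparison of RAMS standards documents. Our approach examines a large number of lexical, syntactic, and semantic features. We study the effect of these features on model performance, in particular when they are combined to ensure a full understanding of the sentences. We then use these features to train two different types of model: a "majority wins" algorithm and several machine learning classifiers (linear and non-linear). We find that feature selection and feature combination are key steps for good performance on the paraphrase identification task. We also conclude that, although the "majority wins" algorithm, which is based on experience and traditional methods, performs well, almost all of the machine learning classifiers outperform it. By tuning the support vector classifier algorithm, we can ...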
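To make the feature-based approach in the abstract more concrete, here is a minimal Python sketch of three of the lexical features named in the table of contents (bag of words, longest common subsequence, word error rate) combined by a simple "majority wins" vote. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the whitespace tokenisation, the Jaccard form of the bag-of-words score, the normalisations, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from typing import List


def tokens(sentence: str) -> List[str]:
    # Assumption: lowercase whitespace tokenisation; the thesis may tokenise differently.
    return sentence.lower().split()


def bag_of_words_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets (one possible bag-of-words score)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def lcs_ratio(a: str, b: str) -> float:
    """Word-level longest common subsequence, normalised by the longer sentence."""
    x, y = tokens(a), tokens(b)
    if not x or not y:
        return 1.0 if x == y else 0.0
    prev = [0] * (len(y) + 1)
    for wx in x:
        cur = [0]
        for j, wy in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if wx == wy else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(x), len(y))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    r, h = tokens(reference), tokens(hypothesis)
    if not r:
        return 0.0 if not h else 1.0
    prev = list(range(len(h) + 1))
    for i, wr in enumerate(r, start=1):
        cur = [i]
        for j, wh in enumerate(h, start=1):
            cost = 0 if wr == wh else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1] / len(r)


def majority_wins(a: str, b: str, threshold: float = 0.5) -> bool:
    """Each feature casts one vote; the pair is a paraphrase if most votes agree."""
    votes = [
        bag_of_words_overlap(a, b) >= threshold,
        lcs_ratio(a, b) >= threshold,
        (1.0 - word_error_rate(a, b)) >= threshold,  # WER is a distance, so invert it
    ]
    return sum(votes) > len(votes) / 2


if __name__ == "__main__":
    s1 = "the system shall be reliable"
    s2 = "the system shall be highly reliable"
    print(majority_wins(s1, s2))
```

In a classifier-based variant, the same feature values would be fed as a numeric vector to a linear or non-linear model instead of being thresholded one by one.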

Pages: 106

Degree level: Master's

Table of contents:
Abstract (in Chinese)
Abstract
Chapter 1 Research Context
    1.1 Background
        1.1.1 The OBOR Project
        1.1.2 RAMS Standards
        1.1.3 China National Institute of Standards(CNIS)
    1.2 Problem Definition
    1.3 Purpose of Study
    1.4 Proposed Solution
Chapter 2 Literature Review
    2.1 Introduction
    2.2 Text comparison using traditional methods
    2.3 Text comparison using machine learning methods
        2.3.1 Introduction to Machine Learning and Deep Learning
        2.3.2 Use of Machine Learning Classifiers
        2.3.3 Use of Deep Learning Neural Networks
Chapter 3 Theoretical Framework
    3.1 Challenges to overcome
        3.1.1 Major concepts for text comparisons
        3.1.2 Major issues faced for text comparisons
    3.2 Global Methodology
    3.3 Determination of lexical features
        3.3.1 Introduction to the role of lexical features
        3.3.2 Bag of Words
        3.3.3 String Matching
        3.3.4 Longest Common Substring
        3.3.5 Longest Common Subsequence
        3.3.6 Word Error Rate
        3.3.7 Position Independent Word Error Rate
    3.4 Determination of syntactic and semantic features
        3.4.1 Syntactic Features
        3.4.2 Semantic Features
    3.5 Different methodologies to pursue text comparison
    3.6 Determination of performances
Chapter 4 Experiments & Results
    4.1 Datasets
        4.1.1 Twitter Paraphrase Corpus
        4.1.2 PPDB: Paraphrase Database
        4.1.3 Microsoft Research Paraphrase Corpus
    4.2 Experiment 1: simple feature comparison
        4.2.1 Bag of Words
        4.2.2 String Matching
        4.2.3 Longest Common Subsequence
        4.2.4 Longest Common Substring
        4.2.5 Word Error Rate
        4.2.6 Position Independent Word Error Rate
        4.2.7 Part of Speech Tagging
        4.2.8 Wu Palmer Similarity
        4.2.9 Conclusion on Simple Feature Comparison
    4.3 Experiment 2: "majority wins" comparison
        4.3.1 Correlation among features
        4.3.2 Analysis of the influence of each lexical feature
        4.3.3 Analysis of the influence of the syntactic and semantic features
        4.3.4 Results for the "Majority Wins" algorithm
        4.3.5 Conclusion
    4.4 Experiment 3: machine learning classification comparison
        4.4.1 First round of experiments
        4.4.2 Feature Selection
        4.4.3 Algorithm Tuning
        4.4.4 Final Results
    4.5 Summary of Findings
Chapter 5 Discussion of Findings
    5.1 Comparison with Baseline results
    5.2 Discussion & Future Work
List of Nomenclatures
References
Acknowledgements
Appendix
    Appendix A: main code implemented
Resume and Academic Achievements

Article ID: 3683874


Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/3683874.html
