Research on Methods for Improving the Lexical and Transformation Rules of a Shona Part-of-Speech Tagger
Published: 2023-03-10 19:17
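Natural language processing (NLP) is the computational processing of human language and is a discipline within artificial intelligence. The ultimate goal of NLP research is to parse and understand language, but this goal has not yet been achieved. For this reason, much NLP research concentrates on intermediate tasks, which capture some of the inherent structure of language without requiring full understanding. One of the main intermediate tasks is part-of-speech (POS) tagging, or simply tagging. The lack of a standard POS tagger for Shona is a major obstacle for researchers working on machine translation, spell checking, lexicography, and automatic syntactic analysis and construction. So far there has been no dedicated research on POS tagging for Shona, and tagger performance has not been sufficiently improved. The aim of this thesis is therefore to improve the lexical and transformation rules of Brill's POS tagger for Shona by using a sufficiently large training corpus. To that end, we reviewed the literature on Shona grammar and morphology to understand the nature of the language and to identify a possible tag set. From this reading we identified a set of 26 broad tags, and we extracted 17,473 tagged words from 1,100 sentences containing 6,750 distinct words for training and testing. Of these, 258 sentences come from previous work. Because only a few ready-made standard corpora exist, and building a corpus by manual annotation is an arduous task...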
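The abstract refers to the transformation rules of Brill's tagger without showing what such a rule looks like. The following is a minimal sketch of transformation-based error-driven learning, not the thesis's implementation. It makes several assumptions: a most-frequent-tag baseline stands in for the HMM initial-state tagger named in the table of contents, only one rule template is used ("change tag A to B when the previous tag is C"), and the toy English corpus `TRAIN` and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (English, for illustration only; not thesis data).
TRAIN = [
    [("the", "DT"), ("race", "NN")],
    [("a", "DT"), ("race", "NN")],
    [("to", "TO"), ("race", "VB")],
]

def train_baseline(tagged_sents):
    """Build a most-frequent-tag lexicon plus a fallback tag for unknown words."""
    counts, all_tags = defaultdict(Counter), Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lexicon, all_tags.most_common(1)[0][0]

def apply_rule(tags, rule):
    """Rule (a, b, c): change tag a to b wherever the previous tag is c."""
    a, b, c = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == c:
            out[i] = b
    return out

def learn_rules(tagged_sents, lexicon, default, max_rules=5):
    """Greedily pick the rule that fixes the most errors, apply it, and repeat."""
    gold = [[t for _, t in s] for s in tagged_sents]
    current = [[lexicon.get(w, default) for w, _ in s] for s in tagged_sents]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule for every remaining tagging error.
        candidates = set()
        for cur, g in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != g[i]:
                    candidates.add((cur[i], g[i], cur[i - 1]))
        best, best_gain = None, 0
        for rule in sorted(candidates):
            # Net gain = errors corrected minus new errors introduced.
            gain = 0
            for cur, g in zip(current, gold):
                new = apply_rule(cur, rule)
                gain += sum((n == x) - (c == x) for n, c, x in zip(new, cur, g))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule improves accuracy any further
            break
        rules.append(best)
        current = [apply_rule(cur, best) for cur in current]
    return rules

def tag(words, lexicon, default, rules):
    """Tag with the baseline, then apply the learned rules in order."""
    tags = [lexicon.get(w, default) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))

lexicon, default = train_baseline(TRAIN)
rules = learn_rules(TRAIN, lexicon, default)
print(rules)
print(tag(["to", "race"], lexicon, default, rules))
```

On this toy corpus the baseline tags "race" as a noun everywhere, and the learner recovers a single contextual rule that retags it as a verb after "to". A full Brill tagger would add lexical rules for unknown words and many more contextual templates.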
[Pages]: 66
[Degree level]: Master's
[Table of contents]:
Abstract (Chinese)
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Statement of the Problem
1.3 Objective
1.3.1 Specific Objectives
1.4 Methodology
1.4.1 Data Collection
1.4.2 Modeling
1.4.3 Testing and Validation
1.5 Tools and Techniques
1.6 Application of Results
1.7 Organization of the Paper
Chapter 2 Literature
2.1 Literature Review
2.1.1 Statistical Approach
2.1.2 Hidden Markov Model
2.1.3 Maximum Entropy Model
2.2 Rule-Based Approach
2.2.1 Transformation-Based Approach
2.2.2 Artificial Neural Network Approach
2.2.3 Hybrid Approach
2.3 Related Works
Chapter 3 Tag-set preparation
3.1 Introduction
3.2 The Shona Language Phonetics
3.3 The Shona Language Sentence Structure
3.4 Shona Language Word Classes
3.4.1 Shona Noun (Zita)
3.4.2 Shona Pronoun
3.4.3 Shona Adjective
3.4.4 Afaan Oromo Verb (Xumura)
3.4.5 Shona Adverbs
3.4.6 Shona Conjunction
3.4.7 Shona Preposition
3.4.8 Shona Interjections
3.4.9 Shona Numeral
3.5 Shona Tags and Tag sets
Chapter 4 Design of the POS tagger
4.1 Introduction
4.2 Approaches and techniques
4.3 Designing Transformation-based error-driven learning
4.3.1 Rules
4.3.2 Learning Phase
4.3.3 The Lexical Rule Learner
4.3.4 The Contextual Rule Learner
4.3.5 Brill Tagger Architecture
Chapter 5 Implementation
5.1 Introduction
5.2 Corpus Preparation
5.3 Implementation of the Brill's Tagger
5.3.1 Implementation of the Initial State Tagger (HMM Tagger)
5.3.2 Implementation of the Brill's tagger Learning phase
Chapter 6 Experiment and performance analysis
6.1 Introduction
6.2 Experiments
6.2.1 Brill's Tagger Versus Corpus Size
6.3 Performance Analysis
6.4 Discussion
Chapter 7 Conclusion and Recommendation
7.1 Conclusion
7.2 Recommendations
References
Acknowledgements
Article ID: 3758406
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/3758406.html