Sentence-embedding and Similarity via Hybrid Bidirectional-L
Published: 2022-01-19 14:08
Over the past decade, sentence similarity in natural language processing has attracted considerable attention from researchers in fields such as text understanding and information retrieval. Traditional approaches to similarity systems depended entirely on hand-crafted features. More recently, owing to their success in modeling semantic composition, neural networks have received substantial attention in sentence-similarity measurement systems. However, existing neural approaches are not effective enough at capturing the most important semantic information hidden in a sentence. Moreover, the growing adoption of deep neural networks has shifted interest from the word level to coarser-grained units of text, such as sentence embeddings. To address this, this thesis proposes a novel weighted-pooling attention layer that retains the most salient attention vectors and their ordering patterns while ignoring irrelevant words. Long short-term memory (LSTM) networks and convolutional neural networks (CNNs) have proven strong at accumulating the rich patterns that represent the semantics of a whole sentence, and combining the two models improves the ability to extract comprehensive contextual information. First, sentence representations are generated by a model built on a bidirectional LSTM and a CNN. Next, the weighted-pooling attention layer is applied to obtain attention vectors. Finally, the information in these attention vectors is used to compute a sentence-similarity score. Experiments show that the proposed method outperforms state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft Research paraphrase identification. Experiments were also conducted with different parameter values of the LSTM cell, including the dropout probability, and compared with other existing atten...
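The pipeline the abstract describes (encoder outputs → weighted-pooling attention → sentence vector → similarity score) can be sketched with NumPy. This is a minimal illustration, not the thesis's implementation: the matrices `H1`/`H2` are random stand-ins for the BiLSTM+CNN hidden states, and the attention vector `w` is a hypothetical learned parameter.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def weighted_pooling_attention(H, w):
    """Collapse per-token hidden states H (T x d) into one sentence
    vector: score each token against vector w, softmax the scores,
    and return the attention-weighted sum of the rows of H."""
    weights = softmax(H @ w)   # (T,) attention weights, sum to 1
    return weights @ H         # (d,) pooled sentence embedding

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
T, d = 6, 8                         # toy sequence length / hidden size
H1 = rng.normal(size=(T, d))        # stand-in for BiLSTM+CNN outputs, sentence 1
H2 = rng.normal(size=(T, d))        # stand-in for sentence 2
w = rng.normal(size=d)              # hypothetical learned attention parameter

s1 = weighted_pooling_attention(H1, w)
s2 = weighted_pooling_attention(H2, w)
score = cosine_similarity(s1, s2)   # similarity in [-1, 1]
```

In the thesis's setting the encoder states would come from trained BiLSTM and CNN layers and `w` would be learned end-to-end; the sketch only shows how the attention pooling and scoring steps compose.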
[Source]: Dalian University of Technology (Liaoning; Project 211 and Project 985 institution, directly administered by the Ministry of Education)
[Pages]: 60
[Degree]: Master's
[Table of Contents]:
Abstract (in Chinese)
Abstract
1 Introduction
1.1 Research Background
1.2 Research Motivation and Objective
1.2.1 Research Motivation
1.2.2 Research Objective
1.3 Domestic and Overseas Progress
1.3.1 Domestic Progress
1.3.2 Overseas Progress
1.4 Problem Statement
1.5 Main Content and Research Methods
1.5.1 Main Content
1.5.2 Research Methods
1.6 The Structure of Thesis
2 Theoretical and Model Analysis
2.1 Semantic Similarity
2.1.1 Vector Space Model
2.1.2 Corpus and Knowledge-based Methods
2.2 Machine Learning
2.2.1 Support Vector Machine
2.2.2 Artificial Neural Network (ANN)
2.2.3 Deep Learning
2.3 Word2Vec
2.3.1 Continuous Bag-of-Words
2.3.2 Skip-gram Model
2.3.3 Word Mover’s Distance Model
2.4 Related Work
3 Proposed Framework for Sentence Similarity
3.1 Proposed Model
3.1.1 Input Layer
3.1.2 Embedding Layer
3.1.3 Bidirectional LSTM
3.1.4 Convolutional Neural Network
3.1.5 Weighted-Pooling Attention
3.2 Proposed Algorithm
4 Experiments and Discussion
4.1 Experimental Setup
4.1.1 Datasets
4.1.2 Pre-Trained Embedding
4.1.3 Comparison Systems
4.1.4 Experimental Parameters
4.2 Results and Discussion
5 Conclusion and Future Direction
5.1 Main Contributions
5.2 Conclusion
5.3 Future Direction
References
Research Projects and Publications in Master Study
Acknowledgement
Article ID: 3596992
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/3596992.html