Research on Speech-Driven Virtual Speakers (語音驅(qū)動虛擬說話人研究)
[Abstract]: Speech-driven virtual speaker technology generates facial animation for a virtual human directly from input speech. It not only improves the listener's comprehension of speech but also offers a realistic and friendly mode of human-computer interaction; as the technology matures, it will bring new interaction experiences and enrich daily life. This thesis studies speech-driven virtual speaker animation synthesis with two schemes and analyzes and compares them. The first scheme is speech-driven articulatory motion synthesis based on a deep neural network. The second is speech-driven virtual speaker animation synthesis based on MPEG-4. Both schemes require a suitable corpus from which the audio-visual data used in this research are constructed.

The first scheme: speech production is directly related to the movement of the articulators, such as the positions and motions of the lips, tongue, and soft palate. A deep neural network is used to learn the mapping between speech feature parameters and articulator position information; the system estimates articulator trajectories from the input speech and renders them on a 3D virtual human. First, the optimal network is obtained by comparing experimental results of a traditional Artificial Neural Network (ANN) and a Deep Neural Network (DNN) over a range of parameters. Second, the length of the speech-feature context window and the number of hidden-layer units are varied to determine the optimal context length. Finally, the optimal network structure is selected, and the articulator trajectories it outputs drive articulator motion synthesis, realizing virtual human animation.

The second scheme: the MPEG-4-based speech-driven virtual speaker animation synthesis method is data-driven. First, an audio-visual corpus is constructed from the LIPS2008 database. Then, a Back Propagation (BP) neural network is used to learn the mapping between speech feature parameters and the face model's Facial Animation Parameters (FAP). Finally, the predicted FAP sequence controls the virtual human face model to synthesize talking-head animation.

The animations synthesized by the two schemes are evaluated both subjectively and objectively; the results show that both schemes are effective and that the synthesized animation is natural and lifelike. Comparing the two schemes: the first requires a dedicated lip model, and although its accuracy is high, it is less general and its corpus is harder to obtain. The second conforms to the MPEG-4 standard and drives a virtual human face model with FAP sequences, making it more versatile and easier to deploy widely.
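The first scheme is, at its core, a regression from windowed acoustic features to articulator positions. The thesis does not publish code, so the following is only a minimal PyTorch sketch of that mapping, assuming MFCC input frames, a symmetric context window, and articulograph-style coordinates (lip, tongue, soft palate) as targets; the feature dimensions, window size, layer widths, and the stand-in random data are all illustrative assumptions, not values from the thesis.

```python
# Minimal sketch (not the thesis code): a feed-forward DNN mapping a context
# window of acoustic features to articulator coordinates per frame.
import numpy as np
import torch
import torch.nn as nn

N_MFCC = 13          # acoustic features per frame (assumption)
CONTEXT = 5          # frames on each side -> window of 2*CONTEXT+1 frames
N_ARTIC = 12         # articulator coordinates per frame (assumption)

def stack_context(feats: np.ndarray, context: int = CONTEXT) -> np.ndarray:
    """Concatenate each frame with its +/- context neighbours (edge-padded)."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.stack(
        [padded[t : t + 2 * context + 1].ravel() for t in range(len(feats))]
    )

# Deep feed-forward regressor: windowed MFCCs -> articulator positions.
dnn = nn.Sequential(
    nn.Linear(N_MFCC * (2 * CONTEXT + 1), 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, N_ARTIC),
)

# Toy training loop on random stand-in data; real training data would be
# parallel speech / articulatory recordings from the corpus.
mfcc = np.random.randn(200, N_MFCC).astype(np.float32)
artic = np.random.randn(200, N_ARTIC).astype(np.float32)
x = torch.from_numpy(stack_context(mfcc))
y = torch.from_numpy(artic)

opt = torch.optim.Adam(dnn.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(dnn(x), y)
    loss.backward()
    opt.step()

# At synthesis time the predicted trajectory would drive the 3D lip/tongue model.
trajectory = dnn(x).detach().numpy()
print(trajectory.shape)  # (200, N_ARTIC)
```

Varying CONTEXT and the hidden-layer sizes in such a sketch corresponds to the context-length and hidden-unit experiments described for the first scheme.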
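For the second scheme, the predicted FAP sequence drives an MPEG-4 face model. As a rough illustration of how FAP values, expressed in FAPU units derived from face proportions, displace facial feature points, the sketch below uses a hypothetical table of a few lip-related FAPs; the actual MPEG-4 standard defines 68 FAPs and several FAPU types, so this is a simplified stand-in rather than the thesis implementation.

```python
# Minimal sketch (assumptions only): applying one frame of MPEG-4-style FAP
# values to neutral feature points. Each FAP moves one feature point along
# one axis; the table and FAPU sizes below are illustrative placeholders.
from dataclasses import dataclass
import numpy as np

@dataclass
class FapSpec:
    point: int        # index of the controlled feature point
    axis: int         # 0 = x, 1 = y, 2 = z
    fapu: float       # model units per FAPU for this FAP (assumption)

# Hypothetical subset of lip-related FAPs -> (feature point, axis, FAPU size).
FAP_TABLE = {
    "open_jaw":            FapSpec(point=0, axis=1, fapu=0.001),
    "lower_t_midlip":      FapSpec(point=1, axis=1, fapu=0.001),
    "raise_b_midlip":      FapSpec(point=2, axis=1, fapu=0.001),
    "stretch_l_cornerlip": FapSpec(point=3, axis=0, fapu=0.0005),
    "stretch_r_cornerlip": FapSpec(point=4, axis=0, fapu=0.0005),
}

def apply_faps(neutral_points: np.ndarray, fap_frame: dict) -> np.ndarray:
    """Displace neutral feature points by one frame of FAP values (in FAPU)."""
    points = neutral_points.copy()
    for name, value in fap_frame.items():
        spec = FAP_TABLE[name]
        points[spec.point, spec.axis] += value * spec.fapu
    return points

# Example: one predicted FAP frame deforming five neutral feature points;
# in the full pipeline a BP network would output such a frame per time step.
neutral = np.zeros((5, 3))
frame = {"open_jaw": 120.0, "lower_t_midlip": -30.0, "raise_b_midlip": 45.0}
print(apply_faps(neutral, frame))
```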
[Degree-granting institution]: Southwest Jiaotong University (西南交通大學(xué))
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TN912.3