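Research on Target Recognition and Localization Methods for Mobile Robots Based on Visual and Auditory Fusion
Topic: mobile robots. Focus: sound source localization. Source: Nanjing University of Science and Technology, master's thesis, 2017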
[Abstract]: With advances in information technology, robots play an increasingly prominent role in home service, and research on service robots with perception and decision-making capabilities is of considerable importance. Because people play a central role in the robot's living environment, this thesis takes people as the target, analyzes their auditory and visual characteristics, and studies target recognition and localization for mobile robots based on audio-visual fusion. The main work is as follows. First, the state of research on target recognition and localization methods for mobile robots is reviewed, a system platform for audio-visual fusion on a mobile robot is designed, and the robot's interactive control and application scenarios are analyzed. Second, target localization for the mobile robot is studied. Sound source localization methods are examined, and generalized eigenvalue decomposition is used within the multiple signal classification (MUSIC) algorithm to suppress the influence of noise. The mobile robot combines the azimuth obtained from sound source localization with the distance it has traveled and obtains the sound source distance by triangulation. Experiments verify the accuracy and robustness of the method. Third, speaker recognition and face recognition are studied. Before face recognition, face detection locates the face region; the face image is then divided into blocks, features are extracted by combining wavelet decomposition with singular value decomposition, and recognition is performed using sparse representation. For speaker recognition, the speech is preprocessed, Mel-frequency cepstral coefficients (MFCCs) are extracted through homomorphic processing and cepstral analysis, vector quantization is applied, and a codebook model is built for the speaker by LBG clustering. Experiments show that speaker recognition and face recognition are effective. Furthermore, recognition based on audio-visual information fusion is studied, with speech and face information fused at the matching level and at the decision level. At the matching level, a weighted fusion based on speech-first matching and a weighted fusion based on face-first matching are proposed; experimental comparison with unweighted fusion shows that weighted fusion achieves a higher recognition rate. At the decision level, a fuzzy integral applies nonlinear weighting to the outputs of speaker recognition and face recognition, and experiments demonstrate its effectiveness for audio-visual fusion. Finally, building on target localization and recognition, a target localization and recognition system platform for the mobile robot is built.
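The triangulation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's exact formulation, so the following assumes a simple planar case: the robot measures the source bearing at one position, drives a known distance in a straight line, and measures the bearing again, with both bearings taken relative to the direction of travel. The function name `triangulate_source` and its parameters are hypothetical.

```python
import math


def triangulate_source(bearing_a, bearing_b, baseline):
    """Estimate source range from two bearings and the distance traveled.

    Assumed geometry (not taken from the thesis): the robot hears the
    source at angle bearing_a (radians, relative to its direction of
    travel) at point A, drives `baseline` meters straight ahead to point
    B, and hears it at angle bearing_b. The law of sines on triangle
    A-B-source then gives the range from each point.
    """
    parallax = bearing_b - bearing_a  # angle subtended at the source
    if abs(math.sin(parallax)) < 1e-9:
        # Equal bearings mean the two lines of sight never intersect.
        raise ValueError("bearings are parallel; range is unobservable")
    range_from_a = baseline * math.sin(bearing_b) / math.sin(parallax)
    range_from_b = baseline * math.sin(bearing_a) / math.sin(parallax)
    return range_from_a, range_from_b


if __name__ == "__main__":
    # Hypothetical scene: source at (3, 4) m, robot moves from (0, 0) to (2, 0).
    first = math.atan2(4.0, 3.0)
    second = math.atan2(4.0, 1.0)
    r_a, r_b = triangulate_source(first, second, 2.0)
    print(f"{r_a:.3f} {r_b:.3f}")  # prints "5.000 4.123"
```

The estimate becomes ill-conditioned as the two bearings approach each other, which is why the sketch rejects near-parallel lines of sight instead of returning a very large range.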
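As a rough illustration of decision-level fusion with a fuzzy integral, the sketch below combines per-person confidence scores from a speaker recognizer and a face recognizer. The abstract does not say which fuzzy integral or fuzzy measure the thesis uses; this example assumes a Choquet integral over two sources, and the density values, score dictionaries, and function names are all hypothetical.

```python
def choquet_two_sources(score_voice, score_face, g_voice, g_face):
    """Choquet integral of two scores in [0, 1].

    g_voice and g_face are assumed fuzzy densities (the trust placed in
    each recognizer alone); the measure of both sources together is 1.
    The result depends on which score is larger, which is what makes the
    weighting nonlinear.
    """
    if score_voice >= score_face:
        high, low, g_high = score_voice, score_face, g_voice
    else:
        high, low, g_high = score_face, score_voice, g_face
    # Both sources support at least `low`; only the stronger source
    # supports the extra (high - low), discounted by its density.
    return low + (high - low) * g_high


def fuse_decisions(voice_scores, face_scores, g_voice, g_face):
    """Fuse per-identity scores and return (best_identity, fused_scores)."""
    fused = {
        name: choquet_two_sources(voice_scores[name], face_scores[name],
                                  g_voice, g_face)
        for name in voice_scores
    }
    best = max(fused, key=fused.get)
    return best, fused


if __name__ == "__main__":
    # Hypothetical recognizer outputs for two enrolled people.
    voice = {"alice": 0.9, "bob": 0.4}
    face = {"alice": 0.5, "bob": 0.7}
    best, fused = fuse_decisions(voice, face, g_voice=0.6, g_face=0.7)
    print(best, fused)
```

With only two sources the integral needs just the two densities; with more recognizers a full fuzzy measure over all subsets would have to be defined, which this sketch does not attempt.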
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.41; TP242
Article ID: 1695572
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1695572.html