Design and Implementation of an iOS Client SDK for a Voice Cloud Open Platform
Published: 2018-06-12 04:45
Topic: voice cloud + speech recognition; Source: Beijing University of Posts and Telecommunications, master's thesis, 2014
[Abstract]: With smartphones, tablets, and other mobile terminals now ubiquitous and the mobile Internet growing rapidly, mobile applications place increasingly high demands on text input. Navigation and chat applications in particular want to use speech recognition to free users' hands for text entry. As the Siri platform on iOS devices has matured, the major Internet companies have launched their own speech recognition systems. At present, however, iOS does not offer developers a public Siri API for invoking speech recognition, and the major Internet companies place strict restrictions on their client-side speech recognition SDKs, so iOS lacks a general-purpose, open speech recognition SDK that developers can use.
This thesis surveys the open speech recognition SDKs currently available on iOS, compares their features, analyzes what developers need from a speech recognition SDK, and proposes a complete new solution for implementing a client-side speech recognition SDK: the Voice Cloud Open Platform Client SDK, abbreviated as the Voice Cloud SDK. The Voice Cloud SDK lets developers easily build full-featured, highly interactive speech recognition applications on iOS devices. Throughout development and use, developers get speech recognition services without having to maintain a speech engine.
Guided by software engineering principles, the thesis implements the Voice Cloud SDK step by step, following the software development process. First, based on an understanding of the basic workflow of the speech recognition server and on users' habits when using speech recognition, it sets out the requirements for the Voice Cloud Open Platform Client SDK. The requirements analysis mainly lists the functions the Voice Cloud SDK provides to users and the functions needed for the Voice Cloud SDK to interact with the server. After the requirements analysis, the Voice Cloud SDK is designed in detail. The design divides the SDK by function into several main modules: a recording module, a valid sound detection module, an audio compression and encoding module, a network send/receive module, and a recognition result return module, among others. It lists the parameters and methods of each module in detail and uses diagrams to explain the workflow and the interactions among the modules. The code is then implemented according to the design, module by module, in the order in which audio data flows through them. Finally, the whole Voice Cloud SDK undergoes systematic software testing, which further improves its usability and security.
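The abstract names a valid sound detection module but does not say which algorithm it uses. As a rough illustration only, the sketch below shows one common textbook approach, short-time energy thresholding, written in Python rather than in the SDK's own language. The function names, the frame length, and the threshold are all hypothetical choices made for this example and are not taken from the thesis.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech_span(
    samples: List[float],
    frame_len: int = 160,      # assumed: 10 ms at 16 kHz
    threshold: float = 0.01,   # assumed energy threshold
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices covering the first through the
    last frame whose energy exceeds the threshold, or None if no frame does."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


if __name__ == "__main__":
    # Two silent frames, two loud frames, one silent frame.
    audio = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
    print(detect_speech_span(audio))
```

In a pipeline like the one described, a detector of this kind would sit between recording and encoding, so that only the samples inside the returned span are compressed and sent to the server. A production detector would typically add smoothing and adaptive thresholds, which this sketch omits.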
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TN912.34
Article ID: 2008426
Article link: http://sikaile.net/kejilunwen/wltx/2008426.html