Research on Scene Text Localization and Multi-Oriented Character Recognition Based on Convolutional Neural Networks
Topic: text localization + character recognition; Source: doctoral dissertation, Huazhong University of Science and Technology, 2016
[Abstract]: With the rapid development of intelligent transportation, navigation aids for the blind, and intelligent logistics, localizing and recognizing text in scene images (road signs, billboards, license plates, books, product packaging, and the like) has become a hot research topic in computer vision. Scene text images not only suffer from low resolution, uneven illumination, defocus blur, and affine distortion, but also contain complex, highly variable background textures such as trees, brick walls, and railings, while the text itself varies in color, font, size, orientation, and layout. Applying existing optical character recognition (OCR) technology directly therefore yields low accuracy and adapts poorly to changing application environments. Fast, accurate, and robust localization and recognition of text in scene images thus remains a challenging research problem.

Extensive observation shows that although background texture interference in scene text images is complex and variable, the texture of character-stroke regions is comparatively invariant. Exploiting this invariance, this dissertation uses convolutional neural networks (CNNs) to propose a texture feature extraction method for character-stroke regions, and combines it with the geometric features of character strokes and with scene-context features of character regions to suppress background texture interference, improving the accuracy and adaptability of scene text localization. In addition, to make character recognition robust to changes in text orientation, we propose a method that extracts texture features at uniformly sampled points of a character together with corresponding structural features, and classifies characters with a bag-of-features model and a support vector machine (SVM). The dissertation accordingly studies scene text localization and scene text recognition, with the following results.

First, because a CNN's layered architecture learns rich high-level semantic information and can effectively extract features of target regions against complex background textures, we extract texture features of candidate characters with a CNN and design a connected-component SVM classifier over joint geometric and texture features to suppress non-character connected components. Furthermore, to localize multi-oriented text regions precisely, skew-corrected candidate text regions are filtered with a geometric similarity measure and an SVM classifier based on gradient statistics, removing background interference and achieving precise localization. The proposed method adapts well to changes in the position, angle, scale, and gray level of scene text, effectively suppresses complex background texture interference, and improves the precision and adaptability of scene text localization.

Second, building on a scene segmentation model, we propose a scene text localization method that combines scene context with a CNN. When classifying character versus background regions, most methods consider only character-level features such as edge density, stroke width, or gradient distribution, so character-like backgrounds are easily misclassified. To address this, we propose using scene-context information from the area surrounding each candidate character to assist localization. First, TextonBoost and a fully connected conditional random field estimate, for every pixel, the probability of belonging to each of 14 object classes (trees, road signs, walls, sky, etc.); in parallel, maximally stable extremal regions (MSERs) are extracted from the scene image and expanded into rectangular blocks. The average of the probability vectors over all pixels in a block then serves as that region's scene-context feature, which is combined with a CNN and an SVM classifier to separate characters from non-characters. Finally, character regions are grouped into text regions using scene-context features together with geometric and color information. The method effectively suppresses complex background texture in scene areas where text is unlikely to appear, improving localization accuracy.

Finally, to recognize text in arbitrary orientations, we propose a rotation-robust character representation model that combines regional texture features with structural features. Existing scene text recognition techniques mostly handle only horizontal text and lack a general character representation model. We therefore design character features from the relative orientations and relative positions of character structures: taking each of the uniformly sampled points on a normalized character image as a target, orientation-removed gradient statistics with respect to the other points yield its texture feature, while the corresponding spatial-coordinate relations are recorded as structural features; both are aggregated with a bag-of-features model and classified with an SVM. Because the extracted features are rotation-invariant, the model adapts to changes in text orientation. Experiments on a standard character dataset and an arbitrary-orientation character dataset show that the proposed method achieves high recognition accuracy.
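To make the first contribution concrete, the following is a minimal sketch, not the dissertation's actual implementation, of classifying connected components with joint geometric and CNN texture features and an SVM. The network architecture, the two geometric descriptors, and all names (`StrokeTextureCNN`, `joint_feature`, etc.) are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC

class StrokeTextureCNN(nn.Module):
    """Tiny CNN mapping a 32x32 grayscale component patch to a texture
    embedding (an illustrative stand-in for the dissertation's network)."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.fc = nn.Linear(32 * 8 * 8, embed_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

def geometric_features(mask):
    """Two simple geometric cues for a component's binary mask: aspect ratio
    and bounding-box fill ratio (hypothetical stand-ins for the paper's
    geometric features)."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return np.array([w / h, mask.sum() / (h * w)], dtype=np.float32)

def joint_feature(patch32, mask, cnn):
    """Concatenate geometric and CNN texture features for one component."""
    x = torch.from_numpy(patch32[None, None].astype(np.float32))
    with torch.no_grad():
        texture = cnn(x).numpy().ravel()
    return np.concatenate([geometric_features(mask), texture])

# With labeled components (patches, masks, labels) prepared elsewhere, an SVM
# then separates character from non-character connected components:
#   cnn = StrokeTextureCNN().eval()
#   feats = np.stack([joint_feature(p, m, cnn) for p, m in zip(patches, masks)])
#   clf = SVC(kernel="rbf").fit(feats, labels)
```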
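The scene-context feature of the second contribution reduces to averaging per-pixel class probabilities over a rectangular block. Below is a sketch under stated assumptions: OpenCV's MSER stands in for the paper's region extraction, the `expand` margin is an assumed parameter, and the TextonBoost + fully-connected-CRF stage that would produce `prob_map` is not reproduced.

```python
import cv2
import numpy as np

def candidate_blocks(gray, expand=4):
    """Extract maximally stable extremal regions (MSERs) and expand each
    bounding box into a rectangular block, approximating the paper's
    candidate character regions."""
    mser = cv2.MSER_create()
    _, bboxes = mser.detectRegions(gray)
    H, W = gray.shape
    blocks = []
    for (x, y, w, h) in bboxes:
        x0, y0 = max(x - expand, 0), max(y - expand, 0)
        x1, y1 = min(x + w + expand, W), min(y + h + expand, H)
        blocks.append((x0, y0, x1 - x0, y1 - y0))
    return blocks

def scene_context_feature(prob_map, block):
    """Scene-context descriptor: the mean of the per-pixel class-probability
    vectors over the block. `prob_map` is an (H, W, 14) array assumed to come
    from TextonBoost + a fully connected CRF."""
    x, y, w, h = block
    region = prob_map[y:y + h, x:x + w].reshape(-1, prob_map.shape[2])
    return region.mean(axis=0)

# The resulting 14-dim context vector is concatenated with the block's CNN
# texture feature and fed to an SVM for character vs. non-character decisions.
```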
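For the third contribution, the sketch below illustrates one plausible rotation-insensitive character representation in the spirit of the abstract: orientation-free gradient statistics at sampling points (texture), pairwise inter-point distances (structure), a bag-of-features codebook, and an SVM. The concrete descriptors, the stroke-pixel sampling scheme, and all parameters are assumptions and differ from the dissertation's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sample_points(img, n=24, thresh=128):
    """Uniformly subsample n points from the character's stroke pixels
    (dark strokes on a light background are assumed; requires at least
    one stroke pixel)."""
    ys, xs = np.nonzero(img < thresh)
    idx = np.linspace(0, len(ys) - 1, num=n).astype(int)
    return list(zip(ys[idx], xs[idx]))

def texture_descriptors(img, pts, win=4, bins=8):
    """Orientation-free texture statistic per sampling point: a histogram of
    gradient magnitudes in a local window. Discarding gradient direction
    makes the statistic insensitive to rotating the character."""
    gy, gx = np.gradient(img.astype(np.float32))
    mag = np.hypot(gx, gy)
    top = mag.max() + 1e-6
    descs = []
    for r, c in pts:
        w = mag[max(r - win, 0):r + win + 1, max(c - win, 0):c + win + 1]
        hist, _ = np.histogram(w, bins=bins, range=(0.0, top))
        descs.append(hist / (hist.sum() + 1e-6))
    return np.asarray(descs)

def structure_descriptors(pts):
    """Structural feature per point: sorted distances to the other sampling
    points, unchanged when the whole character is rotated."""
    P = np.asarray(pts, dtype=np.float32)
    d = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    return np.sort(d, axis=1)[:, 1:]  # drop the zero self-distance

def bof_histogram(descs, codebook):
    """Bag-of-features: quantize per-point descriptors against a learned
    codebook and count codeword frequencies."""
    return np.bincount(codebook.predict(descs), minlength=codebook.n_clusters)

# Training outline: for each normalized character image, build per-point
# descriptors D = np.hstack([texture_descriptors(im, pts),
# structure_descriptors(pts)]); fit KMeans(n_clusters=64) on all stacked D;
# represent each character by bof_histogram(D, codebook); train
# SVC(kernel="rbf") on those histograms.
```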
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year conferred]: 2016
[CLC number]: TP391.41; TP183
[Similar Literature]
Related journal articles (top 10):
1 Huang Zhihu. Research on text localization techniques for images [J]. Computer CD Software and Applications, 2013(01).
2 Xie Fengying, Jiang Zhiguo, Wang Lei. Skew detection of complex text images based on blank-strip direction fitting [J]. Journal of Computer Applications, 2006(07).
3 Hou Yueyun, Liu Lizhu. Language identification techniques for text images [J]. Journal of Computer Applications, 2006(S1).
4 Lu Xiaochuan, Yi Bingzhe, Ping Xijian, Cheng Juan. Chinese/English script identification in noisy text images [J]. Computer Engineering and Design, 2007(21).
5 He Zhiming. Rectification of projectively distorted text images [J]. Electrical Automation, 2008(01).
6 Liu Renjin, Gao Yuanbiao, Hao Xianggen. Research on page segmentation algorithms for text images [J]. Journal of University of Science and Technology of China, 2010(05).
7 Li Xiaokun. Text image compression based on stroke recognition [J]. Microcomputer and Its Applications, 1998(09).
8 Zeng Fanfeng, Fu Yanan. Text image rectification based on character stroke structure [J]. Wireless Internet Technology, 2014(02).
9 Tong Li, Ping Xijian. Information-measure-based image features and text image classification [J]. Computer Engineering, 2004(17).
10 He Zhiming. Rectification of perspective-distorted text images captured by digital cameras [J]. Journal of Shanghai University of Engineering Science, 2007(03).
Related conference papers (top 1):
1 Li Lanlan, Wu Lenan. An enhancement algorithm for noisy text images [A]. Proceedings of the 16th National Conference on Computer Science and Technology Applications (CACIS) [C]. 2004.
Related newspaper articles (top 1):
1 Understanding automatic OCR technology [N]. Computer World, 2000.
Related doctoral dissertations (top 10):
1 Zhu Anna. Research on scene text localization and multi-oriented character recognition based on convolutional neural networks [D]. Huazhong University of Science and Technology, 2016.
2 Zhang Dongping. Video text extraction [D]. Zhejiang University, 2006.
3 Dai Zuxu. Research on information hiding in text carriers [D]. Huazhong University of Science and Technology, 2007.
4 Xu Jianfeng. Research on text segmentation in digital video [D]. South China University of Technology, 2005.
5 Tan Lina. Research on robust authentication techniques for text images [D]. Hunan University, 2012.
6 Wang Zhen. Research on text extraction methods for digital video [D]. Ocean University of China, 2011.
7 Huang Xiaodong. Research on video text acquisition based on feature fusion [D]. Beijing University of Posts and Telecommunications, 2010.
8 Zhang Xin. Theory and methods for text information extraction from natural scene images [D]. Tsinghua University, 2014.
9 Sun Yufei. Research on OCR techniques for low-quality text images [D]. Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology), 2005.
10 Liu Li. Research on near-duplicate text image matching [D]. East China Normal University, 2014.
Related master's theses (top 10):
1 Xiao Yuan. Research on text image restoration methods [D]. Kunming University of Science and Technology, 2015.
2 Li Xiaoxin. Implementation of text localization and recognition in scene images on an embedded platform [D]. Inner Mongolia University, 2015.
3 Yuan Junmiao. A geometry-constrained stroke width transform (SWT) algorithm and its application to caption text localization [D]. University of Electronic Science and Technology of China, 2015.
4 Teng Yuan. Research on digital watermarking for binary text images [D]. Jilin University, 2015.
5 Zhang Chang. Localization and segmentation of touching characters in offline handwritten Uyghur text images [D]. Xinjiang University, 2015.
6 Wang Guocheng. Research and implementation of a morphology-based illumination equalization algorithm for text images [D]. University of Electronic Science and Technology of China, 2015.
7 Yin Zhanhui. Research and implementation of text region localization methods for scene images [D]. Xidian University, 2014.
8 Zhang Shenglong. Research on optimizing binarization algorithms for text images [D]. Xiangtan University, 2015.
9 Sun Ting. Research on connected-component-based rectification of warped images with mixed Chinese and English text [D]. North China University of Technology, 2016.
10 Xu Haoran. Research on Harris-corner-based text region detection in web videos [D]. Jilin University, 2016.
Document ID: 1825371
Link: http://sikaile.net/shoufeilunwen/xxkjbs/1825371.html