Research on Text Recognition Technology in Natural Scenes

Published: 2019-06-25 21:25

[Abstract]: As smart devices have become widespread, the high-level semantic information contained in the scene images they capture has drawn increasing attention from researchers. Traditional optical character recognition (OCR) techniques often perform poorly at extracting and recognizing text in scene images, because scene images have properties that scanned images lack. Recognition is difficult for two reasons. First, scene images have complex backgrounds and are often captured under uncontrolled conditions, which leads to low resolution, uneven illumination, and blur. Second, the characters in scene images vary in font, size, and color. Text recognition in scene images therefore calls for new methods, and this thesis is carried out against that background. The main work is as follows: (1) The technologies and state of the art in natural scene text recognition are studied and analyzed in depth. (2) Based on convolutional neural networks from deep learning, an end-to-end scene text recognition system, MatE2E, is implemented. The system uses convolutional neural networks to learn character features and trains two classifiers: one decides whether a region contains a character, and the other recognizes which character it is. MatE2E has two main modules. The text detection module scans the scene image with the character/non-character classifier and a sliding window to detect candidate text regions, then filters out non-text regions according to the text confidence at different positions in the image. The text recognition module scans each text region image with the character recognition classifier and a sliding window to recognize the text, then uses a dictionary to correct the recognition results. (3) The accuracy of the system is evaluated on the ICDAR2011 dataset, the ICDAR2015 dataset, and a street view dataset. The experimental results show that the proposed system achieves fairly good recognition performance and indicate that MatE2E has some reference value for practical applications. MatE2E still has limitations: it recognizes only English letters and digits, and it needs improvement in recognition speed, detection of slanted text, and other respects.
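The two-stage pipeline described in the abstract (sliding-window scoring with a confidence filter, followed by dictionary correction) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the MatE2E implementation: all function names are hypothetical, `mean_intensity` is a toy stand-in for the trained CNN character/non-character classifier, and the `difflib` similarity lookup stands in for whatever dictionary-correction method the thesis actually uses.

```python
from difflib import get_close_matches
from typing import Callable, List, Sequence, Tuple

Patch = List[Sequence[float]]
Window = Tuple[int, int, float]  # (x, y, confidence)


def sliding_window_scores(
    image: Sequence[Sequence[float]],
    win: int,
    stride: int,
    classifier: Callable[[Patch], float],
) -> List[Window]:
    """Score every win x win patch of a 2-D grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    scores = []
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            patch = [row[x:x + win] for row in image[y:y + win]]
            scores.append((x, y, classifier(patch)))
    return scores


def filter_text_windows(scores: List[Window], threshold: float) -> List[Window]:
    """Keep only windows whose text confidence reaches the threshold."""
    return [s for s in scores if s[2] >= threshold]


def correct_with_lexicon(word: str, lexicon: Sequence[str], cutoff: float = 0.6) -> str:
    """Replace a recognized word with its closest lexicon entry, if any is close enough."""
    matches = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def mean_intensity(patch: Patch) -> float:
    """Toy stand-in for the character/non-character CNN (assumption): mean pixel value."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]))


if __name__ == "__main__":
    # 4 x 8 toy image: dark left half, bright right half.
    image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]
    scores = sliding_window_scores(image, win=4, stride=4, classifier=mean_intensity)
    print(filter_text_windows(scores, threshold=0.5))
    print(correct_with_lexicon("h0tel", ["hotel", "motel", "exit"]))
```

In a real system the classifier would be a trained network and the windows would be scanned at several scales; the fixed window size and single threshold here are simplifications chosen to keep the sketch short.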
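[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.4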


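[Citation]: Huang Shuxiao; Research on Text Recognition Technology in Natural Scenes [D]; Beijing University of Posts and Telecommunications; 2016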


Link to this article: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2506022.html
