Image Processing Techniques for Scanned Archives
Topic: scanned archives + image processing; Source: Tianjin University, master's thesis, 2016
[Abstract]: Protecting and making use of paper records has always been a core task of archival work, and archivists have long been troubled by the fact that paper archives are easily damaged, slow to consult, and cannot be conveniently used by computer. Scanning paper archives and converting them into digital information is a common and effective remedy today. In recent years, archive digitization in China has spread from provincial and municipal archives to district and county archives. Nationwide, digitization is growing quickly, but because of a large historical backlog and uneven regional development, digitized archives still make up only a limited share of all holdings. As digitization accelerates, problems with scanned archives have drawn increasing attention from archival departments. Owing to equipment limitations and the condition of the preserved documents, scanned images suffer from noise, uneven brightness, and geometric distortion. These problems seriously degrade OCR (Optical Character Recognition) of the digital images and hamper later use of the scanned archives.
This thesis studies how digital image processing techniques can address these problems. Because archival materials are confidential, the thesis uses publicly available scanned images of equivalent quality to stand in for scanned paper archives, and runs its simulations in MATLAB (Matrix Laboratory). It introduces the working principle of OCR software, discusses the file formats of the resulting digital archive files in light of the characteristics of archive management, analyzes the requirements that archives place on file formats, and compares several common formats. It analyzes the characteristics of the noise in scanned archives, in particular salt-and-pepper noise and Gaussian noise, tries image preprocessing algorithms including gray-level histogram processing, mean filtering, and median filtering, and proposes an adaptive algorithm based on median filtering. Image segmentation is used to extract, as a whole, the image regions whose information is consistent so that they can be analyzed. Image binarization with a suitably chosen threshold is used to extract the image information and raise the OCR recognition rate. Edge detection is used to sharpen partially blurred information in the image.
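The abstract does not describe the proposed adaptive algorithm in any detail, so the following is only a rough illustration of the general idea, not the thesis's method. It sketches the textbook adaptive median filter for salt-and-pepper noise, written in plain Python rather than the MATLAB environment the thesis used. The function name `adaptive_median_filter`, the `max_window` parameter, and the list-of-rows image representation are all assumptions made for this example.

```python
def adaptive_median_filter(img, max_window=7):
    """Textbook adaptive median filter (illustrative sketch only).

    img is a grayscale image stored as a list of rows of ints (0-255).
    Returns a new image; the input is left unchanged.
    """
    max_window = max(3, max_window)  # the search always starts at 3x3
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]

    for y in range(height):
        for x in range(width):
            window = 3
            while window <= max_window:
                r = window // 2
                # Gather the neighbourhood, clipped at the image border.
                vals = sorted(
                    img[j][i]
                    for j in range(max(0, y - r), min(height, y + r + 1))
                    for i in range(max(0, x - r), min(width, x + r + 1))
                )
                lo, hi = vals[0], vals[-1]
                med = vals[len(vals) // 2]
                if lo < med < hi:
                    # The median is not an impulse. Keep the pixel unless
                    # it is itself a local extreme (a likely impulse).
                    if not (lo < img[y][x] < hi):
                        out[y][x] = med
                    break
                window += 2  # median looks like noise: try a larger window
            else:
                # Window limit reached: fall back to the last median.
                out[y][x] = med
    return out


# Small demo: a horizontal gradient with one "salt" and one "pepper" pixel.
noisy = [[10, 20, 30, 40, 50] for _ in range(5)]
noisy[2][2] = 255  # salt
noisy[1][3] = 0    # pepper
cleaned = adaptive_median_filter(noisy)
print(cleaned[2][2], cleaned[1][3])
```

Unlike a fixed-size median filter, this variant enlarges the window only where the local median itself looks like an impulse, which helps at higher noise densities. It does treat any pixel equal to the local minimum or maximum as suspect, so some clean pixels on smooth gradients may also be replaced by the median.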
[Degree-granting institution]: Tianjin University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.41
Article ID: 1886683
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1886683.html