Research on Related-Fragment Retrieval Techniques for Rejoining Dunhuang Manuscripts, and System Implementation
Published: 2018-05-26 17:04
Topics: Dunhuang manuscripts + fragment rejoining; Source: Zhejiang University, 2017 master's thesis
[Abstract]: The Dunhuang manuscripts are a body of ancient texts of great research value unearthed at the Mogao Grottoes in Dunhuang. For various reasons, including the little care given to cultural relics at the time of their discovery, the manuscripts are now scattered around the world, which hampers scholarly research. The Dunhuang Manuscripts Database, a national key project begun in 2012, has made it easier for scholars to study the manuscripts online. Because of their great age, the manuscripts include a large number of fragments and incomplete scrolls, many of which can be rejoined. However, the collection is so large that rejoining fragments by hand is slow and laborious. Advances in digitization now make it possible to assist this work with image retrieval, which is both a basic need of Dunhuang manuscript research and an important function of the database project. The core problem in digitally rejoining Dunhuang manuscripts is related-fragment retrieval, that is, retrieving fragments that can be joined together, and this is the main research topic of this thesis.
The main work of this thesis is as follows. First, based on the requirements of fragment rejoining, three main features of a manuscript fragment are identified (material, edge, and glyph), and an image feature model of manuscript fragments built on these three features is proposed. Because the colors in fragment images fall into fairly distinct categories, a method for selecting primary and secondary colors is designed, and, following the idea of the dominant-color histogram, a material feature histogram is designed to represent the material feature. Because rejoining depends mainly on how well the left and right edges of fragment images match, a Canny-based algorithm for extracting the left and right edges of a fragment is proposed, and the sets of left- and right-edge points represent the edge feature. The SURF algorithm and the max-min distance clustering algorithm are studied and combined into a glyph feature extraction algorithm, in which the glyph feature is represented by the set of feature points of each character.
Second, a difference measure is defined for each feature, and these are combined into a difference measure between fragment images, together with a related-fragment retrieval algorithm based on it. The material difference is defined with the Earth Mover's Distance (EMD). A method for bringing image edges to a common baseline is designed, and the edge difference is defined as the Hausdorff distance after alignment to that baseline. A method for building a histogram of glyph direction vectors is designed, and the glyph difference is then defined with EMD. On the basis of these three measures, a difference measure between fragment images is defined, and a related-fragment retrieval algorithm based on it is proposed. The algorithm takes a set of fragment images as input, first clusters all images by their material features, then computes the combined difference between every pair of images within each cluster, filters the results using dynasty information, and finally outputs the matching fragments for each fragment in the set.
Finally, to meet the requirements of the second phase of the Dunhuang Manuscripts Database national key project, the thesis designs the main modules of the phase-two system and implements a series of advanced functions. The related-fragment retrieval algorithm proposed earlier is applied to the database project, delivering related-fragment browsing, a main function of the second phase.
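The abstract gives no formulas for the material feature, so the following is only a minimal sketch under stated assumptions. It simplifies the color histogram to 8-bit gray levels, treats the most frequent bins as the "primary and secondary colors" (the names `material_histogram`, `n_bins` and `n_dominant` are hypothetical), and relies on the fact that for one-dimensional histograms of equal total mass the EMD reduces to the L1 distance between the cumulative histograms. A true color histogram would need a multidimensional EMD solver instead.

```python
from collections import Counter


def material_histogram(gray_pixels, n_bins=16, n_dominant=3):
    """Hypothetical material feature: quantize 8-bit gray levels (0-255) into
    n_bins, keep only the n_dominant most frequent bins (standing in for the
    primary and secondary colors), and renormalize them to sum to 1."""
    if not gray_pixels:
        raise ValueError("fragment image has no pixels")
    counts = Counter(p * n_bins // 256 for p in gray_pixels)
    dominant = dict(counts.most_common(n_dominant))
    total = sum(dominant.values())
    return [dominant.get(b, 0) / total for b in range(n_bins)]


def emd_1d(h1, h2):
    """Earth Mover's Distance between two normalized 1-D histograms whose bins
    are one unit apart: the L1 distance between their cumulative sums."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    cum1 = cum2 = dist = 0.0
    for a, b in zip(h1, h2):
        cum1 += a
        cum2 += b
        dist += abs(cum1 - cum2)
    return dist


# Toy pixel values: mostly paper tone plus some dark ink.
paper_a = [200] * 80 + [30] * 20
paper_b = [180] * 80 + [30] * 20
print(emd_1d(material_histogram(paper_a), material_histogram(paper_b)))
```

Smaller values indicate more similar paper material; in a retrieval pipeline this value would be one input to the combined difference.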
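For the edge difference, the sketch below assumes each edge is a list of (x, y) pixel coordinates, such as the point sets a Canny-based left/right edge extractor might produce. The common baseline used here (topmost point moved to y = 0, mean x moved to x = 0) is a hypothetical stand-in, since the abstract does not describe the thesis's alignment procedure. The Hausdorff distance itself is the standard symmetric definition, computed by brute force.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def align_to_baseline(edge: List[Point]) -> List[Point]:
    """Hypothetical common baseline: translate an edge so that its topmost
    point lies on y = 0 and its mean x-coordinate lies on x = 0."""
    top = min(y for _, y in edge)
    mean_x = sum(x for x, _ in edge) / len(edge)
    return [(x - mean_x, y - top) for x, y in edge]


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def edge_difference(right_edge: List[Point], left_edge: List[Point]) -> float:
    """Difference between the right edge of one fragment and the left edge of
    another, measured after both are moved to the common baseline."""
    return hausdorff(align_to_baseline(right_edge), align_to_baseline(left_edge))


# Two edges with the same shape at different positions in their images.
right_of_a = [(10, 5), (11, 6), (10, 7)]
left_of_b = [(3, 20), (4, 21), (3, 22)]
print(edge_difference(right_of_a, left_of_b))
```

The nested loops cost O(n·m) distance evaluations; for long edges a spatial index such as a k-d tree would make the nearest-point search faster without changing the result.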
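The abstract combines SURF with max-min distance clustering to group feature points into characters. Assuming the SURF keypoint coordinates have already been extracted, the classic max-min distance clustering step might look like the sketch below: the first point seeds a center, the farthest point becomes the second, and further centers are added while some point's distance to its nearest center exceeds `theta` times the distance between the first two centers. The function name and `theta` are hypothetical, and how the thesis tunes the threshold is not stated.

```python
import math


def max_min_clustering(points, theta=0.5):
    """Max-min distance clustering of 2-D points.

    Returns (centers, labels), where labels[i] is the index of the center
    nearest to points[i]."""
    if not points:
        return [], []
    centers = [points[0]]
    # Second center: the point farthest from the first center.
    far = max(points, key=lambda p: math.dist(p, centers[0]))
    base = math.dist(far, centers[0])
    if base == 0:
        return centers, [0] * len(points)  # all points coincide
    centers.append(far)
    while True:
        # Find the point whose distance to its nearest center is largest.
        best_point, best_dist = None, -1.0
        for p in points:
            d = min(math.dist(p, c) for c in centers)
            if d > best_dist:
                best_point, best_dist = p, d
        if best_dist > theta * base:
            centers.append(best_point)
        else:
            break
    labels = [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    return centers, labels


# Keypoints of two hypothetical characters lying side by side.
keypoints = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
print(max_min_clustering(keypoints))
```

Unlike k-means, this procedure does not need the number of clusters in advance, which suits fragments whose character count is unknown.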
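The glyph difference is described as an EMD between histograms of glyph direction vectors. The abstract does not say how those vectors are formed, so the sketch below assumes, hypothetically, that they run from a character's centroid to each of its feature points. Because angles wrap around, it uses the circular form of the 1-D EMD: the sum of absolute deviations of the running histogram difference from its median. This treats the first and last bins as neighbors, which a plain cumulative-sum EMD would not.

```python
import math


def direction_histogram(points, n_bins=8):
    """Hypothetical glyph descriptor: bin the directions of the vectors from a
    character's centroid to each feature point, normalized to sum to 1."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    hist = [0.0] * n_bins
    for x, y in points:
        dx, dy = x - cx, y - cy
        if dx == 0 and dy == 0:
            continue  # a point at the centroid has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def circular_emd(h1, h2):
    """EMD between two normalized circular histograms with unit spacing
    between neighboring bins: sum of |D_i - m|, where D_i is the running
    difference of the histograms and m is a median of the D_i."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    diffs, running = [], 0.0
    for a, b in zip(h1, h2):
        running += a - b
        diffs.append(running)
    m = sorted(diffs)[len(diffs) // 2]
    return sum(abs(d - m) for d in diffs)


# Mass in the first bin vs. mass in the last bin: one step apart on the circle.
print(circular_emd([1, 0, 0, 0], [0, 0, 0, 1]))
```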
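The retrieval algorithm's control flow (cluster by material, compare within each cluster, filter by dynasty, output matches per fragment) can be sketched as follows. The material cluster label is assumed to be precomputed, and the weighted sum in `combined_difference`, its default weights, the exact-match dynasty filter, and `top_k` are all illustrative assumptions; the abstract says only that the three difference measures are combined and that results are filtered by dynasty.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fragment:
    fragment_id: str
    material_cluster: int  # label from clustering the material features
    dynasty: str           # hypothetical metadata field, e.g. "Tang"


def combined_difference(d_material, d_edge, d_glyph, weights=(0.2, 0.5, 0.3)):
    """Hypothetical combined difference: a weighted sum of the three measures."""
    w_material, w_edge, w_glyph = weights
    return w_material * d_material + w_edge * d_edge + w_glyph * d_glyph


def retrieve_matches(fragments, pair_difference, top_k=3):
    """For each fragment, return up to top_k candidates from the same material
    cluster and the same dynasty, ordered by ascending pair_difference."""
    clusters = defaultdict(list)
    for f in fragments:
        clusters[f.material_cluster].append(f)
    matches = {}
    for members in clusters.values():
        for f in members:
            candidates = sorted(
                (pair_difference(f, g), g.fragment_id)
                for g in members
                if g.fragment_id != f.fragment_id and g.dynasty == f.dynasty
            )
            matches[f.fragment_id] = [gid for _, gid in candidates[:top_k]]
    return matches
```

In use, `pair_difference` would compute the material, edge and glyph differences for two fragments and pass them to `combined_difference`. Restricting comparisons to one material cluster is what keeps the pairwise step affordable: the quadratic cost applies per cluster rather than to the whole collection.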
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree awarded]: 2017
[CLC classification]: TP391.41
Article ID: 1938112
Article link: http://sikaile.net/shoufeilunwen/xixikjs/1938112.html