Design and Implementation of a Browser Plug-in for Real-Time Simplified/Traditional Chinese Translation of Web Pages
Topic: web pages + simplified/traditional Chinese; Source: Inner Mongolia University, master's thesis, 2014
[Abstract]: Written Chinese has a history of several thousand years. It carries Chinese culture forward and serves as a tool for communication among Chinese speakers and between them and the rest of the world. For historical reasons, Hong Kong, Taiwan, and mainland China still use different Chinese writing systems, which creates a major obstacle to the exchange of Chinese-language information among the three regions. As network technology has developed, the browser has become essential software for going online and a platform for information exchange.
This thesis implements simplified/traditional Chinese conversion of web pages as a browser plug-in. Because Internet Explorer holds a high market share, the plug-in targets IE10, the most popular version at the time of writing. The common Chinese encodings for web pages are GB2312, GBK, UTF-8, and BIG-5. Mainland China mainly uses GB2312, GBK, and UTF-8, while Taiwan, Hong Kong, and some overseas Chinese communities mostly use BIG-5. The GB2312 character set contains only simplified characters, GBK and UTF-8 can represent both simplified and traditional characters, and the BIG-5 character set contains only traditional characters. Based on a study of these encodings, the thesis divides simplified/traditional conversion of web pages into two categories: conversion within a single encoding and conversion between different encodings.
Different characters have different codes, so converting between the codes of two characters requires a bidirectional index between them for translation. Building that index relies on existing encoding-conversion and simplified/traditional conversion tools for lookup and batch conversion. For efficiency, the simplified codes and the traditional codes are stored separately, and an efficient hash algorithm is used for lookup and replacement. Once the plug-in is registered and the user has chosen which form of Chinese a page should be displayed in, the system automatically fetches the page's document content, identifies the page encoding, selects the appropriate conversion scheme, performs the conversion, and returns the translated page.
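The pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation (which is an IE plug-in). The sample tables `S2T` and `T2S`, the helper names, the use of the page's declared `charset` as the encoding-detection step, and the choice of GB2312 as the output encoding for BIG-5 pages are all hypothetical.

```python
import re

# Hypothetical sample tables; a real table would hold thousands of pairs.
# The two directions are stored separately, and Python dicts are hash tables,
# so each character lookup is a hash lookup.
S2T = {"网": "網", "页": "頁", "简": "簡", "体": "體", "转": "轉", "换": "換"}
T2S = {trad: simp for simp, trad in S2T.items()}

# Matches a declared charset; re.A keeps \w from consuming CJK characters.
CHARSET_STR = re.compile(r'(charset=["\']?)[\w-]+', re.I | re.A)
CHARSET_BYTES = re.compile(rb'charset=["\']?([\w-]+)', re.I)


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Assumption: trust the charset declared in the page itself."""
    match = CHARSET_BYTES.search(raw)
    return match.group(1).decode("ascii").lower() if match else default


def output_encoding(source: str, target: str) -> str:
    """Pick the conversion scheme: same encoding or cross-encoding."""
    if source == "gb2312" and target == "traditional":
        return "big5"      # GB2312 cannot represent traditional characters
    if source == "big5" and target == "simplified":
        return "gb2312"    # BIG-5 cannot represent simplified characters
    return source          # GBK and UTF-8 can hold both forms


def convert_page(raw: bytes, target: str) -> bytes:
    """Convert a page to 'simplified' or 'traditional' Chinese."""
    source = declared_encoding(raw)
    text = raw.decode(source)
    table = S2T if target == "traditional" else T2S
    # Replace each character found in the table; leave all others unchanged.
    converted = "".join(table.get(ch, ch) for ch in text)
    out_enc = output_encoding(source, target)
    # Keep the declared charset consistent with the bytes that are returned.
    converted = CHARSET_STR.sub(r"\g<1>" + out_enc, converted, count=1)
    return converted.encode(out_enc)


if __name__ == "__main__":
    page = '<meta charset="gb2312">网页简体转换'.encode("gb2312")
    print(convert_page(page, "traditional").decode("big5"))
```

The sketch maps characters one to one. Real simplified-to-traditional conversion is one-to-many in places (for example, 发 corresponds to both 發 and 髮), so a production table would need word-level or context-aware handling.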
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification numbers]: TP393.092; TP391.1