
Research on Denoising Techniques for Tibetan Web Pages

Published: 2018-12-07 17:24
[Abstract]: With the rapid development of network information technology and the steady growth of computer applications in Tibetan areas, more and more Tibetan web pages are appearing on the Internet. These pages let us learn more about the cultural life and folk customs of Tibetan compatriots, strengthen exchanges between us, and promote the development of Tibetan areas. However, the useful information on a Tibetan web page is often surrounded by noise such as pop-up ads, redundant images, and irrelevant links. This noise seriously reduces the efficiency of obtaining useful information from Tibetan web pages, and removing it effectively has become an urgent problem in Tibetan information processing.

This thesis surveys a large number of existing web page denoising techniques and the content types of Tibetan web pages, and studies the characteristics of the DOM and its main operating specifications. On this basis, it proposes a Tibetan web page denoising technique that combines the DOM with display attributes. By analyzing how people implicitly behave when reading and browsing web content, the technique derives the feature that page elements divide into blocks according to their display attributes, adopts a display-attribute block model, and demonstrates the model on an example page. A Tibetan web page is parsed into a DOM tree, its content is analyzed using the display attributes and the block model, and the page is then denoised through a sequence of display-block partitioning, merging and deletion of DOM nodes, and DOM tree simplification. Because the key step of the technique is extracting the display attributes of the page's DOM tree nodes, DOM parsing of Tibetan web pages must be implemented first.
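As a rough illustration of the display-block partitioning described above, the following Python sketch (the thesis's own parser was written in Java and is not reproduced here) uses the standard-library `html.parser` to build a minimal DOM tree, deletes subtrees that are not displayed, starts a new block at each block-level element, and merges inline elements into their enclosing block. The tag sets, the `style="display:none"` check, and every name (`DomBuilder`, `display_blocks`, and so on) are hypothetical assumptions standing in for the thesis's display-attribute model, which the abstract does not specify.

```python
from html.parser import HTMLParser

# Assumption: default CSS display categories stand in for the thesis's display
# attributes. These tags render as block-level boxes by default (a subset).
BLOCK_TAGS = {"html", "body", "div", "p", "table", "tr", "td", "th", "ul", "ol",
              "li", "h1", "h2", "h3", "h4", "h5", "h6", "section", "article",
              "header", "footer", "nav", "aside", "form"}
# Subtrees that are never displayed as page content; they are deleted.
HIDDEN_TAGS = {"head", "script", "style", "noscript", "iframe"}
# Elements without an end tag; they are never pushed onto the open-element stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []  # child Nodes and raw text strings


class DomBuilder(HTMLParser):
    """Builds a minimal DOM tree, repairing unclosed tags along the way."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, implicitly closing any
        # unclosed elements nested inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def is_hidden(node):
    style = (node.attrs.get("style") or "").replace(" ", "").lower()
    return node.tag in HIDDEN_TAGS or "display:none" in style


def _collect(node, blocks, current):
    for child in node.children:
        if isinstance(child, str):
            current.append(child)               # text joins the enclosing block
        elif is_hidden(child):
            continue                            # delete non-displayed subtrees
        elif child.tag in BLOCK_TAGS:
            new_block = []                      # block-level element: new display block
            blocks.append(new_block)
            _collect(child, blocks, new_block)
        else:
            _collect(child, blocks, current)    # inline element: merge into parent block


def display_blocks(html):
    """Return the visible text of each non-empty display block, in document order."""
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    top = []                                    # text outside any block element
    blocks = [top]
    _collect(builder.root, blocks, top)
    # Concatenate raw pieces first (Tibetan has no inter-word spaces), then
    # collapse runs of whitespace.
    texts = (" ".join("".join(b).split()) for b in blocks)
    return [t for t in texts if t]


if __name__ == "__main__":
    sample = ('<html><head><title>t</title></head><body>'
              '<div id="main"><p>བོད་ཀྱི་<b>ལོ་རྒྱུས</b>།</p></div>'
              '<div style="display: none">hidden</div><script>var x = 1;</script>'
              '<ul><li><a href="/a">Link A</a></li><li><a href="/b">Link B</a></li></ul>'
              '</body></html>')
    print(display_blocks(sample))
```

For the sample page, the `<head>`, the hidden `<div>`, and the `<script>` are dropped, and three blocks remain: the Tibetan paragraph (with its bold inline run merged in) and the two list items. Keying on tag names keeps the sketch dependency-free; a fuller implementation would presumably need computed CSS styles rather than default display categories.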
Building on a study of many web page parsing techniques, a DOM parser for Tibetan web pages was developed in Java on the Eclipse platform. It parses a Tibetan HTML page into a DOM node tree in which every node retains the complete tag attributes of the HTML document, so the display attributes of any information block on the page can be extracted on demand. The parser also offers simple browser functionality: it can parse a Tibetan web page directly from an entered URL, or from page source downloaded to the local computer. It has strong tag recognition and repair capabilities and is suitable for most Tibetan web pages. In addition, by analyzing the characteristics of information on Tibetan web pages, this thesis proposes identifying noise blocks from the frequency of Tibetan syllable delimiters (tsheg, ་) and the hyperlink ratio, a method that effectively identifies the noise blocks in most Tibetan web pages. Finally, filtering the DOM nodes of the retained useful blocks completes the denoising of the page. Extensive testing shows that the technique removes most of the noise from Tibetan web pages and has good practical value and application prospects.
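The noise-block test described above can be sketched in Python, assuming each display block is available as an HTML fragment. The sketch measures two features per block: tsheg density (occurrences of U+0F0B, the Tibetan intersyllabic mark, divided by the number of visible non-whitespace characters) and link ratio (visible characters inside `<a>` elements divided by all visible characters). The exact formulas, the thresholds `MIN_TSHEG_DENSITY` and `MAX_LINK_RATIO`, and all function names are hypothetical; the abstract names the two features but not how they are computed or combined.

```python
from html.parser import HTMLParser

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG (་)


class BlockStats(HTMLParser):
    """Counts visible non-whitespace characters in one block, and those inside <a>."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.skip_depth = 0
        self.text_chars = 0
        self.link_chars = 0
        self.tsheg_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return                       # script/style text is not displayed
        chars = "".join(data.split())    # drop whitespace before counting
        self.text_chars += len(chars)
        self.tsheg_count += chars.count(TSHEG)
        if self.link_depth:
            self.link_chars += len(chars)


def block_features(block_html):
    """Return (tsheg_density, link_ratio) for one block of HTML."""
    stats = BlockStats()
    stats.feed(block_html)
    stats.close()
    if stats.text_chars == 0:
        return 0.0, 1.0                  # no visible text, e.g. an image-only block
    return (stats.tsheg_count / stats.text_chars,
            stats.link_chars / stats.text_chars)


# Hypothetical thresholds for illustration only; the thesis's values are not given.
MIN_TSHEG_DENSITY = 0.1
MAX_LINK_RATIO = 0.5


def is_noise(block_html, min_density=MIN_TSHEG_DENSITY, max_link_ratio=MAX_LINK_RATIO):
    """A block is noise if it has too little Tibetan text or is mostly links."""
    density, link_ratio = block_features(block_html)
    return density < min_density or link_ratio > max_link_ratio


def filter_blocks(blocks):
    """Keep only the blocks not classified as noise."""
    return [b for b in blocks if not is_noise(b)]
```

A paragraph of Tibetan prose passes both tests, a navigation list fails the link-ratio test, and an English advertisement or an image-only block fails the tsheg-density test, so `filter_blocks` would keep only the prose. Combining the two features with a logical OR is just one simple choice; weighting them or tuning the thresholds per site may be needed in practice.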
【Degree-granting institution】: 西北民族大學 (Northwest Minzu University)
【Degree level】: Master's
【Year awarded】: 2010
【Classification number】: TP393.092

Article ID: 2367555


Link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/2367555.html

