Research on Web Video Information Extraction

Published: 2018-07-17 16:28

[Abstract]: In the information age, the volume of information on the web has grown sharply. General-purpose search engines such as Baidu and Google increasingly feel the pressure that very large databases bring: slow lookups and high hardware requirements. They also face considerable difficulties with search accuracy, unified storage, and unified display. In this environment, vertical search engines that focus on a specific domain have flourished; they lack the breadth of general-purpose search engines but avoid these drawbacks. In recent years, video websites have sprung up in large numbers. Because display styles and video databases differ from site to site, returning the videos a user needs conveniently and accurately is a problem that needs to be solved. In addition, some unscrupulous businesses and users spread videos that distort facts, or pornographic videos, which harms the public, so the relevant regulators need a tool for unified retrieval of web video. Although some search engines now cooperate with certain video websites and achieve unified retrieval by exchanging video-related information, the partners are all relatively large video websites, so retrieval over a wider range requires Web video information extraction. As the intersection of vertical search and unified Web video retrieval, Web video information extraction has attracted attention and will play a larger role. In practice, however, some existing web page classification and page cleaning methods do not fully consider the characteristics of Web video pages, which leads to poor results.
Starting from actual Web video websites, this thesis first analyzes the categories of pages found on video websites and concludes that extracting information from video playing pages gives good results. Then, based on the characteristics of video playing pages, it describes methods for classifying web pages using templates, visual features, feature scripts, and similar information. Finally, for page cleaning, the noise on video playing pages is divided into three categories: background noise, random noise, and residual noise, which can be removed through templates, page structure, and semantic analysis, respectively. Experimental comparison and analysis show that the page classification and page cleaning methods described in this thesis achieve good results in Web video information extraction.
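The abstract names feature scripts as one signal for recognizing video playing pages but gives no detail. The following is only a minimal sketch of that idea, not the thesis's method: it assumes a playing page can be spotted by player-like tags or by script text that mentions a player. The tag set, the hint strings, and the names `PlayerFeatureParser` and `is_video_playing_page` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical markers chosen for illustration; the thesis does not list its features.
PLAYER_TAGS = {"video", "embed", "object"}
PLAYER_HINTS = ("player", "swfobject", "flashvars")


class PlayerFeatureParser(HTMLParser):
    """Counts player-like tags and player-related script hints in one page."""

    def __init__(self):
        super().__init__()
        self.player_tags = 0
        self.script_hints = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in PLAYER_TAGS:
            self.player_tags += 1
        if tag == "script":
            self._in_script = True
            # An external script whose URL mentions a player also counts as a hint.
            src = dict(attrs).get("src") or ""
            if any(hint in src.lower() for hint in PLAYER_HINTS):
                self.script_hints += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only inline script text is inspected, not ordinary page text.
        if self._in_script and any(hint in data.lower() for hint in PLAYER_HINTS):
            self.script_hints += 1


def is_video_playing_page(html: str) -> bool:
    """Assumed rule: any player tag or player script hint marks a playing page."""
    parser = PlayerFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.player_tags > 0 or parser.script_hints > 0
```

A real classifier would presumably combine such script features with the template and visual features the abstract mentions; this sketch covers the script signal only.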
[Degree-granting institution]: Wuhan University of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.3

