
Research on Ontology-Based Supervision Technology for Video Service Websites

Published: 2018-04-24 23:23

  Topics: ontology + automatic ontology construction; Source: University of Science and Technology of China, doctoral dissertation, 2013


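【Abstract】: As network bandwidth grows, the number of Internet users rises, and digital products become widespread, online video content is becoming increasingly rich, the number of users watching online video is rising sharply, and video service websites keep emerging. However, because the Internet is open and anonymous and its resources lack unified management, it brings people convenience but also harbors many harmful video service websites. These websites have a very adverse effect on the healthy development of young people and on social stability. Although the state has stepped up its crackdown, harmful video service websites still exist and are easy to reach with the help of search engines. How to automatically discover and accurately identify harmful video service websites, so that they can be supervised effectively, has therefore become a problem worth studying.

The main difficulties and open problems in supervising video service websites are:
(1) Automatic discovery of video service websites. China alone already has as many as 2.3 million websites, so automatically discovering video service websites in the sea of the Internet is an important problem for supervision.
(2) Automatic construction of a domain ontology that supports website health evaluation. A harmful-video domain ontology provides a machine-understandable semantic description of harmful video, and thus a semantic basis for the subsequent identification of harmful video web pages and for website health evaluation. Traditional methods for automatically constructing domain ontologies mostly rely on natural language processing, and because they are limited by the performance of natural language processing tools, the quality of the resulting ontologies is often low.
(3) Website health analysis and evaluation based on the domain ontology. Once a harmful-video domain ontology is available, the question is how to design a web page relevance measure that exploits both the textual and the structural information in the ontology, so that website health can be computed accurately.

To address these three problems, the main research work and contributions of this thesis are as follows.

1. An automatic discovery method for video service websites is proposed.

The thesis first proposes a meta-search-based method for automatically discovering video service websites. The method includes a keyword updating and evaluation mechanism that supplies the meta-search system with high-quality search keywords, and the meta-search results are passed as the initial site list to a focused crawling module that discovers further video service websites. By analyzing the tag features of web pages and the visual features of candidate players, the thesis proposes a multi-feature, multi-strategy method for identifying video play pages. Once a video play page has been identified, it is stored as a play-page template, and later play pages are identified by their similarity to that template. Based on an analysis of the topical relevance of the pages and links encountered during the search, the thesis proposes an expected-remaining-energy model for URLs that computes the search energy in each direction and thereby determines the search direction and step length of the focused crawler. In experiments, the multi-feature, multi-strategy play-page identification reached a precision of 99.21% and a recall of 99.24%, and the focused crawling algorithm based on the expected-remaining-energy model clearly outperformed the comparison algorithms.

2. An automatic domain ontology construction method based on clustering of the hyperlink structure graph is proposed.

For automatic domain ontology construction, the thesis concentrates on automatically identifying domain concepts and automatically building synonym and near-synonym relations between them. It first proposes a domain concept identification method based on clustering of the hyperlink structure graph. A web crawler performs a depth-limited breadth-first traversal of Wiki pages from a specified entry address and builds an undirected hyperlink structure graph for the chosen domain. A term-document matrix is then built from the resulting page database and vocabulary, the similarity between nodes is computed with latent semantic indexing and cosine similarity, and this similarity is used as the weight of the corresponding edge. A weighted-graph percolation algorithm then clusters the weighted undirected link graph, and the clusters are evaluated to obtain the domain concepts. To build synonym and near-synonym relations between concepts, the thesis constructs a link-term co-occurrence matrix, measures similarity with the cosine measure, and clusters the terms with a bottom-up agglomerative hierarchical clustering algorithm, which yields the synonym and near-synonym relations between terms. Experimental results show that the accuracy of domain concept identification is close to 96% at the top-10 level, and the accuracy of synonym and near-synonym relation identification is close to 90%.

3. A website health evaluation method based on the domain ontology is proposed.

For computing website health, the thesis proposes a method based on the domain ontology. The document representation models used by traditional web page classification and relevance methods usually assume that feature terms are mutually independent, and term weights are mostly based on term frequency, ignoring the position and context of words. Existing ontology-based classification systems use the ontology only to assist the classification process and cannot make effective use of the ontology's own structural and textual information. To address these problems, the thesis proposes a web page health computation method based on matching a web page concept tree against the domain ontology tree. The method first introduces a new web page document representation model that does not depend on the independence assumption, then applies a term weighting algorithm on that model that takes the position and context of words into account, and finally, on top of the new representation, proposes a web page health computation method that makes effective use of the structural and textual information of the domain ontology. Experimental results show that the method identifies harmful web pages with a precision of 96%, a recall of 95.7% and an F1 score of 95.8%, and that the accuracy of video service website health evaluation reaches 95%.

The methods above have been partly applied to the topic-focused search and website content analysis and evaluation components of the National 863 Program project "Semantics-Integrated Automatic Discovery and Analysis and Evaluation Service for Video Websites", and will be applied to the National Science and Technology Support Program project "Research on Enhanced Search System Architecture, Key Technologies and Test Specifications" and to the Ministry of Public Security key research program project "Research on Supervision Technology for Multimedia Service Websites".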
[Abstract]: With growing network bandwidth, a rising number of network users, and the spread of digital products, online video content is increasingly rich, the number of online video viewers is growing rapidly, and video service websites keep emerging. However, because the Internet is open and anonymous and its resources lack unified management, many harmful video service websites have appeared among them. Although enforcement has been stepped up, harmful video service websites still exist and are easy to reach with the help of search engines. It is therefore worth studying how to automatically discover and accurately identify harmful video service websites so that they can be supervised effectively.

At present, the main difficulties and problems in supervising video service websites are: (1) Automatic discovery of video service websites. China alone now has as many as 2.3 million websites, and automatically discovering video service websites in the sea of the Internet has become an important issue for supervision;
(2) Automatic construction of a domain ontology that supports website health evaluation. A harmful-video domain ontology provides a machine-understandable semantic description of harmful video, and thus a semantic foundation for the subsequent identification of harmful video web pages and for website health evaluation. Traditional automatic construction methods mostly rely on natural language processing and are limited by the performance of natural language processing tools, so the quality of the constructed domain ontology is often low;
(3) Website health analysis and evaluation based on the domain ontology. Once a harmful-video domain ontology is available, how to design a web page relevance measure that uses the textual and structural information in the ontology to compute website health accurately becomes a question worth studying. To address these three problems, the main research work and contributions of this thesis are as follows:

1. An automatic discovery method for video service websites is proposed.

This thesis presents a meta-search-based method for automatically discovering video service websites. The method uses a keyword updating and evaluation mechanism to supply the meta-search system with high-quality search keywords, and the meta-search results serve as the initial site list for a focused crawling module that discovers further video service websites.
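The thesis identifies later video play pages by their similarity to a stored play-page template, but the abstract does not say how that similarity is measured. The following is a minimal sketch under one assumption made purely for illustration: similarity is taken as the match ratio between the two pages' sequences of opening HTML tags. The names `tag_sequence`, `template_similarity` and `is_play_page`, and the `threshold` value, are hypothetical and do not come from the thesis.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    return collector.tags


def template_similarity(template_html, page_html):
    """Similarity in [0, 1] between the tag sequences of two pages."""
    matcher = SequenceMatcher(None, tag_sequence(template_html), tag_sequence(page_html))
    return matcher.ratio()


def is_play_page(template_html, page_html, threshold=0.8):
    # threshold is a hypothetical cut-off, not a value reported in the thesis
    return template_similarity(template_html, page_html) >= threshold


if __name__ == "__main__":
    template = "<html><body><div><video></video><ul><li>a</li></ul></div></body></html>"
    candidate = "<html><body><div><video></video><ul><li>b</li></ul></div></body></html>"
    print(is_play_page(template, candidate))
```

Comparing tag structure rather than text means two play pages from the same site can match even when their titles and descriptions differ. It ignores the visual features of the player that the thesis also uses.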

2. An automatic domain ontology construction method based on clustering of the hyperlink structure graph is proposed.

To address automatic domain ontology construction, the thesis focuses on automatically identifying domain concepts and automatically building synonym and near-synonym relations between them. It first proposes a domain concept identification method based on clustering of the hyperlink structure graph. A web crawler performs a depth-limited breadth-first traversal of Wiki pages from a specified entry address to build an undirected hyperlink structure graph for the domain. A term-document matrix is then built from the resulting page database and vocabulary, and node similarities computed with latent semantic indexing and cosine similarity are used as edge weights. A weighted-graph percolation algorithm then clusters the weighted undirected link graph to obtain the domain concepts. The results show that the accuracy of domain concept identification is close to 96% at the top-10 level, and the accuracy of synonym and near-synonym relation identification is close to 90%.
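The graph-building step can be sketched as follows: a depth-limited breadth-first traversal produces an undirected link graph, and each edge is weighted by the cosine similarity of the two pages' term vectors. This is a simplified illustration, not the thesis implementation. The in-memory `links` and `texts` dictionaries are hypothetical stand-ins for crawled Wiki pages, and raw term counts are used where the thesis applies latent semantic indexing before the cosine measure.

```python
import math
from collections import Counter, deque


def crawl_graph(links, start, max_depth):
    """Depth-limited breadth-first traversal over a link map.

    Returns the visited pages and the undirected edges (as frozensets).
    """
    depth = {start: 0}
    edges = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # pages at the depth limit are not expanded
        for target in links.get(page, []):
            if target == page:
                continue  # ignore self-links
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
            edges.add(frozenset((page, target)))
    return set(depth), edges


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weight_edges(edges, texts):
    """Weight each edge by the cosine similarity of its endpoints' texts."""
    vectors = {page: Counter(text.lower().split()) for page, text in texts.items()}
    weights = {}
    for edge in edges:
        a, b = tuple(edge)
        weights[edge] = cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))
    return weights


if __name__ == "__main__":
    links = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"]}
    texts = {"A": "video stream player", "B": "video stream codec",
             "C": "cooking recipe", "D": "codec"}
    nodes, edges = crawl_graph(links, "A", max_depth=2)
    for edge, weight in weight_edges(edges, texts).items():
        print(sorted(edge), round(weight, 3))
```

The weighted edges produced here would be the input to the clustering step. The weighted-graph percolation algorithm itself is not shown.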

3. A website health evaluation method based on the domain ontology is proposed.

This thesis presents a method for calculating the health of a website based on the domain ontology. Web page health is computed by matching a web page concept tree against the domain ontology tree, using a page representation and a term weighting algorithm that take the position and context of words into account.
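As a quick consistency check on the figures reported for harmful web page identification (precision 96%, recall 95.7%, F1 95.8%), the F1 score can be recomputed from the standard definition as the harmonic mean of precision $P$ and recall $R$. The arithmetic below uses only the reported values.

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.96 \times 0.957}{0.96 + 0.957}
    = \frac{1.83744}{1.917}
    \approx 0.9585
\]
```

This agrees with the reported F1 of 95.8% to within rounding.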

The methods above have been partly applied to the topic-focused search and website content analysis and evaluation components of the National 863 Program project "Semantics-Integrated Automatic Discovery and Analysis and Evaluation Service for Video Websites", and will be applied to the National Science and Technology Support Program project "Research on Enhanced Search System Architecture, Key Technologies and Test Specifications" and to the Ministry of Public Security key research program project "Research on Supervision Technology for Multimedia Service Websites".

【Degree-granting institution】: University of Science and Technology of China
【Degree level】: Doctoral
【Year degree granted】: 2013
【Classification number】: TP391.1

