
Research on the Application of Multi-View Learning to Web Spam Detection

Published: 2018-04-27 12:03

  Topic: multi-view learning + web spam detection; Source: Shandong Normal University, master's thesis, 2014


[Abstract]: The Internet has greatly changed how people express themselves and interact with others, and it has become the primary means of information retrieval. Because adding information to HTML pages and other web documents keeps getting easier, users find it harder to tell accurate information from inaccurate information and trustworthy sources from unreliable ones. Building an effective web spam detection method is therefore a major current challenge. Most web spam detection work targets pages that rely on content spamming or link spamming. Existing detection methods usually classify a page using features from a single view, whereas multi-view learning uses both kinds of features at once and so treats the detection problem more comprehensively.

This thesis centers on multi-view learning for web spam detection and studies feature extraction methods, classification methods, and the link structure of web pages. The main results are as follows:

(1) The content-based and link-based features of the web spam dataset are treated as two different views of the detection problem. Canonical correlation analysis (CCA) and several improved variants are applied first to extract features, using the transformation matrices to obtain features along the projection directions with maximal correlation between the two views. The features of the two views are then fused into a single feature by several combination schemes, and the resulting single-view feature is used to train a classifier. Experiments show that treating web spam detection as a two-view classification problem and applying CCA improves classification accuracy.

(2) Because only a small number of labeled pages are available in web spam detection, semi-supervised co-training can be used. Page features are split into a content view and a link view. Before classification, independent component analysis (ICA) extracts the independent components of each view's features; the classification itself is carried out by co-training. Experiments show that this combination of feature extraction and semi-supervised classification improves detection accuracy, and that applying ICA to the two views separately is also more effective.

(3) The SVM classifier is modified using the link structure of web pages. A direct-link matrix and an indirect-link matrix are first used to build a within-class scatter matrix that preserves the link structure; the link structure is then incorporated into the SVM classifier to reformulate the optimization problem. The method has an advantage in exploiting link information. Experiments on web spam datasets show that combining the link structure with the SVM classifier significantly outperforms other related methods, and they also show how classification accuracy varies with the indirect-link step length.

(4) The problem is addressed by carefully accounting for the different structure and statistical properties of the content and link views. The feature extraction methods principal component analysis (PCA) and locality preserving projections (LPP) are reformulated for content and link features respectively, and then combined in the proposed method to extract a consensus pattern from the multi-view embeddings of the multi-view representation. An iterative algorithm yields the embedding of each view together with the transformation matrix from each view to the consensus pattern. A method for computing the consensus pattern of test samples is also provided. Experiments on the WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets show that using the consensus pattern for web spam detection outperforms other related dimensionality reduction methods.
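The co-training idea in result (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the ICA preprocessing step is omitted, a nearest-centroid classifier stands in for whatever base classifier the thesis uses, and the function names (`fit_centroids`, `predict`, `co_train`), the parameters `rounds` and `per_round`, and the toy feature values are all hypothetical. The sketch only shows the loop in which each view's classifier labels the unlabeled pages it is most confident about and adds them to a shared labeled pool.

```python
from statistics import mean


def fit_centroids(rows, labels):
    """Return {label: centroid} for the given feature vectors."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return (label, margin); margin is how much closer the nearest
    centroid is than the runner-up (assumes at least two classes)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, c)), lab)
        for lab, c in centroids.items()
    )
    return dists[0][1], dists[1][0] - dists[0][0]


def co_train(content, link, labels, rounds=5, per_round=1):
    """labels[i] is 0 (normal) or 1 (spam) for labeled pages, None otherwise."""
    labels = list(labels)
    for _ in range(rounds):
        known = [i for i, l in enumerate(labels) if l is not None]
        unlabeled = [i for i, l in enumerate(labels) if l is None]
        if not unlabeled:
            break
        picked = {}
        # Each view trains on the shared labeled pool, then proposes labels
        # for the unlabeled pages it classifies with the largest margin.
        for view in (content, link):
            centroids = fit_centroids(
                [view[i] for i in known], [labels[i] for i in known]
            )
            scored = []
            for i in unlabeled:
                lab, margin = predict(centroids, view[i])
                scored.append((margin, i, lab))
            scored.sort(reverse=True)
            for _margin, i, lab in scored[:per_round]:
                picked.setdefault(i, lab)
        for i, lab in picked.items():
            labels[i] = lab
    return labels


# Toy data (made up): six pages, two content features and one link feature each.
content = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.85, 0.7], [0.15, 0.3]]
link = [[0.9], [0.1], [0.7], [0.3], [0.8], [0.2]]
labels = [1, 0, None, None, None, None]

result = co_train(content, link, labels)
print(result)
```

On this cleanly separable toy data every unlabeled page ends up with a label after a few rounds. On real data the confidence measure and the number of pages added per round would need tuning, since early mislabeled pages propagate into later rounds.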
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092


Article ID: 1810658

Article link: http://sikaile.net/guanlilunwen/ydhl/1810658.html

