
Feature Selection Based on Relation Information Entropy

Published: 2018-02-28 12:22

  Keywords: information entropy; fuzzy set; rough set; neighborhood relation; fuzzy relation. Source: Bohai University, master's thesis, 2016. Type: degree thesis.


[Abstract]: In today's era of information explosion, network and information technologies are developing rapidly, and large, disorganized datasets keep appearing in many fields. Processing such big data has become a central problem in data mining. Much of this complex data is uncertain or vague, yet valuable information still has to be extracted from it, and when the volume of information is especially large, suitable data-analysis methods are needed to classify the data and reduce the knowledge it contains. Rough set theory and fuzzy set theory are mathematical tools for handling uncertainty: they can deal with both the uncertainty and the vagueness of data. In recent years they have gained a prominent place in data mining, machine learning, and pattern recognition, have become a research focus for many scholars, and are being extended to further fields with many practical results.

The overall idea of this thesis is to combine Shannon entropy with the basic theory of rough sets. It proposes neighborhood relation information entropy and fuzzy relation information entropy, discusses their properties in detail, and evaluates them experimentally. The specific work is as follows:

1. The neighborhood is one of the most important concepts in classification and learning; it is used to distinguish samples with different decisions. This thesis proposes neighborhood relation entropy to characterize the uncertainty of a neighborhood relation, which reflects the discriminating power of a feature subset. Unlike earlier neighborhood entropies, it is defined by the cardinality of the neighborhood relation itself rather than by the cardinalities of the neighborhood similarity classes. To describe how the uncertainty of the data changes as the feature subset changes, the thesis further proposes neighborhood relation joint entropy, conditional neighborhood relation entropy, and neighborhood relation mutual information. Parameters are introduced into these measures to make them better suited to real-valued data. Based on these uncertainty measures, an attribute significance for feature subsets is defined.
A greedy feature selection algorithm is then designed and evaluated on standard UCI datasets against existing algorithms. The results show that the feature selection algorithm based on neighborhood relation entropy outperforms several classical algorithms.

2. The fuzzy relation is redefined using a distance function; fuzzy relation joint entropy, conditional fuzzy relation entropy, and fuzzy relation mutual information are proposed, and their properties are discussed. The influence of the neighborhood radius and of the attribute subset on fuzzy relation entropy is also examined. On this basis, a feature selection algorithm based on fuzzy relation entropy is designed and validated experimentally. Compared with fuzzy information entropy, the proposed algorithm reduces the complexity of attribute reduction, improves classification accuracy, and to some extent shortens the reduction time, which gives it practical value.
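The abstract does not state the thesis's formulas, so the Python sketch below rests on assumed definitions that are merely consistent with its description; it is not the author's method. Relation entropy is taken to be H(R) = -log2(|R| / n^2), where |R| is the cardinality of a relation on n samples, computed as the sum of its membership degrees so that one formula covers crisp and fuzzy relations. The joint entropy uses the intersection (elementwise minimum) of two relations, H(D | B) = H(R_B ∩ R_D) - H(R_B), and I(B; D) = H(R_B) + H(R_D) - H(R_B ∩ R_D). The neighborhood radius `delta` stands in for the parameter mentioned above, and the fuzzy relation max(0, 1 - d/delta) built from Euclidean distance is a hypothetical choice. Attribute significance is assumed to be the drop in H(D | B) when a feature is added, and the greedy loop keeps adding the most significant feature until no candidate lowers it. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def relation(X: Sequence[Sequence[float]], B: Sequence[int],
             delta: float, fuzzy: bool = False) -> Matrix:
    """Relation induced by feature subset B (assumed definitions, not the thesis's).

    d_B is the Euclidean distance over the features in B.
    Crisp neighborhood relation: r(x, y) = 1 if d_B(x, y) <= delta else 0.
    Hypothetical fuzzy relation: r(x, y) = max(0, 1 - d_B(x, y) / delta).
    """
    n = len(X)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.sqrt(sum((X[i][a] - X[j][a]) ** 2 for a in B))
            R[i][j] = max(0.0, 1.0 - d / delta) if fuzzy else float(d <= delta)
    return R


def decision_relation(y: Sequence[int]) -> Matrix:
    # Two samples are related iff they share a class label.
    return [[float(yi == yj) for yj in y] for yi in y]


def cardinality(R: Matrix) -> float:
    # |R| = sum of membership degrees (the pair count for a crisp relation).
    return sum(sum(row) for row in R)


def intersect(R: Matrix, S: Matrix) -> Matrix:
    # Relation intersection as the elementwise minimum.
    return [[min(r, s) for r, s in zip(rr, ss)] for rr, ss in zip(R, S)]


def entropy(R: Matrix) -> float:
    # Assumed H(R) = -log2(|R| / n^2); |R| > 0 because r(x, x) = 1.
    n = len(R)
    return -math.log2(cardinality(R) / (n * n))


def joint_entropy(R: Matrix, S: Matrix) -> float:
    return entropy(intersect(R, S))


def conditional_entropy(RD: Matrix, RB: Matrix) -> float:
    # H(D | B) = H(R_B ∩ R_D) - H(R_B); zero when R_B lies inside R_D.
    return joint_entropy(RB, RD) - entropy(RB)


def mutual_information(RB: Matrix, RD: Matrix) -> float:
    # I(B; D) = H(R_B) + H(R_D) - H(R_B ∩ R_D)
    return entropy(RB) + entropy(RD) - joint_entropy(RB, RD)


def greedy_select(X: Sequence[Sequence[float]], y: Sequence[int],
                  delta: float = 0.2, fuzzy: bool = False,
                  eps: float = 1e-9) -> List[int]:
    """Forward selection: add the feature whose inclusion most lowers H(D | B)."""
    RD = decision_relation(y)
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    current = conditional_entropy(RD, relation(X, selected, delta, fuzzy))
    while remaining and current > eps:
        cond: Dict[int, float] = {
            a: conditional_entropy(RD, relation(X, selected + [a], delta, fuzzy))
            for a in remaining
        }
        best = min(cond, key=cond.__getitem__)
        if current - cond[best] <= eps:  # best candidate's significance is ~0
            break
        selected.append(best)
        remaining.remove(best)
        current = cond[best]
    return selected


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.5], [1.0, 0.9]]
    y = [0, 0, 1, 1]
    print(greedy_select(X, y), greedy_select(X, y, fuzzy=True))
```

On the toy data, feature 0 only relates samples of the same class, while feature 1 relates samples across classes, so both the crisp and the fuzzy variants select `[0]`. Rebuilding the relation for every candidate costs O(n^2 |B|) per evaluation, which is acceptable for a sketch but would need caching for real UCI-sized data.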
[Degree-granting institution]: Bohai University
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: O236

Article number: 1547290

Article link: http://sikaile.net/kejilunwen/yysx/1547290.html

