Research on Cost-Sensitive Boundary Region Processing Models for Three-Way Decisions
Topic: three-way decisions + boundary region processing; Source: Anhui University, master's thesis, 2017
[Abstract]: Three-way decision theory (3WD) consists of three decision rules: acceptance, rejection, and non-commitment. Compared with traditional two-way decisions, it adds the non-commitment option, also called deferment. This option lets a decision maker withhold judgment when information is insufficient. Yao proposed the theory while studying rough sets and decision-theoretic rough sets. It gives a reasonable semantic interpretation of the three regions of rough set theory:
- assigning an object to the positive region means accepting it;
- assigning it to the negative region means rejecting it;
- assigning it to the boundary region means deferring the decision until further observation is available.
This mode of decision-making closely resembles how people decide when solving practical problems. It has been widely applied in many fields, including medical diagnosis, investment decisions, and spam classification.
The three-way decision model based on decision-theoretic rough sets (DTRS) is the most widely used three-way decision model. It is cost-sensitive to a degree when handling classification problems and computes the thresholds α and β directly from the loss function, but it does not discuss the boundary region further. The three-way decision model based on the constructive covering algorithm (CCA) introduces CCA into three-way decision theory and opens a new research direction for it.
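The DTRS model computes α and β directly from the loss function. A minimal sketch of that computation follows, using Yao's standard Bayesian minimum-risk thresholds. The `LossFunction` container, the function names and the six loss values are illustrative assumptions, not taken from the thesis. An object whose conditional probability Pr(X | [x]) reaches α is accepted, and one at or below β is rejected. Anything in between is deferred to the boundary region (assuming α > β).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossFunction:
    """The six DTRS losses (hypothetical container; values are illustrative).

    l_XY is the loss of taking action X (P = accept, B = defer, N = reject)
    when the object does (Y = P) or does not (Y = N) belong to the target set.
    Usual ordering: l_PP <= l_BP < l_NP and l_NN <= l_BN < l_PN.
    """
    l_PP: float
    l_BP: float
    l_NP: float
    l_PN: float
    l_BN: float
    l_NN: float


def thresholds(lf: LossFunction) -> tuple[float, float]:
    """Return (alpha, beta) from the Bayesian minimum-risk conditions
    R(accept) <= R(defer) and R(reject) <= R(defer)."""
    alpha = (lf.l_PN - lf.l_BN) / ((lf.l_PN - lf.l_BN) + (lf.l_BP - lf.l_PP))
    beta = (lf.l_BN - lf.l_NN) / ((lf.l_BN - lf.l_NN) + (lf.l_NP - lf.l_BP))
    return alpha, beta


def three_way(prob: float, alpha: float, beta: float) -> str:
    """Map Pr(X | [x]) to a region; assumes alpha > beta."""
    if prob >= alpha:
        return "POS"  # accept
    if prob <= beta:
        return "NEG"  # reject
    return "BND"  # defer (non-commitment)


if __name__ == "__main__":
    lf = LossFunction(l_PP=0, l_BP=2, l_NP=6, l_PN=8, l_BN=3, l_NN=0)
    alpha, beta = thresholds(lf)
    print(round(alpha, 4), round(beta, 4))  # 0.7143 0.4286
    print([three_way(p, alpha, beta) for p in (0.9, 0.6, 0.2)])  # ['POS', 'BND', 'NEG']
```

With these losses α = 5/7 and β = 3/7. Raising the cost of wrongly accepting (l_PN) pushes α up and leaves β unchanged, which widens the boundary region.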
The CCA-based model forms the three regions automatically, without any parameters to tune. It also proposes three principles for processing the boundary region, but none of these principles takes cost sensitivity into account during classification. In recent years, data mining and machine learning have continued to develop, and it has become increasingly clear that classification problems are often cost-sensitive. Processing the boundary region of three-way decisions effectively has become an urgent open problem in the field. This thesis therefore proposes two cost-sensitive classification models for the boundary region. Their goal is to process boundary samples while keeping both the classification loss and the number of misclassified high-cost samples as low as possible. The main work is as follows:
1. The thesis first briefly reviews the development of three-way decision theory and analyzes and summarizes its research status and open problems. It then presents in detail the theory behind two classical three-way decision models: the decision-theoretic rough set model and the CCA-based three-way decision model. Finally, it proposes two cost-sensitive models for processing the boundary region, one based on CCA and one based on K nearest neighbors, as new solutions to this problem.
2. The CCA-based cost-sensitive three-way decision boundary region processing model (CPBM) adjusts the boundary distance between a sample and each cover according to the relative magnitudes of the misclassification losses. This lowers the classification loss incurred on boundary samples. By contrast, the nearest-boundary principle of the CCA-based three-way decision model ignores cost sensitivity and simply assigns a sample to the class of the cover whose boundary is closest to it. Compared with this cost-insensitive principle, CPBM effectively improves the recall of high-cost samples in the boundary region, by as much as 20%, and thereby reduces the classification loss.
3. The K-nearest-neighbor-based cost-sensitive three-way decision boundary region processing model (CTK) combines the K-nearest-neighbor idea with a cost-sensitive approach. For each boundary sample it quantifies the loss of every possible decision and picks the decision with the smallest loss. With the optimal K value determined, the model makes full use of the class information of the K nearest covers to improve classification accuracy. As a result, compared with ordinary cost-insensitive methods, CTK effectively reduces the classification loss on the boundary region. It also achieves a relatively low error rate on some data sets.
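The boundary-region rules in items 2 and 3 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis's implementation:
- Covers are modeled as Euclidean balls (center, radius, class label) rather than CCA's projected hyperspheres.
- The "boundary distance" is taken as the distance to the center minus the radius.
- A sample is assumed to be a boundary sample when it lies in no cover or in covers of several classes.
- The CPBM-style loss weighting and the CTK-style probability estimate from the k nearest covers are hypothetical stand-ins for the thesis's exact formulas.
- `Cover`, `region`, `cost_weighted_nearest` and `ctk_decide` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cover:
    """Simplified CCA cover: a Euclidean ball labeled with one class (assumption)."""
    center: tuple[float, ...]
    radius: float
    label: str


def boundary_distance(x: tuple[float, ...], cover: Cover) -> float:
    """Distance from x to the cover's boundary: negative inside, positive outside."""
    return math.dist(x, cover.center) - cover.radius


def region(x: tuple[float, ...], covers: list[Cover]) -> tuple[str, Optional[str]]:
    """Assumed rule: ('POS', label) if x lies only in covers of a single class,
    otherwise ('BND', None): x is in no cover, or in covers of several classes."""
    labels = {c.label for c in covers if boundary_distance(x, c) <= 0}
    if len(labels) == 1:
        return "POS", labels.pop()
    return "BND", None


def cost_weighted_nearest(x: tuple[float, ...], covers: list[Cover],
                          miscost: dict[str, float]) -> str:
    """CPBM-style sketch with a hypothetical weighting. miscost[c] (> 0) is the loss
    of misclassifying a true class-c sample. With equal costs this reduces to the
    cost-insensitive nearest-boundary rule; otherwise covers of cheaper classes are
    penalized, so boundary samples are pulled toward high-cost classes."""
    top = max(miscost.values())

    def adjusted(c: Cover) -> float:
        d = boundary_distance(x, c)
        w = miscost[c.label] / top  # in (0, 1]; 1 for the costliest class
        return d / w if d > 0 else d * w

    return min(covers, key=adjusted).label


def ctk_decide(x: tuple[float, ...], covers: list[Cover],
               loss: dict[tuple[str, str], float], k: int) -> str:
    """CTK-style sketch: estimate P(class | x) from the labels of the k covers with the
    smallest boundary distance, then return the class with minimum expected loss.
    loss[(pred, true)] is the cost of predicting `pred` when the true class is `true`."""
    nearest = sorted(covers, key=lambda c: boundary_distance(x, c))[:k]
    classes = sorted({c.label for c in covers})
    prob = {t: sum(c.label == t for c in nearest) / len(nearest) for t in classes}

    def expected_loss(pred: str) -> float:
        return sum(prob[t] * loss[(pred, t)] for t in classes)

    return min(classes, key=expected_loss)


if __name__ == "__main__":
    covers = [Cover((0.0, 0.0), 1.0, "A"), Cover((0.0, 3.0), 1.0, "A"),
              Cover((3.0, 0.0), 1.0, "B")]
    x = (1.4, 0.0)  # inside no cover, so it falls in the boundary region
    print(region(x, covers))  # ('BND', None)
    print(cost_weighted_nearest(x, covers, {"A": 1.0, "B": 1.0}))  # A
    print(cost_weighted_nearest(x, covers, {"A": 1.0, "B": 4.0}))  # B
    zero_one = {("A", "A"): 0, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 0}
    costly_b = {("A", "A"): 0, ("A", "B"): 5, ("B", "A"): 1, ("B", "B"): 0}
    print(ctk_decide(x, covers, zero_one, k=3))  # A
    print(ctk_decide(x, covers, costly_b, k=3))  # B
```

The sample x = (1.4, 0) is closest to a class-A cover, so the cost-insensitive rule labels it A. Both sketches move it to B once class B becomes expensive to misclassify. The adjusted distance is piecewise (divide outside a cover, multiply inside) so that a cheaper class is penalized on both sides of the cover boundary. The abstract refers to an optimal K obtained by the model; k = 3 here is arbitrary.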
【Degree-granting institution】: Anhui University
【Degree level】: Master's
【Year awarded】: 2017
【Classification numbers】: TP18; O225
【Author】: Wang Gang