Learning on Evolving Data Streams
Published: 2023-05-20 06:25
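In today's digital era, massive volumes of streaming data are generated continuously and automatically across a wide range of real-world applications. Because data streams are unbounded in length and evolve over time, learning algorithms must process them within limited time, so developing efficient data stream learning algorithms has long been a challenge for machine learning. To this end, many data stream learning algorithms that handle concept drift have been proposed over the past decade. Existing data stream mining nevertheless still faces new problems and challenges. The first is concept evolution (i.e., the novel class problem): traditional classifiers typically focus on a fixed set of classes, whereas in practice new classes may emerge over time. The second is label scarcity: traditional data stream mining usually adopts a supervised learning framework, but labeling stream samples requires substantial time and resources, and real-world settings can often provide only a small number of labeled instances. Designing a reliable semi-supervised learning algorithm is therefore another challenge. A further challenge is high dimensionality, which can severely degrade the performance of learning algorithms. To address these problems, this thesis proposes several new data stream learning algorithms. Its main contributions are as follows: 1. For the concept evolution problem, this thesis proposes a new data stream classification algorithm for detecting and learning novel classes. The proposed algorithm handles concept drift and concept evolution simultaneously, copes with complex class distributions in data streams, and effectively distinguishes concept drift from concept evolution in noisy data. Experiments on synthetic and real-world data show that, compared with state-of-the-art methods, the proposed method...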
[Pages]: 155
[Degree level]: Doctoral
[Table of contents]:
Abstract (in Chinese)
ABSTRACT
Chapter 1 Introduction
1.1 Research Background and Significance
1.1.1 Data Stream Mining
1.1.2 Challenges
1.2 Research Progress (State-of-the-art) in Data Stream Mining
1.2.1 Clustering Data Streams
1.2.2 Data Stream Classification
1.2.2.1 Stationary Data Stream Classification
1.2.2.2 Evolving Data Stream Classification
1.2.2.3 Data Stream Classification with Novel Class Detection
1.2.2.4 Semi-supervised Data Stream Classification
1.3 Research Scope and Thesis Contributions
1.4 Thesis Organization
Chapter 2 Foundation of Concepts
2.1 Definitions
2.2 Basis of Stream Clustering Algorithms
2.3 Taxonomy of Clustering Algorithms
2.4 Basis of Stream Classification Algorithms
2.4.1 Learning Structure
2.4.2 Adaptivity Mechanisms
2.5 Taxonomy of Classification Algorithms
2.5.1 Approaches Based on Adaptation Process
2.5.1.1 Informed or Active Approaches
2.5.1.2 Blind or Passive Approaches
2.5.2 Approaches Based on Learning Process
2.5.2.1 Single Classifier
2.5.2.2 Ensemble Classifiers
2.6 Evaluation and Performance Criteria
2.6.1 Evaluation Metrics
2.6.2 Estimation Techniques
2.6.2.1 Prequential Evaluation
2.6.2.2 Hold-out Evaluation
2.7 Summary
Chapter 3 Data Stream Classification with Novel Class Detection
3.1 Introduction
3.2 Related Work
3.3 Proposed Algorithm
3.3.1 Problem Formalization
3.3.2 Overview
3.3.3 Main Modules of EMC
3.3.3.1 Initial Model Construction
3.3.3.2 New Class Detection
3.3.3.3 Classification
3.3.3.4 Model Update
3.4 Experiment
3.4.1 Data sets
3.4.2 Classification Performance
3.4.2.1 Comparison Methods
3.4.2.2 Prediction Performance Analysis
3.4.2.3 Parameters Sensitivity on Classification Performance
3.4.3 Evaluation of New Class Detection
3.4.3.1 Comparison Methods
3.4.3.2 Evaluation Metrics
3.4.3.3 Performance Analysis
3.4.3.4 Parameters Sensitivity
3.5 Summary
Chapter 4 Online Reliable Semi-supervised Learning on Evolving Data Streams
4.1 Introduction
4.2 Related Work
4.3 Proposed Algorithm
4.3.1 Overview
4.3.2 Main Building Blocks
4.3.2.1 Initializing Learning Model
4.3.2.2 Classification
4.3.2.3 Online Data Maintenance
4.4 Experiments
4.4.1 Data sets
4.4.1.1 Real-world Data sets
4.4.1.2 Synthetic Data sets
4.4.2 Comparison Methods
4.4.2.1 Semi-supervised algorithms
4.4.2.2 Supervised algorithms
4.4.3 Results
4.4.3.1 Comparison with semi-supervised algorithms
4.4.3.2 Comparison with supervised algorithms
4.4.3.3 Parameter Sensitivity Analysis
4.5 Summary
Chapter 5 Learning High Dimensional Evolving Data Streams with Limited Labels
5.1 Introduction
5.2 Related Work
5.2.1 Semi-supervised data stream algorithms
5.2.2 Synchronization-based data mining
5.2.3 Denoising autoencoder (DAE) based algorithms
5.3 Proposed Algorithm
5.3.1 Notations and symbols
5.3.2 Overview
5.3.3 Main parts of the proposed algorithm
5.3.3.1 Denoising autoencoders (DAE)
5.3.3.2 Synchronization-based dynamic micro-clusters
5.3.3.3 Model update
5.4 Experiments
5.4.1 Datasets
5.4.2 Comparison algorithms
5.4.3 Analysis of results
5.4.3.1 Performance comparison
5.4.3.2 Parameter sensitivity analysis
5.5 Summary
Chapter 6 Conclusion
6.1 Summary
6.1.1 Classification with novel class identification
6.1.2 Online semi-supervised classification
6.1.3 Learning high dimensional evolving data stream with limited labels
6.2 Future work
Acknowledgements
References
Research Results Obtained During the Study for Doctoral Degree
Article ID: 3820716
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/3820716.html