Learning on Evolving Data Streams
Published: 2023-05-20 06:25
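In today's digital era, massive volumes of streaming data are continuously and automatically generated in a wide range of real-world applications. Because data streams are unbounded in length and evolve over time, learning algorithms must process them within limited time, so developing efficient data stream learning algorithms has long been a challenge in machine learning. Many data stream learning algorithms that handle concept drift have been proposed over the past decade. However, data stream mining still faces several new problems and challenges. The first is concept evolution (the novel class problem): traditional classifiers usually assume a fixed set of classes, whereas in real scenarios new classes may emerge over time. The second is label scarcity. Traditional data stream mining usually adopts a supervised learning framework, but labeling stream instances costs substantial time and resources, and real-world settings often provide only a few labeled instances. Designing a reliable semi-supervised learning algorithm is therefore another challenge. A third challenge is high dimensionality, which can severely degrade the performance of learning algorithms. To address these problems, this thesis proposes several new data stream learning algorithms. Its main contributions are as follows:
1. For the concept evolution problem, this thesis proposes a new data stream classification algorithm that detects and learns novel classes. The proposed algorithm handles concept drift and concept evolution simultaneously, copes with complex class distributions in the stream, and effectively distinguishes concept drift from concept evolution in noisy data. Experiments on synthetic and real-world data show that, compared with state-of-the-art methods, the proposed method...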
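The abstract separates concept drift from concept evolution, in which previously unseen classes appear in the stream. The sketch below roughly illustrates one common style of novel class detection. It summarizes each known class as a hypersphere (centroid plus radius) and buffers instances that fall outside every hypersphere. Once enough of those outliers have accumulated, they are promoted to a new class, but only if they are more tightly grouped than they are close to any known class. This is a hypothetical Python toy, not the thesis's own method (EMC, Chapter 3). The name `ToyNovelClassDetector`, the `min_outliers` parameter, and the cohesion test are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional

Point = List[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


class ToyNovelClassDetector:
    """Hypothetical illustration only; not the EMC algorithm from the thesis.

    Each known class is a hypersphere (centroid + radius). Instances outside
    every hypersphere are buffered as outliers. When at least `min_outliers`
    have accumulated and they are cohesive, they become a new class.
    """

    def __init__(self, min_outliers: int = 3) -> None:
        # Hypothetical parameter; at least 2 points are needed for a spread.
        self.min_outliers = max(2, min_outliers)
        self.centers: Dict[str, Point] = {}
        self.radii: Dict[str, float] = {}
        self.outliers: List[Point] = []
        self.novel_count = 0

    def learn_class(self, label: str, points: List[Point]) -> None:
        """Summarize a labeled class by its centroid and covering radius."""
        c = centroid(points)
        self.centers[label] = c
        self.radii[label] = max(dist(p, c) for p in points)

    def _is_cohesive(self) -> bool:
        """Outliers must be closer to each other than to any known class."""
        pts = self.outliers
        pairs = [dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]]
        spread = sum(pairs) / len(pairs)  # mean pairwise distance
        if not self.centers:
            return True
        c = centroid(pts)
        separation = min(dist(c, k) for k in self.centers.values())
        return spread < separation

    def process(self, x: Point) -> Optional[str]:
        """Return a label, or None while an outlier is still undecided."""
        inside = [(dist(x, c), lbl) for lbl, c in self.centers.items()
                  if dist(x, c) <= self.radii[lbl]]
        if inside:
            return min(inside)[1]  # nearest class whose boundary covers x
        # Outside all known classes: noise or the start of a new class.
        self.outliers.append(x)
        if len(self.outliers) >= self.min_outliers and self._is_cohesive():
            self.novel_count += 1
            label = f"novel_{self.novel_count}"
            self.learn_class(label, self.outliers)  # concept evolution
            self.outliers = []
            return label
        # A lone stray point (e.g. noise) stays buffered and unlabeled.
        return None
```

For example, with classes `A` and `B` learned around (0, 0) and (10, 0), feeding the stream `[[0.3, 0.3], [10.2, 0.3], [5, 10], [5.2, 10.1], [4.9, 9.9], [5.1, 10.0]]` yields `['A', 'B', None, None, 'novel_1', 'novel_1']`. The first two outliers near (5, 10) stay undecided, and the third triggers promotion. Requiring several cohesive outliers before declaring a class is a simple way to avoid mistaking isolated noise for concept evolution. Real methods use more careful cohesion and drift tests.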
Number of pages: 155
Degree: Doctoral (Ph.D.)
Table of contents:
Abstract (Chinese)
ABSTRACT
Chapter 1 Introduction
1.1 Research Background and Significance
1.1.1 Data Stream Mining
1.1.2 Challenges
1.2 Research Progress (State of the Art) in Data Stream Mining
1.2.1 Clustering Data Streams
1.2.2 Data Stream Classification
1.2.2.1 Stationary Data Stream Classification
1.2.2.2 Evolving Data Stream Classification
1.2.2.3 Data Stream Classification with Novel Class Detection
1.2.2.4 Semi-supervised Data Stream Classification
1.3 Research Scope and Thesis Contributions
1.4 Thesis Organization
Chapter 2 Foundation of Concepts
2.1 Definitions
2.2 Basis of Stream Clustering Algorithms
2.3 Taxonomy of Clustering Algorithms
2.4 Basis of Stream Classification Algorithms
2.4.1 Learning Structure
2.4.2 Adaptivity Mechanisms
2.5 Taxonomy of Classification Algorithms
2.5.1 Approaches Based on Adaptation Process
2.5.1.1 Informed or Active Approaches
2.5.1.2 Blind or Passive Approaches
2.5.2 Approaches Based on Learning Process
2.5.2.1 Single Classifier
2.5.2.2 Ensemble Classifiers
2.6 Evaluation and Performance Criteria
2.6.1 Evaluation Metrics
2.6.2 Estimation Techniques
2.6.2.1 Prequential Evaluation
2.6.2.2 Hold-out Evaluation
2.7 Summary
Chapter 3 Data Stream Classification with Novel Class Detection
3.1 Introduction
3.2 Related Work
3.3 Proposed Algorithm
3.3.1 Problem Formalization
3.3.2 Overview
3.3.3 Main modules of EMC
3.3.3.1 Initial Model Construction
3.3.3.2 New Class Detection
3.3.3.3 Classification
3.3.3.4 Model Update
3.4 Experiments
3.4.1 Data sets
3.4.2 Classification Performance
3.4.2.1 Comparison Methods
3.4.2.2 Prediction Performance Analysis
3.4.2.3 Parameter Sensitivity on Classification Performance
3.4.3 Evaluation of New Class Detection
3.4.3.1 Comparison Methods
3.4.3.2 Evaluation Metrics
3.4.3.3 Performance Analysis
3.4.3.4 Parameter Sensitivity
3.5 Summary
Chapter 4 Online Reliable Semi-supervised Learning on Evolving Data Streams
4.1 Introduction
4.2 Related Work
4.3 Proposed Algorithm
4.3.1 Overview
4.3.2 Main Building Blocks
4.3.2.1 Initializing Learning Model
4.3.2.2 Classification
4.3.2.3 Online Data Maintenance
4.4 Experiments
4.4.1 Data sets
4.4.1.1 Real-world Data sets
4.4.1.2 Synthetic Data sets
4.4.2 Comparison Methods
4.4.2.1 Semi-supervised algorithms
4.4.2.2 Supervised algorithms
4.4.3 Results
4.4.3.1 Comparison with semi-supervised algorithms
4.4.3.2 Comparison with supervised algorithms
4.4.3.3 Parameter Sensitivity Analysis
4.5 Summary
Chapter 5 Learning High-Dimensional Evolving Data Streams with Limited Labels
5.1 Introduction
5.2 Related Work
5.2.1 Semi-supervised data stream algorithms
5.2.2 Synchronization-based data mining
5.2.3 Denoising autoencoder (DAE)-based algorithms
5.3 Proposed Algorithm
5.3.1 Notations and symbols
5.3.2 Overview
5.3.3 Main parts of the proposed algorithm
5.3.3.1 Denoising autoencoders (DAE)
5.3.3.2 Synchronization-based dynamic micro-clusters
5.3.3.3 Model update
5.4 Experiments
5.4.1 Datasets
5.4.2 Comparison algorithms
5.4.3 Analysis of results
5.4.3.1 Performance comparison
5.4.3.2 Parameter sensitivity analysis
5.5 Summary
Chapter 6 Conclusion
6.1 Summary
6.1.1 Classification with novel class identification
6.1.2 Online semi-supervised classification
6.1.3 Learning high-dimensional evolving data streams with limited labels
6.2 Future work
Acknowledgements
References
Research Results Obtained During the Study for Doctoral Degree
Article ID: 3820716
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/3820716.html