Published by Oswin Long. Modified over 6 years ago.
1
Data Preprocessing
2
CSE 591: Data Mining by H. Liu
Data preprocessing
- A necessary step for serious, real-world data mining (DM)
- Often omitted in “academic” DM, but it cannot be overstressed in practical DM
- The need for preprocessing in DM:
  - Data reduction: too much data
  - Data cleaning: noise
  - Data integration and transformation
3
Data reduction
- Data cube aggregation
- Feature selection (dimensionality reduction)
- Sampling: random sampling and others
- Instance selection (search-based)
- Data compression: PCA, wavelet transformation
- Data discretization
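Two of the reduction techniques listed for this slide, random sampling and discretization, can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: it assumes simple random sampling without replacement and equal-width binning as the discretization scheme, and the function names `simple_random_sample` and `equal_width_bins` and the `ages` data are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(data: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Simple random sampling without replacement: draw n distinct records."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return rng.sample(list(data), n)


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Discretize numeric values into k equal-width intervals; return a bin index per value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # every value is identical, so one bin suffices
    width = (hi - lo) / k
    # Clamp with min(..., k - 1) so the maximum value lands in the last bin, not bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical attribute values, used only for illustration.
ages = [13, 15, 16, 19, 20, 21, 25, 30, 33, 35, 40, 45, 46, 52, 70]
print(simple_random_sample(ages, 5))  # 5 of the 15 records, chosen at random
print(equal_width_bins(ages, 3))      # → [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
```

Equal-width bins are the simplest choice but are sensitive to outliers (here the single value 70 stretches the last interval); equal-frequency binning is a common alternative.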
4
Data cleaning
- Missing values
  - ignore the record
  - fill in manually
  - use a global constant, the attribute mean, or the most frequent value
- Noise
  - smoothing (binning)
  - outlier removal
- Inconsistency
  - use domain knowledge and domain constraints
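The missing-value fills and binning-based smoothing on this slide can be sketched as follows. This is a hedged illustration under stated assumptions: missing entries are represented as `None`, smoothing uses equal-frequency bins replaced by their bin means (one of several binning variants), and all function names and data are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Any, Hashable, List, Optional, Sequence


def fill_missing_constant(values: Sequence[Any], constant: Any) -> List[Any]:
    """Replace missing entries (None) with a global constant."""
    return [constant if v is None else v for v in values]


def fill_missing_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def fill_missing_mode(values: Sequence[Optional[Hashable]]) -> List[Hashable]:
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def smooth_by_bin_means(values: Sequence[float], bin_size: int) -> List[float]:
    """Noise smoothing: sort, split into equal-frequency bins, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed: List[float] = []
    for start in range(0, len(ordered), bin_size):
        bin_ = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_)] * len(bin_))
    return smoothed


print(fill_missing_mean([1.0, None, 3.0]))              # → [1.0, 2.0, 3.0]
print(fill_missing_mode(["red", None, "blue", "red"]))  # → ['red', 'red', 'blue', 'red']
# Hypothetical sorted prices; the three bin means are 9, 22 and 29.
print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```

Mean filling suits numeric attributes, while the most-frequent-value fill also works for categorical ones such as colors.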
5
Data integration
- Data integration combines data from multiple sources into a coherent data store
- Schema integration
  - entity identification problem
- Redundancy
  - an attribute may be derived from another table
  - detect it with correlation analysis
- Data value conflicts
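Correlation analysis for redundancy can be illustrated with the Pearson correlation coefficient between numeric attributes. This is a minimal sketch, assuming Pearson's r as the correlation measure and a hypothetical cutoff of |r| ≥ 0.95 for flagging a pair as redundant; the helper names and the merged table are hypothetical, and a high r only marks candidates for a human or domain check.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("attributes must have the same length (at least 2)")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # The 1/n factors of the covariance and standard deviations cancel, so they are omitted.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / (sd_a * sd_b)


def redundant_pairs(
    table: Dict[str, List[float]], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Flag attribute pairs whose |r| meets the (assumed) threshold as redundancy candidates."""
    flagged = []
    for x, y in combinations(table, 2):
        r = pearson_r(table[x], table[y])
        if abs(r) >= threshold:
            flagged.append((x, y, r))
    return flagged


# Hypothetical merged table: height arrives from two sources in different units.
merged = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [150.0 / 2.54, 160.0 / 2.54, 170.0 / 2.54, 180.0 / 2.54],
    "weight_kg": [70.0, 55.0, 80.0, 60.0],
}
for x, y, r in redundant_pairs(merged):
    print(f"{x} ~ {y}: r = {r:.3f}")  # only the two height columns are flagged
```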
6
Data transformation
- Data is transformed or consolidated into forms appropriate for mining
- Methods include:
  - smoothing
  - aggregation
  - generalization
  - normalization (min-max)
  - feature construction using neural networks
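Min-max normalization maps each value v of attribute A linearly onto a new range: v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A. A minimal Python sketch of that formula follows; the function name, the default target range [0, 1], the choice to map a constant attribute to `new_min`, and the `incomes` data are all assumptions for illustration.

```python
from typing import List, Sequence


def min_max_normalize(
    values: Sequence[float], new_min: float = 0.0, new_max: float = 1.0
) -> List[float]:
    """Min-max normalization: linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no range to rescale; map everything to new_min (an assumption).
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Hypothetical income values.
incomes = [12_000, 54_000, 73_600, 98_000]
print(min_max_normalize(incomes))              # first value → 0.0, last → 1.0
print(min_max_normalize(incomes, -1.0, 1.0))   # same data rescaled to [-1, 1]
```

Because min and max are taken from the data at hand, a later value outside that range will normalize to something outside [new_min, new_max].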