Data Preprocessing Copyright, 1996 © Dale Carnegie & Associates, Inc.


CSE 591: Data Mining by H. Liu, 7/14/2019

Data preprocessing
- A necessary step for serious, effective, real-world data mining
- It’s often omitted in “academic” DM, but can’t be over-stressed in practical DM
- The need for pre-processing in DM
  - Data reduction - too much data
  - Data cleaning - noise
  - Data integration and transformation

Data reduction
- Data cube aggregation
- Feature selection (dimensionality reduction)
- Sampling
  - random sampling and others
- Instance selection (search based)
- Data compression
  - PCA, wavelet transformation
- Data discretization
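
As a rough illustration of the sampling item above, the sketch below draws a simple random sample without replacement and a stratified sample using only Python's standard library. The function names, the `fraction` parameter, the fixed seed, and the toy rows are assumptions made for this example; they do not come from the slides.

```python
import random
from collections import defaultdict

def simple_random_sample(rows, n, seed=0):
    """Draw n rows without replacement; the seed is fixed only for repeatability."""
    rng = random.Random(seed)
    return rng.sample(rows, n)

def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every stratum, keeping at least one row each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical data: 80 "young" rows and 20 "senior" rows.
rows = [("young", i) for i in range(80)] + [("senior", i) for i in range(20)]
srs = simple_random_sample(rows, 10)
strat = stratified_sample(rows, key=lambda r: r[0], fraction=0.1)
```

The stratified version keeps the 80/20 proportion of the two groups in the sample, which a small simple random sample is not guaranteed to do.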

Feature selection
- The basic problem
  - Finding a subset of original features
- The illustration of the difficulty of the problem
- A standard procedure of feature selection
  - Search
  - Evaluation measures on goodness of selected features
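
The search-plus-evaluation procedure above can be sketched as follows. With n features there are 2^n candidate subsets, so this example uses a greedy forward search rather than an exhaustive one. Both the greedy strategy and the inconsistency-rate measure are choices made for this illustration (the slides do not say which search or measure the course uses), and the tiny data set is hypothetical.

```python
from collections import Counter, defaultdict

def inconsistency_rate(X, y, subset):
    """Evaluation measure: fraction of rows that disagree with the majority
    class among rows sharing the same values on the chosen features."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        pattern = tuple(row[j] for j in subset)
        groups[pattern][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(y)

def forward_select(X, y, evaluate=inconsistency_rate):
    """Search: add one feature at a time while the evaluation score improves."""
    n_features = len(X[0])
    selected = []
    best = evaluate(X, y, selected)
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        score, j = min((evaluate(X, y, selected + [j]), j) for j in candidates)
        if score >= best:  # no candidate improves the score, so stop
            break
        selected.append(j)
        best = score
    return selected, best

# Hypothetical data in which the class label equals feature 1.
X = [(0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
y = [0, 0, 1, 1, 0, 1]
selected, score = forward_select(X, y)
```

A greedy search can miss subsets whose features are only useful together, which is one face of the difficulty the slide refers to.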

Feature extraction
- The basic problem
  - creating new features that are combinations of original features
- A common approach – PCA
  - Its variants are used widely in text mining and web mining
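
To make "new features that are combinations of original features" concrete, here is a minimal pure-Python sketch that finds the first principal component by power iteration on the sample covariance matrix. Power iteration, the iteration count, and the all-ones start vector are simplifications assumed for this example; a numerical library would normally be used instead, and the start vector could in principle be orthogonal to the leading component.

```python
def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        # Repeatedly multiply by the covariance matrix and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # The extracted feature is a linear combination of the original features.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data lying exactly on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, scores = first_principal_component(points)
```

For these points the direction is proportional to (1, 2), so a single extracted feature captures all of the variance in the two original ones.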

Discretization
- The concept
- The methods
  - Equal-width
  - Equal-frequency
  - Entropy-based
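
The three methods can be sketched in a few lines each. This is an illustration under stated assumptions, not the course's implementation: the function names and sample values are made up, the equal-width version assumes the values are not all identical, and the entropy-based version finds only a single best cut point (a full method would typically recurse and apply a stopping rule).

```python
import math
from collections import Counter

def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is placed in the last bin rather than a (k+1)-th one.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign bins by rank so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_entropy_split(values, labels):
    """Return the cut point that minimizes the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_h = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        h = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if h < best_h:
            best_cut, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
    return best_cut, best_h

values = [4, 8, 15, 21, 21, 24, 25, 28, 34]
labels = ["n", "n", "n", "n", "n", "y", "y", "y", "y"]
width_bins = equal_width_bins(values, 3)
freq_bins = equal_frequency_bins(values, 3)
cut, weighted_entropy = best_entropy_split(values, labels)
```

Equal-width and equal-frequency binning ignore the class labels, while the entropy-based cut uses them, which is why it lands exactly between the two classes here.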

Data cleaning
- Missing values
  - ignore it
  - fill in manually
  - use a global value/mean/most frequent
- Noise
  - smoothing (binning)
  - outlier removal
- Inconsistency
  - domain knowledge, domain constraints
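
Two of the options above, filling missing values with the mean or most frequent value and smoothing by binning, might look like the sketch below. Representing a missing value as `None`, the strategy names, and the choice of equal-frequency bins smoothed by their means are assumptions made for this example.

```python
from statistics import mean, mode

def fill_missing(values, strategy="mean"):
    """Replace None with the mean (numeric data) or the most frequent value."""
    present = [v for v in values if v is not None]
    fill = mean(present) if strategy == "mean" else mode(present)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of bin_size, and replace each value
    by its bin mean. The result is returned in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        smoothed.extend([mean(chunk)] * len(chunk))
    return smoothed

filled_numeric = fill_missing([10, None, 20, 30])
filled_nominal = fill_missing(["a", None, "b", "a"], strategy="most_frequent")
smoothed = smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3)
```

Filling with a mean or mode keeps every row but shrinks the spread of the attribute, so it is worth recording which values were imputed.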

Data integration
- Data integration - combines data from multiple sources into a coherent data store
- Schema integration
  - entity identification problem
- Redundancy
  - an attribute may be derived from another table
  - correlation analysis
- Data value conflicts
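
Correlation analysis for redundancy can be sketched with the Pearson correlation coefficient: numeric attributes that are almost perfectly correlated are candidates for removal. The 0.95 threshold, the function names, and the attribute names in the toy table are assumptions for this example, and the code assumes no attribute is constant.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def redundant_pairs(table, threshold=0.95):
    """List attribute pairs whose absolute correlation reaches the threshold."""
    names = list(table)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(pearson(table[a], table[b])) >= threshold]

# Hypothetical integrated table: price_with_tax is derived from price.
table = {
    "price": [10, 20, 30, 40],
    "price_with_tax": [11, 22, 33, 44],
    "quantity": [3, 1, 4, 1],
}
pairs = redundant_pairs(table)
```

A high correlation only flags a pair for review; whether one attribute is really derived from the other still has to be checked against the schemas.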

Data transformation
- Data is transformed or consolidated into forms appropriate for mining
- Methods include
  - smoothing
  - aggregation
  - generalization
  - normalization (min-max)
  - feature construction using neural networks
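
Min-max normalization maps a value v from the attribute's observed range [min, max] linearly onto a new range, by default [0, 1]. A minimal sketch follows; the function name, the default target range, and the sample incomes are assumptions for this example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min(values) maps to new_min and max to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant attribute")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

incomes = [12000, 73600, 98000]
normalized = min_max_normalize(incomes)
```

Because the mapping depends on the observed minimum and maximum, a single extreme value compresses all the others, and values seen later may fall outside the target range.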