Www.isquareit.edu.in Tel - +91 20 22933441 Hope Foundation’s International Institute of Information Technology, (I²IT). www.isquareit.edu.in Tel - +91.

Presentation transcript:

Hope Foundation’s International Institute of Information Technology, I²IT, P-14 Rajiv Gandhi Infotech Park, Hinjawadi, Pune - 411 057
Tel - +91 20 22933441 / 2 / 3 | Website - www.isquareit.edu.in ; Email - info@isquareit.edu.in

Data Preprocessing: An Overview
International Institute of Information Technology, I²IT, P-14 Rajiv Gandhi Infotech Park, Hinjawadi, Pune - 411 057 Toll Free - 1800 233 4499 Website - www.isquareit.edu.in; Email - info@isquareit.edu.in

Outline
- What is Data Preprocessing?
- Major Steps in Data Preprocessing
- Basic Design Issues
- Challenges for Distributed System

Why Data Preprocessing?
Need for data preprocessing: some part of the data may be
- Incomplete (data is absent)
- Inaccurate or noisy (values other than the expected ones)
- Inconsistent (containing discrepancies)
Data quality also depends on
- Timeliness (whether the data is an old version)
- Believability (users' faith in the correctness of the data)
- Interpretability (how easily the data can be understood)

Major Steps in Data Preprocessing
- Data Cleaning
- Data Integration
- Data Reduction
- Data Transformation

Data Cleaning
- Filling in missing values
- Smoothing noisy data
- Identifying or removing outliers
- Resolving inconsistencies
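The cleaning tasks on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method prescribed by the slides: the helper names (`fill_missing`, `smooth_by_bin_means`, `iqr_outliers`) are hypothetical, missing values are assumed to be marked with `None`, mean imputation and smoothing by bin means are only one possible choice each, and the 1.5 x IQR fence is a common rule of thumb.

```python
from statistics import mean, quantiles


def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of bin_size, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(fill_missing([10, None, 30]))      # [10, 20, 30]
print(smooth_by_bin_means(prices, 3))    # [9, 9, 9, 22, 22, 22, 29, 29, 29]
print(iqr_outliers(prices + [200]))      # [200]
```

Whether a flagged outlier is removed or kept is a separate decision; the sketch only identifies candidates.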

Data Integration
- Entity identification problem
- Integrating multiple databases, data cubes, or files
- Redundancy and correlation analysis
- Tuple duplication - updating some but not all data occurrences
- Data value conflict detection and resolution - for the same real-world entity, attribute values from different sources may differ
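Three of the integration checks named on this slide can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the Pearson coefficient is used as one possible correlation measure for numeric attributes, rows are assumed to be hashable tuples, and each source is assumed to be a dictionary keyed by a shared entity identifier.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; values near +1 or -1 suggest a redundant attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_duplicate_tuples(rows):
    """Keep the first occurrence of each tuple, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def value_conflicts(source_a, source_b):
    """Map each shared entity id to its pair of differing values, if the sources disagree."""
    shared = source_a.keys() & source_b.keys()
    return {key: (source_a[key], source_b[key])
            for key in shared if source_a[key] != source_b[key]}


print(drop_duplicate_tuples([(1, "a"), (2, "b"), (1, "a")]))   # [(1, 'a'), (2, 'b')]
print(value_conflicts({"c1": 70, "c2": 55}, {"c1": 70, "c2": 121}))   # {'c2': (55, 121)}
```

Resolving a detected conflict (for example 55 kg versus 121 lb) usually needs knowledge of the units or encoding used by each source, which the sketch does not attempt.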

Data Reduction
To obtain a reduced representation of the data set that is much smaller in volume.
Numerosity Reduction
- Parametric methods, e.g. regression and log-linear models
- Nonparametric methods, e.g. histograms, clustering, sampling
Data Compression
- lossless
- lossy
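As a rough illustration of numerosity reduction, the sketch below shows one parametric method (a least-squares line, so that only a slope and an intercept are stored) and two nonparametric ones (an equal-width histogram and simple random sampling without replacement). The function names, the fixed default seed, and the choice of equal-width buckets are assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Parametric reduction: least-squares slope and intercept stand in for the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def equal_width_histogram(values, bins):
    """Nonparametric reduction: count the values falling into equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:                      # all values identical: everything goes in bucket 0
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)   # the maximum lands in the last bucket
        counts[index] += 1
    return counts


def sample_without_replacement(rows, k, seed=0):
    """Nonparametric reduction: simple random sample of k rows, reproducible via seed."""
    return random.Random(seed).sample(rows, k)


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))              # (2.0, 1.0)
print(equal_width_histogram(list(range(1, 11)), 3))      # [3, 3, 4]
print(len(sample_without_replacement(list(range(100)), 10)))   # 10
```

All three are lossy in the sense that the original values cannot be recovered exactly from the reduced form.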

Data Transformation and Data Discretization
Data are transformed or consolidated into forms appropriate for mining:
- Smoothing
- Attribute construction or feature construction
- Aggregation
- Normalization
- Discretization
- Concept hierarchy generation
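Normalization and discretization from this list can be written out as a short sketch. The slides do not specify which variants are meant; min-max scaling, z-score scaling, and equal-width discretization are assumed here as common examples, and the function names and interval labels are hypothetical. The code also assumes the values are not all identical, since the range and the standard deviation would then be zero.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale the values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def discretize(values, labels):
    """Equal-width discretization: map each value to the label of its interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    return [labels[min(int((v - lo) / width), len(labels) - 1)] for v in values]


print(min_max([10, 20, 30]))                 # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9])[0])  # -1.5
print(discretize([5, 25, 45, 65, 85], ["low", "mid", "high"]))
# ['low', 'low', 'mid', 'high', 'high']
```

Replacing numeric values with labels such as low, mid, and high is also the first level of a simple concept hierarchy.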

Conclusion
Although numerous methods of data preprocessing have been developed, data preprocessing remains an active area of research, due to the huge amount of inconsistent or dirty data and the complexity of the problem.

THANK YOU
For further information please contact Prof. Sandeep Patil, Department of Computer Engineering, Hope Foundation’s International Institute of Information Technology, I²IT, Hinjawadi, Pune – 411 057
Phone - +91 20 22933441
www.isquareit.edu.in | prashantg@isquareit.edu.in