
WHAT IS DATA MINING?
- The process of automatically extracting useful information from large amounts of data.
- Uses traditional data analysis techniques (statistics) and sophisticated computer algorithms to discover patterns.
- Uses machine learning techniques to find structural patterns within the data.

Origins of Data Mining
- Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems.
- Traditional techniques may be unsuitable due to:
  - Enormity of data
  - High dimensionality of data
  - Heterogeneous, distributed nature of data
[Figure: data mining shown at the overlap of machine learning/pattern recognition, statistics/AI, and database systems]

Cross Industry Standard Process for Data Mining

The Process -- Simplified
- Pre-processing
- Data mining
- Results validation

Two Basic Problem Classes
- Prediction methods
  - Use some variables to predict unknown or future values of other variables.
- Description methods
  - Find human-interpretable patterns that describe the data.

Basic Types of Data Mining Tasks
- Classification (predictive)
- Clustering (descriptive)
- Association rules (descriptive)
- Sequential patterns (descriptive or predictive)
- Regression (predictive)
- Anomaly detection (predictive)

Data Mining Techniques
- Statistical techniques
- Clustering
- Decision trees
- Subsampling (bootstrapping)
- Nearest-neighbor methods
- SOM
- Bayesian methods

Data Mining Techniques
- Artificial neural nets
- Deep learning (Google DeepMind)
- PCA
- Universal prediction
- Reinforcement learning
- "Compression" sequence prediction techniques
- Time series analysis

Data Mining Techniques
- Hidden Markov models
- MLN
- PLN
- EDA (MOSES)
- Random forests
- Feature engineering
- Unsupervised and semi-supervised learning

DATA MINING TECHNIQUES
- Entropy methods
- Multifractal methods (time series)
- Log-linear power laws (crash prediction)
- Wavelet transforms
- …

CLASSIFICATION: Definition
- Given a collection of records (the training set):
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
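The train/test procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made-up toy data, the 70/30 split is arbitrary, and a 1-nearest-neighbor rule stands in for "a model" (the slides do not prescribe any particular classifier). The helper names `nearest_neighbor_predict` and `accuracy` are hypothetical.

```python
import math
import random

def nearest_neighbor_predict(train, point):
    """Return the class of the training record closest to `point` (Euclidean)."""
    closest = min(train, key=lambda rec: math.dist(rec[0], point))
    return closest[1]

def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

# Toy records (assumption): (attributes, class), two well-separated groups.
records = [((x, x + 0.5), "low") for x in range(10)] + \
          [((x + 100, x + 100.5), "high") for x in range(10)]

random.seed(0)                      # fixed seed so the split is reproducible
random.shuffle(records)
split = int(0.7 * len(records))     # 70% for training, 30% held out for testing
train_set, test_set = records[:split], records[split:]

test_accuracy = accuracy(train_set, test_set)
print(f"accuracy on held-out test set: {test_accuracy:.2f}")
```

The test records are never shown to the model while it is being built, which is what makes the measured accuracy an estimate for previously unseen records.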

CLUSTERING: Definition
- Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
  - Data points in one cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.
- Similarity measures:
  - Euclidean distance if attributes are continuous.
  - Other problem-specific measures.
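As one possible illustration of clustering with Euclidean distance, here is a minimal k-means sketch. K-means is a swapped-in technique chosen for brevity (the slides only define the task), and the points, starting centroids, and the function name `k_means` are assumptions made for the example.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Plain k-means with Euclidean distance; returns (centroids, labels)."""
    centroids = list(initial_centroids)   # copy so the caller's list is untouched
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                   # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, labels

# Toy data (assumption): two visually obvious groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
final_centroids, labels = k_means(points, [points[0], points[3]])
print(labels)
print(final_centroids)
```

Points that share a label are closer to their own centroid than to the other one, which matches the "more similar within, less similar across" requirement.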

ASSOCIATION RULE: Definition
- Given a set of records, each of which contains some number of items from a given collection:
  - Produce dependency rules that predict the occurrence of an item based on occurrences of other items.
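A dependency rule such as {diapers} -> {beer} is commonly scored by its support and confidence. Those two measures are not named in the slides and are brought in here as an assumption, as are the toy transactions and the function names; the sketch only shows how a single candidate rule could be evaluated, not how rules are discovered.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Toy market-basket records (assumption): each record is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule with high confidence says that records containing the left-hand items usually contain the right-hand item as well; support says how often the whole pattern occurs at all.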

SEQUENTIAL PATTERN: Definition
- Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
- Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.
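To make the idea of a timing constraint concrete, the sketch below counts how many objects have one event followed by another within a maximum gap. Everything in it is an assumption for illustration: the customer timelines are invented, `sequence_support` is a hypothetical helper, and a single maximum-gap constraint is only one of several possible timing constraints.

```python
def sequence_support(timelines, first, second, max_gap):
    """Fraction of objects whose timeline has `first` followed by `second`
    no more than `max_gap` time units later."""
    def matches(events):
        return any(e1 == first and e2 == second and 0 < t2 - t1 <= max_gap
                   for t1, e1 in events
                   for t2, e2 in events)
    return sum(matches(events) for events in timelines.values()) / len(timelines)

# Toy timelines (assumption): object -> list of (time, event) pairs.
timelines = {
    "customer_a": [(1, "buy_printer"), (3, "buy_ink")],
    "customer_b": [(2, "buy_printer"), (20, "buy_ink")],
    "customer_c": [(5, "buy_ink"), (6, "buy_printer")],
    "customer_d": [(1, "buy_printer"), (4, "buy_paper"), (6, "buy_ink")],
}

tight = sequence_support(timelines, "buy_printer", "buy_ink", max_gap=7)
loose = sequence_support(timelines, "buy_printer", "buy_ink", max_gap=30)
print(tight, loose)
```

Loosening the gap lets more timelines match the pattern, which is why the timing constraint is part of the pattern definition rather than an afterthought.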

REGRESSION: Definition
- Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
- Extensively studied in statistics and in the neural network field.
- Examples:
  - Predicting sales amounts of a new product based on advertising expenditure.
  - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
  - Time series prediction of stock market indices.
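For the linear case, a minimal sketch is an ordinary least-squares fit of one variable against another. The advertising and sales figures below are invented for illustration, and `fit_line` is a hypothetical helper; a real analysis would typically use more variables and a library routine.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (assumption): advertising expenditure vs. sales, arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6.0 + intercept   # prediction for an unseen expenditure
print(f"slope={slope:.2f} intercept={intercept:.2f} predicted={predicted_sales:.2f}")
```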

ANOMALY DETECTION: Definition
- Detect significant deviations from normal behavior.
- Applications:
  - Credit card fraud detection
  - Network intrusion detection
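One simple way to flag "significant deviations" is a z-score test: mark any value that lies more than a chosen number of standard deviations from the mean. This is a swapped-in technique for illustration, not a method named in the slides; the transaction amounts, the threshold of 2.0, and the name `find_anomalies` are assumptions.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean of `values`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)   # population standard deviation
    if stdev == 0:
        return []                       # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Toy card transaction amounts (assumption): one is far outside the usual range.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```

A single extreme value also inflates the mean and standard deviation it is compared against, so production fraud or intrusion detectors usually rely on more robust statistics or learned models.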

DATA MINING CHALLENGES
- Scalability
- Dimensionality
- Complex and heterogeneous data
- Data quality
- Data ownership and distribution
- Privacy preservation
- Streaming data