Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation.

Slides:



Advertisements
Similar presentations
QMM 384 – Data Mining Data Mining: Introduction Introduction to Predictive Analytics.
Advertisements

CPS : Information Management and Mining Shivnath Babu.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 CSE572: Data Mining Instructor: Jieping.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Introduction to Data Mining by Tan, Steinbach, Kumar.
Computing & Information Sciences Kansas State University CIS 690 Data Mining in Mobile and Cloud Computing Environments William H. Hsu, Computing and Information.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
Data Mining: Introduction
Week 9 Data Mining System (Knowledge Data Discovery)
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
University of Minnesota
© Vipin Kumar CSci 8980 (Data Mining) Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Decision Support: Data Mining Introduction.
Data Mining: Introduction
Why Mine Data? Commercial Viewpoint
Data Mining and Business Intelligence
Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.
Data Mining: Introduction. Why Data Mining? l The Explosive Growth of Data: from terabytes to petabytes –Data collection and data availability  Automated.
Data Mining CS157B Fall 04 Professor Lee By Yanhua Xue.
1 1 Slide Introduction to Data Mining and Business Intelligence.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Knowledge Discovery and Data Mining Evgueni Smirnov.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
CSE4334/5334 DATA MINING CSE 4334/5334 Data Mining, Fall 2011 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan, Steinbach, Kumar 9/4/20071 Introduction to Data Mining Tan, Steinbach,
1 What is Data Mining? l Data mining is the process of automatically discovering useful information in large data repositories. l There are many other.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
1 Data Mining: Introduction Chapter 1 of Introduction to Data Mining by Tan, Steinbach, Kumar.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Minqi.
Introduction to Data Mining Jinze Liu April 8 th, 2009.
Christoph Eick Introduction to Data Mining 8/19/ Dr. Eick 2. COSC.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
COMSATS Institute of Information Technology Department of Computer Science Databases and Information Systems Dr. Ramzan Talib Databases and Information.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
Waqas Haider Bangyal. 2 Source Materials “ Data Mining: Concepts and Techniques” by Jiawei Han & Micheline Kamber, Second Edition, Morgan Kaufmann, 2006.
An Introduction to Data Mining
Department of Computer Science Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING Submitted By.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Lecture Notes for Chapter 1 Introduction to Data Mining.
Data Mining: Introduction
Data Mining: Introduction
Data Mining Introduction
Data Mining: Introduction
Statistics 202: Statistical Aspects of Data Mining
Data Mining: Introduction
Introduction to Data Mining Part 1 Knowledge Sources COSC 6335 Webpage
Data mining and real systems modeling
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Introduction Lecture Notes for Chapter 1 Introduction to Data Mining by Tan,
Data Mining: Introduction
Techniques for Finding Patterns in Large Amounts of Data: Applications in Biology Vipin Kumar William Norris Professor and Head, Department of Computer.
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
Data Mining: Introduction
First 2-3 Lectures: Intro to DM/DS
Presentation transcript:

Tang: Introduction to Data Mining (with modification by Ch. Eick) I: Introduction to Data Mining A.Short Preview 1.Initial Definition of Data Mining 2.Motivation for Data Mining 3.Examples of Data Mining Tasks B.More detailed Survey on Data Mining C.Course Information

Tang: Introduction to Data Mining (with modification by Ch. Eick) Teaching Plan for the Next 5 Weeks 1. Introduction to Data Mining and Course Information 2. Preprocessing (Han Chapter 3) 3. Concept Characterization (Han Chapter 5) 4. Classification Techniques (multiple soursce)

Knowledge Discovery in Data [and Data Mining] (KDD) Let us find something interesting! Definition := “ KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data ” (Fayyad) l Frequently, the term data mining is used to refer to KDD. l Many commercial and experimental tools and tool suites are available (see l Field is more dominated by industry than by research institutions

Tang: Introduction to Data Mining (with modification by Ch. Eick) l Lots of data is being collected and warehoused –Web data, e-commerce –purchases at department/ grocery stores –Bank/Credit Card transactions l Computers have become cheaper and more powerful (  machine learning techniques become applicable) l Competitive Pressure is Strong –Provide better, customized services for an edge (e.g. in Customer Relationship Management) Why Mine Data? Commercial Viewpoint

Why Mine Data? Scientific Viewpoint l Data collected and stored at enormous speeds (GB/hour) –remote sensors on a satellite –telescopes scanning the skies –microarrays generating gene expression data –scientific simulations generating terabytes of data l Traditional techniques infeasible for raw data l Data mining may help scientists –in classifying and segmenting data –in Hypothesis Formation

Tang: Introduction to Data Mining (with modification by Ch. Eick) Mining Large Data Sets - Motivation l There is often information “ hidden ” in the data that is not readily evident l Human analysts may take weeks to discover useful information l Much of the data is never analyzed at all The Data Gap Total new disk (TB) since 1995 Number of analysts From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”

Tang: Introduction to Data Mining (with modification by Ch. Eick) Data Mining Tasks l Prediction Methods –Use some variables to predict unknown or future values of other variables. l Description Methods –Find human-interpretable patterns that describe the data.

Tang: Introduction to Data Mining (with modification by Ch. Eick) Classification Example categorical continuous class Test Set Training Set Model Learn Classifier

Tang: Introduction to Data Mining (with modification by Ch. Eick) Classifying Galaxies Early Intermediate Late Data Size: 72 million stars, 20 million galaxies Object Catalog: 9 GB Image Database: 150 GB Class: Stages of Formation Attributes: Image features, Characteristics of light waves received, etc. Courtesy:

Tang: Introduction to Data Mining (with modification by Ch. Eick) What is Clustering? l Given a set of objects, each having a set of attributes, and a similarity measure among them, find clusters such that –Objects in one cluster are more similar to one another. –Objects in separate clusters are less similar to one another. l Similarity Measures: –Euclidean Distance if attributes are continuous. –Other Problem-specific Measures.

Tang: Introduction to Data Mining (with modification by Ch. Eick) Clustering of S&P 500 Stock Data zObserve Stock Movements every day. zClustering points: Stock-{UP/DOWN} zSimilarity Measure: Two points are more similar if the events described by them frequently happen together on the same day.  We used association rules to quantify a similarity measure.

Tang: Introduction to Data Mining (with modification by Ch. Eick) Association Rule Discovery: Definition l Given a set of records each of which contain some number of items from a given collection; –Produce dependency rules which will predict occurrence of an item based on occurrences of other items. Rules Discovered: {Milk} --> {Coke} {Diaper, Milk} --> {Beer} Rules Discovered: {Milk} --> {Coke} {Diaper, Milk} --> {Beer}

Tang: Introduction to Data Mining (with modification by Ch. Eick) Sequential Pattern Discovery: Definition l Given is a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. l Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints. (A B) (C) (D E) <= ms <= xg >ng<= ws (A B) (C) (D E)