INTRODUCTION Elsayed Hemayed Data Mining Course

Outline
- The Motivation
- Knowledge Discovery in Databases (KDD)
- Knowledge Discovery Process
- Data mining application types
  - Association
  - Clustering
  - Classification
  - Prediction
- Commercial Data Mining Tool

Acknowledgement: some of the material in these slides is from [Max Bramer, “Principles of Data Mining”, Springer-Verlag London Limited 2007]

2 Introduction to Data Mining

The Data Explosion
- The current NASA Earth observation satellites generate a terabyte (i.e. 10^12 bytes) of data every day.
- The Human Genome project is storing thousands of bytes for each of several billion genetic bases.
- As long ago as 1990, the US Census collected over a million million bytes of data.
- Many companies maintain large data warehouses of customer transactions. A fairly small data warehouse might contain more than a hundred million transactions.
- Vast amounts of data are recorded every day on automatic recording devices, such as credit card transaction files and web logs, as well as non-symbolic data such as CCTV recordings.

Knowledge buried in the data
- knowledge that can be critical to a company’s growth or decline
- knowledge that could lead to important discoveries in science
- knowledge that could enable us to predict the weather and natural disasters accurately
- knowledge that could enable us to identify the causes of, and possible cures for, lethal illnesses
- knowledge that could literally mean the difference between life and death.

Data Rich but Knowledge Poor
We are data rich but knowledge poor.

What is Data Mining?
Data mining: searching for knowledge (interesting patterns) in your data.

Knowledge Discovery
The ‘non-trivial extraction of implicit, previously unknown and potentially useful information from data’. Knowledge discovery is a process in which data mining forms just one part, albeit a central one.

[Figure: Data mining as a step in the process of Knowledge Discovery]

Knowledge Discovery Process
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the database)
4. Data transformation (where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations)
5. Data mining (an essential process where intelligent methods are applied to extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
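The seven steps can be strung together as a toy pipeline. This is a minimal Python sketch, not a real KDD system: the two data sources (`SALES`, `REGIONS`), every function name, and the 60% threshold are hypothetical, and the mining step is deliberately reduced to counting how many baskets contain each item.

```python
from collections import Counter

# Hypothetical raw sources (illustrative data only).
SALES = [
    {"cust": 1, "item": "milk"},
    {"cust": 1, "item": "bread"},
    {"cust": 2, "item": "milk"},
    {"cust": 2, "item": None},  # noisy record: missing item
    {"cust": 3, "item": "bread"},
]
REGIONS = {1: "north", 2: "north", 3: "south"}


def clean(rows):
    """1. Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(rows, regions):
    """2. Data integration: attach each customer's region from a second source."""
    return [{**r, "region": regions[r["cust"]]} for r in rows]


def select(rows, region):
    """3. Data selection: keep only the rows relevant to the analysis task."""
    return [r for r in rows if r["region"] == region]


def transform(rows):
    """4. Data transformation: aggregate line items into one basket per customer."""
    baskets = {}
    for r in rows:
        baskets.setdefault(r["cust"], set()).add(r["item"])
    return list(baskets.values())


def mine(baskets):
    """5. Data mining (toy version): count the baskets containing each item."""
    return Counter(item for basket in baskets for item in basket)


def evaluate(counts, n_baskets, min_fraction):
    """6. Pattern evaluation: keep items found in at least min_fraction of baskets."""
    return {item: c / n_baskets for item, c in counts.items() if c / n_baskets >= min_fraction}


def present(patterns):
    """7. Knowledge presentation: format the surviving patterns for the user."""
    return "\n".join(f"{item}: {frac:.0%}" for item, frac in sorted(patterns.items()))


baskets = transform(select(integrate(clean(SALES), REGIONS), "north"))
patterns = evaluate(mine(baskets), len(baskets), min_fraction=0.6)
print(present(patterns))
```

With this data only milk survives (it appears in both northern baskets, bread in only one), so the script prints `milk: 100%`. In a real project each step would be far richer, but the order of the calls mirrors the numbered list.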

Applications of Data Mining
- Analysis of organic compounds
- Automatic abstracting
- Credit card fraud detection
- Electric load prediction
- Financial forecasting
- Medical diagnosis
- Predicting share of television audiences
- Real estate valuation
- Targeted marketing
- Thermal power plant optimisation
- Toxic hazard analysis
- Weather forecasting

More Applications to come
- A supermarket chain: optimise targeting of high-value customers
- A major hotel chain: identify attributes of a ‘high-value’ prospect
- Improving the ability to predict bad loans
- Reducing fabrication flaws in VLSI chips
- Arranging show schedules to maximise market share and increase advertising revenues
- Predicting the probability that a cancer patient will respond to chemotherapy

What is (not) Data Mining?
What is not Data Mining?
- Look up a phone number in a phone directory
- Query a Web search engine for information about “Amazon”
What is Data Mining?
- Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
- Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

Main Applications
Applications can be divided into four main types:
- Association
- Classification
- Prediction
- Clustering

Labelled and Unlabelled Data
- Labelled data has a specially designated attribute, and the aim is to use the data given to predict the value of that attribute for instances that have not yet been seen. [Supervised Learning – Classification and Prediction]
- Data that does not have any specially designated attribute is called unlabelled. [Unsupervised Learning – Association and Clustering]

Attributes and Data Example
[Table: example dataset with categorical and continuous attributes and a class attribute]

Association Rules
- An association rule expresses a relationship amongst the values of variables.
- Association rules are frequently mined from market-basket data.
- A market basket corresponds to the set of items a consumer purchases during one visit to a supermarket.
Example: IF variable 1 > 85 AND switch 6 = open THEN variable 23 < 47.5 AND switch 8 = closed

Market Basket Analysis
- IF cheese AND milk THEN bread (confidence = 0.7) indicates that 70% of the customers who buy cheese and milk also buy bread.
- A store could therefore move the bread closer to the cheese and milk counter for customer convenience, or separate them to encourage impulse buying of other products.

Confidence and Support
- Support: the percentage of instances in the database that contain all items listed in a given association rule. Rules are usually required to reach a user-specified minimum support.
- Confidence: given a rule of the form A => B, confidence is the conditional probability that B is true when A is known to be true.
- Confidence can be computed as support(A ∪ B) / support(A), where A ∪ B is the set of all items in A and B together.
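The same definitions in set notation (our notation, not the slides’), assuming the database D is a collection of transactions t, each a set of items, and reading P as the empirical probability over D:

```latex
\begin{aligned}
\operatorname{support}(X) &= \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert} \\
\operatorname{support}(A \Rightarrow B) &= \operatorname{support}(A \cup B) \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)} = P(B \mid A)
\end{aligned}
```

For the rule IF cheese AND milk THEN bread, A = {cheese, milk} and B = {bread}, so a confidence of 0.7 means that 70% of the transactions containing cheese and milk also contain bread.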

Market Basket Example

Transaction_Id   Time   Items_bought
101              6:35   Milk, bread, cookies, juice
792              7:38   Milk, juice
1130             8:05   Milk, eggs
1735             8:40   Bread, cookies, coffee

Consider the two rules Milk => juice and Bread => juice:

Rule             Support   Confidence
Milk => juice    50%       66.7%
Bread => juice   25%       50%
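The figures in the example can be checked with a few lines of Python. This is a minimal sketch assuming each transaction is represented as a Python set of lower-cased item names; `support` and `confidence` are illustrative helper names, not functions from any data mining library.

```python
# The four transactions from the Market Basket Example, as sets of items
# (transaction times are omitted because they do not affect the rules).
TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},  # 101
    {"milk", "juice"},                      # 792
    {"milk", "eggs"},                       # 1130
    {"bread", "cookies", "coffee"},         # 1735
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A) for the rule A => B."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


for a, b in [({"milk"}, {"juice"}), ({"bread"}, {"juice"})]:
    s = support(a | b, TRANSACTIONS)
    c = confidence(a, b, TRANSACTIONS)
    print(f"{sorted(a)} => {sorted(b)}: support={s:.1%}, confidence={c:.1%}")
```

Running it prints support 50.0% and confidence 66.7% for milk => juice, and 25.0% and 50.0% for bread => juice, matching the table. The sketch does not guard against an antecedent with zero support, which would raise `ZeroDivisionError`.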

Clustering
- The goal is to place records into groups where the records in a group are highly similar to each other and dissimilar to records in other groups.
- For example, an insurance company might group customers according to income, age, types of policy purchased or prior claims experience.
- As another example, the adult population of Egypt could be divided into five groups, ranging from most likely to least likely to buy a new product.
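The slides do not name a particular clustering algorithm. As one common choice, here is a minimal k-means sketch in pure Python; the customer data (age, annual income in thousands) is invented for illustration, and k = 2 is an assumption rather than something the data dictates.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k of the input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty group: leave centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the grouping is stable
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (age, annual income in thousands).
customers = [(25, 30), (27, 32), (26, 31), (55, 90), (58, 95), (60, 88)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

The three younger, lower-income customers end up in one group and the other three in the other. k-means needs the number of groups up front and can settle on a local optimum, so in practice it is often rerun with several seeds.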

Clustering Example
[Figure: clustering example]

Classification
Classification is one of the most common applications of data mining. For example:
- Classify medical patients into those at high, medium or low risk of acquiring a certain illness
- Classify people interviewed into those who are likely to vote for each of a number of political parties, or are undecided
- Classify a student project as distinction, good, pass or fail
- Classify customers in a supermarket into discount-seeking shoppers, loyal regular shoppers, shoppers attached to name brands, and infrequent shoppers

Degree Classification Example
[Table: degree classification data]
Goal: find some way of predicting the classification for other students given only their grade ‘profiles’.

Classification Methods
- Nearest Neighbour Matching: identifying (say) the five examples that are ‘closest’ in some sense to an unclassified one, and assigning it the class that is most common among them.
- Classification Rules:
  IF SoftEng = A AND Project = A THEN Class = First
  IF SoftEng = A AND Project = B AND ARIN = B THEN Class = Second
  IF SoftEng = B THEN Class = Second
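Both methods can be sketched in a few lines of Python. The rule set is the one listed above; the nearest-neighbour part assumes grades are encoded as numbers (A = 4, B = 3, C = 2), uses Euclidean distance as the ‘in some sense’ measure, and runs on a small invented training set of two module grades per student, so all of those choices are hypothetical.

```python
import math
from collections import Counter


def classify_by_rules(grades):
    """Apply the slide's three rules in order; return None if no rule fires."""
    if grades["SoftEng"] == "A" and grades["Project"] == "A":
        return "First"
    if grades["SoftEng"] == "A" and grades["Project"] == "B" and grades["ARIN"] == "B":
        return "Second"
    if grades["SoftEng"] == "B":
        return "Second"
    return None


def nearest_neighbour(query, examples, k=5):
    """Return the most common class among the k examples closest to `query`.

    `examples` is a list of (feature_vector, class_label) pairs; distance is
    Euclidean (one choice of 'closest in some sense').
    """
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two module grades encoded as A=4, B=3, C=2.
EXAMPLES = [
    ((4, 4), "First"), ((4, 3), "First"), ((3, 4), "First"),
    ((3, 3), "Second"), ((3, 2), "Second"), ((2, 3), "Second"),
]

print(classify_by_rules({"SoftEng": "A", "Project": "B", "ARIN": "B"}))
print(nearest_neighbour((4, 4), EXAMPLES, k=3))
```

The first call prints `Second` because the second rule fires; the second prints `First` because the three closest examples to (4, 4) are all labelled First. The slide’s “five examples” corresponds to the default `k=5`; `k=3` is used here only because the invented training set is tiny.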

Decision Tree
[Figure: decision tree]

Prediction
- Classification is one form of prediction, where the value to be predicted is a label.
- Numerical prediction (often called regression) is another. Here the goal is to determine how certain attributes will behave in the future, so we wish to predict a numerical value.
- Example: how much sales volume a store will generate in a given period.
- A very popular way of doing this is to use a neural network.
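The slides suggest a neural network for numerical prediction; as a much simpler stand-in, the sketch below fits an ordinary least-squares straight line to past sales and extrapolates it one period ahead. The weekly sales figures are invented for illustration, and a straight-line trend is an assumption that real sales data rarely satisfies exactly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales volume of one store in weeks 1-4.
weeks = [1, 2, 3, 4]
sales = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(weeks, sales)
forecast = slope * 5 + intercept  # extrapolate to week 5
print(f"Forecast for week 5: {forecast:.1f}")
```

For this perfectly linear history the fit is slope 2 and intercept 10, so the forecast for week 5 is 20.0. A neural network would replace `fit_line` with a model that can also capture non-linear patterns, at the cost of far more data and tuning.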

Commercial Data Mining Tool
[Figure: commercial data mining tool]

Summary
- The Motivation
- Knowledge Discovery in Databases (KDD)
- Knowledge Discovery Process
- Data mining application types
  - Association
  - Clustering
  - Classification
  - Prediction
- Commercial Data Mining Tool