Data Mining Vikram Pudi IIIT Hyderabad.

Slides:



Advertisements
Similar presentations
Web Mining.
Advertisements

COMP3410 DB32: Technologies for Knowledge Management 08 : Introduction to Knowledge Discovery By Eric Atwell, School of Computing, University of Leeds.
An Introduction to Data Mining
Overview of Data Mining & The Knowledge Discovery Process Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
ICS 421 Spring 2010 Data Mining 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/8/20101Lipyeow Lim.
ICS 421 Spring 2010 Data Mining 1 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/6/20101Lipyeow Lim.
Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra Akerkar American University of Armenia and SIBER, India.
Week 9 Data Mining System (Knowledge Data Discovery)
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Data Mining By Archana Ketkar.
Application of Apriori Algorithm to Derive Association Rules Over Finance Data Set Presented By Kallepalli Vijay Instructor: Dr. Ruppa Thulasiram.
Data Mining Concepts 1.1 COT5230 Data Mining Week 1 Data Mining Concepts M O N A S H A U S T R A L I A ’ S I N T E R N A T I O N A L U N I V E R S I T.
Data Mining – Intro.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Huimin Ye.
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Introduction to Data Mining Data mining is a rapidly growing field of business analytics focused on better understanding of characteristics and.
Introduction to machine learning
GUHA method in Data Mining Esko Turunen Tampere University of Technology Tampere, Finland.
CS Machine Learning. What is Machine Learning? Adapt to / learn from data  To optimize a performance function Can be used to:  Extract knowledge.
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.
Chapter 5: Data Mining for Business Intelligence
MAKING THE BUSINESS BETTER Presented By Mohammed Dwikat DATA MINING Presented to Faculty of IT MIS Department An Najah National University.
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
Data Mining and Application Part 1: Data Mining Fundamentals Part 2: Tools for Knowledge Discovery Part 3: Advanced Data Mining Techniques Part 4: Intelligent.
Efficient Discovery of Concise Association Rules from Large Databases Vikram Pudi IIIT Hyderabad.
Chapter 1 Introduction to Data Mining
Machine Learning An Introduction. What is Learning?  Herbert Simon: “Learning is any process by which a system improves performance from experience.”
Knowledge Discovery and Data Mining Evgueni Smirnov.
Introduction to Web Mining Spring What is data mining? Data mining is extraction of useful patterns from data sources, e.g., databases, texts, web,
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Preprocessing for Data Mining Vikram Pudi IIIT Hyderabad.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
3-1 Data Mining Kelby Lee. 3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
WEB MINING. In recent years the growth of the World Wide Web exceeded all expectations. Today there are several billions of HTML documents, pictures and.
DATA MINING By Cecilia Parng CS 157B.
Introduction to Data-Mining Marko Grobelnik Institut Jozef Stefan.
Efficient Discovery of Concise Association Rules from Large Databases Vikram Pudi IIIT Hyderabad.
Kansas State University Department of Computing and Information Sciences CIS 730: Introduction to Artificial Intelligence Friday, 14 November 2003 William.
Academic Year 2014 Spring Academic Year 2014 Spring.
Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
1 Decision Trees. 2 OutlookTemp (  F) Humidity (%) Windy?Class sunny7570true play sunny8090true don’t play sunny85 false don’t play sunny7295false don’t.
Classification Vikram Pudi IIIT Hyderabad.
WHAT IS DATA MINING?  The process of automatically extracting useful information from large amounts of data.  Uses traditional data analysis techniques.
Chapter 26: Data Mining Prepared by Assoc. Professor Bela Stantic.
Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
DATA MINING TECHNIQUES (DECISION TREES ) Presented by: Shweta Ghate MIT College OF Engineering.
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
Data Mining – Intro.
What Is Cluster Analysis?
Data Mining Motivation: “Necessity is the Mother of Invention”
DATA MINING © Prentice Hall.
CH. 1: Introduction 1.1 What is Machine Learning Example:
What is Pattern Recognition?
I don’t need a title slide for a lecture
Welcome! Knowledge Discovery and Data Mining
CSE591: Data Mining by H. Liu
Presentation transcript:

Data Mining Vikram Pudi IIIT Hyderabad

1 Originated from DB community… Traditional Database Systems Indexing Query languages Query optimization Transaction processing Recovery … XML, Semantic web OO and OR DBMS … Data Mining

Automated extraction of interesting patterns from large databases

3 Business Government Internet Research Labs Mine/Explore DATA Patterns Decision Making Feedback To Data Sources

Types of Patterns Associations Coffee buyers usually also purchase sugar Clustering Segments of customers requiring different promotion strategies Classification Customers expected to be loyal

5 Association Rules That which is infrequent is not worth worrying about.

6 Association Rules Transaction IDItems 1Tomato, Potato, Onions 2Tomato, Potato, Brinjal, Pumpkin 3Tomato, Potato, Onions, Chilly 4Lemon, Tamarind Support(X) = |transactions containing X| / | D | Confidence(R) = support(R) / support(LHS(R)) Problem proposed in [AIS 93]: Find all rules satisfying user given minimum support and minimum confidence. Rule: Tomato, Potato  Onion (confidence: 66%, support: 50%) D :

7 Association Rule Applications E-commerce People who have bought Sundara Kandam have also bought Srimad Bhagavatham Census analysis Immigrants are usually male Sports A chess end-game configuration with “white pawn on A7” and “white knight dominating black rook” typically results in a “win for white”. Medical diagnosis Allergy to latex rubber usually co-occurs with allergies to banana and tomato

8 Types of Association Rules Boolean association rules Hierarchical rules reynolds  pencils Quantitative & Categorical rules (Age: 30…39), (Married: Yes)  (NumCars: 2) stationary penspencils reynoldscrossnatrajsteadler

9 More Types of Association Rules Cyclic / Periodic rules Sunday  vegetables Christmas  gift items Summer, rich, jobless  ticket to Hawaii Constrained rules Show itemsets whose average price > Rs.10,000 Show itemsets that have television on RHS Sequential rules Star wars, Empire Strikes Back  Return of the Jedi

10 Classification To be or not to be: That is the question. - William Shakespeare

11 The Classification Problem OutlookTemp (  F) Humidity (%) Windy?Class sunny7570true play sunny8090true don’t play sunny85 false don’t play sunny7295false don’t play sunny6970false play overcast7290true play overcast8378false play overcast6465true play overcast8175false play rain7180true don’t play rain6570true don’t play rain7580false play rain6880false play rain7096false play Model relationship between class labels and attributes sunny7769true? rain7376false?  Assign class labels to new data with unknown labels e.g. outlook = overcast  class = play Play Outside?

12 Applications Text classification Classify s into spam / non-spam Classify web-pages into yahoo-type hierarchy NLP Problems Tagging: Classify words into verbs, nouns, etc. Risk management, Fraud detection, Computer intrusion detection Given the properties of a transaction (items purchased, amount, location, customer profile, etc.) Determine if it is a fraud Machine learning / pattern recognition applications Vision Speech recognition etc. All of science & knowledge is about predicting future in terms of past So classification is a very fundamental problem with ultra-wide scope of applications

13 Clustering Birds of a feather flock together.

14 The Clustering Problem OutlookTemp (  F) Humidity (%) Windy? sunny7570true sunny8090true sunny85 false sunny7295false sunny6970false overcast7290true overcast7388true overcast6465true overcast8175false rain7180true rain6570true rain7580false rain6880false rain7096false Need a function to compute similarity, given 2 input records  Unsupervised learning Find groups of similar records.

15 Applications Targetting similar people or objects Student tutorial groups Hobby groups Health support groups Customer groups for marketing Organizing Spatial clustering Exam centres Locations for a business chain Planning a political strategy

16 Take Home Data mining is a mature field Don’t waste time developing new algorithms for core tasks Focus on applications to challenging kinds of data Streams, Distributed data, Multimedia, Web, … Most effort is in how to map domain problems to data mining problems And how to make sense of the output.

17