CS157A Spring 05 Data Mining Professor Sin-Min Lee.

Today's presentation covers:
1. What is Data Mining?
2. Data Mining Objectives
3. Data Mining Operations
4. Knowledge Discovery
5. Applications of Data Mining
6. Summary
7. References

Overview of Data Mining
[Figure: Data Mining shown with Statistics, Databases, Artificial Intelligence, and Visualization]

1. What is Data Mining?
➔ We usually use data mining to:
– Discover useful, previously unknown knowledge by analyzing large and complex databases
– Perform knowledge discovery, exploratory data analysis, applied statistics, and machine learning
– Search for valuable information in large databases

2. Data Mining Objectives
➔ Find rules and patterns in large-volume databases
➔ Discovery
– Finding human-understandable patterns that describe the data
➔ Prediction
– Using some variables or fields in the database to predict unknown or future values of other variables of interest

Data Mining Objectives
➔ Knowledge Discovery
– A stage somewhat prior to prediction, where the available information is still insufficient to predict
– It is close to decision support

3. Data Mining Operations
➔ Associations
➔ Sequential Patterns
➔ Time-Series Clustering
➔ Classification
➔ Segmentation
➔ And many more!

Association
● Used to find all rules in basket data
● Basket data is also called transaction data
● Analyzes how the items purchased by customers in a shop are associated with one another

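Association...
● A formal definition:
● Let $I = \{i_1, i_2, \ldots, i_m\}$ be the total set of items, and let $D$ be a set of transactions, where each transaction $d$ consists of a set of items, $d \subseteq I$
● Association rule: $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$
● $\text{Support} = \dfrac{\#\text{ of transactions containing } X \cup Y}{|D|}$
● Support: the proportion of transactions for which the rule predicts correctly (those containing both $X$ and $Y$)
● Confidence: the number of correct predictions, as a proportion of all transactions the rule applies to (those containing $X$)
● $\text{Confidence} = \dfrac{\#\text{ of transactions containing } X \cup Y}{\#\text{ of transactions containing } X}$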

Association...
● Example of transaction data:
– Transaction 1: CD player, music CD, music book
– Transaction 2: CD player, music CD
– Transaction 3: music CD, music book
– Transaction 4: CD player
● I = {CD player, music CD, music book}
● |D| = 4
● # of transactions containing both CD player and music CD = 2
● # of transactions containing CD player = 3
● For the rule CD player ⇒ music CD: Support = 2/4, Confidence = 2/3
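The arithmetic in this example can be checked with a short Python sketch that recomputes support and confidence for the rule CD player ⇒ music CD from the four transactions. The helper names `count_containing`, `support`, and `confidence` are illustrative, not taken from any library.

```python
# The four example transactions, each represented as a set of items.
transactions = [
    {"CD player", "music CD", "music book"},  # Transaction 1
    {"CD player", "music CD"},                # Transaction 2
    {"music CD", "music book"},               # Transaction 3
    {"CD player"},                            # Transaction 4
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain X ∪ Y."""
    return count_containing(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Transactions containing X ∪ Y, as a fraction of those containing X."""
    return count_containing(x | y, transactions) / count_containing(x, transactions)


x, y = {"CD player"}, {"music CD"}
print(support(x, y, transactions))     # 0.5
print(confidence(x, y, transactions))  # 0.6666666666666666
```

Representing each transaction as a Python set makes "transaction contains itemset" a simple subset test (`<=`).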

Applying Association Rules...
● Example: books that tend to be bought together. If a customer buys a book, an online bookstore (e.g., Amazon.com) may suggest other associated books.
● Example: if a person buys a laptop, the salesperson may suggest accessories that tend to be bought along with laptops.
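A suggestion feature like the ones described above can be sketched by turning pairwise association rules into recommendations. This minimal Python sketch reuses the four CD-player transactions (repeated so the block runs on its own), keeps only single-item rules {a} ⇒ {b} whose confidence meets a threshold, and suggests the consequents for a purchased item. The 0.6 threshold and the helper names `single_item_rules` and `recommend` are assumptions for illustration; this is not how any particular online store works.

```python
from itertools import permutations

# The four example transactions, repeated so this block is self-contained.
transactions = [
    {"CD player", "music CD", "music book"},
    {"CD player", "music CD"},
    {"music CD", "music book"},
    {"CD player"},
]


def single_item_rules(transactions, min_confidence):
    """Map each item a to [(b, confidence)] for rules {a} => {b} meeting min_confidence."""
    items = sorted(set().union(*transactions))
    rules = {}
    for a, b in permutations(items, 2):
        # Every item comes from some transaction, so with_a is never empty.
        with_a = [t for t in transactions if a in t]
        conf = sum(1 for t in with_a if b in t) / len(with_a)
        if conf >= min_confidence:
            rules.setdefault(a, []).append((b, conf))
    return rules


def recommend(purchased, rules):
    """Suggest consequents of rules whose antecedent is the purchased item, strongest first."""
    ranked = sorted(rules.get(purchased, []), key=lambda rule: rule[1], reverse=True)
    return [item for item, _ in ranked]


rules = single_item_rules(transactions, min_confidence=0.6)  # arbitrary threshold
print(recommend("music book", rules))  # ['music CD']
```

Raising the threshold trades fewer suggestions for more reliable ones; an item with no qualifying rule simply gets an empty list.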

Time Series Clustering
● Given:
– A database of time series
● Find:
– Groups of similar time series
● Sample applications:
– Determine products with similar selling patterns
– Identify companies with similar patterns of growth
– Find stocks with similar price movements
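The slide does not name an algorithm. One common way to sketch time-series clustering, assumed here, is to z-normalize each series (so only its shape matters, not its level) and then run k-means with Euclidean distance. The weekly sales figures are hypothetical, and seeding the centroids with the first k series is a simplification; real code would use better initialization and often a shape-aware distance such as dynamic time warping.

```python
import math


def z_normalize(series):
    """Shift to mean 0 and scale to standard deviation 1, so that series are
    compared by shape rather than by absolute level."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:  # a flat series has no shape to compare
        return [0.0] * n
    return [(v - mean) / std for v in series]


def kmeans(series_list, k, iterations=20):
    """Cluster equal-length series with plain k-means (Euclidean distance).
    The first k series seed the centroids, a simplification for this sketch."""
    points = [z_normalize(s) for s in series_list]
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each series joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its member series.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels


# Hypothetical weekly unit sales for four products.
sales = {
    "product A": [10, 20, 30, 40, 50, 60],
    "product B": [60, 50, 40, 30, 20, 10],
    "product C": [100, 210, 290, 410, 500, 590],
    "product D": [300, 240, 210, 150, 90, 40],
}
labels = kmeans(list(sales.values()), k=2)
for name, label in zip(sales, labels):
    print(name, "-> cluster", label)
# A and C (rising) share a cluster; B and D (falling) share the other.
```

Without the normalization step, product C would sit far from product A purely because it sells about ten times more units.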

Classification
● Classification
– Problem: given that items belong to one of several classes, and given past instances (also called training instances) of items along with the classes to which they belong, the problem is to PREDICT the class to which a new item belongs
– The class of the new instance is not known, so other attributes of the instance must be used to predict the class
– It can be done by finding rules that partition the given data into disjoint groups

Classification...
● The dataset is usually in the form of a relational table.
● The data has a set of distinct attributes.
● Each data record is also labeled with a class.
● Goal: to build a model or learn rules that can be used to predict the classes of new cases.
● Training data are used to build this model.

Classification...
● For example:
– Suppose that a credit card company wants to decide whether or not to give a credit card to an applicant
● The company has a variety of information about each person, such as age, educational background, income, etc.
● The company then ranks the applicants (categorizes them into classes), for example:
● For all persons P: P.degree = masters AND P.income > 75,000 ==> P.credit = excellent
● For all persons P: P.degree = bachelors OR (P.income >= 25,000 AND P.income <= 75,000) ==> P.credit = good
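Rules like these translate directly into code. In this minimal Python sketch the `Applicant` class and `credit_class` function are hypothetical names; the second rule is cut off in the original slide text, so the sketch assumes it ends with `P.income <= 75,000 ==> P.credit = good`. Applicants that match neither rule get no class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    degree: str  # e.g. "masters" or "bachelors"
    income: int  # annual income


def credit_class(p: Applicant) -> Optional[str]:
    """Apply the two example rules; return None when neither rule fires."""
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    # Assumed completion of the truncated second rule: 25,000 <= income <= 75,000.
    if p.degree == "bachelors" or 25_000 <= p.income <= 75_000:
        return "good"
    return None


print(credit_class(Applicant("masters", 90_000)))    # excellent
print(credit_class(Applicant("bachelors", 20_000)))  # good
print(credit_class(Applicant("masters", 20_000)))    # None
```

Because no applicant can satisfy both rules, checking them in order does not change the result; the rules really do partition the applicants into disjoint groups (plus an unclassified remainder).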

Classification...
● Table:
Age | Smoke | Risk
    | No    | Low
25  | Yes   | High
44  | Yes   | High
18  | No    | Low
55  | No    | High
35  | No    | Low
● To identify the risk (we have two groups):
– Risk = Low and Risk = High

Classification...
● The following techniques can be used for classification:
– Decision trees
– Predictive modeling
– Association rules
– Neural networks
– etc.

Decision Trees
● A “divide-and-conquer” approach produces the tree
● Each node tests a particular attribute
● Usually, the attribute value is compared to a constant
● Other possibilities:
– Comparing the values of two attributes
– Using a function of one or more attributes
● Leaves assign a classification, a set of classifications, or a probability distribution to instances
● An unknown instance is routed down the tree to a leaf
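The slide describes the divide-and-conquer idea without naming an algorithm. The sketch below assumes an ID3-style learner: at each node it picks the attribute with the highest information gain (an entropy-based criterion the slides do not mention), splits the rows on that attribute's values, and recurses on each subset. It handles only categorical attributes, so the smoking-risk ages are bucketed into "0-35" and "over 35", and the table row without an age is left out. `build_tree` and `classify` are hypothetical names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, attributes, target):
    """Divide and conquer: split on the attribute with the highest information
    gain, then build a subtree for each of its values. A leaf is a class label;
    an internal node is a tuple (attribute, {value: subtree})."""
    labels = [row[target] for row in rows]
    # Stop when the subset is pure or no attributes remain; the leaf takes the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # Information gain = entropy before the split minus weighted entropy after it.
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [row[target] for row in rows if row[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    remaining = [a for a in attributes if a != best]
    branches = {
        value: build_tree([row for row in rows if row[best] == value], remaining, target)
        for value in {row[best] for row in rows}
    }
    return (best, branches)


def classify(tree, instance):
    """Route an unknown instance down the tree until it reaches a leaf.
    Assumes each attribute value in `instance` was seen during training."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


rows = [
    {"age": "0-35", "smoke": "Yes", "risk": "High"},     # age 25
    {"age": "over 35", "smoke": "Yes", "risk": "High"},  # age 44
    {"age": "0-35", "smoke": "No", "risk": "Low"},       # age 18
    {"age": "over 35", "smoke": "No", "risk": "High"},   # age 55
    {"age": "0-35", "smoke": "No", "risk": "Low"},       # age 35
]
tree = build_tree(rows, ["smoke", "age"], "risk")
print(classify(tree, {"smoke": "No", "age": "over 35"}))  # High
```

On this tiny dataset both attributes give the same gain at the root, so the learned tree's shape depends on tie-breaking, but either shape classifies every training row correctly.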

Decision Tree
● In short, a decision tree is just a series of nested if/then rules.
[Figure: decision tree for our previous example. Root node Smoke: Yes → High; No → node Age: 0-35 → Low, otherwise → High]
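Reading the tree as "smokers are High risk; non-smokers aged 0-35 are Low risk; older non-smokers are High risk" (treating the unlabeled Age branch as "over 35" is an assumption), it becomes literally a set of nested if/then rules. This small Python sketch, with the hypothetical function name `risk`, checks that those rules reproduce every table row that has an age.

```python
def risk(age, smoke):
    """The example decision tree, written as nested if/then rules."""
    if smoke == "Yes":
        return "High"
    else:
        # Non-smokers are split on age; "over 35" for the unlabeled branch is an assumption.
        if age <= 35:
            return "Low"
        else:
            return "High"


# Table rows that have all three values: (age, smoke, risk).
table = [(25, "Yes", "High"), (44, "Yes", "High"), (18, "No", "Low"),
         (55, "No", "High"), (35, "No", "Low")]

print(all(risk(age, smoke) == expected for age, smoke, expected in table))  # True
```

The explicit `else:` nesting (rather than `elif`) is kept on purpose so the code mirrors the shape of the tree.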

Predictive Modeling
● Predict values based on similar groups of data
● Pattern Recognition
– Associating an observation with past experience or knowledge
– Interchangeable with classification
● Estimation
– Assigning a numeric label, drawn from an infinite (continuous) range of possible values, to an observation
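Nearest-neighbor methods (listed in the summary) are one way to "predict values based on similar groups of data." For estimation, the sketch below assumes observations are numeric feature tuples and averages the numeric labels of the k most similar past observations; replacing the average with a majority vote over class labels gives the pattern-recognition (classification) variant. The function name `knn_estimate` and the training data are hypothetical.

```python
import math


def knn_estimate(train, query, k=3):
    """Estimate a numeric value for `query` as the mean target of the k
    training observations closest to it (Euclidean distance)."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical past observations: (feature tuple, numeric value).
train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 14.0), ((10.0,), 50.0), ((11.0,), 52.0)]
print(knn_estimate(train, (2.1,)))        # 12.0, the mean of 12, 14 and 10
print(knn_estimate(train, (10.4,), k=2))  # 51.0
```

Sorting the whole training set is fine for a sketch; larger datasets would use a spatial index to find neighbors.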

4. Knowledge Discovery
● Find patterns in databases
– For example, if someone buys one thing, what else will they buy next?
● Interesting + Certain = Knowledge
– The output is usually called “Discovered Knowledge”
● KDD – Knowledge Discovery in Databases
● A non-trivial process of identifying valid, potentially useful, and understandable patterns in data
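The "what else will they buy next?" question can be sketched by counting, across customers' ordered purchase histories, which item most often follows each item. This is a deliberately simple first-order counting approach (an assumption; the slides do not specify a method), the helper names `next_purchase_counts` and `likely_next` are hypothetical, and the purchase histories are made up for illustration.

```python
from collections import Counter, defaultdict


def next_purchase_counts(histories):
    """For each item, count which items customers bought immediately after it."""
    follows = defaultdict(Counter)
    for history in histories:
        for current, nxt in zip(history, history[1:]):
            follows[current][nxt] += 1
    return follows


def likely_next(follows, item):
    """Most frequent follow-up purchase for `item`, or None if nothing ever followed it."""
    counts = follows.get(item)  # .get does not create an empty entry
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical purchase histories, each in the order the items were bought.
histories = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["laptop", "laptop bag"],
    ["phone", "phone case"],
]
follows = next_purchase_counts(histories)
print(likely_next(follows, "laptop"))  # mouse (followed laptop twice)
```

The raw counts also say how "certain" each pattern is: here "mouse" followed "laptop" in two of the three laptop histories.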

KDD – Knowledge Discovery in Databases...
● Advances in traditional data analysis tasks
– Classification, clustering
● New data mining operations
– Association rules
– Sequential patterns
– Deviations/exceptions
● New application areas
– Spatial, text, web, image, ...

KDD – Knowledge Discovery in Databases
● Applications
– Most large companies have data warehouses, which serve as platforms for data mining projects
– There is a trend towards integrated vertical solutions in areas such as finance and telecom
● Back end: integration with databases
● Front end: campaign management or CRM (Customer Relationship Management)

KDD – Knowledge Discovery in Databases
● Next-generation knowledge discovery systems should:
– Have integrated front-end access to knowledge delivery tools
– Have integrated back-end access to enterprise and external databases
– Have a knowledge discovery engine as an embedded part of the overall solution
– Be oriented to solving a business problem, not a data analysis problem

5. Applications of Data Mining
● Medicine
● Control theory
● Engineering
● Marketing and finance
● Data mining on the web
● Scientific databases
● Fraud detection
● And many more!

6. Summary
● Data Mining IS...
– Decision trees, nearest-neighbor classification, neural networks, rule induction, k-means clustering
– A decision support process in which we search for patterns of information in data
● Data Mining is NOT...
– Retrieving data (e.g., Google)
● That is “information retrieval” or “database querying”
● Data mining infers “the right query” from the data
– Merging many small databases into a large one

Summary
● Data Mining is not...
– Data warehousing
– SQL / ad hoc queries / reporting
– Software agents
– Online Analytical Processing (OLAP)
– Data visualization

References
● Dr. Lee's presentation: Data Mining Section
● Dr. Kurt Thearling's website: An Introduction to Data Mining