Amer Kanj Data Mining For Business Professionals.

Presentation transcript:


Contents
Data Mining Overview
Types of Data Mining
Why Use Data Mining
How Do We Mine Data
Models of Data Mining

Data Mining Overview
Data Mining deals with large volumes of data stored in a DBMS.
It is the process of analyzing large databases to find useful patterns.
Data Mining automates information discovery: it automates the process of discovering useful trends and patterns.

Data Mining Overview (Cont)
The fundamental assumption of Data Mining is that large data sets may contain recurring hidden patterns.
A Data Mining tool does not require any assumptions about which patterns exist.
It tries to discover relationships and hidden patterns that may not always be obvious.

Types of Data Mining
Business professionals look for Data Mining approaches that meet their needs.
They require Data Mining to:
• Be understandable
• Have good performance
• Be accurate
They define three fundamental approaches to Data Mining:
• Classification Studies
• Clustering Studies
• Visualization Studies

Classification Studies
Classification studies = Supervised learning.
Very common in the business world.
A telecommunication company’s analyst wants to:
• Understand why some customers remain loyal while others leave
• Predict which customers the company is likely to lose to competitors

Classification Studies (cont)
So he can:
• Construct a model derived from historical data of loyal customers versus customers who have left
• Use a good model to better understand his customers and to predict which customers will stay and which will leave
A study will identify an overall goal and the data to be used.

Classification Rules
Classification rules help assign new objects to a set of classes.
• Given a new automobile insurance applicant, should he/she be classified as low risk, medium risk or high risk?
Classification rules for the above example could use a variety of knowledge, such as the educational level, salary, and age of the applicant.
• ∀ person p, p.degree = masters and p.income > 75,000 ⇒ p.credit = excellent
• ∀ person p, p.degree = bachelors and (p.income >= 25,000 and p.income <= 75,000) ⇒ p.credit = good

Classification rules can be shown compactly as a decision tree.
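The two example rules on the Classification Rules slide can be read as a small decision tree: branch first on the degree, then on the income. The Python sketch below is one possible reading. The function name `classify_credit` and the `"unknown"` fallback for applicants that neither rule covers are assumptions made for illustration; the slides do not say what happens in that case.

```python
def classify_credit(degree: str, income: float) -> str:
    """Apply the two example rules as a two-level decision tree.

    Returns "unknown" (an assumed fallback) when neither rule fires.
    """
    if degree == "masters":
        # Rule 1: masters degree and income above 75,000
        if income > 75_000:
            return "excellent"
    elif degree == "bachelors":
        # Rule 2: bachelors degree and income between 25,000 and 75,000
        if 25_000 <= income <= 75_000:
            return "good"
    return "unknown"


print(classify_credit("masters", 90_000))
print(classify_credit("bachelors", 40_000))
print(classify_credit("bachelors", 10_000))
```

Each root-to-leaf path of the tree corresponds to one rule, which is why a rule set and a decision tree can describe the same classifier.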

Clustering Studies
Clustering Studies = Unsupervised Learning.
A method of grouping rows of data that share similar trends and patterns.
We have no dependent variable.
Clustering can also be based on historical patterns, but the outcome (loyal or lost) is not supplied with the training data.
Clustering techniques look for similarities within a data set and group similar rows together into clusters or segments.

Customers are clustered into four segments
[Figure: four customer segments, labeled Cluster 1 to Cluster 4. Segment profiles shown in the figure:]
• Income: High, Children: 1, Car: Luxury
• Income: High, Children: 0, Car: Compact
• Income: Medium, Children: 2, Car: Sedan
• Income: Medium, Children: 3, Car: Truck
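To make the idea of grouping similar rows concrete, here is a minimal k-means sketch in plain Python. K-means is only one of many clustering techniques and the slides do not name a specific algorithm, so treat it as an illustration. The customer data (income in thousands, number of children) is hypothetical, and the deterministic "first k points" initialization is a simplification chosen so the example is reproducible.

```python
import math


def kmeans(points, k, iterations=10):
    """Very small k-means: returns (cluster index per point, cluster centers)."""
    centers = [points[i] for i in range(k)]  # deterministic start: first k points
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest center (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers


# Hypothetical customers: (income in thousands, number of children)
customers = [(120, 1), (40, 3), (115, 0), (45, 2), (125, 1), (42, 3)]
assignment, centers = kmeans(customers, k=2)
print(assignment)
print(centers)
```

Note that no outcome column (loyal or lost) is passed in: the grouping comes only from similarity between rows. In practice the features would usually be rescaled first, since income would otherwise dominate the distance.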

Visualization
It is simply the graphical presentation of data.
Microsoft Excel has graphing and mapping capabilities.
Representing data graphically often brings out points that you would not normally see.

Why use Data Mining
Direct Marketing
Trend Analysis
Fraud Detection
Forecasting in Financial Markets

Direct Marketing
The ability to predict who is most likely, or most desirable, to buy a certain product can save companies immense amounts in marketing expenditures.

Trend Analysis
Understanding trends in the marketplace is a strategic advantage, because it helps reduce costs and time to market.

Fraud Detection
Data Mining techniques can model which insurance claims, cellular phone calls, or credit card purchases are likely to be fraudulent.

Forecasting in Financial Markets
Data mining is used extensively to model financial markets.

How Do We Mine Data
There are five steps to Data Mining:
• Data preparation
• Defining a study
• Reading the data and building a model
• Understanding the model
• Prediction

Data Preparation
Data preparation is considered the heart of the Data Mining process.
Data usually accumulates in a transactional database, where actual records of transactions are stored.
Data preparation requires that the data from distributed databases be pooled together and cleansed of redundant, inconsistent, incomplete, irrelevant, and otherwise inappropriate data.

Data Preparation (Cont)
Data Cleaning:
• A column containing a list of soft drinks may have the values “Pepsi”, “Pepsi Cola”, and “Cola”.
• The values refer to the same drink, but are not known to the computer as the same.
Missing Values:
• Some Data Mining approaches require rows of data to be complete in order to mine the data
• If too many values are missing in a data set, it becomes hard to gather any useful information from this data or to make predictions from it
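The two preparation steps above can be sketched in a few lines of Python. The mapping table `CANONICAL`, the helper names, and the sample rows are all hypothetical; the sketch only illustrates mapping variant spellings to one value and dropping incomplete rows, which is one of several ways to handle missing values.

```python
# Hypothetical lookup table: every known variant maps to one canonical value.
CANONICAL = {"pepsi": "Pepsi", "pepsi cola": "Pepsi", "cola": "Pepsi"}


def clean_drink(value):
    """Map a raw drink name to its canonical form; leave unknown names as-is."""
    if value is None:
        return None
    key = value.strip().lower()
    return CANONICAL.get(key, value.strip())


def complete_rows(rows):
    """Keep only rows in which no field is missing (None)."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical raw data
rows = [
    {"customer": "A", "drink": "Pepsi Cola"},
    {"customer": "B", "drink": "cola"},
    {"customer": "C", "drink": None},
]
cleaned = [{**row, "drink": clean_drink(row["drink"])} for row in rows]
usable = complete_rows(cleaned)
print(usable)
```

After cleaning, customers A and B carry the same drink value, and the row with the missing drink is excluded from mining.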

Data Preparation (Cont)
Data Derivation:
• Suppose there are columns called maximum$-2002 and maximum$-2003 that describe the dollars spent in 2002 and 2003
• Then an interesting derivation is $-difference, which is the change in the amount of money spent between 2002 and 2003
Merging Data:
• Data is usually stored in the form of tables
• Merging data in a relational system can be achieved in a number of ways:
1. Merging tables through a view (Query Tools)
2. An SQL statement, or
3. An export of data into a flat file
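As a rough illustration of derivation and merging, the Python sketch below computes the difference column and then joins it with a second table on a shared key, which is what a view or SQL join would do in a relational system. The field names (`max_2002`, `max_2003`, `difference`), the customer table, and all values are hypothetical stand-ins for the columns named on the slide.

```python
# Hypothetical spending table; max_2002 / max_2003 stand in for the
# maximum$-2002 and maximum$-2003 columns.
spending = [
    {"customer_id": 1, "max_2002": 500.0, "max_2003": 650.0},
    {"customer_id": 2, "max_2002": 300.0, "max_2003": 250.0},
]

# Hypothetical second table, keyed by customer_id.
customers = {1: "Alice", 2: "Bob"}


def derive_difference(row):
    """Add the derived column: change in spending from 2002 to 2003."""
    return {**row, "difference": row["max_2003"] - row["max_2002"]}


def merge(spending_rows, names):
    """Inner join on customer_id: keep rows that have a matching customer."""
    return [
        {**row, "name": names[row["customer_id"]]}
        for row in spending_rows
        if row["customer_id"] in names
    ]


merged = merge([derive_difference(r) for r in spending], customers)
for row in merged:
    print(row["name"], row["difference"])
```

A positive difference means the customer spent more in 2003 than in 2002, and a negative one means spending fell.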

Defining a Study
Defining a study differs between Supervised (Classification) and Unsupervised (Clustering) learning.
For Supervised learning:
• Involves articulating a goal
• Specifying the data fields that are used in the study
For Unsupervised learning:
• The goal is to group similar types of data, which is useful in many activities, or
• To identify exceptions in a data set, which is useful in discovering fraudulent or incorrect data

Read the Data and Build a Model
A data mining product reads a data set and constructs a model.
A model summarizes large amounts of data by accumulating indicators.
Indicators:
• Frequencies: show how often a certain value occurs
• Weights (or impacts): indicate how well some inputs indicate the occurrence of an output
• Conjunctions: sometimes inputs have more weight together than apart
• Differentiation: indicates how much more important an input criterion is to one outcome than another
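Two of the indicators above, frequencies and weights, can be approximated with simple counting. The slides do not give formulas, so the sketch below makes an assumption: it treats the weight of an input value as the fraction of rows with that value that end in a given outcome. The records and field names are hypothetical.

```python
from collections import Counter

# Hypothetical historical records
records = [
    {"plan": "prepaid", "outcome": "lost"},
    {"plan": "prepaid", "outcome": "lost"},
    {"plan": "prepaid", "outcome": "loyal"},
    {"plan": "contract", "outcome": "loyal"},
    {"plan": "contract", "outcome": "loyal"},
]

# Frequencies: how often each value of the input occurs.
frequencies = Counter(r["plan"] for r in records)


def weight(rows, field, value, outcome):
    """Assumed definition: share of rows with field == value that have the outcome."""
    matching = [r for r in rows if r[field] == value]
    if not matching:
        return 0.0
    return sum(r["outcome"] == outcome for r in matching) / len(matching)


print(frequencies)
print(weight(records, "plan", "prepaid", "lost"))
print(weight(records, "plan", "contract", "lost"))
```

Comparing the same weight across outcomes (for example "lost" versus "loyal") gives a rough sense of the differentiation indicator as well.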

Understanding the Model
Model understanding takes different forms based on the type of model used to represent the data.
We will discuss Data Mining Models later…

Prediction
Prediction is the process of choosing the best possible outcomes based on historical data.
Predictive data mining methods fall into three broad categories:
• Mathematical methods
• Logic methods
• Distance methods

Prediction (Cont)
Mathematical methods:
• Linear math solution
• Non-linear math solution
Logic methods:
• Quite different from what math methods produce
• Logical methods often produce tree-like solutions
• The best-known logical solutions are decision trees and decision rules.
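As an illustration of a linear math solution, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The slides do not name a particular linear method, so least squares is an assumption chosen because it is a common one; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data that lies exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict the outcome for a new input.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

A logic method would instead produce something like a tree of if/else tests, which is easier to read but has a very different form from this equation.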

Prediction (Cont)
Distance methods:
• A representative sample of cases is kept on file
• These cases will be used as a benchmark for classifying new cases
• Features of the new case are measured against features of the benchmark cases for proximity
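The distance approach described above is essentially a nearest-neighbor classifier. Here is a minimal sketch, assuming Euclidean distance and a single nearest case; the benchmark cases, their features (income in thousands, number of children), and the labels are hypothetical.

```python
import math

# Hypothetical benchmark cases kept on file: (features, known outcome)
benchmark = [
    ((120.0, 1.0), "loyal"),
    ((40.0, 3.0), "lost"),
    ((110.0, 0.0), "loyal"),
]


def classify(new_case, cases):
    """Return the outcome of the benchmark case closest to new_case."""
    _features, label = min(cases, key=lambda case: math.dist(new_case, case[0]))
    return label


print(classify((45.0, 2.0), benchmark))
print(classify((118.0, 1.0), benchmark))
```

Real tools typically compare against several nearest cases and rescale the features first, so that no single feature dominates the distance.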

Prediction (Cont)
Here are a few interesting predictive capabilities:
• Understanding why a prediction is made: some models will provide the reasons why a prediction is made
• Margin of victory: if the best-case prediction has a score of 100 and the challenger prediction has a score of 50, then the margin of victory is 50%. If the prediction has a score of 100 and the challenger has 99, then the margin of victory is 1%. Generally, the higher the margin of victory, the more likely the prediction is to be true
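The margin-of-victory examples above are consistent with dividing the gap between the two scores by the best score. The slide does not state the formula, so the normalization below is an assumption that happens to reproduce both of its numbers; the function name is hypothetical.

```python
def margin_of_victory(best_score: float, challenger_score: float) -> float:
    """Gap between best and challenger, as a percentage of the best score.

    Assumed formula: it matches the slide's two examples (50% and 1%).
    """
    return (best_score - challenger_score) / best_score * 100


print(margin_of_victory(100, 50))
print(margin_of_victory(100, 99))
```

A small margin signals that the runner-up outcome was almost as well supported, so the prediction deserves less confidence.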

Prediction (Cont)
• Scenario playing: some prediction models let you change parameters to see how the predictions change
• Understanding prediction affinities: hold two variables constant and see what the other predictions would look like

Data Mining Models
Decision Trees
Genetic Algorithms
Neural Nets
Agent Network Technology
Hybrid Models
Statistics

Data Mining Models (Cont)
Decision Trees:
• Creating a tree-like structure to describe a data set
• The greatest benefit of decision tree approaches is their understandability
Genetic Algorithms:
• Are a method of combinatorial optimization based on processes in biological evolution

Data Mining Models (Cont)
Neural Nets:
• Are used extensively in the business world as predictive models
• Neural Nets are widely used in the financial sector to model fraud in credit card and monetary transactions
Agent Network Technology:
• This modeling method treats all data elements as agents that are connected to each other in a significant way

Data Mining Models (Cont)
Hybrid Models:
• Vendor tools that make use of more than one approach are referred to as hybrid systems
• Being a hybrid system does not always imply that the tool uses a hybrid algorithm
• For example, Thinking Machines, with their Darwin product, makes use of several different mining algorithms. While the algorithms themselves are not hybrid, the product uses the algorithms in combination

Data Mining Models (Cont)
Statistics:
• Used to create a model of data sets
• Uses probability, data analysis, and statistical inference

Thank You For Listening
Questions?