Chapter 2 Business Problems and Data Science Solution

Fundamental concepts
An important principle of data science is that data mining is a process with fairly well-understood stages. Some stages involve the application of information technology, such as the automated discovery and evaluation of patterns from data, while others mostly require an analyst’s creativity, business knowledge, and common sense.

Fundamental concepts
Since the data mining process breaks up the overall task of finding patterns from data into a set of well-defined subtasks, it is also useful for structuring discussions about data science. This chapter introduces the data mining process, but first we provide additional context by discussing common types of data mining tasks.

From Business Problems to Data Mining Tasks
Each data-driven business decision-making problem is unique, comprising its own combination of goals, desires, constraints, and even personalities. Even so, a business problem can be decomposed into subtasks, and the solutions to the subtasks can then be composed to solve the overall problem. Some of these subtasks are unique to the particular business problem, but others are common data mining tasks.

From Business Problems to Data Mining Tasks
Example: the telecommunications customer churn problem is unique to MegaTelCo: estimate from historical data the probability of a customer terminating her contract shortly after it has expired. Despite the large number of specific data mining algorithms developed over the years, there are only a handful of fundamentally different types of tasks these algorithms address.
Individual: an entity about which we have data. Ex: a customer or a business.
Correlation: a relationship between a particular variable describing an individual and other variables. Ex: in historical data we may know which customers left the company after their contracts expired.

Classification
Classification and class probability estimation attempt to predict, for each individual in a population, which of a (small) set of classes this individual belongs to. Example question: “Among all customers of MegaTelCo, which are likely to respond to a given offer?” Two classes: will respond and will not respond. A closely related task is scoring, or class probability estimation: instead of a class, the model produces a score representing the probability (a quantification of likelihood) that the individual belongs to each class.
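As a rough illustration of scoring versus classifying, the sketch below estimates a response probability per customer segment from historical records and then thresholds that score into the two classes. The segments, records, function names, and the add-one smoothing are assumptions made for the example, not part of the slides; a real model would use many more attributes.

```python
from collections import defaultdict

# Hypothetical historical records: (customer segment, responded to the offer?)
history = [
    ("student", True), ("student", False), ("student", True),
    ("retiree", False), ("retiree", False), ("retiree", True),
    ("family", True), ("family", True), ("family", True),
]

def fit_response_rates(records):
    """Estimate P(respond | segment) with add-one (Laplace) smoothing."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responders, total]
    for segment, responded in records:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {seg: (pos + 1) / (total + 2) for seg, (pos, total) in counts.items()}

def classify(score, threshold=0.5):
    """Turn a probability score into one of the two classes."""
    return "will respond" if score >= threshold else "will not respond"

scores = fit_response_rates(history)
for segment, score in sorted(scores.items()):
    print(segment, round(score, 2), classify(score))
```

The score carries more information than the class label: it lets the business rank customers and choose its own cutoff.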

Regression
Regression (“value estimation”) attempts to estimate or predict, for each individual, the numerical value of some variable for that individual. Example question: “How much will a given customer use the service?” Classification predicts whether something will happen. Regression predicts how much something will happen.
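A minimal value-estimation sketch, assuming a single hypothetical predictor (months as a customer) and invented usage figures: it fits a least-squares line and uses it to estimate usage for a new case. Ordinary least squares is only one of many regression methods.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: tenure in months and minutes used per month.
months_as_customer = [1, 2, 3, 4, 5]
monthly_minutes = [105, 125, 130, 135, 155]

slope, intercept = fit_line(months_as_customer, monthly_minutes)
predicted_minutes = slope * 6 + intercept  # estimate for a 6-month customer
print(slope, intercept, predicted_minutes)
```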

Similarity matching
Similarity matching attempts to identify similar individuals based on data known about them. Ex: IBM is interested in finding companies similar to its best business customers, in order to focus its sales force on the best opportunities. Similarity matching is also the basis for one of the most popular methods of making product recommendations.
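One simple way to make "similar" concrete is a distance between feature vectors. The sketch below ranks prospects by Euclidean distance to a best customer; the company names, the two scaled features, and the choice of Euclidean distance are all assumptions for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical features scaled to [0, 1]: (revenue, headcount).
best_customer = (0.9, 0.8)
prospects = {
    "Acme": (0.85, 0.75),
    "Globex": (0.2, 0.3),
    "Initech": (0.6, 0.9),
}

# Most similar prospects first (smallest distance).
ranked = sorted(prospects, key=lambda name: euclidean(prospects[name], best_customer))
print(ranked)
```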

Clustering
Clustering attempts to group individuals in a population together by their similarity, but not driven by any specific purpose. Example question: “Do our customers form natural groups or segments?” The resulting groups can also serve as input to decision-making processes.
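The sketch below shows one common clustering method, k-means, on a single hypothetical attribute (monthly spend). The data, the starting centers, and the fixed iteration count are assumptions; note that no target variable is involved.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one numeric attribute, starting from given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

monthly_spend = [10, 12, 11, 50, 52, 49]  # hypothetical customers
centers, groups = kmeans_1d(monthly_spend, centers=[10, 50])
print(centers, groups)
```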

Co-occurrence grouping
Co-occurrence grouping attempts to find associations between entities based on transactions involving them. Example question: “What items are commonly purchased together?” Ex: analyzing purchase records from a supermarket. Some recommendation systems use this kind of grouping.
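A minimal sketch of the supermarket example, assuming four invented baskets: it counts how often each pair of items appears in the same transaction and reports the share of baskets containing the most frequent pair (its support).

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
support = top_count / len(baskets)
print(top_pair, top_count, support)
```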

Profiling
Profiling (also known as behavior description) attempts to characterize the typical behavior of an individual, group, or population. Example question: “What is the typical cell phone usage of this customer segment?” Profiling is often used to establish behavioral norms for anomaly detection applications, such as fraud detection and monitoring for intrusions to computer systems.
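To illustrate how a profile becomes a norm for anomaly detection, the sketch below summarizes a segment's typical usage by its mean and standard deviation and flags observations far from that norm. The usage figures and the three-standard-deviation threshold are assumptions, not values from the slides.

```python
from statistics import mean, stdev

# Hypothetical monthly minutes for customers in one segment.
segment_minutes = [100, 110, 90, 105, 95]

# The profile: what "typical" looks like for this segment.
norm_mean = mean(segment_minutes)
norm_spread = stdev(segment_minutes)

def is_unusual(minutes, threshold=3.0):
    """Flag usage more than `threshold` standard deviations from the norm."""
    return abs(minutes - norm_mean) / norm_spread > threshold

print(norm_mean, is_unusual(108), is_unusual(400))
```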

Link prediction
Link prediction attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link. Ex: “Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?”
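The friend suggestion above can be sketched with a common-neighbors score: the number of shared friends serves as a rough estimate of link strength. The tiny friendship graph and the function names are invented for illustration, and common neighbors is only one of several possible scores.

```python
# Hypothetical friendship graph: person -> set of current friends.
friends = {
    "you": {"ann", "bob", "cat"},
    "karen": {"ann", "bob", "cat", "dan"},
    "lee": {"dan"},
}

def common_friends(graph, a, b):
    """Number of friends that a and b share."""
    return len(graph[a] & graph[b])

def suggest_links(graph, person):
    """People not yet linked to `person`, ranked by shared friends."""
    scored = [
        (other, common_friends(graph, person, other))
        for other in graph
        if other != person and other not in graph[person]
    ]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])

suggestions = suggest_links(friends, "you")
print(suggestions)
```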

Data reduction
Data reduction attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set. Ex: a massive dataset on consumer movie-viewing preferences may be reduced to a much smaller dataset revealing the consumers’ taste preferences.
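As a deliberately simple stand-in for the movie example, the sketch below collapses a viewing log into one small taste profile per customer (the share of viewings per genre). The records and genre labels are invented, and plain aggregation is an assumption here: practical systems often derive taste dimensions with matrix factorization instead.

```python
from collections import Counter

# Hypothetical viewing log: one (customer, genre) row per movie watched.
viewings = [
    ("alice", "comedy"), ("alice", "comedy"), ("alice", "drama"),
    ("bob", "horror"), ("bob", "horror"), ("bob", "horror"), ("bob", "comedy"),
]

def taste_profiles(records):
    """Reduce the log to per-customer genre shares."""
    counts = {}
    for customer, genre in records:
        counts.setdefault(customer, Counter())[genre] += 1
    return {
        customer: {g: n / sum(genre_counts.values()) for g, n in genre_counts.items()}
        for customer, genre_counts in counts.items()
    }

profiles = taste_profiles(viewings)
print(profiles)
```

Seven log rows become two short profiles; some detail (which movies, in what order) is lost in exchange for a more manageable summary.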

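Causal modeling
Causal modeling attempts to help us understand what events or actions actually influence others. Ex: consider that we use predictive modeling to target advertisements to consumers, and the targeted consumers then purchase at a higher rate. Was this because the advertisements influenced them, or because the model simply identified consumers who would have purchased anyway?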
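One standard way to approach the advertising question is a randomized experiment. The sketch below compares purchase rates between consumers who were shown the ad and a control group; the outcome lists are invented, and the difference can be read as an effect of the ad only under the assumption that consumers were assigned to the two groups at random.

```python
def conversion_rate(outcomes):
    """Share of consumers who purchased (1 = purchased, 0 = did not)."""
    return sum(outcomes) / len(outcomes)

# Hypothetical outcomes from a randomized test.
shown_ad = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
control = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]

# Estimated lift in purchase rate attributable to the ad.
uplift = conversion_rate(shown_ad) - conversion_rate(control)
print(round(uplift, 2))
```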

Supervised vs. Unsupervised Methods
Unsupervised: no specific purpose or target. “Do our customers naturally fall into different groups?”
Supervised: a specific target is defined. “Can we find groups of customers who have particularly high likelihoods of canceling their service soon after their contracts expire?”

Supervised vs. Unsupervised Methods
Supervised data mining: there must be data on the target. Two subclasses of supervised data mining:
Classification: “Will this customer purchase service S1 if given incentive I?” “Which service package (S1, S2, or none) will a customer likely purchase if given incentive I?”
Regression: “How much will this customer use the service?”

Supervised vs. Unsupervised Methods
A vital part in the early stages of the data mining process is to decide whether the line of attack will be supervised or unsupervised and, if supervised, to produce a precise definition of a target variable. This variable must be a specific quantity that will be the focus of the data mining.
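To show what a precise target definition can look like, the sketch below turns "canceling soon after the contract expires" into an exact rule. The 30-day window and the function name are assumptions chosen for illustration; the point is that some such specific choice has to be made.

```python
from datetime import date

def churned_soon(contract_end, cancel_date, window_days=30):
    """Target variable: did the customer cancel within `window_days`
    after the contract expired? (cancel_date is None if still active)"""
    if cancel_date is None:
        return False
    days_after = (cancel_date - contract_end).days
    return 0 <= days_after <= window_days

print(churned_soon(date(2024, 1, 31), date(2024, 2, 10)))  # 10 days after expiry
print(churned_soon(date(2024, 1, 31), date(2024, 6, 1)))   # 122 days after expiry
print(churned_soon(date(2024, 1, 31), None))               # never canceled
```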

Data Mining and Its Result
There is an important distinction pertaining to mining data: the difference between (1) mining the data to find patterns and build models, and (2) using the results of data mining. Ex: the churn example.

Data Mining and Its Result

The Data Mining Process
One codification of the process is the Cross Industry Standard Process for Data Mining (CRISP-DM).

Business Understanding
It is vital to understand the problem to be solved. This is a part of the craft where the analysts’ creativity plays a large role. The design team should think carefully about the use scenario.

Data Understanding
The data comprise the available raw material from which the solution will be built. A critical part of this stage is estimating the costs and benefits of each data source and deciding whether further investment is merited. Ex: credit card fraud and Medicare fraud.

Data Preparation

Data preparation
Data preparation often proceeds along with data understanding. Ex: converting data to tabular format, removing or inferring missing values, and converting data to different types.
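The three preparation steps named above can be sketched on a toy input. The raw records, field names, and the choice of the median for inferring missing ages are assumptions made for illustration.

```python
from statistics import median

# Hypothetical raw records as they might arrive from a form or log.
raw_rows = [
    {"age": "34", "plan": "basic"},
    {"age": "", "plan": "premium"},   # missing age
    {"age": "51", "plan": "basic"},
]

def prepare(rows):
    """Return a table of (age, is_premium) rows ready for modeling."""
    known_ages = [int(r["age"]) for r in rows if r["age"]]
    fill_value = median(known_ages)   # infer missing ages from known ones
    return [
        (int(r["age"]) if r["age"] else fill_value,  # text -> number
         r["plan"] == "premium")                     # category -> boolean
        for r in rows
    ]

table = prepare(raw_rows)
print(table)
```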

Data preparation
A leak is a situation where a variable collected in historical data gives information on the target variable: information that appears in historical data but is not actually available when the decision has to be made. Leakage must be considered carefully during data preparation.
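One hedge against leakage is to list explicitly which variables are known at decision time and build features only from those. In the sketch below all field names are hypothetical; `cancellation_fee_paid` stands for a value that is recorded only after a customer has already churned, so it would leak the target.

```python
# Hypothetical historical record for one customer.
record = {
    "monthly_minutes": 320,
    "support_calls": 4,
    "cancellation_fee_paid": 1,  # only exists after churn: a leak
    "churned": True,             # the target variable
}

# Variables actually available when the retention decision is made.
AVAILABLE_AT_DECISION_TIME = {"monthly_minutes", "support_calls"}

def split_features_target(row, target="churned"):
    """Keep only features known at decision time; return (features, target)."""
    features = {k: v for k, v in row.items() if k in AVAILABLE_AT_DECISION_TIME}
    return features, row[target]

features, label = split_features_target(record)
print(features, label)
```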

Modeling
The output of modeling is some sort of model or pattern capturing regularities in the data.

Evaluation
The purpose of evaluation is to assess the data mining results rigorously and to gain confidence that they are valid and reliable before moving on. It includes both quantitative and qualitative assessments.
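On the quantitative side, a typical first step is to compare a model's predictions with known outcomes on data held out from training. The sketch below computes a confusion matrix and three common measures; the labels and predictions are invented, and which measure matters most depends on the business problem.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical held-out customers: 1 = churned, 0 = stayed.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)  # of predicted churners, how many churned
recall = tp / (tp + fn)     # of actual churners, how many were caught
print(accuracy, precision, recall)
```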

Deployment
In deployment, the results of data mining are put into real use in order to realize some return on investment. The clearest cases of deployment involve implementing a predictive model in some information system or business process. Ex: the churn example.

Deployment
Increasingly, the data mining techniques themselves are deployed. Two reasons: (1) the world may change faster than the data science team can adapt, as with fraud and intrusion detection; (2) a business may have too many modeling tasks for its data science team to manually curate each model individually.

Deployment
Deployment can also be much less “technical.” It is not necessary to fail in deployment to start the cycle again. The Evaluation stage may reveal that results are not good enough to deploy.

Implications for Managing the Data Science Team
It is tempting, but usually a mistake, to view the data mining process as a software development cycle. The key contrast is between software skills and analytics skills.

Other Analytics Techniques and Technologies
This section presents six groups of related analytic techniques, with comparisons and contrasts with data mining.
Data mining: the automated search for knowledge, patterns, or regularities from data.
Business analyst: needs to recognize what sort of analytic technique is appropriate for addressing a particular problem.

Statistics
The term has two different uses in business analytics: (1) as a catchall term for the computation of particular numeric values of interest from data; (2) to denote the field of study that goes by that name.

Data Querying
A query is a specific request for a subset of data or for statistics about data, formulated in a technical language and posed to a database system. Querying differs fundamentally from data mining in that there is no discovery of patterns or models.
Ex: SELECT * FROM customers WHERE age > 45 AND sex = 'm' AND domicile = 'ne'
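The example query can be tried against a throwaway in-memory database with Python's built-in sqlite3 module. The table layout and the four rows are invented for illustration. The query returns exactly the rows that match the stated conditions and discovers no pattern on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
conn.execute(
    "CREATE TABLE customers (name TEXT, age INTEGER, sex TEXT, domicile TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Al", 50, "m", "ne"),
        ("Bea", 60, "f", "ne"),
        ("Cy", 30, "m", "ne"),
        ("Dov", 47, "m", "sw"),
    ],
)

rows = conn.execute(
    "SELECT * FROM customers WHERE age > 45 AND sex = 'm' AND domicile = 'ne'"
).fetchall()
conn.close()
print(rows)
```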

Data Querying
On-line Analytical Processing (OLAP) provides an easy-to-use GUI to query large data collections along dimensions of analysis that are defined in advance. In contrast to OLAP, data mining tools generally can incorporate new dimensions of analysis easily as part of the exploration.
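The kind of summary an OLAP tool presents can be approximated by rolling a small fact table up along one of its dimensions. The sales figures and the two dimensions (region, quarter) are assumptions for illustration; a real OLAP system precomputes such aggregates over far larger data.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, sales amount).
sales = [
    ("NE", "Q1", 100), ("NE", "Q2", 150),
    ("SW", "Q1", 80), ("SW", "Q2", 120),
]

def roll_up(facts, dimension):
    """Total sales along one dimension that was defined in advance."""
    index = {"region": 0, "quarter": 1}[dimension]
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[index]] += fact[2]
    return dict(totals)

by_region = roll_up(sales, "region")
by_quarter = roll_up(sales, "quarter")
print(by_region, by_quarter)
```

Asking for a dimension that was not set up beforehand (say, customer age) raises a KeyError here, which loosely mirrors the limitation noted above.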

Data Warehousing
Data warehouses collect and coalesce data from across an enterprise, often from multiple transaction-processing systems, each with its own database.

Regression Analysis
In data mining, regression analysis is used to estimate or predict values for cases that are not in the analyzed data set.

Machine Learning and Data Mining
Machine learning is a field of study that arose as a subfield of Artificial Intelligence, which was concerned with methods for improving the knowledge or performance of an intelligent agent over time. Knowledge Discovery and Data Mining (KDD) started as an offshoot of machine learning and focused on concerns raised by examining real-world applications.

Answer Business Questions with these Techniques
Who are the most profitable customers?
Is there really a difference between the profitable customers and the average customer?
But who really are these customers? Can I characterize them?
Will some particular new customer be profitable? How much revenue should I expect this customer to generate?

Summary
Data mining is a craft. As with many crafts, there is a well-defined process that can help to increase the likelihood of a successful result. We will refer back to the data mining process repeatedly throughout the book, showing how each fundamental concept fits in.

THE END