MIS2502: Data Analytics Introduction to Advanced Analytics and R

Slides:



Advertisements
Similar presentations
By Klejdi Muca & Stephen Quinn. A method used by companies like IMDB or Netlfix to turn raw data into useful information, for example It helps companies.
Advertisements

Supporting End-User Access
Chapter 9 Business Intelligence Systems
Database – Part 2b Dr. V.T. Raja Oregon State University External References/Sources: Data Warehousing – Sakthi Angappamudali at Standard Insurance; BI.
Data Mining.
Data Mining By Archana Ketkar.
CS157A Spring 05 Data Mining Professor Sin-Min Lee.
Data mining By Aung Oo.
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.
Business Intelligence. Topics Chart Online Analytical Process, OLAP – Excel’s Pivot table – Data visualization with dashboard Data warehousing Data Mining.
1 REVIEW LEARNING OUTCOME Customer Relationship Management LO I.
SharePoint 2010 Business Intelligence Module 6: Analysis Services.
Business Intelligence, Data Mining and Data Analytics/Predictive Analytics By: Asela Thomason IS 495 Summer 2015.
Understanding Your Customer Jill Beasant. To look at ways to improve your business in terms of The products you stock The promotions you run The environment.
MD240 - MIS Oct. 4, 2005 Databases & the Data Asset Harrah’s & Allstate Cases.
Customer Relationship Management Key Concepts. Customer Relationship Management Strategy Link all processes of the company from its customers through.
Data Mining CS157B Fall 04 Professor Lee By Yanhua Xue.
INTRODUCTION TO DATA MINING MIS2502 Data Analytics.
Chapter 19Copyright ©2008 by South-Western, a division of Thomson Learning. All rights reserved 1 MKTG Designed by Amy McGuire, B-books, Ltd. Prepared.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Final Exam Review. The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides.
Succeeding with Technology Database Systems Basic Data Management Concepts Organizing Data in a Database Database Management Systems Using Database Systems.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
Introduction to SQL Server Data Mining Nick Ward SQL Server & BI Product Specialist Microsoft Australia Nick Ward SQL Server & BI Product Specialist Microsoft.
 Fundamentally, data mining is about processing data and identifying patterns and trends in that information so that you can decide or judge.  Data.
Fox MIS Spring 2011 Data Mining Week 9 Introduction to Data Mining.
Data Warehousing.
CS157B Fall 04 Introduction to Data Mining Chapter 22.3 Professor Lee Yu, Jianji (Joseph)
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
Business Solutions. Agenda Overview Business Solutions Benefits Company Summary.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
What is Data Mining? process of finding correlations or patterns among dozens of fields in large relational databases process of finding correlations or.
Chapter 3 Databases and Data Warehouses: Building Business Intelligence Copyright © 2010 by the McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin.
Business Intelligence - 2 BUS 782. Topics Data warehousing Data Mining.
Business Intelligence. Topics Chart Online Analytical Process, OLAP – Excel’s Pivot table – Data visualization with dashboard Scenario Management Data.
DATA MINING Using Association Rules by Andrew Williamson.
MIS2502: Data Analytics Advanced Analytics - Introduction.
DATA MINING PREPARED BY RAJNIKANT MODI REFERENCE:DOUG ALEXANDER.
Data Mining. Overview the extraction of hidden predictive information from large databases Data mining tools predict future trends and behaviors, allowing.
MIS2502: Data Analytics Introduction to Advanced Analytics and R.
Chapter 3 Building Business Intelligence Chapter 3 DATABASES AND DATA WAREHOUSES Building Business Intelligence 6/22/2016 1Management Information Systems.
Scatterplots & Correlations Chapter 4. What we are going to cover Explanatory (Independent) and Response (Dependent) variables Displaying relationships.
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Oracle Advanced Analytics
Jaclyn Hansberry MIS2502: Data Analytics The Things You Can Do With Data The Information Architecture of an Organization Jaclyn.
Data Mining Functionalities
Data Mining.
AP CSP: Cleaning Data & Creating Summary Tables
Operation Data Analysis Hints and Guidelines
Data Based Decision Making
MIS2502: Data Analytics Advanced Analytics - Introduction
19 MKTG CHAPTER Lamb, Hair, McDaniel
Fundamentals of Information Systems
MIS5101: Data Analytics Advanced Analytics - Introduction
Exam #3 Review Zuyin (Alvin) Zheng.
Data Warehousing and Data Mining
MIS2502: Data Analytics The Information Architecture of an Organization Acknowledgement: David Schuff.
Supporting End-User Access
MIS2502: Data Analytics Introduction to Advanced Analytics and R
MIS2502: Data Analytics Introduction to R and RStudio
MIS2502: Data Analytics Introduction to Advanced Analytics
MIS2502: Data Analytics Advanced Analytics Using R
Customer Relationship Management
Kenneth C. Laudon & Jane P. Laudon
Lecture One Data Copyright © 2012 Pearson Education. All rights reserved.
Welcome! Knowledge Discovery and Data Mining
Operation Data Analysis Hints and Guidelines
Presentation transcript:

MIS2502: Data Analytics Introduction to Advanced Analytics and R Jaclyn Hansberry jaclyn.hansberry@temple.edu

The Information Architecture of an Organization Now we’re here… Data entry Transactional Database Data extraction Analytical Data Store Data analysis Stores real-time transactional data Stores historical transactional and summary data

The difference between OLAP and data mining OLAP can tell you what is happening, or what has happened Analytical Data Store …like a pivot table Data mining can tell you why it is happening, and help predict what will happen The (dimensional) data warehouse feed both… …like what we’ll do with R

Data Mining and Predictive Analytics is Extraction of implicit, previously unknown, and potentially useful information from data Exploration and analysis of large data sets to discover meaningful patterns

What data mining is not… How do sales compare in two different stores in the same state? Sales analysis Which product lines are the highest revenue producers this year? Profitability analysis Did salesperson X meet this quarter’s target? Sales force analysis If these aren’t data mining examples, then what are they ?

Example: Smarter Customer Retention Consider a marketing manager for a brokerage company Problem: High churn (customers leave) Customers get an average reward of $160 to open an account 40% of customers leave after the 6 month introductory period Giving incentives to everyone who might leave is expensive Getting a customer back after they leave is expensive

Answer: Not all customers have the same value One month before the end of the introductory period, predict which customers will leave Offer those customers something based on their future value Ignore the ones that are not predicted to churn

Three Analytics Tasks We Will Be Doing in this Class Decision Tree Induction Clustering Association Rule Mining

Decision Trees Used to classify data according to a pre-defined outcome Based on characteristics of that data http://www.mindtoss.com/2010/01/25/five-second-rule-decision-chart/ Uses Predict whether a customer should receive a loan Flag a credit card charge as legitimate Determine whether an investment will pay off

Uses Clustering Used to determine distinct groups of data Based on data across multiple dimensions Uses Customer segmentation Identifying patient care groups Performance of business sectors http://www.datadrivesmedia.com/two-ways-performance-increases-targeting-precision-and-response-rates/

Uses Association Mining Find out which events predict the occurrence of other events Often used to see which products are bought together Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns

Introduction to R and RStudio

Software development platform and language Open source Many, many, many statistical add-on “packages” that perform data analysis Integrated Development Environment for R Nicer interface that makes R easier to use Requires R to run

The Basics: Calculations and Variables R will do math for you R has variables <- and = do the same thing rm() removes the variable from memory Type commands into R and it will give you an answer x, y, and z are objects that can be manipulated

Arrays of values Called a vector or collection c(), min(), max(), and sort() are functions functions accept parameters and return a value note that sort() puts the scores in order but doesn’t change the original collection

Simple statistics with R You can get descriptive statistics from a collection

Reading from a file Usually you won’t type in data manually, you’ll get it from a file Example: 2009 Baseball Statistics http://www2.stetson.edu/ ~jrasp/data.htm reads data from CSV file and creates collections using the headers and the data reference a collection using datasetname$variablename

Looking for differences across groups: The setup We want to know if National League (NL) teams scored more runs than American League (AL) Teams And if that difference is statistically significant To do this, we need a package (add-on) that will do this analysis In this case, it’s the “psych” package Downloads and installs the package (once per R installation)

Looking for differences across groups: The analysis Descriptive statistics, broken up by group (League) Results of t-test for differences in Runs by League)

Plotting data plot() title() first parameter – x data values second parameter – y data values xlab parameter – label for x axis ylab parameter – label for y axis title() sets title for chart

Drawing a regression (trend) line Calculates the regression line (lm()) And plots the line (abline())

But is the correlation statistically significant? So we can say: “Teams with a better overall batting average tend to have a better winning percentage.” “medium” strength correlation strongly statistically significant

Running this analysis as a script Commands can be entered one at a time, but usually they are all put into a single file that can be saved and run over and over again.