Eco 5385 Predictive Analytics For Economists Spring 2014

Slides:



Advertisements
Similar presentations
Data Mining vs. Statistics
Advertisements

Eco 5385 Predictive Analytics For Economists Spring 2014 Professor Tom Fomby Director, Richard B. Johnson Center for Economic Studies Department of Economics.
Chapter 1 Introduction 1. The Evolution of Data Analysis To Support Business Intelligence 2 Evolutionary Step Business Question Enabling Technologies.
Spreadsheet Modeling & Decision Analysis
© 2010 The McGraw-Hill Companies, Inc. Cost Behavior: Analysis and Use Chapter 5.
“I Don’t Need Enterprise Miner”
Introduction to Data Mining with XLMiner
1 DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES Emily Thomas Director of Planning and Institutional Research.
Chapter 9 Business Intelligence Systems
Decision Support Systems
1 Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science, 3e by Cliff Ragsdale.
Chapter 5 Data mining : A Closer Look.
Microsoft Enterprise Consortium Data Mining Concepts Introduction to Directed Data Mining: Decision Trees Prepared by David Douglas, University of ArkansasHosted.
Computer Science Universiteit Maastricht Institute for Knowledge and Agent Technology Data mining and the knowledge discovery process Summer Course 2005.
Beyond Opportunity; Enterprise Miner Ronalda Koster, Data Analyst.
Introduction to Directed Data Mining: Decision Trees
TURKISH STATISTICAL INSTITUTE INFORMATION TECHNOLOGIES DEPARTMENT (Muscat, Oman) DATA MINING.
Microsoft Enterprise Consortium Data Mining Concepts Introduction: The essential background Prepared by David Douglas, University of ArkansasHosted by.
Predictive Analytics in Customs Administration Duncan Cleary Fiscal Affairs Department – Revenue Administration International Monetary Fund WCO IT Conference.
1 Chapter 1: Introduction 1.1 Introduction to SAS Enterprise Miner.
Chapter 1: Introduction
Comparison of Classification Methods for Customer Attrition Analysis Xiaohua Hu, Ph.D. Drexel University Philadelphia, PA, 19104
Overview DM for Business Intelligence.
DATA MINING Team #1 Kristen Durst Mark Gillespie Banan Mandura University of DaytonMBA APR 09.
Understanding Data Analytics and Data Mining Introduction.
Introduction: The essential background
Anomaly detection with Bayesian networks Website: John Sandiford.
Dr. Russell Anderson Dr. Musa Jafar West Texas A&M University.
ANALYTICS BUSINESS INTELLIGENCE SOFTWARE STATISTICS Kreara Solutions | 9 years | 60 members | ISO 9001:2008.
The CRISP-DM Process Model
Overview of Data Mining Methods Data mining techniques What techniques do, examples, advantages & disadvantages.
1 Spreadsheet Modeling & Decision Analysis: A Practical Introduction to Management Science, 3e by Cliff Ragsdale.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Copyright © 2010, SAS Institute Inc. All rights reserved. Applied Analytics Using SAS ® Enterprise Miner™
Introduction to Clementine Tutors: Cecia Chan & Gabriel Fung Data Mining Tutorial.
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Some working definitions…. ‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably Data mining = –the discovery of interesting,
Neural Networks Automatic Model Building (Machine Learning) Artificial Intelligence.
The CRISP Data Mining Process. August 28, 2004Data Mining2 The Data Mining Process Business understanding Data evaluation Data preparation Modeling Evaluation.
XLMiner – a Data Mining Toolkit QuantLink Solutions Pvt. Ltd.
Copyright © 2012, SAS Institute Inc. All rights reserved. ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY,
MKT 700 Business Intelligence and Decision Models Algorithms and Customer Profiling (1)
1 STAT 5814 Statistical Data Mining. 2 Use of SAS Data Mining.
September 18-19, 2006 – Denver, Colorado Sponsored by the U.S. Department of Housing and Urban Development Conducting and interpreting multivariate analyses.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
BOĞAZİÇİ UNIVERSITY DEPARTMENT OF MANAGEMENT INFORMATION SYSTEMS MATLAB AS A DATA MINING ENVIRONMENT.
Business Intelligence/ Decision Models CRISP.
Analysis of Experiments
Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.
Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.
Artificial Neural Networks for Data Mining. Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall 6-2 Learning Objectives Understand the.
Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.
Corporate Credit Scoring Models. 2 Scoring Systems Qualitative (Subjective) Univariate (Accounting/Market Measures) Multivariate (Accounting/Market Measures)
When We Say Predictive Analytics, What Do We Mean? Professor Tom Fomby Director of the Richard B. Johnson Center for Economic Studies SMU
Managerial Decision Modeling 6 th edition Cliff T. Ragsdale.
Business Intelligence and Decision Support Systems (9 th Ed., Prentice Hall) Chapter 6: Artificial Neural Networks for Data Mining.
Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.
In an increasingly competitive industry is certified by a recognized provider as Microsoft exam will dramatically improve your chances busy. Microsoft.
DATA MODELING & PREPARATION Biz Pro 9 th Study Group.
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
XLMiner – a Data Mining Toolkit
Eco 6380 Predictive Analytics For Economists Spring 2016
Eco 6380 Predictive Analytics For Economists Spring 2014
Dr. Morgan C. Wang Department of Statistics
Eco 6380 Predictive Analytics For Economists Spring 2016
Eco 6380 Predictive Analytics For Economists Spring 2016
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Presentation transcript:

Eco 5385 Predictive Analytics For Economists Spring 2014 Professor Tom Fomby Director, Richard B. Johnson Center for Economic Studies Department of Economics SMU

Presentation 2 The Data Mining Process and Four Software Packages Chapter 2 in SPB

The Data Mining Process The CRISP_DM Paradigm (see CRISPWP_0800 Data Mining Standards.pdf; CRISP_DM.pdf; and CRISP_Wikipedia.pdf) CRISP_DM = Cross Industry Standard Process for Data Mining Steps: Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment See the CRISP_DM tool in IBM/SPSS Modeler 15.0 The SAS Institute in Cary, North Carolina has its own Data Mining Process Protocol. It is called SEMMA. For more on this protocol, see SAS_SEMMA.pdf

A Look at Four Data Mining Software Packages XLMINER – A very nice pedagogical software package to use. Part of your textbook. Add-in to Microsoft EXCEL. A Point and Click add-in that generates extra EXCEL spreadsheets when executing a procedure. IBM/SPSS Modeler 15.0 – More sophisticated than XLMINER. It uses a “stream flow” approach where modeling “nodes” (icons) are drawn from a palate and put on a “canvas.” Then the nodes are connected together to form a “stream” to conduct a task required by the user. This package is available in Apps.SMU or for download. See the instructions provided in the pdf file “INSTRUCTIONS FOR USING SPSS AND SAS WITH AppSMU_edu.pdf” for using SPSS Modeler 15 on VCL (Apps.SMU). If you want to download SPSS Modeler to your laptop or PC, you must have a valid SMU ID and follow the instructions in “Instructions for Downloading SPSS Modeler with Valid SMU ID.pdf”. SAS Enterprise Miner 7.1 Workstation – Currently not available on VCL but is available for download. Ask at the SMU OIT help desk for the instructions for downloading it to your laptop or PC with valid SMU ID. All of the three above software packages have sample data sets. Also the “Open Source” software R is becoming increasingly popular as a data mining software tool.

Some General Observations Most Computer Scientists and Engineers are not trained in Statistics Most Statisticians and Econometricians are not trained in Data Warehousing Techniques Most Offices of Information Technology are not using statistical methods to be forward-looking or proactive with respect to their customers and business operations Properly Reporting Predictive Analytics Results to a Lay Audience is crucial in getting buy-in and utilization of analytics results in company operations Many Technical People are not well schooled in presentation skills

Successful Implementation of Predictive Analytics into company operations requires a combination of three Basic Core Competencies Predictive Analytics Reporting Data Warehousing

To Operate Core Competencies Data Warehousing: D-base, Oracle, etc. Predictive Analytics: SPSS Modeler and Other Statistical Packages Reporting – Cognos for Dashboards and Microsoft PowerPoint

Prerequisite Skills for a Skilled Analytics Person Predictive Analytics Essential Tools for GLM Time Series Analysis Applied Multivariate Analysis Machine Learning Tools Computational Skills Regression Multiple

Some Specifics on Skills Multiple Linear Regression (OLS, WLS, Time Series Regressions) Generalized Linear Modeling (Probit, Logit, Multinomial Logit/Probit, Count, Cox Proportional Hazard models) Time Series Modeling Expertise (Seasonal Adjustment, Box-Jenkins, Exponential Smoothing, Vector Autoregressions) Applied Multivariate Statistical Analysis (Clustering, Principal Components, Discriminant Analysis) Training in Machine Learning Tools (CART, CHAID, SVM, ANN, K-Nearest-Neighbors, Association Rules) Computer Usage and Programming Skills (SPSS Modeler, SAS Enterprise Miner, Matlab, Mathematica, R, STATA, EVIEWS)

Core Tasks of Predictive Analytics Prediction of Numeric Targets Prediction of Categorical Targets Scoring of New Data Continual Supervision of Model Performance Supervised Learning Treatment of Missing Observations Treatment of Outliers Data Segmentation Reduction of Dimension of Input Space Unsupervised Univariate Plots Box-Plots Matrix Plots Heat and Spatial Maps EDA

A Bond Rating Problem In this problem imagine yourself as a Bond Rating Analyst working for BondRate, Inc., a National Bond Rating Company. Given the financials of a company that is about to issue a corporate bond, you are to rate its bond with a rating of AAA (highest rating), AA, A, BBB, BB, B, or C (lowest rating) depending on the probability that the company will not be “financially stressed” in the next 12 months. In our rating system, if the company has a probability between 0.0 and 0.05 of being distressed in the next 12 months, the firm’s bond is rated AAA. If the company has a probability between 0.05 and 0.10 of being distressed in the next 12 months, the firm’s bond is rated AA. The ranges for the other ratings are A = (0.10 – 0.15), BBB = (0.15 – 0.20), BB = (0.20 – 0.25), B = (0.25 – 0.30), and C = (0.30 and above). Target Variable: Y = 0 if firm does not become “distressed” in the next 12 months, Y = 1 if firm becomes distressed in the next 12 months Input Variables include (next slide)

Input Variables Measured 12 Months Prior tdta = "Debt to Assets" gempl = "Employee Growth Rate" opita = "Op. Income to Assets" invsls = "Inventory to Sales" lsls = "Log of Sales" lta = "Log of Assets" nwcta = "Net Working Cap to Assets" cacl = "Current Assets to Current Liab" qacl = "Quick Assets to Current Liab" ebita = "EBIT to Assets" reta = "Retained Earnings to Assets" ltdta = "LongTerm Debt to TotAssets" mveltd = "Mkt Value Eqty to LTD" fata = "Fixed Assets to Assets";

A Typical SPSS Modeler Stream

An Artificial Neural Network

A CHAID Tree

Now Review the Operation of XLMINER Software