Data mining and statistical learning, lecture 1b


Outline
- The five pillars of data mining
- Supervised and unsupervised learning

Data mining: the process of selecting, exploring, modifying, modeling, and assessing large amounts of data to uncover previously unknown patterns

SEMMA
- Sample the data by creating one or more data tables
- Explore the data by searching for: (i) anticipated relationships and trends; (ii) unanticipated relationships and trends; (iii) anomalies
- Modify the data by transforming variables and combining existing variables into new variables
- Model the data by searching for a combination of the data that reliably predicts a desired outcome
- Assess the data by evaluating the usefulness and reliability of the findings from the data mining process

Sample the data and create data tables
- Cases and variables
- Objects and attributes

Examine anticipated relationships: electricity consumption and temperature

Examine the presence of outliers: total nitrogen concentrations in Swedish rivers determined by two different methods
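
One possible way to screen paired measurements for outliers is sketched below. It is an illustration only: the concentrations are made up (they are not the river data from the lecture), and the robust score based on the median absolute deviation (MAD) with a cut-off of 3.5 is an assumed choice, not a method named on the slide.

```python
from statistics import median

def flag_outliers(method_a, method_b, threshold=3.5):
    """Return the indices of pairs whose difference is unusually large.

    The differences are compared with their own median, scaled by the
    median absolute deviation (MAD), so a few extreme pairs do not
    distort the yardstick they are measured against.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    centre = median(diffs)
    mad = median(abs(d - centre) for d in diffs)
    if mad == 0:
        # Degenerate case: at least half of the differences are identical
        return [i for i, d in enumerate(diffs) if d != centre]
    # 0.6745 rescales the MAD so that the score is comparable to a z-score
    return [i for i, d in enumerate(diffs)
            if 0.6745 * abs(d - centre) / mad > threshold]

# Made-up concentrations for seven samples (hypothetical values)
method_a = [1.10, 2.05, 0.95, 3.20, 1.80, 2.60, 1.40]
method_b = [1.05, 2.10, 1.00, 3.15, 0.40, 2.55, 1.45]

outliers = flag_outliers(method_a, method_b)
print(outliers)
```

In this toy data only the fifth pair (index 4) disagrees strongly between the two methods, so it is the only one flagged.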

Modifying inputs
- Transforming inputs or outputs
- Combining existing variables into new variables:
  - Aggregating inputs
  - Reducing the dimension of the inputs
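
A minimal sketch of three of these modifications, using made-up daily records. The variable names, the base temperature of 17 °C and all the numbers are assumptions for illustration; dimension reduction is not shown.

```python
import math

# Made-up daily records: (temperature in °C, electricity consumption in MWh)
daily = [(-5.0, 820.0), (-2.0, 760.0), (1.0, 700.0), (4.0, 655.0),
         (6.0, 610.0), (9.0, 580.0), (12.0, 540.0)]

# Transforming an output: work with consumption on the log scale
log_consumption = [math.log(c) for _, c in daily]

# Transforming an input: heating degrees below an assumed base temperature
BASE_TEMP = 17.0
heating_degrees = [max(0.0, BASE_TEMP - t) for t, _ in daily]

# Combining existing variables into a new one: consumption per heating degree
consumption_per_degree = [c / h
                          for (_, c), h in zip(daily, heating_degrees)
                          if h > 0]

# Aggregating inputs: one weekly mean temperature instead of seven daily values
weekly_mean_temp = sum(t for t, _ in daily) / len(daily)

print(round(weekly_mean_temp, 2), heating_degrees[0])
```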

Model selection: credit scoring
Candidate predictors: age, sex, income, marital status, education, savings, loans, payment records, house ownership
Subset selection aims to produce a model that is interpretable and may have lower prediction error
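
Subset selection can be sketched as an exhaustive search over predictor subsets. The version below is a toy under stated assumptions: it fits each subset by ordinary least squares, scores it on a separate validation set, and prefers the smaller subset when errors tie. The three predictor names are borrowed from the list above, but the numbers and the response are invented, and the lecture does not say which selection criterion it uses.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def mse(coef, rows, y):
    """Mean squared prediction error of a fitted linear model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def columns(rows, subset):
    """Keep only the columns listed in subset."""
    return [[r[j] for j in subset] for r in rows]

def best_subset(train_x, train_y, valid_x, valid_y, names):
    """Return (validation error, predictor names) of the best subset."""
    best = None
    for k in range(1, len(names) + 1):          # smaller subsets are tried first
        for subset in combinations(range(len(names)), k):
            coef = fit(columns(train_x, subset), train_y)
            err = mse(coef, columns(valid_x, subset), valid_y)
            # A larger subset must be clearly better to replace a smaller one
            if best is None or err < best[0] - 1e-12:
                best = (err, [names[j] for j in subset])
    return best

# Invented data: columns are Age, Income, Savings; the response depends on Income only
names = ["Age", "Income", "Savings"]
train_x = [[2, 1, 5], [3, 4, 1], [5, 2, 4], [1, 3, 2], [4, 5, 3], [6, 0, 6]]
train_y = [3, 9, 5, 7, 11, 1]
valid_x = [[3, 2, 2], [1, 5, 4], [5, 1, 1], [2, 4, 6]]
valid_y = [5, 11, 3, 9]

best = best_subset(train_x, train_y, valid_x, valid_y, names)
print(best[1])
```

With nine candidate predictors, as in the credit scoring example, the same search would have to visit 511 non-empty subsets, which is why stepwise alternatives are often used in practice.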

Bias, variance and model complexity
[Figure: prediction error against model complexity (low to high), with one curve for the training sample and one for the test sample. Low complexity: high bias, low variance. High complexity: low bias, high variance.]
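
The trade-off can be explored with a small simulation. The sketch below uses k-nearest-neighbour regression as a stand-in model, with small k playing the role of high complexity and large k the role of low complexity. The sine-shaped truth, the noise level and the sample sizes are all assumptions chosen for illustration.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the outcomes of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train, data, k):
    """Mean squared error on data of a k-NN model built from train."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def simulate(n, rng):
    """Draw n points from an assumed truth y = sin(x) plus Gaussian noise."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 6.0)
        points.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))
    return points

rng = random.Random(0)
train = simulate(60, rng)
test = simulate(200, rng)

# Small k = flexible, complex model; large k = rigid, simple model
errors = {k: (mean_squared_error(train, train, k),
              mean_squared_error(train, test, k))
          for k in (1, 5, 15, 60)}

for k, (train_err, test_err) in errors.items():
    print(f"k={k:2d}  training error={train_err:.3f}  test error={test_err:.3f}")
```

With k = 1 every training point is its own nearest neighbour, so the training error is exactly zero even though the test error is not. With k = 60 the model predicts the overall training mean everywhere, which is the high-bias end of the scale.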

Supervised learning (prediction, classification)
- We have a training set of data, in which we observe the outcome and feature measurements for a set of objects
- Using these data we build a prediction model, or learner, which will enable us to predict the outcome for new, unseen objects
Unsupervised learning (association analysis, clustering)
- We observe only the features and have no measurements of the outcome
- Our task is to describe how the data are organized and clustered
Source: Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning
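
The difference can be made concrete with one feature and six objects. In the sketch below, the supervised part uses observed labels to build a nearest-centroid classifier, while the unsupervised part sees only the feature values and looks for two groups with a simple two-means iteration. Both methods, the labels "low" and "high", and the numbers are assumptions chosen for illustration.

```python
def centroids_from_labels(features, labels):
    """Supervised: the outcome is observed, so average the feature per class."""
    groups = {}
    for x, label in zip(features, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def classify(centroids, x):
    """Predict the class whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def two_means(features, iterations=20):
    """Unsupervised: no outcome is available, so search for two clusters.

    Assumes the features contain at least two distinct values.
    """
    low, high = min(features), max(features)
    for _ in range(iterations):
        a = [x for x in features if abs(x - low) <= abs(x - high)]
        b = [x for x in features if abs(x - low) > abs(x - high)]
        low, high = sum(a) / len(a), sum(b) / len(b)
    return low, high

features = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]                # made-up measurements
labels = ["low", "low", "low", "high", "high", "high"]    # hypothetical outcomes

centroids = centroids_from_labels(features, labels)
prediction = classify(centroids, 4.0)      # outcome predicted for a new object
cluster_centres = two_means(features)      # structure found without any labels

print(prediction, cluster_centres)
```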

Statistical learning problems: some examples
Supervised learning (prediction, classification)
- Predict tomorrow’s electricity consumption from weather forecasts and calendar records (season, weekday, holiday)
- Identify the numbers in a handwritten ZIP code from a digitized image
Unsupervised learning (association analysis)
- Identify buying patterns that can be used to design sales promotions

Supervised learning: statistical terminology
Prediction of one or more outputs using observations of one or more inputs
Statistical terminology:
- Inputs = predictors, independent variables, explanatory variables
- Outputs = responses, dependent variables

Naming convention
- Regression: prediction of quantitative outputs using one or more inputs
- Classification: prediction of qualitative outputs using observations of one or more inputs

Prediction by learning from data
- Assume that we have a data set which shows the outcome (response) y for a set of investigated objects with features x1, …, xp
- Prediction by learning from data implies that we derive a function that can be used to foresee the outcome for new objects (with known or observed features)
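
As a minimal sketch of "deriving a function from data", the code below fits a straight line by least squares to a single feature (p = 1) and returns the fitted function, which can then be applied to new objects. The choice of a straight line and the made-up temperature and consumption values are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Learn y ≈ intercept + slope * x by least squares and return the predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    def predict(x_new):
        """Foresee the outcome for a new object with feature x_new."""
        return intercept + slope * x_new

    return predict

# Made-up training data: temperature (feature x) and consumption (outcome y)
temperature = [0.0, 5.0, 10.0, 15.0, 20.0]
consumption = [100.0, 90.0, 80.0, 70.0, 60.0]

predict = fit_line(temperature, consumption)
forecast = predict(25.0)   # outcome for a new, unseen temperature
print(forecast)
```

Because the toy data lie exactly on the line y = 100 - 2x, the forecast at 25 is 50. Real data would scatter around the fitted line.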

Some major types of quantitative prediction models
- Linear or nonlinear regression models with i.i.d. error terms
- Time series regression models with stochastic noise
- Transfer function models
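
For orientation, common textbook forms of these three model types are written out below. The notation is an assumption, since the slide gives no formulas: the regression model is shown in its linear form, the time series noise is shown as a first-order autoregression as one example, and the transfer function model uses Box-Jenkins notation with delay b.

```latex
% Regression model with i.i.d. error terms (linear case)
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad \varepsilon_i \ \text{i.i.d.},\quad
\mathrm{E}(\varepsilon_i) = 0,\quad \mathrm{Var}(\varepsilon_i) = \sigma^2

% Time series regression with stochastic (autocorrelated) noise, AR(1) as an example
y_t = \beta_0 + \sum_{j=1}^{p} \beta_j x_{tj} + \eta_t,
\qquad \eta_t = \phi\,\eta_{t-1} + \varepsilon_t

% Transfer function model, with B the backshift operator (B x_t = x_{t-1})
y_t = \frac{\omega(B)}{\delta(B)}\, x_{t-b} + \eta_t
```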