Data Analysis: Learning from Data

Data Analysis: Learning from Data
- The traditional methodology is statistics: what can we learn from analysis of historical data? Building up models.
- Relationship to optimization:
  - Objective: find the "best" functional relationship that underlies the data.
  - Constraints: the underlying function must have some limits (we can always match the data exactly, but that is not meaningful).
- Traditional examples:
  - Curve fitting: linear regression minimizes the square error given a polynomial function of the input variables.
  - Classification: minimize the number of classification errors given a decision rule constrained to a polynomial or a simple linear threshold.
  - Model identification: determine the relevant parameters in a given dynamic model to minimize error.
- Modern examples emphasize cases where the underlying process is poorly understood: data mining.

Exploratory Analysis: Simple Univariate Measures
- Measures of central tendency
- Measures of variation
- Measures of similarity
- We have to start with statistics.

Exploratory Analysis: Simple Multivariate Measures
- Mean and variance
- Correlation
- Independence (implies a correlation of r = 0)
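
As a minimal sketch of these measures, the following Python/NumPy snippet computes them for the five (x, y) points used in the regression example later in the deck (the choice of data is just for illustration):

    import numpy as np

    x = np.array([1.0, 8.0, 11.0, 4.0, 3.0])
    y = np.array([3.0, 9.0, 11.0, 5.0, 2.0])

    print("mean of x:     ", x.mean())                  # central tendency
    print("median of x:   ", np.median(x))
    print("variance of x: ", x.var(ddof=1))             # variation (sample variance)
    print("covariance:\n", np.cov(x, y))                # multivariate spread
    print("correlation r: ", np.corrcoef(x, y)[0, 1])   # similarity / dependence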

Exploratory Analysis: ANOVA (Analysis of Variance)
- A hypothesis test that essentially determines which input variables are significant.
- The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true, i.e., that the finding is the result of chance alone (the threshold is typically 5%).
- Based on the χ² (chi-square) distribution.
- Many more involved statistical tests exist, e.g., the Kruskal-Wallis test for K independent samples.
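
A minimal sketch of the two tests mentioned above, using SciPy on three hypothetical sample groups (the data are made up purely for illustration):

    import numpy as np
    from scipy import stats

    # Three hypothetical groups of measurements.
    g1 = np.array([4.1, 4.8, 5.0, 4.4, 4.9])
    g2 = np.array([5.6, 5.9, 6.1, 5.4, 6.0])
    g3 = np.array([4.3, 4.5, 4.7, 4.2, 4.6])

    f_stat, p_anova = stats.f_oneway(g1, g2, g3)   # parametric one-way ANOVA
    h_stat, p_kw = stats.kruskal(g1, g2, g3)       # nonparametric Kruskal-Wallis

    print(f"ANOVA:          F = {f_stat:.2f}, p = {p_anova:.4f}")
    print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")
    # A p-value below the chosen threshold (typically 0.05) indicates the
    # groups are unlikely to share the same mean (or distribution).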

Simple Models from Data: Regression
- The least-squares fit is the maximum likelihood estimate, computed via the pseudo-inverse.
- The fitted function can be nonlinear in the inputs while still minimizing the square error, as long as it stays linear in the coefficients.
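
A minimal sketch of the pseudo-inverse estimate, with a model that is nonlinear in the input (a quadratic) but linear in the coefficients; the data are hypothetical:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.1, 1.9, 4.8, 9.7, 17.2, 26.1])    # roughly y = 1 + x**2

    # Design matrix with columns [1, x, x**2]: nonlinear in x, linear in c.
    A = np.column_stack([np.ones_like(x), x, x**2])

    # Least-squares / maximum likelihood estimate via the Moore-Penrose
    # pseudo-inverse: c = pinv(A) @ y minimizes ||A c - y||^2.
    c = np.linalg.pinv(A) @ y
    print("coefficients [c0, c1, c2]:", np.round(c, 3))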

Simple Regression Example
- Data points: (1, 3); (8, 9); (11, 11); (4, 5); (3, 2).
- Find the two coefficients of a straight-line approximation y = a + b x.
- Several examples are adapted from Data Mining by Kantardzic.
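
A minimal sketch of the computation for this example; the closed-form least-squares formulas give roughly y = 1.03 + 0.92 x:

    import numpy as np

    x = np.array([1.0, 8.0, 11.0, 4.0, 3.0])
    y = np.array([3.0, 9.0, 11.0, 5.0, 2.0])
    n = len(x)

    # Closed-form coefficients for the straight line y = a + b*x.
    b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    a = (np.sum(y) - b * np.sum(x)) / n

    print(f"y = {a:.2f} + {b:.2f} x")    # prints: y = 1.03 + 0.92 x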

Linear Regression Comments
- Weighting can account for better-quality data (usually weighting by the inverse of the variance).
- Every data point gets a vote, so the fit is sensitive to outliers; a least-absolute-value fit minimizes the impact of outliers, as discussed last week.
- The least-squares fit is the maximum likelihood estimate only for normally distributed errors.
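
A minimal sketch of inverse-variance weighting: points with a smaller assumed noise standard deviation get a larger vote. The data and per-point noise levels below are hypothetical:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.7])
    sigma = np.array([0.1, 0.1, 0.5, 0.1, 1.0])   # assumed noise std. dev. per point

    A = np.column_stack([np.ones_like(x), x])      # model y = a + b*x
    W = np.diag(1.0 / sigma**2)                    # inverse-variance weights

    # Weighted normal equations: (A^T W A) c = A^T W y
    c = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    print("intercept, slope:", np.round(c, 3))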

Preprocessing of Data: Bad Data and Detection of Outliers
- Least-absolute-value methods
- Least-square-error methods
- Bisquare: an iterative method to drop outliers
- [Slide shows a figure from National Instruments comparing the resulting curve fits]
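
A rough sketch of the bisquare idea as iteratively reweighted least squares (Tukey biweight); the data set with one deliberate outlier and the tuning constant 4.685 are assumptions for illustration:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([1.1, 2.0, 2.9, 4.2, 15.0, 6.1])   # 15.0 is a deliberate outlier

    A = np.column_stack([np.ones_like(x), x])        # straight-line model
    w = np.ones_like(x)                              # start from ordinary least squares

    for _ in range(20):                              # iteratively reweighted LS
        W = np.diag(w)
        c = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        r = y - A @ c                                # residuals
        s = np.median(np.abs(r)) / 0.6745 + 1e-12    # robust scale estimate
        u = r / (4.685 * s)                          # common bisquare tuning constant
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)   # bisquare weights

    print("robust intercept, slope:", np.round(c, 3))   # the outlier is largely ignored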

Preprocessing of Data: Data Reduction / Transformation
- Correlation coefficients with the dependent variable: essentially significance tests for different models.
- Principal component analysis: a transformation of the variables based on variance.
- [Slide shows a figure from Mathworks]
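
A minimal sketch of the correlation-based screening step on three hypothetical candidate inputs:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)                       # strongly related to y
    x2 = rng.normal(size=n)                       # weakly related to y
    x3 = rng.normal(size=n)                       # pure noise
    y = 3.0 * x1 + 0.3 * x2 + rng.normal(scale=0.5, size=n)

    for name, xi in [("x1", x1), ("x2", x2), ("x3", x3)]:
        r = np.corrcoef(xi, y)[0, 1]
        print(f"corr({name}, y) = {r:+.2f}")
    # Inputs with |r| near zero are candidates to drop before fitting a model.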

Preprocessing of Data: Principal Component Analysis
- Find a linear projection onto a unit vector u that has maximum variance.
- Assume a zero-mean data set x with covariance matrix C = E[x x^T]. Let z = u^T x; then the variance of the projected data is Var(z) = u^T C u.
- Maximize u^T C u subject to u^T u = 1: form the Lagrangian L(u, λ) = u^T C u - λ (u^T u - 1).
- Applying the Kuhn-Tucker conditions gives C u = λ u, so u is an eigenvector of C and λ an eigenvalue; the maximum variance equals the largest eigenvalue of C.

Preprocessing of Data: Principal Component Analysis (continued)
- Compute the covariance matrix of the independent variables.
- The eigenvectors corresponding to the largest eigenvalues form the transformation.
- The "principal axes" to keep can be selected by a ratio test on the eigenvalues (fraction of total variance retained).

Preprocessing of Data: PCA Example
- [Slide shows the covariance matrix and its ordered eigenvalues]
- The ratio R when selecting the first two eigenvalues is 95%, i.e., the first two principal axes retain 95% of the total variance.
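
A minimal sketch of the whole PCA recipe on a made-up three-variable data set, ending with the ratio test described above:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    t = rng.normal(size=n)                          # one hidden driving factor
    X = np.column_stack([t + 0.1 * rng.normal(size=n),
                         2.0 * t + 0.1 * rng.normal(size=n),
                         rng.normal(size=n)])       # third variable is noise

    Xc = X - X.mean(axis=0)                         # zero-mean the data
    C = np.cov(Xc, rowvar=False)                    # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)            # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    R = eigvals[:2].sum() / eigvals.sum()           # ratio test for the first two axes
    print("ordered eigenvalues:", np.round(eigvals, 3))
    print(f"variance retained by first two components: {R:.1%}")

    Z = Xc @ eigvecs[:, :2]                         # data projected onto the principal axes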

Bayesian Analysis: Incorporating Prior Information
- Assume some information is already known in the form of a conditional probability. With H the hypothesis and X the observation, Bayes' rule gives P(H | X) = P(X | H) P(H) / P(X).
- Example classification rule: given a sample X, what is the probability it belongs to class Ci? Choose the class maximizing P(Ci | X), which is proportional to P(X | Ci) P(Ci).
- Assuming the attributes x_t of X are independent: P(X | Ci) = P(x_1 | Ci) P(x_2 | Ci) ... P(x_t | Ci).
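
A minimal sketch of this naive Bayes rule on a small hypothetical data set (the attribute values and classes below are made up, not the table from the next slide):

    import numpy as np

    # Hypothetical training samples: columns are x1, x2, x3, class.
    data = np.array([
        [1, 2, 1, 1],
        [1, 2, 2, 2],
        [2, 1, 2, 2],
        [1, 1, 1, 1],
        [2, 2, 2, 2],
        [1, 1, 2, 1],
        [2, 2, 1, 2],
    ])
    X, c = data[:, :3], data[:, 3]
    x_new = np.array([1, 2, 2])                    # sample to classify

    for ci in np.unique(c):
        rows = X[c == ci]
        prior = np.mean(c == ci)                   # P(Ci)
        # Product of per-attribute conditionals P(x_t | Ci).
        likelihood = np.prod([np.mean(rows[:, t] == x_new[t]) for t in range(3)])
        print(f"class {ci}: P(Ci) * prod_t P(x_t | Ci) = {prior * likelihood:.4f}")
    # The class with the largest (unnormalized) posterior is selected.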

Bayesian Analysis: Example
- Training data: seven samples with attributes X1, X2, X3 and class label C (the table of values is shown on the slide).
- Estimate the prior probabilities P(C) and the conditional probabilities P(Xi | C) from the given samples.
- For the new sample Xnew = (1, 2, 2), compare P(C) multiplied by the product of the P(Xi = xi | C) across classes; the conclusion is class C = 2.

Decision Trees
- Classify data with a series of classification/decision rules. [Slide shows an example tree with tests X1 > 0, X2 > 0, X1 < 2 leading to classes C1, C2, C3.]
- The typical "best rule" criterion: maximize the information gain (equivalently, minimize the resulting entropy).
- Entropy: H = -Σ p_i log2 p_i over the class probabilities p_i.
- Information gain: weight the entropy of each new subset formed by a decision over some attribute by the number (or probability) of samples in it, and compare against the entropy before the split.
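
A minimal sketch of the entropy and information-gain calculations, comparing two hypothetical candidate splits:

    import numpy as np

    def entropy(labels):
        """H = -sum_i p_i log2 p_i over the class frequencies in labels."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(labels, split_mask):
        """Entropy reduction from splitting labels on a boolean attribute test."""
        n = len(labels)
        left, right = labels[split_mask], labels[~split_mask]
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(labels) - weighted

    # Hypothetical class labels and two candidate attribute tests.
    y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
    test_a = np.array([True, True, True, False, False, False, True, True])
    test_b = np.array([True, False, True, False, True, False, True, False])

    print("gain of test A:", round(information_gain(y, test_a), 3))   # about 0.55
    print("gain of test B:", round(information_gain(y, test_b), 3))   # about 0.19
    # The test with the larger gain would be chosen as the split at this node.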

Decision Tree Example
- Data: samples with a categorical attribute X1 (values A, B, ...), a numeric attribute X2, a Boolean attribute X3 (True/False), and class C (the table of values is shown on the slide).
- Frequencies of occurrence supply the probabilities; compute the initial entropy of the class labels.
- Compare the weighted entropy after splitting on X1 with that after splitting on X3; X1 gives the larger information gain and is the better choice.

Some Motivation
- Increasingly large amounts of data are being gathered.
- There is insufficient time to perform detailed analysis and develop precise models.
- Operation of the power system is information-driven rather than "signal"-driven.
- It is not possible to derive models from first principles (economics vs. physics).

Other Data-Driven Methods
- Linear methods tend to work well only for a narrow range of inputs.
- The methods discussed so far tend to work best under certain statistical properties.
- We need some robustness with respect to noisy inputs.
- Other approaches:
  - Support Vector Machines (already covered)
  - Artificial Neural Networks (we will do the simplest version)

What is Data Mining?
- A textbook definition: "Data mining is the process of selection, exploration and modeling of large quantities of data to discover regularities or relations that are at first unknown with the aim of obtaining clear and useful results for the owner of the database." (Applied Data Mining by Paolo Giudici)
- My take: data mining concerns a wide variety of techniques useful for analyzing large data sets or for gathering information based primarily on data, not on predefined models.

Some Other Thoughts on Data Mining
- Massive amounts of data are not being analyzed, and the amount will increase with the Smart Grid:
  - Both operational and non-operational data
  - Within a utility
  - Across different companies
- Importance of communication systems: decentralized and robust, but the information still has to get to where it is needed.

Problems with Data Mining
- Nonlinear models are particularly susceptible to erroneous conclusions (overfitting); see the sketch after this list.
- A relation can always be found in the data even when there is no underlying model (e.g., the Super Bowl winner "predicts" the stock market).
- Preconceptions can always be reinforced if one searches long enough (i.e., most political discourse).
- Increasing the amount of data (particularly unfiltered data) increases the likelihood of spurious relationships (see the WWW).
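
As a small concrete illustration of the overfitting point, the sketch below fits both a straight line and a degree-4 polynomial to the five points from the earlier regression example; the polynomial matches the data exactly but extrapolates wildly:

    import numpy as np

    x = np.array([1.0, 8.0, 11.0, 4.0, 3.0])
    y = np.array([3.0, 9.0, 11.0, 5.0, 2.0])

    line = np.polyfit(x, y, deg=1)     # 2 coefficients: cannot match the data exactly
    poly4 = np.polyfit(x, y, deg=4)    # 5 coefficients for 5 points: exact match

    for x_test in [6.0, 15.0]:         # 6 interpolates, 15 extrapolates
        print(f"x = {x_test:4.1f}: line -> {np.polyval(line, x_test):6.2f}, "
              f"degree-4 -> {np.polyval(poly4, x_test):8.2f}")
    # The degree-4 fit has zero training error but is not a meaningful model.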

Some Data-Driven Applications in Power Systems
- Bayesian analysis: price forecasting (determining a probability distribution), reliability analysis
- Clustering: price modeling (yesterday's example)
- Decision trees: security analysis
- Artificial neural nets (I'll do something brief): load forecasting (a huge number of papers), diagnostics