Data Analysis Learning from Data

Traditional methodology is statistics
What can we learn from analysis of historical data? Building up models.
Relationship to optimization
  Objective: find the “best” functional relationship that underlies the data
  Constraints: the underlying function must have some limits (one can always simply match the data, but that is not meaningful)
Traditional examples
  Curve fitting: linear regression minimizes the squared error given a polynomial function of the input variables
  Classification: minimize the number of classification errors given a decision rule constrained to a polynomial or simple linear threshold
  Model identification: determine the relevant parameters in a given dynamic model to minimize error
Modern examples emphasize cases where the underlying process is poorly understood
  Data mining

Exploratory Analysis: Simple Univariate Measures
Have to start with statistics
Measures of central tendency
Measures of variation
Measures of similarity

Exploratory Analysis: Simple Multivariate Measures
Mean and variance
Correlation
Independence implies r = 0; r = 0 alone does not guarantee independence
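
A minimal sketch of the correlation coefficient r, using only the Python standard library. The function name `pearson_r` and both data sets are illustrative assumptions, not taken from the slides. The first data set is chosen so that y is completely determined by x and yet r comes out as zero, which is why r = 0 cannot be read as independence.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y = x**2 is fully dependent on x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r_dependent = pearson_r(xs, ys)

# Hypothetical data: an exact linear relationship y = 2x + 1.
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])

print(r_dependent, r_linear)
```

The symmetric quadratic gives a correlation of zero, while the exact linear relationship gives a correlation of one (up to rounding).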

Exploratory Analysis: ANOVA - Analysis of Variance
Essentially determining which input variables are significant
Hypothesis test
  The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true, i.e., that the finding was the result of chance alone (significance threshold typically 5%)
  Based on the chi-square distribution
Many involved statistical tests exist, e.g., the Kruskal-Wallis test for K independent samples

Simple Models from Data: Regression
Maximum likelihood estimate (pseudo-inverse)
Function can be non-linear (still minimizing square error)

Simple Regression Example
Data points: (1,3), (8,9), (11,11), (4,5), (3,2)
Find the two coefficients for a straight-line approximation
Several examples adapted from Data Mining by Kantardzic
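
A short sketch of the straight-line fit for these five points, using the closed-form least-squares solution for one input variable. The function name `fit_line` is an assumption made for illustration. The slide's own worked solution is not in the transcript, so the printed coefficients are simply what this calculation yields.

```python
def fit_line(points):
    """Least-squares fit of y = intercept + slope * x to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

points = [(1, 3), (8, 9), (11, 11), (4, 5), (3, 2)]
intercept, slope = fit_line(points)
print(f"y = {intercept:.2f} + {slope:.2f} x")  # y = 1.03 + 0.92 x
```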

Linear Regression Comments
Weighting can account for better-quality data (usually by the inverse of the variance)
Every data point gets a vote
Sensitive to outliers: use a least absolute value fit to minimize the impact of outliers, as we talked about last week
The least-squares fit is the maximum likelihood estimate only for normally distributed errors
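
One way to sketch inverse-variance weighting for a straight-line fit is shown here. The function name `fit_line_weighted` and the variances are hypothetical. The slides do not give measurement variances, so the last point is simply assumed to be much noisier than the rest.

```python
def fit_line_weighted(points, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total_w = sum(weights)
    mean_x = sum(w * x for (x, _), w in zip(points, weights)) / total_w
    mean_y = sum(w * y for (_, y), w in zip(points, weights)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for (x, y), w in zip(points, weights))
    sxx = sum(w * (x - mean_x) ** 2 for (x, _), w in zip(points, weights))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

points = [(1, 3), (8, 9), (11, 11), (4, 5), (3, 2)]

# Assumed variances: the point (3, 2) is treated as low-quality data.
variances = [1.0, 1.0, 1.0, 1.0, 25.0]
weights = [1.0 / v for v in variances]

equal_fit = fit_line_weighted(points, [1.0] * len(points))
weighted_fit = fit_line_weighted(points, weights)
print(equal_fit, weighted_fit)
```

With equal weights this reduces to the ordinary least-squares fit. Down-weighting the noisy point moves the line toward the remaining four points.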

Preprocessing of Data: Bad data, detection of outliers
Least absolute value methods
Least square error
Bisquare iterative method to drop outliers
Notice the curve fit
Figure from National Instruments

Preprocessing of Data: Data reduction/transformation
Correlation coefficients with the dependent variable
  Essentially significance tests for different models
Principal component analysis
  Transformation of variables based on variance
Figure from Mathworks

Preprocessing of Data: Principal component analysis
Find a linear projection onto a unit vector u that has maximum variance
Assume a zero-mean data set with covariance matrix C
Project each data point onto u; the variance of the projected data is then a quadratic form in u and C
Form the Lagrangian for maximizing that variance subject to u having unit length
Applying the Kuhn-Tucker conditions gives the multipliers λ as the eigenvalues of C
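
The slide's equations did not survive in the transcript. The derivation below is a reconstruction in standard notation, so the symbols $x_i$, $y_i$, $n$ and $\lambda$ are assumptions rather than the author's own.

```latex
% Zero-mean data x_1, ..., x_n with covariance matrix C
C = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^{T}

% Projection onto the unit vector u, and the variance of the projected data
y_i = u^{T} x_i ,
\qquad
\operatorname{Var}(y) = \frac{1}{n}\sum_{i=1}^{n} y_i^{2} = u^{T} C u

% Maximize u^T C u subject to u^T u = 1
L(u, \lambda) = u^{T} C u - \lambda \left( u^{T} u - 1 \right)

% Stationarity condition
\frac{\partial L}{\partial u} = 2 C u - 2 \lambda u = 0
\;\Longrightarrow\;
C u = \lambda u
```

Under these assumptions the variance along u equals $u^{T} C u = \lambda$, so the direction of maximum variance is the eigenvector belonging to the largest eigenvalue of C.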

Preprocessing of Data: Principal component analysis
Covariance matrix for the independent variables
Eigenvectors of the largest eigenvalues form the transformation
Can select “principal axes” by a ratio test

Preprocessing of Data: PCA Example
Covariance matrix
Ordered eigenvalues
R when selecting the first two eigenvalues is 95%
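
A sketch of the ratio test for choosing principal axes follows, taking R to be the sum of the retained eigenvalues divided by the sum of all eigenvalues. The eigenvalues are hypothetical, because the slide's covariance matrix and eigenvalues are not in the transcript. They were picked so that the first two give R = 0.95. The function names are assumptions as well.

```python
def retained_variance_ratio(eigenvalues, m):
    """Fraction of total variance kept by the m largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:m]) / sum(ordered)

def components_needed(eigenvalues, threshold):
    """Smallest number of components whose ratio R reaches the threshold."""
    for m in range(1, len(eigenvalues) + 1):
        if retained_variance_ratio(eigenvalues, m) >= threshold:
            return m
    return len(eigenvalues)

# Hypothetical eigenvalues of a 4 x 4 covariance matrix.
eigenvalues = [0.15, 2.9, 0.05, 0.9]

ratio_two = retained_variance_ratio(eigenvalues, 2)
needed = components_needed(eigenvalues, 0.90)
print(round(ratio_two, 2), needed)
```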

Bayesian Analysis: Incorporating Prior Information
Assume some information is already known, expressed as a conditional probability (H: hypothesis, X: observation)
Example: classification rule. Given a sample X, what is the probability that it belongs to class Ci?
Assuming independent attributes of X, say xt

Bayesian Analysis Example
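Data
  [Table: samples 1 to 7, each with attributes X1, X2, X3 and a class C]
Prior probabilities for each class
Conditional probabilities for each attribute Xi
Given Xnew = (1, 2, 2), find the class: combine the conditional probabilities for X with the class probabilities estimated from the given samples
Conclusion: class C = 2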
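
A minimal sketch of the classification rule with independent attributes: each class is scored as P(Ci) multiplied by the product of P(xt | Ci), with every probability estimated by counting. The seven samples below are illustrative stand-ins, not the slide's table, whose values are missing from the transcript. The function name `naive_bayes_scores` is likewise an assumption.

```python
from collections import Counter

def naive_bayes_scores(samples, x_new):
    """Score each class as P(C) * product over t of P(x_t | C), by counting."""
    class_counts = Counter(label for _, label in samples)
    n = len(samples)
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior probability of the class
        for t, value in enumerate(x_new):
            matches = sum(1 for attrs, lab in samples
                          if lab == label and attrs[t] == value)
            score *= matches / count  # conditional probability of attribute t
        scores[label] = score
    return scores

# Hypothetical training data: ((X1, X2, X3), class)
samples = [
    ((1, 2, 1), 1),
    ((0, 0, 1), 1),
    ((2, 1, 2), 2),
    ((1, 2, 1), 2),
    ((0, 1, 2), 1),
    ((2, 2, 2), 2),
    ((1, 0, 1), 1),
]

scores = naive_bayes_scores(samples, (1, 2, 2))
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

For this stand-in data the score for class 2 is the larger one. The scores are proportional to the posterior probabilities, which is enough to pick a class, since the common factor P(X) is omitted.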

Decision Trees: Classifying data
[Figure: decision tree with tests X1>0, X2>0 and X1<2, leading to classes C1, C2 and C3]
Series of classification/decision rules
Typical best rule: maximize the information gain (minimize the entropy)
Entropy
Information gain: weight the entropy by the number (or probability) of samples in each new class formed, to compare decisions over some attribute

Decision Tree Example
Data: frequency of occurrence used for the probabilities
  [Table: attributes X1 (categories such as A and B), X2 (numeric), X3 (True/False) and class C (1 or 2)]
Initial entropy
Weighted entropy based on X1
Entropy based on X3
X1 is the better choice
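
The entropy and information-gain comparison can be sketched as follows. The 14 rows are assumed for illustration and are not recovered from the slide's table. The numeric attribute X2 is left out because splitting on it would need a threshold that the transcript does not give. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits, with probabilities taken from frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Initial entropy minus the weighted entropy after splitting on one attribute."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical rows: (X1, X3, class)
rows = [
    ("A", True, 1), ("A", True, 2), ("A", False, 2), ("A", False, 2),
    ("A", False, 1), ("B", True, 1), ("B", False, 1), ("B", True, 1),
    ("B", False, 1), ("C", True, 2), ("C", True, 2), ("C", False, 1),
    ("C", False, 1), ("C", False, 1),
]

initial = entropy([row[-1] for row in rows])
gain_x1 = information_gain(rows, 0)
gain_x3 = information_gain(rows, 1)
print(round(initial, 3), round(gain_x1, 3), round(gain_x3, 3))
```

For this assumed data the gain from splitting on X1 exceeds the gain from X3, so X1 would be chosen as the first decision rule.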

Some Motivation
Increasingly large amounts of data gathered
Insufficient time to perform detailed analysis and develop precise models
Operation of the power system is information driven rather than “signal” driven
Not possible to derive models from first principles (economics vs. physics)

Other Data Driven Methods
Linear methods tend to work well only for a narrow range of inputs
Discussed methods tend to work best under certain statistical properties
Need some robustness with respect to noisy inputs
Other approaches
  Support Vector Machines: we’ve already done
  Artificial Neural Networks: we’ll do the simplest version

What is Data Mining?
A textbook definition
  Data mining is the process of selection, exploration and modeling of large quantities of data to discover regularities or relations that are at first unknown with the aim of obtaining clear and useful results for the owner of the database. (Applied Data Mining by Paolo Giudici)
My take
  Data mining concerns a wide variety of techniques useful for analyzing large data sets or for gathering information based primarily on data, not on predefined models

Some Other Thoughts on Data Mining
Massive amounts of data are not being analyzed, and the volume will increase with the Smart Grid
  Both operational and non-operational
  Within a utility
  Across different companies
Importance of communication systems
  Decentralized
  Robust, but getting information to where it is needed

Problems with Data Mining
Nonlinear models are particularly susceptible to erroneous conclusions (overfitting)
Can always find a relation in data even if there is no underlying model (e.g., Super Bowl winner “predicts” stock market)
Preconceptions can always be reinforced if one searches long enough (e.g., most political discourse)
Increasing the amount of data (particularly unfiltered) increases the likelihood of spurious relationships (see the WWW)

Some Data-Driven Applications in Power Systems
Bayesian Analysis
  Price forecasting: determining the probability distribution
  Reliability analysis
Clustering
  Price modeling (yesterday’s example)
Decision Trees
  Security analysis
Artificial Neural Nets (I’ll do something brief)
  Load forecasting (huge number of papers)
  Diagnostics
