Data Analysis
Learning from Data
- Traditional methodology is statistics: what can we learn from analysis of historical data? Building up models.
Relationship to Optimization
- Objective: find the "best" functional relationship that underlies the data.
- Constraints: the underlying function must have some limits (one can always match the data exactly, but that is not meaningful).
Traditional Examples
- Curve fitting: linear regression minimizes the square error given a polynomial function of the input variables.
- Classification: minimize the number of classification errors given a decision rule constrained to a polynomial or a simple linear threshold.
- Model identification: determine the relevant parameters in a given dynamic model to minimize error.
Modern Examples (emphasizing an underlying process that is poorly understood)
- Data mining
Exploratory Analysis
Simple Univariate Measures
- Measures of central tendency
- Measures of variation
- Measures of similarity
- Have to start with statistics
Exploratory Analysis
Simple Multivariate Measures
- Mean and variance
- Correlation
- Independence implies r = 0 (the converse does not hold in general)
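A minimal sketch of these univariate and multivariate measures using numpy; the data values below are made up for illustration and are not from the lecture.

```python
import numpy as np

# Hypothetical two-variable data set (made up for illustration).
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0, 8.0])

# Univariate measures: central tendency and variation.
print("mean:", x.mean(), "median:", np.median(x))
print("variance:", x.var(ddof=1), "std dev:", x.std(ddof=1))

# Multivariate measures: covariance and correlation.
cov = np.cov(x, y)            # 2x2 sample covariance matrix
r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
print("covariance matrix:\n", cov)
print("r =", r)               # r = 0 would be consistent with independence
```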
Exploratory Analysis
ANOVA (Analysis of Variance)
- Hypothesis test: essentially determining which input variables are significant.
- The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true, i.e., that the finding was the result of chance alone (a typical threshold is p < 5%).
- Based on the χ² (chi-square) distribution.
- Many more involved statistical tests exist, e.g., the Kruskal-Wallis test for K independent samples.
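A short sketch of a one-way ANOVA and the Kruskal-Wallis alternative using scipy.stats; the three groups of observations below are hypothetical.

```python
from scipy import stats

# Three hypothetical groups of observations (made up for illustration).
g1 = [5.1, 4.9, 5.3, 5.0]
g2 = [5.8, 6.1, 5.9, 6.3]
g3 = [5.0, 5.2, 4.8, 5.1]

# One-way ANOVA: tests whether the group means differ significantly.
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Kruskal-Wallis: nonparametric alternative for K independent samples.
h_stat, p_kw = stats.kruskal(g1, g2, g3)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")

# Reject the null hypothesis (no group effect) if p < 0.05.
```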
Simple Models from Data
Regression
- Maximum likelihood estimate via the pseudo-inverse: for the linear model $y \approx A\beta$, the least-squares solution is $\hat{\beta} = (A^T A)^{-1} A^T y = A^{+} y$.
- The function can be nonlinear in the inputs (still minimizing the square error).
Simple Regression Example
- Data points: (1,3), (8,9), (11,11), (4,5), (3,2).
- Find the two coefficients for a straight-line approximation $y = b_0 + b_1 x$.
- Several examples in this section are adapted from Data Mining by Kantardzic.
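A minimal sketch of the fit using numpy's pseudo-inverse; the five data points come from the slide, and the coefficients work out to roughly y = 1.03 + 0.92x.

```python
import numpy as np

# Data points from the slide: (x, y) pairs.
x = np.array([1.0, 8.0, 11.0, 4.0, 3.0])
y = np.array([3.0, 9.0, 11.0, 5.0, 2.0])

# Design matrix for y ~ b0 + b1*x; the least-squares solution is the
# pseudo-inverse applied to y.
A = np.column_stack([np.ones_like(x), x])
b = np.linalg.pinv(A) @ y

print(f"y = {b[0]:.3f} + {b[1]:.3f} x")  # roughly y = 1.031 + 0.920 x
```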
Linear Regression Comments
- Weighting can account for better-quality data (usually weighting by the inverse of the variance).
- Every data point gets a vote.
- Sensitive to outliers: use a least-absolute-value fit to minimize the impact of outliers, as we talked about last week.
- Least squares is the maximum likelihood estimate only for normally distributed errors.
Preprocessing of Data
Bad data: detection of outliers
- Least-absolute-value methods
- Least square error
- Bisquare: an iterative method to drop outliers
- Notice how the fitted curves differ. [Figure from National Instruments: comparison of the fitted curves]
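One way the bisquare idea can be sketched is as iteratively reweighted least squares with Tukey bisquare weights; the tuning constant c = 4.685 and the toy data with a single gross outlier are assumptions for illustration, not from the slide.

```python
import numpy as np

def bisquare_fit(x, y, c=4.685, iters=10):
    """Iteratively reweighted straight-line fit with Tukey bisquare
    weights; points with large residuals are progressively dropped."""
    A = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(iters):
        # Weighted least-squares step (rows scaled by sqrt of weight).
        s_w = np.sqrt(w)
        b, *_ = np.linalg.lstsq(A * s_w[:, None], y * s_w, rcond=None)
        r = y - A @ b
        # Scale residuals by a robust spread estimate (the MAD).
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
        u = r / (c * s)
        # Bisquare weight: zero for outliers with |u| >= 1.
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
    return b

# Hypothetical data with one gross outlier at x = 6.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 2.0, 2.9, 4.2, 5.0, 20.0])
print(bisquare_fit(x, y))  # should end up close to intercept 0, slope 1
```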
Preprocessing of Data
Data reduction/transformation
- Correlation coefficients with the dependent variable: essentially significance tests for different models.
- Principal component analysis: a transformation of variables based on variance. [Figure from Mathworks]
Preprocessing of Data
Principal component analysis
- Find a linear projection onto a unit vector $u$ that has maximum variance.
- Assume a zero-mean data set $x$ with covariance matrix $C = E[xx^T]$.
- Let $y = u^T x$ with $u^T u = 1$; then the variance of the new data is $\operatorname{var}(y) = u^T C u$.
- Form the Lagrangian to solve: $L(u, \lambda) = u^T C u - \lambda\,(u^T u - 1)$.
- Applying the Kuhn-Tucker conditions gives $Cu = \lambda u$, i.e., the stationary values of the variance are the eigenvalues of $C$.
Preprocessing of Data
Principal component analysis
- For independent variables the covariance matrix is diagonal (the transformed variables are uncorrelated).
- The eigenvectors corresponding to the largest eigenvalues form the transformation.
- Can select the "principal axes" by a ratio test.
Preprocessing of Data
PCA Example
- [Covariance matrix and its ordered eigenvalues; the numeric values are not reproduced here]
- The retained-variance ratio $R = (\lambda_1 + \lambda_2) / \sum_i \lambda_i$ when selecting the first two eigenvalues is 95%.
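Since the slide's numeric values are not reproduced, here is a sketch of the eigendecomposition and the ratio test on a hypothetical covariance matrix.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix (not the slide's values);
# symmetric positive semi-definite.
C = np.array([[5.0, 2.0, 0.5],
              [2.0, 3.0, 0.3],
              [0.5, 0.3, 0.4]])

# Eigendecomposition of the covariance matrix, sorted descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Ratio test: fraction of total variance captured by the first k axes.
k = 2
R = eigvals[:k].sum() / eigvals.sum()
print("ordered eigenvalues:", eigvals)
print(f"R (first {k} axes) = {R:.2%}")  # retain the axes if R is high enough

# Transformation: project zero-mean data onto the first k eigenvectors,
# e.g. X_reduced = X_centered @ eigvecs[:, :k]
```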
Bayesian Analysis
Incorporating Prior Information
- Assume some information is already known; Bayes' rule gives the conditional probability ($H$: hypothesis, $X$: observation): $P(H \mid X) = \dfrac{P(X \mid H)\,P(H)}{P(X)}$
- Example classification rule: given sample $X$, what is the probability it belongs to class $C_i$? $P(C_i \mid X) \propto P(X \mid C_i)\,P(C_i)$
- Assuming independent attributes of $X$, say $x_t$: $P(X \mid C_i) = \prod_t P(x_t \mid C_i)$
Bayesian Analysis Example
- Data: [table of seven samples, numbered 1-7, with attributes X1, X2, X3 and class C; entries not reproduced]
- Compute the prior probabilities $P(C_i)$ from the class frequencies.
- Compute the conditional probabilities $P(x_t \mid C_i)$ for each attribute value from the given samples.
- Given $X_{new} = (1, 2, 2)$, evaluate the class-conditional products and pick the class with the larger posterior; the conclusion is class C = 2.
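A sketch of the naive Bayes computation. Because the original seven-sample table is not reproduced, the training rows below are hypothetical, chosen so that X_new = (1, 2, 2) comes out as class 2, in agreement with the slide's conclusion.

```python
# Hypothetical training table in the spirit of the slide's example.
# Each row is (X1, X2, X3, C) with the class C last.
data = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 2, 2),
        (0, 1, 2, 2), (2, 2, 2, 2), (1, 0, 1, 1)]
x_new = (1, 2, 2)

classes = sorted({row[-1] for row in data})
scores = {}
for c in classes:
    rows = [r for r in data if r[-1] == c]
    prior = len(rows) / len(data)                 # P(C = c)
    # Naive Bayes: product of per-attribute conditionals P(x_t | C = c).
    cond = 1.0
    for t, v in enumerate(x_new):
        cond *= sum(1 for r in rows if r[t] == v) / len(rows)
    scores[c] = prior * cond

print(scores)                                     # unnormalized posteriors
print("predicted class:", max(scores, key=scores.get))   # class 2 here
```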
Decision Trees
Classifying data
- A series of classification/decision rules. [Tree figure: internal nodes test X1 > 0, X2 > 0, X1 < 2; leaves assign classes C1, C2, C3]
- Typical best-rule criterion: maximize the information gain (minimize the entropy).
- Entropy: $H(S) = -\sum_i p_i \log_2 p_i$
- Information gain: weight the entropy by the number (or probability) of samples in each new class formed, to compare decisions over each attribute: $\mathrm{Gain}(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$
Decision Tree Example
- Data: [table of samples with attributes X1 ∈ {A, B}, numeric X2 (values between 65 and 96), Boolean X3, and class C ∈ {1, 2}; full entries not reproduced]
- Use frequencies of occurrence for the probabilities.
- Compute the initial entropy, then the weighted entropy after splitting on X1 and after splitting on X3.
- X1 gives the larger information gain, so X1 is the better choice.
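A sketch of the entropy and information-gain computation defined on the previous slide. The rows below are hypothetical (the slide's full table is not reproduced), constructed so that splitting on X1 beats splitting on X3, matching the slide's conclusion.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H = -sum p_i log2 p_i over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Initial entropy minus the entropy weighted by the fraction of
    samples falling in each branch of a split on the given attribute."""
    total = entropy([r[-1] for r in rows])
    weighted = 0.0
    for v in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == v]
        weighted += len(subset) / len(rows) * entropy(subset)
    return total - weighted

# Hypothetical samples (X1, X3, C); values made up for illustration.
rows = [("A", True, 1), ("A", False, 1), ("A", True, 1), ("B", True, 2),
        ("B", False, 2), ("B", True, 2), ("A", False, 1), ("B", False, 2)]

print("gain on X1:", information_gain(rows, 0))
print("gain on X3:", information_gain(rows, 1))
# The attribute with the larger gain (X1 here) is the better split choice.
```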
Some Motivation
- Increasingly large amounts of data are being gathered.
- There is insufficient time to perform detailed analysis and develop precise models.
- Operation of the power system is becoming information driven rather than "signal" driven.
- It is not always possible to derive models from first principles (economics vs. physics).
Other Data-Driven Methods
- Linear methods tend to work well only for a narrow range of inputs.
- The methods discussed so far tend to work best under certain statistical properties.
- Need some robustness with respect to noisy inputs.
Other approaches
- Support Vector Machines: we've already done these.
- Artificial Neural Networks: we'll do the simplest version (see the sketch below).
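As a preview of the "simplest version," here is a minimal single-neuron perceptron with the classic error-correction update; the data set and learning rate are assumptions for illustration.

```python
import numpy as np

# Hypothetical linearly separable data: two classes in the plane.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, 1.0]])
t = np.array([1, 1, -1, -1])        # target classes

w = np.zeros(2)                     # weights
b = 0.0                             # bias
eta = 0.1                           # learning rate

for _ in range(20):                 # training epochs
    for xi, ti in zip(X, t):
        y = 1 if w @ xi + b > 0 else -1
        # Error-correction rule: update only on misclassification.
        w += eta * (ti - y) * xi
        b += eta * (ti - y)

print("weights:", w, "bias:", b)
print("predictions:", [1 if w @ xi + b > 0 else -1 for xi in X])
```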
What is Data Mining?
A textbook definition
- "Data mining is the process of selection, exploration and modeling of large quantities of data to discover regularities or relations that are at first unknown with the aim of obtaining clear and useful results for the owner of the database." (Applied Data Mining by Paolo Giudici)
My take
- Data mining concerns a wide variety of techniques useful for analyzing large data sets or for gathering information, based primarily on data rather than on predefined models.
Some Other Thoughts on Data Mining
- Massive amounts of data are not being analyzed, and the volume will only increase with the Smart Grid: both operational and non-operational data, within a utility and across different companies.
- Importance of communication systems: decentralized and robust, but still getting information to where it is needed.
Problems with Data Mining
- Nonlinear models are particularly susceptible to erroneous conclusions (overfitting).
- One can always find relations in data even when there is no underlying model (e.g., the Super Bowl winner "predicts" the stock market).
- Preconceptions can always be reinforced if one searches long enough (i.e., most political discourse).
- Increasing the amount of data (particularly unfiltered data) increases the likelihood of spurious relationships (see the WWW).
Some Data-Driven Applications in Power Systems
- Bayesian analysis: price forecasting (determining the probability distribution); reliability analysis
- Clustering: price modeling (yesterday's example)
- Decision trees: security analysis
- Artificial neural nets (I'll do something brief): load forecasting (a huge number of papers); diagnostics