Describing and Exploring Data: Initial Data Analysis
Overview
- Describing and exploring data
- Initial data analysis
  - Characteristics
  - (Some) steps involved
  - Methods
- Statistics
  - Central tendency
  - Variability
  - Relationships
- Issues
Describing and Exploring Data
Once data has been collected, the raw information must be manipulated in some fashion to make it more informative. Several options are available, including plotting the data or calculating descriptive statistics.
Plotting Data
Often, one of the first things one does with a set of raw data is to plot it in some manner. One should start with a visual display of the data.
Examples
- Frequency and density information: histograms, violin plots, bar plots
- Trends over time or across groups: line graphs
- Display of interval information (error bars)
- Relationships: scatterplots
- Combinations
Visual display of data allows for more rapid comprehension of distributions and relationships. Use it whenever possible.
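As a rough illustration of displaying frequency information, the sketch below builds a text histogram using only the Python standard library. The function name `text_histogram`, the bin width, and the exam scores are all made up for this example; a plotting library would normally be used instead.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return the rows of a text histogram for integer data.

    Each row covers one bin and shows one '#' per observation in it.
    """
    # Map each value to the lower edge of its bin and count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    rows = []
    edge = min(counts)
    last = max(counts)
    while edge <= last:
        label = f"{edge:>4}-{edge + bin_width - 1:<4}"
        rows.append(f"{label} {'#' * counts[edge]}")  # empty bins show no marks
        edge += bin_width
    return rows


# Hypothetical exam scores, invented for illustration only.
scores = [52, 55, 61, 63, 64, 67, 68, 70, 71, 71, 73, 75, 78, 82, 85, 98]
for row in text_histogram(scores, 10):
    print(row)
```

Even a crude display like this shows where the bulk of the scores sit and that one score lies well above the rest, which a table of raw numbers does not make obvious.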
Descriptives
The other main part of the initial examination of data is acquiring descriptive statistics.
- Measures of central tendency: ‘expected’ values
  - Single-measure estimates
  - Mean, median, mode, trimmed means, M-estimators
- Measures of variability: estimates of uncertainty
  - Standard deviation, MAD
  - Allow for interval estimates on any number of statistics via the standard error
- Simple correlation measures among the variables under consideration
  - You should think of the correlation statistic as a descriptive, not inferential, statistic
  - Except for purely exploratory endeavors, correlations are a starting point for analysis, not an end
  - In fact, many of the analyses you come across use the correlation matrix as the dataset
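The measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the data are invented, the helper names (`trimmed_mean`, `median_abs_deviation`, `pearson_r`) are hypothetical, MAD is taken here to mean the unscaled median absolute deviation, and the 10% trimming proportion is an arbitrary choice.

```python
import math
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number dropped from each end
    return statistics.mean(ordered[k:len(ordered) - k])


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score, with one extreme score.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
score = [52, 55, 61, 60, 68, 70, 75, 74, 80, 140]

sd = statistics.stdev(score)             # sample standard deviation
se = sd / math.sqrt(len(score))          # standard error of the mean

print("mean            ", statistics.mean(score))
print("median          ", statistics.median(score))
print("10% trimmed mean", trimmed_mean(score, 0.10))
print("SD              ", round(sd, 2))
print("MAD             ", median_abs_deviation(score))
print("SE of the mean  ", round(se, 2))
print("r(hours, score) ", round(pearson_r(hours, score), 3))
```

Comparing the mean with the median and trimmed mean, and the standard deviation with the MAD, shows how strongly a single extreme value can pull the non-robust measures.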
Initial Data Analysis (IDA)
- Also called Initial Examination of Data or Exploratory Data Analysis
- Often overlooked or thought of as not all that important, but…
- It is at the beginning stages where much trouble can be avoided. If the data is glossed over, the result can be missed findings, or results that cannot be replicated because they rest on bad data.
- Bad data?
Initial Data Analysis
IDA includes:
- General descriptive and graphical output
- A healthy inspection of the individual variables’ behaviors, especially visually
- Outlier analysis
  - Outliers in terms of the model, not necessarily the individual variables
- Possible model selection or re-specification
- Initial inference measures and testing assumptions of the analysis
Steps of an analysis
1. Clarify the objectives of the investigation
2. Collect the data in the appropriate way
3. Investigate the structure and quality of the data
4. Carry out IDA (descriptive)
5. Select and carry out formal statistical analysis (inferential)
6. Compare findings with previous results; collect more data if necessary
7. Interpret and communicate results
* Be flexible in your approach, and treat each research situation uniquely
Method of IDA
Data scrutiny and description
- Study variables in light of how they were collected
- Look for troublesome variables that may warrant special analysis later if used inferentially
- Search for outliers, missing data, etc. that may result in less powerful inferential analysis
- Gather summary (descriptive) statistics and graphs, presented so as not to be misleading
- See if transformations or robust statistics are necessary
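A first pass at the scrutiny steps above might look like the following sketch. It counts missing values per variable and flags extreme values with a robust z-score based on the median and the median absolute deviation. The records, the function names, the 0.6745 scaling constant, and the 3.5 cutoff are assumptions made for illustration (the cutoff is a commonly used rule of thumb, not a fixed standard), and a univariate screen like this does not replace checking outliers in terms of the model.

```python
import statistics


def count_missing(rows, fields):
    """Count missing (None) entries per field across a list of dict records."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}


def flag_outliers(values, threshold=3.5):
    """Return values whose robust z-score exceeds `threshold` in absolute value.

    Robust z = 0.6745 * (x - median) / MAD, where MAD is the median absolute
    deviation. The default threshold is an assumed rule of thumb.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so this rule cannot be applied
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical records, invented for illustration only.
hours = [1, 2, None, 4, 5, 6, 7, 8, 9, 10, 11]
scores = [52, 55, 61, 60, None, 68, 70, 75, 74, 80, 140]
records = [{"hours": h, "score": s} for h, s in zip(hours, scores)]

print("missing per variable:", count_missing(records, ["hours", "score"]))

observed = [s for s in scores if s is not None]
print("flagged scores:", flag_outliers(observed))
```

Flagged values are candidates for a closer look at how they were collected, not automatic deletions.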
Method of IDA
- Use inferential analyses in an exploratory way
- Model formulation
  - Include relevant theory
  - Recognize important features of the data
  - Do the model and data go together?
- Might there be new hypotheses worthy of examination?
- Is further analysis even necessary?
Initial Data Analysis: Problem
- Although seen by most stats folk as an important part of data analysis, IDA is often underused as a source of information and as an important first step in data interpretation
- “Theories looking for data”
- Too much concern with inferential analysis and statistical significance
- A far too typical approach seems to be: get the data, run descriptives, and at the same time or immediately afterward run the actual analysis
  - Then, because results are poor, start figuring out ways to ‘fix’ it
Why the lack of emphasis on IDA?
- Assumed it is the natural way that people conduct their research anyway
  - It isn’t, if they are left to their own devices
- Assumed lack of standard methods for going about it
  - In fact, there are guidelines for how to do it
  - See Chatfield in related articles section
- Assumed it’s too exploratory
  - IDA != fishing
  - Don’t disregard prior knowledge and theory
- Risk of invalid conclusions
  - This would be a concern if you didn’t perform IDA
Conclusion
- Analysis of data takes time, and one must be prepared to exhaustively examine all aspects of the information collected
- The purpose of analysis is to allow the data to tell its story, not to impose our own on the data
- An open-minded and thoughtful approach is necessary to any investigation