Introduction to Data Analysis

Slides:



Advertisements
Similar presentations
Population vs. Sample Population: A large group of people to which we are interested in generalizing. parameter Sample: A smaller group drawn from a population.
Advertisements

Measurement, Evaluation, Assessment and Statistics
Appendix A. Descriptive Statistics Statistics used to organize and summarize data in a meaningful way.
Introduction to Summary Statistics
Review of Basics. REVIEW OF BASICS PART I Measurement Descriptive Statistics Frequency Distributions.
Review of Basics. REVIEW OF BASICS PART I Measurement Descriptive Statistics Frequency Distributions.
BHS Methods in Behavioral Sciences I April 18, 2003 Chapter 4 (Ray) – Descriptive Statistics.
Descriptive (Univariate) Statistics Percentages (frequencies) Ratios and Rates Measures of Central Tendency Measures of Variability Descriptive statistics.
QUANTITATIVE DATA ANALYSIS
Descriptive Statistics
B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.
Data observation and Descriptive Statistics
Central Tendency and Variability
1 Measures of Central Tendency Greg C Elvers, Ph.D.
Measures of Central Tendency
July, 2000Guang Jin Statistics in Applied Science and Technology Chapter 4 Summarizing Data.
Today: Central Tendency & Dispersion
@ 2012 Wadsworth, Cengage Learning Chapter 5 Description of Behavior Through Numerical 2012 Wadsworth, Cengage Learning.
BIOSTATISTICS II. RECAP ROLE OF BIOSATTISTICS IN PUBLIC HEALTH SOURCES AND FUNCTIONS OF VITAL STATISTICS RATES/ RATIOS/PROPORTIONS TYPES OF DATA CATEGORICAL.
Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.
Statistics 1 Course Overview
Basic Statistics - Concepts and Examples
With Statistics Workshop with Statistics Workshop FunFunFunFun.
Chapter 3 Statistical Concepts.
Statistics and Research methods Wiskunde voor HMI Betsy van Dijk.
Census A survey to collect data on the entire population.   Data The facts and figures collected, analyzed, and summarized for presentation and.
B AD 6243: Applied Univariate Statistics Understanding Data and Data Distributions Professor Laku Chidambaram Price College of Business University of Oklahoma.
1 DATA DESCRIPTION. 2 Units l Unit: entity we are studying, subject if human being l Each unit/subject has certain parameters, e.g., a student (subject)
Data Handbook Chapter 4 & 5. Data A series of readings that represents a natural population parameter A series of readings that represents a natural population.
PTP 560 Research Methods Week 8 Thomas Ruediger, PT.
CSCI N207: Data Analysis Using Spreadsheets Copyright ©2005  Department of Computer & Information Science Univariate Data Analysis.
10a. Univariate Analysis Part 1 CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson Department of Computer and Information Science,
Measures of Central Tendency and Dispersion Preferred measures of central location & dispersion DispersionCentral locationType of Distribution SDMeanNormal.
Introduction to Descriptive Statistics Objectives: 1.Explain the general role of statistics in assessment & evaluation 2.Explain three methods for describing.
M07-Numerical Summaries 1 1  Department of ISM, University of Alabama, Lesson Objectives  Learn when each measure of a “typical value” is appropriate.
Chapter 2 Describing Data.
Skewness & Kurtosis: Reference
Chapter 2 Statistical Concepts Robert J. Drummond and Karyn Dayle Jones Assessment Procedures for Counselors and Helping Professionals, 6 th edition Copyright.
TYPES OF STATISTICAL METHODS USED IN PSYCHOLOGY Statistics.
Measures of central tendency are statistics that express the most typical or average scores in a distribution These measures are: The Mode The Median.
An Introduction to Statistics. Two Branches of Statistical Methods Descriptive statistics Techniques for describing data in abbreviated, symbolic fashion.
10b. Univariate Analysis Part 2 CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson Department of Computer and Information Science,
Sampling Design and Analysis MTH 494 Ossam Chohan Assistant Professor CIIT Abbottabad.
INVESTIGATION 1.
Experimental Research Methods in Language Learning Chapter 9 Descriptive Statistics.
Introduction to Statistics Santosh Kumar Director (iCISA)
Psy 230 Jeopardy Measurement Research Strategies Frequency Distributions Descriptive Stats Grab Bag $100 $200$200 $300 $500 $400 $300 $400 $300 $400 $500.
L643: Evaluation of Information Systems Week 13: March, 2008.
BASIC STATISTICAL CONCEPTS Chapter Three. CHAPTER OBJECTIVES Scales of Measurement Measures of central tendency (mean, median, mode) Frequency distribution.
IE(DS)1 Descriptive Statistics Data - Quantitative observation of Behavior What do numbers mean? If we call one thing 1 and another thing 2 what do we.
Statistical Analysis of Data. What is a Statistic???? Population Sample Parameter: value that describes a population Statistic: a value that describes.
LIS 570 Summarising and presenting data - Univariate analysis.
Outline of Today’s Discussion 1.Displaying the Order in a Group of Numbers: 2.The Mean, Variance, Standard Deviation, & Z-Scores 3.SPSS: Data Entry, Definition,
Measurements Statistics WEEK 6. Lesson Objectives Review Descriptive / Survey Level of measurements Descriptive Statistics.
Chapter 2 Describing and Presenting a Distribution of Scores.
Descriptive Statistics(Summary and Variability measures)
AP Biology Resources Statistical Analysis and Graphing.
Outline Sampling Measurement Descriptive Statistics:
Statistical Methods Michael J. Watts
Measurements Statistics
Populations.
Analysis and Empirical Results
Statistical Methods Michael J. Watts
Descriptive measures Capture the main 4 basic Ch.Ch. of the sample distribution: Central tendency Variability (variance) Skewness kurtosis.
Univariate Analysis/Descriptive Statistics
Central Tendency and Variability
Description of Data (Summary and Variability measures)
Basic Statistical Terms
Review for Exam 1 Ch 1-5 Ch 1-3 Descriptive Statistics
Biostatistics Lecture (2).
Presentation transcript:

Introduction to Data Analysis Data Measurement Measurement of the data is the first step in the process that ultimately guides the final analysis. Consideration of sampling, controls, errors (random and systematic) and the required precision all influence the final analysis. Validation: Instruments and methods used to measure the data must be validated for accuracy. Precision and accuracy…Determination of error Social vs. Physical Sciences

Introduction to Data Analysis Types of data Univariate/Multivariate Univariate: When we use one variable to describe a person, place, or thing. Multivariate: When we use two or more variables to measure a person, place or thing. Variables may or may not be dependent on each other. Cross-sectional data/Time-ordered data (business, social sciences) Cross-Sectional: Measurements taken at one time period Time-Ordered: Measurements taken over time in chronological sequence. The type of data will dictate (in part) the appropriate data-analysis method.

Introduction to Data Analysis Measurement Scales Nominal or Categorical Scale Classification of people, places, or things into categories (e.g. age ranges, colors, etc.). Classifications must be mutually exclusive (every element should belong to one category with no ambiguity). Weakest of the four scales. No category is greater than or less (better or worse) than the others. They are just different. Ordinal or Ranking Scale Classification of people, places, or things into a ranking such that the data is arranged into a meaningful order (e.g. poor, fair, good, excellent). Qualitative classification only

Introduction to Data Analysis Measurement Scales (business, social sciences) Interval Scale Data classified by ranking. Quantitative classification (time, temperature, etc). Zero point of scale is arbitrary (differences are meaningful). Ratio Scale Data classified as the ratio of two numbers. Quantitative classification (height, weight, distance, etc). Zero point of scale is real (data can be added, subtracted, multiplied, and divided).

Univariate Analysis/Descriptive Statistics The Range Min/Max Average Median Mode Variance Standard Deviation Histograms and Normal Distributions

Univariate Analysis/Histograms Distributions Descriptive statistics are easier to interpret when graphically illustrated. However, charting each data element can lead to very busy and confusing charts that do not help interpret the data. Grouping the data elements into categories and charting the frequency within these categories yields a graphical illustration of how the data is distributed throughout its range.

Univariate Analysis/Histograms With just a few columns this chart is difficult to interpret. It tells you very little about the data set. Even finding the Min and Max can be difficult. The data can be presented such that more statistical parameters can be estimated from the chart (average, standard deviation).

Univariate Analysis/Histograms Frequency Table The first step is to decide on the categories and group the data appropriately. (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Category Labels Frequency 0-50 3 51-60 2 61-70 6 71-80 5 81-90 >90 1

Univariate Analysis/Histograms A histogram is simply a column chart of the frequency table. Category Labels Frequency 0-50 3 51-60 2 61-70 6 71-80 5 81-90 >90 1

Univariate Analysis/Histograms Average (68.6) and Median (68) Histogram Mode (74) -1SD +1SD

Univariate Analysis/Normal Distributions Distributions that can be described mathematically as Gaussian are also called Normal The Bell curve Symmetrical Mean ≈ Median Mean, Median, Mode

Univariate Analysis/Skewed Distributions When data are skewed, the mean and SD can be misleading Skewness sk= 3(mean-median)/SD If sk>|1| then distribution is non-symetrical Negatively skewed Mean<Median Sk is negative Positively Skewed Mean>Median Sk is positive

Central Limit Theorem Regardless of the shape of a distribution, the distribution of the sample mean based on samples of size N approaches a normal curve as N increases. N must be less than the entire sample N=10

Univariate Analysis/Descriptive Statistics The Range Difference between minimum and maximum values in a data set Larger range usually (but not always) indicates a large spread or deviation in the values of the data set. (73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)

Univariate Analysis/Descriptive Statistics The Average (Mean) Sum of all values divided by the number of values in the data set. One measure of central location in the data set. Average = Average=(73+66+69+67+49+60+81+71+78+62+53+87+74+65+74+50+85+45+63+100)/20 = 68.6 Excel function: AVERAGE()

Univariate Analysis/Descriptive Statistics 2.5 7.5 10 4.8 The data may or may not be symmetrical around its average value 2.5 7.5 10 4.8

Univariate Analysis/Descriptive Statistics The Median The middle value in a sorted data set. Half the values are greater and half are less than the median. Another measure of central location in the data set. (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Median: 68 (1, 2, 4, 7, 8, 9, 9) Excel function: MEDIAN()

Univariate Analysis/Descriptive Statistics The Median May or may not be close to the mean. Combination of mean and median are used to define the skewness of a distribution. 2.5 7.5 10 6.25

Univariate Analysis/Descriptive Statistics The Mode Most frequently occurring value. Another measure of central location in the data set. (45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100) Mode: 74 Generally not all that meaningful unless a larger percentage of the values are the same number.

Univariate Analysis/Descriptive Statistics Variance One measure of dispersion (deviation from the mean) of a data set. The larger the variance, the greater is the average deviation of each datum from the average value. Variance = Average value of the data set Variance = [(45 – 68.6)2 + (49 – 68.6)2 + (50 – 68.6)2 + (53 – 68.6)2 + …]/20 = 181 Excel Functions: VARP(), VAR()

Univariate Analysis/Descriptive Statistics Standard Deviation Square root of the variance. Can be thought of as the average deviation from the mean of a data set. The magnitude of the number is more in line with the values in the data set. Standard Deviation = ([(45 – 68.6)2 + (49 – 68.6)2 + (50 – 68.6)2 + (53 – 68.6)2 + …]/20)1/2 = 13.5 Excel Functions: STDEVP(), STDEV()