Lecture 1: Descriptive Statistics and Exploratory Data Analysis


When we make an instrumental measurement, we collect data: experimentally obtained measurement results. Data are a set of values of qualitative or quantitative variables. Exploratory data analysis (EDA) is a first step in analyzing the data from an experiment. The main reasons we use EDA:
• detection of mistakes
• checking the assumptions of statistical methods
• generating hypotheses
• preliminary selection of appropriate models
• determining relationships among the variables

Why do we need EDA?
Data in the real world is "dirty":
• incomplete: lacking attribute values, e.g. occupation=""
• noisy: containing errors or outliers, e.g. Salary="-10"
• inconsistent: containing discrepancies in codes or names, e.g. Age="42" with Birthday="03/07/1997"; a rating that was "1, 2, 3" and is now "A, B, C"; discrepancies between duplicate records
Why is data dirty?
"Not applicable" data values when collected, faulty data collection instruments, human or computer error at data entry, different data sources, …
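The three kinds of dirty data listed above can be checked for mechanically. The following is only a minimal sketch: the record layout, the field names, the helper `find_problems` and the reference date are all hypothetical, and the birthday "03/07/1997" is assumed to be written day first (3 July 1997).

```python
from datetime import date

def find_problems(record, today):
    """Return a list of data-quality problems found in one record.

    The field names (occupation, salary, age, birthday) are illustrative
    assumptions, not a fixed schema.
    """
    problems = []

    # Incomplete: a required attribute has no value.
    if not record.get("occupation", "").strip():
        problems.append("incomplete: occupation is empty")

    # Noisy: a value that cannot be right, such as a negative salary.
    if record["salary"] < 0:
        problems.append("noisy: negative salary")

    # Inconsistent: the stated age disagrees with the birthday.
    born = record["birthday"]
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    expected_age = today.year - born.year - (0 if had_birthday else 1)
    if record["age"] != expected_age:
        problems.append("inconsistent: age does not match birthday")

    return problems

# Example record built from the values on the slide.
dirty = {"occupation": "", "salary": -10, "age": 42,
         "birthday": date(1997, 7, 3)}  # assumed day-first reading
clean = {"occupation": "nurse", "salary": 100, "age": 22,
         "birthday": date(1997, 7, 3)}

reference_day = date(2020, 1, 1)  # arbitrary reference date for the example
dirty_problems = find_problems(dirty, reference_day)
clean_problems = find_problems(clean, reference_day)
for problem in dirty_problems:
    print(problem)
```

Checks like these only flag suspicious records; deciding whether to correct, drop or keep them is a separate preprocessing step.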

Why Is Data Preprocessing Important?
Quality decisions must be based on quality data. Duplicate or missing data may cause incorrect or even misleading statistics.
High-Quality Data Requirements
High-quality data needs to pass a set of quality criteria:
• Validity
• Accuracy
• Precision
• Reliability
The precision of an experiment is related to our ability to minimize random error. The accuracy of an experiment is related to our ability to minimize systematic error.

Validity: the extent to which the study measures what it is intended to measure. Are the values describing what was supposed to be measured? Lack of validity is referred to as "bias" or "systematic error".
Accuracy: the degree to which a measurement represents the true value of something. How close is a measurement to the true value?
Precision: the degree of reproducibility of our technique. How close are the measurements to each other?
Reliability: a measure of how dependably an observation is exactly the same when repeated. Will one get the same values if the measurements are repeated?
Accuracy and validity are used synonymously, as are precision and reliability.
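The difference between accuracy and precision can be made concrete with two series of repeated measurements of the same quantity. This is a sketch under stated assumptions: the true value and both measurement series are made up, bias is estimated as the mean minus the true value, and spread is estimated with the sample standard deviation.

```python
import statistics

TRUE_VALUE = 10.0  # assumed known reference value (hypothetical)

# Two made-up series of repeated measurements of the same quantity.
precise_but_biased = [10.4, 10.5, 10.6, 10.5, 10.5]
accurate_but_noisy = [9.0, 11.0, 10.5, 9.5, 10.0]

def bias(measurements, true_value):
    """Systematic error: how far the average lies from the true value."""
    return statistics.mean(measurements) - true_value

def spread(measurements):
    """Random error: how far the measurements lie from each other."""
    return statistics.stdev(measurements)

bias_a = bias(precise_but_biased, TRUE_VALUE)
bias_b = bias(accurate_but_noisy, TRUE_VALUE)
spread_a = spread(precise_but_biased)
spread_b = spread(accurate_but_noisy)

print(f"series A: bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"series B: bias={bias_b:.2f}, spread={spread_b:.2f}")
```

Series A is precise but not accurate (small spread, clear bias); series B is accurate on average but not precise (no bias, large spread).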

Bias & Variability
A biased measurement will be wrong in the same direction nearly every time. Variability is the difference in successive measurements of the same thing.

TYPES OF DATA

Quantitative Data: Discrete vs. Continuous
• Discrete random variables can only take on values from a countable set of numbers, such as the integers or some subset of the integers. (Usually, they can't be fractions.)
• Continuous random variables can take on any real number in some interval. (They can be fractions.)
Categorical: Nominal vs. Ordinal
• Nominal (unordered) random variables have categories where order doesn't matter, e.g. gender, ethnic background, religious affiliation.
• Ordinal (ordered) random variables have ordered categories, e.g. grade levels, income levels, school levels, ...
Observational units are entities whose characteristics we measure. Random variables are characteristics of the observational units.

A Review of the Main Principles of Statistics
• Population: the entire collection of units about which we would like information.
• Sample: the collection of units we actually measure.
• Parameter: the true value we hope to obtain.
• Statistic: an estimate of the parameter based on observed information in the sample.
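The four terms can be illustrated with a toy example. In this sketch the population (the integers 1 to 100), the sample size and the random seed are arbitrary assumptions; in practice the population is usually not fully observable, which is why the parameter has to be estimated.

```python
import random
import statistics

# Population: every unit we would like information about (toy example).
population = list(range(1, 101))

# Parameter: the true value, computable here only because we know
# the whole population.
parameter = statistics.mean(population)

# Sample: the units we actually measure (10 drawn without replacement).
rng = random.Random(0)  # fixed seed so the example is repeatable
sample = rng.sample(population, 10)

# Statistic: the estimate of the parameter based on the sample.
statistic = statistics.mean(sample)

print("parameter (population mean):", parameter)
print("statistic (sample mean):    ", statistic)
```

Drawing a different sample gives a different statistic, while the parameter stays fixed.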

Non-Graphical Exploratory Data Analysis
This preliminary data analysis step focuses on four points:
• measures of central tendency, i.e. the mean, the median and the mode,
• measures of spread, i.e. variability, variance and standard deviation,
• the shape of the distribution,
• the existence of outliers.
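The four points above can be sketched with Python's standard `statistics` module. The data set is made up, and two choices here are assumptions rather than part of the lecture: outliers are flagged with the common 1.5 × IQR rule, and the shape is judged only by comparing the mean with the median.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]  # made-up sample with one extreme value

# 1. Central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# 2. Spread (sample variance and sample standard deviation).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# 3. Shape: a mean well above the median hints at a right-skewed distribution.
right_skew_hint = mean > median

# 4. Outliers, using the 1.5 * IQR rule (an assumed convention).
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", round(variance, 2), "std dev:", round(std_dev, 2))
print("right-skew hint:", right_skew_hint)
print("outliers:", outliers)
```

Note how the single extreme value pulls the mean above the median while leaving the median and mode unchanged.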

Why Squared Deviations?
Squares eliminate the negatives, so deviations above and below the mean cannot cancel each other out. Result: an observation contributes more and more to the variance the farther it lies from the mean.

The standard deviation is simply the square root of the variance.
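A short worked example shows why the deviations are squared and how the standard deviation follows from the variance. The data are made up, and the sketch assumes the sample variance with the n − 1 divisor; the slides do not say which divisor they use.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
mean = statistics.mean(data)

# Raw deviations from the mean always sum to zero, so they cannot
# measure spread on their own.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs and weights distant points more heavily.
squared_deviations = [d ** 2 for d in deviations]

# Sample variance (n - 1 divisor, an assumption) and standard deviation.
variance = sum(squared_deviations) / (len(data) - 1)
std_dev = math.sqrt(variance)

print("sum of raw deviations:", sum_of_deviations)
print("squared deviations:   ", squared_deviations)
print("variance:", round(variance, 3), "std dev:", round(std_dev, 3))
```

The point farthest from the mean (9) contributes 16 to the sum of squares, half of the total, which is the "increasing contribution" effect of squaring.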

Interesting Theoretical Result

Graphical Exploratory Data Analysis
Univariate Data: Histograms and Bar Plots
What's the difference between a histogram and a bar plot?
Bar plot
• Used for categorical variables to show frequency or proportion in each category.
• Translates the data from frequency tables into a pictorial representation…
Histogram
• Used to visualize the distribution (shape, center, range, variation) of continuous variables.
• "Bin size" is important.
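The counts behind both plots can be computed without any plotting library. This is a minimal text-only sketch: the data are made up and `histogram_counts` is a hypothetical helper, not a library function. It also shows why bin size matters, since the same values look different with a bin width of 2 and of 4.

```python
import math
from collections import Counter

# Bar plot: one bar per category, height = frequency of that category.
grades = ["A", "B", "A", "C", "A", "B"]  # made-up categorical data
category_counts = Counter(grades)

def histogram_counts(values, bin_width):
    """Count continuous values per bin; keys are the left bin edges."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Histogram: continuous values are grouped into bins before counting.
values = [1.2, 1.9, 2.5, 3.1, 3.3, 4.8, 5.0, 7.4]  # made-up continuous data
narrow_bins = histogram_counts(values, bin_width=2)
wide_bins = histogram_counts(values, bin_width=4)

print("bar plot counts:")
for category, count in sorted(category_counts.items()):
    print(f"  {category}: {'#' * count}")

for label, bins in (("width 2", narrow_bins), ("width 4", wide_bins)):
    print(f"histogram, bin {label}:")
    for left_edge, count in bins.items():
        print(f"  [{left_edge}, ...): {'#' * count}")
```

With wide bins the distribution collapses into two bars and most of its shape is hidden; with narrow bins more detail appears, at the cost of fewer observations per bin.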