1 DATA DESCRIPTION

2 Units
- Unit: the entity we are studying (called a subject if it is a human being).
- Each unit/subject has certain parameters; for example, a student (subject) has an age, weight, height, home address, number of units taken, and so on.

3 Variables
- These parameters are called variables.
- In statistics, variables are stored in columns, each variable occupying one column.

4 Cross-sectional and time-series analyses
- In a cross-sectional analysis, a unit/subject is the entity you are studying. For example, if you study the housing market in San Diego, a unit is a house, and the variables are the price, size, age, etc., of a house.
- In a time-series analysis, the unit is a time unit, say, hour, day, month, etc.

5 Data Types
- Nominal data: male/female, colors
- Ordinal data: excellent/good/bad
- Interval data: temperature, GMAT scores
- Ratio data: distance to school, price

6 Two forms
- GRAPHICAL form
- NUMERICAL SUMMARY form

7 Graphical forms
- Sequence plots
- Histograms (frequency distributions)
- Scatter plots

8 Sequence plots
- Used to describe a time series.
- The horizontal axis is always related to the sequence in which the data were collected.
- The vertical axis is the value of the variable.

9 Example: sequence plot

10 Histograms I
- A histogram (frequency distribution) shows how many values are in a certain range.
- It is used for cross-sectional analysis.
- The potential observation values are divided into groups (called classes).
- The number of observations falling into each class is called the frequency.
- When we say an observation falls into a class, we mean its value is greater than or equal to the lower bound but less than the upper bound of the class.
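
The class rule above (lower bound included, upper bound excluded) can be sketched in Python as follows. This is only an illustration: the function name `class_frequencies`, the class width of 2, and the waiting times are assumptions made up for the sketch, not data from the slides.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count how many values fall into each class [lo, lo + width)."""
    counts = [0] * n_classes
    for v in values:
        # Floor division maps a value to the class whose lower bound it
        # meets or exceeds and whose upper bound it stays below.
        index = int((v - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Hypothetical waiting times in minutes (not the bank data from the slides).
waiting_times = [1.0, 2.5, 3.0, 4.9, 5.0, 6.2, 8.8, 9.0, 10.5]

# Classes: [1, 3), [3, 5), [5, 7), [7, 9), [9, 11)
frequencies = class_frequencies(waiting_times, lower=1, width=2, n_classes=5)
print(frequencies)
```

Note that the value 3.0 is counted in the class [3, 5), not in [1, 3), which is exactly the convention stated on the slide.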

11 Example: histogram A commercial bank is studying the time a customer spends in line. They recorded waiting times (in minutes) of 28 customers:

12 Example: histogram

13 Histogram II
- The relative frequency distribution depicts, for each class, the ratio of the class frequency to the total number of observations.
- The cumulative distribution depicts the fraction (or percentage) of observations that are less than a specific value.

14 Example: relative frequency distribution
- A “relative frequency” distribution plots the fraction (or percentage) of observations in each class instead of the actual number. For this problem, the relative frequency of the first class is 6/28 = 0.214. The remaining relative frequencies are 0.179, 0.250, 0.286, and 0.071. A graph similar to the one above can then be plotted.

15 Example: cumulative distribution
- In the previous example, the fraction of observations that are less than 3 minutes is 0.214, the fraction less than 5 minutes is 0.214 + 0.179 = 0.393, less than 7 is 0.393 + 0.250 = 0.643, less than 9 is 0.643 + 0.286 = 0.929, and less than 11 is 1.0.
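
The running sums in this example can be reproduced with a short Python sketch. The class counts 6, 5, 7, 8, 2 are an assumption: they are inferred from 6/28 and from the cumulative values 0.393, 0.643, 0.929 and 1.0 quoted in the example, since the raw waiting times are not part of this transcript.

```python
from itertools import accumulate

# Class frequencies inferred from the fractions quoted in the example
# (an assumption; the raw data are not available here).
frequencies = [6, 5, 7, 8, 2]
upper_bounds = [3, 5, 7, 9, 11]  # upper class bounds in minutes, from the slide

total = sum(frequencies)

# Relative frequency: class count divided by the total number of observations.
relative = [f / total for f in frequencies]

# Cumulative distribution: running sum of the relative frequencies.
cumulative = list(accumulate(relative))

for upper, rel, cum in zip(upper_bounds, relative, cumulative):
    print(f"less than {upper:2d} min: relative {rel:.3f}, cumulative {cum:.3f}")
```

Taking successive differences of `cumulative` gives back `relative`, so either distribution can be derived from the other.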

16 Example: cumulative distribution

17 Histogram III
- The sum of all the relative frequencies is always 1.
- The cumulative distribution is non-decreasing.
- The last value of the cumulative distribution is always 1.
- A cumulative distribution can be derived from the corresponding relative frequency distribution, and vice versa.

18 Probability
- A random variable is a variable whose values cannot be predetermined but are governed by some random mechanism.
- Although we cannot predict the value of a random variable precisely, we may be able to state the probability that it falls in a certain interval.
- The relative frequency of a class can be read as (an estimate of) the probability that the random variable falls in that class.
- Likewise, the relative frequency distribution serves as (an estimate of) the probability distribution.

19 Scatter plots
- A scatter plot shows the relationship between two variables.

20 Example: scatter plot
The following are the height and foot size measurements of 8 men arbitrarily selected from students in the cafeteria. Heights and foot sizes are in centimeters.
Table columns: man, height, foot.

21 Example: scatter plot

22 Numerical Summary Forms
- Central location: mean, median, and mode.
- Dispersion: standard deviation and variance.
- Correlation.

23 Mean
- The mean (average) is the sum of the observations divided by the number of observations.
- Sum of the 10 observations = 243
- Mean = 243/10 = 24.3

24 Median
- The median is the value of the central observation (the one in the middle) when the observations are listed in ascending or descending order.
- When there is an even number of values, the median is the average of the middle two values.
- When there is an odd number of values, the median is the middle value.

25 Example: median

26 Compare mean and median
- The median is less sensitive to outliers than the mean. Check the mean and median for the following two data sets:
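
As a rough illustration of this point, the sketch below compares the mean and median of two small data sets using Python's standard `statistics` module. Both data sets are hypothetical stand-ins (the ones from the original slide are not in this transcript); the second differs from the first only in its largest value.

```python
from statistics import mean, median

# Hypothetical data sets: data_b replaces the largest value with an outlier.
data_a = [10, 12, 14, 16, 18]
data_b = [10, 12, 14, 16, 180]

for name, data in (("A", data_a), ("B", data_b)):
    print(f"data set {name}: mean = {mean(data)}, median = {median(data)}")
```

For these numbers the mean jumps from 14 to 46.4 once the outlier is introduced, while the median stays at 14.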

27 Mode
- The mode is the most frequently occurring value (or values).

28 Symmetry and skew
- A frequency distribution in which the area to the left of the mean is a mirror image of the area to the right is called a symmetrical distribution.
- A distribution that has a longer tail on the right-hand side than on the left is called positively skewed, or skewed to the right. A distribution that has a longer tail on the left is called negatively skewed, or skewed to the left.
- If a distribution is positively skewed, the mean typically exceeds the median. For a negatively skewed distribution, the mean is typically less than the median.

29 Range
- The range is the difference between the maximum and minimum values of the observations.

30 Standard deviation and variance
- The standard deviation is used to describe the dispersion of the data.
- The variance is the square of the standard deviation.

31 Calculation of S.D.
- Calculate the mean.
- Calculate the deviation of each observation from the mean.
- Square the deviations and sum them up.
- Divide the sum by n - 1, where n is the number of observations, and take the square root.
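
The four steps above can be sketched in Python as follows. The function name `sample_std_dev` and the sample values are illustrative assumptions, not the numbers from the slides; `statistics.stdev` from the standard library, which also divides by n - 1, is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev


def sample_std_dev(observations):
    """Sample standard deviation, following the four steps on the slide."""
    n = len(observations)
    mean = sum(observations) / n                       # step 1: the mean
    deviations = [x - mean for x in observations]      # step 2: deviations
    sum_of_squares = sum(d * d for d in deviations)    # step 3: sum of squares
    return sqrt(sum_of_squares / (n - 1))              # step 4: divide, root


# Hypothetical sample (not the data from the slides).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_std_dev(sample))
print(stdev(sample))  # library cross-check
```

Here the mean is 5 and the squared deviations sum to 32, so the variance is 32/7 and the standard deviation is its square root.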

32 Example: S.D.
Table columns: sample, deviation, square of deviation; followed by the sum of squared deviations and the resulting standard deviation.

34 Empirical rules
If the distribution is symmetrical and bell-shaped:
- Approximately 68% of the observations will be within one standard deviation of the mean.
- Approximately 95% of the observations will be within two standard deviations of the mean.
- Approximately 99.7% of the observations will be within three standard deviations of the mean.
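
These percentages match what an exactly normal distribution gives. A minimal sketch, assuming the bell shape is a normal distribution: the fraction within k standard deviations of the mean is erf(k / sqrt(2)), which Python's `math.erf` can evaluate. The function name `normal_fraction_within` is made up for this sketch.

```python
from math import erf, sqrt


def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_fraction_within(k):.4f}")
```

The three values come out as roughly 0.6827, 0.9545 and 0.9973. Real data that are only approximately bell-shaped will match these figures only approximately.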

35 Percentiles
- The 75th percentile is the value such that 75% of the numbers are less than or equal to this value and the remaining 25% are larger than this value.
- The k-th percentile is the value such that k% of the numbers are less than or equal to this value and the remaining (100 - k)% are larger than this value.
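
One simple way to turn this definition into a calculation is the nearest-rank method, sketched below. This is an assumption about the method: statistical packages use several slightly different percentile conventions, and the slides do not say which one is intended. The function name and the scores are hypothetical.

```python
from math import ceil


def percentile_nearest_rank(values, k):
    """Smallest observed value with at least k% of the data at or below it."""
    if not 0 < k <= 100:
        raise ValueError("k must be in the interval (0, 100]")
    ordered = sorted(values)
    # Rank of the observation that covers k% of the n ordered values.
    rank = ceil(k * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical scores (not data from the slides).
scores = [15, 20, 35, 40, 50, 55, 60, 70]
p75 = percentile_nearest_rank(scores, 75)
print(p75)
```

With these eight scores the 75th percentile is 55: six of the eight values (75%) are less than or equal to 55 and the remaining two (25%) are larger.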

36 Correlation coefficient
- The correlation coefficient measures how closely two variables are (linearly) related to each other. It has a value between -1 and +1.
- A positive value indicates a positive linear relationship (one variable tends to increase as the other increases); a negative value indicates a negative linear relationship.
- If two variables are not linearly related, the correlation coefficient will be close to zero; if they are closely linearly related, it will be close to 1 or -1.
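
A minimal sketch of how such a coefficient can be computed, assuming the slide refers to the usual Pearson correlation coefficient (the transcript does not give a formula). The function name `pearson_r` and the height and foot-size values are hypothetical; they are not the measurements from the scatter plot example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)


# Hypothetical measurements in centimeters (not the data from the slides).
heights = [165, 170, 175, 180, 185]
foot_sizes = [24.0, 25.0, 25.5, 26.5, 27.5]

r = pearson_r(heights, foot_sizes)
print(f"r = {r:.3f}")
```

For these made-up values r is close to +1, reflecting a strong positive linear relationship; reversing the order of one list would flip the sign.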