Data Summarization.


Data can be summarized by:
1-Measurements of central tendency (average measurements)
2-Measurements of variability (dispersion measurements)

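Measures of Central Tendency
What is central tendency? The "middle" or "center" of a variable's distribution: a single score that best describes the entire distribution.
How is it calculated?
1. Mode: the most frequently occurring value
2. Median: the middle value when the observations are sorted
3. Mean: the sum of the observations divided by their number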
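The three measures can be illustrated with a short Python sketch. The data values below are hypothetical (they do not come from the slides) and were chosen so that the three measures differ; the functions are from Python's standard `statistics` module.

```python
import statistics

# Hypothetical observations, chosen only for illustration
scores = [2, 3, 3, 5, 7, 10]

mode_value = statistics.mode(scores)      # most frequent value
median_value = statistics.median(scores)  # middle of the sorted data
mean_value = statistics.mean(scores)      # sum divided by count

print("mode:", mode_value)
print("median:", median_value)
print("mean:", mean_value)
```

With an even number of observations, the median is taken as the average of the two middle values (here 3 and 5).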

Measurements of variability: The degree to which numerical (quantitative) data tend to spread about an average value is called the variation or dispersion of the data. Variability is inherent in data: the observations do not all take the same value. There are various measures of variation or dispersion, but the most commonly used are:

1-Range:  

The uses of the range:

2-Interquartile range (IQR): The interquartile range is a measure of where the "middle fifty percent" of a data set lies. Where the range measures the distance between the beginning and the end of a set, the interquartile range measures where the bulk of the values lie. Because it ignores the lowest and highest quarters of the data, it is not distorted by extreme values, which is why it is often preferred over other measures of spread.

The IQR formula is: IQR = Q3 - Q1, where Q3 is the upper quartile and Q1 is the lower quartile.
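As a minimal sketch of the IQR calculation in Python, the example below uses five haemoglobin values (8 to 12 g/dL). Several conventions exist for locating quartiles, and the slides do not say which one is intended; the "inclusive" method of `statistics.quantiles` is used here as an assumption, so other software may give slightly different quartiles for the same data.

```python
import statistics

# Haemoglobin levels (g/dL), used here only as example data
levels = [8, 9, 10, 11, 12]

# n=4 splits the data into quarters and returns the three cut points.
# method="inclusive" is an assumed convention, not one named in the slides.
q1, q2, q3 = statistics.quantiles(levels, n=4, method="inclusive")

iqr = q3 - q1  # IQR = Q3 - Q1
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```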

3-Variance: The variance is defined as the average of the squared deviations of the observations from their mean. Because the deviations are squared, the variance is expressed in squared units of the original measurement (for example, m² for lengths measured in metres), which are not customary units and are awkward to interpret.

Example: haemoglobin level (g/dL), mean X̄ = 50/5 = 10

X (g/dL)     d = (X - X̄)       d² = (X - X̄)²     X²
8            8 - 10 = -2       4                 64
9            9 - 10 = -1       1                 81
10           10 - 10 = 0       0                 100
11           11 - 10 = +1      1                 121
12           12 - 10 = +2      4                 144
ΣX = 50      Σd = 0            Σd² = 10          ΣX² = 510

4-Standard deviation: The SD is defined as the square root of the variance. It can be thought of as a typical distance of the observations from their mean, expressed in the same units as the data. It is the customary and most widely used measure of variability in biostatistics. A high SD means that the data possess a large variation; a small SD means that the data possess less variation.
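The haemoglobin example can be worked through in Python as a check on the arithmetic. The slides define the variance as an "average" of squared deviations without stating the divisor, so this sketch shows both conventions: dividing by n (population variance) and by n - 1 (sample variance, the more common choice in biostatistics). Which one the slides intend is an assumption. The X² column presumably supports the shortcut formula Σd² = ΣX² - (ΣX)²/n, which is verified as well.

```python
import math

# Haemoglobin levels (g/dL) from the worked table
x = [8, 9, 10, 11, 12]
n = len(x)
mean = sum(x) / n                                    # 50 / 5 = 10

# Sum of squared deviations, computed directly
sum_sq_dev = sum((v - mean) ** 2 for v in x)         # 4 + 1 + 0 + 1 + 4 = 10

# Same quantity via the shortcut formula using the X^2 column
shortcut = sum(v * v for v in x) - sum(x) ** 2 / n   # 510 - 2500/5 = 10

pop_variance = sum_sq_dev / n                        # divisor n
sample_variance = sum_sq_dev / (n - 1)               # divisor n - 1 (assumed convention)
sample_sd = math.sqrt(sample_variance)               # SD is the square root of the variance

print("sum of squared deviations:", sum_sq_dev)
print("population variance:", pop_variance)
print("sample variance:", sample_variance)
print("sample SD:", round(sample_sd, 2))
```

The variance is in (g/dL)², while the SD is back in g/dL, which is one reason the SD is the easier figure to report.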

Presentation of Data

Data collected and compiled from different types of epidemiological studies are raw data. They are unsorted and of little help in understanding underlying trends or their meaning.

So, the next step after data collection is to sort and classify the data into characteristic groups or classes, for example according to age, sex, social class, or number of decayed, missing and filled teeth (DMFT). The objective of classifying data is to make the data simple, concise, meaningful, interesting and helpful in further analysis.

There are two main methods of presenting data:
1. Tabulation
2. Diagrams

1. Tabulation: Benefits of the presentation of data by using tables are:

The basic rules to follow when constructing a frequency distribution table are:

Example: Distribution of the study group according to gender and age

Study group categories     Number     Percentage (%)
Gender
  Male                     87         50.9
  Female                   84         49.1
Age category
  30-39 years              17         9.9
  40-49 years              29         17.0
  50-59 years              64         37.4
  60-69 years              61         35.7
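The percentage column of such a table can be reproduced from the counts alone. The sketch below is one way to do it in Python; the function name `percentages` is a hypothetical helper introduced for illustration, and the counts are those given in the example (171 subjects in total).

```python
def percentages(counts):
    """Return each category's share of the total, in percent, to 1 decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

# Counts taken from the example table
gender = {"Male": 87, "Female": 84}
age = {"30-39": 17, "40-49": 29, "50-59": 64, "60-69": 61}

gender_pct = percentages(gender)
age_pct = percentages(age)

for label, table in (("Gender", gender_pct), ("Age category", age_pct)):
    print(label)
    for category, pct in table.items():
        print(f"  {category:<8} {pct:5.1f} %")
```

Both groupings describe the same 171 subjects, so each set of counts should add up to the same total; checking this is a quick way to catch tabulation errors.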

2. Diagrams: Arranging the data into tables simplifies the entire mass of data, but it is sometimes difficult to understand and compare two or more tables. Diagrams and graphs are among the most convincing and appealing ways of depicting statistical results. They are extremely useful because they are attractive to the eye, give a bird's-eye view of the entire data, leave a lasting impression on the mind of the layman, and facilitate comparison of data relating to different time periods and regions.

The basic rules in the construction of diagrams and graphs are:

Types of diagrams: Depending on the nature of the data, whether it is qualitative or quantitative, the following diagrams may be chosen:

Bar diagram: This diagram is used to represent qualitative data.

Pie diagram:

Line diagram: This diagram is useful for studying changes in the value of a variable over time. Time (in hours, days, weeks, months or years) is represented on the X-axis, and the value of the quantity being studied is represented on the Y-axis.

Histogram: This diagram is used to depict quantitative data of the continuous type. A histogram is a bar diagram without gaps between the bars, and it represents the frequency distribution. It is constructed as follows: the class intervals are marked on the X-axis and the frequencies are marked on the Y-axis. A rectangle is drawn above each class interval with a height proportional to the frequency of that interval.
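The construction steps above can be sketched in Python: count how many observations fall in each class interval, then draw one bar per interval with a length proportional to its frequency. The ages below are hypothetical values made up for illustration, and `class_frequencies` is a helper name introduced here; a plotting library would normally draw the rectangles, but plain text bars are enough to show the idea.

```python
def class_frequencies(values, start, width, n_classes):
    """Count the values falling in each class interval of equal width."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)   # which interval the value falls in
        if 0 <= index < n_classes:          # ignore values outside all intervals
            counts[index] += 1
    return counts

# Hypothetical ages in years (not taken from the slides)
ages = [31, 35, 42, 44, 47, 51, 53, 55, 58, 59, 62, 66]

counts = class_frequencies(ages, start=30, width=10, n_classes=4)

# One text bar per class interval; bar length is proportional to frequency
for i, frequency in enumerate(counts):
    low = 30 + i * 10
    print(f"{low}-{low + 9} | {'#' * frequency} ({frequency})")
```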

Cartograms or spot maps: These maps are used to show the geographical distribution of the frequencies of a characteristic.

Figure 6: (a) Dental caries levels (DMFT) of 12-year-olds worldwide. (b) Dental caries levels (DMFT) of 35–44-year-olds worldwide (2003).