Summarizing Data. Statistics statistics probability probability vs. statistics sampling inference.

Presentation transcript:

Summarizing Data

Statistics

Probability vs. statistics (diagram labels: probability, statistics, sampling, inference)

Distribution ?

Distribution: a mathematical way to represent the diversity of characteristics in a group. The group may be a sample or a population, which gives the distribution of a sample or the population distribution.

Diagram labels: distribution of a sample (realistic; data), population distribution (imaginary; theory or model), statistics

Statistics starts from data.

Data are clues to the truth and tell us something about it. Data are not just sets of numbers.

The first principle of statistics: the sample is not the same as the population, but it represents the population sufficiently well.

Datawork

Woodwork & Datawork
Woodwork: from the forest, making timber, inspecting the wood grain, cutting, structuring, finishing
Datawork: from the real world, collecting data, exploring data, reducing data, modeling, evaluating

Craft & Endeavor

Tools & Skills

Statistical tools: paper, pencil & calculator; spreadsheet software (Excel); Minitab, SPSS, SAS, R; DBMS (Access, Oracle, …); C/C++, Java, Python, … You need skill to use these.

You also need craft and experience. More important in datawork, however, is trying to gain perspective on the data at hand.

There is no standard recipe for good datawork. Think, think, and think! That's the only way.

Datawork is not magic. It's a hard job. Salagadoola mechicka boola, bibbidi-bobbidi-boo

Wood grain ?

Grain of data ?

Seeing the grain of data ≈ Exploratory Data Analysis

Exploratory Data Analysis (EDA): the step of checking the basic properties of the data using basic statistical methods. With EDA we aim to develop insight into the data, as a first step toward more specific analysis.

Basic Statistical Methods, qualitative variable: frequency table; cross-tabulation (contingency table); bar chart, pie chart, …

Basic Statistical Methods, quantitative scale: (cumulative) frequency distribution; histogram; dot plot; stem-and-leaf diagram; scatter plot; box plot, …

Example Data: Credit_Card_Bank (p. 22 of SVV). 12 variables and 100 observations. Cardholders receive many types of 'offer'; the goal is to find the type of offer that increases cardholders' usage the most.

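Variables in dsv, from names(dsv):
[1] "Offer.Status" (Categorical)
[2] "Charges.Aug.2008" (Quantitative)
[3] "Charges.Sept.2008" (Quantitative)
[4] "Charges.Oct.2008" (Quantitative)
[5] "Marketing.Segment" (Categorical)
[6] "Industry.Segment" (Categorical)
[7] "Spendlift.After.Promotion" (Quantitative)
[8] "Pre.Promotion.Avg.Spend" (Quantitative)
[9] "Post.Promotion.Avg.Spend" (Quantitative)
[10] "Retail.Customer" (Yes, No)
[11] "Enrolled.in.Program" (Yes, No)
[12] "Spendlift.Positive" (Yes, No)
Variables used below: oct08, mseg, iseg, and loct08 = log(oct08)
data.svv <- dir("c:/temp/text")
dfile.svv <- paste("c:/temp/text/", data.svv, sep="")
# stringsAsFactors=TRUE keeps the text columns as factors (not the default since R 4.0)
dsv <- read.table(dfile.svv[1], head=TRUE, sep="\t", stringsAsFactors=TRUE)
names(dsv)
# The next line is garbled in the transcript; this reconstruction is an assumption based on how oct08, loct08 and xoct08 are used on later slides.
oct08 <- dsv[,4]; loct08 <- log(oct08); xoct08 <- loct08[oct08 > 0]
mseg <- dsv[,5]; iseg <- dsv[,6]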

log(oct08), rounded to the 2nd decimal place: round(loct08,2). Cases with zero charges give log(0) = -Inf. (The printed values are not preserved in the transcript.)
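A minimal sketch of why the zero charges have to be set aside before summarizing on the log scale. The vector below is made up for illustration; it is not the Credit_Card_Bank data, and the `_toy` names are hypothetical.

```r
# Hypothetical October charges; two cardholders spent nothing.
oct08_toy <- c(0, 120.5, 33.1, 0, 980)

loct08_toy <- log(oct08_toy)              # log(0) is -Inf in R
xoct08_toy <- loct08_toy[oct08_toy > 0]   # keep only the positive charges

n_dropped <- sum(oct08_toy == 0)          # number of cases removed

print(round(loct08_toy, 2))
print(round(sort(xoct08_toy), 2))
```

Filtering on the original scale (`oct08_toy > 0`) rather than on the log scale keeps the rule easy to reuse on other variables measured on the same cases.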

Sorted values of log(oct08), after deleting 7 cases of -Inf: round(sort(xoct08),2). (The printed values are not preserved in the transcript.)

iseg
[1] B B A T A T B A A T B T T B B B B B T B R A T A A
[26] R B B R T T T A A B T B A R B B A T B B R T T A A
[51] B A B B T A A T B A B A B R B A A R A T B T T B R
[76] T A T A A B B B T R T T R T B A A A A A A B T A T
Levels: A B R T
The meaning of the levels is not known.

mseg
[1] M L L M B A L A M H M L A M M B L B H L
[21] H B L H H M A B H L A H A B L H L B A A
[41] A H A L L H L A B A A B A B B A M A B L
[61] L B B H B A B A B L B A H L M L L M A B
[81] A L L M H A H H L A H L B A H A L L L H
Levels: L < B < M < A < H
L: low, B: below medium, M: medium, A: above medium, H: high
levels(mseg) <- c("M","H","L","A","B")
# ordered=TRUE is an assumption, added so that the printout shows L < B < M < A < H as above
mseg <- factor(mseg, levels=c("L","B","M","A","H"), ordered=TRUE)
mseg
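The two-step relabeling can be tried on a toy factor. This sketch assumes the raw column holds the full segment names that appear in the frequency table (Low Spender, Med Low Spender, and so on); that is a guess about the data file, and `seg_toy` is a hypothetical stand-in for `dsv[,5]`.

```r
# Hypothetical segment labels, standing in for the raw column.
seg_toy <- factor(c("Low Spender", "High Spender", "Average Spender",
                    "Med Low Spender", "Med High Spender", "Low Spender"))

# levels() are sorted alphabetically, so the short codes must follow that order:
# Average, High, Low, Med High, Med Low  ->  M, H, L, A, B
levels(seg_toy) <- c("M", "H", "L", "A", "B")

# Re-declare the factor with the levels in their natural order, low to high.
seg_toy <- factor(seg_toy, levels = c("L", "B", "M", "A", "H"), ordered = TRUE)

print(seg_toy)
print(table(seg_toy))
```

The first assignment only renames the levels; the second call changes their order, which is what tables and plots follow.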

Histogram of loct08 (x-axis: loct08; y-axis: Frequency): hist(xoct08,col="grey")

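Stem-and-leaf display of log(oct08): stem(xoct08). Each row shows a stem, then "|", then its leaves; for example, stem 2 with leaf 5 stands for 2.5. (The display itself, including the leaf unit, is not preserved in the transcript.)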

Stem-and-leaf display with leaf unit = 1: stem(10*xoct08). The row 2 | 5 now stands for 25. (The rest of the display is not preserved in the transcript.)

Five-number summary of log(oct08) (Min., Q1, Median, Q3, Max.) and IQR: summary(xoct08). (The printed values are not preserved in the transcript.)

Quartiles: Q1, Q2, Q3
Q1: the value ranked at 25% from the lowest
Q2: the value ranked at 50% from the lowest
Q3: the value ranked at 75% from the lowest
IQR (Inter-Quartile Range) = Q3 - Q1
Median = Q2

How to compute Q1, Q2, Q3
Q1: c = 0.25*(n+1); Q2: c = 0.5*(n+1); Q3: c = 0.75*(n+1)
If c is an integer, take the c-th ranked value x[c].
If c is not an integer, take (x[c-] + x[c+])/2, where c- is the largest integer below c and c+ is the smallest integer above c.

Sorted values of log(oct08), after deleting 7 cases of -Inf: n = 93, so the ranks are 0.25*94 = 23.5, 0.5*94 = 47, and 0.75*94 = 70.5. (The printed values are not preserved in the transcript.)
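The rank rule can be written as a short function. This is a sketch of the rule as stated on the slides, not of what summary() does; `slide_quantile` and the `toy` vector are hypothetical names and values, and the function assumes the rank falls between 1 and n.

```r
# Quartile rule from the slides: rank c = p * (n + 1);
# if c is a whole number take x[c], otherwise average the two neighbouring ranks.
slide_quantile <- function(x, p) {
  x <- sort(x)
  c_rank <- p * (length(x) + 1)
  lower <- floor(c_rank)      # c-: largest integer not above c
  upper <- ceiling(c_rank)    # c+: smallest integer not below c
  (x[lower] + x[upper]) / 2   # equals x[c] when c is an integer
}

# Ranks for n = 93, as on the slide.
ranks_93 <- c(0.25, 0.5, 0.75) * (93 + 1)

# Hypothetical data with n = 7, so the ranks are 2, 4 and 6.
toy <- c(9, 1, 7, 3, 5, 13, 11)
q <- sapply(c(0.25, 0.5, 0.75), function(p) slide_quantile(toy, p))
iqr <- q[3] - q[1]

print(ranks_93)
print(q)
print(iqr)
```

R's built-in quantile() uses a different interpolation rule by default, so its quartiles may differ slightly from the ones this rule gives.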

Dot plot of loct08

Box plot of oct08: boxplot(oct08)
Box plot of log(oct08): boxplot(xoct08)

Box plot diagram labels: Q1, Q2, Q3, IQR, 1.5 IQR, min(non-outlier), mild outlier (*), extreme outlier (*)
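The outlier labels in the diagram can be reproduced numerically. The sketch below assumes the common convention that points more than 1.5 IQR beyond the quartiles are outliers and points more than 3 IQR beyond are extreme; the factor 3 is an assumption, since only 1.5 IQR survives in the transcript. The data are made up.

```r
# Hypothetical values; 20 and 60 are planted as unusual points.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 60)

q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

inner <- c(q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # beyond these: outlier
outer <- c(q1 - 3 * iqr, q3 + 3 * iqr)       # beyond these: extreme (assumed 3 * IQR)

is_outlier <- x < inner[1] | x > inner[2]
is_extreme <- x < outer[1] | x > outer[2]

mild <- x[is_outlier & !is_extreme]
extreme <- x[is_extreme]
whiskers <- range(x[!is_outlier])            # min and max of the non-outliers

print(mild)
print(extreme)
print(whiskers)
```

R's boxplot() draws every point beyond 1.5 IQR with the same symbol, so separating mild from extreme outliers takes a manual step like this one.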

Frequency table of mseg (columns: freq, %freq, cum. freq, %cum. freq; rows: Low Spender, Med Low Spender, Average Spender, Med High Spender, High Spender, Total). The cell values are not preserved in the transcript.
table(mseg)
table(mseg)/length(mseg)
cumsum(table(mseg))
cumsum(table(mseg))/length(mseg)
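The four columns of the frequency table can be assembled in one data frame. A minimal sketch on made-up segment codes; `seg` and `freq_table` are hypothetical names.

```r
# Hypothetical segment codes with levels in low-to-high order.
seg <- factor(c("L", "L", "B", "M", "M", "M", "A", "H", "H", "L"),
              levels = c("L", "B", "M", "A", "H"))

freq <- table(seg)
n <- length(seg)

freq_table <- data.frame(
  freq         = as.vector(freq),
  pct_freq     = 100 * as.vector(freq) / n,
  cum_freq     = cumsum(as.vector(freq)),
  pct_cum_freq = 100 * cumsum(as.vector(freq)) / n,
  row.names    = names(freq)
)

print(freq_table)
```

The cumulative columns are only meaningful because the levels are ordered from low to high; for an unordered variable such as iseg they would depend on an arbitrary ordering.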

Bar chart of log(oct08), with bins (2,3], (3,4], (4,5], (5,6], (6,7], (7,8], (8,9], (9,10], (10,11]

Histogram & Bar chart
Histogram: for quantitative variables; the bars are connected.
Bar chart: for categorical variables; the bars are disconnected.
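One way to see the link between the two displays is to bin a quantitative variable by hand. The sketch below uses cut() to produce the interval labels shown on the bar chart slide and checks that hist() reports the same counts; the values are made up, and whether the slide was produced this way is an assumption.

```r
# Hypothetical log-charges, not the Credit_Card_Bank values.
x <- c(2.5, 3.2, 4.8, 5.1, 5.9, 6.0, 6.4, 7.7, 10.2)

# Right-closed unit-width bins (2,3], (3,4], ..., (10,11].
bins <- cut(x, breaks = 2:11)
counts <- table(bins)
print(counts)

# hist() with the same breaks returns the same counts; plot = FALSE skips drawing.
h <- hist(x, breaks = 2:11, plot = FALSE)
print(h$counts)
```

Passing `counts` to barplot() would draw the binned variable with separated bars, while hist() draws the same counts with adjacent bars.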

Contingency table of mseg and iseg (rows, mseg: L, B, M, A, H, Total; columns, iseg: A, B, R, T, Total). The cell values are not preserved in the transcript.
table(mseg,iseg)
apply(table(mseg,iseg),1,sum)
apply(table(mseg,iseg),2,sum)
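A small worked example of the contingency table and its margins, on made-up codes. The `_toy` names are hypothetical; addmargins() is shown as an alternative that appends the Total row and column in one step.

```r
# Hypothetical marketing and industry segment codes for 8 cardholders.
mseg_toy <- factor(c("L", "L", "B", "M", "A", "H", "H", "L"),
                   levels = c("L", "B", "M", "A", "H"))
iseg_toy <- factor(c("A", "B", "A", "T", "R", "B", "B", "A"),
                   levels = c("A", "B", "R", "T"))

tab <- table(mseg_toy, iseg_toy)
row_totals <- apply(tab, 1, sum)   # totals for each mseg level
col_totals <- apply(tab, 2, sum)   # totals for each iseg level

# Same table with a "Sum" row and column appended.
full <- addmargins(tab)
print(full)
```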

Pie chart of iseg (slices: A, B, R, T): pie(table(iseg),col=c("red","light green","green","blue"))

Segmented bar chart of (mseg, iseg), serial (bars: A, B, R, T): barplot(table(mseg,iseg),col=c("red","light green","green","blue","purple"))

Segmented bar chart of (mseg, iseg), parallel (bars: A, B, R, T): barplot(table(mseg,iseg),col=c("red","light green","green","blue","purple"),beside=TRUE)

Mosaic plot of iseg (A, B, R, T) by mseg (L, B, M, A, H): mosaicplot(~iseg+mseg,col=rainbow(5))

Box plot of log(oct08) by mseg (groups: L, B, M, A, H): boxplot(loct08[oct08>0]~mseg[oct08>0])
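The grouped box plot relies on applying the same filter to both variables so that values and groups stay aligned. A numerical sketch of that idea, using group medians in place of the plot; the data and names below are hypothetical.

```r
# Hypothetical charges and segments; two cardholders have zero charges.
charges <- c(0, 20, 55, 7, 0, 150, 400, 90, 250)
segment <- factor(c("L", "L", "B", "L", "B", "M", "H", "M", "A"),
                  levels = c("L", "B", "M", "A", "H"))

keep <- charges > 0                  # one filter, applied to both variables
log_charges <- log(charges[keep])
seg_kept <- segment[keep]

group_median <- tapply(log_charges, seg_kept, median)   # centre line of each box
group_n <- table(seg_kept)                              # cases behind each box

print(round(group_median, 2))
print(group_n)
```

Checking `group_n` before reading the plot is a cheap safeguard, since a box built from one or two cases says little about its segment.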

Thank you !!