Exploring Data 17 Jan 2012 Dr. Sean Ho busi275.seanho.com HW1 due Thu 10pm By Mon, send email to set proposal meeting For lecture, please download: 01-SportsShoes.xls.

Slides:



Advertisements
Similar presentations
Describing Quantitative Variables
Advertisements

Descriptive Measures MARE 250 Dr. Jason Turner.
Measures of Dispersion
Numerically Summarizing Data
Chapter 3 Describing Data Using Numerical Measures
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
BCOR 1020 Business Statistics Lecture 15 – March 6, 2008.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter Two Treatment of Data.
Slides by JOHN LOUCKS St. Edward’s University.
1 Business 260: Managerial Decision Analysis Professor David Mease Lecture 1 Agenda: 1) Course web page 2) Greensheet 3) Numerical Descriptive Measures.
B a c kn e x t h o m e Classification of Variables Discrete Numerical Variable A variable that produces a response that comes from a counting process.
Lecture 6: Descriptive Statistics: Probability, Distribution, Univariate Data.
1 1 Slide © 2003 South-Western/Thomson Learning TM Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Chapter 2 Describing Data with Numerical Measurements
Describing Data Using Numerical Measures
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Exploratory Data Analysis. Computing Science, University of Aberdeen2 Introduction Applying data mining (InfoVis as well) techniques requires gaining.
Engineering Probability and Statistics - SE-205 -Chap 1 By S. O. Duffuaa.
Descriptive Statistics
Census A survey to collect data on the entire population.   Data The facts and figures collected, analyzed, and summarized for presentation and.
Variable  An item of data  Examples: –gender –test scores –weight  Value varies from one observation to another.
Methods for Describing Sets of Data
2011 Summer ERIE/REU Program Descriptive Statistics Igor Jankovic Department of Civil, Structural, and Environmental Engineering University at Buffalo,
Ch2: Exploring Data: Charts 13 Sep 2011 BUSI275 Dr. Sean Ho HW1 due Thu 10pm Download and open “SportsShoes.xls”SportsShoes.xls.
Copyright © 2004 Pearson Education, Inc.. Chapter 2 Descriptive Statistics Describe, Explore, and Compare Data 2-1 Overview 2-2 Frequency Distributions.
1 Laugh, and the world laughs with you. Weep and you weep alone.~Shakespeare~
1 MATB344 Applied Statistics Chapter 2 Describing Data with Numerical Measures.
STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
What is variability in data? Measuring how much the group as a whole deviates from the center. Gives you an indication of what is the spread of the data.
Chapter 2 Describing Data.
Lecture 3 Describing Data Using Numerical Measures.
Skewness & Kurtosis: Reference
1 Elementary Statistics Larson Farber Descriptive Statistics Chapter 2.
Larson/Farber Ch 2 1 Elementary Statistics Larson Farber 2 Descriptive Statistics.
Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.
Chap 3-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 3 Describing Data Using Numerical.
BUSINESS STATISTICS I Descriptive Statistics & Data Collection.
1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation.
Sampling ‘Scientific sampling’ is random sampling Simple random samples Systematic random samples Stratified random samples Random cluster samples What?
Ch3: Exploring Your Data with Descriptives 15 Sep 2011 BUSI275 Dr. Sean Ho HW1 due tonight 10pm Download and open “02-SportsShoes.xls”02-SportsShoes.xls.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall2(2)-1 Chapter 2: Displaying and Summarizing Data Part 2: Descriptive Statistics.
Descriptive Statistics – Graphic Guidelines
LIS 570 Summarising and presenting data - Univariate analysis.
Larson/Farber Ch 2 1 Elementary Statistics Larson Farber 2 Descriptive Statistics.
MODULE 3: DESCRIPTIVE STATISTICS 2/6/2016BUS216: Probability & Statistics for Economics & Business 1.
CHAPTER 1 Basic Statistics Statistics in Engineering
Describing Data: Summary Measures. Identifying the Scale of Measurement Before you analyze the data, identify the measurement scale for each variable.
Descriptive Statistics – Graphic Guidelines Pie charts – qualitative variables, nominal data, eg. ‘religion’ Bar charts – qualitative or quantitative variables,
Welcome to MM305 Unit 2 Seminar Dr. Bob Statistical Foundations for Quantitative Analysis.
COMPLETE BUSINESS STATISTICS
Methods for Describing Sets of Data
STATISTICS AND PROBABILITY IN CIVIL ENGINEERING
Statistics -S1.
Chapter 3 Describing Data Using Numerical Measures
Data Mining: Concepts and Techniques
Description of Data (Summary and Variability measures)
Laugh, and the world laughs with you. Weep and you weep alone
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
Descriptive Statistics
Numerical Measures: Skewness and Location
Describing Data with Numerical Measures
Honors Statistics Review Chapters 4 - 5
Review for Exam 1 Ch 1-5 Ch 1-3 Descriptive Statistics
MBA 510 Lecture 2 Spring 2013 Dr. Tonya Balan 4/20/2019.
OAD30763 Statistics in Business and Economics
Presentation transcript:

Exploring Data 17 Jan 2012 Dr. Sean Ho busi275.seanho.com HW1 due Thu 10pm By Mon, send to set proposal meeting For lecture, please download: 01-SportsShoes.xls 01-SportsShoes.xls

17 Jan 2012 BUSI275: exploring data2 Outline for today Charts Histogram, ogive Scatterplot, line chart Descriptives: Centres: mean, median, mode Quantiles: quartiles, percentiles  Boxplot Variation: SD, IQR  CV, empirical rule, z-scores Probability Venn diagrams Union, intersection, complement

17 Jan 2012 BUSI275: exploring data3 Quantitative vars: histograms For quantitative vars (scale, ratio), must group data into classes e.g., length: 0-10cm, 10-20cm, 20-30cm... (class width is 10cm) Specify class boundaries: 10, 20, 30, … How many classes? for sample size of n, use k classes, where 2 k ≥ n Can use FREQUENCY() w/ column chart, or Data > Data Analysis > Histogram

17 Jan 2012 BUSI275: exploring data4 Cumulative distrib.: ogive The ogive is a curve showing the cumulative distribution on a variable: Frequency of values equal to or less than a given value Compute cumul. freqs. Insert > Line w/Markers Pareto chart is an ogive on a nominal var, with bins sorted by decreasing frequency Sort > Sort by: freq > Order: Large to small

17 Jan 2012 BUSI275: exploring data5 2 quant. vars: scatterplot Each participant in the dataset is plotted as a point on a 2D graph (x,y) coordinates are that participant's observed values on the two variables Insert > XY Scatter If more than 2 vars, then either 3D scatter (hard to see), or Match up all pairs: matrix scatter

17 Jan 2012 BUSI275: exploring data6 Time series: line graph Think of time as another variable Horizontal axis is time Insert > Line > Line

17 Jan 2012 BUSI275: exploring data7 Outline for today Charts Histogram, ogive Scatterplot, line chart Descriptives: Centres: mean, median, mode Quantiles: quartiles, percentiles  Boxplot Variation: SD, IQR  CV, empirical rule, z-scores Probability Venn diagrams Union, intersection, complement

17 Jan 2012 BUSI275: exploring data8 Descriptives: centres Visualizations are good, but numbers also help: Mostly just for quantitative vars Many ways to find the “centre” of a distribution Mean: AVERAGE()  Pop mean: μ ; sample mean: x  What happens if we have outliers? Median: line up all observations in order and pick the middle one Mode: most frequently occurring value  Usually not for continuous variables Statisti c AgeIncome Mean34.71$27, Median30$23, Mode24$19,000.00

17 Jan 2012 BUSI275: exploring data9 Descriptives: quantiles The first quartile, Q 1, is the value ¼ of the way through the list of observations, in order Similarly, Q 3 is ¾ of the way through What's another name for Q 2 ? In general the p th percentile is the value p% of the way through the list of observations Rank = (p/100)n: if fractional, round up  If exactly integer, average the next two Median = which percentile? Excel: QUARTILE(data, 3), PERCENTILE(data,.70)

17 Jan 2012 BUSI275: exploring data10 Box (and whiskers) plot Plot: median, Q 1, Q 3, and upper/lower limits: Upper limit = Q (IQR) Lower limit = Q 1 – 1.5(IQR) IQR = interquartile range = (Q 3 – Q 1 ) Observations outside the limits are considered outliers: draw as asterisks (*) Excel: try tweaking bar chartstry tweaking bar charts 25% 25% Outliers * * Lower lim Q1 Median Q3 Upper lim

17 Jan 2012 BUSI275: exploring data11 Boxplots and skew Right-SkewedLeft-SkewedSymmetric Q1Q2Q3Q1Q2Q3 Q1Q2Q3

17 Jan 2012 BUSI275: exploring data12 Boxplot Example Data: Right skewed, as the boxplot depicts: Min Q 1 Q 2 Q 3 Max * Upper limit = Q (Q 3 – Q 1 ) = (6 – 2) = is above the upper limit so is shown as an outlier

17 Jan 2012 BUSI275: exploring data13 Outline for today Charts Histogram, ogive Scatterplot, line chart Descriptives: Centres: mean, median, mode Quantiles: quartiles, percentiles  Boxplot Variation: SD, IQR  CV, empirical rule, z-scores Probability Venn diagrams Union, intersection, complement

17 Jan 2012 BUSI275: exploring data14 Measures of variation Spread (dispersion) of a distribution: are the data all clustered around the centre, or spread all over a wide range? Same center, different variation High variation Low variation

17 Jan 2012 BUSI275: exploring data15 Range, IQR, standard deviation Simplest: range = max – min Is this robust to outliers? IQR = Q 3 – Q 1 (“too robust”?) Standard deviation: Population: Sample: In Excel: STDEV() Variance is the SD w/o square root Pop.Samp. Mean μx SD σ s

17 Jan 2012 BUSI275: exploring data16 Coefficient of variation Coefficient of variation: SD relative to mean Expressed as a percentage / fraction e.g., Stock A has avg price x=$50 and s=$5 CV = s / x = 5/50 = 10% variation Stock B has x =$100 same standard deviation CV = s / x = 5/100 = 5% variation Stock B is less variable relative to its average stock price

17 Jan 2012 BUSI275: exploring data17 SD and Empirical Rule Every distribution has a mean and SD, but for most “nice” distribs two rules of thumb hold: Empirical rule: for “nice” distribs, approximately 68% of data lie within ± 1 SD of the mean 95% within ±2 SD of the mean 99.7% within ±3 SD NausicaaDistribution

17 Jan 2012 BUSI275: exploring data18 SD and Tchebysheff's Theorem For any distribution, at least (1-1/k 2 ) of the data will lie within k standard deviations of the mean Within (μ ± 1σ): ≥(1-1/1 2 ) = 0% Within (μ ± 2σ): ≥(1-1/2 2 ) = 75% Within (μ ± 3σ): ≥(1-1/3 2 ) = 89%

17 Jan 2012 BUSI275: exploring data19 z-scores Describes a value's position relative to the mean, in units of standard deviations: z = (x – μ )/ σ e.g., you got a score of 35 on a test: is this good or bad? Depends on the mean, SD: μ=30, σ=10: then z = +0.5: pretty good μ=50, σ=5: then z = -3: really bad!

17 Jan 2012 BUSI275: exploring data20 Outline for today Charts Histogram, ogive Scatterplot, line chart Descriptives: Centres: mean, median, mode Quantiles: quartiles, percentiles  Boxplot Variation: SD, IQR  CV, empirical rule, z-scores Probability Venn diagrams Union, intersection, complement

20 Sep 2011 BUSI275: Probability21 Probability Chance of a particular event happening e.g., in a sample of 1000 people, say 150 will buy your product: ⇒ the probability that a random person from the sample will buy your product is 15% Experiment: pick a random person (1 trial) Possible outcomes: {“buy”, “no buy”} Sample space: {“buy”, “no buy”} Event of interest: A = {“buy”} P(A) = 15%

20 Sep 2011 BUSI275: Probability22 Event trees Experiment: pick 3 people from the group Outcomes for a single trial: {“buy”, “no buy”} Sample space: {BBB, BBN, BNB, BNN, NBB, …} Event: A = {at least 2 people buy}: P(A) = ? P(BNB) = (.15)(.85)(.15)

20 Sep 2011 BUSI275: Probability23 Venn diagrams Box represents whole sample space Circles represent events (subsets) within SS e.g., for a single trial: A = “clicks on ad” B = “buys product” AB P(SS) = 1 P(A) =.35 P(B) =.15

20 Sep 2011 BUSI275: Probability24 Venn: set theory Complement: A = “does not click ad” P(A) = 1 - P(A) Intersection: A ∩ B = “clicks ad and buys” Union: A ∪ B = “either clicks ad or buys” A A A ∩ B A ∪ B

20 Sep 2011 BUSI275: Probability25 Addition rule: A ∪ B P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

20 Sep 2011 BUSI275: Probability26 Addition rule: example 35% of the focus group clicks on ad: P(?) =.35 15% of the group buys product: P(?) =.15 45% are “engaged” with the company: either click ad or buy product: P(?) =.45 ⇒ What fraction of the focus group buys the product through the ad? P(A ∪ B) = P(A) + P(B) – P(A ∩ B) ? = ? + ? - ?

20 Sep 2011 BUSI275: Probability27 Mutual exclusivity Two events A and B are mutually exclusive if the intersection is null: P(A ∩ B) = 0 i.e., an outcome cannot satisfy both A and B simultaneously e.g., A = male, B = female e.g., A = born in Alberta, B = born in BC If A and B are mutually exclusive, then the addition rule simplifies to: P(A ∪ B) = P(A) + P(B)

20 Sep 2011 BUSI275: Probability28 Yep!

17 Jan 2012 BUSI275: exploring data29 TODO HW1 (ch1-2): due online, this Thu 19Jan Text document: well-formatted, complete English sentences Excel file with your work, also well-formatted HWs are to be individual work Get to know your classmates and form teams me when you know your team Discuss topics/DVs for your project Find existing data, or gather your own? Schedule proposal meeting during 23Jan - 3Feb