Lecture 1 Describing Data

Histogram and frequency table
Example: Visualizing your clients’ age range using a histogram.

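Histogram Example
[Figure: histogram of the clients’ ages with its frequency table (columns: Age range, Frequency). Age-range bins: ~15, ~20, ~25, ~30, ~35, ~40, ~45, ~50, ~55, ~60, More. Frequencies that survive in this transcript: ~25: 4, ~30: 5, ~35: 11, ~45: 6, ~55: 2; the remaining frequencies are missing.]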

From the histogram, we can learn that
Clients aged 31~35 and 36~40 are the primary clients.
It is important to maintain the satisfaction of these clients.
Provide new services for other age ranges to increase the client base.

Making Histogram and Frequency Table
Open the data file “Clients list”, which is stored in our Applied Stat folder. This is the data behind the histogram shown in the previous slides.
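As a rough illustration of how a frequency table like the one above can be built, here is a minimal Python sketch. It assumes that a bin labelled “~35” collects ages above the previous bound up to and including 35, which matches the “31~35” wording used earlier. The ages are hypothetical and are not the contents of the “Clients list” file; the function name `frequency_table` is also made up for this example.

```python
from bisect import bisect_left

def frequency_table(values, upper_bounds):
    """Count values per bin; a bin labelled "~b" holds values up to and including b."""
    labels = [f"~{b}" for b in upper_bounds] + ["More"]
    counts = [0] * len(labels)
    for v in values:
        # bisect_left returns the index of the first upper bound >= v;
        # a value above every bound lands in the final "More" bin.
        counts[bisect_left(upper_bounds, v)] += 1
    return list(zip(labels, counts))

# Hypothetical ages, NOT the real "Clients list" data.
ages = [23, 28, 31, 33, 34, 35, 36, 38, 40, 44, 52, 61]
bounds = list(range(15, 65, 5))  # 15, 20, ..., 60

table = frequency_table(ages, bounds)
for label, count in table:
    # A row of '#' characters gives a simple text histogram.
    print(f"{label:>5} {count:>3} {'#' * count}")
```

With these made-up ages, the tallest bars fall in the ~35 and ~40 bins, the same kind of pattern the slide reads off the real histogram.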

Numerical measures of data summary (I)
Difference between Population and Sample
Mean (Average)
Median

Difference between Population and Sample
A population is the complete set of all items in which an investigator is interested.

Examples of Populations
Names of all registered voters in the United States.
Incomes of all families living in Daytona Beach.
Grade point averages of all the students in your university.

A major objective of statistics is to make an inference about the population. For example: “What is the average income of all families living in Daytona Beach?” Often, collecting the data for the whole population is costly or impossible. Therefore, we often collect data for only a part of the population. Such data is called a “sample”.

Sample
A sample is an observed subset of population values.

Numerical measure of summarizing data
1-1 Mean (Average)
How to compute the mean (average)
Understanding the mathematical notation of the mean (average)
Cautionary notes for the use of the mean

1-2 How to compute the mean
Sum all the data, then divide it by the number of observations. We use the term “sample size” to mean the number of observations.

1-3 Computing the mean: an example
Client ID   Age
1           49
2           37
3           48
4           46
5

This is sample data on the ages of your business clients. Compute the mean age of your clients in this sample. Note that this is a typical data format that we will encounter in this course. It has the observation id (Client ID) and the value of the variable of interest (age) for each observation.
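The rule “sum all the data, then divide by the number of observations” can be sketched in Python as follows. The sketch uses only the four ages that are legible in the table above; the rest of the sample is missing from this transcript, so the result is an illustration of the procedure and not the answer to the exercise.

```python
# Only the four ages legible in the table above; the remaining rows are
# missing from this transcript, so this is not the answer to the exercise.
ages = [49, 37, 48, 46]

def mean(values):
    """Sum all the data, then divide by the number of observations."""
    return sum(values) / len(values)

partial_mean = mean(ages)
print(partial_mean)  # 45.0, since (49 + 37 + 48 + 46) / 4 = 180 / 4
```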

2-1 Understanding the mathematical notation of the mean
Observation id   Variable X
1                x1
2                x2
3                x3
.                .
n                xn

This is one of the most common data formats that we deal with. The first column has the observation id, and the second column has the value for each observation. (The observation id is often omitted.) In the previous example, variable X is the age of the clients. Then observation id = 1 means that this is the first customer in your customer list, and x1 is the age of that customer.

2-2 Understanding the mathematical notation of the mean
Observation id   Variable X
1                x1
2                x2
3                x3
.                .
n                xn

When a data set is given in this format, the sample mean of the variable X, denoted by $\bar{x}$, is given by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

The notation $\sum_{i=1}^{n} x_i$ is the summation notation. It is simply the sum from x1 to xn.

2-3 Sample Mean and Population Mean
Most often we use sample data. For example, if we want to know the popularity rating of the current government, we may use data from 10,000 interviews. This is just a part of the whole voting population. Less often, we may have data for the whole population.

2-4 Sample Mean and Population Mean
Later, it will become convenient to distinguish sample mean and population mean. Thus we will use different notations for the sample mean and the population mean.

2-5 Notations for the sample mean and the population mean
For the sample mean, we use the notation $\bar{x}$, with lower case n denoting the sample size. For the population mean, we use μ. We also use upper case N to denote the population size.
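To make the two notations concrete, the sketch below computes a population mean (μ, over all N values) and a sample mean (x̄, over n sampled values) in Python. The population here is entirely hypothetical (the integers 1 to 1000), and the sample size of 50 and the random seed are arbitrary choices for illustration.

```python
import random
from statistics import mean

# Hypothetical population of N = 1000 values (purely illustrative).
population = list(range(1, 1001))
N = len(population)
mu = mean(population)      # population mean: uses all N values

random.seed(0)             # fixed seed so the run is repeatable
n = 50                     # sample size (arbitrary choice)
sample = random.sample(population, n)
x_bar = mean(sample)       # sample mean: uses only the n sampled values

print(f"N = {N}, mu = {mu}")
print(f"n = {n}, x_bar = {x_bar}")
```

The sample mean generally differs from μ and changes from sample to sample, which is one reason it helps to keep the two symbols apart.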

3-1 Cautionary note: Mean (average) is not necessarily the “center of the data”

3-2 Example
“The average Japanese household savings in year 2005 is ￥17,280,000”
This figure may make you think, “Well, if I do not have this much savings, I am not normal.” Now, take a look at the histogram of household savings in the next slide.

The mean may not be “the center of the data”: An example
[Figure: histogram of household savings, with the annotation “About 50% of people are here”.]

One may think that the average is the “normal household”. However, you can see that a lot of households have savings much less than the average. The average savings is very high because a few households have huge savings. In such a case, the “median” can give you a better sense of a “normal household”. The definition of the median is given in the next slide.
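The effect described here can be reproduced with a small made-up data set. The figures below are hypothetical savings in millions of yen, not the 2005 survey data; they are chosen only so that one very large value pulls the mean upward.

```python
from statistics import mean, median

# Hypothetical savings in millions of yen; NOT the 2005 survey data.
savings = [1, 2, 3, 4, 5, 6, 8, 10, 12, 120]

avg = mean(savings)      # pulled up by the single household with 120
mid = median(savings)    # middle of the sorted data
below_avg = sum(1 for s in savings if s < avg)

print(f"mean = {avg}, median = {mid}")
print(f"{below_avg} of {len(savings)} households are below the mean")
```

In this toy data, nine of the ten households sit below the mean, while the median stays close to what a typical household holds.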

4-1 Median
Sort the data in ascending order. Then the median is the value in the middle (the middle observation).
When the number of observations is even, there is no “middle observation”. In such a case, take the average of the two middle numbers.
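The two cases in the definition above (odd and even number of observations) can be sketched in Python as follows. The input values are illustrative only and are not taken from the exercise files.

```python
def median(values):
    """Median by the rule above: sort, then take the middle value,
    or the average of the two middle values when the count is even."""
    ordered = sorted(values)          # ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # odd count: single middle observation
    # even count: average of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_result = median([49, 37, 48])        # sorted: 37, 48, 49
even_result = median([49, 37, 48, 46])   # sorted: 37, 46, 48, 49
print(odd_result, even_result)           # 48 47.0
```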

4-2 Median Exercise
Open the file “Computation of median A”. This data contains the age of a company’s clients. Find the median age of this sample.
Open the file “Computation of median B”. This data contains the revenue of bag sales. Find the median of this sample.

Japanese household savings revisited

Corresponding chapters
This lecture note covers the following topics of the textbook:
1.2 Sampling
3.1 Arithmetic Mean, Median
