Dr. Hong Zhang
Topics: ◦ Tables and Graphs ◦ Populations and Samples ◦ Mean, Median, and Standard Deviation ◦ Standard Error & 95% Confidence Interval (CI) ◦ Error Bars ◦ Comparing Means of Two Data Sets ◦ Linear Regression (LR) ◦ Coefficient of Correlation
Statistics is a huge field; I’ve simplified considerably here. For example: ◦ Mean, Median, and Standard Deviation: there are alternative formulas. ◦ Standard Error and the 95% Confidence Interval: there are other ways to calculate CIs (e.g., the z statistic instead of t; the difference between two means rather than a single mean; …). ◦ Error Bars: don’t go beyond the interpretations I give here! ◦ Comparing Means of Two Data Sets: we cover only the t test for two means when the variances are unknown but equal; there are other tests. ◦ Linear Regression: we look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!
Population: all of the possible outcomes of an experiment or observation ◦ US population ◦ Cars in the market. A large population may be impractical and costly to study, and it may be impossible to collect data from every member of the population. ◦ Weight and height of every US citizen ◦ Quality of every car in the market
Sample: a part of the population that we actually measure or observe in order to draw conclusions ◦ 1000 US citizens ◦ 100 cars. We use samples to estimate population properties ◦ Use 1000 US citizens to estimate the average height of the entire US population ◦ Use 100 cars to estimate the quality of all Toyota Corolla cars under 3 years old
A sample should represent the entire population. ◦ Good: Randomly select 1000 names from a phone book to represent the region; randomly select 100 cars from DMV records ◦ Bad: Use a college campus to represent the country; use cars in a dealer’s lot to represent cars in the market; reporters stop 3 people at random on the street for their opinions
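A minimal Python sketch of simple random sampling, assuming a simulated population of 10,000 heights (hypothetical values, not real survey data). `random.sample` gives every member the same chance of being selected, as with drawing 100 records at random from a DMV list, and the sample mean then estimates the population mean.

```python
import random
import statistics

# Hypothetical population: heights (in inches) of 10,000 people, simulated for illustration.
random.seed(42)
population = [random.gauss(67.0, 4.0) for _ in range(10_000)]

# Simple random sample: every member has the same chance of being chosen.
sample = random.sample(population, k=100)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(f"population mean: {population_mean:.2f} in")
print(f"sample mean:     {sample_mean:.2f} in")
```

Rerunning with different seeds shows the sample mean landing near, but not exactly on, the population mean; how far it wanders is what the standard error measures.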
Mean: the sum of the values divided by the number of data points, also called the average. Example: ◦ Data: 3, 8, 5, 10, 4, 6 ◦ Sum = 3 + 8 + 5 + 10 + 4 + 6 = 36 ◦ Number of samples (data points) = 6 ◦ Mean = 36 / 6 = 6. Exercise ◦ Mean height of the entire class ◦ Average commute time of the students
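The mean calculation above, written as a short Python check; the comparison with the standard library's `statistics.mean` is only a cross-check.

```python
import statistics

data = [3, 8, 5, 10, 4, 6]

total = sum(data)     # 36
n = len(data)         # 6 data points
mean = total / n      # 6.0

# Cross-check against the standard library
assert mean == statistics.mean(data)
print(f"Sum = {total}, n = {n}, Mean = {mean}")
```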
Bill Gates comes to give a presentation to 100 students in Rowan Auditorium. Suppose the personal wealth of Bill Gates is $50 billion and the personal wealth of each student is $0. What is the mean personal wealth of the entire population in the room?
Median: the value of the middle item when the data are arranged in increasing or decreasing order of magnitude. Example: ◦ Data: 3, 8, 5, 10, 4, 6 ◦ Rearrange: 3, 4, 5, 6, 8, 10 ◦ The middle two values are 5 and 6; their average is 5.5 ◦ The median of the data set is 5.5. Exercise: ◦ Median height of the class ◦ Median commute time of the class ◦ Median personal wealth in the room with Bill Gates
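A Python sketch of the median rule (average the middle two values when the count is even), followed by the Bill Gates room from the previous example. It shows how one extreme value pulls the mean to roughly $495 million while the median stays at $0.

```python
import statistics

data = [3, 8, 5, 10, 4, 6]
ordered = sorted(data)                # [3, 4, 5, 6, 8, 10]
n = len(ordered)
mid = n // 2
if n % 2 == 0:
    # Even count: average the two middle values
    median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    # Odd count: take the single middle value
    median = ordered[mid]
print("median:", median)

# Bill Gates example: 100 students with $0 plus one person with $50 billion
wealth = [0] * 100 + [50_000_000_000]
mean_wealth = statistics.mean(wealth)      # 50 billion / 101 people
median_wealth = statistics.median(wealth)  # the 51st value in sorted order
print(f"mean wealth:   ${mean_wealth:,.0f}")
print(f"median wealth: ${median_wealth:,.0f}")
```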
Data Points: 3, 8, 5, 10, 4, 6
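A sketch that works these data points through the usual formulas: the mean, the standard deviation computed both ways (dividing by n and by n − 1), the standard error $s/\sqrt{n}$, and a 95% CI of mean ± t·SE. The t value 2.571 (two-sided 95%, 5 degrees of freedom) is an assumption hard-coded from a standard t table and only applies to n = 6.

```python
import math
import statistics

data = [3, 8, 5, 10, 4, 6]
n = len(data)
mu = sum(data) / n                        # mean = 6.0

# Sum of squared deviations: 9 + 4 + 1 + 16 + 4 + 0 = 34
ss = sum((x - mu) ** 2 for x in data)

pop_sd = math.sqrt(ss / n)                # divide by n:     about 2.380
sample_sd = math.sqrt(ss / (n - 1))       # divide by n - 1: about 2.608

# Standard error of the mean: sample standard deviation / sqrt(n)
se = sample_sd / math.sqrt(n)             # about 1.065

# 95% CI = mean +/- t * SE.  ASSUMPTION: t = 2.571 is the two-sided 95%
# value for n - 1 = 5 degrees of freedom, read from a t table; it must be
# looked up again for any other sample size.
T_CRIT_DF5 = 2.571
ci = (mu - T_CRIT_DF5 * se, mu + T_CRIT_DF5 * se)
print(f"pop SD {pop_sd:.3f}, sample SD {sample_sd:.3f}, SE {se:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

The two standard deviations match `statistics.pstdev` and `statistics.stdev` respectively; the difference between them shrinks as n grows.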
Standard error (standard deviation of the mean) ◦ A sample of size n is taken from a population with standard deviation s ◦ The estimate of the mean depends on the sample selected ◦ As n increases, the variance of the mean estimate goes down, i.e., the estimate of the population mean improves ◦ As n increases, the distribution of the mean estimate approaches a normal distribution, regardless of the population distribution
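The claims about n can be checked with a small simulation. This sketch assumes an exponential population (deliberately skewed, with standard deviation 1) and measures how much the sample mean varies across 2,000 repeated samples; the observed spread should track the predicted standard error 1/√n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def spread_of_sample_means(n, trials=2000):
    """Standard deviation of the sample mean over repeated samples of size n.

    Each sample is drawn from an exponential population with rate 1.0,
    whose standard deviation is 1.0 (a skewed, non-normal population).
    """
    means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
             for _ in range(trials)]
    return statistics.stdev(means)

spreads = {n: spread_of_sample_means(n) for n in (4, 16, 64)}
for n, spread in spreads.items():
    # Theory: standard error = population std. dev. / sqrt(n) = 1 / sqrt(n)
    print(f"n = {n:3d}: observed {spread:.3f}, predicted {1 / n ** 0.5:.3f}")
```

Quadrupling n halves the spread, which is the 1/√n behavior behind "the estimate of the population mean improves."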
μ: mean, n: sample size, $x_i$: data point
For n > 30 For n < 30
$S = s^2$
Data:
Flip a coin: the chances of landing up and landing down are equal, 50% each. (Counting the number of “up” results over repeated flips gives the binomial distribution.)
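A sketch of both ideas, assuming a fair coin: a simulation of single flips, and the binomial probabilities $P(k) = \binom{10}{k} 0.5^{10}$ for the number of “up” results in 10 flips.

```python
import math
import random

# Single flips of a fair coin: P(up) = P(down) = 0.5
random.seed(7)
flips = [random.choice(["up", "down"]) for _ in range(10_000)]
share_up = flips.count("up") / len(flips)   # close to 0.5

# Number of "up" results in 10 flips follows a binomial distribution:
# P(k) = C(10, k) * 0.5**10
n = 10
binom = [math.comb(n, k) * 0.5 ** n for k in range(n + 1)]
for k, p in enumerate(binom):
    print(f"{k:2d} up: {p:.4f}")
```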
Normal distribution ◦ Women’s shoe sizes sold by a shoe store
Chemical distribution of a well-mixed compound
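The normal probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$, where x is a value of the normal random variable X, μ is the mean, σ is the standard deviation, π is approximately 3.14159, and e is approximately 2.71828.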
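The density formula translated directly into Python, with the standard library's `statistics.NormalDist` as a cross-check. The shoe-size mean (8) and standard deviation (1.5) are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical shoe-size distribution: mean 8, standard deviation 1.5
for size in (6.0, 8.0, 9.5):
    print(size, round(normal_pdf(size, 8.0, 1.5), 4),
          round(NormalDist(8.0, 1.5).pdf(size), 4))  # standard-library cross-check
```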
Table: Nσ, Confidence Interval, Error per Million
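The table’s columns can be regenerated from the standard normal distribution. This sketch assumes “confidence interval” means the fraction of values within ±Nσ of the mean and “error per million” means the fraction outside that range, scaled to one million.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
rows = []
for n_sigma in range(1, 7):
    outside = 2 * z.cdf(-n_sigma)        # both tails beyond +/- N sigma
    coverage = 1 - outside               # fraction within +/- N sigma
    errors_ppm = outside * 1_000_000     # expected out-of-range values per million
    rows.append((n_sigma, coverage, errors_ppm))
    print(f"{n_sigma} sigma  {coverage:.7%}  {errors_ppm:12,.3f} per million")
```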
Zipf distribution: the item of rank k has a frequency roughly proportional to 1/k, or more accurately $P_n = a/n^b$, where n is the rank and a and b are constants ◦ Developed by George Kingsley Zipf ◦ Occurs naturally in many situations ◦ City population ◦ Colors in images ◦ Call center ◦ Website traffic
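A sketch of the theoretical Zipf shares $P_n = a/n^b$. It assumes a is chosen so that the shares over the listed ranks sum to 1 (one common convention; a could instead be set to the top item’s frequency).

```python
def zipf_shares(num_ranks, b=1.0):
    """Theoretical Zipf shares P_n = a / n**b, with a chosen so the shares sum to 1."""
    weights = [1 / n ** b for n in range(1, num_ranks + 1)]
    a = 1 / sum(weights)           # normalizing constant
    return [a * w for w in weights]

shares = zipf_shares(20)           # e.g. the 20 most common words
for rank, p in enumerate(shares[:5], start=1):
    print(f"rank {rank}: {p:.1%}")
```

With b = 1, rank 2 gets half the share of rank 1, rank 10 gets a tenth, and so on.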
Table: Rank, Word, Freq, % Freq, Theoretical Zipf Distribution (words ranked 1 to 20: the, of, and, to, a, in, that, is, was, he, for, it, with, as, his, on, be, at, by, I)
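If a distribution gives a straight line on a log-log plot of frequency versus rank, we can say that it is a Zipf distribution: taking logs of $P_n = a/n^b$ gives $\log P_n = \log a - b \log n$, a straight line with slope −b.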
Count the vehicles in Rowan parking lots ◦ Distribution of colors ◦ Distribution of cars and trucks ◦ Distribution of the last letter (digit) of the license plate number. ◦ Select a parking lot ◦ Design a strategy to count ◦ Design a method to record data ◦ Design a method to represent the results ◦ Write a one-page report per group
White: 2, Black: 1, Red: 2, Blue: 2, Silver: 4, Gold: 1, Beige: 1
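One way to represent the tally, sketched in Python with `collections.Counter`: rank the colors by count and report each as a percentage of the total. The counts are the example values above.

```python
from collections import Counter

# Color counts from the example above
colors = Counter({"White": 2, "Black": 1, "Red": 2, "Blue": 2,
                  "Silver": 4, "Gold": 1, "Beige": 1})
total = sum(colors.values())   # 13 vehicles

# Rank the colors by frequency and report each as a percentage of the total
for rank, (color, count) in enumerate(colors.most_common(), start=1):
    print(f"{rank}. {color:<7}{count:>3}  {count / total:6.1%}")
```

When recording in the lot, a list of observed colors can be passed straight to `Counter(list_of_colors)` to build the same tally.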
Figure: Result for Pressure Transducer Calibration (axes: Voltage (V), Height (in))
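A sketch of simple linear regression (intercept, slope, and R²) for a calibration curve. The readings are hypothetical, not the data behind the figure, and height is assumed to be the x variable.

```python
def simple_linear_regression(x, y):
    """Least-squares line y = intercept + slope * x, plus R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical calibration readings (not the data from the figure)
height_in = [0, 5, 10, 15, 20, 25]
voltage_v = [0.02, 0.51, 1.03, 1.49, 2.01, 2.52]
slope, intercept, r2 = simple_linear_regression(height_in, voltage_v)
print(f"V = {intercept:.3f} + {slope:.4f} * h   (R^2 = {r2:.4f})")
```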
Figure: Voltage (V) vs. Time (s)
Table: Time (s), Voltage (V), log(Voltage)
Figure: Reaction Rate (mol/s) vs. Concentration (mol/ft³)
Table: Concentration, Reaction Rate, log(concentration), log(reaction rate)
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters ◦ Consistent format, title, units, big fonts ◦ Differentiate headings ◦ Number columns
Figure 1: Turbidity of Pond Water, Treated and Untreated ◦ Consistent format, title, units ◦ Good axis titles, big fonts