Probability & Statistics


1 Probability & Statistics
Data (Bock Ch2)

2 What are Data?
T. Serino Data can be numbers, record names, or other labels. Not all data represented by numbers are numerical data (e.g., 1 = male, 2 = female). Data are useless without their context…
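A minimal Python sketch of the point above, using the slide's coding (1 = male, 2 = female) on a short, made-up list of responses. The average of the codes can be computed, but it describes no one; counting the cases in each category is the meaningful summary.

```python
from collections import Counter

# Made-up survey responses, coded as on the slide: 1 = male, 2 = female.
codes = [1, 2, 2, 1, 2]
labels = {1: "male", 2: "female"}

# Arithmetic runs without complaint, but a "mean sex" of 1.6 is meaningless.
mean_code = sum(codes) / len(codes)

# For a categorical variable, count how many cases fall in each category.
counts = Counter(labels[c] for c in codes)

print(mean_code)     # 1.6
print(dict(counts))  # {'male': 2, 'female': 3}
```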

3 The “W’s”
Remember that there are six W’s. To provide context, we need the W’s: Who, What (and in what units), When, Where, Why (if possible), and How of the data. Note: the answers to “Who” and “What” are essential.
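One way to keep the W’s attached to a data set is to store them alongside it. The sketch below is a hypothetical Python structure (the class and field names are illustrative, not from the slides) that insists on the two essential W’s, Who and What, and reports which of the others are still unknown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DataContext:
    """Hypothetical record of the six W's for a data set."""
    who: str                   # the cases the data describe
    what: Dict[str, str]       # variable name -> units, or "categorical"
    when: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None  # "if possible"
    how: Optional[str] = None

    def __post_init__(self) -> None:
        # The answers to Who and What are essential, so reject a context without them.
        if not self.who or not self.what:
            raise ValueError("the Who and the What are essential")

    def missing(self) -> List[str]:
        """Return the names of the optional W's that have not been filled in."""
        return [w for w in ("when", "where", "why", "how") if getattr(self, w) is None]


# Example: customer records with only the Who and part of the What recorded.
customers = DataContext(
    who="customer records from a company's database",
    what={"Price": "$", "Gift?": "categorical"},
)
print(customers.missing())  # ['when', 'where', 'why', 'how']
```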

4 Organization!
Organization is important! Here are some customer records from a company’s database:
B001OAA 24 902 Boston 18 Kansas 10.99 Veterans Y B0015Y6 413 Garbage 43 Chicago N Fenway 440 17 15.98 368 312
Without some organization, what can be made of these data?

5 Data Table
The following data table clearly shows the context of the data presented. Notice that it tells us the What (the column headings) and the Who (the rows) for these data.

Name | Ship To | Price | Area Code | # of Previous Purchases | Gift? | Catalog ID | Artist
Katherine H. | Ohio | 10.99 | 440 | 17 | N | B0015Y6 | Kansas
Samuel P. | Illinois | 16.99 | 312 | 3 | Y | B002BK9 | Boston
Chris G. | New York | 15.98 | 413 | | | B0068ZVQ | Chicago
Monique D. | Canada | 11.98 | 902 | 10 | | B001OAA | Garbage

6 Data Table
The same table, annotated: the column headings are the “What” and the rows are the “Who”.

7 Data Table
The “What” needs to be labeled as either categorical or quantitative. If it is quantitative, the units should be included. For the same table (Name identifies each case, the “Who”):
Ship To: categorical
Price: quantitative ($)
Area Code: categorical
# of Previous Purchases: quantitative (purchases)
Gift?: categorical
Catalog ID: categorical
Artist: categorical
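The labels for this table can be turned into a small Python sketch that chooses a summary from a variable’s type: a mean (with its units) for a quantitative variable and category counts for a categorical one. The dictionary and function names are hypothetical; the values are the Price and Area Code columns from the table, which are complete for all four customers. Area codes are stored as text because they are labels, not amounts.

```python
from collections import Counter
from statistics import mean

# Variable types and units as labeled on the slide ("Name" identifies the Who).
VARIABLES = {
    "Ship To": ("categorical", None),
    "Price": ("quantitative", "$"),
    "Area Code": ("categorical", None),
    "# of Previous Purchases": ("quantitative", "purchases"),
    "Gift?": ("categorical", None),
    "Catalog ID": ("categorical", None),
    "Artist": ("categorical", None),
}


def summarize(variable, values):
    """Summarize one column according to its declared type (hypothetical helper)."""
    kind, units = VARIABLES[variable]
    if kind == "quantitative":
        # A quantitative variable is meaningless without its units.
        if units is None:
            raise ValueError(f"{variable} is quantitative but has no units")
        return (mean(values), units)
    # Categorical: how do the cases fall into the categories?
    return Counter(values)


# Columns from the data table that are complete for all four customers.
prices = [10.99, 16.99, 15.98, 11.98]
area_codes = ["440", "312", "413", "902"]

print(summarize("Price", prices))          # mean price, paired with "$"
print(summarize("Area Code", area_codes))  # each area code appears once
```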

8 Who
The Who of the data tells us the individual cases about which (or whom) we have collected data. Individuals who answer a survey are called respondents. People on whom we experiment are called subjects or participants. Animals, plants, and inanimate subjects are called experimental units.

9 Who (cont.)
Sometimes people just refer to data values as observations and are not clear about the Who. But we need to know the Who of the data so we can learn what the data say.

10 What
The “What” are the variables: the characteristics recorded about each individual. Each variable should have a name that identifies what has been measured. To understand variables, you must Think about what you want to know.

11 What
Some variables have units that tell how each value has been measured and on what scale (e.g., height in inches or in centimeters).

12 What
A categorical variable names categories and answers questions about how cases fall into those categories. Categorical examples: sex, race, ethnicity.
A quantitative variable is a measured variable (with units) that answers questions about the quantity of what is being measured. Quantitative examples: income ($), height (inches), weight (pounds).

13 What
Example: An online store that sells sports memorabilia keeps track of the addresses of all its customers. One of the variables it tracks is customers’ zip codes.
Question: Are zip codes categorical or quantitative?

14 What
Question: Are zip codes categorical or quantitative?
Although zip codes are numbers and can be put in order, they have no natural units, and arithmetic on them (such as averaging) has no meaning. Variables like zip code are treated as categorical.
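A short Python illustration of why zip codes behave as categorical data. The zip codes are examples chosen for illustration. Treating them as integers drops the leading zero of Northeast codes such as 02134 and allows an “average zip code” that names no real place; storing them as text keeps them intact and supports the operations that make sense for categories, such as matching and counting.

```python
from collections import Counter

# Zip codes stored as text, the way a categorical label should be.
zips = ["02134", "10001", "94110"]

# Converting to integers loses the leading zero of "02134" ...
as_ints = [int(z) for z in zips]
print(as_ints)  # [2134, 10001, 94110]

# ... and permits an "average zip code" that identifies no actual location.
print(sum(as_ints) / len(as_ints))

# Sensible questions are categorical: which zip code appears most often?
orders = ["02134", "10001", "02134", "94110"]
print(Counter(orders).most_common(1))  # [('02134', 2)]
```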

15 Why
Knowing why the data are being collected is important for deciding what to think about and how to treat the variables.

16 Where, When, and How
We need the Who, What, and Why to analyze data. But the more we know, the more we understand. When and Where give us useful information about the context.
Example: Values recorded at a large public university may mean something different from similar values recorded at a small private college. (The “Where” makes a difference.)

17 Where, When, and How
How the data are collected can make the difference between insight and nonsense. Example: results from voluntary Internet surveys are often useless.
The first step of any data analysis should be to examine the W’s; this is a key part of the Think step of any analysis. Make sure that you know the Why, Who, and What before you proceed with your analysis.

18 What can go wrong?
Don’t label a variable as categorical or quantitative without thinking about the question you want it to answer. Just because your variable’s values are numbers, don’t assume that it’s quantitative. Always be skeptical; don’t take data for granted.

19 Summary
Data are information in a context. The W’s help with context. We must know the Who (cases), What (variables), and Why to be able to say anything useful about the data.
Why: Why were the data collected?
Who: Who (or what) are the data about?
When and Where: When and where were the data recorded?
What: The variables (and their units), both categorical and quantitative measurements about the “Who”. (What about them?)
How: How were the data collected? Was it a legitimate source?

20 Summary (cont.)
We treat variables (the “What”) as categorical or quantitative. Categorical variables identify a category for each case. Quantitative variables record measurements or amounts of something and must have units. Some variables can be treated as categorical or quantitative depending on what we want to learn from them.

21 Example
For the following description of data, identify the W’s, name the variables, classify each variable as categorical or quantitative, and for any quantitative variable, identify the units in which it was measured (or state that they were not provided).
According to an article in Fortune (Dec. 28, 1992), 401(k) plans permit employees to shift part of their before-tax salaries into investments such as mutual funds. Employers typically match 50% of the employees’ contribution up to about 6% of salary. One company, concerned with what it believed was a low employee participation rate in its 401(k) plan, sampled 30 other companies with similar plans and asked for their 401(k) participation rates.
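The matching rule in the passage is a small piece of arithmetic. The Python sketch below assumes one common reading of it: the employer adds 50 cents for every dollar the employee contributes, counting only contributions up to 6% of salary. The function name, its parameters, and the $50,000 salary are hypothetical.

```python
def employer_match(salary, employee_rate, match_rate=0.50, cap_rate=0.06):
    """Employer contribution in dollars, under the assumed reading of the rule.

    salary        -- annual salary in dollars
    employee_rate -- fraction of salary the employee contributes (0.04 = 4%)
    match_rate    -- fraction of the eligible contribution the employer matches
    cap_rate      -- contributions above this fraction of salary are not matched
    """
    eligible_rate = min(employee_rate, cap_rate)
    return match_rate * eligible_rate * salary


# Contributing 4% of $50,000 ($2,000) earns a match of half that amount.
print(employer_match(50_000, 0.04))  # 1000.0
# Contributing 10% still earns a match on only the first 6% ($3,000).
print(employer_match(50_000, 0.10))  # 1500.0
```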

22 Example
Who: 30 other companies
Although the interest here is in 401(k) plans, the 30 companies are the subjects being studied.

23 Example
Be sure to label each variable as quantitative or categorical.
What: Employer’s contribution – quantitative (%)
Contribution limit – quantitative (%)
Participation rate – ???
Participation rate could be quantitative or categorical depending on how it is measured. If it is measured as the number or percent of employees participating, then it is quantitative. If it is categorized as high, medium, or low, then it is categorical.
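To make the participation-rate point concrete, here is a Python sketch that records participation as a quantitative rate (%) and then derives a categorical version from it. The function name, the cutoffs of 50% and 80%, and the sample rates are hypothetical; the passage gives none of them.

```python
def participation_category(rate_percent, low_cutoff=50.0, high_cutoff=80.0):
    """Turn a quantitative participation rate (%) into a categorical label.

    The cutoffs are hypothetical placeholders, not values from the article.
    """
    if not 0 <= rate_percent <= 100:
        raise ValueError("a participation rate must be between 0 and 100 percent")
    if rate_percent < low_cutoff:
        return "low"
    if rate_percent < high_cutoff:
        return "medium"
    return "high"


# The same measurements, treated two ways.
rates = [42.0, 67.5, 91.0]                            # quantitative (%)
labels = [participation_category(r) for r in rates]  # categorical
print(labels)  # ['low', 'medium', 'high']
```

Once the rates are binned, only counts per category remain meaningful; the original percentages are needed for any average.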

24 Example
Why: to improve participation rates
Because the company has low employee participation in its 401(k) plan, we can infer that the study is being done in an attempt to improve participation rates.

25 Example
Where: not specified
Although the “Where” is not specified here, we can assume that it is the area that Fortune magazine serves. If the magazine were local, we could assume that the “Where” is local. If the magazine were distributed only in the U.S., we could assume that the “Where” is the U.S. only.

26 Example
When: before Dec. 28, 1992
This passage seems to imply that the study of the 30 companies was part of the magazine article. If so, the data must have been collected before the article was published. If the company had done the study as a result of the magazine article, then we would say that the “When” was after Dec. 28, 1992.

27 Example
How: sampled other companies
How the sample was conducted is not specified. It could have been done by asking the companies directly, by sending them a survey to fill out, or by simply checking company records. We know only that they sampled the other companies; a more specific “How” was not given.

28 Mathematical Decision Making

