1
Anna Bombak, Chuck Humphrey, Lindsay Johnston and Leah Vanderjagt University of Alberta The Winter Institute on Statistical Literacy for Librarians Demystifying statistics for the practitioner
2
Outline
Introductions
Statistics and data: what are we talking about?
Definitions, standards and metadata
Official statistics: national
Official statistics: international
Census geography and small area statistics
Non-official statistics
3
Introductions: your backgrounds
Please introduce yourself:
Your name
Your institutional affiliation
Your librarian responsibilities
Is there anything in particular that you are hoping will be covered in this workshop?
4
Introductions: your backgrounds You are equally split between non-academic and academic libraries. The largest group, with 13, is from universities other than the U of A. The second largest group, with 10, is from government libraries.
5
Introductions: your backgrounds Geographically, 21 of you are from Alberta and nine are from other provinces. We have representation from Ontario, Manitoba, Saskatchewan and Alberta. Thirteen are from the Edmonton region.
6
Statistics: what are we talking about
7
Statistics are ubiquitous “Statistics are generated today about nearly every activity on the planet. Never before have we had so much statistical information about the world in which we live. Why is this type of information so abundant? For one thing, statistics have become a form of currency in today’s information society. Through computing technology, society has become very proficient in calculating statistics from the vast quantities of data that are collected. As a result, our lives involve daily transactions revolving around some use of statistical information.” Data Basics, page 1.1
8
Numeric information
Statistics: numeric facts/figures; created from data, i.e., already processed; presentation-ready
Data: numeric files; created and organized for analysis/processing; requires processing; not display-ready
9
Numeric information
Six dimensions or variables in this table:
Geography: Region
Time: Periods
Attributes of the unit of observation: Smokers, Education, Age, Sex
The cells in the table are the estimated number of smokers.
10
Statistics are about definitions!
Definitions
Sex: Total, Male, Female
Periods: 1994-1995, 1996-1997
11
Statistics are about definitions! Some definitions are based on standards while others are based on convention or practice. For example, Standard Geography classifications.
13
Numeric information
14
Stories are told through statistics The National Population Health Survey in the previous example had over 80,000 respondents in the 1996-97 sample, and the Canadian Community Health Survey in 2005 had over 130,000 cases. How do we tell the stories about each of these respondents? We create summaries of these life experiences using statistics.
15
Summary Statistics are derived from observational, experimental or simulated data. A table is a format for displaying statistics and presents a summary or one view of the data. Tables are structured around geography, time and attributes of the unit of observation. Statistics are dependent on definitions. Statistics summarize individual stories into common or general stories.
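The summary above can be illustrated with a minimal Python sketch. It assumes a handful of invented respondent records (these are not NPHS values, and the field names are hypothetical) and shows how the same data produce different tables depending on which dimensions are chosen. Each table is one view of the data.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (invented values).
records = [
    {"region": "Prairies", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Prairies", "period": "1994-1995", "sex": "Male", "smoker": False},
    {"region": "Prairies", "period": "1996-1997", "sex": "Male", "smoker": True},
    {"region": "Atlantic", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Atlantic", "period": "1996-1997", "sex": "Female", "smoker": False},
    {"region": "Atlantic", "period": "1996-1997", "sex": "Male", "smoker": True},
]


def count_smokers(rows, dimensions):
    """Count smokers in each table cell defined by the given dimensions."""
    cells = Counter()
    for row in rows:
        if row["smoker"]:
            key = tuple(row[d] for d in dimensions)
            cells[key] += 1
    return cells


# Two different tables (views) built from the same data.
by_region_period = count_smokers(records, ["region", "period"])
by_sex = count_smokers(records, ["sex"])

for cell, n in sorted(by_region_period.items()):
    print(cell, n)
```

The records are the data; the counts in each cell are the statistics. Changing the list of dimensions changes the table without changing the underlying data.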
16
Methods producing data
Observational Methods: Focus is on developing observational instruments to collect data. Correlation. Replicate the analysis (same data or similar). Statistics summarize observations.
Experimental Methods: Focus is on manipulating causal agents to measure change in a response agent. Causation. Replicate the experiment. Statistics summarize experiment results.
Computational Methods: Focus is on modeling phenomena through mathematical equations. Prediction. Replicate the simulation. Statistics summarize simulation results.
17
Methods producing data A particular discipline or field will tend to be dominated by one of these three methods, although outputs may also exist from the other two methods. Consequently, the knowledge disseminated within a field is often fairly homogeneous in how statistical information is used and reported. Knowing this and the life cycle in which statistics are produced can help in the search for statistics.
18
Life cycle of survey statistics
Access to Information
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Findings released
8. Popularizing findings
9. Needs & gaps evaluation
19
Life cycle of survey statistics
Preserving Information
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation
20
Life cycle applied to health statistics
Health Information Roadmap Initiative
1. Program objectives:
increased emphasis on health promotion and disease prevention;
decentralization of accountability and decision-making;
shift from hospital to community-based services;
integration of agencies, programs and services; and
increased efficiency and effectiveness in service delivery.
21
Life cycle applied to health statistics
Health Information Roadmap Initiative
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
22
Reconstructing statistics One way to see the relationship between statistics and the data upon which they were derived is to reconstruct statistics that someone else has produced from data that are publicly accessible.
23
Reconstructing statistics
Health Information Roadmap Initiative
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation
24
Reconstructing statistics
The statistics that we will reconstruct are reported in “Health Facts from the 1994 National Population Health Survey,” Canadian Social Trends, Spring 1996, pp. 24-27. The steps we will follow are:
identify the characteristics of the respondents in the article;
identify the data source;
locate these characteristics in the data documentation;
find the original questions used to collect the data;
retrieve the data; and
run an analysis to reproduce the statistics.
25
The findings to be replicated Page 26
26
Summary of variables identified
Findings apply to Canadian adults: likely need age of respondents
Men and women: look for the sex of respondents
Type of drinkers: look for frequency of drinking or a variable categorizing types of drinkers
Age: look for actual age or age in categories
Smokers: look for smoking status
27
Identify the data source
Survey title is identified: National Population Health Survey, 1994-95
Public-use microdata file is announced
Page 25 of the article
28
Locate the variables
Examine the data documentation for the National Population Health Survey, 1994-95
PDF version is on-line
Use TOC and link to “Data Dictionary for Health”
Identify the variables from their content
NOTE: check how missing data were handled
Trace the variables back to the questionnaire
Did the sampling method require weighting cases?
NOTE: in addition to the other variables, is a weight variable needed to adjust for the sampling method?
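The two notes above, on missing data and on weighting, can be sketched in Python. This is only an illustration under stated assumptions: the records, the variable names, the category codes and the "not stated" code of 9 are all invented here and are not taken from the NPHS documentation, which should be consulted for the real codes and the real weight variable.

```python
# Hypothetical extract of a public-use microdata file (invented values).
# drinker_type: 1 = regular drinker, 2 = other, 9 = not stated (assumed codes).
MISSING = 9

respondents = [
    {"sex": "Male", "drinker_type": 1, "weight": 1200.0},
    {"sex": "Male", "drinker_type": 2, "weight": 800.0},
    {"sex": "Male", "drinker_type": 9, "weight": 500.0},
    {"sex": "Female", "drinker_type": 1, "weight": 900.0},
    {"sex": "Female", "drinker_type": 2, "weight": 2100.0},
]


def weighted_share(rows, variable, category, weight="weight", missing=MISSING):
    """Share of valid cases in `category`, with each case counted by its weight."""
    # Drop cases coded as missing before computing the denominator.
    valid = [r for r in rows if r[variable] != missing]
    total = sum(r[weight] for r in valid)
    if total == 0:
        raise ValueError("no valid cases to summarize")
    hits = sum(r[weight] for r in valid if r[variable] == category)
    return hits / total


# Weighted estimate: each respondent stands for `weight` people.
weighted = weighted_share(respondents, "drinker_type", 1)

# Unweighted estimate for comparison: every respondent counts once.
equal_weights = [dict(r, weight=1.0) for r in respondents]
unweighted = weighted_share(equal_weights, "drinker_type", 1)

print(round(weighted, 2), round(unweighted, 2))
```

The two estimates differ because the weights are unequal, which is why a reconstruction that ignores the weight variable may fail to match the published figures. How missing cases are treated is a similar choice: keeping them in the denominator would lower both shares.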
29
Retrieve and analyze the data For universities subscribed to the Statistics Canada Data Liberation Initiative (DLI), the public use microdata from the NPHS can be downloaded without additional cost. See the Statistics Canada Online Catalogue for further cost details. Make use of local data services to retrieve data from the NPHS.
30
Lessons from the NPHS example
This example demonstrates the distinction between creating statistics and interpreting statistics that have been created by others. This is an important distinction because:
Choices are made in creating statistics.
Interpreting statistics requires an ability to understand the choices that were made.
Searching for statistics that others have created can be facilitated by understanding these points.
31
Provide a different perspective Building on the previous example using the NPHS, compare the statistics from an article about young adults giving and receiving help to their parents’ age cohort.
32
Statistics are about definitions
34
Look at the Census definitions
Definitions are in the Census Handbook (2001) and the Census Dictionary (2006)
Search by Census Variable under Topic-Based Tabulations (2006) for value categorizations
Look at some standard classifications used in statistics: SIC, NAICS, NOC, Standard Classification of Goods (SCG), Standard Geographic Classification (SGC), Classification of Instructional Programs (CIP), ICD10
35
Statistics in the News Three recent newspaper articles that include statistics in them have been selected for this exercise. For each of the articles, answer the following questions. What is the concept represented by the statistic or statistics in this story? Is a definition for this concept provided? If it is, what is it? Or is the definition implicit? Are any classifications identifiable? What are they? Are the data from which this statistic was derived identified in the article?
36
Metadata for describing tables As we have discussed, tables are a typical display format for statistics. Because tables are often published within an article, they don’t get indexed. Therefore, finding published tables requires a connection between characteristics of the table and other indexed content. Two indices of tables that exist are Statistical Universe and Tablebase. They use traditional elements to index tables without defining unique properties of tables.
37
Metadata for describing tables What are the properties of a table that we might use to develop useful descriptors for describing their content? What is the motivation for doing this exercise? Searching for tables that were indexed using such descriptors would make finding statistics much easier. The movement toward open access journals and publishing offers an opportunity to introduce metadata elements for statistical tables. Once we have statistical tables described more comprehensively, opportunities will exist to link tables to the data sources from which the statistics in the table were derived.
38
Title Producer Date Unit of Observation Variables Average Tuition Discipline Academic Year Province Statistical Metric Dollars Footnote
39
What are the metadata characteristics of tables & graphs?
Is a title provided?
Is an author, producer or agency identifiable?
Is there a date of creation or publication?
What is the entity that has been observed to make this statistic? That is, what is the unit of observation?
Are the characteristics of the unit of observation (i.e., variables) and their categories clearly identified and defined?
Is there a key to explain the use of colours or lines in the graph?
Is the type of statistic clearly identified? That is, does the table or graph contain percentages, counts, averages, etc.?
Is there a scale for the numbers presented in the table or graph?
Is there an overall figure or number (N) presented upon which the table or graph was calculated?
Are there footnotes?
Are geography, time and social content clearly expressed in the table or graph?
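The checklist above could be captured as a simple metadata record. The following Python sketch is one hypothetical way to do that; the class name, field names and example values are assumptions made for illustration and do not correspond to any existing metadata standard or to the indexing used by Statistical Universe or Tablebase.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TableMetadata:
    """Hypothetical descriptor for a published statistical table or graph."""
    title: Optional[str] = None
    producer: Optional[str] = None
    date: Optional[str] = None
    unit_of_observation: Optional[str] = None
    variables: List[str] = field(default_factory=list)
    statistic_type: Optional[str] = None   # e.g. percentage, count, average
    scale: Optional[str] = None            # e.g. dollars, thousands
    base_n: Optional[int] = None           # overall N behind the table
    footnotes: List[str] = field(default_factory=list)
    geography: Optional[str] = None
    time_period: Optional[str] = None

    def missing_elements(self) -> List[str]:
        """Names of checklist elements that are empty for this table."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            # None, an empty string or an empty list all count as not provided.
            if value is None or value == "" or value == []:
                missing.append(f.name)
        return missing


# Invented example loosely modelled on a tuition table.
example = TableMetadata(
    title="Average tuition by discipline and province",
    variables=["discipline", "academic year", "province"],
    statistic_type="average",
    scale="dollars",
    geography="province",
)

print(example.missing_elements())
```

A record like this could be searched directly, and the list of missing elements gives a quick indication of how well a given table documents itself.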
40
Summary If statistical tables and graphs were described and indexed by rich metadata, our ability to locate statistics would be greatly enhanced. In the absence of such metadata, we use elements of this metadata structure to search our existing databases. The next generation of metadata in the field of data will work to integrate the description of both data and statistics.