Anna Bombak, Chuck Humphrey, Lindsay Johnston and Leah Vanderjagt University of Alberta The Winter Institute on Statistical Literacy for Librarians Demystifying.


Anna Bombak, Chuck Humphrey, Lindsay Johnston and Leah Vanderjagt University of Alberta The Winter Institute on Statistical Literacy for Librarians Demystifying statistics for the practitioner

Outline
- Introductions
- Statistics and data: what are we talking about?
- Definitions, standards and metadata
- Official statistics: national
- Official statistics: international
- Census geography and small area statistics
- Non-official statistics

Introductions: your backgrounds
Please introduce yourself
- Your name
- Your institutional affiliation
- Your librarian responsibilities
- Is there anything in particular that you are hoping is covered in this workshop?

Introductions: your backgrounds
You are equally split between non-academic and academic libraries. The largest group, with 13, is from universities other than the U of A. The second largest group, with 10, is from government libraries.

Introductions: your backgrounds
Geographically, 21 of you are from Alberta and nine are from other provinces. We have representation from Ontario, Manitoba, Saskatchewan and Alberta. Thirteen are from the Edmonton region.

Statistics: what are we talking about?

Statistics are ubiquitous
“Statistics are generated today about nearly every activity on the planet. Never before have we had so much statistical information about the world in which we live. Why is this type of information so abundant? For one thing, statistics have become a form of currency in today’s information society. Through computing technology, society has become very proficient in calculating statistics from the vast quantities of data that are collected. As a result, our lives involve daily transactions revolving around some use of statistical information.”
Data Basics, page 1.1

Numeric information
Statistics
- numeric facts/figures
- created from data, i.e., already processed
- presentation-ready
Data
- numeric files
- created and organized for analysis/processing
- requires processing
- not display-ready
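To make the distinction concrete, here is a minimal sketch, assuming a handful of invented respondent records and hypothetical field names (`sex`, `smoker`). It is not drawn from any real survey; it only shows a small data file being processed into a presentation-ready statistic.

```python
from collections import Counter

# Hypothetical data: one record per respondent (all values invented).
records = [
    {"sex": "Female", "smoker": True},
    {"sex": "Female", "smoker": False},
    {"sex": "Male", "smoker": True},
    {"sex": "Male", "smoker": True},
    {"sex": "Male", "smoker": False},
]


def count_smokers_by_sex(rows):
    """Process data (records) into a statistic (counts of smokers by sex)."""
    return Counter(row["sex"] for row in rows if row["smoker"])


smokers_by_sex = count_smokers_by_sex(records)

# The statistic is display-ready; the records above were not.
for sex, n in sorted(smokers_by_sex.items()):
    print(f"{sex}: {n} smokers")
```

The records require processing before anyone can read them as a finding, while the counts can be dropped straight into a table.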

Numeric information
There are six dimensions or variables in this table. The cells in the table are the estimated number of smokers.
- Geography: Region
- Time: Periods
- Unit of observation attributes: Smokers, Education, Age, Sex
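One way to picture a table with six dimensions is as a set of cells, each located by one value per dimension. The sketch below assumes invented category labels and cell values, and the names `CellKey` and `select` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, NamedTuple


class CellKey(NamedTuple):
    """One value for each of the six dimensions that locate a cell."""
    region: str
    period: str
    smoking_status: str
    education: str
    age_group: str
    sex: str


# Invented cell values: estimated number of people in each cell.
table: Dict[CellKey, int] = {
    CellKey("Alberta", "1994-95", "Smoker", "Secondary", "25-44", "Male"): 120,
    CellKey("Alberta", "1994-95", "Smoker", "Secondary", "25-44", "Female"): 100,
    CellKey("Alberta", "1996-97", "Smoker", "Secondary", "25-44", "Male"): 110,
    CellKey("Ontario", "1994-95", "Smoker", "Secondary", "25-44", "Male"): 400,
}


def select(cells, **fixed):
    """Return the cells whose dimension values match every keyword given."""
    return {
        key: value
        for key, value in cells.items()
        if all(getattr(key, dim) == wanted for dim, wanted in fixed.items())
    }


# Fixing geography and time leaves a view over the remaining attributes.
alberta_1994 = select(table, region="Alberta", period="1994-95")
print(sum(alberta_1994.values()))
```

Fixing some dimensions and summing over the rest is what produces the different "views" of the same data that a published table presents.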

Statistics are about definitions!
[Figure: definitions attached to the table's categories, for example Sex (Total, Male, Female) and Periods]

Statistics are about definitions!
Some definitions are based on standards while others are based on convention or practice. For example, Standard Geography classifications.

Numeric information

Stories are told through statistics
The National Population Health Survey in the previous example had over 80,000 respondents in its sample, and the Canadian Community Health Survey in 2005 had over 130,000 cases. How do we tell the stories about each of these respondents? We create summaries of these life experiences using statistics.

Summary
- Statistics are derived from observational, experimental or simulated data.
- A table is a format for displaying statistics and presents a summary or one view of the data.
- Tables are structured around geography, time and attributes of the unit of observation.
- Statistics are dependent on definitions.
- Statistics summarize individual stories into common or general stories.

Methods producing data
Observational Methods
- Focus is on developing observational instruments to collect data
- Correlation
- Replicate the analysis (same data or similar)
- Statistics summarize observations
Experimental Methods
- Focus is on manipulating causal agents to measure change in a response agent
- Causation
- Replicate the experiment
- Statistics summarize experiment results
Computational Methods
- Focus is on modeling phenomena through mathematical equations
- Prediction
- Replicate the simulation
- Statistics summarize simulation results

Methods producing data
A particular discipline or field will tend to be dominated by one of these three methods, although outputs may also exist from the other two methods. Consequently, the knowledge disseminated within a field is often fairly homogeneous in how statistical information is used and reported. Knowing this and the life cycle in which statistics are produced can help in the search for statistics.

Life cycle of survey statistics
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Findings released
8. Popularizing findings
9. Needs & gaps evaluation
Access to Information

Life cycle of survey statistics
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation
Preserving Information

Life cycle applied to health statistics
Health Information Roadmap Initiative
1. Program objectives
- increased emphasis on health promotion and disease prevention;
- decentralization of accountability and decision-making;
- shift from hospital to community-based services;
- integration of agencies, programs and services; and
- increased efficiency and effectiveness in service delivery

Life cycle applied to health statistics
Health Information Roadmap Initiative
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released

Reconstructing statistics
One way to see the relationship between statistics and the data upon which they were derived is to reconstruct statistics that someone else has produced from data that are publicly accessible.

Reconstructing statistics
Health Information Roadmap Initiative
1. Program objective
2. Survey unit organized
3. Questionnaire & sample
4. Data collection
5. Data production & release
6. Analysis
7. Official findings released
8. Popularizing findings
9. Needs & gaps evaluation

Reconstructing statistics
The statistics that we will reconstruct are reported in “Health Facts from the 1994 National Population Health Survey,” Canadian Social Trends, Spring 1996, pp. The steps we will follow are:
- identify the characteristics of the respondents in the article;
- identify the data source;
- locate these characteristics in the data documentation;
- find the original questions used to collect the data;
- retrieve the data; and
- run an analysis to reproduce the statistics.

The findings to be replicated Page 26

Summary of variables identified
- Findings apply to Canadian adults
  - Likely need age of respondents
- Men and women
  - Look for the sex of respondents
- Type of drinkers
  - Look for frequency of drinking or a variable categorizing types of drinkers
- Age
  - Look for actual age or age in categories
- Smokers
  - Look for smoking status

Identify the data source
- Survey title is identified: National Population Health Survey,
- Public-use microdata file is announced
- Page 25 of the article

Locate the variables
- Examine the data documentation for the National Population Health Survey,
  - PDF version is on-line
- Use TOC and link to “Data Dictionary for Health”
- Identify the variables from their content
  - NOTE: check how missing data were handled
- Trace the variables back to the questionnaire
- Did sampling method require weighting cases?
  - NOTE: in addition to the other variables, is a weight variable needed to adjust for the sampling method?
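The two notes above, about missing data and weighting, can be illustrated with a minimal sketch. It assumes invented records, hypothetical field names (`drinker_type`, `weight`), and one particular choice for missing values (excluding them); the real NPHS documentation would have to be consulted for the actual variable names, missing-data codes and weight variable.

```python
def weighted_percentages(rows, category, weight):
    """Percentage distribution of `category`, with each case counted by its weight.

    Cases whose category is None are treated as missing and excluded.
    That exclusion is an assumption of this sketch, not a documented rule.
    """
    totals = {}
    for row in rows:
        value = row[category]
        if value is None:
            continue
        totals[value] = totals.get(value, 0.0) + row[weight]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: 100.0 * total / grand_total for value, total in totals.items()}


# Invented records: each respondent "stands for" `weight` people in the population.
rows = [
    {"drinker_type": "Regular", "weight": 300.0},
    {"drinker_type": "Regular", "weight": 100.0},
    {"drinker_type": "Occasional", "weight": 400.0},
    {"drinker_type": None, "weight": 250.0},  # missing answer
    {"drinker_type": "Abstainer", "weight": 200.0},
]

weighted = weighted_percentages(rows, "drinker_type", "weight")
for value, pct in sorted(weighted.items()):
    print(f"{value}: {pct:.1f}%")
```

In this made-up example two of the four valid respondents are regular drinkers, yet the weighted share differs from the unweighted one. This is why a reconstruction that ignores the weight variable may fail to match the published figures.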

Retrieve and analyze the data
For universities subscribed to the Statistics Canada Data Liberation Initiative (DLI), the public use microdata from the NPHS can be downloaded without additional cost. See the Statistics Canada Online Catalogue for further cost details. Make use of local data services to retrieve data from the NPHS.

Lessons from the NPHS example
This example demonstrates the distinction between creating statistics and interpreting statistics that have been created by others. This is an important distinction because:
- Choices are made in creating statistics.
- Interpreting statistics requires an ability to understand the choices that were made.
- Searching for statistics that others have created can be facilitated by understanding these points.

Provide a different perspective
Building on the previous example using the NPHS, compare the statistics from an article about young adults giving and receiving help to their parents’ age cohort.

Statistics are about definitions

Look at the Census definitions
- Definitions are in the Census Handbook (2001) and the Census Dictionary (2006)
- Search by Census Variable under Topic-Based Tabulations (2006) for value categorizations
- Look at some standard classifications used in statistics
  - SIC, NAICS, NOC, Standard Classification of Goods (SCG), Standard Geographic Classification (SGC), Classification of Instructional Programs (CIP), ICD10

Statistics in the News
Three recent newspaper articles that include statistics in them have been selected for this exercise. For each of the articles, answer the following questions.
- What is the concept represented by the statistic or statistics in this story?
- Is a definition for this concept provided? If it is, what is it? Or is the definition implicit?
- Are any classifications identifiable? What are they?
- Are the data from which this statistic was derived identified in the article?

Metadata for describing tables
As we have discussed, tables are a typical display format for statistics. Because tables are often published within an article, they don’t get indexed. Therefore, finding published tables requires a connection between characteristics in the table and other indexed content. Two indices of tables that exist are Statistical Universe and Tablebase. They use traditional elements to index tables without defining unique properties of tables.

Metadata for describing tables
What are the properties of a table that we might use to develop useful descriptors for describing their content? What is the motivation for doing this exercise?
- Searching for tables that were indexed using such descriptors would make finding statistics much easier.
- The movement toward open access journals and publishing lends an opportunity to introduce metadata elements for statistical tables.
- Once we have statistical tables described more comprehensively, opportunities will exist to link tables to the data sources from which the statistics in the table were derived.

[Figure: an example table annotated with its metadata elements. Labels: Title; Producer; Date; Unit of Observation; Variables (Average Tuition, Discipline, Academic Year, Province); Statistical Metric (Dollars); Footnote]

What are the metadata characteristics of tables & graphs?
- Is a title provided?
- Is an author, producer or agency identifiable?
- Is there a date of creation or publication?
- What is the entity that has been observed to make this statistic? That is, what is the unit of observation?
- Are the characteristics of the unit of observation (i.e., variables) and their categories clearly identified and defined?
- Is there a key to explain the use of colours or lines in the graph?
- Is the type of statistic clearly identified? That is, does the table or graph contain percentages, counts, averages, etc.?
- Is there a scale for the numbers presented in the table or graph?
- Is there an overall figure or number (N) presented upon which the table or graph was calculated?
- Are there footnotes?
- Are geography, time and social content clearly expressed in the table or graph?
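The checklist above could be captured as a simple metadata record. The following is only a sketch: the class `TableMetadata`, its field names and the helper `missing_elements` are hypothetical, not an existing metadata standard, and the example values are invented around the tuition table described earlier.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TableMetadata:
    """A hypothetical descriptive record for one statistical table or graph."""
    title: Optional[str] = None
    producer: Optional[str] = None
    date: Optional[str] = None
    unit_of_observation: Optional[str] = None
    variables: List[str] = field(default_factory=list)
    statistic_type: Optional[str] = None  # e.g. percentages, counts, averages
    scale: Optional[str] = None           # e.g. dollars, thousands
    footnotes: List[str] = field(default_factory=list)


def missing_elements(meta):
    """List the checklist elements that are empty or absent for this table."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


# Invented description of a tuition table; several elements are left blank.
tuition_table = TableMetadata(
    title="Average tuition by discipline, academic year and province",
    variables=["Discipline", "Academic Year", "Province"],
    statistic_type="Average",
    scale="Dollars",
)

print(missing_elements(tuition_table))
```

A record like this makes the gaps visible: the elements it reports as missing are exactly the questions from the checklist that the table leaves unanswered.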

Summary
- If statistical tables and graphs were described and indexed by rich metadata, our ability to locate statistics would be greatly enhanced.
- In the absence of such metadata, we use elements of this metadata structure to search our existing databases.
- The next generation of metadata in the field of data will work to integrate the description of both data and statistics.