Misinterpretation of data, the importance of metadata and STC math
DLI Atlantic Training, April 2005
Data Misinterpretation: Crime Rates
Ebert & Roeper reviewed Michael Wilson's movie “Michael Moore Hates America”
Ebert doubted the claim that the Canadian crime rate is 2X the USA rate
Moorelies.com | News: Whoa; Stuart Didn't See That One Coming
Ebert conceded that the statistics supported the claim: the figures were right
BUT a comparison of the STC and US Bureau of Justice websites shows how the statistics were misinterpreted
Crimes per 100,000 population (Canada / USA)
–All crimes: 8,530 / 4,267
–Violent crimes
–Property crimes: 4,275 / 3,744
Comparative Crime Rates
Simplistic comparison
–Similar category titles for violent and property crimes, but different definitions
–Violent crime is 2-3 times higher in the US; property crime rates are close
–Bureau of Justice Statistics Crime & Justice Data Online
–Canadian Statistics - Crimes by type of offence
Crimes per 100,000 population (Canada / USA)
Violent crime
–Homicide
–Robbery: 85 / 146
–Comparison of US rape and aggravated assault with Cdn sexual assault and assaults is difficult
Property crime
–B & E (Cdn) / Burglary (US)
–Theft (Cdn) / Larceny & Theft (US): 2,191 / 2,446
–Motor vehicle theft
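The arithmetic behind the crime-rate comparison can be checked directly. The sketch below uses only the rates quoted on the slides (per 100,000 population); the dictionary layout and the function name `canada_to_us_ratio` are illustrative choices, not part of any STC or BJS tool.

```python
# Rates per 100,000 population as quoted on the slides: (Canada, USA).
rates = {
    "All crimes": (8530, 4267),
    "Property crimes": (4275, 3744),
    "Robbery": (85, 146),
    "Theft / Larceny & Theft": (2191, 2446),
}


def canada_to_us_ratio(canada, usa):
    """Return the Canadian rate expressed as a multiple of the US rate."""
    return canada / usa


# Ratio for each category, rounded to two decimals for display.
ratios = {
    name: round(canada_to_us_ratio(canada, usa), 2)
    for name, (canada, usa) in rates.items()
}

for name, ratio in ratios.items():
    print(f"{name}: Canadian rate is {ratio} times the US rate")
```

The "All crimes" ratio comes out at about 2, which is the figure behind the claim. For a category with closer definitions, such as robbery, the ratio falls below 1. This suggests the headline gap reflects what each country counts under "all crimes" rather than a real difference in crime.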
US Crime Data
Canadian Crime Data
Data Misinterpretation: Drinking Habits of Canadians
Initial analysis of the 1990 Health Promotion Survey indicated that Canadians enjoyed an average of 60 drinks per day…
Data Misinterpretation: Importance of Metadata
The 1990 Health Promotion Survey included a series of questions about alcohol consumption. Respondents were first asked whether they EVER drank alcohol; if YES, whether they drank within the last 12 months; and if YES again, how many drinks they had on each day of the past 7 days.
The code book showed the number of drinks per day as:
81 F4MON ‑ 0097 HOW MANY DRINKS DID YOU HAVE ON: MONDAY
00 NONE
:40 NUMBER OF DRINKS
MORE THAN 40 DRINKS
QUESTION NOT ASKED
NOT STATED
F4TUE ‑ 0099 HOW MANY DRINKS DID YOU HAVE ON: TUESDAY
00 NONE
:40 NUMBER OF DRINKS
QUESTION NOT ASKED
NOT STATED
(Raw Weighted)
Metadata for PUMFs
With Public Use Microdata Files (PUMFs), the code book is very important
–Gives the questions asked and the codes used for responses
–“Missing values”, “refusals”, “don’t know” and “not applicable” are often assigned numeric codes
–The numeric codes used are not consistent
–To most software, these numeric codes look like valid responses
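An implausibly high average is what one would expect when reserved codes are averaged as if they were real answers. The sketch below illustrates the effect under loudly stated assumptions: the codes 98 ("question not asked") and 99 ("not stated") are hypothetical stand-ins for whatever the code book actually assigns, and the response values are invented for illustration.

```python
from statistics import mean

# ASSUMPTION: hypothetical reserved codes; the real values must be read
# from the survey's code book.
NOT_ASKED = 98
NOT_STATED = 99
RESERVED_CODES = {NOT_ASKED, NOT_STATED}

# Invented "drinks on Monday" responses. Most respondents were never
# asked the question, so they carry the NOT_ASKED code.
monday_drinks = [0, 2, 98, 98, 1, 99, 98, 0, 3, 98]

# Naive analysis: every number is treated as a count of drinks.
naive_mean = mean(monday_drinks)

# Metadata-aware analysis: drop reserved codes before averaging.
valid_responses = [x for x in monday_drinks if x not in RESERVED_CODES]
clean_mean = mean(valid_responses)

print(f"Naive mean (codes counted as drinks): {naive_mean}")
print(f"Mean over valid responses only:       {clean_mean}")
```

In most statistical packages the equivalent step is to declare the reserved codes as missing values before running any descriptive statistics.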
Metadata: STC Policy on Informing Users of Data Quality
In place since 1978
Tightened up in 2000 in response to the 1999 AG report
Recognition that “All statistics are to some extent estimates”
Statistics are to be used with awareness of their strengths and weaknesses – “fitness for use”
Key tool is the Integrated Meta Database (Definitions, data sources and methods)
Metadata
Important to find STC metadata and use it
Definitions, Data Sources and Methods
–Questionnaire and reporting guides
–Survey Description
–Data sources and methodology
–Data Accuracy
–Documentation
–Contact us
Definitions, Data Sources and Methods
Online Catalogue
Canadian Community Health Survey: public use microdata file: Product main page
DLI Website
DLI - Canadian Community Health Survey Cycle 1.1
DLI listserv: Ask and we will find out from the Division!
Data Quality Symbols
Use metadata to avoid key pitfalls
–Collection methodology
–Questionnaire
–Data quality: sample size, response rates
–Definitions
–Conceptual changes
–Survey coverage
–Reweighting/rebasing
STC Math
–Random rounding
–Percentages and percentage points
–Central tendencies (mean, median and mode)
–Current vs constant dollars
–Raw vs seasonally adjusted
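Two of the STC Math topics lend themselves to a short worked sketch. The random rounding function below is one common unbiased scheme (round to an adjacent multiple of 5, rounding up with probability proportional to the remainder). It is an assumption for illustration, not a statement of the exact procedure STC applies. The function names and the example figures (a rate moving from 5% to 6%) are likewise invented.

```python
import random


def random_round(count, base=5, rng=random):
    """Randomly round count to an adjacent multiple of base.

    Rounds up with probability remainder/base, so the expected value of
    the rounded figure equals the original count (assumed scheme).
    """
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of base
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages."""
    return new_pct - old_pct


def percent_change(old_pct, new_pct):
    """Relative change between two percentages, in percent."""
    return (new_pct - old_pct) * 100 / old_pct


# Seeded generator so the sketch is repeatable.
rng = random.Random(2005)
counts = [12, 13, 20, 4]
rounded = [random_round(n, rng=rng) for n in counts]
print(list(zip(counts, rounded)))

# Invented example: a rate that moves from 5% to 6%.
print(percentage_point_change(5.0, 6.0), "percentage point(s)")
print(percent_change(5.0, 6.0), "percent")
```

Because each cell is rounded independently, randomly rounded components may not add up to a randomly rounded total. The second part shows why "percent" and "percentage points" should not be used interchangeably: the same move is 1 percentage point but a 20 percent increase.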