CHEPA & Health Policy PhD Program: The Research Data Centre (RDC) Program. www.statcan.gc.ca. Telling Canada’s story in numbers. Mustafa Ornek (rdc2@mcmaster.ca), Vivek Jadon (vivek@mcmaster.ca). March 15, 2016
What is an RDC? An RDC is a secure Statistics Canada (StatsCan) research laboratory physically located on a university campus. It allows the research community across Canada to analyze statistical microdata extensively without compromising the confidentiality of the data.
Master File versus PUMFs. Compared to PUMFs, a master file: contains the full sample of respondents, not a subset of them; includes additional categories in variables (more detailed information); allows research on lower levels of geography; provides discrete values for certain variables, such as age or body weight, instead of categories; may offer other concepts that are not available in PUMFs. Moreover, master files contain derived variables and bootstrap weights. RDC access is useful when a PUMF does not exist or does not provide an adequate level of detail for quality research, or when longitudinal data linkage is required.
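Bootstrap weights are typically used to estimate the sampling variability of a survey estimate. The following is a minimal sketch, assuming one common convention: the variance is taken as the mean squared deviation of the replicate estimates from the full-sample estimate. The data, variable names and function names are hypothetical, and the user guide for each survey should be consulted for the exact method it prescribes.

```python
from math import sqrt

def weighted_mean(values, weights):
    """Survey-weighted mean of a variable."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bootstrap_se(values, main_weights, replicate_weights):
    """Point estimate and bootstrap standard error of a weighted mean.

    replicate_weights is a list of B weight vectors, one per bootstrap
    replicate (assumption: variance = mean squared deviation of the
    replicate estimates from the full-sample estimate).
    """
    point = weighted_mean(values, main_weights)
    replicates = [weighted_mean(values, w) for w in replicate_weights]
    variance = sum((r - point) ** 2 for r in replicates) / len(replicates)
    return point, sqrt(variance)

# Hypothetical toy data: four respondents and three bootstrap replicates.
body_weight_kg = [62.0, 80.5, 71.2, 95.0]
main_w = [120.0, 95.0, 110.0, 75.0]
boot_w = [
    [0.0, 190.0, 110.0, 100.0],
    [240.0, 95.0, 0.0, 65.0],
    [120.0, 0.0, 220.0, 60.0],
]

estimate, std_error = bootstrap_se(body_weight_kg, main_w, boot_w)
print(f"mean = {estimate:.2f} kg, bootstrap SE = {std_error:.2f} kg")
```

Real master files usually ship with several hundred replicate weights per respondent; three are used here only to keep the example short.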
Continuum of data access: Statistical Products, Custom Requests, Microdata. Items along the continuum: www.statcan.gc.ca, Daily releases, CANSIM tables, analytic articles, aggregate tables, data tabulations, PUMFs (DLI), Real Time Remote Access, remote job submissions, Research Data Centres, Federal RDC. At McMaster University: Mills Library 1st floor for public use microdata files (PUMFs); Statistics Canada RDC, Mills Library 2nd floor, for confidential microdata master files.
Microdata Access for Researchers: Public Use Microdata Files (PUMF). Each Public Use Microdata File is based on a corresponding master data file. The modifications performed by Statistics Canada before the PUMF is released ensure that the risk of breaching confidentiality has been removed. Since the results of any analysis performed do not have to be scrutinized before they are released, the file is considered “Public”. Modifications made to master files to convert them to PUMFs may include: collapsing of variables (e.g., age groups instead of individual years of age); collapsing several variables into one variable (e.g., multiple language questions collapsed into one language variable for analysis); suppressing variables (although the variable is part of the master file, it will not show up in the public file); and removing outliers (removing cases that are extremes, often used with income). Because these techniques anonymise the files, combining variables will not allow the user to identify a respondent.
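The four kinds of modification listed above can be illustrated with a small sketch. This is not Statistics Canada's actual procedure: the field names, the five-year age bands, the income cut-off and both functions are hypothetical and chosen only to make the idea concrete.

```python
def age_group(age, width=5):
    """Collapse an exact age into a band such as '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def to_public_record(record, suppressed=("postal_code",)):
    """Return an anonymised copy of one master-file record."""
    # Suppress variables: they exist in the master file but not in the public file.
    public = {k: v for k, v in record.items() if k not in suppressed}
    # Collapse a variable: age groups instead of individual years of age.
    public["age_group"] = age_group(public.pop("age"))
    # Collapse several variables into one: two language questions become one.
    languages = {public.pop("mother_tongue"), public.pop("home_language")}
    public["language"] = languages.pop() if len(languages) == 1 else "multiple"
    return public

def drop_income_outliers(records, income_cap):
    """Remove extreme cases (here: incomes above a hypothetical cap)."""
    return [r for r in records if r["income"] <= income_cap]

# Hypothetical master-file records.
master = [
    {"age": 37, "mother_tongue": "English", "home_language": "English",
     "postal_code": "L8S 4L6", "income": 52_000},
    {"age": 64, "mother_tongue": "French", "home_language": "English",
     "postal_code": "K1A 0T6", "income": 48_000},
    {"age": 51, "mother_tongue": "English", "home_language": "English",
     "postal_code": "M5S 1A1", "income": 2_400_000},
]

public_file = [to_public_record(r)
               for r in drop_income_outliers(master, income_cap=500_000)]
for row in public_file:
    print(row)
```

The original records are left unchanged; each public record is a new dictionary, mirroring the fact that the master file continues to exist alongside the PUMF.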
Microdata Access for Researchers: Public Use Microdata Files (PUMF). Benefits: free; very few conditions to access and use the data; no approval process to access the data. Limitations: content is limited (screened and grouped for confidentiality); not all surveys have a PUMF; almost all PUMFs are cross-sectional, i.e., represent data collected at one point in time.
Microdata Access for Researchers Programs related to accessing PUMFs Data Liberation Initiative (DLI) Microdata Access Division Programs related to accessing microdata Real Time Remote Access (RTRA) Research Data Centres (RDC) Federal Research Data Centre (FRDC) Canadian Centre for Data Development and Economic Research (CDER)
Data Liberation Initiative (DLI). DLI provides access to Statistics Canada standard products, databases, 350 public-use microdata files and geographic information files. DLI metadata are coded for searchability. DLI members have support through a very active listserv. There are currently 77 subscribing institutions; McMaster University Library is part of the Statistics Canada DLI program.
DLI PUMF Collection. The DLI offers access to all public data products, such as: the public use files for over 200 surveys, including the Canadian Community Health Survey, the National Population Health Survey, the General Social Survey, the Labour Force Survey and the Census of Population; databases such as the Small Area Business and Labour Database, Inter-Corporate Ownership, etc.; an enhanced line of Census products; aggregated data on subjects such as Justice and Education; all standard geographic files and databases.
Health-related DLI PUMFs: National Population Health Survey (NPHS); Canadian Community Health Survey (CCHS); Canada Health Survey (CHS); General Social Survey (GSS); Joint Canada/US Survey of Health (JCUSH); National Longitudinal Survey of Children and Youth (NLSCY); Participation and Activity Limitation Survey (PALS); Discharge Abstract Database (DAD), conducted by CIHI.
Microdata Access for Researchers: Synthetic Files. These microdata do not contain actual “real” cases but pseudo-cases that provide aggregate results close to those of the “real” cases. These files have been prepared so that analysis runs for the master file can be developed without any risk of disclosing or identifying any of the cases. The results are NOT to be reported; they are strictly to be used to prepare analyses of master files. Usually associated with longitudinal files, e.g. NLSCY and CCHS.
Real Time Remote Access (RTRA). An online remote access facility allowing users to run SAS programs, in more or less real time, against microdata located in a central and secure location. Researchers using the RTRA system do not gain direct access to the microdata and cannot view its content. Instead, users submit SAS programs to extract results in the form of frequency tables. Because RTRA researchers cannot view the microdata, becoming a deemed employee of Statistics Canada is no longer necessary, which allows rapid access to microdata files. Using a secure username and password, the RTRA provides around-the-clock access to survey results from any computer with internet access. Confidentiality protection of the microdata is automated in the RTRA system, eliminating the need for manual intervention and allowing for rapid access to results.
Real Time Remote Access (RTRA). Benefits: access from any computer with internet access, using a secure username and password (no travel to RDCs); few conditions on access and use of data; full master files available; confidentiality automated; deemed employee status not required. Limitations: tabular frequencies only; SAS only; only certain statistics available; not all data sets available; costs. http://www.statcan.gc.ca/rdc-cdr/rtra-adtr/rtra-adtr-eng.htm
Access DLI Data: Odesi Data Portal. Odesi is a web-based data extraction system delivered through Scholars Portal. It provides access to diverse, quality, numeric data sets, including the microdata survey collection from DLI, demographic data from Statistics Canada, and polling data from Gallup. It facilitates the exploration (searching, data manipulation, creation of summary statistics, graphing and export) of multiple, sophisticated data sets. Access is open to all OCUL institutions and is controlled by IP address. http://search1.odesi.ca/
Research Data Management (RDM). RDM is the active organization and maintenance of data throughout its lifecycle, from collection through interpretation and dissemination to the archiving of valuable results. It applies best practices to ensure data security, accessibility, usability, and integrity throughout the project and after its completion. Lifecycle stages: Research Design; Data Collection & Creation; Processing & Analyses; Storage & Preservation; Publication, Sharing & Reuse.
Research Data Management (RDM). Why RDM: rewards for RDM practices are manifold. Who benefits: granting agencies, researchers, universities, research output, the public. Rewards: efficiency, impact, public good, transparency, compliance, reuse.
Research Data Management (RDM): Scholars Portal Dataverse. A multidisciplinary data repository for researchers at Ontario’s universities, available as a service through Scholars Portal, an initiative of the Ontario Council of University Libraries (OCUL). With SP Dataverse, researchers can share, preserve, cite, explore and analyze research data, and control data access and distribution. Supports data DOI registration through DataCite Canada. http://dataverse.scholarsportal.info/dvn/
Portage DMP Assistant A web-based, bilingual data management planning tool. Available to all researchers across Canada. A guide for best practices in data stewardship. Exportable data management plans. https://assistant.portagenetwork.ca/
Research Data Management (RDM) http://library.mcmaster.ca/rdm
Contact Information. Location: Mills Memorial Library, Room L104/C; Telephone: 905 525-9140 Ext. 23848; E-mail: vivek@mcmaster.ca; Hours of Service: 9:30 am – 1:00 pm and 2:00 pm – 5:00 pm
RDC Microdata Resources / Links. Google “McMaster RDC” for: summary of data sets classified by type and status; resources for selected surveys; Monthly Newsletter: goings-on in the McMaster RDC (new data sets, library hours / closures, featured survey) and contact information.
A Brief List of Datasets: Census; National Household Survey (NHS); Canadian Community Health Survey (CCHS), with special cycles on Mental Health, Nutrition Intake and Healthy Aging; Canadian Health Measures Survey (CHMS); National Population Health Survey (NPHS); Vital Statistics Database: Births and Deaths; Canadian Cancer Registry (CCR); Canadian Survey on Disability 2012 (CSD); Canadian Tobacco Use Monitoring Survey: 1999-2012 (CTUMS); Survey on Living with Chronic Diseases in Canada: 2009, 2011, 2014 (SLCDC); Survey on Living with Neurological Conditions in Canada 2011 (SLNCC); Childhood National Immunization Coverage Survey (CNICS). Also: administrative (linked) datasets. For the full list, please visit CRDCN’s website.
Access to Centre. Application process: academic, government-funded or public sector access. See “How to Apply” (http://www.statcan.gc.ca/eng/rdc/process). Security clearance and orientation. McMaster RDC offers: two analysts on-site for orientation and support; 12 workstations with various statistical software; a conference room.
McMaster RDC Information. Location: Mills Memorial Library, Rm 217, McMaster University, 1280 Main Street West, Hamilton ON, L8S 4L6. Note: We will be relocating to the new Wilson Building in the near future. Academic Director: Byron Spencer (spencer@mcmaster.ca). Analysts: Peter Kitchen (rdc@mcmaster.ca), 905-525-9140 ext. 27968; Mustafa Ornek (rdc2@mcmaster.ca), 905-525-9140 ext. 27967. Statistical Assistant: Anna Kata (rdc3@mcmaster.ca), 905-525-9140 ext. 27968. Statistics Canada at McMaster RDC website: http://socserv.mcmaster.ca/rdc/
Thank you. See you in the RDC … SOON!