Exploring the DLI Product Line
Overview of resources for accessing the DLI Collection
Chantal Ripp
June 14
Statistics Canada • Statistique Canada
Overview
  Who? Who can use the DLI Collection?
  Why? Why would I use the DLI Collection?
  What? What is in the DLI Collection?
  Where? Where do I find the DLI Collection?
  How? How do I access the DLI Collection?
Statistics Canada • Statistique Canada 19/09/2019
Who? The DLI Collection
Students, faculty, staff, and researchers of member post-secondary institutions
Access:
  On campus (or proxied) to e-resources (WDS, Nesstar)
  Through the DLI Contact (EFT access)
Why? The DLI Collection
  Data not available on the StatCan website
  If microdata are required
  One-stop shop for StatCan standard data products
What? The DLI Collection
DLI provides access to StatCan data produced as standard electronic products available to the public.
These data are digitally encoded and stored in a file structure.
These include:
  Microdata files
  Geography files
  Databases
  Aggregate data in table format
A standard electronic product is an “off-the-shelf” product available to the public.
The DLI Collection
Not all products in the DLI collection are standard electronic products.
Some “special” products are just for DLI members:
  Postal CodeOM data products
  Discharge Abstract Database (DAD) from CIHI
These “special” products are available only to DLI members who have signed the appendices in the DLI licence.
Statistics vs data
Statistics: numeric facts/figures created from data, i.e., already processed; presentation-ready
Data: numeric files created and organized for analysis; require processing; not ready for display

Statistics are the numeric facts and figures which have been created from the data. They have been processed and are ready for use, but do not have the same kind of analysis behind them as statistical information does. These can take the form of e-publications, e-tables or databases.
Data are numeric files created and organized for processing and analysis. There are two types of data: aggregate data and microdata. Both offer the user more control over the variables offered for analysis.
Statistics are created from data.
Source: DLI Orientation: Concepts by Chuck Humphrey, University of Alberta, 2004
Distinguishing aggregate & microdata
Aggregate data consist of statistics that are organized into a specific data file structure.
Microdata (organized raw data) consist of the data directly observed or collected from a specific unit of observation.
[Figure: a raw microdata record, shown as one long unformatted string of digits. Source: CCHS 1.2 and Chuck Humphrey, University of Alberta]

Aggregate data are statistical summaries organized in a specific data file structure which permits further computer analysis (that is, data processing).
The variables in an aggregate data file do not lend themselves to generating cross-tabulations of individuals, since the initial unit of observation has been replaced by time (e.g., time series data), geography (e.g., census / geography) or a social construct (e.g., cause of death – HID).

Not all aggregate data contain the combination of variables from the microdata that a user may desire. For example, a researcher may be looking at whether alcohol use and gambling are correlated and wish to know if these variables differ between men and women, by age group, and whether the results vary across Canada. Although the Canadian Community Health Survey (CCHS) 3.1 collects data on the respondent's geography, gender, age, Canadian Problem Gambling Index score, and alcohol use, this combination of variables may not have been used in creating an aggregate data product. That’s where microdata come in.

Microdata consist of the data directly observed or collected from a specific unit of observation. That is, a microdata file contains organized raw data in which each line represents a specific unit of measure (usually an individual, household or family) and the information on each line is the values of the variables.

Statistical software packages (e.g., SAS, SPSS, Stata) are needed to use microdata files.
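The contrast between the two kinds of data can be sketched in a few lines of Python. The records below are invented for illustration: they are not CCHS data, and the variable names are hypothetical. The point is only that with one record per respondent, any pair of variables can be tabulated after the fact, whereas an aggregate table fixes its dimensions in advance.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (invented values,
# not taken from any Statistics Canada file).
microdata = [
    {"sex": "F", "age_group": "20-34", "region": "Atlantic", "drinks_weekly": True},
    {"sex": "F", "age_group": "35-49", "region": "Ontario", "drinks_weekly": False},
    {"sex": "M", "age_group": "20-34", "region": "Ontario", "drinks_weekly": True},
    {"sex": "M", "age_group": "35-49", "region": "Prairies", "drinks_weekly": True},
    {"sex": "F", "age_group": "20-34", "region": "Ontario", "drinks_weekly": True},
    {"sex": "M", "age_group": "20-34", "region": "Atlantic", "drinks_weekly": False},
]


def crosstab(records, row_var, col_var):
    """Count records for every (row_var, col_var) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)


# An aggregate product: counts already summarized along one fixed dimension.
# The individual respondents can no longer be recovered from it.
aggregate_by_sex = Counter(r["sex"] for r in microdata)

# With microdata, the user picks the variables to combine.
table = crosstab(microdata, "sex", "drinks_weekly")
for (sex, drinks), count in sorted(table.items()):
    print(sex, drinks, count)
```

Calling `crosstab(microdata, "region", "age_group")` would produce a different table from the same file, which is not possible starting from `aggregate_by_sex` alone.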
Public Use Microdata Files (PUMFs)
Each PUMF is based on a corresponding master data file (all observations).
Modifications are made to the file (e.g., collapsing of variables; suppressing variables) to minimize the possibility of disclosing or identifying any of the cases in the file (i.e., participants in a survey).
More than 1,300 PUMFs in the DLI collection from over 90 surveys, including:
  General Social Survey (GSS)
  Aboriginal Peoples Survey (APS)
  Canadian Community Health Survey (CCHS)

I read it in “The Daily”… by Michael Sivyer (DLI Update, Fall 2000, Volume 4, Number 2)

PUMF: Modifications made to master files to convert them to PUMFs may include:
  collapsing of variables (e.g., age groups instead of individual years of age);
  collapsing variables into one variable (e.g., multiple language questions collapsed into one language variable for analysis);
  suppressing variables (although the variable is part of the master file, it will not show up in the public file); and
  removing outliers (removing cases that are extremes, often used with income).
Because these techniques anonymise the files, combining variables will not result in the user identifying a respondent. The modifications performed by Statistics Canada before the PUMF is released ensure that the risk of breaching confidentiality has been removed.

Why use microdata: With microdata files, researchers can analyse any variable in the file and can construct the tables they need, rather than choosing from the pre-tabulated information presented in an aggregated file.
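As a rough illustration of the kinds of modification described in the notes above, the following Python sketch applies three of them to invented records: collapsing age into groups, suppressing a variable, and removing extreme income cases. The field names, age bands and income threshold are assumptions made for the example; they do not reflect Statistics Canada's actual disclosure-control rules.

```python
def age_group(age):
    """Collapse single years of age into broad bands (hypothetical bands)."""
    if age < 25:
        return "15-24"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


def make_public_file(master, suppress=("postal_code",), income_limit=250_000):
    """Derive a public-use style file from a master file (illustrative only)."""
    public = []
    for record in master:
        # Removing outliers: drop cases with extreme income values.
        if record["income"] > income_limit:
            continue
        # Suppressing variables: leave out fields named in `suppress`,
        # and drop exact age because it is replaced by a group below.
        out = {k: v for k, v in record.items() if k not in suppress and k != "age"}
        # Collapsing a variable: age groups instead of individual years.
        out["age_group"] = age_group(record["age"])
        public.append(out)
    return public


# Invented master records; "PC-00x" values are placeholders, not postal codes.
master = [
    {"age": 23, "income": 18_000, "postal_code": "PC-001"},
    {"age": 41, "income": 62_000, "postal_code": "PC-002"},
    {"age": 58, "income": 900_000, "postal_code": "PC-003"},
]

pumf = make_public_file(master)
for row in pumf:
    print(row)
```

The third record is dropped as an income outlier, and the two remaining records carry an age group and income but no postal code.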
How? Access Options
  DLI Restricted Access Page
  DLI B2020 Web Data Service (WDS)
  DLI Nesstar site
  EFT site
Some institutions also use other (non-StatCan) interfaces to access and deliver DLI data, at additional cost to access these platforms:
  <odesi> (OCUL – Scholar’s Portal)
  SDA (University of Toronto)
  CHASS (University of Toronto)
Where?
There are third-party distribution tools, Odesi and Equinox among others, which provide access to DLI data products. To access DLI products through these modes of access, you need to be a DLI member.
Getting started
DLI Products page
  Link to the IMDB survey page
  Links to the product in the appropriate application
DLI Website: DLI WDS site
  Contains DLI’s aggregate data collection
  IP authenticated: students and professors can access
  Web-based multidimensional table viewer
  View and manipulate .ivt files over the web without installing any client-side software (B2020)
The Beyond 20/20 Web Data Server is a web-based multidimensional table viewer which allows for the dissemination of data over the web in a variety of formats.
DLI Website: DLI Nesstar site
  Web data portal for accessing and manipulating microdata
  IP authenticated: students and professors can access
  Download data files, create subsets and create tables
  Allows users to work with microdata files through a web interface (without SAS or SPSS)
Nesstar is a web-based data exploration, extraction and analysis tool. It lets you search for survey variables across the collection, and supports basic tabulation and analysis online. It also allows for the downloading of the PUMF files into statistical software for further analysis.
www62.statcan.ca
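IP authentication, as mentioned for the WDS and Nesstar sites, generally means that a server compares the visitor's address against ranges registered for member institutions, which is why off-campus users need a proxy. The sketch below shows that kind of check in Python. The network ranges are reserved documentation blocks used as stand-ins, and the function name is hypothetical; how the DLI servers actually implement the check is not described in this presentation.

```python
import ipaddress

# Hypothetical campus ranges: these are reserved documentation blocks
# (RFC 5737), not the addresses of any real institution.
CAMPUS_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_on_campus(client_ip, networks=CAMPUS_NETWORKS):
    """Return True if client_ip falls inside any registered network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


# An address inside a registered range is accepted; one outside is not.
print(is_on_campus("192.0.2.17"))
print(is_on_campus("203.0.113.5"))
```

A proxied connection would pass such a check because the request reaches the server from the institution's proxy address rather than from the user's home address.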
DLI Website: DLI EFT site
  Only DLI Contacts have access
  Complete DLI collection (including PCCF and CIHI files)
  Faster downloads than the website
  Requires an FTP application (e.g., WS_FTP Pro, FileZilla)
MAD_DLI  MAD_DLI_CIHI  MAD_DLI_PCCF

When files are released, they are placed on the EFT after a short delay. Files are prepared in statistical packages and formatted (data; doc) for distribution.
Although not always intuitive to the first-time user, the EFT directory structure is quite logical. The same directory structure has been used for all recent additions to the DLI collection. The readme file is a very useful tool to help users understand the set-up of each folder.
The initial screen upon entering the DLI EFT site provides a listing of folder names. The folders are named according to their corresponding survey acronym and are sorted alphabetically.
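Once folders have been downloaded from the EFT site, their layout can be inspected programmatically. The Python sketch below builds a small throwaway directory tree that imitates the structure described above (folders named by survey acronym, each with a readme file and data and doc subfolders) and then summarizes it. The folder and file names are invented for the example, and the real EFT layout may differ in detail.

```python
import tempfile
from pathlib import Path


def build_sample_tree(root):
    """Create a hypothetical local copy of a few EFT survey folders."""
    for acronym in ("gss", "cchs", "aps"):
        for sub in ("data", "doc"):
            (root / acronym / sub).mkdir(parents=True)
        (root / acronym / "readme.txt").write_text(f"Contents of {acronym}\n")


def summarize(root):
    """Map each survey folder to (has_readme, subfolder names), alphabetically."""
    summary = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        children = list(folder.iterdir())
        subfolders = sorted(c.name for c in children if c.is_dir())
        has_readme = any(
            c.is_file() and c.name.lower().startswith("readme") for c in children
        )
        summary[folder.name] = (has_readme, subfolders)
    return summary


# Work in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_sample_tree(root)
    layout = summarize(root)

for name, (has_readme, subfolders) in layout.items():
    print(name, has_readme, subfolders)
```

Pointing `summarize` at a real download directory instead of the temporary one gives a quick check that each survey folder arrived with its readme and both subfolders.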
DLI Website: DLI EFT site: MAD_DLI
  Census (Agriculture and Population)
  Geography folder
  Other Products
  Surveys
  Reports
    Bi-annual report
    EAC meeting minutes
  Mirror Site
    Includes non-DLI files
MAD_DLI  MAD_DLI_CIHI  MAD_DLI_PCCF
DLI Website: DLI Survival Guide
The DLI Collection: Still unsure?
  Consult the Survival Guide on the DLI website
  Send a question to the DLI team
DLI Collection
Thank you
Contact us at statcan.stcinet-dli-idd.statcan@canada.ca