OVERVIEW OF THE DATA LIBERATION INITIATIVE: Licence, Products, & Services
Mike Sivyer
Ontario DLI Training, April 5, 2004



Introduction
The DLI is a partnership between Statistics Canada and participating Canadian post-secondary institutions.
There are 66 participating institutions.
Data are made available on a subscription basis.
Major activities and the direction of the project are guided by the members themselves through the External Advisory Committee.

The Licence
All member institutions must sign a data use licence agreement when joining the project.
Under this licence, data are made available for:
    Teaching
    Planning of academic/educational services
    Academic research and publishing
Use of data in textbooks falls under a different set of STC licences and permissions.

The Licence
Data are made available to educators, students and other institutional staff while they have such status at the institution.
    E.g., a student who goes to the USA to do a Master's no longer has access to DLI data.
Data are not to be used in any commercial or private activities (even if no $$ is involved).

The Licence
There are also conditions indicating what must be done with any data if an institution leaves the program:
    All use of data obtained must cease.
    The institution must destroy or return all data products obtained while a member.
    The institution must provide Statistics Canada with assurances that this has happened.

The Licence
A copy of the Data Use Agreement can be found on the DLI web site.
The DLI Contact is responsible for ensuring eligible use of data.
There are specific criteria that must be met in order to determine eligible use of DLI data.

The Licence
Other questions to help determine if a use falls under the definition of academic research:
    If for publishing, is the use strictly for publishing in an academic or scholarly journal?
    Is the use under a joint project with an outside agency/organization? Is any $$ involved?

The Licence
    Did the money come through the institution’s “grants dept”?
    Even if no $$ is involved, did the research project come through regular institutional channels?
    Are data expected to be shared with the outside agency/organization?

The Licence
Other important elements of the Licence Agreement:
    Data and products are offered “as is”.
    STC remains the owner of the intellectual property - only access to data is provided.
    Users must not link data or otherwise try to identify individual respondents.

The Licence
The DLI Contact is to implement data security measures.
The DLI Contact may request that users sign an agreement before allowing access.
If unsure of eligibility, send a message to the DLI Team for consideration.
All questions are reviewed by the DLI Manager and Director as well as the Co-Chairs of the EAC.

The Products
DLI provides access to Statistics Canada data produced as standard electronic products available to the public.
These products can be found in Statistics Canada’s On-Line Catalogue of Products and Services.
There is usually a flag indicating if a specific product is available to DLI members.

The Products
What is a standard electronic product?
    An “off the shelf” electronic product available to the public.
    Not included are standard publications available in electronic form, as these are usually part of DSP.

The Products
These data are digitally encoded and stored in a file structure. These include:
    Public Use Microdata Files (PUMFs)
    Census/Geography files
    Databases
The main focus of our collection is the public use microdata files.

The Products
These are files of RAW DATA that have been anonymized and organized so that each record in the file represents the responses of one individual respondent to the survey questions.
Metadata and software are needed to read and understand the data.

The Products
Data files can contain from fewer than 10,000 to more than 50,000 records.
Records can contain from fewer than 50 to more than 1,000 variables.
Records can be from as few as 50 to over 2,000 bytes in length.
Documentation can consist of 50 to 600+ pages.

The Products
The Codebook, Record Layout and other documentation are needed to be able to manipulate the data with a statistical software package such as SAS, SPSS, etc.
Following are examples of a Codebook and a Record Layout.
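As an illustration only (this is not one of the original examples from the presentation), the role of a record layout can be sketched in a few lines of Python. The variable names, column positions and sample record below are hypothetical and are not taken from any actual PUMF; a real layout would come from the documentation supplied with the file.

```python
# Hypothetical record layout: variable name -> (start column, width).
# Columns are 1-based, as they usually are in printed record layouts.
LAYOUT = {
    "PROVINCE": (1, 2),
    "AGE": (3, 3),
    "SEX": (6, 1),
}


def parse_record(line, layout):
    """Slice one fixed-width record into a dict of raw string values."""
    values = {}
    for name, (start, width) in layout.items():
        # Convert the 1-based start column to a 0-based slice.
        values[name] = line[start - 1 : start - 1 + width]
    return values


# A made-up 6-byte record, for demonstration only.
sample = "350421"
record = parse_record(sample, LAYOUT)
print(record)
```

Without the layout the sample record is just a string of digits; the codebook is then still needed to say what each coded value means.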

The Products
The DLI Collection also contains some products that contain aggregated data in table format.
The main focus of the DLI Collection is on socio-economic data:
    Health
    Education, Literacy
    Labour Market, Income
    Travel
    Justice
    Census, Demographic
    Etc.

The Products
Data products supplied by the social side of Statistics Canada can be:
    raw data in the form of public use microdata files,
    aggregate data in the form of Beyond 20/20 tables, etc.
We have only a few products supplied by the business side of Statistics Canada.

The Products
These surveys do not produce public use microdata files as a standard electronic product.
This is because most of these surveys are a “census” of the target population and there are confidentiality issues.
DLI does include some business products such as:
    Trade data
    Financial Performance Indicators CD
    Inter-Corporate Ownership

The Products
There are currently over 20,000 files available in the DLI Collection. These include:
    Data files
    Metadata
    Census & Geography CDs

The Products
New data products are continually being added to the Collection. This includes:
    Updated data from regular on-going surveys
    New data from ad-hoc special surveys (one time only)
    Data from new surveys in the STC program (on-going)

The Products
Updates may be provided in a different format than the earlier version. For example:
    PUMF
    Beyond 20/20 Tables
As new versions are received, we have to decide either to replace the data or to add to the Collection.

The Products
Not all products in the DLI Collection are standard STC electronic products.
    For example, we have the KLEMS database, an experimental database of productivity data.
    We also have data from the Department of Fisheries and Oceans.

The Services
DLI was conceived to be an Internet-based means of dissemination.
The Internet is the main mode of data transfer and communications.
DLI offers both an FTP and a Web-based service for access to the Collection.

The Services
Our FTP site is considered to be the main repository for our collection, where DLI Contacts download data products.
The FTP site is only open to DLI Contacts.
The following is an example of the FTP file structure.

The Services
Access to the data and metadata of many of our titles can also be achieved via our Web pages.
While the data files are locked and available only to DLI Contacts, the metadata files are available to all.
The following are examples of the different parts of a small and a large survey.

How large are files?

The Services
The Internet is also used for communication between and among the members and to order products that are available in hard copy only.
    DLILIST - forum for making enquiries, sharing information and general communication between and among members
    DLIORDER & WWW DLI ORDER DESK - to order hard copy versions of products not available electronically

The Services
Our web site not only provides access to the data and metadata but also contains a lot of other information and valuable links.

The Services
Another service is the production of the DLI-Update.
    A newsletter designed as a means to inform, teach and share information.
    Articles are written by various DLI Contacts and Team members.
    Back issues can be found on the web site.

The Services
When a product is received by the Team, a number of steps are performed before it is placed in the Collection:
    First of all, check that all files - data and metadata (French and English) - have been received.
    Open each file to ensure it is what it says it is (e.g., if a .DOC, then the file is a Word file, etc.).

The Services
    Run a program against the data file to verify:
        Number of records
        Record length
        Overall size of file
    Compare the results against the codebook and/or record layout.

The Services
    If SAS and/or SPSS syntax is received, run it against the file.
    If there is no SPSS syntax, create it.
    Rename all files to conform to DLI standards.
    Create the FTP path and directories.
    Create the Web pages.

The Services
    Load all files into the appropriate places on the FTP and Web sites.
    Announce the addition on DLILIST.
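The verification step above (number of records, record length, overall size of file) could be sketched as follows. This is a minimal Python sketch and not the program actually used by the DLI Team; it assumes one fixed-length record per line, the function and helper names are hypothetical, and the expected values would come from the codebook and/or record layout.

```python
import os
import tempfile


def check_fixed_width_file(path, expected_records, expected_length):
    """Compare a data file against its documented record count and length.

    Returns a list of discrepancy messages and the file size in bytes.
    """
    problems = []
    count = 0
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            record = line.rstrip(b"\r\n")  # ignore the line terminator
            count += 1
            if len(record) != expected_length:
                problems.append(
                    f"record {number}: length {len(record)}, "
                    f"expected {expected_length}"
                )
    if count != expected_records:
        problems.append(f"file has {count} records, expected {expected_records}")
    return problems, os.path.getsize(path)


def write_demo_file(records):
    """Write made-up records to a temporary file (demonstration only)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for r in records:
            f.write(r.encode("ascii") + b"\n")
    return path


# Made-up data: the third record is one byte short of the documented length.
demo_path = write_demo_file(["350421", "240352", "59018"])
problems, size_in_bytes = check_fixed_width_file(
    demo_path, expected_records=3, expected_length=6
)
os.remove(demo_path)
for message in problems:
    print(message)
```

Any message in the returned list points to a mismatch between the file and its documentation that would need to be resolved before the product is loaded.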

The Services
Many files have not come with SPSS descriptions - these are created by the DLI Team.
Often older files do not have French versions of the documentation, so it is extremely difficult to create French SPSS labels.
Creation of these SPSS labels can take some time after receipt of the documentation, depending on workload, size of file, and whether any documentation is in electronic format.

The Services
We are starting to receive some kind of SPSS descriptors from author divisions.
If and when SPSS descriptors are supplied by an author division, they can require major editing to fit with “DLI Users” requirements (e.g., length of variable and value labels).
The preparation and/or verification of SPSS syntax is a major undertaking.

The Services
Who does all this work?
There is a Team of people situated in the Statistics Canada Library. They are:

The Services
Jackie Godfrey
Responsible for:
    Project on-line infrastructure
    Data security, i.e., passwords, IP validation, etc.
    Listservs, etc.

The Services
Sage Cram
Responsible for:
    Communications
    Responding to questions on DLILIST
    Liaison between DLI members and STC divisions

The Services
André Blondin
Responsible for:
    Quality control of data and metadata
    Maintenance of FTP site directories
    Loading of files on FTP site
    Overseeing creation of SPSS

The Services
Marie Josée Bourgeois
Responsible for:
    Web page creation and development
    Loading of files on to Web
    Web links to data and metadata

The Services
Anne Chartrand
Responsible for:
    DLIORDER and hardcopy products
    Assist. Liaison / Communication
    Financial assistance

The Services
Gaetan Drolet
Responsible for:
    Special projects, e.g.:
        Assist with training
        Create Citation Guide for DLI
        DDI / NESSTAR

The Services
Mike Sivyer: Manager of Project
Ernie Boyko: Director of the Library
Carol Paradis: Creation of catalogue records of DLI products for Library catalogue (BIBLIOCAT)

The Services
Training
    Training of DLI Contacts is considered to be of major importance to the project.
    The DLI project provides funding for annual Regional Training Workshops.
    The DLI Web page will provide links to various training materials.

Conclusion
There are a number of advantages to belonging to DLI:
    The DLI provides the academic community with “one stop shopping” for STC products at affordable prices.
    It provides a forum for sharing information and obtaining advice.

Conclusion
    Value is added to basic STC products (e.g., SPSS).
    Training workshops offer a chance to learn about the products and how to use them.
    Participation in workshops is also a great “community builder”.