Framework of Statistical Information

This is a typology of the categories or classes of statistical information. Remember, however, that the relationship between statistics and data is causal: statistics are created from data.

Framework of Statistical Information
In this chart, Statistics: Databases and Data: Aggregate overlap. The overlap is discussed below.

Framework of Statistical Information In Print

In Print
Rely on yearbooks, statistical abstracts, catalogues, and indexes to locate statistics in print.
Examples of online indexes to print resources:
– Statistical Universe (U.S., international, government and private)
– Tablebase
Examples of online catalogues that include print resources:
– U.S. Census Bureau Sales Catalog
– Statistics Canada’s Online Catalogue

Framework of Statistical Information Online

Online Statistics
Examples of e-publications:
– Statistical Abstract of the United States
– Statistics Canada Downloadable Publications (DSP)
Examples of e-tables:
– Tables [and publications] containing U.S. Consumer Price Indexes
– Canadian Statistics (STC Website)
Examples of statistical databases:
– American FactFinder and DataFerrett
– CANSIM II (STC Website, E-STAT, CHASS)

E-Publications
– Tend to be available in PDF format.
– Columns can be copied to another application with the “Select Text” tool in Adobe Reader.

E-Tables
– Tend to be displayed in HTML.
– May provide a pull-down list to view other categories in the table.
– Some e-tables provide an alternate, downloadable format for the table (e.g., the Canadian Census tables are available in comma-separated ASCII, IVT, and print-friendly formats).

Databases
– Often use HTML forms to define the statistics to be retrieved.
– May offer a variety of output formats for the retrieved statistics (e.g., E-STAT provides IVT format for Beyond 20/20; graphs, charts, and maps; and ASCII formats for spreadsheets and databases).

Framework of Statistical Information Aggregate Data

Aggregate Data
Aggregate data consist of statistics that are organized into a data structure and stored in a database or in a data file. The data structure is based on tabulations organized by time, geography, or social content.
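As an illustration of how statistics are organized into such a data structure, the sketch below tabulates a handful of invented person-level records by one time, one geography, and one social-content variable. The records, the variable names, and the `tabulate` helper are all hypothetical; this is not how any particular agency builds its tables.

```python
from collections import Counter

# Hypothetical microdata: one record per person (all values invented).
records = [
    {"year": 2001, "province": "Alberta", "sex": "F"},
    {"year": 2001, "province": "Alberta", "sex": "M"},
    {"year": 2001, "province": "Alberta", "sex": "F"},
    {"year": 2001, "province": "Ontario", "sex": "M"},
    {"year": 2006, "province": "Ontario", "sex": "F"},
]


def tabulate(rows, dimensions):
    """Count rows for each combination of the given dimension variables."""
    return Counter(tuple(row[d] for d in dimensions) for row in rows)


# Each key is one cell of the table: (time, geography, social content).
aggregate = tabulate(records, ["year", "province", "sex"])
for cell, count in sorted(aggregate.items()):
    print(cell, count)
```

Dropping a variable from the list passed to `tabulate` yields a coarser table, for example counts by year alone.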

Aggregate Data
Example: CANSIM II
Data Structure
– Time
– Geography
– Social Content

Aggregate Data
Time series data have long fueled econometric models based on macroeconomic indicators. Comma-separated values (CSV) has become an important format for time series data, which is often manipulated, if not analyzed, in a spreadsheet such as Excel.
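A minimal sketch of working with a time series in CSV form outside a spreadsheet follows. The two-column layout (`period`, `value`) and the index values are assumptions made for illustration; real CANSIM downloads have their own layouts.

```python
import csv
import io

# Invented index values in a hypothetical two-column CSV layout.
csv_text = """period,value
2001-01,100.0
2001-02,100.5
2001-03,101.2
"""


def read_series(text):
    """Return a list of (period, value) pairs in file order."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["period"], float(row["value"])) for row in reader]


def percent_changes(series):
    """Percent change of each observation from the one before it."""
    return [
        (period, (value - previous) / previous * 100)
        for (_, previous), (period, value) in zip(series, series[1:])
    ]


series = read_series(csv_text)
for period, change in percent_changes(series):
    print(period, round(change, 2))
```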

Aggregate Data
Example: Census
Data Structure
– Time
– Geography
– Social Content

Aggregate Data
Increased availability of GIS software has created greater demand for Census statistics organized as aggregate data. Beyond 20/20 has become a popular tool for reshaping census statistics from 1996 and 2001 for use with GIS software. DBF is the most commonly used format to share census statistics with GIS software.

Aggregate Data
[Figure: a map from E-STAT of Montreal census tracts]

Aggregate Data
“Small area statistics” are a special category of aggregate data. These data files consist of statistics for small geographic areas, usually calculated from a population or manufacturing census, or from an administrative database, with enough cases to create accurate summaries for small areas.

Aggregate Data
Example: Cause of Death (HID)
Data Structure
– Time
– Geography
– Social Content

Aggregate Data
Also known as “cross-classified” tables, these files tend to consist of statistics constructed from social-content variables. Examples of cross-classified tables in DLI are found in education and justice.

Framework of Statistical Information Microdata

Microdata are raw data organized in a file in which each line represents a specific unit of observation and the information on the line consists of the values of variables. There are different types of microdata files, which will now be discussed.

Confidential Microdata
Master files: these files contain the full detail captured about each case of the unit of observation. This detail is specific enough that the identity of a case can often be disclosed easily. Therefore, these files are treated as confidential.

Confidential Microdata
Share files: these are confidential files in which the participants in the survey have signed a consent form permitting Statistics Canada to allow access to their information for approved research. These files consist of a subset of the cases in the master file.

Confidential Microdata
In summary, confidential microdata fall into two types: master files and share files.

Public Use Microdata
These microdata are specially prepared to minimize the possibility of disclosing or identifying any of the cases in a file, i.e., participants in a survey. The original data from the master file are edited to create a public use microdata file.

Public Use Microdata
Steps in Anonymizing Microdata
– Remove all personal identification information (names, addresses, etc.);
– Include only gross levels of geography;
– Collapse detailed information into a smaller number of general categories;
– Cap the upper range of values of variables with rare cases;
– Suppress the values of a variable; or
– Suppress entire cases.
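The first four steps can be sketched on a single invented record, as below. The field names, the age categories, and the income cap of 100,000 are hypothetical choices for illustration only; actual thresholds are set case by case, and the two suppression steps are not shown.

```python
def age_group(age):
    """Collapse single years of age into broad categories."""
    if age < 25:
        return "under 25"
    if age < 65:
        return "25 to 64"
    return "65 and over"


def anonymize(record, income_cap=100_000):
    """Return a public-use version of one hypothetical master-file record."""
    return {
        # Names and addresses are simply not carried over.
        # Only a gross level of geography is kept (province, not city).
        "province": record["province"],
        # Detailed values are collapsed into general categories.
        "age_group": age_group(record["age"]),
        # The upper range is capped so rare high values do not stand out.
        "income": min(record["income"], income_cap),
    }


master = {
    "name": "Jane Doe", "address": "1 Example St", "city": "Edmonton",
    "province": "Alberta", "age": 47, "income": 250_000,
}
print(anonymize(master))
```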

Public Use Microdata
Statistics Canada PUMFs
– Available only for select social surveys that undergo a review by the Data Release Committee, an internal Statistics Canada committee.
– No ‘enterprise’ public use microdata.

Public Use Microdata
Statistics Canada PUMFs
– Almost all PUMFs consist of cross-sectional samples, that is, samples where the data have been collected from respondents at one point in time.
– Longitudinal samples, where data are collected from the same individuals two or more times, are difficult to anonymize while retaining useful information.

Synthetic Microdata
These data files have been created to assist with the analysis of confidential data files.
– The files provide the full variable structure of the confidential microdata but do not contain any real cases.
– They are intended to be used by researchers wanting to submit a file of commands in a statistical package’s language for remote job submission.

Synthetic Microdata
– They are also being used by those with approved projects in Research Data Centres to help prepare their analysis strategies prior to working in an RDC.
– Synthetic files are also commonly referred to as “dummy files,” although the term also has a narrower technical meaning that refers to one specific type of synthetic file.

Synthetic Microdata
A variety of synthetic file types are being created and tested by author divisions.
– One type has no real data but does contain a complete set of real variables. This type is the dummy file in the more technical sense.
– Another type contains a mix of real data values but no real cases. The purpose of this type is to provide, in the aggregate, results that should be close to an analysis of the real microdata file.

Synthetic Microdata
Users of these files must be advised that none of the analytic results from these files should ever be reported. Their only purpose is to help researchers construct their statistical analysis programs to guard against syntax errors that might exist in their setup.
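A rough sketch of the dummy-file idea (real variable names, no real data) is given below. The codebook, its variable names, and the `make_dummy_file` helper are hypothetical and do not describe how Statistics Canada's author divisions build their synthetic files.

```python
import random

# Hypothetical variable structure: name -> valid codes (not from a real survey).
codebook = {
    "sex": [1, 2],
    "age_group": [1, 2, 3, 4, 5],
    "employed": [0, 1],
}


def make_dummy_file(codebook, n_cases, seed=0):
    """Generate cases with real variable names but randomly drawn values."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(codes) for name, codes in codebook.items()}
        for _ in range(n_cases)
    ]


dummy = make_dummy_file(codebook, n_cases=5)
for case in dummy:
    print(case)
```

Because the values are drawn at random, a program that runs cleanly against such a file has been checked for syntax only; any statistics it prints are meaningless.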

Framework Summary
This framework provides a way of thinking about the types of statistical information that exist. Is the information Statistics or Data?
– If Statistics, is the information in print or online? If online, is it in an e-pub, e-table, or database?
– If Data, is the information aggregate data or microdata?
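The questions in the summary can be read as a small decision tree. The sketch below encodes them in a hypothetical `classify` function; the argument names and label strings are illustrative and not part of the framework itself.

```python
def classify(kind, medium=None, form=None):
    """Place a source in the framework by answering its questions in order."""
    if kind == "statistics":
        if medium == "print":
            return "Statistics: In print"
        if medium == "online" and form in {"e-pub", "e-table", "database"}:
            return f"Statistics: Online: {form}"
    elif kind == "data":
        if form in {"aggregate data", "microdata"}:
            return f"Data: {form}"
    raise ValueError("source does not fit the framework as described")


print(classify("statistics", medium="online", form="database"))
print(classify("data", form="microdata"))
```

A source such as CANSIM II can reasonably be answered both ways (a statistical database and aggregate data), which reflects the overlap in the chart.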