5/21/2014 Data Preparation and Profiling: Strategies, Challenges, and Experiences. Tim Norris and Mark Lundgren



Today's Agenda
- Introductions
- Data Profiling and Readiness
- Lessons Learned
- Future Direction

About the P20W Data Warehouse
- Statewide longitudinal data system
- De-identified data about people's early childhood, kindergarten through 12th grade, higher education, and workforce experiences and performance
- Collected and linked from existing state agency data systems
- Includes data about the kinds of services people receive, the programs in which they participate, and their academic performance and program or degree completion
- Includes a variety of demographic data, so that different groups of people can be examined
- Personally identifiable information, such as names, Social Security numbers, addresses, and other data that can identify a person as an individual, is not part of the research database

[Diagram: ERDC data sources, data management, and outputs (OFM)]
- Sources: ECEAP students, K-12 students, K-12 teachers, CTC students, Baccalaureate students, National Student Clearinghouse, Workforce, IPEDS Financial Data
- Data management and governance: standards, confidentiality, security; critical questions; data dictionary, matching, longitudinal linking, cross-sector derived elements
- P-20/W datasets
- Output: ERDC research, data to partner agencies, PCHEES, collaborative research, ad-hoc requests (data and research) for partners and legislature, LEAP, external requests for data, feedback reports (on behalf of agencies)

Data Flow Process
[Chart of data flow goes here]

Data Source Characteristics
- Over 20 source data feeds
- Data systems being developed in parallel
- Some migrated historic data, some didn't

Data Preparation: Data Profiling
- Do it early, do it often
- Verification of data dictionary
- Descriptive statistics
- Distinct counts and percentages
- Zeros, blanks, and nulls
- Minimum and maximum values
- Patterns of data
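
The profiling measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling (the slides name SAS, Informatica Data Analyst, and Excel). The function names `profile_column`, `pattern_of`, and `is_zero`, the output keys, and the sample values are all assumptions made up for the example.

```python
import re
from collections import Counter


def pattern_of(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def is_zero(value):
    """True when the text parses as a number equal to zero."""
    try:
        return float(value) == 0.0
    except ValueError:
        return False


def profile_column(values):
    """Profile one column given as a list of raw strings (None = null)."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    blanks = sum(1 for v in values if v is not None and v.strip() == "")
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    counts = Counter(present)
    return {
        "total": total,
        "nulls": nulls,
        "blanks": blanks,
        "zeros": sum(1 for v in present if is_zero(v)),
        "distinct": len(counts),
        # Share of all rows (including nulls and blanks) holding each value.
        "percent": {v: 100.0 * n / total for v, n in counts.items()},
        # Values are compared as text, so min/max are lexicographic.
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "patterns": Counter(pattern_of(v) for v in present),
    }


# Hypothetical ZIP-code-like column with one null, one blank, and one odd value.
sample = ["98501", "98502", "98501", "", None, "9850A", "0"]
report = profile_column(sample)
print(report["distinct"], report["patterns"].most_common())
```

Rare patterns are often the quickest way to spot values that do not match the data dictionary, since a single off-shape entry stands out against the dominant pattern.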

Data Preparation: Data Profiling
- Dataset validation checks
- Counts of records by time, institution
- Values and codes over time
- Systematic changes (0,1 to Y,N)
- Values defined in data dictionary
- Quality of data
- Names and identifiers
- Data elements
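
The dataset-level checks above (record counts by time and institution, codes over time, and values checked against the data dictionary) might look like the following sketch. The record layout (`year`, `institution`, `flag`) and all function names are hypothetical, chosen only to mirror the 0,1 to Y,N example on the slide.

```python
from collections import Counter, defaultdict


def counts_by(records, *keys):
    """Count records for each combination of the given keys."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def codes_by_period(records, period_key, field):
    """Collect the set of codes observed for a field in each period."""
    seen = defaultdict(set)
    for r in records:
        seen[r[period_key]].add(r[field])
    return dict(seen)


def code_set_changes(codes):
    """List consecutive periods whose observed code sets differ.

    Any difference is reported, including a code that simply did not
    occur in one period, so the result is a prompt for review.
    """
    periods = sorted(codes)
    return [
        (prev, cur, codes[prev], codes[cur])
        for prev, cur in zip(periods, periods[1:])
        if codes[prev] != codes[cur]
    ]


def undefined_values(records, field, allowed):
    """Count values of a field that the data dictionary does not define."""
    return Counter(r[field] for r in records if r[field] not in allowed)


# Hypothetical feed in which a flag switched from 0/1 to Y/N between years.
records = [
    {"year": 2011, "institution": "A", "flag": "0"},
    {"year": 2011, "institution": "B", "flag": "1"},
    {"year": 2012, "institution": "A", "flag": "N"},
    {"year": 2012, "institution": "A", "flag": "Y"},
    {"year": 2012, "institution": "B", "flag": "Y"},
]

print(counts_by(records, "year", "institution"))
print(code_set_changes(codes_by_period(records, "year", "flag")))
print(undefined_values(records, "flag", {"Y", "N"}))
```

Comparing code sets period by period catches a systematic recoding even when every individual value looks plausible on its own.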

Data Preparation: Data Profiling
- Toolset varied by analyst
- SAS
- Informatica Data Analyst
- Excel
- Goal of understanding the data
- Constraints
- Completeness, patterns over time
- Values of each data element

Data Preparation: Data Readiness
- Document and expand results of profiling process
- Generate the "go-to" resource for follow-up questions
- Resource to begin data loading
- Content that feeds the data dictionary

Data Preparation: Data Readiness
Information about:
- Data provider
- Data file
- Data elements

Readiness Content Items

Dataset elements              Data element
----------------              ------------
Number of records             Name and description
Years provided                Acceptable values
Primary key                   Data format/length
Business owner and steward    Business rules
Update frequency              Identity matching flag
Extract process               Field/record level data rules
Known issues                  Security category
Dataset level rules           Notes
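
One way to hold these readiness items in a structured form is a pair of record types, one per dataset and one per data element. The sketch below is an assumption about how such a record could be modeled, not the template the presenters used. The class names, field names, types, and the `unfilled_items` helper are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ElementReadiness:
    """Element-level items from the readiness content list."""
    name: str
    description: str = ""
    acceptable_values: List[str] = field(default_factory=list)
    data_format: str = ""  # format and length, free text
    business_rules: List[str] = field(default_factory=list)
    identity_matching_flag: bool = False
    data_rules: List[str] = field(default_factory=list)  # field/record level
    security_category: str = ""
    notes: str = ""


@dataclass
class DatasetReadiness:
    """Dataset-level items from the readiness content list."""
    name: str
    number_of_records: Optional[int] = None
    years_provided: List[int] = field(default_factory=list)
    primary_key: List[str] = field(default_factory=list)
    business_owner: str = ""
    business_steward: str = ""
    update_frequency: str = ""
    extract_process: str = ""
    known_issues: List[str] = field(default_factory=list)
    dataset_level_rules: List[str] = field(default_factory=list)
    elements: List[ElementReadiness] = field(default_factory=list)


def unfilled_items(record):
    """Names of items still at an empty default (None, "", or [])."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]


# Hypothetical, partly completed readiness record for an enrollment file.
enrollment = DatasetReadiness(
    name="enrollment",
    number_of_records=1000,
    years_provided=[2012, 2013],
    primary_key=["student_id", "year"],
)
print(unfilled_items(enrollment))
```

Listing the unfilled items gives a simple checklist of follow-up questions for the data provider before loading begins.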

Data Readiness Templates

What We've Learned
- Customers need to be involved
- Dictionaries don't match data
- Educate our analysts on the data, the customer on the vision of the database
- Avoid custom extracts
- More time required up front

Toward the Future
- Empower the provider by offering guidance and tools for profiling
- Develop a feedback process that returns data quality findings and edits to the customer
- Open and transparent

Questions?