Published by Fay Boone
1
5/21/2014
Data Preparation and Profiling: Strategies, Challenges, and Experiences
Tim Norris and Mark Lundgren
2
Today's Agenda
- Introductions
- Data Profiling and Readiness
- Lessons Learned
- Future Direction
3
About the P20W Data Warehouse
- A statewide longitudinal data system
- De-identified data about people's early childhood, kindergarten through 12th grade, higher education, and workforce experiences and performance, collected and linked from existing state agency data systems
- Includes data about the kinds of services people receive, the programs in which they participate, and their academic performance and program or degree completion
- Also includes a variety of demographic data, so that different groups of people can be examined
- Personally identifiable information, such as names, Social Security numbers, addresses, and other data that can identify a person as an individual, is not part of the research database
4
Diagram of the warehouse, from data sources to outputs:
Data sources: ECEAP students; K-12 students; K-12 teachers; CTC students; baccalaureate students; National Student Clearinghouse; workforce; IPEDS; financial data
Data management: governance; standards, confidentiality, and security; critical questions; data dictionary, matching, longitudinal linking, and cross-sector derived elements
P-20/W datasets
Output: ERDC research; data to partner agencies; PCHEES; collaborative research; ad-hoc requests (data and research) for partners and the legislature; LEAP; external requests for data; feedback reports (on behalf of agencies); OFM
5
Data Flow Process
Chart of data flow goes here
6
Data Source Characteristics
- Over 20 source data feeds
- Data systems being developed in parallel
- Some sources migrated historical data; others did not
7
Data Preparation: Data Profiling
- Do it early, do it often
- Verification of the data dictionary
- Descriptive statistics
  - Distinct counts and percentages
  - Zeros, blanks, and nulls
  - Minimum and maximum values
  - Patterns of data
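The descriptive statistics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the team's toolset (the slides mention SAS, Informatica Data Analyst, and Excel); the function names and the pattern encoding (digits become 9, letters become A) are assumptions for this example. Percentages are computed over all records, including nulls and blanks, and the column is assumed to hold values of one type so that minimum and maximum are defined.

```python
from collections import Counter


def value_pattern(value):
    """Encode the shape of a value: digits become '9', letters become 'A', other characters are kept."""
    return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in str(value))


def is_missing(value):
    """Hypothetical helper: treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile_column(values):
    """Profile one data element given as a list of raw values."""
    total = len(values)
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    return {
        "records": total,
        "distinct": len(counts),
        # Percent of all records (including nulls and blanks) holding each value
        "percent_by_value": {v: round(100 * n / total, 1) for v, n in counts.items()},
        "nulls": sum(1 for v in values if v is None),
        "blanks": sum(1 for v in values if isinstance(v, str) and v.strip() == ""),
        "zeros": sum(1 for v in values if v == 0 or v == "0"),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "patterns": Counter(value_pattern(v) for v in present),
    }


# Hypothetical institution codes with one blank, one null, and one short code
codes = ["17001", "17002", "", None, "1700", "17001"]
print(profile_column(codes)["patterns"])  # Counter({'99999': 3, '9999': 1})
```

The pattern counts make a malformed value (here a four-digit code) stand out even when the distinct-value list is too long to read.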
8
Data Preparation: Data Profiling (continued)
- Dataset validation checks
  - Counts of records by time period and institution
  - Values and codes over time
  - Systematic changes (e.g., 0/1 to Y/N)
  - Values defined in the data dictionary
- Quality of data
  - Names and identifiers
  - Data elements
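A hedged sketch of the dataset validation checks above, assuming each record is a Python dict with hypothetical "year" and "institution" keys. It counts records per year and institution, reports years in which the set of codes for an element changes (such as a flag moving from 0/1 to Y/N), and lists values that are not defined in the data dictionary.

```python
from collections import Counter, defaultdict


def counts_by_year_and_institution(records):
    """Count records per (year, institution) to spot missing or unusually small loads."""
    return Counter((r["year"], r["institution"]) for r in records)


def code_set_changes(records, element):
    """List (year, previous codes, current codes) wherever an element's code set changes."""
    by_year = defaultdict(set)
    for r in records:
        by_year[r["year"]].add(r[element])
    years = sorted(by_year)
    return [
        (year, by_year[prev], by_year[year])
        for prev, year in zip(years, years[1:])
        if by_year[prev] != by_year[year]
    ]


def undefined_values(records, element, dictionary_values):
    """Values present in the data but not defined in the data dictionary."""
    return sorted({r[element] for r in records} - set(dictionary_values))


# Hypothetical "full_time" flag that switches coding between years
records = [
    {"year": 2011, "institution": "A", "full_time": "0"},
    {"year": 2011, "institution": "B", "full_time": "1"},
    {"year": 2012, "institution": "A", "full_time": "Y"},
    {"year": 2012, "institution": "A", "full_time": "N"},
]
print(code_set_changes(records, "full_time"))
print(undefined_values(records, "full_time", ["Y", "N"]))  # ['0', '1']
```

Any change in the code set is reported, including legitimate new codes, so the output is a list for an analyst to review rather than a list of confirmed errors.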
9
Data Preparation: Data Profiling (continued)
- Toolset varied by analyst
  - SAS
  - Informatica Data Analyst
  - Excel
- Goal of understanding the data
  - Constraints
  - Completeness and patterns over time
  - Values of each data element
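Completeness and patterns over time can be tracked by computing, for each year, the share of records in which each element is populated. The sketch below assumes dict records with a hypothetical "year" field and treats None and blank strings as missing; a sudden drop between years may point to a change in the source system or extract.

```python
from collections import defaultdict


def completeness_by_year(records, elements, year_field="year"):
    """Percent of records in each year where each element is populated (not None or blank)."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for r in records:
        year = r[year_field]
        totals[year] += 1
        for e in elements:
            v = r.get(e)
            if v is not None and str(v).strip() != "":
                filled[year][e] += 1
    return {
        year: {e: round(100 * filled[year][e] / totals[year], 1) for e in elements}
        for year in sorted(totals)
    }


# Hypothetical records: birth_date stops being supplied in 2013
records = [
    {"year": 2012, "birth_date": "2000-01-01", "grade_level": "09"},
    {"year": 2012, "birth_date": None, "grade_level": "10"},
    {"year": 2013, "birth_date": "", "grade_level": "11"},
]
print(completeness_by_year(records, ["birth_date", "grade_level"]))
# {2012: {'birth_date': 50.0, 'grade_level': 100.0}, 2013: {'birth_date': 0.0, 'grade_level': 100.0}}
```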
10
Data Preparation: Data Readiness
- Document and expand on the results of the profiling process
- Create the "go-to" resource for follow-up questions
- Provide a resource for beginning data loading
- Supply content that feeds the data dictionary
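One way readiness results could feed the data dictionary is to merge analyst notes with profiling output and export one row per data element. This is a hypothetical sketch: the column names, the semicolon-joined value lists, and the shape of the input dictionaries are assumptions, not the warehouse's actual dictionary format.

```python
import csv
import io

# Hypothetical data dictionary columns for this sketch
DICTIONARY_COLUMNS = ["element", "description", "acceptable_values",
                      "observed_values", "percent_missing", "known_issues"]


def dictionary_rows(readiness_notes, profiles):
    """Merge analyst readiness notes with profiling results into one row per element."""
    rows = []
    for element in sorted(readiness_notes):
        note = readiness_notes[element]
        prof = profiles.get(element, {})  # an element may not have been profiled yet
        rows.append({
            "element": element,
            "description": note.get("description", ""),
            "acceptable_values": ";".join(note.get("acceptable_values", [])),
            "observed_values": ";".join(prof.get("distinct_values", [])),
            "percent_missing": prof.get("percent_missing", ""),
            "known_issues": note.get("known_issues", ""),
        })
    return rows


def to_csv(rows):
    """Render rows as CSV text that a dictionary tool or spreadsheet could load."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DICTIONARY_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical notes and profile for one element
notes = {"grade_level": {"description": "Grade level at enrollment",
                         "acceptable_values": ["K", "01", "02"],
                         "known_issues": "Leading zeros dropped in some files"}}
profiles = {"grade_level": {"distinct_values": ["01", "1", "K"], "percent_missing": 0.5}}
print(to_csv(dictionary_rows(notes, profiles)))
```

Putting acceptable values next to observed values in the same row makes mismatches between the dictionary and the data (here "1" versus "01") visible at a glance.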
11
Data Preparation: Data Readiness (continued)
Information about:
- Data provider
- Data file
- Data elements
12
Readiness Content Items

Dataset items                 Data element items
Number of records             Name and description
Years provided                Acceptable values
Primary key                   Data format/length
Business owner and steward    Business rules
Update frequency              Identity matching flag
Extract process               Field/record-level data rules
Known issues                  Security category
Dataset-level rules           Notes
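The readiness items can be captured as a structured record so that every source file is documented the same way. The Python dataclasses below are an illustrative assumption, not the actual readiness template: field names mirror the dataset and data element items, plus a provider field for the data provider information.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElementReadiness:
    """Data element readiness items."""
    name: str
    description: str
    acceptable_values: List[str] = field(default_factory=list)
    data_format: str = ""              # format/length, e.g. "CHAR(2)"
    business_rules: List[str] = field(default_factory=list)
    identity_matching: bool = False    # element used to match people across sources
    field_rules: List[str] = field(default_factory=list)  # field/record-level rules
    security_category: str = ""
    notes: str = ""


@dataclass
class DatasetReadiness:
    """Dataset readiness items, plus the data provider."""
    name: str
    provider: str
    number_of_records: int
    years_provided: List[int]
    primary_key: List[str]
    business_owner: str
    data_steward: str
    update_frequency: str
    extract_process: str
    known_issues: List[str] = field(default_factory=list)
    dataset_rules: List[str] = field(default_factory=list)
    elements: List[ElementReadiness] = field(default_factory=list)

    def matching_elements(self) -> List[str]:
        """Names of elements flagged for identity matching."""
        return [e.name for e in self.elements if e.identity_matching]


# Hypothetical entry; all values are placeholders
enrollment = DatasetReadiness(
    name="enrollment", provider="Example agency", number_of_records=1000,
    years_provided=[2011, 2012], primary_key=["student_id", "year"],
    business_owner="TBD", data_steward="TBD", update_frequency="annual",
    extract_process="flat file",
    elements=[ElementReadiness("student_id", "Agency student identifier", identity_matching=True),
              ElementReadiness("grade_level", "Grade at enrollment", ["K", "01"])],
)
print(enrollment.matching_elements())  # ['student_id']
```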
13
Data Readiness Template
14
What We've Learned
- Customers need to be involved
- Data dictionaries don't match the data
- Educate our analysts on the data, and our customers on the vision for the database
- Avoid custom extracts
- More time is required up front
15
Toward the Future
- Empower providers by offering guidance and tools for profiling
- Develop a process for feeding data quality results and edits back to the customer
- Be open and transparent
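A feedback process could start with a summary of edit-check failures returned to each data provider. In this minimal sketch, edits are hypothetical named functions that return True when a record passes; the report lists failure counts, percentages, and the first failing record for follow-up.

```python
from collections import Counter


def run_edits(records, edits):
    """Apply each named edit to every record; count failures and note the first failing index."""
    failures = Counter()
    first_failure = {}
    for i, record in enumerate(records):
        for name, passes in edits.items():
            if not passes(record):
                failures[name] += 1
                first_failure.setdefault(name, i)
    return failures, first_failure


def feedback_report(provider, records, edits):
    """Plain-text summary a provider could use to correct its next submission."""
    failures, first_failure = run_edits(records, edits)
    lines = [f"Data quality feedback for {provider}: {len(records)} records checked"]
    for name in edits:
        n = failures[name]  # Counter returns 0 for edits with no failures
        pct = 100 * n / len(records) if records else 0.0
        where = f" (first at record {first_failure[name]})" if name in first_failure else ""
        lines.append(f"  {name}: {n} failures ({pct:.1f}%){where}")
    return "\n".join(lines)


# Hypothetical edit checks for a K-12 file
VALID_GRADES = {"K"} | {f"{g:02d}" for g in range(1, 13)}
edits = {
    "birth_date present": lambda r: bool(r.get("birth_date")),
    "grade_level valid": lambda r: r.get("grade_level") in VALID_GRADES,
}
records = [
    {"birth_date": "2001-05-01", "grade_level": "09"},
    {"birth_date": "", "grade_level": "13"},
    {"birth_date": "2003-02-11", "grade_level": "K"},
]
print(feedback_report("Example district", records, edits))
```

Sharing the same edit definitions with providers would let them run the checks before submitting, which fits the goal of giving providers their own profiling tools.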
16
Questions?