© 2009 Octagon Research Solutions, Inc. All Rights Reserved.
Octagon Research Solutions, Inc.
Leading the Electronic Transformation of Clinical R&D
Data Profiling
Octagon Research Solutions
Metadata Profiling
Metadata (structure)
– Likeness of nomenclature among study databases
– Answer some planning questions:
  Claim: “The studies are 90% identical.” Are they?
  If they indeed are, can you create pool(s) of source data to gain efficiency?
Not our main focus today
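One way to put a number on a claim like “the studies are 90% identical” is to compare variable names across the study databases. The following is a minimal sketch, not the method used in the case study: it assumes each study's metadata is available as a mapping of dataset name to variable names, the helper names `qualified_names` and `name_overlap` are hypothetical, the study contents are made up, and Jaccard similarity is only one possible measure of likeness.

```python
def qualified_names(metadata):
    """Flatten {dataset: [variables]} into a set of 'DATASET.VARIABLE' names."""
    return {
        f"{dataset.upper()}.{variable.upper()}"
        for dataset, variables in metadata.items()
        for variable in variables
    }


def name_overlap(study_a, study_b):
    """Jaccard similarity of qualified variable names: |A & B| / |A | B|."""
    a, b = qualified_names(study_a), qualified_names(study_b)
    if not a and not b:
        return 1.0  # two empty studies are trivially identical
    return len(a & b) / len(a | b)


# Made-up metadata for two hypothetical studies (illustration only).
study_1 = {
    "CHEM": ["PATID", "LPARM", "LVALUE", "LUNIT"],
    "AEV": ["PATID", "START", "STOP"],
}
study_2 = {
    "CHEM": ["PATID", "LPARM", "LVALUE"],
    "AEV": ["PATID", "START", "STOP", "SEV"],
}

print(f"Name overlap: {name_overlap(study_1, study_2):.0%}")
```

A high overlap suggests the studies may be candidates for pooling; a low one is an early warning that the claim needs a closer look.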
Data Profiling
Data (content)
– Statistics, e.g., min, max, average
– Relationship
– Pattern
Fact: Data are often “bad, worse, or ugly”
Goal: Get a realistic pulse on quality of the data
Case Study (“Slightly” Altered for Illustration Purposes)
Background
– Central lab, i.e., eDT CHEM for biochemistry (20807 records), along with 4 other labs
– No annotated CRF
Mapping document initially authored using variable labels
Case Study (cont’d)
Sponsor decisions:
– Match standard results with original results, i.e., no unit conversion; therefore, LBSTRESC = LBORRES
– LPARM to (LBTEST and LBTESTCD) will be done through a sponsor-supplied lookup table
Easy enough, right?
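The two sponsor decisions can be sketched as a small mapping function. This is an illustration under stated assumptions rather than the actual specification: the lookup rows are hypothetical stand-ins for the sponsor-supplied table, `map_chem_record` is an invented helper, and the sketch assumes LBORRES is taken directly from CHEM.LVALUE.

```python
# Hypothetical rows standing in for the sponsor-supplied lookup table:
# LPARM -> (LBTESTCD, LBTEST)
LPARM_LOOKUP = {
    "ALT": ("ALT", "Alanine Aminotransferase"),
    "GLUCOSE": ("GLUC", "Glucose"),
}


def map_chem_record(record, lookup=LPARM_LOOKUP):
    """Map one CHEM source record (a dict) to a partial LB record."""
    lparm = record["LPARM"].strip().upper()
    if lparm not in lookup:
        # Surface unmapped parameters instead of guessing a test code.
        raise KeyError(f"LPARM {lparm!r} is not in the lookup table")
    testcd, test = lookup[lparm]
    value = record["LVALUE"]  # assumption: LBORRES comes straight from LVALUE
    return {
        "LBTESTCD": testcd,
        "LBTEST": test,
        "LBORRES": value,
        "LBSTRESC": value,  # no unit conversion, so LBSTRESC = LBORRES
    }


print(map_chem_record({"LPARM": "glucose", "LVALUE": "5.2"}))
```

Raising on an unknown LPARM is a design choice for the sketch: a silent default would hide gaps in the lookup table.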
High-level mapping based on source dataset metadata
Case Study (cont’d)
Programmer noticed errors
– LBSTRESN is a numeric variable, but CHEM.LVALUE contains non-numeric data
Programmer determined that the mapping specifications document was not detailed enough and began to involve the analyst
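The kind of check that exposes this error can be sketched in a few lines: try to convert every LVALUE to a number and collect the failures. The helper name `non_numeric_values` and the sample values are hypothetical. Note one assumption in particular: Python's `float()` accepts some strings (such as `nan` or `1e5`) that a given SAS numeric informat may treat differently, so this is an approximation of the numeric test, not a replica of it.

```python
def non_numeric_values(values):
    """Return the values that cannot be converted to a number.

    Blank values are treated as missing and skipped, not reported as errors.
    """
    failures = []
    for value in values:
        text = value.strip()
        if not text:
            continue  # missing value, nothing to convert
        try:
            float(text)
        except ValueError:
            failures.append(value)
    return failures


# Made-up LVALUE contents for illustration.
sample = ["5.2", "<0.1", "TRACE", " 12 ", ""]
print(non_numeric_values(sample))
```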
Case Study (cont’d)
Let’s look at some options at their disposal (novice to veteran):
– SAS System Viewer
– A creative method by an Excel-savvy
– SAS PROC FREQ
Case Study (cont’d)
SAS System Viewer
– Read-only, great for displaying data
– Unreliable as a data browser
Analyze data in Excel
– Very manual
– Changes of data ownership, possible “lost in translations”?
– “Smart” behaviors, e.g., “01JAN :00” to “1/1/ :00:00 PM”, auto-trimming, etc.
SAS PROC FREQ
– CHEM.LVALUE: records reduced to 1237 unique values
Case Study (cont’d)
4th option
– A data pattern analyzer
Case Study (cont’d)
– Reduced records to only 11 patterns
Aha, we found the needle in the haystack! 0.3% of LVALUE is not numeric.
Case Study (cont’d)
– Drilled down to the actual values with non-numeric data patterns
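The idea behind a data pattern analyzer, and the drill-down that follows it, can be sketched briefly. This is not the tool shown in the presentation; it is a minimal illustration that assumes a common convention (each digit becomes `9`, each letter becomes `A`, everything else is kept). The function names and sample values are hypothetical.

```python
import re
from collections import Counter, defaultdict


def value_pattern(value):
    """Collapse a value to its shape: digits become 9, letters become A."""
    shape = re.sub(r"[0-9]", "9", value.strip())
    return re.sub(r"[A-Za-z]", "A", shape)


# Shapes that look like a plain number: 99, 9.9, -9.99, .9
NUMERIC_SHAPE = re.compile(r"^-?(9+(\.9*)?|\.9+)$")


def profile(values):
    """Count records per pattern and remember the distinct values behind each."""
    counts = Counter()
    examples = defaultdict(set)
    for value in values:
        shape = value_pattern(value)
        counts[shape] += 1
        examples[shape].add(value.strip())
    return counts, examples


def non_numeric(counts, examples):
    """Drill down: actual values behind every non-numeric pattern."""
    return {
        shape: sorted(examples[shape])
        for shape in counts
        if not NUMERIC_SHAPE.match(shape)
    }


# Made-up LVALUE contents for illustration.
sample = ["5.2", "7.1", "12", "<0.1", "NEG", "13.5", "140"]
counts, examples = profile(sample)
for shape, n in counts.most_common():
    print(f"{shape!r}: {n}")
print(non_numeric(counts, examples))
```

Because many distinct values share one shape, the pattern list is far shorter than a frequency table of raw values, which is what makes the odd shapes easy to spot. In this sketch, blank values show up under the empty shape and are reported as non-numeric.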
Through issue/resolution with the sponsor, added detailed instructions for LVALUE to accommodate the non-numeric values
Another Data Pattern Example #1
Source: Character variable AEV.STOP (AE stop date), being mapped to AE
Realized source is “somewhat” a free-form field
– Critical data point, must handle case-by-case using regular expression (regex) technique
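A case-by-case regex approach might look like the sketch below: an ordered list of rules, each pairing a pattern with a function that builds ISO 8601 text, the format SDTM uses for date/time (--DTC) variables. The three input formats are assumptions chosen for illustration, since the actual contents of AEV.STOP are not shown here, and `to_iso8601` is a hypothetical helper. Values that match no rule return `None` so they can be reviewed rather than silently converted.

```python
import re

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# Ordered rules: (pattern, builder). The formats are assumed examples.
RULES = [
    # Full date such as 01JAN2008 or 1-FEB-2008
    (re.compile(r"^(\d{1,2})[ -]?([A-Za-z]{3})[ -]?(\d{4})$"),
     lambda m: f"{m[3]}-{MONTHS[m[2].upper()]:02d}-{int(m[1]):02d}"),
    # Month and year only, such as JAN 2008
    (re.compile(r"^([A-Za-z]{3})[ -]?(\d{4})$"),
     lambda m: f"{m[2]}-{MONTHS[m[1].upper()]:02d}"),
    # Year only
    (re.compile(r"^(\d{4})$"), lambda m: m[1]),
]


def to_iso8601(raw):
    """Return ISO 8601 text (possibly partial), or None if no rule applies."""
    text = raw.strip()
    for pattern, build in RULES:
        match = pattern.match(text)
        if match:
            try:
                return build(match)
            except KeyError:
                return None  # three letters that are not a month abbreviation
    return None


# Made-up inputs for illustration.
for raw in ["01JAN2008", "jan 2008", "2008", "ONGOING"]:
    print(repr(raw), "->", to_iso8601(raw))
```

The sketch does not validate day ranges (for example, day 32 would pass through), so a real implementation would need further checks.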
Another Data Pattern Example #2
Source: Character variable DOSE.DOSE_ACT (Actual dose), being mapped to EX
Realized source does not always contain numbers
– Used both EX.EXDOSE and EX.EXDOSTXT
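Routing each DOSE_ACT value to one of the two target variables can be sketched as follows. This is an assumed rule for illustration, not the mapping specification from the case study: values that look like a plain number go to EXDOSE, everything else is kept verbatim in EXDOSTXT. The helper `split_dose` and the sample values are hypothetical.

```python
import re

# A plain number: optional sign, digits, optional decimal part.
NUMERIC = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")


def split_dose(dose_act):
    """Route a raw DOSE_ACT value to EXDOSE (numeric) or EXDOSTXT (text)."""
    text = dose_act.strip()
    if not text:
        return {"EXDOSE": None, "EXDOSTXT": ""}
    if NUMERIC.match(text):
        return {"EXDOSE": float(text), "EXDOSTXT": ""}
    # Anything else is preserved as text rather than coerced to a number.
    return {"EXDOSE": None, "EXDOSTXT": text}


# Made-up inputs for illustration.
for raw in ["50", " 2.5 ", "50-100", "HALF TABLET"]:
    print(repr(raw), "->", split_dose(raw))
```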
Wrapping Up
Integrated data profiling – a tool demo
The bigger picture:
– Data rules (e.g., pre-defined business rules, data standards, etc.)
– Data corrections
Although ETL is a solution platform for CDISC SDTM data conversion, too much of it is a symptom of a problem
Thank you!
Anthony Chow
(610) x5526