Script Metadata for Sharing

1 Script Metadata for Sharing
Standard Analysis and Code Sharing Working Group
23 January, 2019
Contributor & Presenter: Bob Friedman, Xybion
Project Lead: Hanming Tu, Frontage Laboratories

2 Agenda
Working Group background
Script Metadata for Sharing Project
  Issues
  Scope
  Objectives & Goals
  Key Technology and Knowledge
  Project Timelines and Tasks

3 Background: Standard Gap
Vision: Fill the Gap on Analysis & Displays Standards
[Figure: clinical data flow with industry standards alignment]
Clinical Data Flow: Trial Design, Data Collection Systems, Observed Datasets, Analysis Datasets, Tables, Figures and Listings
Industry Standards Alignment: PRM, CDASH, SDTM, ADaM, (Therapeutic Areas), No Standards Exist

4 Background: Common Toolbox
Vision: From Everyone Building Their Own Tools to Shared Tools
Shared Value
Shared Tools
Paradigm shift: contribution to build a better world
Shared tools increase the value of the tools

5 Background: Script Repository
Vision: Collaboration Platform and Shared Reusable Code Library
Script Repository in GitHub
  White Paper Scripts
  Contributed Scripts: Industry, FDA, Academia
  Test Data
  Script Metadata
  SAS, R, Spotfire, etc. Code
  Test Environment

6 Working Group: Suggested New Structure
6 active projects and many sub-projects:
  Script Repository (4 sub-projects)
  Test Data Factory (TDF, Peter Schaefer)
  Analyses and Display White Papers (ADW, Mary Nilsson)
  Communication, Promotion, Education (CPE, Jared Slain and Wendy Dobson)
  Script Discovery and Acquisition (SDA, Rebeka Revis, Alfredo Rojas)
  Repository Content and Delivery (RCD, Gustav Bernard, Andrew Miskell)
  Repository Governance and Infrastructure (RGI, Hanming Tu, Mike Carniello)
  Script Metadata for Sharing (SMS, Hanming Tu)
  Best Practices for Quality Control and Validation (Maria Dalton and Jane Marrer)
  GPP in Macro Development (Ninan Luke)

7 Background: Scripts
Scripts developed by volunteers
  6 Script-a-thons (plus additional work by project members), resulting in several scripts at various stages
  Scripts developed based on white paper
Scripts contributed by other groups
  FDA:
  Non-clinical:
  Data Handle:
  Spotfire Templates:

8 Script Metadata for Sharing
Issue statement:
Scripts are not easy to find because
  script metadata are not defined, so the metadata files are inconsistent;
  the index page, which is based on the metadata files, is not updated promptly.
The repository is not easy to navigate because
  scripts are not well organized;
  the folders are deep and complicated.
Scripts are not easy to use because
  you need to download them;
  you usually need to modify the original script to make it work in your own environment.

9 Project Scope
Definition:
  Metadata is information about the structures that contain the actual data.
  A metadata repository is a database used to store, organize, manage, relate/link, and share metadata.
  We are NOT creating an MDR system.
  We are only defining script metadata.
Standards:
  ISO/IEC (formally known as the ISO/IEC Metadata Registry (MDR) standard) is an international standard for representing metadata for an organization in a metadata registry.
Features:
  Generally contains four classifications: ownership, descriptive characteristics, rules and policies, and physical characteristics.
  The major categories of structure types are thesaurus, taxonomy, and ontology.
Business cases:
  Statistical Computation Environment
  Script development life cycle
  Define script metadata for sharing

10 Project Objectives
Script metadata provides information about a script's purpose, version, execution environment, library and data files used, inputs, outputs, review history, ratings, etc. The metadata will make it easy to share, access, and execute scripts in the repository.
Objectives
  Clearly define script metadata in a white paper
  Conduct a proof of concept (PoC) using the script metadata to drive the sharing and execution of scripts in the repository
  Test with both R and SAS scripts

11 Key Technology and Knowledge
Git and GitHub
YML for script metadata
Metadata standards
R, RStudio, R packages
SAS

12 Technology Readiness
Git and GitHub
  Have you created a GitHub account?
  Have you downloaded an SVN client?
  Have you obtained the PhUSE source code?
YML for script metadata
  Have you read about YML?
  Have you read about metadata standards?
R, RStudio, R packages
  Have you installed R and RStudio?
  Have you installed devtools and the PhUSE package?
SAS
  Do you have SAS on your computer?

13 Git and GitHub
About Git and GitHub
  GitHub Readme First
  Git e-book
  GitHub for beginners
  Learn: Code Academy quick video tutorial
  Git Workflows - "Centralized" is a good starting point
  GitHub Workflow - Working with Forks
Git Client
  GitHub Clients
  Using Git in R/RStudio
  Windows: Download TortoiseSVN
  Mac: SnailSVN
  SmartSVN: download

14 Script Metadata: Format
YML is the chosen format
  YML is a short name for YAML
    Yet Another Markup Language
    YAML Ain't Markup Language
  YML is a data serialization language that can be read by both humans and machines

Keywords: Test, Metadata, Dual Box
Script:
  name : metadata_example_rep.yml
  title : Metadata example on local drive
  desc : >
    This script demonstrates how to use YML to store the metadata
    about your program and define your input parameters and their values.
  version: 0.1.1
Language:
  name : YML
  version: x.x.x
Environment:
  system: Linux or Window 2010
  os_version: OEL 5.8, Window 2010
  desc: Description of the computing environment.
Inputs:
  datasets: dm.xpt,ae.xpt,testfile.xlsx
  p1: String - dataset name
  p2: Number - depart id
Outputs:
  datasets: out1, out2, out3
  v1: Date - script execution date and time
  v2: User - user who executes the script
Repo:
  base_dir:
  lib_files: Func_comm.R
Authors:
  - name : Jon Doo
    company: PhUSE
Qualification:
  last_date: DD-MON-YYYY
  last_by: FirstName LastName
  stage: T
  doc_url: a link to latest documentation
  note: C - Contributed; D - Development; T - Testing; Q - Qualified
Stages:
  - date: 01-JAN-2016
    name: Jon1 Doo
    stage: C
    docs: a link to qualification documents
Ratings:
  - user: htu
    date: 25-AUG-2017
    asso: Accenture
    stars: 5
# end of file

15 Script Metadata: 8 main Groups
Keywords: used to categorize the script such as analysis, boxplot, etc.
Script: defines the name, version, short and long description of the script.
Language: provides script language info such as SAS 9.4.0, R 3.4.0, etc.
Environment: provides the computing environment of the script language and its configuration.
Inputs: defines the input datasets and parameters required for the successful execution of the scripts. Required.
Parameters: defines the input parameters used by a script. Required.
Outputs: provides the expected output datasets, tables, graphs, etc.
Repo: provides the hosting repository information. Required.
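The Required flags on this slide lend themselves to an automated check. The following is a minimal sketch, written in Python purely for illustration (the PhUSE package itself is R) and assuming the YML file has already been parsed into a nested dictionary. The function name `missing_required_groups` and the sample record are hypothetical; the required set is simply the groups marked Required on this slide.

```python
# Groups marked "Required" on the slide (assumption: no other group is mandatory).
REQUIRED_GROUPS = ["Inputs", "Parameters", "Repo"]


def missing_required_groups(metadata):
    """Return the required groups that are absent or empty in a parsed YML record."""
    return [group for group in REQUIRED_GROUPS if not metadata.get(group)]


# Hypothetical parsed metadata, loosely based on the format example.
sample = {
    "Script": {"name": "metadata_example_rep.yml", "version": "0.1.1"},
    "Inputs": {"datasets": "dm.xpt,ae.xpt,testfile.xlsx"},
    "Repo": {"lib_files": "Func_comm.R"},
}

missing = missing_required_groups(sample)
print(missing)  # the sample has no Parameters group
```

A check like this could run before a script is added to the repository index, so that incomplete metadata files are caught early.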

16 Script Metadata: 4 secondary Groups
Authors: documents the developers who create or contribute to the development and qualification of the script.
Qualification: documents the qualification state and process.
Stages: provides the historical status of the scripts.
Ratings: records the users who review and give the rating about the script.
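To show how the Qualification and Ratings groups might be consumed, here is a small sketch in Python (for illustration only, not part of the PhUSE package). It assumes the metadata is already a nested dictionary, uses the stage codes from the format example (C, D, T, Q), and invents a second reviewer named `reviewer2` so that there is something to average. `summarize_status` is a hypothetical helper.

```python
# Stage codes as listed in the Qualification note of the format example.
STAGE_LABELS = {"C": "Contributed", "D": "Development", "T": "Testing", "Q": "Qualified"}


def summarize_status(metadata):
    """Summarize the qualification stage and user ratings of one script."""
    stage_code = metadata.get("Qualification", {}).get("stage")
    stars = [r["stars"] for r in metadata.get("Ratings", []) if "stars" in r]
    return {
        "stage": STAGE_LABELS.get(stage_code, "Unknown"),
        "n_ratings": len(stars),
        "mean_stars": sum(stars) / len(stars) if stars else None,
    }


# Hypothetical record: the second rating is made up for the example.
record = {
    "Qualification": {"stage": "T"},
    "Ratings": [
        {"user": "htu", "date": "25-AUG-2017", "stars": 5},
        {"user": "reviewer2", "date": "01-SEP-2017", "stars": 4},
    ],
}

summary = summarize_status(record)
print(summary)
```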

17 R shiny and PhUSE package
Use Shiny as the interface to develop the PhUSE web application framework
  R is an open source programming language and software environment for statistical computing and graphics.
  An R package is the fundamental unit of shareable code, bundled with data, tests, examples, and documentation.
  RStudio is a free and open-source integrated development environment (IDE) for R.
  An RStudio project helps you organize the development and build of R code or a package.

18 R shiny and PhUSE package
The PhUSE package sets up a web application framework
  Shiny is an R package for building interactive web apps straight from R.
  The PhUSE package helps with finding, downloading, and executing scripts.
How to get the R PhUSE package
  install.packages("devtools")
  library(devtools)
  install_github("TuCai/PhUSE")
  Or install directly once the package is accepted by CRAN:
  install.packages("PhUSE")
How to run the R PhUSE package
  library(PhUSE)
  start_PhUSE()
  Clone the PhUSE-scripts repository to your local computer if this is the first time you start the interface or the local repository is old
  Grep all the YML files from the local repository
  Build a data frame to hold the information for all YML files
  Write the data frame to a local file
  Populate the "Select Script" dropdown list
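The indexing steps listed on this slide (find the YML files, build a table, write it to a local file, populate the dropdown) can be sketched as follows. This is an illustrative Python sketch using only the standard library, not the package's R implementation; the directory layout, file names, and the helper names `find_yml_files`, `build_script_index`, and `write_index` are all hypothetical, and the clone step is replaced by a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def find_yml_files(repo_dir):
    """Return every *.yml file below repo_dir, sorted for stable output."""
    return sorted(Path(repo_dir).rglob("*.yml"))


def build_script_index(repo_dir):
    """Build one row per YML file (a stand-in for the data frame)."""
    return [
        {"script": p.stem, "yml_file": p.relative_to(repo_dir).as_posix()}
        for p in find_yml_files(repo_dir)
    ]


def write_index(rows, out_file):
    """Write the index rows to a local CSV file."""
    with open(out_file, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["script", "yml_file"])
        writer.writeheader()
        writer.writerows(rows)


# Demo on a throwaway directory that stands in for a local clone.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "development" / "R").mkdir(parents=True)
    (repo / "whitepapers").mkdir()
    (repo / "development" / "R" / "hist_example.yml").write_text("Script:\n")
    (repo / "whitepapers" / "boxplot_example.yml").write_text("Script:\n")
    (repo / "whitepapers" / "boxplot_example.R").write_text("# not metadata\n")

    index_rows = build_script_index(repo)
    write_index(index_rows, repo / "script_index.csv")
    csv_lines = (repo / "script_index.csv").read_text().splitlines()

# Choices that would feed a "Select Script" dropdown.
dropdown_choices = [row["script"] for row in index_rows]
print(dropdown_choices)
```

Caching the index in a local file means the repository only needs to be rescanned when the local clone changes.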

19 The PhUSE R Package
Script: displays the script if it is readable.
YML: displays the content of YML
Info: displays the information about the YML
Metadata: shows the metadata of the script in table format
Verify: verifies the existence of the files defined in YML
Download: downloads the script to local computer
Merge: merges online and local metadata files
Execute: executes the script if it is executable.
# phuse functions:
  build_script_df
  read_yml
  extract_fns
  download_fns
  cvt_list2df
  merge_lists
  run_examples
  get_inputs
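As an illustration of what a Verify step could involve, the sketch below checks whether the files named in a metadata record exist under a base directory. It is written in Python for illustration and is an assumption about the behavior, not the package's actual code. It assumes file names appear as comma-separated strings under `Inputs.datasets` and `Repo.lib_files`, as in the format example; `referenced_files` and `verify_files` are hypothetical names.

```python
import tempfile
from pathlib import Path


def referenced_files(metadata):
    """Collect file names from Inputs.datasets and Repo.lib_files."""
    names = []
    for group, key in (("Inputs", "datasets"), ("Repo", "lib_files")):
        value = (metadata.get(group) or {}).get(key) or ""
        names.extend(part.strip() for part in value.split(",") if part.strip())
    return names


def verify_files(metadata, base_dir):
    """Map each referenced file name to True if it exists under base_dir."""
    base = Path(base_dir)
    return {name: (base / name).is_file() for name in referenced_files(metadata)}


metadata = {
    "Inputs": {"datasets": "dm.xpt,ae.xpt,testfile.xlsx"},
    "Repo": {"base_dir": None, "lib_files": "Func_comm.R"},
}

# Demo: only two of the four referenced files are present.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dm.xpt").write_bytes(b"")
    (Path(tmp) / "Func_comm.R").write_text("# shared functions\n")
    report = verify_files(metadata, tmp)

print(report)
```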

20 PoC: SAS and R examples
Whitepaper deliverables
  Analyses and Displays Associated with Measures of Central Tendency - Focus on Vital Sign, Electrocardiogram, and Laboratory Analyte Measurements in Phase 2-4 Clinical Trials and Integrated Submission Documents (PDF Version) 10Oct2013
  Figure 7-2 Box Plot – Change in xxx Over Time
WPCT
  SAS: Source: YML:
  R: Source: YML:

21 PoC: R Program and YML
R Program
  library(phuse)
  # 1. get the script name and YML name
  # the fn() is from phuse web frame work to provide selected YML file full path and name
  yml_fn <- fn()
  inYML <- get_inputs(yml_fn)
  cfgYML <- read_yml(yml_fn)
  # we need to remove any R Shiny specification here
  dn <- ifelse(is.null(input$p1), cfgYML$Inputs$p1, input$p1)
  nn <- ifelse(is.null(input$p2), cfgYML$Inputs$p2, input$p2)
  d <- eval(call(dn, nn))
  t <- paste(dn, "(", nn, ")", sep = "")
  hist(d, main = t, col = "#75AADB", border = "white")
YML file
  Inputs:
    datasets:
    p1: rexp
    p2: 100
  RShinyUIs :
    - control: radioButtons("p1","Distribution type:",c("Normal"="rnorm","Uniform"="runif","Log-normal"="rlnorm","Exponential"="rexp"))
    - control: sliderInput("p2","Number of observations:",value = 500,min = 1,max = 1000)
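The key idea in this example is the fallback: a value chosen in the user interface wins, and otherwise the default from the YML `Inputs` group is used. The sketch below restates that logic in Python for illustration only. The names `resolve_param`, `draw_sample`, and `GENERATORS` are hypothetical, and a dictionary lookup stands in for R's `eval(call(dn, nn))`; the standard-library generators are rough analogues of `rnorm`, `runif`, `rlnorm`, and `rexp` with default parameters.

```python
import random

# Rough Python analogues of the R generators named in the YML controls.
GENERATORS = {
    "rnorm": lambda rng: rng.gauss(0.0, 1.0),
    "runif": lambda rng: rng.random(),
    "rlnorm": lambda rng: rng.lognormvariate(0.0, 1.0),
    "rexp": lambda rng: rng.expovariate(1.0),
}


def resolve_param(ui_inputs, yml_inputs, name):
    """Prefer the UI value; fall back to the default in the YML Inputs group."""
    value = ui_inputs.get(name)
    return yml_inputs[name] if value is None else value


def draw_sample(ui_inputs, yml_inputs, seed=0):
    """Return a plot title and a sample drawn from the selected distribution."""
    dist = resolve_param(ui_inputs, yml_inputs, "p1")
    n = int(resolve_param(ui_inputs, yml_inputs, "p2"))
    rng = random.Random(seed)
    values = [GENERATORS[dist](rng) for _ in range(n)]
    return f"{dist}({n})", values


yml_defaults = {"p1": "rexp", "p2": 100}

# No UI input yet: the YML defaults are used.
default_title, default_values = draw_sample({}, yml_defaults)
# The user picked a distribution and a sample size in the UI.
ui_title, ui_values = draw_sample({"p1": "rnorm", "p2": 500}, yml_defaults)

print(default_title, len(default_values))
print(ui_title, len(ui_values))
```

Looking the generator up in a fixed table, rather than evaluating an arbitrary function name, limits execution to the distributions the metadata is expected to offer.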

22 PoC: R – more complicated example
R Program
  # creation of accompanying datatable
  buildtable <- function(avalue, dfname, by1, by2, dignum){
    byvarslist <- c(by1,by2)
    summary <- eval(dfname)[,list(
      n = .N,
      mean = round(mean(eval(avalue), na.rm = TRUE), digits=dignum),
      sd = round(sd(eval(avalue), na.rm = TRUE), digits=dignum+1),
      min = round(min(eval(avalue), na.rm = TRUE), digits=dignum),
      q1 = round(quantile(eval(avalue), .25, na.rm = TRUE), digits=dignum),
      mediam = round(median(eval(avalue), na.rm = TRUE), digits=dignum),
YML file
  Inputs:
    datasets: advs.xpt
  RShinyUIs :
    - control: radioButtons("ft","File type:",c("PNG"="PNG","TIFF"="TIFF","JPEG"="JPEG"))
    - control: radioButtons("usn","Use short names?",c("TRUE"="TRUE","FALSE"="FALSE"))
    - control: radioButtons("upf","Subset on population flag?",c("TRUE"="TRUE","FALSE"="FALSE"))
    - control: radioButtons("trn","Treatment name?",c("TRTA"="TRTA","TRTP"="TRTP"))
    - control: sliderInput("perpage","Number per page:",value = 6,min = 1,max = 20)

23 Final Deliverables
Script Metadata for Sharing Whitepaper
  Public review: done by Dec 13, 2018
PhUSE Web Application Framework:

24 How to participate
Sign up for the PhUSE working group mailings
  From phusewiki.org, click "Join a Working Group Now"
  Standard Scripts Groups
    CSS-WG-Standard-Scripts (Entire Working Group)
    CSS-WG-Standard-Scripts-WhitePapers (White Paper Project Team)
    CSS-WG-Standard-Scripts-Platform (Script Repository)
    CSS-WG-SS-WhitePaperReviewers (Notified when a white paper is ready for review)
See wiki pages for each of the projects

25 Q & A
Contact Information
  Hanming Tu, VP of Clinical IT & Database Administration
    Phone: ; C:
  Bob Friedman
    Phone: ; C:
