XML in a SAS World
Mike Molter, d-Wise Technologies

Background
- Author: Mike Molter
- Company: d-Wise
- Committees: CDISC XML Technologies, PhUSE Working Group Best Practices
- Reason for presentation: increasing prevalence of XML in our industry

Agenda
- What is XML?
- Comparison to HTML
- Purpose and use
- Examples of XML standards (schemas)
- Tools for working with XML (SAS and non-SAS)
- XML in the pharmaceutical industry

XML and HTML
- Made of elements
- Elements have names
- Elements are identified by a pair of tags (start tag and end tag)

XML and HTML
- Some elements have one or more attributes
- Attributes are specified as name-value pairs in the start tag
    <element-name attribute-name1="attribute-value1" attribute-name2="attribute-value2" ...>

XML and HTML
- Elements can have stuff between the start tag and end tag
  - One or more nested elements
  - Element content: HTML web page content or XML data

XML and HTML
- Each document contains a root element: an element that nests all other elements
- The rest of the document sits between the root element's start tag and end tag

HTML
- Hypertext Markup Language
- Language of the web
- Provides instructions to web browsers for displaying content
- Pre-defined elements
[Example: HTML table with headers Team, Conference, Division and one row: Red Wings, Eastern, Atlantic]

What is XML?
- eXtensible Markup Language
- A data container, used for structure, storage, and transport of data (w3schools.com)
- Like any other computer language…
  - textual gibberish
  - set of rules (structural, syntax)
  - vocabulary (elements, attributes, tags, schemas)

What is XML?
- Unlike other computer languages…
  - no pre-defined elements (no keywords)
  - no processor

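[Example: XML document built from author-defined elements; the markup is not recoverable. Element content: Mike, Teresa, Lauren, Ryan, Sydney]
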
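What is XML?
Example: an NHL document (element names as used in the XML map and DATA step examples)
    <nhl>
      <team>
        <conference>Eastern</conference>
        <division>Atlantic</division>
        <location>Detroit</location>
      </team>
      <team>
        <conference>Western</conference>
        <division>Pacific</division>
        <location>Calgary</location>
      </team>
      <team>
        <conference>Eastern</conference>
        <division>Metropolitan</division>
        <location>New Jersey</location>
      </team>
    </nhl>
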
XML Schema
- XML schema (or language, or vocabulary): a specific set of elements and attributes, along with a set of rules that govern their use
- An XML schema can combine new elements with other XML schemas (extensible)
- A schema file lays out the rules of an XML language
- An XML schema language is a computer language in which schema files are written. Examples: DTD, XSD
- An XML validator is a piece of software that uses the schema file to validate an XML file

XML Language Examples
- NHL (OK, I made this one up)
- XSL (eXtensible Stylesheet Language, .xsl): transforms XML into something else
- XML Schema Definition (.xsd): validates an XML document
- XML Spreadsheet 2003 (.xml): read and displayed by Excel
- ODM, Define, Dataset-XML, Analysis Results Metadata, OpenCDISC: clinical trials data, metadata

Exporting XML
[Figure: the Teams.sas7bdat data set]

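The export examples all read a SAS data set named TEAMS. A minimal sketch of such a data set follows, assuming three character variables (conference, division, location) and the three rows from the NHL example; the variable lengths and the comma-delimited input are assumptions made for illustration.

```sas
/* Hypothetical reconstruction of TEAMS; lengths of $20 are an assumption. */
data teams ;
  length conference division location $20 ;
  /* Comma delimiter so that "New Jersey" is read as a single value. */
  infile datalines dlm=',' dsd ;
  input conference $ division $ location $ ;
  datalines ;
Eastern,Atlantic,Detroit
Western,Pacific,Calgary
Eastern,Metropolitan,New Jersey
;
run ;

/* Quick look at the rows before exporting them. */
proc print data=teams noobs ;
run ;
```

Running this first gives the later export steps something to read.
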
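Exporting XML with a DATA step
    /* Tag names inferred from the variable names and the XML map paths */
    filename xmlout4 'C:\teams_datastep.xml' ;
    data _null_ ;
      file xmlout4 ;
      set teams end=thatsit ;
      if _n_ eq 1 then put '<nhl>' ;
      put '  <team>' ;
      put '    <conference>' conference '</conference>' ;
      put '    <division>' division '</division>' ;
      put '    <location>' location '</location>' ;
      put '  </team>' ;
      if thatsit then put '</nhl>' ;
    run;
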
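One thing to watch when writing tags by hand in a DATA step: characters such as & and < in the data must be escaped, or the output will not be well-formed XML. The sketch below uses the HTMLENCODE function for that. The one-row data set, its made-up location value, the temporary fileref, and the element names are all assumptions for illustration.

```sas
/* Hypothetical row whose value contains XML special characters. */
data teams ;
  length conference division location $40 ;
  conference = 'Eastern' ;
  division   = 'Atlantic' ;
  location   = 'Smith & Sons <Arena>' ;   /* made-up value */
run ;

filename xmlesc temp ;   /* temporary file; swap in a real path as needed */

data _null_ ;
  file xmlesc ;
  set teams end=thatsit ;
  length line $200 ;
  if _n_ eq 1 then put '<nhl>' ;
  put '  <team>' ;
  /* CATS drops leading and trailing blanks; HTMLENCODE escapes the */
  /* ampersand, less-than, and greater-than characters.             */
  line = cats('<conference>', htmlencode(conference), '</conference>') ;
  put @5 line ;
  line = cats('<division>', htmlencode(division), '</division>') ;
  put @5 line ;
  line = cats('<location>', htmlencode(location), '</location>') ;
  put @5 line ;
  put '  </team>' ;
  if thatsit then put '</nhl>' ;
run ;
```

Building each line with CATS also avoids the blank that list-style PUT leaves between a variable's value and the closing tag.
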
Exporting XML with the LIBNAME statement
    libname xmlout xml 'C:\teams_generic.xml' ;
    data xmlout.xteams ;
      set teams ;
    run;

Exporting XML with the LIBNAME statement
    libname xmlout xml 'C:\teams_oracle.xml' xmltype=oracle ;
    data xmlout.xteams ;
      set teams ;
    run;

Exporting XML with the LIBNAME statement or ODS using tagsets
    libname xmlout xml 'C:\teams_tagset_libname.xml' tagset= ;
    data xmlout.xteams ;
      set teams ;
    run;

    ods markup tagset= file='C:\teams_tagset_ods.xml' ;
    proc print noobs data=teams ;
    run;
    ods markup close ;

Exporting XML with ODS using SAS's ExcelXP tagset
    ods markup tagset=excelxp file='C:\teams_excel.xml' ;
    proc print noobs data=teams ;
    run;
    ods markup close ;

References
- Tips and Tricks for Creating Multi-Sheet Microsoft Excel Workbooks, Vince DelGobbo, SAS Global Forum 2009
- ODS Markup: The SAS Reports You've Always Dreamed of, Eric Gebhart, SUGI 30
- ExcelXP on Steroids: Adding Custom Options to the ExcelXP Tagset, Mike Molter, SAS Global Forum 2011

References
- ExcelXP on Steroids: Adding Custom Options to the ExcelXP Tagset, SAS Global Forum 2011
    ods markup tagset=myexcel file='define.xml' options (tab_color='45') ;
    proc print noobs data=dataset1 ;
    run;
    ods markup close ;

Importing XML
Export:
    libname xmlout xml 'C:\teams_generic.xml' ;
    data xmlout.xteams ;
      set teams ;
    run;
Import:
    data sasteams ;
      set xmlout.xteams ;
    run;

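To check that an export followed by an import gives back the same values, the two steps can be chained and the result compared with the original. The sketch below is one way to do that, assuming the hypothetical TEAMS rows from the NHL example and a file written to the WORK directory rather than C:\; the libref xmlrt and the macro variable dir are made up for illustration.

```sas
/* Hypothetical TEAMS data set (same assumptions as the NHL example). */
data teams ;
  length conference division location $20 ;
  infile datalines dlm=',' dsd ;
  input conference $ division $ location $ ;
  datalines ;
Eastern,Atlantic,Detroit
Western,Pacific,Calgary
Eastern,Metropolitan,New Jersey
;
run ;

/* Write the XML file next to the WORK library so no fixed path is needed. */
%let dir = %sysfunc(pathname(work)) ;
libname xmlrt xml "&dir/teams_generic.xml" ;

/* Export: SAS data set -> generic XML. */
data xmlrt.xteams ;
  set teams ;
run ;

/* Import: generic XML -> SAS data set. */
data sasteams ;
  set xmlrt.xteams ;
run ;

libname xmlrt clear ;

/* Compare values; attributes such as length may not survive the round trip. */
proc compare base=teams compare=sasteams ;
run ;
```
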
NHL.XML (input): three team elements with the following content
    conference  division      location
    Eastern     Atlantic      Detroit
    Western     Pacific       Calgary
    Eastern     Metropolitan  New Jersey

    libname xmlin xml 'C:\teams_nhl.xml' ;
    data sasteam ;
      set xmlin.team ;
    run;

SASTEAM.SAS7BDAT (output)

XML in Pharma
- Operational Data Model (ODM)
  - Collected clinical trial data, metadata, administrative data, reference data, audit information
- Define-XML
  - Metadata for submitted data in ODM structure
  - Value-level metadata is in the define extension
- Dataset-XML
  - Submission data in ODM structure

XML in Pharma
- Analysis Results Metadata
  - Metadata that describes the methods used for arriving at the results
- OpenCDISC
  - Extension of Define-XML
  - Describes validation checks applicable to each domain

XML in Pharma
[Diagram: Collected Data (ODM.XML) -> Data Transformations (SAS) -> Data Submission (Dataset-XML) and Metadata Submission (Define.XML)]

ODM Conventions
- item: common element prefix; represents a variable
- def: common element suffix; represents a definition
- ref: common element suffix; represents a reference to a def
- oid: common attribute suffix; object identifier; represents a link to another part of the document

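Once ODM metadata has been read into SAS, the ref-to-def convention amounts to a join on the OID. The sketch below illustrates the idea with two small hypothetical data sets standing in for ItemRef and ItemDef elements; the data set names, variable names, and OID values are invented for illustration and are not taken from any real study.

```sas
/* Hypothetical rows, one per ItemRef element inside an ItemGroupDef. */
data itemrefs ;
  length itemgroupoid itemoid $20 ;
  infile datalines dlm=',' ;
  input itemgroupoid $ itemoid $ ;
  datalines ;
IG.DM,IT.DM.AGE
IG.DM,IT.DM.SEX
;
run ;

/* Hypothetical rows, one per ItemDef element (its OID and Name attributes). */
data itemdefs ;
  length oid $20 name $8 ;
  infile datalines dlm=',' ;
  input oid $ name $ ;
  datalines ;
IT.DM.AGE,AGE
IT.DM.SEX,SEX
;
run ;

/* Follow each reference to its definition: ItemRef ItemOID = ItemDef OID. */
proc sql ;
  create table resolved as
  select r.itemgroupoid, r.itemoid, d.name
  from itemrefs as r
       left join itemdefs as d
       on r.itemoid = d.oid
  order by r.itemgroupoid, r.itemoid ;
quit ;
```

A left join keeps any reference whose definition is missing, which makes dangling OIDs easy to spot as rows with a blank name.
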
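ODM
[Figure: ODM file with callouts for Clinical Data and ItemGroup (dataset-level) Metadata]

ODM
[Figure: ODM file with callouts for Clinical Data, ItemGroup (dataset-level) Metadata, and Item (variable-level) Metadata]

ODM
[Figure: ODM file with callouts for Item (variable-level) Metadata and Codelist Metadata (allowable values)]
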
Define-XML
Importing XML with an XML map
- XMLMap is an XML schema
- Provides instructions to the XML LIBNAME engine for reading XML
  - Name and label for the data set
  - Which XML elements define observations
  - How to define variables (attributes and values)
- Uses XPath syntax to navigate the XML document and identify its components
    filename mymap 'C:\mymap.map' ;
    libname xmlin xml 'C:\nhl.xml' xmlmap=mymap ;
    data sasteams ;
      set xmlin.teams ;
    run;

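Importing XML with an XML map
[XMLMap listing; markup not recoverable. Surviving content: table path /nhl/team; column definitions of type character, data type string, length 20, one with path /nhl/team/conference. Callouts: name of data set to be created, observation boundary, variable definition]
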
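A map along the lines described above might look like the sketch below, which writes a small NHL document and an XMLMap to the WORK directory and then reads the document through the map. The file locations, the SXLEMAP version of 1.2, the ACCESS=READONLY option, and the division and location columns are assumptions; only the /nhl/team and /nhl/team/conference paths and the character/string/20 settings come from the slide.

```sas
%let dir = %sysfunc(pathname(work)) ;   /* assumed location for both files */
filename nhl   "&dir/nhl.xml" ;
filename mymap "&dir/mymap.map" ;

/* Small XML document: an nhl root holding three team elements. */
data _null_ ;
  file nhl ;
  put '<nhl>' ;
  put '<team><conference>Eastern</conference><division>Atlantic</division><location>Detroit</location></team>' ;
  put '<team><conference>Western</conference><division>Pacific</division><location>Calgary</location></team>' ;
  put '<team><conference>Eastern</conference><division>Metropolitan</division><location>New Jersey</location></team>' ;
  put '</nhl>' ;
run ;

/* XMLMap: one table, one observation per /nhl/team, three character columns. */
data _null_ ;
  file mymap ;
  put '<?xml version="1.0" ?>' ;
  put '<SXLEMAP version="1.2">' ;
  put '  <TABLE name="teams">' ;
  put '    <TABLE-PATH syntax="XPath">/nhl/team</TABLE-PATH>' ;
  put '    <COLUMN name="conference">' ;
  put '      <PATH>/nhl/team/conference</PATH>' ;
  put '      <TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>20</LENGTH>' ;
  put '    </COLUMN>' ;
  put '    <COLUMN name="division">' ;
  put '      <PATH>/nhl/team/division</PATH>' ;
  put '      <TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>20</LENGTH>' ;
  put '    </COLUMN>' ;
  put '    <COLUMN name="location">' ;
  put '      <PATH>/nhl/team/location</PATH>' ;
  put '      <TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>20</LENGTH>' ;
  put '    </COLUMN>' ;
  put '  </TABLE>' ;
  put '</SXLEMAP>' ;
run ;

/* Read the document through the map; the table name comes from the map. */
libname xmlin xml "&dir/nhl.xml" xmlmap=mymap access=readonly ;

data sasteams ;
  set xmlin.teams ;
run ;

libname xmlin clear ;
```

The TABLE-PATH marks the observation boundary, and each COLUMN block defines one variable, matching the three callouts on the slide.
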
XML Mapper
Extensible Stylesheet Language (XSL)
- XSLT (XSL Transformations) transforms XML into something else
- XSL is an XML schema
- An XSL processor reads through an XML document and generates text according to instructions in the stylesheet
- XSL processors:
  - SAS (PROC XSL)
  - Internet Explorer

Extensible Stylesheet Language (XSL)
- SAS's PROC XSL creates an output file, given an input file and a stylesheet
    filename inxml   'C:\mysubmission\define.xml' ;
    filename outhtml 'C:\mysubmission\define.html' ;
    filename xslss   'C:\mysubmission\define.xsl' ;
    proc xsl in=inxml out=outhtml xsl=xslss ;
    run;

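To see PROC XSL work end to end without a define.xml at hand, the sketch below writes a small NHL document and a matching stylesheet to the WORK directory and transforms one into an HTML table. The file names, the element names, and the stylesheet itself are assumptions made for illustration; it is not the Define-XML stylesheet.

```sas
%let dir = %sysfunc(pathname(work)) ;   /* assumed location for all three files */
filename inxml   "&dir/nhl.xml" ;
filename xslss   "&dir/nhl.xsl" ;
filename outhtml "&dir/nhl.html" ;

/* Input: an nhl root holding three team elements. */
data _null_ ;
  file inxml ;
  put '<nhl>' ;
  put '<team><conference>Eastern</conference><division>Atlantic</division><location>Detroit</location></team>' ;
  put '<team><conference>Western</conference><division>Pacific</division><location>Calgary</location></team>' ;
  put '<team><conference>Eastern</conference><division>Metropolitan</division><location>New Jersey</location></team>' ;
  put '</nhl>' ;
run ;

/* Stylesheet: one HTML table row per team element. */
data _null_ ;
  file xslss ;
  put '<?xml version="1.0"?>' ;
  put '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">' ;
  put '<xsl:output method="html"/>' ;
  put '<xsl:template match="/nhl">' ;
  put '<html><body><table border="1">' ;
  put '<tr><th>Conference</th><th>Division</th><th>Location</th></tr>' ;
  put '<xsl:for-each select="team">' ;
  put '<tr>' ;
  put '<td><xsl:value-of select="conference"/></td>' ;
  put '<td><xsl:value-of select="division"/></td>' ;
  put '<td><xsl:value-of select="location"/></td>' ;
  put '</tr>' ;
  put '</xsl:for-each>' ;
  put '</table></body></html>' ;
  put '</xsl:template>' ;
  put '</xsl:stylesheet>' ;
run ;

/* Transform: input XML + stylesheet -> HTML output file. */
proc xsl in=inxml out=outhtml xsl=xslss ;
run ;
```

Opening the resulting nhl.html in a browser should show the three teams as table rows.
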
Extensible Stylesheet Language (XSL)
- Internet Explorer renders XML as HTML
[Screenshots: Define.xml via text editor; Define.xml via Internet Explorer, showing "Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)" as HTML generated by XSL]

Extensible Stylesheet Language (XSL)
[Screenshot: the rendered heading "Tabulation Datasets for Study CDISC01 (SDTM-IG 3.1.2)" alongside the stylesheet fragment "Datasets for Study ( )"; markup not recoverable]

Clinical Standards Toolkit (CST)
- A Base SAS framework for executing clinical data tasks, such as verifying data compliance against standards and importing/exporting ODM and Define.xml
- Contains all necessary files (SAS macros and driver programs, maps, property files, XSL stylesheets)
- Learning curve

Clinical Standards Toolkit (CST) …or PROC XSL
References
- Using the SAS Clinical Standards Toolkit 1.5 to Import CDISC ODM Files, Lex Jansen, PharmaSUG 2013
- Using the SAS Clinical Standards Toolkit for Define.xml Creation, Lex Jansen, PharmaSUG 2011
- Accessing the Metadata from the Define.xml Using XSLT Transformation, Lex Jansen, PhUSE 2010

References
- A SAS Programmer's Guide to Generating Define.xml, Mike Molter, SAS Global Forum 2009
    ods markup tagset=mydefine file='define.xml' ;
    proc print noobs data=meta-dataset1 ; run;
    proc print noobs data=meta-dataset2 ; run;
    proc print noobs data=meta-dataset3 ; run;
    /* etc. */
    ods markup close ;

Other Resources
- LinkedIn Groups
  - CDISC XML Technologies
  - CDISC Define-XML
  - CDISC Dataset-XML
  - CDISC-SDTM Experts
- wiki.cdisc.org

In Summary…
Options for exporting XML
- XML LIBNAME engine (XMLTYPE=, TAGSET= options)
- ODS (SAS XML destinations or user-defined tagsets)
- DATA step
- XSL stylesheets
- CST (clinical)
Options for importing XML
- XML LIBNAME engine (XMLTYPE=, TAGSET= options)
- XML maps
- XSL stylesheets
- CST (clinical)

In Summary… So what do I need to know???