Louise S. Hadden, Lead Programmer Analyst, Abt Associates Inc.

Presentation transcript:

Taking XML’s Measure: Using SAS® to Read In and Create XML for Analytic Use and Websites
Presenter: Louise S. Hadden, Lead Programmer Analyst, Abt Associates Inc.
Louise Hadden has been using and loving SAS since the days of punch cards and computers the size of a tiny house. She spends most of her time in support of health policy analytics at Abt Associates Inc. and loves a good SAS reporting challenge. She is an ardent lifelong learner who reads voraciously, loves photography, and volunteers at the MSPCA Boston Adoption Center walking, training and photographing dogs.


Introduction XML has become a standard over the years for populating websites and transferring information. This presentation demonstrates how to parse mystery XML files, how to use SAS® to read in XML files that you can’t simply right-click and open in Microsoft Excel, how to use maps and schemas to input and output various XML representations, and how to construct and output “measure code” data sets from input data to maximize the flexibility of XML data representation and usage.

Introduction XML
HTML: markup language used to build static web pages
HTML5: latest version of HTML, with support for multimedia
XML: Extensible Markup Language
XHTML: XML that mirrors HTML in syntax and adds hypertext capability
DHTML: Dynamic HTML
KML: markup language used to display geographic data
HTML is based on SGML. Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents; its specifications indicate how to present structured data. DHTML is a combination of static code and scripting languages like JavaScript, used to create interactive and animated websites. KML is a tag-based structure based on an XML standard, used with an “earth” browser. Tip: To see the KML "code" for a feature in Google Earth, you can simply right-click the feature in the 3D Viewer of Google Earth and select Copy. Then Paste the contents of the clipboard into any text editor. The visual feature displayed in Google Earth is converted into its KML text equivalent. Be sure to experiment with this feature. What do ALL of these have in common? They are markup languages.

Introduction Markup Languages Wikipedia tells us “Markup Language is a system for annotating a document in a way that is syntactically distinguishable from the text.” The term comes from the days when people marked up and edited paper manuscripts with a blue pencil. Markup languages are designed for the processing, definition and presentation of text. The language specifies codes for formatting, and those codes are called tags. Put another way, a markup language is a computer language that uses tags to define elements within a document. XML is “extensible” because custom tags can support a wide range of elements. Do tags and ODS Markup sound familiar? ODS MARKUP is the precursor to a number of SAS destinations. Tagsets are another type of template that enable you to create your own ODS Markup destinations. Many of the SAS tagsets and destinations we use today were developed as variations on a theme.

Introduction More on XML files SAS has a very informative document at http://support.sas.com/rnd/base/ods/templateFAQ/Template_xml.html#overview. SAS tip sheets are also available for both 9.3 and 9.4. The link above, which is included in the paper, leads to a great explanation of what XML files look like relative to SAS. This is a 20-minute presentation and I won’t have time to delve too far into this discussion, especially as SAS has done a much better job than I would. We will look at an XML file used for the examples of reading in XML below. Incidentally, Office Open XML (also informally known as OOXML or Microsoft Open XML (MOX)) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Starting with the 2007 Microsoft Office system, Microsoft Office uses the XML-based file formats. Some SAS files follow the same approach: EGP files are ALSO zipped XML (Troy Hughes has an amazing paper on parsing EGP files).

Introduction Basic XML Concepts
XML documents
Nodes
Relationships
DTD
XML maps or schemas
XSL style sheets
XML documents should be “well formed”; that is, they should follow the rules of XML syntax. All XML documents should have a “root” tag (this is the data set internal name), and every tag should be terminated with a closing tag. A DTD is a Document Type Definition and may or may not be included in an XML document. It functions like a codebook but does not specify all nodes, just levels. It can be part of an XML document or separate. The XML schema is more complete and accurately maps all nodes. Style sheets can be incorporated to customize XML output when rendered.
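To make the idea of a well-formed document concrete, here is a minimal sketch in SAS. It writes a tiny XML file with one root tag and matching closing tags, then reads it back with the XMLV2 LIBNAME engine without any map. The file name demo.xml, the tags PAPERS, PAPER, TITLE and YEAR, and the sample values are all invented for illustration; they are not from the Lex Jansen download used later in the talk.

```sas
/* Hypothetical file location: a scratch file in the WORK directory */
%let xmlpath = %sysfunc(pathname(work))/demo.xml;
filename demoxml "&xmlpath";

/* Write a small well-formed document: one root tag (PAPERS),  */
/* and every opening tag has a matching closing tag.           */
data _null_;
  file demoxml;
  put '<?xml version="1.0" encoding="UTF-8"?>';
  put '<PAPERS>';
  put '  <PAPER>';
  put '    <TITLE>First sample title</TITLE>';
  put '    <YEAR>2018</YEAR>';
  put '  </PAPER>';
  put '  <PAPER>';
  put '    <TITLE>Second sample title</TITLE>';
  put '    <YEAR>2019</YEAR>';
  put '  </PAPER>';
  put '</PAPERS>';
run;

/* Without a map, the engine treats the repeating second-level */
/* tag (PAPER) as a table and its child tags as columns.       */
libname demo xmlv2 "&xmlpath";

proc contents data=demo._all_;
run;

proc print data=demo.paper;
run;

libname demo clear;
```

A file this regular reads without a map because it is already rectangular. Deeper nesting is where a map or schema becomes necessary.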

Methods for Reading XML in IRL Example If you haven’t used Lex Jansen’s site yet you are in for a treat. Lex has assembled a vast quantity of SAS papers from many conferences, with a robust search routine. The word cloud is of authors and it is not a surprise that Art Carpenter and Kirk Lafler are front and center. I was gratified to make it on the cloud next to Maura Stokes. At my request, Lex added the ability to download the results of a search to XML or JSON (any other outputs you’d like to see? Let Lex know.)

Methods for Reading XML in IRL Example Download the results to XML

Methods for Reading XML in IRL Example This is what the output XML file looks like, and it will be the basis of our IN examples. XML documents are hierarchical and include “nodes”. All XML documents start with a root node, the first one shown above. Often there will be an informative node with some metadata (the second one shown above). Other nodes may be “nested” inside each other and repeated; a node called Paper is shown here.

Methods for Reading XML In Open Microsoft Excel and Open XML You will be prompted as to how you want to read in the file

Methods for Reading XML In Open Microsoft Excel and Open XML You might get some errors – can anyone guess why?

Methods for Reading XML In Open Microsoft Excel and Open XML Although there was an error, the read in was relatively smooth – there were fields split, etc. but you get the gist. It’s not good enough for production.

Methods for Reading XML In Read In Using Automap
Any given XML file may not fit into a rectangular construct, and you may get import errors as we saw above. Another method is to use an XML LIBNAME statement with the AUTOMAP= option.
  filename ajw '.\AlanWhite.xml';
  filename map '.\Map\AJW.map';
  libname ajw xmlv2 automap=replace xmlmap=map;
  proc contents data=ajw._all_;
  run;

Methods for Reading XML In Read In Using Automap This shows the auto-generated map. Note I had to submit with UTF-8 because of Unicode characters in the data. Since this data includes paper titles can anyone guess what causes this?
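If you are not sure which encoding your session is running under, a quick check before assigning the XML libref can save some head-scratching. The sketch below only reports the session encoding. The startup command in the comment is an assumption about how a UTF-8 session might be launched, since the slide does not say how it was done.

```sas
/* Report the encoding of the current SAS session in the log */
proc options option=encoding;
run;

/* The same information is available in an automatic macro variable */
%put NOTE: Session encoding is &sysencoding.;

/* ENCODING= is set when SAS starts, not mid-session.          */
/* Assumed example of a startup command: sas -encoding utf-8   */
```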

Methods for Reading XML In SAS XML Mapper You can also use SAS XML Mapper to explore XML files and if needed, generate a map or schema. XML mapper is a separately installed Java based tool available for both versions 9.3 and 9.4. It is available on installation packages and is free to download from the Base SAS Focus area on support.sas.com. Note your platform must have Java available in order for the tool to work. Here you see the exploration of an XML source file.

Methods for Reading XML In SAS XML Mapper Here you see the creation of an XML map or Schema through the tool.

Methods for Reading XML In Read In Using an Existing Map
  filename ajw '.\AlanWhite.xml';
  filename map '.\Map\AJW.map';
  libname ajw xmlv2 xmlmap=map;
Once a map is constructed, either by the automap option, XML Mapper or another method, XML documents can be read in using the above code.
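One possible next step, sketched here as a suggestion rather than something shown in the slides, is to copy the mapped tables into WORK right after assigning the libref. The XML engine reads the document sequentially, so working from ordinary SAS data sets tends to be more convenient for later steps. The paths and the AJW map are the ones from the slide; ACCESS=READONLY is added as a precaution.

```sas
/* Paths and map name are taken from the slide above */
filename ajw '.\AlanWhite.xml';
filename map '.\Map\AJW.map';
libname ajw xmlv2 xmlmap=map access=readonly;

/* Copy every table the map defines into WORK as ordinary SAS data sets */
proc copy in=ajw out=work;
run;

/* Release the XML libref once the copies exist */
libname ajw clear;
```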

Capitalizing on the “Extensible” Case Study: Nursing Home Compare Since 1998, the U.S. Centers for Medicare and Medicaid Services (CMS) has maintained a website, Nursing Home Compare, which provides detailed quality information about every certified nursing home in the country. In December 2008, CMS greatly enhanced the usability of the website by adding an easy-to-understand 5-star rating. Each nursing home receives one to five stars based on performance in each of three key quality domains (health inspections, reported staffing levels, and quality measures derived from mandated assessments of resident health and well-being) plus an overall quality rating. Calculation of ratings requires integration of information from both facility and resident-level data sources. SAS® is used extensively in analysis to support the development of the rating system, and it is currently used to process data to refresh the ratings each month, based on newly collected data in each domain. The data, a massive amount, is transferred to the vendor in XML format.

Capitalizing on the “Extensible” Concepts
Many files from many different sources go into the Nursing Home XML
Original files were at different levels, for example nursing home residents versus providers
Thousands of elements or nodes

Capitalizing on the “Extensible” XML Output
The simplest way to create XML is:
  ODS MARKUP BODY='TEST.XML';
  PROC PRINT DATA=TEST;
  RUN;
  ODS MARKUP CLOSE;
XML is the default output for ODS MARKUP. You can also specify a DTD (frame=

Methods for Reading XML in Creating Measures
Rather than having thousands of variables with different specifications, variable names are transformed into measure codes. Variable names are associated with values, footnotes, and occurrences. All VALUES are text.
  data msr_ownership (keep=provnum msr_cd value occurrence ftnt filedate);
    length provnum $ 6 msr_cd $ 20 value $ 120 ftnt $ 12 occurrence $ 3 filedate $ 8;
    set dd.owner_ocr;
    . . .
    msr_cd = 'ASSOCDATE';
    value = assoc_date_text;
    if value ne ' ' then output;
  run;

Methods for Reading XML in Creating Measures
The file at the provider level (provider will be the level of observation) is converted into a file with consistent naming conventions.
  proc sql;
    create table out.msr_Owners_&fileyear.&filedate. as
    select PROVNUM as PID
          ,MSR_CD as MCD
          ,occurrence as OCR
          ,VALUE as SV
          ,ftnt as FN
          ,"Text" as ST
    from msr_ownership
    order by PID, MCD, OCR;
  quit;

Methods for Reading XML in Creating Measures
  PID: 015009
  MCD: ASSOCDATE
  OCR: 1
  SV:  since 09/01/1969
  FN:  (blank)
  ST:  Text
As you can see, the values are transformed into a consistent pattern.
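The measure-code idea can be tried out on a toy table. The sketch below is not the production Nursing Home Compare code. It assumes a hypothetical provider-level data set called providers with two text variables, and it derives each measure code from the variable name, whereas the slide code assigns codes such as ASSOCDATE by hand. The names providers, owner_type and msr_long, the second provider row, and the owner_type values are invented for illustration.

```sas
/* Hypothetical wide input: one row per provider, one column per measure */
data providers;
  length provnum $ 6 assoc_date_text $ 120 owner_type $ 120;
  infile datalines dsd truncover;
  input provnum $ assoc_date_text $ owner_type $;
  datalines;
015009,since 09/01/1969,For profit
015010,,Non profit
;
run;

/* Turn each non-blank variable into one row: provider, measure code, value */
data msr_long (keep=provnum msr_cd occurrence value);
  length provnum $ 6 msr_cd $ 20 occurrence $ 3 value $ 120;
  set providers;
  array srcvars {*} assoc_date_text owner_type;
  occurrence = '1';                        /* single occurrence in this toy example */
  do i = 1 to dim(srcvars);
    msr_cd = upcase(vname(srcvars{i}));    /* measure code taken from the variable name */
    value  = srcvars{i};                   /* all values are kept as text */
    if value ne ' ' then output;           /* skip blank values, as in the slide code */
  end;
run;

proc print data=msr_long;
run;
```

Adding a new measure then means adding a variable to the array rather than changing the layout of the output file, which is the flexibility the long format is meant to provide.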

Capitalizing on the “Extensible” XML Output
As with reading XML in, the SAS XML LIBNAME engine and a schema or map are employed.
  filename mapt ".\map\MapTemplate.map";
  filename map ".\map\MapModified.map";
  libname temp1 xml92 xmltype=xmlmap xmlmap=map;
The map is modified slightly, and XML header information is created via hard code. This information is data driven and changes each month.

Capitalizing on the “Extensible” XML Output
As with reading XML in, the SAS XML LIBNAME engine and a schema or map are employed.
  filename map ".\map\MapModified.map";
  libname temp1 xml92 xmltype=xmlmap xmlmap=map;
  filename out ".\XML\&outnm.XMLOut.xml";
The map is modified slightly, and XML header information is created via hard code. This information is data driven and changes each month. XML nodes are created using the XML map, then appended to the header information to create well-formed XML.

Capitalizing on the “Extensible” XML Output Here are screen shots of snippets of the components of the completed XML output file. Here is the standard header generated by the XML engine.

Capitalizing on the “Extensible” XML Output This is a data driven header created at the request of the web site vendor.

Capitalizing on the “Extensible” XML Output This is the start to a provider record. There are thousands of “measure codes” so only one is shown here for demonstration purposes.

Conclusion SAS has provided many tools to both read XML into SAS and to output XML from SAS. The use of “measure code” transformation greatly extends the power and flexibility of XML generation from SAS. “Wit beyond measure is man's greatest treasure.” J.K. Rowling

Acknowledgements The author wishes to thank Chevell Parker of SAS, former colleague Fred Pratter, current colleague Nancy McGarry, Troy Martin Hughes and Lex Jansen for their mentoring and inspiration. Thanks also to the Nursing Home Compare project team for providing a welcome challenge to encourage finding new ways to improve our data processing and transfer, especially Christianna Williams of Abt Associates and Zach Sarver of CGI.

Contact Information Your comments and questions are valued and encouraged. Contact the author at: Louise S. Hadden Abt Associates Inc. 617-349-2385 Louise_hadden@abtassoc.com abtassociates.com   SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.