Download presentation
Presentation is loading. Please wait.
Published byΠαρασκευή Αλεξάκης Modified over 5 years ago
1
Louise S. Hadden, Lead Programmer Analyst, Abt Associates Inc.
Taking XML’s Measure: Using SAS® to Read In and Create XML for Analytic Use and Websites Presenter Louise S. Hadden, Lead Programmer Analyst, Abt Associates Inc. Louise Hadden has been using and loving SAS since the days of punch cards and computers the size of a tiny house. She spends most of her time in support of health policy analytics at Abt Associates Inc. and loves a good SAS reporting challenge. She is an ardent life long learner and reads voraciously, loves photography and volunteers at the MSPCA Boston Adoption Center walking, training and photographing dogs.
2
Taking XML’s Measure: Using SAS® to Read In and Create XML for Analytic Use and Websites
3
Introduction XML has become a standard over the years for populating websites and transferring information. This presentation demonstrates how to parse mystery XML files, read in XML files that you can’t right-click on, read into Microsoft Excel using SAS®, how to use maps and schemas to input and output various XML representations, and how to construct and output “measure code” data sets from input data to maximize the flexibility of XML data representation and usage.
4
Introduction XML HTML Markup Language used to build static web pages
HTML5 Latest version of HTML with support for multimedia XML Extensible markup language XHTML XML that mirrors HTML in syntax and adds hypertext capability DHTML Dynamic HTML KML ML to display geographic data HTML based on SGML - Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents – it has specifications that indicate how to present structured data DHTML is a combination of static code and scripting languages like JavaScript used to create interactive and animated websites KML is a tag-based structure based on an XML standard used with an “earth” browser. ip: To see the KML "code" for a feature in Google Earth, you can simply right-click the feature in the 3D Viewer of Google Earth and select Copy. Then Paste the contents of the clipboard into any text editor. The visual feature displayed in Google Earth is converted into its KML text equivalent. Be sure to experiment with this feature. What do ALL of these have in common? They are markup languages
5
Introduction Markup Languages Wikipedia tells us “Markup Language is a system for annotating a document in a way that is syntactically distinguishable from the text.” The origin of markup language is from when people marked up / edited paper manuscripts with a blue pencil. Markup languages are designed for the processing, definition and presentation of text. The language specifies code for formatting and the code used are called tags. / a computer language that uses tags to define elements with a document – XML is “extensible” because custom tags can support a wide range of elements. Do tags and ODS Markup sound familiar? ODS MARKUP is the precursor to a number of SAS destinations. Tagsets are another type of template that enable you to create your own ODS Markup destinations. Many of the SAS tagsets and destinations we use today were developed as variations on a theme
6
Introduction More on XML files
SAS has a very informative document at late_xml.html#overview SAS Tip sheets are also available for both 9.3 and 9.4. At the link above, included in the paper, is a great explanation of what XML files look like relative to SAS. This is a 20 minute presentation and I won’t have time to delve too far into this discussion, especially as SAS has done a much better job than I would. We will look at an XML file used for the examples of reading in XML below. Incidentally, Office Open XML (also informally known as OOXML or Microsoft Open XML (MOX)) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Starting with the Microsoft Office system, Microsoft Office uses the XML-based file formats. SAS files also follow this format – EGP files are ALSO zipped XML (Troy Hughes has an amazing paper on parsing EGP files.)
7
Introduction Basic XML Concepts XML documents DTD XML maps or schemas
Nodes Relationships DTD XML maps or schemas XSL styles sheets XML documents should be “well formed”; that is, it should match the rules for XML. All XML documents should have a “root” tag (this is the data set internal name) and all tags should be terminated with a closing tag. DTD is a Document Type Definition and may or may not be included in an XML document – it functions like a codebook but does not specify all nodes, just levels. It can be part of an XML document or separate. The XML schema is more complete and accurately maps all nodes. Style sheets can be incorporated to customize XML output when rendered.
8
Methods for Reading XML in
IRL Example If you haven’t used Lex Jansen’s site yet you are in for a treat. Lex has assembled a vast quantity of SAS papers from many conferences, with a robust search routine. The word cloud is of authors and it is not a surprise that Art Carpenter and Kirk Lafler are front and center. I was gratified to make it on the cloud next to Maura Stokes. At my request, Lex added the ability to download the results of a search to XML or JSON (any other outputs you’d like to see? Let Lex know.)
9
Methods for Reading XML in
IRL Example Download the results to XML
10
Methods for Reading XML in
IRL Example This is what the output XML file looks like, and it will be the basis of our IN examples. XML documents are hierarchical, and include “nodes”. All XML documents start with a root node, the first one shown above. Often there will be an informative node with some metadata (the second one shown above.) Other nodes may be “nested” inside each other in a loop – a node called Paper is shown here.
11
Methods for Reading XML In
Open Microsoft Excel and Open XML You will be prompted as to how you want to read in the file
12
Methods for Reading XML In
Open Microsoft Excel and Open XML You might get some errors – can anyone guess why?
13
Methods for Reading XML In
Open Microsoft Excel and Open XML Although there was an error, the read in was relatively smooth – there were fields split, etc. but you get the gist. It’s not good enough for production.
14
Methods for Reading XML In
Read In Using Automap filename ajw '.\AlanWhite.xml'; filename map '.\Map\AJW.map'; libname ajw xmlv2 automap=replace xmlmap=map; proc contents data=ajw._all_ ; run; Any given XML file may not fit into a rectangular construct and/or you may get import errors as we saw below. Another method is to use an XML libname and the automap statement.
15
Methods for Reading XML In
Read In Using Automap This shows the auto-generated map. Note I had to submit with UTF-8 because of Unicode characters in the data. Since this data includes paper titles can anyone guess what causes this?
16
Methods for Reading XML In
SAS XML Mapper You can also use SAS XML Mapper to explore XML files and if needed, generate a map or schema. XML mapper is a separately installed Java based tool available for both versions 9.3 and 9.4. It is available on installation packages and is free to download from the Base SAS Focus area on support.sas.com. Note your platform must have Java available in order for the tool to work. Here you see the exploration of an XML source file.
17
Methods for Reading XML In
SAS XML Mapper Here you see the creation of an XML map or Schema through the tool.
18
Methods for Reading XML In
Read In Using an Existing Map filename ajw '.\AlanWhite.xml'; filename map '.\Map\AJW.map'; libname ajw xmlv2 xmlmap=map; Once a map is constructed, either by the automap option, XML mapper or by another method, XML documents can be read in using the above code.
19
Capitalizing on the “Extensible”
Case Study: Nursing Home Compare Since 1998, the U.S. Centers for Medicare and Medicaid Services (CMS) has maintained a website, Nursing Home Compare, which provides detailed quality information about every certified nursing home in the country. In December 2008, CMS greatly enhanced the usability of the website by adding an easy-to-understand 5-star rating. Each nursing home receives one to five stars based on performance in each of three key quality domains (health inspections, reported staffing levels, and quality measures derived from mandated assessments of resident health and well-being) plus an overall quality rating. Calculation of ratings requires integration of information from both facility and resident-level data sources. SAS® is used extensively in analysis to support the development of the rating system, and it is currently used to process data to refresh the ratings each month, based on newly collected data in each domain. The data, a massive amount, is transferred to the vendor in XML format.
20
Capitalizing on the “Extensible”
Concepts Many files from many different sources go into the Nursing Home XML Original files were at different levels, for example nursing home residents versus providers Thousands of elements or nodes
21
Capitalizing on the “Extensible”
XML Output ODS MARKUP BODY=TEST.XML; PROC PRINT DATA=TEST; RUN: ODS MARKUP CLOSE; The simplest way to create XML is: XML is the default output for ods markup. You can also specify a DTD (frame=
22
Methods for Reading XML in
Creating Measures data msr_ownership (keep=provnum msr_cd value occurrence ftnt filedate); length provnum $ 6 msr_cd $ 20 value $ 120 ftnt $ 12 occurrence $ 3 filedate $ 8; set dd.owner_ocr; msr_cd = 'ASSOCDATE'; value = assoc_date_text; if value ne ' ' then output ; Run; Rather than having thousands of variables with different specifications, variable names are transformed into measure codes. Variable names are associated with values, footnotes, and occurrences. All VALUES are text.
23
Methods for Reading XML in
Creating Measures proc sql ; create table out.msr_Owners_&fileyear.&filedate. as select PROVNUM as PID ,MSR_CD as MCD ,occurrence as OCR ,VALUE as SV ,ftnt as FN ,"Text" as ST from msr_ownership order by PID, MCD, OCR; quit; The file at the provider level (provider will be the level of observation) is converted into a file with consistent naming conventions.
24
Methods for Reading XML in
Creating Measures PID MCD ASSOCDATE OCR 1 SV since 09/01/1969 FN ST Text As you can see, the values are transformed into a consistent pattern.
25
Capitalizing on the “Extensible”
XML Output As with reading XML output in, the SAS XML libname engine and a schema or map are employed. filename mapt ".\map\MapTemplate.map"; filename map ".\map\MapModified.map"; libname temp1 xml92 xmltype=xmlmap xmlmap= map; The map is modified slightly, and XML header information is created via hard code. This information is data driven and changes each month.
26
Capitalizing on the “Extensible”
XML Output As with reading XML output in, the SAS XML libname engine and a schema or map are employed. filename map ".\map\MapModified.map"; libname temp1 xml92 xmltype=xmlmap xmlmap= map; filename out ".\XML\&outnm.XMLOut.xml"; The map is modified slightly, and XML header information is created via hard code. This information is data driven and changes each month. XML nodes are created using the XML map, then appended to the header information to create well-formed XML.
27
Capitalizing on the “Extensible”
XML Output Here are screen shots of snippets of the components of the completed XML output file. Here is the standard header generated by the XML engine.
28
Capitalizing on the “Extensible”
XML Output This is a data driven header created at the request of the web site vendor.
29
Capitalizing on the “Extensible”
XML Output This is the start to a provider record. There are thousands of “measure codes” so only one is shown here for demonstration purposes.
30
Conclusion SAS has provided many tools to both read XML into SAS and to output XML from SAS. The use of “measure code” transformation greatly extends the power and flexibility of XML generation from SAS. “Wit beyond measure is man's greatest treasure.” J.K. Rowling
31
Acknowledgements The author wishes to thank Chevell Parker of SAS, a former colleague Fred Pratter, a current colleague Nancy McGarry, Troy Martin Hughes and Lex Jansen for mentoring and inspiring me. Thanks also to the Nursing Home Compare project team for providing a welcome challenge to encourage finding new ways to improve our data processing and transfer, especially Christianna Williams of Abt Associates and Zach Sarver of CGI.
32
Contact Information Your comments and questions are valued and encouraged. Contact the author at: Louise S. Hadden Abt Associates Inc. abtassociates.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.