Moving from SAS to Stata: Making customized tables in RTF using -rtfutil- and other packages


Inge Christoffer Olsen, PhD
Diakonhjemmet Hospital, Norway
Thank you for your kind introduction, and the opportunity to give this talk.
Say a little about background: CRO, academia, SAS, Stata.

Background
Used SAS when working for a Norwegian CRO
Forced into SPSS when moving to Diakonhjemmet
Managed to force Stata onto the researchers at Diakonhjemmet
I will begin with a quote from the famous physicist Max Planck: "An experiment is a question which science poses to Nature, and a measurement is the recording of Nature's answer." Meaning we cannot understand Nature without measurements. We need to take care of our measurements!

Background
I love Stata!
Programming is so efficient, clear and easy.

Background
But…
Programming is so efficient, clear and easy.

Stata output
How do you get from here:

Final table
…to here?

Diagnosis             Male  Female  Total
Rheumatoid arthritis    21      57     78
Spondyloarthritis       76      15     91
Psoriatic arthritis     15      15     30
Ulcerative colitis      61      32     93
Crohn's disease         92      63    155
Psoriasis               30       5     35
Total                  295     187    482

Cut and paste 1: Text

    tabulate diagnosis sex if visit_no==1

                         |          Sex
               Diagnosis |      Male     Female |     Total
    ---------------------+----------------------+----------
    Rheumatoid arthritis |        21         57 |        78
       Spondyloarthritis |        76         15 |        91
     Psoriatic arthritis |        15         15 |        30
      Ulcerative colitis |        61         32 |        93
         Crohn's disease |        92         63 |       155
               Psoriasis |        30          5 |        35
    ---------------------+----------------------+----------
                   Total |       295        187 |       482

* Possible to remedy by using fixed-width fonts, but still only text

Cut and paste 2: HTML

Randomised and Sex: Yes
Diagnosis             Male  Female
Rheumatoid arthritis    21      57
Spondyloarthritis       76      15
Psoriatic arthritis     15      15
Ulcerative colitis      61      32
Crohn's disease         92      63
Psoriasis               30       5

* Sometimes OK, but unreliable

Cut and paste 3: Picture
Nice, but useless

Disappointed
This is the only major feature where Stata is inferior to SAS or SPSS
Forced to enter results manually
Error prone!

Reports
In SAS it was relatively easy to report tables from statistical analyses in RTF format using PROC REPORT

Characteristic   Statistic  Abatacept      Adalimumab     Anakinra       Certol. Pegol  Etanercept     Golimumab      Infliximab     Rituximab      Tocilizumab
                            N = 6          N = 133        N = 2          N = 190        N = 203        N = 327        N = 46         N = 42         N = 30
Age (Years) (a)  n          6              133            2              190            203            327            46             42             30
                 Mean       47.77          42.82          46.40          51.48          45.47          44.12          46.05          55.27          53.73
                 Std. Dev.  15.47          13.57          27.44          14.72          13.62          13.14          13.34          12.21          12.77
                 Median     51.60          41.20          46.40          53.75          45.00          42.90          43.60          54.95          54.30
                 Min/Max    26.5/64.1      19.0/74.7      27.0/65.8      18.7/82.5      19.3/76.3      17.7/80.5      25.5/79.0      30.7/78.0      29.8/73.7
Sex, n(%)        Male       1 (16.7)       64 (48.1)      0 (0.0)        44 (23.2)      80 (39.4)      151 (46.2)     20 (43.5)      6 (14.3)       4 (13.3)
                 Female     5 (83.3)       69 (51.9)      2 (100)        146 (76.8)     123 (60.6)     176 (53.8)     26 (56.5)      36 (85.7)      26 (86.7)

No. of prev. Biologics (b): 77 85 62 114 29 32 27 3.7 1.5 4.0 2.0 1.4 2.1 2.2 2.3 0.8 0.7 1.2 1.0 1.3 1.1 3.5 3/5 1/4 1/6 1/8 1/7 1/5
No. of prev. DMARDs: 129 172 181 280 40 36 28 2.7 3.0 1.9 2.5 1.6 0.9 2.8 0/4 0/6 0/5 0/3
Biologics Naive, n(%): Yes 52 (40.3) 87 (50.6) 119 (65.7) 166 (59.3) 11 (27.5) 4 (11.1) 1 (3.6); No 6 (100) 77 (59.7) 85 (49.4) 62 (34.3) 114 (40.7) 29 (72.5) 32 (88.9) 27 (96.4)

Reports
Most work a statistician or researcher does in Stata ends up in a report or article
Most reports or articles are presented in a document, usually Word (or PDF using LaTeX)
Unsatisfying to rely on manually entering tables from Stata output
Natively it is possible to export raw results to Excel, but this still requires a lot of manual work
I hate manual work!
There must be a way to produce Word/RTF tables

Solution
Fortunately there are many user-written programs for Stata (available through SSC)
I will present how RTF tables can be produced using the package -rtfutil-, in addition to results-compiling packages such as -(x)contract-, -(x)collapse-, and -parmest-
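The packages named in this talk are distributed through the SSC archive. As a minimal sketch, assuming an internet connection and that each SSC package name matches the command name used in the slides (-listtab-, -sdecode- and -xrewide- appear in the later steps), they could be installed like this:

```stata
* Install the community-contributed packages mentioned in the talk.
* Assumption: package names on SSC match the command names; verify with -ssc describe-.
foreach pkg in rtfutil listtab xcontract xcollapse parmest sdecode xrewide {
    ssc install `pkg', replace
}
```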

Aim
Typical Table 2 in an article

Variable              Treatment 1  Treatment 2  Difference (95% CI)
Contvar 1, mean (SD)  1.5 (5.54)   0.7 (3.94)   1.29 (-0.35 - 2.94)
Contvar 2, mean (SD)  1.6 (5.67)   0.7 (4.41)   1.09 (-0.68 - 2.86)
Contvar 3, mean (SD)  0.3 (1.01)   -0.2 (1.38)  -0.02 (-0.50 - 0.47)
Catvar 1, n(%)        10 (23.3)    7 (16.7)     6.6 (-10.3 - 23.5)
Contvar 4, mean (SD)  0.1 (0.59)   -0.2 (0.67)  -0.01 (-0.27 - 0.24)
Contvar 5, mean (SD)  0.1 (0.23)   0.0 (0.25)   0.08 (0.01 - 0.15)
Catvar 2, n(%)        29 (87.9)    39 (92.9)    -5 (-18.6 - 8.6)
Contvar 6, mean (SD)  0.1 (1.28)   -0.2 (1.68)  0.14 (-0.30 - 0.59)

Step 1
Use -xcollapse- by treatment to get the mean and SD
The resulting dataset 1 has two lines with mean and SD, one for each treatment
Restore the original dataset and run some regression analysis (e.g. by -mixed-)
Use -margins- to get the treatment difference
Use -parmest- to store the result as dataset 2
Append the second dataset to the first
Add a variable indicating the endpoint
Store in an endpoint-specific temporary dataset
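The sequence in Step 1 could look roughly like the sketch below for a single continuous endpoint. It is an illustration under stated assumptions, not the code used for the talk: -xcollapse- and -parmest- are assumed to be installed from SSC, the shipped auto data stand in for a trial dataset, -regress- stands in for -mixed-, and the names treatment, endpoint, desc, diff and result are hypothetical.

```stata
* Sketch of Step 1 for one continuous endpoint.
* Assumptions: -xcollapse- and -parmest- are installed from SSC; the auto data
* stand in for a trial dataset, and -regress- stands in for -mixed-.
sysuse auto, clear
generate byte treatment = foreign + 1        // hypothetical treatment arms 1 and 2
local endpoint mpg                            // hypothetical continuous endpoint

tempfile desc diff result

* Dataset 1: one row per treatment with mean and SD.
* With saving(), the original data are left in memory afterwards.
xcollapse (mean) mean=`endpoint' (sd) sd=`endpoint', by(treatment) saving("`desc'", replace)

* Dataset 2: treatment difference with 95% CI, captured by -parmest-.
regress `endpoint' i.treatment
margins, dydx(treatment) post
parmest, saving("`diff'", replace)

* Stack the difference (coded as treatment 3) with the descriptive rows.
use "`diff'", clear
keep if parm == "2.treatment"
generate byte treatment = 3
keep treatment estimate min95 max95
append using "`desc'"
generate str32 var = "`endpoint'"             // endpoint indicator
sort treatment
save "`result'"
list, noobs
```

The result has three rows per endpoint: treatments 1 and 2 carry mean and sd, and the row coded 3 carries estimate, min95 and max95, which is the layout shown in Step 2.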

Step 2
Repeat step 1 for each continuous endpoint
Combine all endpoint-specific datasets

var       varlab                treatment  estimate     max95       min95        mean        sd
contvar1  Contvar 1, mean (SD)  1          .            .           .            1.511628    5.543645
                                2          .            .           .            0.6666667   3.942679
                                3          1.2947915    2.9380777   -0.3484947   .           .
contvar2  Contvar 2, mean (SD)  1          .            .           .            1.564023    5.667449
                                2          .            .           .            0.6947619   4.412254
                                3          1.089702     2.8553689   -0.675965    .           .
contvar3  Contvar 3, mean (SD)  1          .            .           .            0.2507936   1.008937
                                2          .            .           .            -0.1547619  1.375607
                                3          -0.01812485  0.46805113  -0.50430084  .           .
contvar4  Contvar 4, mean (SD)  1          .            .           .            0.0650334   0.5879224
                                2          .            .           .            -0.1902657  0.667424
                                3          -0.01448155  0.23825443  -0.26721752  .           .

Step 3
Create a text variable using e.g. -sdecode- to make the results RTF-ready

var       varlab                treatment  estimate     max95       min95        mean        sd         text
contvar1  Contvar 1, mean (SD)  1          .            .           .            1.511628    5.543645   \qr{1.51 (5.544)}
                                2          .            .           .            0.6666667   3.942679   \qr{0.67 (3.943)}
                                3          1.2947915    2.9380777   -0.3484947   .           .          \qr{1.29 (-0.35 \endash 2.94)}
contvar2  Contvar 2, mean (SD)  1          .            .           .            1.564023    5.667449   \qr{1.56 (5.667)}
                                2          .            .           .            0.6947619   4.412254   \qr{0.69 (4.412)}
                                3          1.089702     2.8553689   -0.675965    .           .          \qr{1.09 (-0.68 \endash 2.86)}
contvar3  Contvar 3, mean (SD)  1          .            .           .            0.2507936   1.008937   \qr{0.25 (1.009)}
                                2          .            .           .            -0.1547619  1.375607   \qr{-0.15 (1.376)}
                                3          -0.01812485  0.46805113  -0.50430084  .           .          \qr{-0.02 (-0.50 \endash 0.47)}
contvar4  Contvar 4, mean (SD)  1          .            .           .            0.0650334   0.5879224  \qr{0.065 (0.5879)}
                                2          .            .           .            -0.1902657  0.667424   \qr{-0.190 (0.6674)}
                                3          -0.01448155  0.23825443  -0.26721752  .           .          \qr{-0.014 (-0.267 \endash 0.238)}

\qr means right align, \endash is an en dash
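To make the construction of the text variable concrete, here is a minimal sketch that reproduces the contvar1 cells from the slide. It uses Stata's built-in string() function as a stand-in for -sdecode-, so it runs without extra packages; the formats %9.2f and %9.3f are assumptions chosen to match the slide.

```stata
* Sketch of Step 3: build RTF-ready text cells for one endpoint.
* Assumption: built-in string() replaces -sdecode-; values are contvar1 from the slide.
clear
input byte treatment double(estimate min95 max95 mean sd)
1 .         .          .         1.511628  5.543645
2 .         .          .         0.6666667 3.942679
3 1.2947915 -0.3484947 2.9380777 .         .
end

generate str80 text = ""
* Treatment rows: mean (SD), right-aligned.
replace text = "\qr{" + string(mean, "%9.2f") + " (" + string(sd, "%9.3f") + ")}" if treatment < 3
* Difference row: estimate (lower limit, en dash, upper limit).
replace text = "\qr{" + string(estimate, "%9.2f") + " (" + string(min95, "%9.2f") ///
    + " \endash " + string(max95, "%9.2f") + ")}" if treatment == 3

list treatment text, noobs
```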

Step 4
Use -xrewide- (or reshape wide) to get one line per variable:

    xrewide text, i(var varlab) j(treatment)

var       varlab                text1                text2                 text3
contvar1  Contvar 1, mean (SD)  \qr{1.51 (5.544)}    \qr{0.67 (3.943)}     \qr{1.29 (-0.35 \endash 2.94)}
contvar2  Contvar 2, mean (SD)  \qr{1.56 (5.667)}    \qr{0.69 (4.412)}     \qr{1.09 (-0.68 \endash 2.86)}
contvar3  Contvar 3, mean (SD)  \qr{0.25 (1.009)}    \qr{-0.15 (1.376)}    \qr{-0.02 (-0.50 \endash 0.47)}
contvar4  Contvar 4, mean (SD)  \qr{0.065 (0.5879)}  \qr{-0.190 (0.6674)}  \qr{-0.014 (-0.267 \endash 0.238)}
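For readers without -xrewide-, the built-in alternative mentioned on the slide can be sketched as follows. The data are the contvar1 and contvar2 text cells from Step 3, typed in by hand for illustration.

```stata
* Sketch of Step 4 with built-in -reshape wide- instead of -xrewide-.
clear
input str8 var byte treatment str40 text
"contvar1" 1 "\qr{1.51 (5.544)}"
"contvar1" 2 "\qr{0.67 (3.943)}"
"contvar1" 3 "\qr{1.29 (-0.35 \endash 2.94)}"
"contvar2" 1 "\qr{1.56 (5.667)}"
"contvar2" 2 "\qr{0.69 (4.412)}"
"contvar2" 3 "\qr{1.09 (-0.68 \endash 2.86)}"
end

* One row per endpoint, with text1-text3 holding the three table cells.
reshape wide text, i(var) j(treatment)
list, noobs
```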

Step 5
Repeat steps 1 to 4 for categorical and/or time-to-event variables and compile to a final results dataset
Sort the dataset according to the sequence in which you want to present the data
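For a categorical endpoint, the n (%) cells could be built along the lines of the sketch below. It is illustrative only: the built-in -contract- stands in for -xcontract-, the auto data stand in for a trial dataset, and treatment, catvar, n_arm and pct are hypothetical names.

```stata
* Sketch: n (%) text cells for a binary categorical endpoint.
* Assumptions: built-in -contract- replaces -xcontract-; all names are hypothetical.
sysuse auto, clear
generate byte treatment = foreign + 1
generate byte catvar = rep78 >= 4 if !missing(rep78)   // stand-in binary endpoint

contract treatment catvar, nomiss                      // creates _freq per cell
bysort treatment: egen n_arm = total(_freq)            // denominator per arm
generate pct = 100 * _freq / n_arm
keep if catvar == 1                                    // report the "yes" category

generate str40 text = "\qr{" + string(_freq) + " (" + string(pct, "%9.1f") + ")}"
generate str32 var = "catvar"
list var treatment text, noobs
```

The resulting rows have the same var, treatment and text layout as the continuous endpoints, so they can be appended and reshaped in the same way.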

Step 6
Use the -rtfutil- package to write the results dataset to RTF:

    tempname handle2
    rtfopen `handle2' using "Output/Table 2.rtf", template(minimal) replace paper(a4land) landscape
    use work/total, clear
    * Table title
    file write `handle2' "{\pard\b Typical Table 2 in an article \par}" _n
    * Row style: column widths, stored in locals b (begin), d (delimiter), e (end)
    rtfrstyle varlab text1 text2 text3, cwidths(3500 2000 2000 3000) local(b d e)
    * Header row followed by one table row per observation
    listtab varlab text1 text2 text3, handle(`handle2') begin("`b'") delim("`d'") end("`e'") ///
        head("`b' \ql{\b Variable} `d' \qr{\b Treatment 1} `d' \qr{\b Treatment 2} `d' \qr{\b Difference (95% CI)} `e'")
    rtfclose `handle2'

Strengths
Produces ready-to-use tables to be inserted into an article or report
No manual work!
Possible to include .eps figures directly in the RTF document
Quick once you have the results datasets

Weaknesses
Quite a lot of programming (initially)
Can take a lot of tweaking to get the result exactly as you want
Inclusion of figures is not fully supported in RTF; you need to open the document in Word and include the figure files

Final organisation
Raw output from eCRF -> Imported into Stata -> Tabulation Datasets (TD) -> Analysis Datasets -> Results dataset -> RTF tables

Additional tips
The -project- module is fantastic for organizing and maintaining Stata projects
It uses checksums to keep track of unchanged files, so only recently changed do-files and the do-files that depend on them are run

Acknowledgements
The -rtfutil- module is written by Roger B. Newson
The -project- module is written by Robert Picard

The end
Thank you!