XML on a Shoestring: The Automation of the U.S. Budget Appendix Volume
12/2/2018

Budget Documents
- Budget Volume
- Analytical Perspectives
- Historical Tables
- Appendix

Structure of Appendix
- 34 Chapters – Major Departments, Large Independent Agencies
- Chapter contents contain:
  - Numeric data – schedules by account
  - Textual data
    - Appropriation Language
    - Narrative

Manual System for Updating Appendix – A Simplified View

Sample Hard Copy Markup

Another example

Why Change?
- OMB had promised to automate text as far back as 1979
- Manual process is slow and prone to error
  - Physical paper moves back and forth
  - Handwritten, pencil changes
  - Possibility for printer’s errors
- Limited opportunity to see updated text

Obstacles and Constraints
- Limited funding – project used small amount left over from previous initiative
- Compatibility with existing process
  - Agencies wanted to use existing MAX A-11 data entry application for entering numbers
  - Final output had to be in GPO format
  - OMB staff familiar with manual process
- Risk – publication of Budget is high profile
  - Management supportive but concerned with failure
- Limited development time – window is 10 months between budget seasons

History of Project
- 2002 – Pilot (Dept. of Interior chapter)
  - Basic technology identified and developed
- 2003 to 2004 – Three more chapters
- 2005 – Five more chapters

Objectives
- Eliminate problems of manual process
- Synthesize with existing processes
- Preserve format through update cycle – final output must match format of other chapters
- Incorporate work flow process and change tracking for updating appropriation language
- Set stage for further use of XML

Why XML?
- Favorable experiences
  - Introduced to new technology at IBM seminar
  - Advice was to start small and get acquainted with technology
  - Attended seminars held by House of Representatives Clerk’s Office
  - Success with Budget Volume
- Widespread and growing interest in XML among all parts of federal government
- Desire for practical knowledge of technology

Why XML?
- XML advantages
  - Text could be easily distributed for update and then re-aggregated
  - No need to assemble text in different formats
  - Output could be produced in a variety of formats – PDF for agencies and GPO code for final publication
  - User update could be better controlled
  - Appropriation language could be tagged using House of Representatives Bill DTD

Automated Appendix Text

Converting from GPO to XML
- Documents in GPO format were unstructured
- Program that converted GPO to XML had to associate text following format codes with chapter content and structure
- Example: GPO S3616 format locator code “bell” I09
  - GPO – print text in bold small caps
  - Conversion – tag text as account title
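
The conversion idea above can be sketched roughly as follows. This is a minimal illustration, not the project's converter: it assumes a locator code is the bell character followed by one letter and two digits, and only the I09 to account-title pairing comes from the slide. The element names, the `I11` entry, and the function name `gpo_to_xml` are hypothetical.

```python
import re
from xml.sax.saxutils import escape

BELL = "\x07"  # the "bell" control character that introduces a locator code

# Hypothetical mapping from locator codes to XML element names.
# Only I09 -> account title is taken from the slide; the element
# names and the I11 entry are invented for illustration.
LOCATOR_TO_TAG = {
    "I09": "account-title",
    "I11": "narrative",
}

# Assumed locator shape: bell, one capital letter, two digits.
LOCATOR_RE = re.compile(BELL + r"([A-Z]\d{2})")


def gpo_to_xml(gpo_text: str) -> str:
    """Wrap the text that follows each locator code in a structural tag."""
    # With one capture group, split() returns
    # [leading text, code1, text1, code2, text2, ...].
    parts = LOCATOR_RE.split(gpo_text)
    elements = []
    for code, text in zip(parts[1::2], parts[2::2]):
        tag = LOCATOR_TO_TAG.get(code, "unknown")  # flag unmapped codes
        elements.append(f"<{tag}>{escape(text.strip())}</{tag}>")
    return "<account>" + "".join(elements) + "</account>"
```

Calling `gpo_to_xml("\x07I09Salaries and Expenses")` wraps the title text in an `account-title` element; any code missing from the table is tagged `unknown` so it can be reviewed rather than silently dropped.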

Storing XML in Database
- XML document is hierarchical and database tables are flat
- Conversion program had to:
  - Associate each section of text with an “account” – an existing database record created for number updates
  - Create special text “accounts” for sections of text not associated with numbers (e.g., introductory text)
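
One way to picture the flattening step is the sketch below, which uses SQLite and the Python standard library purely for illustration. The table layout, the element names, the `ACCT-0001` identifier, and the `TEXT-nnn` naming for text-only “accounts” are all assumptions, not the actual MAX A-11 schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical chapter fragment; element names and ids are invented.
CHAPTER_XML = """
<chapter>
  <intro>Introductory text for the chapter.</intro>
  <account id="ACCT-0001">
    <account-title>Salaries and Expenses</account-title>
    <narrative>This account funds program operations.</narrative>
  </account>
</chapter>
"""


def load_chapter(conn: sqlite3.Connection, xml_text: str) -> None:
    """Flatten a hierarchical chapter into one row per section of text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS account_text "
        "(account_id TEXT, seq INTEGER, kind TEXT, body TEXT)"
    )
    root = ET.fromstring(xml_text)
    seq = 0          # preserves document order for later re-assembly
    text_only = 0    # counter for special text-only "accounts"
    for child in root:
        if child.tag == "account":
            # Text tied to numbers reuses the existing account key.
            account_id = child.get("id")
            sections = list(child)
        else:
            # Text with no numbers gets a special text "account".
            text_only += 1
            account_id = f"TEXT-{text_only:03d}"
            sections = [child]
        for section in sections:
            seq += 1
            conn.execute(
                "INSERT INTO account_text VALUES (?, ?, ?, ?)",
                (account_id, seq, section.tag, (section.text or "").strip()),
            )
    conn.commit()
```

Keeping a sequence number on every row is what allows the flat rows to be re-aggregated into the original document order.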

Updating the Text
- Extended MAX A-11 Data Entry application to handle text
  - Application previously only allowed update of numbers
- Program was adapted to:
  - Utilize existing user access control mechanisms
  - Allow updating of text via special custom-coded text editor embedded into application

Updating Text – User Interface

Updating Text – Why custom-coded text editor?
- Agencies were familiar with MAX A-11 program
- Limited funding for Web-based COTS tool
- Security concerns and limited funding for thick-client COTS workstation software
- Change tracking capabilities in COTS tools were limited at time of pilot
- Reduced variability – users did not have to use new application

Work Flow
- Text is saved at each stage of process
- Appropriation language has separate work flow requirements:
  - Congress requires proposed language to show additions and deletions
  - Language must be vetted by general counsels in agencies and at OMB
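
Showing additions and deletions amounts to a word-level comparison of two saved versions of the language. The sketch below uses Python's standard `difflib` to illustrate the idea; it is not the custom editor's change tracking, and the markers (square brackets for deletions, `<i>` tags for additions) and the function name `mark_changes` are choices made for this example.

```python
import difflib


def mark_changes(enacted: str, proposed: str) -> str:
    """Return the proposed language with deletions and additions marked."""
    old_words = enacted.split()
    new_words = proposed.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append(" ".join(old_words[i1:i2]))
        if op in ("delete", "replace"):
            # Words dropped from the enacted text go in brackets.
            out.append("[" + " ".join(old_words[i1:i2]) + "]")
        if op in ("insert", "replace"):
            # Words new in the proposed text are tagged as additions.
            out.append("<i>" + " ".join(new_words[j1:j2]) + "</i>")
    return " ".join(out)
```

For example, comparing `"For necessary expenses, $100,000"` with `"For necessary expenses, $120,000"` yields `For necessary expenses, [$100,000] <i>$120,000</i>`.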

Publication
- Publication greatly simplified because we chose XML
- Available tools (open source and commercial) obviated the need for special programming to extract and format text
- Agencies could view their chapters “on demand” – XML stylesheets can now produce document in PDF format
- Merging of text and numbers easier
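
The benefit of a single XML source feeding several outputs can be illustrated with the sketch below. The project used XML stylesheets for this; the example swaps in two plain Python functions so that it stays self-contained. The element names, the `I11` locator code, and both function names are hypothetical, and the review rendering is plain text rather than PDF.

```python
import xml.etree.ElementTree as ET

BELL = "\x07"  # control character that introduces a GPO locator code

# Hypothetical reverse mapping; only I09 for account titles is from the slides.
TAG_TO_LOCATOR = {"account-title": "I09", "narrative": "I11"}

ACCOUNT_XML = (
    "<account>"
    "<account-title>Salaries and Expenses</account-title>"
    "<narrative>For necessary expenses.</narrative>"
    "</account>"
)


def to_review_text(xml_text: str) -> str:
    """Render a plain-text view an agency could read on demand."""
    lines = []
    for element in ET.fromstring(xml_text):
        text = (element.text or "").strip()
        # Upper case stands in for the bold small caps of the printed title.
        lines.append(text.upper() if element.tag == "account-title" else text)
    return "\n".join(lines)


def to_gpo(xml_text: str) -> str:
    """Render the same source as locator-coded text for final publication."""
    return "".join(
        BELL + TAG_TO_LOCATOR[element.tag] + (element.text or "").strip()
        for element in ET.fromstring(xml_text)
    )
```

Both functions read the same `ACCOUNT_XML`, so a correction made once in the source appears in every output format.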

Publication
- Why continue to output in GPO format?
  - Need to maintain consistency with manually prepared chapters
  - Format is still used by Appropriations committees to mark up bills
  - Some formatting still not possible in PDF version

Technical Analysis and Recommendations
- Conducted by X-Systems, Inc. in 2005
- Recommendations – reduce “error creep”
  - Redesign and upgrade GPO-to-XML conversion program
  - Implement XML-aware error correction utility before database load
  - Redesign and upgrade the database load program
  - Upgrade the MAX A-11-to-database update program to include more rigorous error correction
  - Minimize the practice of creating duplicate outputs
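
A first step toward an XML-aware check before the database load could look like the sketch below. It only detects malformed XML and reports where the parser stopped; it does not correct anything, and it is not the utility the analysis recommended. The function name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if the text parses as XML, else a short error report."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError carries the (line, column) where parsing failed.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None
```

Running this on each converted chapter and refusing to load any file that returns a report would stop malformed text from reaching the database.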

Future Plans
- 13 additional chapters this year (2008 Budget)
- Remaining chapters following year for 2009 Budget

Continuing Issues
- Controlling user update
- Late reception of enacted appropriation language
  - Many bills not passed until December
- Converting manually prepared chapters
  - Structure of each chapter is idiosyncratic
- Adaptation to new work flow process at OMB
  - Progress could not be monitored the same way as old process

Lessons Learned
- Plan small – adapting to a new technology is not accomplished overnight
- Utilize expertise from multiple outside sources
- Be flexible – DTDs should not be written in stone
- Get something in users’ hands early – you will get valuable feedback
- XML is a valuable, low-cost tool – no need for extensive funding or management buy-in

Link for Budget Documents
http://www.whitehouse.gov/omb/budget/fy2007/