The Continuing Evolution of Generalized Systems at Statistics Canada for Business Survey Processing Chris Mohl Statistics Canada
Outline Why Generalize? Factors Influencing the Evolution The Systems Development, Support and Maintenance Lessons Learned Possible Future Activities Conclusions
Why Generalize Systems? Fully researched methods Thoroughly tested Complete documentation Expert support team Minimal user programming required – improves timeliness Coherent methods across surveys
Factors Influencing the Evolution Changes in technology Mainframe to PC/UNIX processing Some underlying software no longer supported Statistics Canada's SAS site license Need for new or more sophisticated methods
The Systems Can be classified into three groupings Mature Systems No new development Redesign Systems Reengineering of old systems New Development Systems New methodologies
Mature Systems The longest surviving generalized systems No new functionality being added – only maintenance SAS macros Interface built with SAS/AF Can be run in batch mode (macro call within SAS program) or via interface PC or UNIX
Mature Systems Generalized Sampling (GSAM) Performs functions related to sample selection for ongoing and ad hoc surveys Stratification, Allocation, Sampling, Frame Maintenance Generalized Estimation System (GES) Performs functions related to weighting and estimation One-stage element and cluster, two-phase element designs Mostly design based, some synthetic, jackknife
Example of GES Interface Screen
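The allocation and estimation functions listed for GSAM and GES can be illustrated with a minimal Python sketch. It is not GSAM or GES code (those are SAS macros). It assumes Neyman allocation and a simple stratified expansion estimator as stand-ins for the methods the systems offer, and the function names and stratum values are hypothetical.

```python
def neyman_allocation(strata, n_total):
    """Split a total sample size across strata in proportion to N_h * S_h.

    strata maps a stratum name to (N_h, S_h): population size and
    standard deviation. Returns unrounded sample sizes.
    """
    weights = {h: size * sd for h, (size, sd) in strata.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one stratum needs N_h * S_h > 0")
    return {h: n_total * w / total for h, w in weights.items()}


def stratified_total(samples, pop_sizes):
    """Design-based expansion estimate of a total: sum over h of N_h * mean(y_h)."""
    return sum(pop_sizes[h] * sum(y) / len(y) for h, y in samples.items())


# Hypothetical two-stratum business frame.
strata = {"small": (100, 1.0), "large": (50, 4.0)}
allocation = neyman_allocation(strata, n_total=30)

samples = {"small": [2.0, 4.0], "large": [10.0, 20.0, 30.0]}
estimate = stratified_total(samples, {"small": 100, "large": 50})
```

The more variable "large" stratum receives the larger share of the sample even though it contains fewer units, which is the usual motivation for this kind of allocation in skewed business populations.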
Redesigned Systems Generalized systems previously existed that performed similar functions but needed replacement Why? Often due to outdated architecture – mainframe, obsolete software New capabilities in SAS New methodologies couldn't be integrated into previous system
Redesigned Systems Banff (replaces Oracle based GEIS) Performs edit and imputation of numeric continuous data Nine custom built SAS procedures SAS Enterprise Guide based interface (Banff wizards)
Example of Banff SAS Procedure
Example of Banff Wizard
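To make "edit and imputation of numeric continuous data" concrete, here is a small Python sketch. It does not reproduce Banff's SAS procedures or their syntax. The edit rule, the variable names and the choice of ratio imputation are assumptions made only for illustration.

```python
def fails_edits(record):
    """Return True when a record violates the (hypothetical) edit rules.

    Rules assumed here: revenue and employees must be reported,
    both must be non-negative, and a business with revenue must
    report at least one employee.
    """
    revenue, employees = record["revenue"], record["employees"]
    if revenue is None or employees is None:
        return True
    if revenue < 0 or employees < 0:
        return True
    return revenue > 0 and employees == 0


def ratio_impute(records, target, aux):
    """Fill missing target values with (sum of target / sum of aux) * aux.

    The ratio is computed from records where the target is reported.
    """
    donors = [r for r in records if r[target] is not None]
    denominator = sum(r[aux] for r in donors)
    if denominator == 0:
        raise ValueError("cannot compute ratio: auxiliary total is zero")
    ratio = sum(r[target] for r in donors) / denominator
    return [
        dict(r, **{target: ratio * r[aux]}) if r[target] is None else dict(r)
        for r in records
    ]


records = [
    {"employees": 10, "revenue": 100.0},
    {"employees": 20, "revenue": 200.0},
    {"employees": 5, "revenue": None},  # fails the edits: revenue missing
]
flags = [fails_edits(r) for r in records]
imputed = ratio_impute(records, target="revenue", aux="employees")
```

The two steps are kept separate on purpose: the edit step only flags records, and the imputation step only fills values, so either can be replaced without touching the other.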
Redesigned Systems New CONFID Performs protection of tabular economic data SAS-based custom built procedures (like Banff) and macros for PC and UNIX Jasper (replacement for ACTR) Performs automated coding of character strings Retains interface-based processing, but may later build SAS-based custom built procedures
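As an illustration of what protecting tabular economic data can involve, the sketch below flags sensitive table cells in Python. The slide does not say which sensitivity rules New CONFID applies. The minimum-contributor threshold and the (n, k) dominance rule used here are common disclosure-control rules chosen as assumptions, and the parameter values are hypothetical.

```python
def is_sensitive(contributions, n=1, k=0.8, min_count=3):
    """Flag a table cell as sensitive under two assumed rules.

    - Threshold rule: fewer than min_count contributors.
    - Dominance rule: the n largest contributors account for
      more than a share k of the cell total.
    """
    if len(contributions) < min_count:
        return True
    total = sum(contributions)
    if total <= 0:
        return False
    top = sum(sorted(contributions, reverse=True)[:n])
    return top / total > k


# Hypothetical cells: each list holds the contributions of the
# businesses that make up one published cell.
cells = {
    "dominated": [90.0, 5.0, 5.0],
    "balanced": [40.0, 30.0, 30.0],
    "too_few": [50.0, 50.0],
}
flags = {name: is_sensitive(values) for name, values in cells.items()}
```

Flagging is only the first step. A complete procedure would also choose complementary cells to suppress so that the flagged values cannot be recovered from row and column totals.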
New Development Systems Fills in needs for functionality not already available in other generalized systems Replaces customized programs that may already exist
New Development Systems Statistical Macro Extensions (StatMx) New functionality not available in GES / GSAM Multi-stage design estimation, Lavallée-Hidiroglou allocation, extended synthetic estimation SAS macros, no interface Forillon Time Series processing Benchmarking sub-annual series, Raking to retain additivity, trend computations, variance calculations, analytical tools SAS-based procedures and Enterprise Guide "interface"
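"Raking to retain additivity" can be sketched as iterative proportional fitting: a table of series values is rescaled, rows then columns, until both sets of totals match their targets. This Python sketch is an assumption about the general technique, not Forillon's SAS implementation, and the function name, tolerance and example values are hypothetical.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a table of positive values so rows and columns sum to targets.

    Alternates row and column scaling (iterative proportional fitting).
    The targets must share the same grand total for convergence.
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Row step: scale each row to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            if row_sum > 0:
                cells[i] = [v * target / row_sum for v in cells[i]]
        # Column step: scale each column to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                for row in cells:
                    row[j] *= target / col_sum
        # Columns now match; stop once the rows also match.
        if all(abs(sum(cells[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return cells


# Hypothetical 2 x 2 table, e.g. two regions by two industries.
raked = rake([[1.0, 2.0], [3.0, 4.0]], row_targets=[40.0, 60.0], col_targets=[35.0, 65.0])
row_sums = [sum(row) for row in raked]
col_sums = [sum(row[j] for row in raked) for j in range(2)]
```

After raking, the detailed cells add up to both sets of published totals, which is the additivity the slide refers to.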
Development, Support and Maintenance Most systems developed and maintained by teams of individuals from two groups Mathematical statisticians (Methodology Branch) Programmers (Informatics Branch) Certain projects are the sole responsibility of one group Moving away from such situations
Development Methodologists review mathematical needs Consultation with potential users, literature searches, research into mathematical methods Programmers review informatics needs Methodologists write specifications Programmers produce new version Methodologists do final certification Documentation is written
Support Team members not directly responsible to implement the systems – assist users Mathematical questions go to methodologists, informatics questions to programmers Amount of support depends upon number of users, complexity of the methods, newness of the system
Maintenance May consist of bug fixes or adding new functionality May be identified by the users or by team members Team members work together to identify if it merits attention and then implement and certify the change
Costs Generalized systems require a very significant outlay of resources Varies significantly from project to project Development of a large project 2-3 methodologists, 2-3 programmers over several years Support and maintenance 1 methodologist, 1 programmer per year
Lessons Learned Reduce Software Diversity Emphasis put on SAS, reduce reliance on different programming languages Easier to move people from one project to another Users only need to know one language Learning SAS is part of staff's early training
Lessons Learned Traditional interfaces are expensive – there are alternatives Interface development can cost as much as the mathematical functionality Changes can be difficult Often does not upgrade as well as rest of the system Most users prefer batch processing for production Can be necessary when tool is used by non-technical personnel SAS Enterprise Guide being successfully used
Lessons Learned People like things they are familiar with Customized SAS procedures (Banff, Forillon) have been favorably received Centralization of resources is beneficial People can take ideas used in one project and apply them to others Examples: Enterprise Guide interfaces, Customized SAS procedures
Lessons Learned Modularity and flexibility are important Some early systems too rigid – successful ones had more flexibility Users only want pieces of certain systems Reduce custom-built systems, put in generalized systems People often borrow other programs and don't understand all the implications Support is a problem when a person leaves the project However, timing sometimes makes it necessary
Lessons Learned Buy when possible, but don't get cornered No need to build certain components, e.g. a linear programming function Ensure that changing to an alternate component is not difficult Make sure that the support is there Stay up to date on technology Don't wait too long to react to advances E.g. mainframe to PC in the 1990s, Linux
Possible Future Activities Current Systems Banff – categorical data capabilities New CONFID – add additional functionality Jasper – review of methodology used Forillon – add additional functionality StatMx – advanced variance calculations?
Possible Future Activities General avenues Continue movement towards SAS based procedures and Enterprise Guide interfaces Buy components when possible – free up programming resources for specialized tasks Metadata table-based processor
Conclusions Generalized Systems have become a critical part of business survey processing Due to the investments made in development we have to keep them relevant Moving towards a more standardized look and feel Use what we have learned in the past to help shape the future
For more information, please contact Chris Mohl