Professor William Browne School of Veterinary Science and Centre for Multilevel Modelling Statistical Software at the Centre for Multilevel Modelling (In.

Presentation transcript:

Professor William Browne School of Veterinary Science and Centre for Multilevel Modelling Statistical Software at the Centre for Multilevel Modelling (In part a tribute to the work of Jon Rasbash)

Acknowledgements: Colleagues at CMM past and present (Harvey, Fiona, Min, Jon, Pan, Chris, George, Becky, Sophie, Emma, Hilary, Amy & Lisa, Mike, Geoff, Ian, Mary, Paul, Toni, Mousa, Camille, Richard, Mark, Kelvyn and Liz). Collaborators on the e-STAT project (Dave, Luc, Danius, Paul, Ian, Mark, Mac). Coders of STAT-JR (Jon, Chris, Danius, Camille, Toni and Bruce). ESRC for funding so much of my work!

Summary: the distant past; the past (MLwiN); the move (from London to Bristol); the present (MLPowSim / REALCOM); the future (STAT-JR).

So where did it all begin? Q: How does one start writing a statistics package? A: Build it on top of an existing software package! The NANOStat package, developed in 1981 by Prof. Mike Healy, was a Minitab clone written in RATFOR (Fortran) as something to do with his new computer! Mike worked at LSHTM but also at Rothamsted with Fisher! He recounted writing programs to invert matrices.

NANOStat (1981): Data were represented as a set of columns. Commands took columns, numbers and boxes as arguments. Commands could be strung together so that output from one command acts as input to another (this became the MLwiN macro language). The software consisted of functions to create columns and several hundred commands. It still sits under MLwiN today!

ML2, ML3 and MLN: Jon worked for Mike at LSHTM and then transferred to Harvey at the Institute of Education. Harvey wrote his IGLS paper in 1986. ML2 came out in 1988 (written in Fortran). ML3 followed in 1990 (converted to C code). MLN followed in 1995 (with new N level algorithms written in C++). The algorithms were very fast for hierarchical models, with good matrix routines for block diagonal structures. Bob Prosser completed the team.

Moving towards MLwiN - Jon: MLwiN was another step change, this time to a Windows based program. ML2/3/N worked in a sequential way, with each command performing an action and then the next, and so on. Graphics were limited. MLwiN in contrast consists of a GUI front-end (written in Visual Basic) and all the objects are dynamic, i.e. changing one window should change the contents of other windows. Jon worked with Bruce Cameron setting up this architecture and Bruce is still involved in STAT-JR.

Moving towards MLwiN – Bill in Bath: I did my MSc. (Comp Stats) in 1995 and used S-Plus, C and BUGS to fit models for my dissertation. I then did my PhD. (Stats) between 1995 and 1998, supervised by David Draper. My PhD. included much comparison work between methods of fitting multilevel models (Bayesian & frequentist). I did this using both BUGS and MLwiN. An artefact of the process was writing (limited) MCMC functionality into MLwiN!

MLwiN in 1998. What was good / what was new? Interactive Equations window; interactive graphics; interactive Trajectories window; choice of estimation methods; adaptive Metropolis algorithms.

Interactive Equations Window: This is the 2011 version but the 1998 version had many of the same features. You could toggle between estimates and symbols. Numbers change from blue to green on convergence. Clicking on the window allows model construction (see the X variable window).

Interactive Graphics: Graphics can instantly update as column content changes. Highlighting of data points is passed between graphs. Highlighting can be done at different levels of the data structure.

Interactive Trajectories Window: The trajectories plot is particularly useful for MCMC estimation. It was the first software (to my knowledge) to have dynamic chains, although they appeared in WinBUGS soon after. We used to joke about taking a coffee break while MCMC ran, and that the chains make you look busy! Click on a graph to get diagnostics.

Sixway diagnostic plot: Plot of the chain plus a kernel density (code from my MSc!). The kernel plot would show informative priors as well. Time series plots and, originally, the Monte Carlo standard error (MCSE) and a summary in the bottom 2 boxes. Expanded to 7 boxes later.
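As a rough illustration of the MCSE quantity mentioned on this slide, the sketch below estimates a Monte Carlo standard error by batch means. The slide does not say which estimator MLwiN uses, so the batch-means method, the function name `batch_means_mcse` and the toy chains are all assumptions made for the example.

```python
import math
import random


def batch_means_mcse(chain, n_batches=20):
    """Estimate the Monte Carlo standard error of the chain mean by batch means.

    Hypothetical helper for illustration; not taken from MLwiN.
    """
    size = len(chain) // n_batches
    # Mean of each consecutive, non-overlapping batch.
    means = [sum(chain[b * size:(b + 1) * size]) / size
             for b in range(n_batches)]
    grand = sum(means) / n_batches
    # Sample variance of the batch means.
    var_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_means / n_batches)


rng = random.Random(42)

# An independent chain and a strongly autocorrelated AR(1) chain.
iid_chain = [rng.gauss(0.0, 1.0) for _ in range(10000)]
ar_chain = [0.0]
for _ in range(9999):
    ar_chain.append(0.9 * ar_chain[-1] + rng.gauss(0.0, 1.0))

mcse_iid = batch_means_mcse(iid_chain)
mcse_ar = batch_means_mcse(ar_chain)
print(mcse_iid, mcse_ar)
```

The autocorrelated chain should give a clearly larger MCSE than the independent one for the same number of iterations, which is why a diagnostic of this kind is useful alongside the trace plot.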

MCMC functionality: In 1998 we had implemented Normal, Binomial and Poisson N level models by shoehorning my stand-alone C code into MLwiN in a couple of manic visits to London. For Normal responses we used Gibbs sampling; for the others we used a mixture of Gibbs and random walk Metropolis, including my ad-hoc adapting scheme that persists today. BUGS used adaptive rejection (AR) sampling instead of Metropolis-Hastings (MH) at the time. The code was very model specific and optimised, so far faster than BUGS, as it still is today!
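To make the idea of random walk Metropolis with an adapting proposal concrete, here is a minimal single-parameter sketch. It is not the MLwiN scheme: the batch size, the 50% target acceptance rate, the multiplicative update rule and the name `adaptive_rw_metropolis` are all assumptions chosen for illustration.

```python
import math
import random


def adaptive_rw_metropolis(log_post, start, n_adapt=2000, n_keep=5000,
                           batch=100, target=0.5, seed=1):
    """Random walk Metropolis for one parameter, tuning the proposal SD in batches.

    Generic illustrative tuning rule (assumed), not the one used in MLwiN.
    """
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    sd = 1.0
    accepted = 0
    chain = []
    for it in range(n_adapt + n_keep):
        proposal = x + rng.gauss(0.0, sd)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() is in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted += 1
        if it < n_adapt:
            # During adaptation: widen the proposal if accepting too often,
            # narrow it if accepting too rarely.
            if (it + 1) % batch == 0:
                sd *= math.exp(accepted / batch - target)
                accepted = 0
        else:
            # Proposal SD is now fixed; store the draws.
            chain.append(x)
    return chain, sd


def log_post(beta):
    # Toy log posterior: Normal with mean 3 and SD 0.5, up to a constant.
    return -0.5 * ((beta - 3.0) / 0.5) ** 2


chain, tuned_sd = adaptive_rw_metropolis(log_post, start=0.0)
posterior_mean = sum(chain) / len(chain)
print(posterior_mean, tuned_sd)
```

Adapting only during an initial period and then fixing the proposal keeps the stored chain a standard Metropolis sampler, which is one common way to sidestep the theoretical complications of continual adaptation.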

Changes from 1989 to 1999. 1989: Normal response, hierarchical, IGLS algorithm. 1999: Normal, Poisson, Binomial and Multinomial responses; hierarchical, cross-classified and multiple membership (in IGLS, see Rasbash and Goldstein); IGLS, bootstrapping and MCMC.

Move to the CMM at the IOE in 1998. CMM in 1998: Harvey, Jon, Min and Pan. Ian Plewis and Geoff Woodhouse were also at IOE. Chris Charlton did summer work in 1998 and 1999 on many of the new windows in the package. I joined in October 1998 (and lived in Jon’s house for a few months!). Fiona Steele joined a couple of years later from LSE when Geoff Woodhouse left.

Changes to MLwiN in my time at IOE ( ): Better handling of missing data and categorical variables. Lots more data manipulation windows. MCMC functionality for: XC/MM/Spatial CAR models (Browne et al. 2001a); Multilevel Factor Analysis (Goldstein & Browne 2002, 2005); Measurement Error (Browne et al. 2001b); Multivariate/Mixed responses/Complex variation (Browne 2006).

Happy times at a Centre away day and croquet. Harvey was taking the photo (includes Mike Healy).

Documentation: Part of the success of MLwiN is due to the quality and quantity of the accompanying documentation. Users Guide in 1998 (Goldstein et al., ~130 pages). By 2003: Users Guide (Rasbash et al., ~250 pages), MCMC Guide (Browne et al.) and Command guide. Today: User’s Guide (314 pages), Supplement (114 pages), MCMC Guide (439 pages), Command guide (114 pages). Huge effort by colleagues in the centre on web-based training materials.

Moves are afoot! In 2003 I moved from London to Nottingham to take a lecturing position. The grant that would have continued funding me produced the start of the REALCOM software. In 2005, as Harvey retired, the centre began the move to Bristol: first Jon, then Fiona. The first LEMMA project began in 2005 and Chris Charlton (re)joined the centre. Chris gradually became the main coder of MLwiN as Jon took a more managerial role directing the centre.

REALCOM: Harvey extended our earlier MCMC work to create the suite of REALCOM functions to cover: (further) measurement error modelling; structural equation models (extending the factor analysis work); (further) mixed responses and imputation (REALCOM-IMPUTE), with input from James and Mike at LSHTM. Harvey wrote REALCOM in MATLAB, which was easier to code in but slower. Jon and Chris put a front end on the software.

Grants at Nottingham! In 2006 an ESRC grant to look at power calculations in multilevel models (along with MCMC algorithm improvements) began at Nottingham. This employed Mousa Golalizadeh. I was also a sponsor of a Wellcome grant on Bayesian modelling of vet (mastitis) data with Martin Green. In late 2006 I was interviewed for a chair at Bristol vet school and got a chance to rejoin CMM when I arrived at Bristol in April 2007.

MLPowSim: The power grant resulted in MLPowSim, a program to perform power calculations for multilevel models. Note that Cora Maas (with Joop Hox) did lots of good work in this area. The program creates MLwiN macro files or R scripts that run power calculations via simulation. The program was developed with two postdocs, Mousa and Richard Parker, and work continues with new PhD student Toni Price.
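The idea of estimating power by simulation can be sketched in a few lines. This is not MLPowSim's method: MLPowSim generates MLwiN macros or R scripts that fit the multilevel model to each simulated dataset, whereas this sketch assumes a balanced two-level design with only an intercept and substitutes a simple z-test on the cluster means. The function name `simulated_power` and all parameter values are hypothetical.

```python
import math
import random
import statistics


def simulated_power(beta0, sigma_u, sigma_e, n_clusters, n_per_cluster,
                    n_sims=500, z_crit=1.96, seed=1):
    """Proportion of simulated datasets in which H0: beta0 = 0 is rejected.

    Simplified stand-in (assumed): z-test on cluster means, balanced design.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        cluster_means = []
        for _ in range(n_clusters):
            # One random effect per cluster, shared by all its units.
            u = rng.gauss(0.0, sigma_u)
            ys = [beta0 + u + rng.gauss(0.0, sigma_e)
                  for _ in range(n_per_cluster)]
            cluster_means.append(sum(ys) / n_per_cluster)
        estimate = statistics.mean(cluster_means)
        std_err = statistics.stdev(cluster_means) / math.sqrt(n_clusters)
        if abs(estimate / std_err) > z_crit:
            rejections += 1
    return rejections / n_sims


# Power under an assumed effect, and the rejection rate when there is no effect.
power = simulated_power(0.5, 0.5, 1.0, n_clusters=30, n_per_cluster=10)
false_positive_rate = simulated_power(0.0, 0.5, 1.0, n_clusters=30, n_per_cluster=10)
print(power, false_positive_rate)
```

Rerunning the function over a grid of cluster counts and cluster sizes gives a power curve, which is the kind of output a simulation-based power tool is typically used to produce.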

MLPowSim Software freely available from

MLwiN (2003-): We still develop MLwiN. The User’s Guide supplement shows Jon & Chris’s work on improved model specification, model comparison, customised predictions and autocorrelated error modelling. The last 5 chapters of the MCMC guide show my work on MCMC algorithm improvements (SMVN, SMCMC, hierarchical centring, parameter expansion and orthogonal parameterisations); see also Browne et al. (2009).

MLwiN – A picture of where we have come. Picture courtesy of Becky Pillinger.

Impact – “It’s papers not programs stupid!” MLwiN has over 2,000 citations (since 1998), which represent only a small proportion of actual use in published work. The spread of topics includes: 466 in public health, 150 in statistics/probability, 135 in education, 101 in social science, biomed., 86 in health care, 86 in psychiatry, 78 in medicine, 77 in vet science, 63 zoology, 58 ecology, 57 in sport science, 55 behavioural science, several hundred in various forms of psychology.

Jon’s big vision: Jon had been thinking hard on where the software went next. The frequentist IGLS algorithm was hard to extend further. WinBUGS showed that MCMC as an algorithm could be extended easily, but the difficulty in MLwiN was in extending my MCMC code and possibly relying on the personnel! The big vision was an all-singing all-dancing system where expert users could add functionality easily and which interoperates with other software. Bruce was developing an underpinning algebra system. The ESRC LEMMA II and E-STAT grants would enable this to be achieved.

2010 – Annus Horribilis: Sadly Jon’s vision didn’t have a happy ending for him following his tragic death last year. 2010 was a pretty rotten year for CMM as a result, although we have now picked up most of the pieces. Fiona and I (with support from Harvey and other colleagues) have taken on Jon’s role to lead the centre, and the announcement in 2011 of the successful bid for LEMMA III (led by Fiona), looking at more longitudinal modelling, seems like a rebirth. One ‘phoenix from the flames’ is the STAT-JR package currently in development, which is our take on Jon’s vision.

The E-STAT project and STAT-JR: STAT-JR is being developed jointly by LEMMA II and E-STAT. It consists of a set of components, for many of which we have an alpha version, which contains: templates for model fitting, data manipulation, input and output, controlled via a web browser interface; currently, model fitting for 90% of the models that MLwiN can fit in MCMC plus some it can’t, including greatly sped up REALCOM templates; some interoperability with MLwiN, WinBUGS, R, Stata and SPSS (written by Camille). Also runmlwin (Stata -> MLwiN), written by George.

STAT-JR: Jon identified 3 groups of users. Novice practitioners, who want to use statistical software that is user friendly and maybe tailored to their discipline. Advanced practitioners, who are the experts in their fields and also want to develop tools for the novice practitioners. Algorithm developers, who want their algorithms used by practitioners. See EStat/news.shtml for details of the Advanced User’s Guide for STAT-JR.

STAT-JR component based approach: Below is an early diagram of how we envisioned the system. Here you will see boxes representing components, some of which are built into the STAT-JR system. The system is written in Python with, currently, a VB.net algebra processing system. A team of coders (currently me, Chris, Danius, Camille and Bruce) work together on the system.

Templates: A template consists of a set of code sections for advanced users to write. A model template consists of at least: an invars method, which specifies inputs and types; an outbug method, which creates (BUGS like) model code for the algebra system. An (optional) outlatex method can be used for outputting LaTeX code for the model. Other optional functions are required for more complex templates.

Regression 1 Example
from EStat.Templating import *
from mako.template import Template as MakoTemplate
import re
class Regression1(Template):
    'A model template for fitting 1 level Normal multiple regression model in E-STAT only. To be used in documentation.'
    tags = [ 'model', '1-Level' ]
    invars = '''
y = DataVector('response: ')
tau = ParamScalar()
sigma = ParamScalar()
x = DataMatrix('explanatory variables: ')
beta = ParamVector()
beta.ncols = len(x)
'''
    outbug = '''
model{
    for (i in 1:length(${y})) {
        ${y}[i] ~ dnorm(mu[i], tau)
        mu[i] <- ${mmult(x, 'beta', 'i')}
    }
    # Priors
    % for i in range(0, x.ncols()):
    beta${i} ~ dflat()
    % endfor
    tau ~ dgamma( , )
    sigma <- 1 / sqrt(tau)
}
'''
    outlatex = r'''
\begin{aligned}
\mbox{${y}}_i & \sim \mbox{N}(\mu_i, \sigma^2) \\
\mu_i & = ${mmulttex(x, r'\beta', 'i')} \\
%for i in range(0, len(x)):
\beta_${i} & \propto 1 \\
%endfor
\tau & \sim \Gamma (0.001,0.001) \\
\sigma^2 & = 1 / \tau
\end{aligned}
'''

An example of STAT-JR – setting up a model

Equations for model and model code: Note that the equations use MathJax, so the underlying LaTeX can be copied and pasted. The model code is based around the WinBUGS language with some variation. This is a more complex template for 2 level models.

Model code in detail
model {
    for (i in 1:length(normexam)) {
        normexam[i] ~ dnorm(mu[i], tau)
        mu[i] <- cons[i] * beta0 + standlrt[i] * beta1 + u[school[i]] * cons[i]
    }
    for (j in 1:length(u)) {
        u[j] ~ dnorm(0, tau_u)
    }
    # Priors
    beta0 ~ dflat()
    beta1 ~ dflat()
    tau ~ dgamma( , )
    tau_u ~ dgamma( , )
}
For this template the code is, aside from the length function, standard WinBUGS model code.
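The fixed part of the mu[i] line above is the kind of text a template helper such as mmult has to produce from a list of explanatory variables. The sketch below shows one way such an expansion could work. The function `expand_mmult` is hypothetical and written only for illustration; it is not STAT-JR's mmult, whose implementation is not shown in these slides.

```python
def expand_mmult(columns, coef, index):
    """Build a BUGS-style linear predictor such as 'cons[i] * beta0 + ...'.

    Hypothetical stand-in for a template helper; not STAT-JR's own code.
    """
    terms = [f"{col}[{index}] * {coef}{k}" for k, col in enumerate(columns)]
    return " + ".join(terms)


columns = ["cons", "standlrt"]
fixed_part = expand_mmult(columns, "beta", "i")
print("mu[i] <- " + fixed_part)
# prints: mu[i] <- cons[i] * beta0 + standlrt[i] * beta1

# The flat priors can be generated from the same column list.
prior_lines = [f"beta{k} ~ dflat()" for k in range(len(columns))]
print("\n".join(prior_lines))
```

Generating both the linear predictor and the priors from one column list is what lets a single template serve any number of explanatory variables.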

Bruce’s (Demo) algebra system step for parameter u
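The slide image is not reproduced in this transcript. As a hedged sketch of what a Gibbs step for u involves, the full conditional for u_j under the 2-level model code shown earlier can be written as below. It assumes cons is a column of ones, writes n_j for the number of pupils in school j, and gives the second argument of N as a variance; it is a standard conjugate result, not a copy of the algebra system's output.

```latex
u_j \mid y, \beta, \tau, \tau_u \sim \mbox{N}\left(
  \frac{\tau \sum_{i:\, \mathrm{school}[i] = j}
        \left( \mathrm{normexam}_i - \beta_0 - \beta_1\, \mathrm{standlrt}_i \right)}
       {n_j \tau + \tau_u},\;
  \frac{1}{n_j \tau + \tau_u}
\right)
```

The posterior precision n_j τ + τ_u is the sum of the prior precision τ_u and one likelihood precision τ per pupil in the school.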

Output of generated C++ code: The package can output C++ code that can then be taken away by software developers and modified.

Output from the E-STAT engine: Here the six-way plot functionality is in part carried over to STAT-JR after the model has run. In fact graphs for all parameters are calculated and stored as picture files, so they can be viewed quickly.

Interoperability with WinBUGS: Interoperability in the user interface is obtained via a few extra inputs. In the template code, user-written functions are required for all packages apart from WinBUGS. The transfer of data between packages is, however, generic.

Output from WinBUGS with multiple chains: STAT-JR generates appropriate files and then fires up WinBUGS. Multiple chains are superimposed in the sixway plot output.

Interoperability with R: R interoperability for a 2-level model can use the glmer or the MCMCglmm functions.

Output from R: Currently the R output is displayed, but other files including the scripts are stored in a directory and will be made available when the e-Book work is done.

Other templates - TemplateXYlabel: Can make use of Python’s fantastic graphics! Or potentially other package graphics via interoperability.

The E-STAT project – still to come: We have lots of work to do: parallel processing; e-books; optimising code generation; improving the algebra system; suites of templates for missing data and social network models; interoperability with SAS and hooking up more templates for other packages.