Progress with STAT-JR, April 2011 – September 2011
Professor William Browne and Chris Charlton, Centre for Multilevel Modelling

Summary
Personnel updates
Different forms of STAT-JR
Changes to webtest look and feel
Faster estimation – optimising C code
Interoperability changes
Changes to templates
New / in-progress templates
Overall progress / one zip file!!

Personnel updates
A new 12-month RA post at Bristol was advertised yesterday, to start ASAP. We are hoping for either a statistician or a programmer. This will require an overall no-cost extension to the project.
George is officially part of the project as Co-Investigator.
Camille rejoins us in January 2012 from maternity leave.
Richard started in July and will work for a 12-month period.
Chris and Camille are both funded on LEMMA III, so funded until September

Different forms of STAT-JR
Webtest: the format we have demonstrated up to now. It allows the user to investigate 1 template and 1 dataset. A dataset can be output from 1 template and then used by the next. We will come back to this.
Cmdtest: this format uses a Python script, so a template can be called from within a script. It is helpful for our test suite and has potential for tasks like simulations.
E-book: Danius will talk about progress here later.

Command Test (cmdtest)
Written by Chris straight after the last meeting. Currently used with model templates only. Syntax example:

    m = RunStatJR(template='Regression1', dataset='tutorial',
                  invars={'y': 'normexam', 'x': 'cons, standlrt'},
                  estoptions={'burnin': '1000', 'iterations': '5000',
                              'thinning': '1', 'seed': '1'})

This fits a single-level regression with the settings given.
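As a rough sketch of how a script might drive repeated fits (for example in a simulation study), the loop below varies only the seed between runs. `run_statjr_stub`, `make_estoptions` and `run_replicates` are hypothetical names introduced for illustration; the stub simply echoes its arguments so the example runs on its own, and the real RunStatJR return value is not described in these slides.

```python
def make_estoptions(burnin=1000, iterations=5000, thinning=1, seed=1):
    """Build an estoptions dict; values are strings, as in the slide example."""
    return {
        'burnin': str(burnin),
        'iterations': str(iterations),
        'thinning': str(thinning),
        'seed': str(seed),
    }


def run_statjr_stub(template, dataset, invars, estoptions):
    """Hypothetical stand-in for RunStatJR: echoes the request it was given."""
    return {
        'template': template,
        'dataset': dataset,
        'invars': dict(invars),
        'estoptions': dict(estoptions),
    }


def run_replicates(n_reps, runner=run_statjr_stub):
    """Call the runner n_reps times, changing only the random seed."""
    invars = {'y': 'normexam', 'x': 'cons, standlrt'}
    results = []
    for seed in range(1, n_reps + 1):
        results.append(runner(template='Regression1', dataset='tutorial',
                              invars=invars,
                              estoptions=make_estoptions(seed=seed)))
    return results


fits = run_replicates(3)
```

Passing the runner as an argument keeps the loop testable without STAT-JR installed; swapping in the real function would be a one-line change, assuming it accepts the same keyword arguments as the slide example.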

Cmdtest example 2
Here is a second example, for a template which has interoperability:

    m = RunStatJR(template='1LevelCat', dataset='tutorial',
                  invars={'y': 'normexam', 'D': 'Normal',
                          'x': 'cons,standlrt,girl,schgend',
                          'cons_cat': 'No', 'standlrt_cat': 'No',
                          'girl_cat': 'Yes', 'schgend_cat': 'Yes'},
                  estoptions={'burnin': '1000', 'iterations': '5000',
                              'thinning': '1', 'seed': '1',
                              'Engine': 'eSTAT', 'EstM': 'Yes'})

Note that the invars information can be obtained from the inputs box in webtest.
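The `_cat` entries in the invars dictionary follow a regular pattern (one Yes/No flag per predictor), so they could be generated rather than typed. The helper below is a minimal sketch of that idea; `build_invars` is a hypothetical function, not part of STAT-JR, and it assumes the key naming seen in the example above.

```python
def build_invars(response, distribution, predictors, categorical=()):
    """Assemble an invars dict with one '<name>_cat' flag per predictor."""
    categorical = set(categorical)
    unknown = categorical - set(predictors)
    if unknown:
        raise ValueError('categorical names not in predictors: %s'
                         % sorted(unknown))
    invars = {'y': response, 'D': distribution, 'x': ','.join(predictors)}
    for name in predictors:
        # 'Yes' marks a predictor to be treated as categorical.
        invars[name + '_cat'] = 'Yes' if name in categorical else 'No'
    return invars


invars = build_invars('normexam', 'Normal',
                      ['cons', 'standlrt', 'girl', 'schgend'],
                      categorical=['girl', 'schgend'])
```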

Webtest look and feel
Since April this has changed a little:
If the E-STAT engine is to be used then the algebra system is called after Next and before Run.
Algorithm algebra is displayed in the main window (this may need to be optional).
The Run and Test Code buttons now give the same answers.
There is a More button that works with E-STAT, MLwiN, OpenBUGS and JAGS.
The change estimation settings button allows the same model to be easily fitted with different software.

Example – 2 level Model tutorial dataset
Here are the initial inputs. Pressing Next produces the equations and model code shown on the following slides.

Example – Equations and model code

Example – algorithm code

Algorithm – continued
The first line is the output from Bruce’s algebra system; the second line is the result of including known constants and simplifying.

Results of pressing the Run button

Running for a further 5,000 via More
We typed 5,000 in the Extra Iterations box and pressed More. Note that the iterations increased to 10,000.

Run vs Test Code buttons
The Run button creates sections of C++ code that are compiled and run from Python.
The Test Code button (like the Code button) creates a complete C++ program. This is then compiled and called as an external process, in a similar way to interoperability with other packages.
Both methods give identical answers.
Note that when the program has finished the screen will update more quickly than previously, as only the current graph is calculated, to save time.

Faster code?
For most templates we are now faster than WinBUGS, OpenBUGS and JAGS, though these packages may give better mixing for some models where we use Metropolis.
For mixed response models, factor analysis and some other templates we are faster than MLwiN, but for others we are not.
Speed-ups were achieved by optimising code, e.g. rearranging terms and removing constants from loops, amongst other things.
Test Code and Run are now comparable.
We need to produce a test suite of timings.
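To illustrate what "removing constants from loops" means, here is a small sketch using a Normal log-likelihood. It is written in Python for readability, whereas STAT-JR generates C++; the function names are illustrative and this is not the generated code itself. Terms that do not depend on the loop index are computed once outside the loop, and the division is replaced by a single multiplication at the end.

```python
import math


def loglik_naive(y, mu, sigma2):
    """Normal log-likelihood with every term recomputed inside the loop."""
    total = 0.0
    for yi in y:
        total += (-0.5 * math.log(2.0 * math.pi * sigma2)
                  - (yi - mu) ** 2 / (2.0 * sigma2))
    return total


def loglik_hoisted(y, mu, sigma2):
    """Same quantity, with loop-invariant terms moved outside the loop."""
    const = -0.5 * math.log(2.0 * math.pi * sigma2)  # same for every i
    inv_two_sigma2 = 1.0 / (2.0 * sigma2)            # one division in total
    sum_sq = 0.0
    for yi in y:
        d = yi - mu
        sum_sq += d * d
    return len(y) * const - sum_sq * inv_two_sigma2


y = [0.1, -0.4, 1.3, 2.2]
naive = loglik_naive(y, mu=0.5, sigma2=1.7)
hoisted = loglik_hoisted(y, mu=0.5, sigma2=1.7)
```

The two functions agree up to floating-point rounding; the second does one logarithm and one division per call instead of one of each per observation.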

Interoperability
The BUGS language is used by:
WinBUGS: the original implementation of BUGS for Windows.
OpenBUGS: a more recent implementation, which is open source.
JAGS: Just Another Gibbs Sampler, developed by Martyn Plummer. It doesn’t fit all BUGS models and has some limitations, though it is often faster.
E-STAT, to some extent.

Interoperability – State of play
Some templates have lots of interoperability included.
Some have only E-STAT, and work is needed, particularly when E-STAT code diverges from standard WinBUGS code.
For many templates, greater effort is required to write interoperability with MLwiN, Stata, R etc.
Camille’s original code also created plots from the packages, which may not be required.
Let’s look at 2-level Mod:

WinBUGS (3 chains)
Here the equation comes up without running the algebra system. We used change estimation settings to save typing in the first 5 boxes.

WinBUGS (2)
This takes a while as it is doing 3 times as many iterations. Note the sixth graph, shown for multiple chains, of the Brooks-Gelman-Rubin diagnostic.

Brooks, Gelman, Rubin (BGR) diagnostic
An MCMC diagnostic based on an ANOVA-type analysis of a set of chains. If convergence is achieved then the between-chain (green) and within-chain (blue) variability should be similar and their ratio (red) should converge to 1.0.
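The idea can be sketched with the classic Gelman-Rubin potential scale reduction factor, which compares a pooled variance estimate with the average within-chain variance. This is an illustration of the ANOVA-style comparison only; the quantities plotted by WinBUGS or STAT-JR may be computed differently (for example from interval widths), and `gelman_rubin` is a name chosen here.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError('need at least 2 chains of equal length >= 2')
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


# Two chains exploring the same region: ratio close to 1.
agree = gelman_rubin([[0, 1] * 50, [1, 0] * 50])
# Second chain shifted by 10: ratio far above 1, signalling non-convergence.
disagree = gelman_rubin([[0, 1] * 50, [10, 11] * 50])
```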

OpenBUGS
An interesting example showing non-convergence here! We can use More, as OpenBUGS saves the state of the chain on exit.

OpenBUGS (2)
Here 10k more iterations don’t help, as we really need to lengthen the burnin, and that is hard to do without starting again.

OpenBUGS with burnin 3000 main run
Here the convergence issue goes away and, although mixing is not perfect, it is better than before.
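If the full chain has been stored, one alternative to rerunning is to discard a longer initial stretch after the fact. The helper below is a minimal post-processing sketch under that assumption; `discard_burnin_and_thin` is a hypothetical name, and this is not a description of what OpenBUGS or STAT-JR does internally.

```python
def discard_burnin_and_thin(chain, burnin, thinning=1):
    """Drop the first `burnin` draws, then keep every `thinning`-th draw."""
    if burnin < 0 or thinning < 1:
        raise ValueError('burnin must be >= 0 and thinning >= 1')
    return list(chain[burnin::thinning])


chain = list(range(10))            # stand-in for stored parameter draws
kept = discard_burnin_and_thin(chain, burnin=3, thinning=2)
```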

JAGS
JAGS runs multiple chains and runs in the Python window, i.e. it doesn’t flash up like WinBUGS. Here there is a lack of convergence in 1 chain. Note that JAGS is quicker than the other 2.

MLwiN
Model code is replaced by macro code for running MLwiN.

MLwiN (2)
Currently a single chain, but in theory we could set MLwiN off three times with different starting values to get multiple chains.

R – MCMCglmm
A fairly short R macro in this case, calling the MCMCglmm package. Note that the data files are also constructed behind the scenes.

R – MCMCglmm (2)
MCMCglmm uses a block updating method (which E-STAT uses in other templates), so mixing is better.

The download button
All files generated by a model fit are stored in a temporary directory, and the download button will zip them up into a file called model.zip.
Note that here the top 2 png files are for two parameters we happened to view diagnostics for.
The script and data files used can be viewed (big files), whereas package-specific files like the bottom png file can be stored (see next slide).
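The zipping step can be sketched with Python's standard `zipfile` module. The function name, the placeholder file names and the directory layout below are assumptions for illustration; only the archive name model.zip comes from the slide.

```python
import os
import tempfile
import zipfile


def zip_directory(src_dir, zip_path):
    """Zip every file under src_dir into zip_path; return archived names."""
    names = []
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir)
                zf.write(full, arcname)
                names.append(arcname)
    return sorted(names)


# Demonstration with placeholder files standing in for model-fit output.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as out:
    for fname in ('script.R', 'data.txt'):
        with open(os.path.join(work, fname), 'w') as fh:
            fh.write('placeholder\n')
    archive = os.path.join(out, 'model.zip')
    archived = zip_directory(work, archive)
    with zipfile.ZipFile(archive) as zf:
        listed = sorted(zf.namelist())
```

Writing the archive outside the directory being walked avoids the zip file trying to include itself.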

R output file extracted from the download

Changes to model templates
We are currently in the process of assessing the vast number of model templates.
Redundant and incomplete templates have been removed, so list down to around ... Of these, many need interoperability adding and/or LaTeX model code adding.
A short help file would be good for each, including examples of their use.
The plan is to get all these ready for the beta release.
Any help greatly appreciated!

New / in-progress templates
Prediction templates
Mixed response templates
Capture Recapture template
Continuous time survival template

Prediction templates
In MLwiN quite a bit of effort has gone into the customised prediction window, which allows ‘out of sample’ prediction.
In STAT-JR we have a template, 1levelpred, that does out of sample prediction for a 1 level model.
This template requires the equivalent of the cut function in WinBUGS; in our case we have the zxfd trick.
We have modified the template so that non-normal models can do out of sample prediction.

Set up for the model
Basically we require the explanatory variables for the actual model, and the same explanatory variables for the missing cases in additional columns. For now we use the same variables.

Model code

    model {
        for (i in 1:length(votecons)) {
            votecons[i] ~ dbin(p[i], cons[i])
            logit(p[i]) <- cons[i] * beta0 + defense[i] * beta1 + unemp[i] * beta2 + taxes[i] * beta3 + privat[i] * beta4
        }
        for (j in 1:10) {
            mvotecons[j] ~ dbin(missp[j], cons[j])
            logit(missp[j]) <- cons[j] * betazxfd0 + defense[j] * betazxfd1 + unemp[j] * betazxfd2 + taxes[j] * betazxfd3 + privat[j] * betazxfd4
            dummy[j] ~ ddummy(mvotecons[j])
        }
        # Priors
        beta0 ~ dflat()
        beta1 ~ dflat()
        beta2 ~ dflat()
        beta3 ~ dflat()
        beta4 ~ dflat()
    }

Results
All predicted probabilities are between 0 and 1, and looking at the output datafile produced will show that the columns for mvotecons always take values 0 or 1.
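The out of sample step in the model code above can be sketched in a few lines: for each stored draw of the coefficients, compute the inverse-logit of the linear predictor and draw a 0/1 outcome (a binomial with denominator 1). This is a simplified illustration outside the sampler, with made-up coefficient values; it does not reproduce the zxfd mechanism, and the function names are hypothetical.

```python
import math
import random


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predict_binary(x_row, beta_draws, rng):
    """For each coefficient draw, return (probability, simulated 0/1 outcome)."""
    preds = []
    for beta in beta_draws:
        eta = sum(x * b for x, b in zip(x_row, beta))
        p = inv_logit(eta)
        y = 1 if rng.random() < p else 0
        preds.append((p, y))
    return preds


rng = random.Random(1)
# Illustrative values only: intercept plus four covariates, three "draws".
x_row = [1.0, 0.3, -1.2, 0.8, 0.1]
beta_draws = [
    [0.2, 0.5, -0.1, 0.3, 0.0],
    [0.1, 0.4, -0.2, 0.2, 0.1],
    [0.3, 0.6, 0.0, 0.4, -0.1],
]
preds = predict_binary(x_row, beta_draws, rng)
```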

Mixed response templates
These cover mixtures of continuous, ordered category and unordered category responses via latent variable modelling.
We have 3 templates, for 1 level, 2 level and N level models.
Such models can currently only be fitted in REALCOM, and then only for 2 levels.
They deal with missing data, and we have now got the imputation imputing variables on the original scale.
We could extend to responses at several levels and wrap up in a ‘super template’ that calls this template as part of the process.

Example – setup with jspmix

Mixed response continued
Model fit is really fast: <30s versus 45 minutes in REALCOM!!
The model uses latent variables for the responses, and imputation can be done every x iterations to allow several imputed datasets to be formed.
Note that the rules for constructing the latent variables are applied in reverse to work back to the original variables, i.e. whether the LV is between specific thresholds, or whether LV1 > LV2 and LV1 > 0, etc.
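A minimal sketch of such reverse rules follows, assuming the usual latent-variable constructions: an ordered category is the interval between thresholds that contains the latent variable, and an unordered category is the one with the largest latent variable, falling back to a reference category when none is positive. The exact rules used in the templates are not given in these slides, so the function names and conventions here are assumptions.

```python
from bisect import bisect_right


def ordered_category(lv, thresholds):
    """Index k with thresholds[k-1] <= lv < thresholds[k] (thresholds sorted)."""
    return bisect_right(thresholds, lv)


def unordered_category(lvs):
    """lvs holds latent variables for categories 1..K; 0 is the reference.

    Returns the category whose latent variable is largest, provided it is
    positive; otherwise returns the reference category 0.
    """
    best = max(range(len(lvs)), key=lambda k: lvs[k])
    return best + 1 if lvs[best] > 0 else 0


low = ordered_category(-1.0, [0.0, 1.5])
middle = ordered_category(0.7, [0.0, 1.5])
high = ordered_category(2.0, [0.0, 1.5])
first = unordered_category([0.3, -0.2])      # LV1 > LV2 and LV1 > 0
reference = unordered_category([-0.5, -0.1])  # no latent variable positive
```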

Datasets constructed
Here the datasets are stored in iter0, iter1000 etc.

Datasets summary
The dataset contains the 3 response variables with imputed values. In this case all cases are complete, so the responses are as given in the original data, but the conversion from latent variables is still performed.

Capture/Recapture template
These models are used in statistical ecology, where interest is in population size / stability.
Birds are caught (in annual cohorts) and marked, and the year in which they are next recaptured is recorded.
Product multinomial models are used for each cohort, where the probability of recapturing a bird in each year is constructed from the product of a series of survival and recapture probabilities.
This template uses E-STAT, but also WinBUGS and R code supplied with the book by King et al.
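For the simplest case, with survival probability phi and recapture probability p held constant across years, the multinomial cell probabilities for one cohort can be sketched as below. A bird released in year i is first recaptured in year j if it survives j - i years, is missed in the j - i - 1 intervening years, and is caught in year j. The function name and the 1-based year indexing are assumptions for illustration; this is not the template's code.

```python
def cohort_cell_probs(release_year, n_years, phi, p):
    """Cell probabilities for one release cohort with constant phi and p.

    Returns P(first recaptured in year j) for j = release_year+1..n_years,
    followed by P(never recaptured within the study).
    """
    probs = []
    for j in range(release_year + 1, n_years + 1):
        gap = j - release_year
        probs.append(phi ** gap * (1.0 - p) ** (gap - 1) * p)
    probs.append(1.0 - sum(probs))
    return probs


probs = cohort_cell_probs(release_year=1, n_years=3, phi=0.5, p=0.5)
```

With phi = p = 0.5 and releases in year 1 of a 3-year study, the cells are 0.25 (recaptured in year 2), 0.0625 (first recaptured in year 3) and 0.6875 (never recaptured).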

The dipper dataset

The inputs

Model code for E-STAT

Results
Recapture probability p =
Survival probability Phi =
These are held constant across years. Other models allow these to vary etc.

Continuous time survival template
I had a veterinary epidemiology collaborator from Greece, Pol Kostoulas, visit for 6 weeks this summer.
He worked on a template for fitting general continuous time survival models.
This template requires the ability to deal with censored responses.
This has been implemented rather crudely in STAT-JR, and the template contains a WinBUGS implementation via the I(,) mechanism.
The template is not yet finished.
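One common way to handle a right-censored response in MCMC is to treat the unobserved time as a parameter and draw it from the survival distribution truncated below at the censoring time. The sketch below does this for a Weibull by inverting the survival function, assuming the parameterisation S(t) = exp(-rate * t^shape). It is a general illustration with made-up parameter values, not the STAT-JR implementation, and `sample_censored_weibull` is a hypothetical name.

```python
import math
import random


def sample_censored_weibull(censor_time, shape, rate, rng):
    """Draw T from a Weibull conditioned on T > censor_time.

    Uses S(T) / S(censor_time) = U with U uniform on (0, 1], which gives
    T = (censor_time**shape - log(U) / rate) ** (1 / shape).
    """
    u = 1.0 - rng.random()   # rng.random() is in [0, 1), so u is in (0, 1]
    return (censor_time ** shape - math.log(u) / rate) ** (1.0 / shape)


rng = random.Random(42)
draws = [sample_censored_weibull(46.0, shape=1.2, rate=0.01, rng=rng)
         for _ in range(1000)]
```

Every draw is at least as large as the censoring time, which is the property the censoring mechanism needs to guarantee.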

Kidney example from WinBUGS examples

Kidney inputs

Kidney results
Here the censoring values for the observations are 46, 113, 5, 5, 16, 54, 6 and 8 respectively.

Overall progress
A document was distributed prior to the meeting.
We have diverged in many ways from the original plan. We are interested in opinions on whether this is an issue.
The last session this afternoon will discuss this further.
Chris has investigated portable Python, and we now have a (very large) zip file that will allow the user to install all the files needed to run STAT-JR in one go.
Other packages to be interoperated with will need to be installed separately.

Work for new postdoc
Depends on background.
If more programming based, they will need to write runSTATJR and RtoMLwiN/RtoSTATJR.
If more statistical, then they can:
advise on further MCMC algorithms (we plan to implement slice sampling soon);
write further templates with Chris.

Upcoming events
Giving a seminar to the MRC Biostatistics Unit in 2 weeks’ time. The plan is to give an updated version of the Amsterdam talk.
Also been invited to talk to the Glasgow statistics group, and hope to get feedback from them.
Next Easter: 2nd Manchester workshop and MCMC workshop for LEMMA III.
Any other publicity: Paul mentioned a demonstration in an ...