For the e-Stat meeting of 6-7 April 2011
Paul Lambert / DAMES Node inputs
1) Updates on DAMES
2) Bringing DAMES inputs to e-Stat
3) Misc. feedback: Stat-JR
4) Outputs / applications

1) Updates on DAMES
– The DAMES Node's extended period ends 31st July 2011
– Some ongoing funding in the e-Stat and NeISS projects until 2012
– Dissemination workshop in Oxford in June 2011
– Most funded posts have ended (1 programmer still funded)
– Our main contributions have been:
  – GESDE services for specialist data resources, and the data services supporting them (recent paper)
  – Training events / online materials
  – Social care and e-Health application projects

GESDE: online services for data coordination/organisation
– Tools for handling variables in social science data
– Recoding measures; standardisation / harmonisation; linking; curating
[17/MAR/2010 DIR workshop: Handling Social Science Data]
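The slide does not say how GESDE standardises measures. Purely as a generic illustration of "standardisation", the sketch below z-scores a measure, optionally against a separate reference population, and passes missing values (None) through untouched. The function name `standardise` and the choice of the population standard deviation are assumptions, not the GESDE implementation.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def standardise(values: Sequence[Optional[float]],
                reference: Optional[Sequence[float]] = None) -> list[Optional[float]]:
    """Return z-scores for `values`; None marks a missing value.

    Hypothetical helper: the mean and population SD come from `reference`
    if one is given, otherwise from the observed (non-missing) values.
    """
    observed = [v for v in values if v is not None]
    ref = list(reference) if reference is not None else observed
    if len(ref) < 2:
        raise ValueError("need at least two reference values")
    mu = mean(ref)
    sd = pstdev(ref)
    if sd == 0:
        raise ValueError("reference values have zero variance")
    # Missing values stay missing rather than being imputed
    return [None if v is None else (v - mu) / sd for v in values]
```

Standardising against a fixed reference population, rather than against each sample, is one way to keep scores comparable across datasets.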

The data curation tool
The curation tool obtains metadata and supports the storage and organisation of data resources in a more generic way. It includes a file storage system allowing users to upload files and access their own and others' files.
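The file-storage idea can be sketched as an in-memory store that keeps curation metadata alongside each upload. Everything here is hypothetical: the class names, and especially the `shared` flag deciding whether other users may read a file. The slide does not describe the curation tool's actual design or access rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredFile:
    owner: str
    name: str
    content: bytes
    metadata: dict[str, str] = field(default_factory=dict)
    shared: bool = False  # assumption: owners choose whether others may read


class FileStore:
    """Hypothetical in-memory file store with per-file curation metadata."""

    def __init__(self) -> None:
        self._files: dict[tuple[str, str], StoredFile] = {}

    def upload(self, owner: str, name: str, content: bytes,
               metadata: Optional[dict[str, str]] = None,
               shared: bool = False) -> StoredFile:
        record = StoredFile(owner, name, content, dict(metadata or {}), shared)
        self._files[(owner, name)] = record  # re-uploading replaces the file
        return record

    def list_accessible(self, user: str) -> list[tuple[str, str]]:
        """(owner, name) pairs the user may read: own files plus shared ones."""
        return sorted(key for key, f in self._files.items()
                      if f.owner == user or f.shared)

    def get(self, user: str, owner: str, name: str) -> StoredFile:
        record = self._files[(owner, name)]
        if record.owner != user and not record.shared:
            raise PermissionError(f"{user} cannot read {owner}/{name}")
        return record
```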

2) Bringing DAMES inputs to e-Stat
a) Possible mechanisms for data linking, and for connecting Stat-JR with GESDE resources (and/or other files): the filestore system, or manual inputs
b) Data-management templates / pre-analysis functionality
c) Workflows / e-book inputs: documentation for replication

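Supporting data linkage
– The current framework needs manual linkage (e.g. 2 files on a PC)
– Templates could be written to link with fixed file(s), or with named files having fixed qualities (e.g. matching variables with gb91soc90.dta)
Sample Stata code:
  * Names of the occupation variable in the user's data and of the new variables
  global soc occ
  global new occ
  * Set aside the user's data
  save temp.dta, replace
  * Load the lookup file (path not preserved in this transcript; gb91soc90.dta assumed)
  use gb91soc90.dta, clear
  keep if ukempst==0
  keep soc90 mcamsis
  rename soc90 $soc
  rename mcamsis ${new}_mcamsis
  save temp2.dta, replace
  * Reload the user's data and attach the matched scores
  use temp.dta, clear
  * keep(master match) is equivalent to: keep if _merge==1 | _merge==3
  merge m:1 $soc using temp2.dta, keep(master match) nogenerate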
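For comparison, the same "keep master and matched records" linkage can be sketched in plain Python, for example as logic that a Stat-JR data template might call. The list-of-dicts record layout and the function name `link_lookup` are hypothetical. The variable names soc90, ukempst and mcamsis come from the Stata example, and an unmatched record gets None where Stata would store a missing value.

```python
from typing import Any

Record = dict[str, Any]


def link_lookup(user_data: list[Record], lookup: list[Record],
                key: str = "occ", prefix: str = "occ") -> list[Record]:
    """Left-join lookup scores onto user records (keep master + matched)."""
    # Restrict the lookup file as the Stata code does: keep if ukempst==0
    scores: dict[Any, Any] = {}
    for row in lookup:
        if row["ukempst"] != 0:
            continue
        code = row["soc90"]
        if code in scores:
            # Like merge m:1, require the key to be unique in the lookup
            raise ValueError(f"soc90 code {code!r} is not unique in lookup")
        scores[code] = row["mcamsis"]

    linked = []
    for rec in user_data:
        out = dict(rec)  # copy so the user's records are not modified
        # Unmatched user records are kept, with a missing score (None)
        out[f"{prefix}_mcamsis"] = scores.get(rec[key])
        linked.append(out)
    return linked
```

Raising an error on duplicate codes mirrors the safety check that a many-to-one merge gives, rather than silently picking one score.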

Stat-JR within the DAMES filestore?
Could we install Stat-JR on the Stirling Unix system and allow it to be invoked, via our portal, on datasets/templates within the portal?
– This would allow users to add their own data and link it with our data
– (It would need programmers; we could give the team here access to the portal)
– (Uploading through the file browser; we could potentially also use the curation tool)
– (Templates, too, could be placed online and shared in this manner)

b) Data-oriented templates
– Deterministic file-matching routines
  – e.g. a BHPS file-matching routine for compiling data across multiple files (cf. PanelWhiz)
– Recodes (manual input or external file input)
– Aggregating/standardising variables
– Templates for weighted models in relevant packages
– {Perhaps: responding to leverage/diagnostics}
I could do many of these via templates which compile and run Stata/R command files. Is there any value in that (cf. just doing them in Stata/R!)? Is there value in writing code for the e-Stat engine itself?
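A template that "compiles" a Stata command file could be as simple as filling command text from user inputs. The sketch below builds a Stata `recode ..., generate()` command from range rules and wraps it in a do-file that loads and saves the data. The helper names, the rule format and the file names are hypothetical, and actually running the file (e.g. with Stata in batch mode) is left out.

```python
from typing import Optional

Rule = tuple[int, int, int]  # (low, high, new value), an inclusive range


def stata_recode(var: str, new: str, rules: list[Rule],
                 else_value: Optional[int] = None) -> str:
    """Build a Stata `recode ... , generate()` command from range rules."""
    parts = [f"{lo}/{hi}={val}" for lo, hi, val in rules]
    if else_value is not None:
        parts.append(f"*={else_value}")  # '*' catches all remaining values
    return f"recode {var} {' '.join(parts)}, generate({new})"


def stata_do_file(data_file: str, commands: list[str], out_file: str) -> str:
    """Wrap commands in a do-file that loads the data and saves the result."""
    lines = [f'use "{data_file}", clear', *commands,
             f'save "{out_file}", replace']
    return "\n".join(lines) + "\n"
```

For instance, `stata_recode("var1", "var2", [(1, 5, 1), (6, 10, 2)], 3)` produces the command `recode var1 1/5=1 6/10=2 *=3, generate(var2)`, i.e. the Stata recode example used on these slides.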

E.g.: BHPS panel merge macro (similar to PanelWhiz)
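At its core, a BHPS merge routine of this kind stacks wave-specific files into one long person-wave file. The sketch assumes BHPS naming conventions as we understand them: wave-specific variable names carry a wave-letter prefix ('a' for wave 1, 'b' for wave 2, and so on), and `pid` identifies persons across waves. The function `stack_waves` and the dict-based record layout are hypothetical, and this is not PanelWhiz's implementation.

```python
from string import ascii_lowercase
from typing import Any

Record = dict[str, Any]


def stack_waves(waves: dict[str, list[Record]], stems: list[str],
                id_var: str = "pid") -> list[Record]:
    """Stack wave files into one long (person-wave) file.

    `waves` maps a wave letter ('a', 'b', ...) to that wave's records; each
    wave-specific variable is assumed to be named <letter><stem>, e.g. 'ajbstat'.
    """
    rows: list[Record] = []
    for letter in sorted(waves):
        wave_no = ascii_lowercase.index(letter) + 1  # 'a' -> 1, 'b' -> 2, ...
        for rec in waves[letter]:
            row: Record = {id_var: rec[id_var], "wave": wave_no}
            for stem in stems:
                # A variable absent from this wave's file becomes None (missing)
                row[stem] = rec.get(letter + stem)
            rows.append(row)
    # Sort by person, then wave, as panel files usually are
    rows.sort(key=lambda r: (r[id_var], r["wave"]))
    return rows
```

Asking for variable stems explicitly, rather than stripping the first letter of every name, avoids mangling variables that merely happen to start with a wave letter.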

e.g. Recode examples (shown before)
Stata syntax: recode var1 1/5=1 6/10=2 *=3, generate(var2)
SPSS syntax: recode var1 (1 thru 5=1) (6 thru 10=2) (else=3) into var2.
Data matrix format: [table image not reproduced]
Manual entry is available in Stat-JR, but doesn't seem to preserve metadata?
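One way to keep metadata through a recode is to treat a variable as values plus labels, and to return a new labelled variable rather than editing the old one. This is a hypothetical sketch, not Stat-JR's data model. It roughly mirrors the recode examples: inclusive ranges map to new codes, and every value matching no rule (including missing ones here) gets the "else" code.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Variable:
    """A data column plus the metadata a bare data matrix would lose."""
    name: str
    values: list[Any]
    label: str = ""
    value_labels: dict[Any, str] = field(default_factory=dict)


def recode(var: Variable, new_name: str,
           rules: list[tuple[float, float, Any]], else_value: Any,
           value_labels: Optional[dict[Any, str]] = None) -> Variable:
    """Apply inclusive-range rules and return a new, labelled variable.

    Values matching no rule, including missing ones (None), get else_value.
    The source variable is left untouched.
    """
    out = []
    for v in var.values:
        for low, high, new in rules:
            if v is not None and low <= v <= high:
                out.append(new)
                break
        else:  # no rule matched this value
            out.append(else_value)
    return Variable(new_name, out,
                    label=f"Recode of {var.name}",
                    value_labels=dict(value_labels or {}))
```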

c) Workflows / e-books
Two main objectives:
– Documentation for replication (I think a syntax for Stat-JR would help here)
– Sensitivity analysis across multiple measures / models / data permutations:
  – Data storage/access
  – Linking different variables
  – Compiling results across many models

Idea of auto-compiled user notes?
Full account of models constructed ("What was that?")
– Of benefit to novice and advanced practitioners
– Potentially part of the e-notebook, but could be a linked (static) online guide
– e-Stat commands, to provide documentation for replication
– Terminologies used for the model / other user notes
– Software equivalents or near-equivalents (including estimator specifications)
– Algebraic expression and model abstract
Possible tools for storing/compiling multiple model results? (mentioned previously; cf. estimates table in Stata)
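Storing and compiling results across models, in the spirit of Stata's estimates store and estimates table, could work along the following lines. The `ResultsStore` class and its table layout are hypothetical, not an existing e-Stat or Stat-JR feature. Coefficients are passed in as plain dicts rather than read from a fitted model.

```python
class ResultsStore:
    """Hypothetical collector of coefficients across several models."""

    def __init__(self) -> None:
        self._models: dict[str, dict[str, float]] = {}

    def store(self, name: str, coefs: dict[str, float]) -> None:
        self._models[name] = dict(coefs)

    def table(self, fmt: str = "{:.3f}") -> str:
        """Side-by-side table: one row per term, one column per model."""
        names = list(self._models)
        terms: list[str] = []
        for coefs in self._models.values():  # terms in first-seen order
            for term in coefs:
                if term not in terms:
                    terms.append(term)
        rows = [["term", *names]]
        for term in terms:
            # '.' marks a term that does not appear in a given model
            rows.append([term, *(fmt.format(self._models[m][term])
                                 if term in self._models[m] else "."
                                 for m in names)])
        widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
        return "\n".join("  ".join(cell.rjust(w) for cell, w in zip(r, widths))
                         for r in rows)
```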

Any missing components of the model description user notes? (slight modification from Sept 2010)
1) e-Stat model syntax:
  model {
    for (i in 1:length(y36)) {
      y36[i] ~ dnorm(mu[i], tau)
      mu[i] <- cons[i] * beta0 + y8[i] * beta1
    }
    ….
2) e-Stat model: Template1Lev = Linear regression using MCMC
3) Model abstract/background information, e.g. something like: "This model is suitable for a single outcome measure with a continuous distribution. It is comparable to the widely used OLS regression model, and usually leads to near-identical results... [etc]. See … for further description."
4) Algebraic representation: [Image from LaTeX code]
5) Specification of the model in other popular packages:
  BUGS syntax: [input here]
  MLwiN syntax: [input here]
  R: [input here]
  Stata: MCMC estimation routines not available
6) Data copy [data after model, e.g. including new variables]
7) Outputs from model: log file; images
8) Variables summary [summary stats]
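For item 4, the model in the syntax above could be written as follows. This reads the BUGS `dnorm(mu, tau)` as a normal distribution with mean mu and precision tau (so the variance is 1/tau). Writing n for `length(y36)` is our notation, and the priors belong to the elided part of the syntax, so they are not shown.

```latex
\begin{aligned}
y_{36,i} &\sim \mathrm{N}\!\left(\mu_i,\ \tau^{-1}\right), \qquad i = 1, \dots, n,\\
\mu_i &= \beta_0\,\mathrm{cons}_i + \beta_1\, y_{8,i}.
\end{aligned}
```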

[Demo of Stata's estimates store here]

3) Some feedback on Stat-JR
My own current thoughts {see separate review notes file}:
– Look and feel
– A syntactical record of the model specification?
– Back and forward options; add the number of categories to the summary; pre-specified default settings (e.g. burn-in, cons, etc.)
– Make links to users' datasets easy; a data entry template(?)
– Export data as part of the output in popular formats
– Handling large numbers of data files and folders
– Any way to tie in metadata about the records, e.g. variable labels?

Dataset metadata in Stat-JR?
– Comparable options for variable labels, value labels and missing data are widely used/desirable
– An effort to bring these in could help
– This also relates to having the data open in another package at the same time
Could a functional form tool be incorporated?
– For every dataset, it would associate each variable with a basic functional form, i.e. metric, nominal or ordinal, that the user can set/change
– Impacts on data options: e.g. a separate summary window to summarise categorical variables, such as a frequency table/bar chart; options to derive dummy variables and recode values for categorical variables (some of this is similar to what's available in NESSTAR)
– Use this information in some model options (or preferably just let the user decide)?
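A functional form tool might hold a user-editable form (metric, nominal or ordinal) for each variable and use it to decide when categorical summaries and dummy derivation make sense. The sketch is hypothetical throughout. That includes the `guess_form` heuristic, which only pre-fills a default that the user can change.

```python
from collections import Counter
from enum import Enum
from typing import Any


class Form(Enum):
    METRIC = "metric"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"


def guess_form(values: list[Any], max_categories: int = 10) -> Form:
    """Hypothetical default: non-integer or many-valued numbers -> metric."""
    observed = [v for v in values if v is not None]
    if any(isinstance(v, float) and not v.is_integer() for v in observed):
        return Form.METRIC
    if (all(isinstance(v, (int, float)) for v in observed)
            and len(set(observed)) > max_categories):
        return Form.METRIC
    return Form.NOMINAL


class DatasetMeta:
    def __init__(self, columns: dict[str, list[Any]]) -> None:
        self.columns = columns
        self.forms = {name: guess_form(vals) for name, vals in columns.items()}

    def set_form(self, name: str, form: Form) -> None:
        self.forms[name] = form  # the user's choice overrides the guess

    def _categorical(self, name: str) -> list[Any]:
        if self.forms[name] is Form.METRIC:
            raise ValueError(f"{name} is metric, not categorical")
        return self.columns[name]

    def frequencies(self, name: str) -> Counter:
        """Frequency table for a nominal/ordinal variable (None = missing)."""
        return Counter(self._categorical(name))

    def dummies(self, name: str) -> dict[str, list[Any]]:
        """One 0/1 indicator per category; missing stays missing (None)."""
        values = self._categorical(name)
        categories = sorted({v for v in values if v is not None})
        return {f"{name}_{c}": [None if v is None else int(v == c) for v in values]
                for c in categories}
```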

Social science users
I've shown the alpha version to a couple of colleagues {comments notes doc from Chris Playford}:
– Impressed by the range of options and the potential for software comparisons
– Frightened by the specification options/terms; statistical outputs; point-and-click format; and the current installation requirements
The most common critical comment has been "why?", as in "[Stata] already does everything I need" and/or "I bet this doesn't work with large and complex data!!"
Think about the niche: sophisticated users can already use software, whilst basic users don't want advanced options? I suspect that training / pedagogical value is relevant here.

4) Outputs / Applications
Applications I'd most like to test:
– Evaluating different socio-economic measures for model performance (cf. GESDE services)
– Large-scale data compilation/analysis
To highlight some output opportunities:
– LWS/e-Stat/DAMES (NCRM/DRS) collaborative research seminar + book proposal, Sept/Oct 2011, on "Modelling key variables in social science research"
– Social stratification research conference, Sept 2011
– Training support: an installation package plus a good illustrative template for use at workshops, e.g. the Essex Summer School course, July 2011?