C Q e S S 1 E-Science and Statistical Modelling in Social Research Daniel Grose Audrienne Cutajar Bezzina CQeSS University of Lancaster.

Presentation transcript:

C Q e S S 1 E-Science and Statistical Modelling in Social Research Daniel Grose Audrienne Cutajar Bezzina CQeSS University of Lancaster

C Q e S S 2 Contents: Some Background on Statistical Methods and Social Research; Disentangling Complexity: Educational attainment, truancy and PT work (NCDS); ReDReSS; RELOAD and CopperCore; Demo of SAKAI; Questions.

C Q e S S 3 Some Background on Statistical Methods and Social Research

C Q e S S 4 Objectives of Social Science Research To develop evidence-based substantive theory. We want to know “what determines what”, e.g. the (wage) returns to education; To explore the consequences of policy changes on individual behaviour, e.g. the impact of increasing the staying-on rate at school on educational attainment and wages.

C Q e S S 5 Objectives of Social Science Research Randomised experiments offer the most powerful tool to meet these objectives, but outside of psychology they are infeasible, unethical or flawed (e.g. we cannot allocate pupils to different levels of education); Social scientists must therefore rely on observational data from longitudinal and other surveys, e.g. YCS, NCDS, BHPS, and this raises complications.

C Q e S S 6 Complication 1. Cluster Effects (CE) Most large scale surveys use multi-stage sample designs to obtain 'representative' samples; this procedure often creates cluster effects, e.g. BHPS (households), YCS (schools); Pupils in the same class are often more behaviourally alike than pupils in different classes (even in the same school); Some non-nested cluster structures can also be present, e.g. siblings (children of the same family) at different schools.

C Q e S S 7 Complication 1. Cluster Effects (CE) Procedures have been developed to take cluster effects into account by means of shared random effects in the model, e.g. MLwiN, Stata (gllamm); The estimation of non-identity link and non-nested CE models, e.g. probit, can be computationally demanding.
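The idea of a shared random effect can be illustrated with a small simulation. The sketch below is not taken from the slides: it assumes a balanced two-level design (pupils within classes), a Gaussian random intercept per class, and the standard one-way ANOVA estimator of the intraclass correlation. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_clustered(n_clusters=200, cluster_size=20,
                       sd_between=1.0, sd_within=2.0, seed=1):
    """Simulate y_ij = u_j + e_ij, where u_j is shared by everyone in cluster j."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sd_between)  # shared random effect for this cluster
        data.append([u + rng.gauss(0.0, sd_within) for _ in range(cluster_size)])
    return data


def anova_icc(data):
    """One-way ANOVA estimate of the intraclass correlation (balanced clusters)."""
    k = len(data)
    m = len(data[0])
    cluster_means = [statistics.fmean(c) for c in data]
    grand_mean = statistics.fmean(cluster_means)
    # Mean square between clusters and mean square within clusters.
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    msw = sum((y - cm) ** 2
              for c, cm in zip(data, cluster_means) for y in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(icc, cluster_size):
    """Variance inflation of a sample mean relative to simple random sampling."""
    return 1.0 + (cluster_size - 1) * icc


if __name__ == "__main__":
    sample = simulate_clustered()
    icc = anova_icc(sample)
    print("estimated ICC:", round(icc, 3))
    print("design effect:", round(design_effect(icc, 20), 2))
```

With these assumed variances the true intraclass correlation is 1 / (1 + 4) = 0.2, so treating the 4,000 pupils as independent would understate the variance of the sample mean several times over.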

C Q e S S 8 Complication 2. Measurement Errors (ME) Ignoring ME can seriously mislead the quantification of the link between explanatory and response variables; In observational studies, it is rarely possible to measure all relevant covariates accurately, e.g. age, educational attainment; ME in one covariate can bias the association between other covariates and the response variable, even if those other covariates are measured without error;

C Q e S S 9 Complication 2. Measurement Errors (ME) Also, some important determinants of behaviour are either not measured (i.e. omitted) or are unmeasurable (e.g. motivation); Repeated measures and longitudinal data provide the opportunity to deal with ME in explanatory variables, but this adds to the computational demands of the analysis.
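The claim that ME in one covariate can bias the coefficient of another, accurately measured covariate can be checked numerically. The following is a minimal sketch under assumed settings that are not from the slides: two standard normal covariates with correlation 0.6, true coefficients of 1 on each, and classical additive error with unit variance on the first covariate. The helper names are hypothetical.

```python
import random


def simulate(n=20000, seed=2):
    """Return y, the true x1, an error-prone measure w1 of x1, and x2."""
    rng = random.Random(seed)
    y, x1, w1, x2 = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 0.6 * a + 0.8 * rng.gauss(0.0, 1.0)  # corr(x1, x2) = 0.6, var(x2) = 1
        x1.append(a)
        x2.append(b)
        w1.append(a + rng.gauss(0.0, 1.0))       # classical measurement error
        y.append(1.0 * a + 1.0 * b + rng.gauss(0.0, 1.0))
    return y, x1, w1, x2


def ols_two_covariates(y, a, b):
    """OLS slopes of y on a and b (with intercept), via the 2x2 normal equations."""
    n = len(y)
    my, ma, mb = sum(y) / n, sum(a) / n, sum(b) / n
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    say = sum((ai - ma) * (yi - my) for ai, yi in zip(a, y))
    sby = sum((bi - mb) * (yi - my) for bi, yi in zip(b, y))
    det = saa * sbb - sab * sab
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det


if __name__ == "__main__":
    y, x1, w1, x2 = simulate()
    print("using true x1:       ", ols_two_covariates(y, x1, x2))
    print("using error-prone w1:", ols_two_covariates(y, w1, x2))
```

Under these assumptions the population coefficients with the error-prone measure are roughly 0.39 on w1 (attenuated) and 1.37 on x2 (inflated), even though x2 itself is measured without error.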

C Q e S S 10 Complication 3. Missing Data, Dropout and Selection All of the major data sets available to the British social science community (e.g. YCS, BHPS and NCDS) contain missing data and dropout; This can bias results based on the retained sample; We need to model, as realistically as possible, the process by which the observed subjects have been retained in the sample, otherwise we will not know how much bias is present in our results; Some sample designs create selection effects, e.g. by using a subset of locations, or oversampling the poor; These add to the computational demands of the analysis.
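One simple way to "model the retention process" is inverse probability weighting. The sketch below is an illustration rather than the method used in the talk, and it rests on a strong, labelled assumption: dropout depends only on a baseline covariate that is observed for everyone. The covariate, outcome values and retention rates are invented for the example.

```python
import random


def simulate_panel(n=20000, seed=3):
    """Each row is (x, y, retained); x is a baseline indicator observed for all."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 10.0 + 5.0 * x + rng.gauss(0.0, 2.0)
        p_retain = 0.3 if x == 1 else 0.9   # assumed: group x=1 drops out more often
        rows.append((x, y, rng.random() < p_retain))
    return rows


def complete_case_mean(rows):
    """Mean outcome among retained subjects only (ignores the dropout process)."""
    ys = [y for _, y, retained in rows if retained]
    return sum(ys) / len(ys)


def ipw_mean(rows):
    """Weight each retained subject by 1 / (estimated retention rate of its group)."""
    rate = {}
    for g in (0, 1):
        flags = [retained for x, _, retained in rows if x == g]
        rate[g] = sum(flags) / len(flags)
    num = sum(y / rate[x] for x, y, retained in rows if retained)
    den = sum(1.0 / rate[x] for x, _, retained in rows if retained)
    return num / den


if __name__ == "__main__":
    panel = simulate_panel()
    print("complete-case mean:", round(complete_case_mean(panel), 2))
    print("weighted mean:     ", round(ipw_mean(panel), 2))
```

With these assumed settings the full-sample mean is 12.5, while the complete-case mean converges to about 11.25. If dropout also depended on unobserved quantities, this weighting would not remove the bias and a joint model of outcome and retention would be needed.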

C Q e S S 11 Complication 4. Parametric Assumptions Our statistical tools are assumption rich: parametric linear predictors; parametric link functions and error structures; What if the assumed parametric relationships do not hold (e.g. the errors are not Gaussian)? We need more robust alternatives; BUT nonparametric statistical models are usually computationally intensive.

C Q e S S 12 Complication 5. Endogenous effects The curse of endogenous effects: everything seems to depend on everything else; We need multiprocess models (simultaneous equations) to disentangle this complexity, which adds to the computation.
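A small simulation shows why endogeneity matters. The sketch below uses a single-instrument instrumental variables estimator, which is a simpler relative of the multiprocess models mentioned on the slide and is swapped in here purely for illustration. The data-generating process, the instrument z and all names are assumptions.

```python
import random


def simulate_endogenous(n=20000, seed=4):
    """x and y share an unobserved factor u; z shifts x but affects y only via x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)                      # unobserved, e.g. motivation
        xi = zi + u + rng.gauss(0.0, 1.0)
        yi = 1.0 * xi + 2.0 * u + rng.gauss(0.0, 1.0)  # true effect of x on y is 1
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Slope from regressing y on x; biased when x is endogenous."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Instrumental variables slope with a single instrument z."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    z, x, y = simulate_endogenous()
    print("OLS slope:", round(ols_slope(x, y), 3))
    print("IV slope: ", round(iv_slope(z, x, y), 3))
```

Under these assumptions the OLS slope converges to about 1.67 rather than the true value of 1, because x is correlated with the omitted factor u; the IV slope recovers the true effect only as long as z is a valid instrument.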

C Q e S S 13 Disentangling complexity with existing tools: an example These are the kind of examples that got me interested in e-Science. As we start to acknowledge more fully the stochastic complexity of social processes, our results will change.

C Q e S S 14 Example 1: Allowing for Cluster effects Software: Stata (e.g. dprobit with the cluster option), MLwiN, aML, SAS. What happens if we have more than one response, e.g. training and promotion? Standard software can’t do it. What happens if we have previous outcomes in the model? Standard software can’t do it.

C Q e S S 15 Example 2: Allowing for Endogenous effects Simultaneous equation systems; Commands in Stata; Commands in aML.

C Q e S S 16 Some existing web based tools Nesstar allows 66 major datasets to be explored online; Only uses one data set at a time; Has very limited facilities for sub-setting and none for fusing; Restricted statistical facilities, e.g. descriptive analysis, linear regression; No facilities for handling missing data.

C Q e S S 17 Joining Up the Analysis Cycle [Diagram labels: Main ESDS Data Sets; Select Data Set and Appropriate Variables; TTWA Data, NOMIS; Merge Files: Add Variables; Working Data; Contextual Data; Results]

C Q e S S 18 Portals make all our e-tools easier to use Portals provide a framework to deploy our e-tools (aka rectangles); they focus on how the user wants to arrange these “rectangles”; The portal allows component integration: the goal is for the tools to work together closely and to appear to be parts of a larger “tool”.

C Q e S S 19 SAKAI Provides our VRE Portal Sakai = Collaboration & Research/Learning Environment [Diagram labels: Portal; Discussion, Video Conf and VOIP; GE Resource Discovery; E-Collaboration Portlets; GE DBMS; GE Statistical Analysis; Quantitative Methods Portlets; Res 1 to Res 6]

C Q e S S 20 Sakai Sakai is open source; it is the hosting framework of choice for VLE and VRE (OGCE) development in the US; Big investment from Mellon Foundation and Ivy League Universities ($6.8M); Sakai 2.0 (release 10th June 2005) will take WSRP-compliant portlets.

C Q e S S 21 Using WSRP to federate across sites and provide extreme user flexibility in presentation [Diagram labels: Portal; Sakai tool; Non-Sakai Tool; Non-Sakai Non-Java Tools; HTTP; WSRP]

C Q e S S 22 LDCue for Structuring Content LDCue integrates content created by most standard authoring systems (incl. video) that is visible on the web; A resource discoverer will be able to specify where they are now and where they want to be; they are then supplied, by the LDCue tool, with a list of potentially suitable learning object URIs; The metadata on these URIs are then used to create learning designs that sequence material (read this first, then this, etc.).

C Q e S S 23 Reload & CopperCore Just like a musician, Reload is used to compose the structure for the learning design. The learner is the deejay who plays back the learning design created in Reload.

C Q e S S 24 Reload & CopperCore (cont) CopperCore is the medium used to play back the learning design created in Reload. CopperCore gives a structure to the learning modules, and keeps track of what has been covered by the learner.

C Q e S S 25 Reload Structure The IMS Learning Design package within Reload is made up of the following tabs: –General –Roles –Environment –Activities –Methods –Resources

C Q e S S 26 General

C Q e S S 27 General (cont) This contains the top level information for the IMS Learning Design. The most important to fill in are the objectives, requirements, description and overview.

C Q e S S 28 Roles

C Q e S S 29 Roles (cont) This tab allows the user to define learner and staff roles, each with different characteristics. Various information can be added, such as the minimum and maximum size of a group.

C Q e S S 30 Environment This describes the environments in which the learning occurs.

C Q e S S 31 Activities

C Q e S S 32 Activities (cont) In here, the designer can group activities together from a selection of resources. The activities can be presented as selections or in sequence.

C Q e S S 33 Method

C Q e S S 34 Method (cont) Learning designs consist of one or more plays, each with one or more acts following sequentially. The roles need to be specified here as well.
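The play/act structure described above can be pictured as a small nested data structure. The sketch below is a simplified illustration only: it is not the IMS Learning Design schema, nor the Reload or CopperCore API, and every class, field and activity name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RolePart:
    """Links a role (e.g. learner or staff) to an activity within an act."""
    role: str
    activity: str


@dataclass
class Act:
    title: str
    role_parts: List[RolePart] = field(default_factory=list)


@dataclass
class Play:
    title: str
    acts: List[Act] = field(default_factory=list)  # acts run in list order


@dataclass
class Method:
    plays: List[Play] = field(default_factory=list)

    def validate(self) -> None:
        """A design needs at least one play, and each play at least one act."""
        if not self.plays:
            raise ValueError("a method needs at least one play")
        for play in self.plays:
            if not play.acts:
                raise ValueError(f"play {play.title!r} needs at least one act")


def schedule(method: Method, role: str) -> List[str]:
    """Activities for one role, in the order the acts are played."""
    return [rp.activity
            for play in method.plays
            for act in play.acts
            for rp in act.role_parts
            if rp.role == role]


if __name__ == "__main__":
    method = Method(plays=[Play("Main play", acts=[
        Act("Act 1", [RolePart("learner", "Read introduction"),
                      RolePart("staff", "Monitor progress")]),
        Act("Act 2", [RolePart("learner", "Complete exercise")]),
    ])])
    method.validate()
    print(schedule(method, "learner"))
```

Keeping acts in an ordered list mirrors the "following sequentially" wording on the slide, and attaching a role to each activity reflects the note that roles are specified in the method.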

C Q e S S 35 Resources The Resources tab allows the user to manage the resources needed by the Learning Design.

C Q e S S 36 Validation When the learning design has been saved, create a zip file and upload it into CopperCore.

C Q e S S 37 Uploading If there are no errors, this is what you get.

C Q e S S 38 Running CopperCore

C Q e S S 39 Running CopperCore After entering the student names, and setting runs and roles, this is what happens.

C Q e S S 40 CopperCore (cont) Clicking on the run will open a new web browser.

C Q e S S 41 RELOAD Project Create a learning design in Reload; use CopperCore to play it back.

C Q e S S 42

C Q e S S 43 Advantages of LDCue over a search engine on the web Search engines do not sequence material by difficulty/complexity; With Learning Design you get semantically coherent content; Search engines (e.g. Google) typically give associative learning, which can be inefficient, especially when you get a lot of hits.

C Q e S S 44 Some of the VRE Tools we have written E-Collaboration: Distributed Whiteboard; Voice and Video over IP; Broadcast Display (e.g. Word and PowerPoint). E-Discovery: LDCue for Structuring Content.

C Q e S S 45 ReDReSS ReDReSS is a joint project between Lancaster University and CCLRC Daresbury. It is a training and awareness project in e-Science and e-Social Science. We are commissioning social scientists to write material for our portal.

C Q e S S 46 Content Jan-May 2005 [Labels: ReDReSS; NCeSS; NCeSS Conference paper; Other]

C Q e S S 47 Content Jan-May 2005 (cont) [Labels: Finished; NCeSS; Other; NCeSS/ReDReSS]

C Q e S S 48 Content May-Aug 2005 [Labels: ReDReSS; NCeSS; NCeSS Conference Paper; ReDReSS/NCeSS]

C Q e S S 49 Content May-Aug 2005 (cont) [Labels: ReDReSS; NCeSS; Other; ReDReSS/NCeSS]

C Q e S S 50 Any Questions?