UPTAP Workshop 2007: How Can e-Social Science Promote the Re-Use of Data? Rob Procter, National Centre for e-Social Science

Presentation transcript:

UPTAP Workshop How Can e-Social Science Promote the Re-Use of Data? Rob Procter National Centre for e-Social Science

The e-Science Vision
• “e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it.” (John Taylor, former DG, Research Councils)
• That infrastructure is the Grid: “… a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources” (Foster, Kesselman and Tuecke)
• The Grid is not just an enabler of visionary research, however, but can help researchers in more mundane ways.
• But, to be successful, the development of the Grid must be driven by researchers’ needs.
• I want to use the opportunity provided by this workshop to gather ideas from you about what those needs are, with a specific focus on the (re-)use of data.

NCeSS Overview
• Launched in May 2004 to develop and promote UK e-Social Science.
• Unified Centre with distributed structure:
  – Co-ordinating Hub: Manchester & UKDA
  – Seven research Nodes located across UK
  – Twelve small projects

NCeSS Overview
• Applications of e-Social Science:
  – Harnessing new kinds of research infrastructure and tools to tackle substantive problems and promote innovation in research methods
• Social shaping:
  – Usability of new infrastructure and tools
  – Socio-technical factors in their design, uptake and use
  – Research and policy drivers, impacts

NCeSS 2006
[Diagram of NCeSS in 2006. Category labels: Hub, Social Shaping, Tools, Analysis, Infrastructure and services, Research methods, Data. Node and project labels: CQeSS, MoSeS, PolicyGrid, Disclosure Risk Assessment, CeSDeMIDE, GeSRM, Intelligent Simulation, MiMeG, HeadTalk, OeSS, DReSS, AGN enabled interviews, Learning Disabilities, Entangled Data, Data chronicles, Replayer, Grid-enabled data collection, GeoVUE, GeODE.]

Today’s Research Infrastructure
• Heterogeneous resources with poor interoperability and complex administrative arrangements.
• Doesn’t scale well and makes re-use and sharing of data and other research resources difficult.
[Diagram labels: Researcher, HPC, Analysis, Data archive, Study, Experiment, Computing]

Grid-Enabled Research Infrastructure
Grid middleware manages the interactions between users, and heterogeneous and distributed resources, providing seamless integration of data, analytic tools and compute resources.
[Diagram labels: Social scientist, Grid middleware, Storage, Computing, Analysis, Experiment, HPC, Data archive, Study]

The Grid Dissected
• Tools to support collaboration between distributed researchers.
• Computational Grids for scalable, high-performance computation.
• Data Grids for accessing and integrating heterogeneous datasets.
• Sensor Grids for collecting real-time data.

Research and Policy Drivers
[Diagram labels: Ageing population; Migration; Globalisation; Childhood development; Census and population surveys; Administrative data; Longitudinal surveys; Socio-medical data; Business and economic data; International macro/micro data]

Research and Policy Drivers
• The range of research resources on offer to the social science community has never been greater.
• These include not only traditional research datasets, but new kinds of social data.
• However, the often highly distributed and heterogeneous character of these datasets makes it difficult to exploit them to their full potential.

Research and Policy Drivers
• The data deluge in social sciences:
  – The WWW archive currently contains 55 billion Web pages, or 2 petabytes (2×10^15 bytes) of data, and is growing at the rate of 20 terabytes (20×10^12 bytes) per month
• Administrative and transactional data is generated on an increasing scale as a by-product of our everyday activities:
  – This data is complex and multi-dimensional

Data Grids for Social Science
• Data Grids are designed to provide unimpeded and integrated use of distributed, heterogeneous, autonomous data resources.
• Grid enabling a dataset creates new opportunities for (re-)use:
  – enables users to integrate it with other datasets
  – makes it possible to analyse the dataset using techniques that require the kind of computational power that is only feasible using the Grid (e.g., more complex models, more data points)
  – standardisation of the procedures and mechanisms used to access and update the dataset increases its shareability
  – automated analyses (i.e., analyses can be re-run automatically when databases are updated)
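The idea of automated analyses can be illustrated with a minimal Python sketch. It is not taken from the slides or from any Grid middleware: the names `fingerprint`, `AutoAnalysis` and `mean_income` are hypothetical, and the "database" is just an in-memory list. The sketch assumes that a content hash is an acceptable way to detect that a dataset has been updated.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 digest of the dataset contents (hypothetical helper)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AutoAnalysis:
    """Re-run an analysis only when the underlying data has changed."""

    def __init__(self, analysis):
        self.analysis = analysis
        self.last_fingerprint = None
        self.result = None
        self.runs = 0  # number of times the analysis was actually executed

    def refresh(self, rows):
        current = fingerprint(rows)
        if current != self.last_fingerprint:
            # Data is new or has been updated: recompute and remember the state.
            self.result = self.analysis(rows)
            self.last_fingerprint = current
            self.runs += 1
        return self.result


def mean_income(rows):
    """Toy analysis: mean of an illustrative 'income' field."""
    return sum(r["income"] for r in rows) / len(rows)


job = AutoAnalysis(mean_income)
data = [{"income": 20000}, {"income": 30000}]
first = job.refresh(data)      # analysis runs
again = job.refresh(data)      # unchanged data, cached result is returned
data.append({"income": 40000})
updated = job.refresh(data)    # data updated, analysis re-runs
```

In a real Data Grid the trigger would more plausibly come from the data service itself rather than from polling a hash; the sketch only shows the re-run-on-update behaviour.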

An Example Data Linkage Problem
• Many research questions require combination of data from multiple geo-referenced datasets:
  – E.g., linking postcoded data to census geography
• Conversion of data relating to different geographies to a common target geography:
  – Is a complex, time-consuming task
  – Requires a range of data handling/processing skills
  – Is a major barrier to use!
• The data conversion process requires users to perform the following generic tasks:
  – Extract and download data in different formats from a number of databases using different interfaces
  – Convert each dataset to the desired target geography using geographical conversion tables
  – Combine the converted sets into a single dataset for analysis
• These generic tasks can be automated.
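The convert and combine tasks listed above can be sketched in a few lines of Python. This is an illustration only, not how ConvertGrid works: the conversion table, the unit names (`PC1`, `WardA`, and so on), the figures, and the functions `convert` and `combine` are all hypothetical. It assumes that each row of a conversion table gives the share of a source unit that falls within a target unit, and that counts can be re-apportioned in proportion to that share.

```python
from collections import defaultdict

# Hypothetical conversion table: (source unit, target unit, weight).
# The weights for each source unit sum to 1.
CONVERSION = [
    ("PC1", "WardA", 1.0),
    ("PC2", "WardA", 0.25),
    ("PC2", "WardB", 0.75),
]


def convert(values, table):
    """Re-apportion counts keyed by source unit onto the target units."""
    converted = defaultdict(float)
    for source, target, weight in table:
        if source in values:
            converted[target] += values[source] * weight
    return dict(converted)


def combine(*named_datasets):
    """Merge converted datasets into one record per target unit."""
    merged = defaultdict(dict)
    for name, data in named_datasets:
        for unit, value in data.items():
            merged[unit][name] = value
    return dict(merged)


# Two made-up datasets held on the same source geography.
house_sales = {"PC1": 100, "PC2": 40}
students = {"PC1": 10, "PC2": 8}

sales_by_ward = convert(house_sales, CONVERSION)
students_by_ward = convert(students, CONVERSION)
linked = combine(("sales", sales_by_ward), ("students", students_by_ward))
```

Proportional weighting suits counts; rates or averages would need a different treatment, which the sketch does not attempt.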

A Solution: ConvertGrid
• ConvertGrid provides access to 225 UK-wide geography conversion tables between census, electoral, administrative, postal, health and statistical geographies derived from the AFPD.
• Facility to convert a researcher’s data from one set of geographical units to another (e.g., from postcode geography to health geography).
• Extensible system: further conversion tables from any source can be incorporated.

ConvertGrid – Data Visualisation Interface
• Relationship between average house price sales (Experian) and percentage of year olds entering university (Neighbourhood Statistics & Census aggregate statistics).
• Contact Keith Cole for more
• Ten minutes from start to finish
[Map annotations: “High average house price sales but low participation rates”; “Low average house price sales but high participation rates”]

Supporting the Research Lifecycle
[Diagram of the research lifecycle. Stage labels:]
• Share results and conclusions and discuss with collaborators
• Explore datasets and determine suitability
• Analyse results and compare with hypothesis
• Review literature and generate hypothesis
• Write papers
• Build models and execute them
• Publish papers
• Find datasets related to proposed area of work

Increasing (Re-)Use of Social Data
• Removing barriers to more effective use of existing social data collections:
  – Data providers (e.g., ONS, data archives)
  – Data users
• Many researchers are both generators and users of data:
  – Preparation of data for submission to data archives is not well rewarded so re-use suffers
• Removing barriers to use of new kinds of social data:
  – Privacy and confidentiality of personal data

The Data Provider Perspective
• Preparation procedures:
  – Cleaning the data
  – Generating derived variables
  – Re-weighting
  – Adding metadata
  – Writing user documentation
• Maintenance:
  – Managing changes in sampling frames, definitions, variables and questionnaire over time
  – Re-weighting
• User support:
  – Handling queries from users about concepts, meaning and linking waves

The Data User Perspective
• Discovering appropriate data:
  – Determining what can be done with the data and how.
• Accessing the data:
  – Are existing provisions, such as VMDLs, for access to confidential data adequate?
• Understanding how the data has been used to generate answers to other research questions:
  – Provenance of results, links to publications
  – Re-running statistical models, comparing results
• Ease of use and quality of documentation:
  – User manuals

The Data User Perspective
• Data preparation:
  – Selecting variables
  – Linking waves
  – Linking data sets
• Performing and possibly repeating analysis with different data.
• Interpreting and visualising results.
• Supporting the research lifecycle.
• Collaboration with other users and with data providers.

Contacting NCeSS and Getting Involved
  – Join our list:
  – Participate in events:
    Agenda setting workshop on combining and sharing data, January 22nd-23rd, Manchester
    Annual conference