ConvertGrid: Grid Enabling Population Datasets
Keith Cole, National Centre for e-Social Science (NCeSS) & MIMAS, University of Manchester

Presentation Overview
• Data Grids and the e-Social Science vision
• The ConvertGrid pilot demonstrator project
  – An example of Grid enabling population datasets & existing web-based services
  – Lessons learned
• Building the Social Science Data Grid
  – The next steps
• GEMS project

What are the benefits of Data Grids for Social Science?
• Data Grids facilitate unimpeded use of distributed, heterogeneous, autonomous data resources.
  – They provide an integrated view of the data resources that allows users to interact with them as if they constituted a single, global, integrated data resource.
• Grid enabling a dataset creates new opportunities for its use:
  – it enables users to integrate the dataset with other datasets;
  – it makes it possible to analyse the dataset using techniques that require computational power only feasible on the Grid (e.g. more complex models, more data points);
  – standardising the procedures and mechanisms used to access and update the dataset increases its shareability.

The Social Science Data Grid Vision
• The data resource (e.g. a database) is placed behind ‘wrapper’ middleware.
• Once wrapped, ‘mediator’ middleware can be employed for data access.
• Once a data resource is Grid-enabled, its availability can easily be advertised in registries.
• June’s application can now access data on inflation and VAT as if Joan’s and Javier’s data were hers and held in Manchester.
• Analysis can be re-run automatically when databases are updated.
• It all sounds so easy in theory! Now let’s see a real example!
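
The wrapper/registry/mediator arrangement on this slide can be sketched in a few classes. This is a minimal, hypothetical illustration, not ConvertGrid or OGSA-DAI code. `DataWrapper`, `InMemoryWrapper`, `Registry` and `Mediator` are invented names, and the in-memory tables stand in for remote databases.

```python
from typing import Dict, List, Protocol


class DataWrapper(Protocol):
    """Uniform interface a hypothetical 'wrapper' puts in front of one resource."""

    name: str

    def fetch(self, variable: str) -> Dict[str, float]:
        """Return {area_code: value} for one variable."""
        ...


class InMemoryWrapper:
    """Hypothetical wrapper over a local dict; a real one would front a database."""

    def __init__(self, name: str, table: Dict[str, Dict[str, float]]) -> None:
        self.name = name
        self._table = table  # variable -> {area_code: value}

    def fetch(self, variable: str) -> Dict[str, float]:
        return dict(self._table[variable])


class Registry:
    """Records which wrapped resource advertises which variable."""

    def __init__(self) -> None:
        self._by_variable: Dict[str, DataWrapper] = {}

    def advertise(self, wrapper: DataWrapper, variables: List[str]) -> None:
        for variable in variables:
            self._by_variable[variable] = wrapper

    def lookup(self, variable: str) -> DataWrapper:
        return self._by_variable[variable]


class Mediator:
    """Presents many wrapped resources to a client as one integrated view."""

    def __init__(self, registry: Registry) -> None:
        self._registry = registry

    def query(self, variables: List[str]) -> Dict[str, Dict[str, float]]:
        # Resolve each variable via the registry, fetch it, merge by area code.
        merged: Dict[str, Dict[str, float]] = {}
        for variable in variables:
            source = self._registry.lookup(variable)
            for area, value in source.fetch(variable).items():
                merged.setdefault(area, {})[variable] = value
        return merged
```

A client such as June's application would call `Mediator(registry).query(["inflation", "vat"])` and receive one table keyed by area. It would not need to know that the two variables live in different resources.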

ConvertGrid – An e-Social Science Pilot Demonstrator Project
• Research context:
  – Research questions that require combining data from multiple geo-referenced datasets, which requires users to perform the following generic tasks:
    · Extract data from a number of datasets using different interfaces
    · Convert each set of data to the desired target geography
    · Combine the converted sets into a single set of data
• ConvertGrid objectives:
  – To Grid enable existing socio-economic data sources;
  – To use Grid technologies to extend the functionality of an existing web-based data service (i.e. Convert);
  – To demonstrate how Grid technologies can automate complex workflows;
  – To build a user interface to a Grid-based service that is suitable for student/teaching use.
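
The three generic tasks (extract, convert, combine) compose into a simple pipeline. The sketch makes several assumptions. Each source is a function returning counts keyed by its native area codes, together with a lookup table giving the share of each source area that falls in each target area. All names are hypothetical. Apportioning values by shares like this is only valid for counts, not for rates or averages.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lookup format: (source_area, target_area) -> share of the source
# area's count that falls in the target area (shares per source area sum to 1).
Lookup = Dict[Tuple[str, str], float]


def convert(values: Dict[str, float], lookup: Lookup) -> Dict[str, float]:
    """Re-aggregate counts from a source geography to the target geography."""
    out: Dict[str, float] = {}
    for (src, tgt), share in lookup.items():
        if src in values:
            out[tgt] = out.get(tgt, 0.0) + values[src] * share
    return out


def combine(named: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Join converted variables into one table keyed by target area."""
    table: Dict[str, Dict[str, float]] = {}
    for name, values in named.items():
        for area, value in values.items():
            table.setdefault(area, {})[name] = value
    return table


def run_workflow(
    sources: List[Tuple[str, Callable[[], Dict[str, float]], Lookup]],
) -> Dict[str, Dict[str, float]]:
    # Each source: (variable name, extract function, lookup to target geography).
    converted = {name: convert(extract(), lookup) for name, extract, lookup in sources}
    return combine(converted)
```

Keeping extraction behind a function means each dataset's own interface stays hidden from the conversion and combination steps.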

Different Target Geographies

ConvertGrid – Data Sources Used
• Data sources
  – 1991 LBS/SAS (1991 Census geographies)
  – ONS Neighbourhood Statistics (1998 Wards & LADs)
  – Experian (2000 Postcode Sectors)
  – All Fields Postcode Directory (AFPD) (1999b)
• Selection criteria
  – Data on a range of themes (Health, Education and Crime use cases)
  – Different geographies and time points
  – AFPD-derived conversion tables available for these geographies via Convert
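
A postcode directory assigns each postcode to the areas it falls in across several geographies, so a crude conversion table can be derived from it. Assume each directory row is reduced to (postcode, source area, target area). Each (source, target) pair is then weighted by its share of the source area's postcodes. This postcode-count weighting and the function name are assumptions for illustration. Convert's actual tables may weight by households or population instead.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def derive_lookup(
    directory: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], float]:
    """Hypothetical: build (source_area, target_area) -> share from directory rows.

    Each row is (postcode, source_area, target_area). The share is the fraction
    of the source area's postcodes that lie in the target area.
    """
    pair_counts: Counter = Counter()
    source_counts: Counter = Counter()
    for _postcode, src, tgt in directory:
        pair_counts[(src, tgt)] += 1
        source_counts[src] += 1
    # Shares for each source area sum to 1 by construction.
    return {
        (src, tgt): n / source_counts[src]
        for (src, tgt), n in pair_counts.items()
    }
```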

Example Use Case – Crime Theme
• Spatial correlation of recorded burglaries with house prices and other indicators of social wellbeing/deprivation.
• Study target geography
  – 1998 LAD
• Datasets required:
  – 1991 Census
    · Total population (1991 ward)
    · Unemployment (1991 ward)
    · Overcrowding (1991 ward)
  – Neighbourhood Statistics 1998 data
    · Population estimates (1998 ward)
    · Recorded household burglaries (1998 LAD)
  – Experian 1999 supply
    · Total population (1999 PCS)
    · Annual average house sale value (1999 PCS)
    · Population in MOSAIC Group A (1999 PCS)
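
Once every variable is on 1998 LADs, the analysis step for this use case might look like the following. It assumes the inputs are already at LAD level. Average house sale values cannot be apportioned like counts, so they are assumed to have been re-weighted already. The sketch uses a plain Pearson correlation across LADs, which ignores spatial autocorrelation. A real spatial analysis would need more than this. Function names are hypothetical.

```python
import math
from typing import Dict, List


def rate_per_1000(counts: Dict[str, float], population: Dict[str, float]) -> Dict[str, float]:
    """Events per 1,000 residents for every area with a positive population."""
    return {
        area: 1000.0 * n / population[area]
        for area, n in counts.items()
        if population.get(area, 0) > 0
    }


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sample has zero variance")
    return cov / (sx * sy)


def burglary_vs_house_price(
    burglaries: Dict[str, float],
    population: Dict[str, float],
    mean_price: Dict[str, float],
) -> float:
    """Hypothetical: correlate burglary rates with mean house price across LADs."""
    rates = rate_per_1000(burglaries, population)
    lads = sorted(a for a in rates if a in mean_price)  # LADs present in both
    return pearson([rates[a] for a in lads], [mean_price[a] for a in lads])
```

On Python 3.10 or later, `statistics.correlation` could replace the hand-written `pearson`.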

Use Case – Health Theme
• A health researcher wishing to look for relations between the incidence of coronary heart disease and other demographic factors.
• Study target geography
  – 1998 Primary Care Group
• Datasets required:
  – 1991 Census
    · Total population (1991 ward)
    · Limiting Long Term Illness (1991 ward)
    · Unemployment (1991 ward)
    · Ethnicity (1991 ward)
  – Neighbourhood Statistics 1998 data
    · Population estimates (1998 ward)
    · Heart disease diagnosis episodes (1998 LAD)
  – Experian 1999 supply
    · Total population (1999 PCS)
    · Population in MOSAIC Group A (1999 PCS)
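
Writing the Health use case down as data makes it explicit what the system has to supply. In particular, it shows how many distinct geography conversions are needed to reach the target. The `Request` class and function name below are hypothetical. The sketch only lists the lookups required and does not claim that Convert held a table for each of them.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical description of one requested variable."""

    dataset: str
    variable: str
    native_geography: str


TARGET = "1998 Primary Care Group"

# The Health use case as listed on the slide.
HEALTH_USE_CASE: List[Request] = [
    Request("1991 Census", "Total population", "1991 ward"),
    Request("1991 Census", "Limiting Long Term Illness", "1991 ward"),
    Request("1991 Census", "Unemployment", "1991 ward"),
    Request("1991 Census", "Ethnicity", "1991 ward"),
    Request("Neighbourhood Statistics 1998", "Population estimates", "1998 ward"),
    Request("Neighbourhood Statistics 1998", "Heart disease diagnosis episodes", "1998 LAD"),
    Request("Experian 1999 supply", "Total population", "1999 PCS"),
    Request("Experian 1999 supply", "Population in MOSAIC Group A", "1999 PCS"),
]


def conversions_needed(requests: List[Request], target: str) -> Set[Tuple[str, str]]:
    """Distinct (native geography, target geography) lookups the workflow needs."""
    return {(r.native_geography, target) for r in requests if r.native_geography != target}
```

Eight variables reduce to four lookups (from 1991 wards, 1998 wards, 1998 LADs and 1999 postcode sectors). Each lookup can be applied to every variable that shares that native geography.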

ConvertGrid Architecture (Techies only!)

ConvertGrid – Services Provided
• Converts data sources with different native geographies to a common target geography and outputs the combined data either:
  – as a data stream in CSV or XML format, or
  – transferred to a web-based visualisation system
• Grid-enabled datasets (incl. AFPD)
  – Available to other Grid services
• Accessible to users via a ‘classic’ web-based interface
  – Essential for demonstration purposes
  – Step-by-step guide developed
• Extensible system
  – Available to other applications via a web services interface
  – Easy to add other Grid-enabled datasets to the system
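
Emitting the combined table as CSV or XML needs only the Python standard library. The slides do not say what XML schema ConvertGrid used. The element and attribute names here (`dataset`, `geography`, `area`, `code`, `value`, `name`) and the function names are assumptions for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # target area code -> {variable: value}


def to_csv(table: Table, variables: List[str]) -> str:
    """One row per area; a missing value becomes an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["area"] + variables)
    for area in sorted(table):
        writer.writerow([area] + [table[area].get(v, "") for v in variables])
    return buf.getvalue()


def to_xml(table: Table, variables: List[str], geography: str) -> str:
    """Hypothetical schema: <dataset><area code=..><value name=..>..</value></area></dataset>."""
    root = ET.Element("dataset", geography=geography)
    for area in sorted(table):
        area_el = ET.SubElement(root, "area", code=area)
        for v in variables:
            if v in table[area]:
                ET.SubElement(area_el, "value", name=v).text = str(table[area][v])
    return ET.tostring(root, encoding="unicode")
```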

ConvertGrid – Data Visualisation Interface
• Relationship between average house sale prices (Experian) and percentage of year olds entering university (Neighbourhood Statistics & Census aggregate statistics)

ConvertGrid – Issues and Challenges
• Establishment of a Grid infrastructure
  – Early adopter of the National Grid Service Data Node
  – Key Grid middleware still under rapid development
• Database migration problems
  – SQLServer to Oracle on the National Grid Service
  – Maintaining multiple databases is resource intensive
• Data comparability issues are a problem
  – Postcode formats
• Developing metadata registries
  – For resource discovery, data access and interpretation
• System performance, scalability and security
  – OGSA-DAI still relatively inefficient
  – Implementation of Grid security is non-trivial
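
One common form of the postcode-format problem is that sources write the same postcode differently, for example "m139pl", "M13 9PL" or a space-padded "M1  1AA". Records then fail to match when joined. A minimal normalisation, assuming the usual UK structure of an outward code followed by a three-character inward code (digit, letter, letter), is sketched below. The function name is hypothetical. It checks only length and inward-code shape, so it is not full postcode validation.

```python
import re

# Inward code: one digit followed by two letters (e.g. "9PL").
_INWARD = re.compile(r"[0-9][A-Z]{2}")


def normalise_postcode(raw: str) -> str:
    """Hypothetical helper: return 'OUTWARD INWARD' with exactly one space.

    Strips all whitespace, upper-cases, and re-inserts a single space before
    the three-character inward code. Only the compact length (5-7 characters)
    and the inward-code shape are checked.
    """
    compact = "".join(raw.split()).upper()
    if not 5 <= len(compact) <= 7 or not _INWARD.fullmatch(compact[-3:]):
        raise ValueError(f"unrecognised postcode: {raw!r}")
    return f"{compact[:-3]} {compact[-3:]}"
```

Applying the same function to the postcode column of every source before joining lets differently formatted records match.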

Grid Enabling Data – The Next Steps
• Establishing a social science data Grid is a key component of the wider e-Social Science strategy.
• The current social science data infrastructure (academic and non-academic) needs to be Grid enabled in a standards-compliant and sustainable way.
• Data service infrastructures need to be able to support multiple forms of access.
• MIMAS is being funded by JISC to Grid enable the 2001 Census aggregate statistics via OGSA-DAI on the NGS (GEMS project).
• The NCeSS Hub and Nodes will have a key role to play in addressing many of the key technical and methodological issues.
• Grid enabling the underlying databases may turn out to be the easy bit! Methodologies and intermediary applications/interfaces to facilitate data integration/analysis are much harder!

GEMS – Grid Enabling MIMAS Services
• Establishing production data grids to support e-Research
• Connecting the MS SQLServer databases holding the 2001 Census aggregate data directly to the Grid via the NGS
• Grid enabling the current data access system (Casweb)
• Maximising and building upon the ESRC/JISC investment in the establishment of an existing social science data infrastructure

GEMS Functionality
• Transform query results into a variety of formats (CSV, HTML, etc.) by employing built-in or user-uploaded XSL Transform scripts
• Upload query results to a Grid/FTP server
• View the SQL generated by the user interface for further integration into an OGSA-DAI client
• Redirect query results to a Grid service/OGSA-DAI activity for further processing
• Bulk upload query results to a user-specified OGSA-DAI-enabled database
• Implement secure access management
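
Two of these functions can be sketched with Python's built-in `sqlite3` standing in for both the Census database and the user's target database: viewing the SQL a user interface generates, and bulk uploading the results elsewhere. This neither uses nor imitates the OGSA-DAI API. The catalogue, table and column names and the function names are hypothetical. Column names are accepted only from a whitelist and area codes are bound as parameters, so user input never reaches the SQL text directly.

```python
import re
import sqlite3
from typing import List, Sequence, Tuple

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Hypothetical catalogue: table -> (area column, value columns the UI offers).
CATALOGUE = {"census2001_ward": ("area_code", {"all_people", "households"})}


def _check(name: str) -> str:
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"bad identifier: {name!r}")
    return name


def build_query(table: str, columns: Sequence[str], areas: Sequence[str]) -> Tuple[str, List[str]]:
    """Return the SQL the interface would show the user, plus bound parameters."""
    if table not in CATALOGUE:
        raise ValueError(f"unknown table: {table!r}")
    area_col, offered = CATALOGUE[table]
    if not columns or not set(columns) <= offered or not areas:
        raise ValueError("empty selection or column not offered")
    marks = ", ".join("?" for _ in areas)
    sql = (f"SELECT {area_col}, {', '.join(columns)} FROM {table} "
           f"WHERE {area_col} IN ({marks}) ORDER BY {area_col}")
    return sql, list(areas)


def bulk_upload(dest: sqlite3.Connection, table: str, cursor: sqlite3.Cursor) -> int:
    """Copy an executed result set into `table` in another database."""
    header = [_check(d[0]) for d in cursor.description]
    cols = ", ".join(header)
    marks = ", ".join("?" for _ in header)
    rows = cursor.fetchall()
    dest.execute(f"CREATE TABLE IF NOT EXISTS {_check(table)} ({cols})")
    dest.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
    dest.commit()
    return len(rows)  # number of rows copied
```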

Acknowledgements
• ConvertGrid Manchester
  – Jon Mclaren
  – Pascal Ekin
  – Linda Mason
  – Stephen Pickles
  – Justin Hayes
• NCeSS
  – Laura Bond
  – Alvaro Fernandes