Manchester Computing Supercomputing, Visualization & eScience Celia Russell, Stephen Pickles and Mike Jones Combining Data Workshop ESRC Research Methods.

Presentation transcript:

Manchester Computing: Supercomputing, Visualization & eScience
SAMD: Seamless Access to Multiple Datasets
An ESRC/DTI e-Science demonstrator project
Celia Russell, Stephen Pickles and Mike Jones
Combining Data Workshop, ESRC Research Methods Programme, Manchester, December 18, 2002

SAMD: Seamless Access to Multiple Datasets
• A project to demonstrate the benefits of applying e-Science grid technologies to an ordinary social science query
• We solve a genuine problem from the UK academic social science community: a multivariate analysis using a complex mathematical algorithm
• Based on a major social science databank, the Office for National Statistics Time Series Data, hosted at MIMAS

The problem
• Published as Sensier, M., Osborn, D.R. and Öcal, N. (2002) ‘Asymmetric Interest Rate Effects for the UK Real Economy’, Oxford Bulletin of Economics and Statistics, Volume 64, No. 4, September 2002
• The research query looks at the effect that interest rate changes had on Gross Domestic Product in the UK over the period 1960 to 2000

Interest Rates in the UK

UK GDP – quarterly changes

The Model
Where y is the quarterly change in GDP and z is the quarterly change in interest rates

Before SAMD

e-Science Grid

SAMD Methodology
We built a mini demonstrator grid for SAMD by:
• Grid-enabling the ONS Time Series Databank
• Parallelising the code to represent the HPC facilities
• Using Grid protocols for data transfer
• Creating a graphical user interface that included a single sign-on
It all worked, and cut the data collection and analysis time down to around 8 minutes.

Extending SAMD
• The approach and methods of SAMD are applicable to more general social science applications involving data collection and analysis
• More efficient handling of datasets: data is moved to where it's needed, not just to a web browser
• The single sign-on for all databanks means users can cross-search datasets and perform cross analyses of multiple datasets from different providers
• Grants access to high performance computing facilities on the grid without the user having to learn how to use them
• Can automate routine enquiries
• Cuts the time taken to run compute-intensive problems by a factor of around 100

Scaling up with the Grid
e-Science Grids allow the social scientist to scale up their quantitative research by:
• Including many more data points in their analysis
• Developing more complex models incorporating more variables
• Dropping assumptions
• Visualising data
• Creating new communities and collaborations
• Exploring new types of analyses

SAMD Architecture

Motivation
Web-based access to socio-economic datasets such as the Office for National Statistics Time Series Data has led to greatly increased use, but:
• No standard authentication or authorisation
  – too many usernames and passwords to remember
• To automate search and retrieval, one can only emulate navigation through "screen scraping"
  – breaks whenever the interface is "improved"
  – discourages third party developments and periodic re-analysis
• Data must be downloaded and saved to local disk
  – not necessarily the system on which subsequent analysis is to be performed
  – inefficient, especially for large datasets

The SAMD solution
• Use Grid Security Infrastructure for "single sign-on" authentication everywhere
  – Modified a standard Apache web server to accept proxy credentials (permits re-use of existing CGI code)
• Use third party file transfers (GridFTP) to move data directly to where it's needed
• Use standard Globus mechanisms to
  – Locate an HPC facility for analysis
  – Stage the analysis binary from a local repository and run the analysis job on the HPC facility
  – Retrieve results

Architecture

What's new?
• Web interfaces to datasets?
  – We show that there are more flexible ways of delivering access to data over the internet than through static web pages alone
• Single sign-on?
  – We show that the domain of single sign-on can be much broader than that provided by Athens
• Graphical User Interfaces?
  – We show that it's possible for a third party to develop new tools independently of data providers
  – A short script can encapsulate all the essential functionality of the SAMD GUI
• Integration, Interoperability!

What's needed? Culture of Standards
• If key datasets are Grid-enabled in a commonly understood, well-documented way, we create an environment in which third parties can develop tools and services that add real value by bringing together independent datasets
• SAMD shows that such an environment is technically possible, but does not by itself establish any standard.
  – Look to Web services, Grid services, OGSA-DAI…

SAMD User Interfaces

GUI: Single Sign-on
Panel located at the top left
• Uses X.509 proxy certificates
• grid-proxy-init
  – Creates your proxy credential
• grid-proxy-destroy
  – Removes your proxy credential
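
As a rough illustration of what the sign-on panel wraps, the sketch below works out where the proxy credential would normally live and builds the two command lines. It assumes the usual Globus conventions (the X509_USER_PROXY environment variable, falling back to /tmp/x509up_u<uid>); the Python function names are hypothetical and nothing is executed.

```python
import os

def proxy_path(env=None, uid=None):
    """Return where a GSI proxy credential is conventionally stored.

    Assumption: X509_USER_PROXY, if set, overrides /tmp/x509up_u<uid>.
    """
    env = os.environ if env is None else env
    uid = os.getuid() if uid is None else uid
    return env.get("X509_USER_PROXY") or f"/tmp/x509up_u{uid}"

def sign_on_command():
    # Creates a short-lived proxy; the tool prompts for the key pass phrase.
    return ["grid-proxy-init"]

def sign_off_command():
    # Removes the proxy file so it can no longer be used.
    return ["grid-proxy-destroy"]
```

A GUI would hand these lists to a process launcher and then check that the file at proxy_path() exists before enabling the other panels.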

GUI: Data Acquisition
The interface to the SAMD-ONS web server, steps 1 to 8

Data Search
Search by keyword
• 1: request and mutual authentication using a proxy credential
• 2,3: authorisation
• 4: query data store

Data Request
Data moved to GridFTP server
• 1: send references to data
• 1,2,3: authentication & authorisation
• 4: ask datastore to move data (5)
• 6,7: datastore returns XML ticket

Data Transfer
Data moved to HPC engine
• 8: third party file transfer
  – from MIMAS to HPC engine, ready for analysis
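
The hand-off from XML ticket to third party transfer could look something like the sketch below. The slides do not show the ticket format, so the `<ticket><url>` layout, the host names and the function names are all assumptions made for illustration; only `globus-url-copy` with a source and destination URL is a real command, and it is built here rather than run.

```python
import xml.etree.ElementTree as ET

def source_url_from_ticket(ticket_xml):
    """Pull the GridFTP URL out of a ticket (hypothetical <ticket><url> layout)."""
    url = (ET.fromstring(ticket_xml).findtext("url") or "").strip()
    if not url.startswith("gsiftp://"):
        raise ValueError("ticket does not contain a gsiftp:// URL")
    return url

def third_party_copy_command(ticket_xml, hpc_host, dest_path):
    """Build a globus-url-copy call in which both ends are remote servers."""
    src = source_url_from_ticket(ticket_xml)
    dst = f"gsiftp://{hpc_host}{dest_path}"  # dest_path must start with "/"
    return ["globus-url-copy", src, dst]

# Example with made-up hosts and paths.
ticket = "<ticket><url>gsiftp://data.example.org/staging/series.csv</url></ticket>"
command = third_party_copy_command(ticket, "hpc.example.org", "/scratch/series.csv")
```

Because both URLs name remote GridFTP servers, the data would flow directly between them and never pass through the user's own machine.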

Finding an HPC Resource (Data Analysis panel)
GIIS MDS Server
• e.g. ginfo.grid-support.ac.uk
Search for:
• OS type, e.g. IRIX64
• Minimum number of processors
• Jobmanager
or manually enter your favourite
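
Since MDS is queried over LDAP, the search criteria above amount to an LDAP filter. The sketch below builds one in the standard RFC 4515 string form. The attribute names `osName` and `cpuCount` are placeholders, not real MDS schema names, and would need replacing with whatever the target GIIS publishes.

```python
def ldap_escape(value):
    """Escape characters that are special inside an LDAP filter value (RFC 4515)."""
    out = []
    for ch in str(value):
        if ch in "\\*()\0":
            out.append("\\%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

def resource_filter(os_name, min_cpus, os_attr="osName", cpu_attr="cpuCount"):
    """AND together an OS match and a minimum processor count.

    os_attr and cpu_attr are hypothetical attribute names.
    """
    return f"(&({os_attr}={ldap_escape(os_name)})({cpu_attr}>={int(min_cpus)}))"

example = resource_filter("IRIX64", 16)
```

Whether `>=` compares numerically depends on the ordering rule the schema defines for that attribute, so a real client may have to filter the processor count on the returned entries instead.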

Using the HPC Resource (Data Analysis panel)
• Select an executable on the local machine
• Stage job using Globus
• Check status using Globus
• Retrieve results using Globus
• Clean-up using Globus
• Even delete job using Globus
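
One plausible mapping of these panel actions onto the Globus Toolkit 2 command-line tools is sketched below. The slides do not say which calls the GUI actually makes, so this is an assumption; the `-s` (stage) flag and the status strings follow the GT2 usage text as best recalled and should be checked against the installed version. Commands are only built, not run, and the polling helper takes the status lookup as a parameter so it can be tried without a grid.

```python
import time

TERMINAL_STATES = {"DONE", "FAILED"}  # assumed globus-job-status outputs

def submit_command(contact, executable):
    # -s asks for the local executable to be staged to the remote resource.
    return ["globus-job-submit", contact, "-s", executable]

def followup_commands(job_id):
    """Commands for the remaining panel actions, keyed by action name."""
    return {
        "status": ["globus-job-status", job_id],
        "results": ["globus-job-get-output", job_id],
        "clean_up": ["globus-job-clean", job_id],
        "delete": ["globus-job-cancel", job_id],
    }

def wait_for_job(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_status() until a terminal state is seen; return that state."""
    for _ in range(max_polls):
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

In use, `get_status` would run the "status" command and return its trimmed output; passing a stub makes the loop easy to exercise offline.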

Command line automation
Not everyone has the expertise or time to write a special-purpose GUI. Given a GSI-enabled web server and a documented protocol to communicate with it, a few lines of shell script can do all the essential steps
• Use grid-proxy-init to sign on
• Use curl to talk https to the web server
• Use GridFTP to move data to the HPC engine
• Use globus-commands to
  – (stage and) run executable
  – retrieve results
  – and clean-up
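
To make the steps above concrete, here is a dry-run sketch that assembles the command sequence and prints it as shell lines. It is not the project's script: the URLs, paths and function names are hypothetical, the CA directory is the conventional /etc/grid-security/certificates, and passing the proxy file to curl as both certificate and key is an assumption about how the GSI-enabled server would be addressed. `globus-job-run` is used because it returns the job's output directly; clean-up of transferred files is left out.

```python
import shlex

def samd_plan(query_url, proxy_file, src_url, dst_url, contact, executable):
    """Return the ordered command lines for one run (nothing is executed)."""
    return [
        ["grid-proxy-init"],                        # sign on
        ["curl", "--cert", proxy_file, "--key", proxy_file,
         "--capath", "/etc/grid-security/certificates", query_url],
        ["globus-url-copy", src_url, dst_url],      # third party transfer
        ["globus-job-run", contact, "-s", executable],  # stage, run, get output
    ]

def as_shell(plan):
    """Render a plan as shell text, quoting each argument safely."""
    return "\n".join(" ".join(shlex.quote(arg) for arg in cmd) for cmd in plan)

plan = samd_plan(
    "https://data.example.org/search?keyword=gdp",
    "/tmp/x509up_u1000",
    "gsiftp://data.example.org/staging/series.csv",
    "gsiftp://hpc.example.org/scratch/series.csv",
    "hpc.example.org",
    "./analyse",
)
print(as_shell(plan))
```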

Acknowledgments
Funded by the [...] and the [...]
Keith Cole, Celia Russell, Marianne Sensier, Geoff Lane, Tim Hateley, Mark Riding, Kevin Roy