Building the e-Minerals Minigrid
Rik Tyer, Lisa Blanshard, Kerstin Kleese (Data Management Group)
Rob Allan, Andrew Richards (Grid Technology Group)
AHM, Nottingham 2003

Project Members
- Royal Institution
- University of Reading
Who We Are
CCLRC: Council for the Central Laboratory of the Research Councils
- One of Europe's largest research support organisations
- Provides large-scale experimental, data and computing facilities
- Serves the UK research community in both academia and industry
- Annually supports scientists from all major scientific domains
- 1800 members of staff over three sites:
  - Rutherford Appleton Laboratory in Oxfordshire
  - Daresbury Laboratory in Cheshire
  - Chilbolton Observatory in Hampshire
- Large quantities of data associated with the various facilities
- Major e-Science centre in the UK
Environmental Issues
- Radioactive waste disposal
- Crystal growth and scale inhibition
- Pollution: molecules and atoms on mineral surfaces
- Crystal dissolution and weathering
Examples of Codes
- DL_POLY3: parallel molecular dynamics code. Modifications aimed at running efficient simulations with millions of atoms for studies of radiation damage (Daresbury)
- SIESTA: order-N quantum mechanics code. Objective: to run with large samples and realistic fluids (Cambridge)
- Surface simulations: new developments aimed at efficient scanning of many configurations of complex fluid-mineral interfaces, for studies of crystal growth and dissolution (Bath)
Data Management Requirements
- Many output files are produced from each simulation run
- Each set of input and output files is a dataset
- Information about each simulation must be kept: metadata
- Other scientists need access to this information and to the datasets
- Need to search different metadata repositories at once
- Access could be from anywhere in the world
- Data must be categorised so it can be found by someone else
- These requirements are the same for all scientists
Integrated Portals Using Web Services
[Architecture diagram: external applications and the DataPortal/HPCPortal communicate via web services with metadata databases and high-performance computers on the Grid]
DataPortal: Scientific Metadata Model
The metadata object comprises:
- Topic: discipline hierarchy, e.g. Earth Sciences / Soil Contamination / Heavy Metals / Arsenic
- Study Description: provenance describing what the study is, who did it and when
- Access Conditions: conditions of use, stating who can access the data and how
- Data Description: detailed description of the organisation of the data into datasets and files
- Data Location: navigational references to where the data on the study can be found
- Related Material: references into the literature and community, providing context about the study
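As a rough illustration of the six-part model on this slide, one study's metadata record could be sketched as follows. The field names paraphrase the model and the values are entirely hypothetical; this is not the actual CCLRC schema.

```python
# Sketch of one study's metadata record, following the six-part model.
# All names and values are illustrative, not the real CCLRC schema.
study_metadata = {
    "topic": "Earth Sciences/Soil Contamination/Heavy Metals/Arsenic",
    "study_description": {            # provenance: what, who, when
        "title": "Arsenic on mineral surfaces",
        "investigators": ["A. Scientist"],
        "date": "2003-06-01",
    },
    "access_conditions": "e-Minerals project members only",
    "data_description": {             # datasets and their files
        "datasets": [
            {"name": "run-001", "files": ["input.cfg", "output.log"]},
        ],
    },
    "data_location": "srb://store.example.ac.uk/eminerals/run-001",  # hypothetical SRB location
    "related_material": ["(reference placeholder)"],
}

# A cross-repository search could then filter on the topic hierarchy:
matches = [m for m in [study_metadata] if "Arsenic" in m["topic"]]
print(len(matches))
```

A record structured this way supports the requirement on the previous slide that data be categorised so other scientists can find it by topic.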
DataPortal: Use Cases
[Use-case diagram: a scientist or an external application uses the DataPortal to request metadata from multiple metadata repositories, store associated data files, and transfer data files to and from remote machines]
Plan of Work
- High-requirement science
- High-performance codes
- Collaborative environment
e-Minerals Minigrid and Portal
- The minigrid makes the project's shared computing resources available through the UK e-Science grid (built using Globus, incorporating the Storage Resource Broker)
- The e-Minerals minigrid is accessed through the e-Minerals portal, based on the HPCPortal and DataPortal developed at Daresbury Laboratory
- The minigrid links to a Storage Resource Broker to store the outputs of simulation runs
e-Minerals Portal
DataPortal Results
Condor Technologies
- Condor: a mature(-ish) technology for building small or large distributed computing systems from standard desktop computers
- The important point is that Condor lets you use idle time on desktops, and hence harness the potential of powerful processors
The UCL Windows Condor Pool
- Runs WTS (Windows Terminal Server)
- Approximately 750 CPUs in 30 clusters; most are 1 GHz Pentium 4s with 256/512 MB RAM and 40 GB hard disks
- All 90%+ under-utilised and running 24/7
- We are using Condor to turn this pool into a massive distributed computing system
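A Condor job is described by a plain-text submit description file. A minimal sketch for farming one simulation run out to a pool like this might look as follows; the executable and file names are hypothetical placeholders, not the project's actual job scripts.

```
# Minimal Condor submit description file (sketch; names are hypothetical)
universe   = vanilla
executable = dl_poly.exe
input      = run-001.cfg
output     = run-001.out
error      = run-001.err
log        = run-001.log
queue
```

Submitting this with `condor_submit` queues the job; Condor then matches it to an idle machine in the pool, runs it, and returns the output files.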
Technology Used
- Operating system: SuSE Linux 8.1
- Sun Microsystems J2SDK version 1.4
- All DataPortal web services built with Apache Ant and deployed under Apache Tomcat, with Apache Axis as the SOAP engine
- A MyProxy server is used for authentication
- Systinet UDDI Server version 4.5 for the Lookup web service
- PostgreSQL databases for the Lookup, Session Manager, Access & Control, and Shopping Cart web services
- HPCPortal services built using the Globus 2 toolkit with gSOAP2 libraries, deployed under a standard Apache HTTP server
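To make the web-services stack above concrete, the sketch below constructs the kind of SOAP 1.1 request envelope that an Axis-hosted service would receive. Only the envelope structure is standard SOAP; the operation name (`getSessionId`) and its parameter are hypothetical, not the actual DataPortal interface.

```python
# Build a SOAP 1.1 request envelope, of the kind an Apache Axis service
# consumes. The operation and parameter names are hypothetical.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

envelope = ET.Element("{%s}Envelope" % SOAP_NS)
body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)

# One RPC-style call inside the Body: a hypothetical session request,
# authenticated with a proxy certificate as in the MyProxy setup above.
call = ET.SubElement(body, "getSessionId")
ET.SubElement(call, "proxyCertificate").text = "...PEM data elided..."

request = ET.tostring(envelope, encoding="unicode")
print(request)
```

In the deployed system this XML would be POSTed over HTTP to the Tomcat/Axis endpoint, which dispatches the Body's child element to the matching Java service method.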
Further Information
- Environment from the Molecular Level
- e-Minerals Minigrid (requires an X.509 certificate)
- Integrated e-Science Environment Portal
- HPC Grid Services Portal
- DataPortal demonstration
- UK CCLRC e-Science Centre