Patterns for E-Research. Dave Berry, Research Manager. E-Research within the University of Edinburgh, 2nd March 2005
E-Research “The invention and application of computing methods to extend our capabilities in any research discipline” “Research in any discipline which benefits from and often depends on the use of advanced facilities and methods for computation, data curation, digital communication and visualisation”
Technology Growth. Gilder’s Law, optical fibre (bits per second): 32X in 4 years. Storage Law, data storage (bits per sq. inch): 16X in 4 years. Moore’s Law, chip capacity (# transistors): 5X in 4 years. [Figure: performance per dollar spent against number of years, annotated with doubling time in months.] Source: Triumph of Light, Scientific American, George Stix, January 2001
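The growth factors on this slide can be turned into doubling times with a little arithmetic. The sketch below assumes steady exponential growth over the four years; the function name `doubling_time_months` is invented for illustration and is not from the talk.

```python
import math

def doubling_time_months(growth_factor: float, years: float) -> float:
    """Months needed to double, given `growth_factor`-fold growth over `years`.

    Assumes steady exponential growth across the whole period.
    """
    doublings = math.log2(growth_factor)  # how many times the quantity doubled
    return years * 12 / doublings

# Growth factors quoted on the slide, each measured over 4 years.
laws = {
    "Gilder's Law (optical fibre)": 32,
    "Storage Law (data storage)": 16,
    "Moore's Law (chip capacity)": 5,
}

for name, factor in laws.items():
    print(f"{name}: doubles every {doubling_time_months(factor, 4):.1f} months")
```

Under that assumption, 32X in 4 years is five doublings (one every 9.6 months) and 16X is four doublings (one every 12 months).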
Pattern 1: Distributed Collaboration. Groups in different sites working together; sharing knowledge and ideas. Technologies: shared repositories (wikis, SourceForge/NeSCForge, forums, …); videoconferencing; Computer Supported Cooperative Work (CSCW)
Technology: Access Grid Microphones Cameras
Pattern 2: Simulation & Modelling. Large variety of topics, e.g. protein folding, position of atoms in semiconductors, human heart, ecology of ice sheets. Multiple scales. Remote visualisation and control
Example: The TeraGyroid Scientific Experiment High-density isosurface of the late-time configuration in a ternary amphiphilic fluid as simulated on a 64³ lattice by LB3D. Gyroid ordering coexists with defect-rich, sponge-like regions. The dynamical behaviour of such defect-rich systems can only be studied with very large scale simulations, in conjunction with high-performance visualisation and computational steering. See
Example: Terrestrial Carbon Dynamics
Pattern 3: Data archives. Data archives maintain data for widespread use, e.g. UK Borders, Go-Geo, … (EDINA); ArkDB (Roslin); Mouse Atlas (HGU); EMBL, UniProt, … (EBI); Census, … (MIMAS). Client-server access. Schemas defined centrally: often subject to change… … if they’re defined at all!
Infrastructure: Digital Curation Centre [Diagram labels: industry; research collaborators; standards bodies; testbeds & tools; communities of practice: users; community support & outreach; research; development; co-ordination; service definition & delivery; management & admin support; Collaborative Associates Network of Data Organisations; curation organisations, e.g. DPC]
Technology: Schema-Directed XML Publishing [Diagram: XML publishing turns a source database I into a cached XML tree T; updates to I are propagated to T by incremental update.]
Pattern 4: Federated data. Sites maintain their own data: remote access to other sites; control access to your site. Integrated views: community-defined schemas; translation between schemas. Distributed algorithms: run jobs remotely; distributed data mining
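To make "translation between schemas" and "integrated views" concrete, here is a minimal sketch. It assumes two sites whose records use different field names and a small community-defined schema with fields `symbol` and `chromosome`. The site names, field names and records are all hypothetical, and in-memory lists stand in for remote access.

```python
# Each site keeps its own records in its own local schema (hypothetical data).
site_a = [{"gene": "BRCA1", "chrom": "17"}]
site_b = [{"gene_symbol": "TP53", "chromosome": "17"}]

# Per-site translation from local field names to the community-defined schema.
to_common = {
    "site_a": {"gene": "symbol", "chrom": "chromosome"},
    "site_b": {"gene_symbol": "symbol", "chromosome": "chromosome"},
}

def translate(record, mapping):
    """Rename the fields of one record according to `mapping`."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}

def integrated_view(sources):
    """Yield every site's records in the common schema, tagged with their origin."""
    for site, records in sources.items():
        for record in records:
            row = translate(record, to_common[site])
            row["source"] = site
            yield row

view = list(integrated_view({"site_a": site_a, "site_b": site_b}))
for row in view:
    print(row)
```

Each site keeps ownership of its data and its local schema; only the translation table has to be agreed by the community.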
Example: Mass-scale Data Mining
Pattern 5: Parameter Search. Run the same algorithm on different data, e.g. finding local minima, combinatorial search. Allows the use of multiple machines, e.g. a cluster, multiple clusters, desktop PCs
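A parameter search evaluates the same function on many independent inputs, so the work spreads easily across workers. The sketch below uses a local thread pool as a stand-in for a cluster or a set of desktop PCs. The cost function `objective` and the candidate grid are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def objective(x: float) -> float:
    """Hypothetical cost function with its minimum at x = 3."""
    return (x - 3.0) ** 2 + 1.0

def parameter_search(func, candidates, max_workers=4):
    """Evaluate `func` on every candidate independently and return the best one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with candidates.
        scores = list(pool.map(func, candidates))
    best_score, best_candidate = min(zip(scores, candidates))
    return best_candidate, best_score

candidates = [i * 0.5 for i in range(13)]  # 0.0, 0.5, ..., 6.0
best_x, best_cost = parameter_search(objective, candidates)
print(best_x, best_cost)
```

Because no evaluation depends on another, the pool could be swapped for any mechanism that farms jobs out to remote machines without changing the search logic.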
Example: ClimatePrediction.net See
Composing Patterns. Patterns that compose…: complex problems require many inputs and many processes; shared contributions compose indefinitely, accumulating knowledge. … and how to compose them: a common infrastructure (technologies, naming, schemas, …); workflow languages; portals and “problem-solving environments”
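One way to picture what a workflow language offers is a set of named steps plus the dependencies between them, executed in dependency order. The sketch below is an illustration under that assumption, not a description of any particular workflow system. The step names and their toy computations are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step lists the steps whose outputs it consumes (hypothetical workflow).
workflow = {
    "fetch": [],
    "simulate": ["fetch"],
    "mine": ["fetch"],
    "report": ["simulate", "mine"],
}

# Toy implementations; real steps might be remote jobs or archive queries.
steps = {
    "fetch": lambda: [1, 2, 3, 4],
    "simulate": lambda data: [x * x for x in data],
    "mine": lambda data: max(data),
    "report": lambda sim, peak: {"total": sum(sim), "peak": peak},
}

def run(workflow, steps):
    """Run every step after its dependencies, passing their results as inputs."""
    results = {}
    for name in TopologicalSorter(workflow).static_order():
        inputs = [results[dep] for dep in workflow[name]]
        results[name] = steps[name](*inputs)
    return results

results = run(workflow, steps)
print(results["report"])
```

Here `simulate` and `mine` both depend only on `fetch`, so a real scheduler would be free to run them on different machines at the same time.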
Example: BRIDGES (BioInformatics) Synteny Grid Service blast + Authorisation
Example: FireGrid (proposal). Maps, models, scenarios; super-real-time simulation (HPC); KBS and planning; emergency responders; 1000s of sensors & gateway processing
Piper Alpha, Mont Blanc, Kobe, King’s Cross, WTC
Practical Challenges. Technical: a variety of partial answers; standardisation work is long and political. Social: sharing of resources means sharing YOUR resources; contributor recognition and IPR; defining common schemas and ontologies; training and funding for software developers and sysadmins. Responsibility of data publishers: cost, dependability, trustworthiness, capability, flexibility, … Management of infrastructure: operation (NGS nationally, ACF locally); funding