Tony Hey, Corporate Vice President, Technical Computing, Microsoft Corporation

Presentation transcript:

e-Science and Cyberinfrastructure
Tony Hey, Corporate Vice President, Technical Computing, Microsoft Corporation
Diagram labels: Multidisciplinary Research; Computer and Information Sciences; Life Sciences; Earth Sciences; Social Sciences; New Materials, Technologies and Processes

Licklider’s Vision
“Lick had this concept – all of the stuff linked together throughout the world, that you can use a remote computer, get data from a remote computer, or use lots of computers in your job”
Larry Roberts – Principal Architect of the ARPANET

Physics and the Web
- Tim Berners-Lee developed the Web at CERN as a tool for exchanging information between the partners in physics collaborations
- The first Web Site in the USA was a link to the SLAC library catalogue
- It was the international particle physics community who first embraced the Web
- ‘Killer’ application for the Internet
- Transformed modern world – academia, business and leisure

Beyond the Web?
- Scientists developing collaboration technologies that go far beyond the capabilities of the Web
  - To use remote computing resources
  - To integrate, federate and analyse information from many disparate, distributed, data resources
  - To access and control remote experimental equipment
- Capability to access, move, manipulate and mine data is the central requirement of these new collaborative science applications
  - Data held in file or database repositories
  - Data generated by accelerators or telescopes
  - Data gathered from mobile sensor networks

What is e-Science?
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’
John Taylor
Director General of Research Councils
UK, Office of Science and Technology

The e-Science Vision
- e-Science is about multidisciplinary science and the technologies to support such distributed, collaborative scientific research
- Many areas of science are in danger of being overwhelmed by a ‘data deluge’ from new high-throughput devices, sensor networks, satellite surveys …
- Areas such as bioinformatics, genomics, drug design, engineering, healthcare … require collaboration between different domain experts
- ‘e-Science’ is a shorthand for a set of technologies to support collaborative networked science

e-Science – Vision and Reality
Vision
- Oceanographic sensors - Project Neptune
  - Joint US-Canadian proposal
Reality
- Chemistry – The Comb-e-Chem Project
  - Annotation, Remote Facilities and e-Publishing

Undersea Sensor Network Connected & Controllable Over the Internet

Data Provenance

Visual Programming; Persistent Distributed Storage

Distributed Computation; Interoperability & Legacy Support via Web Services

Live Documents; Searching & Visualization; Reputation & Influence

Reproducible Research

Dynamic Documents; Interactive Data

The Comb-e-Chem Project
Diagram labels: National X-Ray Service; Data Mining and Analysis; Automatic Annotation; Combinatorial Chemistry Wet Lab; HPC Simulation; Video Data Stream; Diffractometer; Middleware; Structures Database

National Crystallographic Service
- Send sample material to NCS service
- Search materials database and predict properties using Grid computations
- Download full data on materials of interest
- Collaborate in e-Lab experiment and obtain structure

A digital lab book replacement that chemists were able to use, and liked

Monitoring laboratory experiments using a broker delivered over GPRS on a PDA

Crystallographic e-Prints
- Direct Access to Raw Data from scientific papers
- Raw data sets can be very large - stored at UK National Datastore using SRB software

Support for e-Science
- Cyberinfrastructure and e-Infrastructure
  - In the US, Europe and Asia there is a common vision for the ‘cyberinfrastructure’ required to support the e-Science revolution
  - Set of Middleware Services supported on top of high bandwidth academic research networks
  - Similar to vision of the Grid as a set of services that allows scientists – and industry – to routinely set up ‘Virtual Organizations’ for their research – or business
- Many companies emphasize computing cycle aspect of Grids
  - The ‘Microsoft Grid’ vision is more about data management than about compute clusters

Six Key Elements for a Global Cyberinfrastructure for e-Science
1. High bandwidth Research Networks
2. Internationally agreed AAA Infrastructure
3. Development Centers for Open Standard Grid Middleware
4. Technologies and standards for Data Provenance, Curation and Preservation
5. Open access to Data and Publications via Interoperable Repositories
6. Discovery Services and Collaborative Tools

The Web Services ‘Magic Bullet’
Diagram labels: Company A (J2EE); Open Source (OMII); Company C (.Net); Web Services

Diagram labels: Computational Modeling; Real-world Data; Interpretation & Insight; Persistent Distributed Data; Workflow, Data Mining & Algorithms

Technical Computing in Microsoft
- Radical Computing
  - Research in potential breakthrough technologies
- Advanced Computing for Science and Engineering
  - Application of new algorithms, tools and technologies to scientific and engineering problems
- High Performance Computing
  - Application of high performance clusters and database technologies to industrial applications

Radical Computing
- The end of Moore’s Law as we know it
  - Number of transistors on a chip will continue to increase
  - No significant increase in clock speed
- Remember Amdahl’s Law
  - If an application is 90% parallel, the speed-up that can be gained from parallelism is at most 10X
- Future of silicon chips
  - “100’s of cores on a chip in 2015” (Justin Rattner, Intel)
  - “4 cores”/Tflop => 25 Tflops/chip
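The Amdahl's Law bullet and the Tflops estimate on this slide can be checked numerically. The following is a minimal sketch, not part of the talk: the function names are hypothetical, and `tflops_per_chip` assumes the "4 cores"/Tflop figure means that four cores together deliver one Tflop.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speed-up predicted by Amdahl's Law for a fixed-size problem."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial part runs at full cost; parallel part is divided among workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speed-up as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


def tflops_per_chip(cores_per_chip: int, cores_per_tflop: int) -> float:
    """Assumed reading of the slide: cores_per_tflop cores yield one Tflop."""
    return cores_per_chip / cores_per_tflop


if __name__ == "__main__":
    for workers in (4, 16, 100, 1000):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
    print("limit:", round(amdahl_limit(0.9), 2))
    print("Tflops/chip:", tflops_per_chip(100, 4))
```

With a 90% parallel application, even 100 cores give a speed-up of only about 9.2X, which is why the slide pairs the many-core forecast with a reminder of Amdahl's Law.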

Radical Computing (continued)
- IT industry has been driven by increasing chip volumes and new applications
  - Multi-core chips for servers
  - Multi-core chips for clients?
- Challenge not only for Microsoft but for entire IT industry
  - New paradigms to exploit parallelism
  - What applications can exploit such on-chip parallelism?

Advanced Computing for Science and Engineering...
- CONTENT: Scholarly Communication, Institutional Repositories
- DATA: Acquisition, Storage, Annotation, Provenance, Curation, Preservation
- TOOLS: Workflow, Collaboration, Visualization, Data Mining

New Science Paradigms
- Thousand years ago: Experimental Science
  - description of natural phenomena
- Last few hundred years: Theoretical Science
  - Newton’s Laws, Maxwell’s Equations …
- Last few decades: Computational Science
  - simulation of complex phenomena
- Today: e-Science or Data-centric Science
  - unify theory, experiment, and simulation
  - using data exploration and data mining
    - Data captured by instruments
    - Data generated by simulations
    - Processed by software
    - Scientist analyzes databases/files
(With thanks to Jim Gray)

The Problem for the e-Scientist
- Data ingest
- Managing a petabyte
- Common schema
- How to organize it?
- How to reorganize it?
- How to coexist & cooperate with others?
- Data Query and Visualization tools
- Support/training
- Performance
  - Execute queries in a minute
  - Batch (big) query scheduling
Diagram labels: Experiments & Instruments; Simulations; Literature; Other Archives; facts; questions; answers
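To give a feel for why "managing a petabyte" and "execute queries in a minute" pull against each other, here is a back-of-envelope sketch. It assumes the worst case, in which a query has to scan the whole dataset, and the per-disk rate of 100 MB/s is a hypothetical figure chosen for illustration, not a number from the talk.

```python
import math

PETABYTE = 10**15  # bytes, decimal convention


def required_throughput(dataset_bytes: int, seconds: float) -> float:
    """Aggregate bytes per second needed to read the whole dataset in time."""
    return dataset_bytes / seconds


def disks_needed(dataset_bytes: int, seconds: int, bytes_per_second_per_disk: int) -> int:
    """Disks that must be read in parallel to finish a full scan in time."""
    return math.ceil(dataset_bytes / (seconds * bytes_per_second_per_disk))


if __name__ == "__main__":
    rate = required_throughput(PETABYTE, 60)
    print(f"aggregate rate: {rate / 1e12:.1f} TB/s")
    # Hypothetical per-disk sequential read rate of 100 MB/s.
    print("disks in parallel:", disks_needed(PETABYTE, 60, 100 * 10**6))
```

Under these assumptions a full scan is out of reach for interactive use, which is one way to read the slide's emphasis on schema, organization and batch scheduling of big queries.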

Top 500 Supercomputer Trends
- Industry usage rising
- Clusters over 50%
- x86 is winning
- GigE is gaining

Supercomputing Goes Personal
System | Cray Y-MP C916 | Sun HPC10000 | NewEgg.com
Architecture | 16 x Vector, 4GB, Bus | 24 x 333MHz UltraSPARC II, 24GB, SBus | 4 x 2.2GHz x64, 4GB, GigE
OS | UNICOS | Solaris | Windows Server 2003 SP1
GFlops | ~10 | ~10 | ~10
Top500 # | 1 | 500 | N/A
Price | $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers | Government Labs | Large Enterprises | Every Engineer & Scientist
Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media
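The price drops quoted in this comparison follow directly from the listed prices. The sketch below reproduces them and adds a cost per GFlop. The helper names are hypothetical, the third price is treated as exactly $4,000 even though the slide gives it as an upper bound, and each system is taken at the slide's rounded ~10 GFlops.

```python
# Prices as listed on the slide; the last is an upper bound ("< $4,000").
PRICES_USD = [
    ("Cray Y-MP C916", 40_000_000),
    ("Sun HPC10000", 1_000_000),
    ("NewEgg.com x64 system", 4_000),
]
GFLOPS = 10  # each system is listed at roughly 10 GFlops


def drop_factors(prices):
    """Ratio of each price to the next one in the list."""
    return [earlier / later for earlier, later in zip(prices, prices[1:])]


def dollars_per_gflop(price_usd, gflops=GFLOPS):
    """Cost of one GFlop at the listed price and performance."""
    return price_usd / gflops


if __name__ == "__main__":
    print(drop_factors([price for _, price in PRICES_USD]))
    for name, price in PRICES_USD:
        print(f"{name}: ${dollars_per_gflop(price):,.0f} per GFlop")
```

Because the last price is a ceiling, the 250x figure is a lower bound on the actual drop.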

Continuing Trend Towards Decentralized, Networked Resources
- Grids of personal & departmental clusters
- Personal workstations & departmental servers
- Minicomputers
- Mainframes

Berlin Declaration 2003
- ‘To promote the Internet as a functional instrument for a global scientific knowledge base and for human reflection’
- Defines open access contributions as including:
  - ‘original scientific research results, raw data and metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia material’

NSF ‘Atkins’ Report on Cyberinfrastructure
- ‘the primary access to the latest findings in a growing number of fields is through the Web, then through classic preprints and conferences, and lastly through refereed archival papers’
- ‘archives containing hundreds or thousands of terabytes of data will be affordable and necessary for archiving scientific and engineering information’

Microsoft Strategy for e-Science
Microsoft intends to work with both the scientific and library communities:
- to define open standard and/or interoperable high-level services, work flows and tools
- to assist the community in developing open scholarly communication and interoperable repositories

Acknowledgements
With special thanks to Geoffrey Fox, Jeremy Frey, Brad Gillespie, Jim Gray and Marvin Theimer