The University of Washington eScience Institute
This afternoon:
  - Phyllis Wise, Provost
  - Ed Lazowska, Computer Science & Engineering
  - Dan Fay, Microsoft Research
  - Martin Savage, Physics
  - David Baker, Biochemistry
  - Andy Connolly, Astronomy

eScience: Computational Science for the 21st Century
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
Interim Director, eScience Institute
November
Theory Experiment Observation
Computational Science
Protein interactions in striated muscles (Tom Daniel lab)
QCD to study interactions of nuclei (David Kaplan lab)
Study of dark matter (Tom Quinn lab); panels: Gas, Stars, Dark Matter
Theory Experiment Observation Computational Science eScience
eScience is driven by data
- Massive volumes of data from sensors and networks of sensors

Apache Point telescope, SDSS
15TB of data (15,000,000,000,000 bytes)
Large Synoptic Survey Telescope (LSST)
30TB/day, 60PB in its 10-year lifetime

Large Hadron Collider
700MB of data per second, 60TB/day, 20PB/year
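The per-second, per-day, and per-year figures quoted for the Large Hadron Collider can be cross-checked with a few lines of arithmetic. The sketch below is only a unit conversion, assuming decimal (SI) units and continuous running for 365 days; the function names are illustrative and do not come from the slides.

```python
# Decimal (SI) units are assumed here; the slides do not say which they use.
MB = 10**6
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def per_day_tb(bytes_per_second: float) -> float:
    """Convert a sustained byte rate to terabytes per day."""
    return bytes_per_second * SECONDS_PER_DAY / TB


def per_year_pb(tb_per_day: float, days: float = 365) -> float:
    """Convert terabytes per day to petabytes over `days` of continuous running."""
    return tb_per_day * days * TB / PB


# 700 MB/sec is the rate quoted on the slide.
lhc_tb_day = per_day_tb(700 * MB)
lhc_pb_year = per_year_pb(lhc_tb_day)
print(f"{lhc_tb_day:.1f} TB/day, {lhc_pb_year:.1f} PB/year")
```

Under these assumptions the daily figure comes out close to the quoted 60TB/day, and the yearly figure lands slightly above the quoted 20PB/year, which would be consistent with rounding or with the instrument not running every day of the year.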
Illumina Genome Analyzer
~1TB/day

Regional Scale Nodes of the NSF Ocean Observatories Initiative
2000 km of fiber optic cable on the seafloor, connecting thousands of chemical, physical, and biological sensors
The Web
- 20+ billion web pages x 20KB = 400+TB
- One computer can read MB/sec from disk => 4 months just to read the web
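The size and read-time estimate for the Web is simple arithmetic that can be sketched in code. The disk read rate did not survive in the slide text, so the sketch below does not assume one; instead it back-solves the rate implied by "4 months", taking 4 months as 120 days and using decimal units. Both of those choices, and all function names, are assumptions made for illustration.

```python
KB = 10**3
MB = 10**6
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def web_size_bytes(pages: int, bytes_per_page: int) -> int:
    """Total size of the corpus: page count times average page size."""
    return pages * bytes_per_page


def days_to_read(total_bytes: float, bytes_per_second: float) -> float:
    """Days one machine needs to read total_bytes sequentially at a given rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


def implied_rate_mb_s(total_bytes: float, days: float) -> float:
    """Read rate (MB/sec) that would take exactly `days` to cover total_bytes."""
    return total_bytes / (days * SECONDS_PER_DAY) / MB


# 20 billion pages at 20KB each, the lower bounds quoted on the slide.
size = web_size_bytes(20 * 10**9, 20 * KB)
print(f"Web size: {size / TB:.0f} TB")

# Assumption: "4 months" is treated as 120 days.
rate_mb_s = implied_rate_mb_s(size, days=120)
print(f"Implied single-disk read rate: {rate_mb_s:.1f} MB/sec")
```

With these inputs the implied rate is a few tens of MB/sec. Calling `days_to_read(size, rate * MB)` with any other rate shows how the read time scales: halving the rate doubles the time, which is the point of the slide's argument for reading in parallel across many machines.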
Point-of-sale terminals
eScience is about the analysis of data
- The automated or semi-automated extraction of knowledge from massive volumes of data
  - There’s simply too much of it to look at
The technologies of eScience
- Sensors and sensor networks
- Databases
- Data mining
- Machine learning
- Data visualization
- Cluster computing at enormous scale
eScience will be pervasive
- Computational science has been transformational, but to some extent it has been a niche
  - As an institution (e.g., a university), you didn’t need to employ it broadly in order to be competitive
- eScience capabilities must be broadly available and broadly practiced
  - If not, the institution will simply cease to be competitive
The University of Washington eScience Institute
- Mission
  - Help position the University of Washington at the forefront of research both in modern eScience techniques and technologies, and in the fields that depend upon these techniques and technologies
- Strategy
  - Increase the sharing of expertise and facilities
  - Bootstrap a cadre of Research Scientists
  - Add faculty in key fields
  - Make the entire University more effective
- Launched July 1 with $1 million in permanent funding from the Washington State Legislature
  - Sought, and need, $2 million
Steering Committee
- Appointed by Provost Phyllis Wise
  - Tom Ackerman, Atmospheric Sciences
  - Ginger Armbrust, Oceanography
  - Tom Daniel, Biology
  - David Goodlett, Medicinal Chemistry
  - Terry Gray, UW Technology
  - Ron Johnson, CTO
  - David Kaplan, Physics
  - Richard Karpen, Arts & Sciences
  - Ed Lazowska, CSE and eScience Institute Interim Director
  - Mary Lidstrom (chair), Vice Provost for Research
  - Matt O’Donnell, Engineering
  - Tom Quinn, Astronomy
  - Chance Reschke, eScience Institute Technical Coordinator
  - Mani Soma, EE and Office of the VP for Research
  - Werner Stuetzle, Arts & Sciences
  - Peter Tarczy-Hornoch, Biomedical & Health Informatics
Activities
- Direction-setting interviews with UW research leaders regarding technology needs
  - 124 interviews thus far
    - Top researchers of all ages in all fields
  - Technology needs, in priority order
    1. Data management facilities: storage, backup, security
    2. Shared expertise: data management specifically, technology in general
    3. Computing power and high-bandwidth network access
    4. Data collection and analysis
    5. Communication and collaboration technologies
    6. Shared laboratories and pricing
- Initial staffing
  - Research Scientist recruited for cluster computing
    - Chance Reschke
  - Research Scientist being recruited for data management
  - Consulting model developed
    - Jeff Gardner as “TeraGrid Champion”
    - Data management consultancy under development
  - Overall coordination coming on-board
    - Erik Lundberg
  - First faculty search underway
    - Werner Stuetzle chairing search committee
- Laying the groundwork for broadly shared facilities
  - Data center space coordination and planning
    - UW Tower scheduled to come online in late 2009
    - ~600KW for research computing
  - EPIC
    - Intelligent use of the research allocation in UW Tower
    - Coordinated, cost-effective compute and storage solutions for the UW eScience community
- Active exploration of alternative approaches to facilities
  - Amazon Web Services
  - Google/IBM cloud
  - Microsoft Dryad and Azure
- Participation in proposal preparation
  - Moore Foundation Sequencing Center
  - NSF Data Net - The GRADD Collaboration
  - NSF Track 2d (with PNNL, PSC, CMU)
- Community building
  - Web site for general information
  - SIG for eScience technical staff
  - Monthly technical “brown bag lunch”
  - Regular discussions with research groups across campus regarding their eScience needs
We can help you (some currently, better shortly) with …
- Facilities
- Proposals
- Data management issues
See posters