Data-Intensive Science (eScience) Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2011.

Slides:



Advertisements
Similar presentations
Trying to Use Databases for Science Jim Gray Microsoft Research
Advertisements

John A. Orcutt Deputy Director, SIO Ocean Observations Initiative NSF MREFC Chair, NSF/CORE DEOS Comm.
Space …. are big. Really big. You just won't believe how vastly, hugely, mindbogglingly big they are. Massive data streams Douglas Adams – Hitchhiker’s.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21) NSF-wide Cyberinfrastructure Vision People, Sustainability, Innovation,
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CF21) IRNC Kick-Off Workshop July 13,
OVERVIEW OF NETWORKING RESEARCH IN NETLAB 1 Dr. Jim Martin Associate Professor School of Computing Clemson University
SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA; SAN DIEGO IEEE Symposium of Massive Storage Systems, May 3-5, 2010 Data-Intensive Solutions.
Cloud Computing Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2010.
Universal Memex (A Research Project for Discussion)
Human computation, and the wisdom of crowds Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2010.
GPS Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2010.
Kindle, newspapers, books, academia Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2010.
Introduction to Data Science Kamal Al Nasr, Matthew Hayes and Jean-Claude Pedjeu Computer Science and Mathematical Sciences College of Engineering Tennessee.
Big Data A big step towards innovation, competition and productivity.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
Mining Large Data at SDSC Natasha Balac, Ph.D.. A Deluge of Data Astronomy Life Sciences Modeling and Simulation Data Management and Mining Geosciences.
CSCI-2950u :: Data-Intensive Scalable Computing Rodrigo Fonseca (rfonseca)
Last Words COSC Big Data (frameworks and environments to analyze big datasets) has become a hot topic; it is a mixture of data analysis, data mining,
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
“Here comes the Grid” Mark Hayes Technical Director - Cambridge eScience Centre NIEeS Summer School 2003.
Big Data in Science (Lessons from astrophysics) Michael Drinkwater, UQ & CAASTRO 1.Preface Contributions by Jim Grey Astronomy data flow 2.Past Glories.
National Center for Supercomputing Applications Observational Astronomy NCSA projects radio astronomy: CARMA & SKA optical astronomy: DES & LSST access:
Alex Szalay, Jim Gray Analyzing Large Data Sets in Astronomy.
1 The Terabyte Analysis Machine Jim Annis, Gabriele Garzoglio, Jun 2001 Introduction The Cluster Environment The Distance Machine Framework Scales The.
The University of Washington eScience Institute This afternoon: y Phyllis Wise, Provost y Ed Lazowska, Computer Science & Engineering y Dan Fay, Microsoft.
Report on CSU HPC (High-Performance Computing) Study Ricky Yu–Kwong Kwok Co-Chair, Research Advisory Committee ISTeC August 18,
Public Access to Large Astronomical Datasets Alex Szalay, Johns Hopkins Jim Gray, Microsoft Research.
NASA Biodiversity and Ecological Forecasting Team Meeting 29 – 31 August 2005 Washington, DC Fig.2 We are: NASA GRT # NNG04GM64G “Pacific climate variability.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A Taxonomy of Similarity Mechanisms for Case-Based Reasoning.
Dr. M.-C. Sawley IPP-ETH Zurich Nachhaltige Begegnungen Standing at the crossing point between data analysis and simulation Knowledge Discovery Panel.
Fourth Paradigm Science-based on Data-intensive Computing.
Lewis Shepherd GOVERNMENT AND THE REVOLUTION IN SCIENTIFIC COMPUTING.
LSST: Preparing for the Data Avalanche through Partitioning, Parallelization, and Provenance Kirk Borne (Perot Systems Corporation / NASA GSFC and George.
Perspectives on Cyberinfrastructure Daniel E. Atkins Professor, University of Michigan School of Information & Dept. of EECS October 2002.
National Ecological Observatory Network
Research Networks and Astronomy Richard Schilizzi Joint Institute for VLBI in Europe
EScience May 2007 From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories R. Chris Smith NOAO/CTIO, LSST.
“Big Data” and Data-Intensive Science (eScience) Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington July.
Last Words DM 1. Mining Data Steams / Incremental Data Mining / Mining sensor data (e.g. modify a decision tree assuming that new examples arrive continuously,
Components of a Computer System
Large Area Surveys - I Large area surveys can answer fundamental questions about the distribution of gas in galaxy clusters, how gas cycles in and out.
Cloud Computing Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington.
DDM Kirk. LSST-VAO discussion: Distributed Data Mining (DDM) Kirk Borne George Mason University March 24, 2011.
GET CONNECTED Information Technology Career Cluster.
Opportunities for Text Mining in Bioinformatics (CS591-CXZ Text Data Mining Seminar) Dec. 8, 2004 ChengXiang Zhai Department of Computer Science University.
EScience: Techniques and Technologies for 21st Century Discovery Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering Computer Science.
Welcome and Introduction to the Course MSE 2400 EaLiCaRA Spring 2015 Dr. Tom Way.
Human Computation and The Wisdom of Crowds Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington July 2015.
* Numeral Systems: A writing method for expressing numbers is called a “Numeral System". In the most common numeral system, we write numbers with combinations.
1 Why is Digital Curation Important for Workforce and Economic Development? Alan Blatecky Office of Cyberinfrastructure Symposium on Digital Curation in.
Expedition Workshop Towards Scalable Data Management June 10, 2008 Chris Greer Director, NCO.
Big Data Why it matters Patrice KOEHL Department of Computer Science Genome Center UC Davis.
Petascale Computing Resource Allocations PRAC – NSF Ed Walker, NSF CISE/ACI March 3,
Big data What they are and why we should care. Previous hot topics 60’s Catastrophe theory 70’s Fractals 80’s Chaos theory 90’s Data mining 00’s Machine.
High Performance Cyberinfrastructure Discovery Tools for Data Intensive Research Larry Smarr Prof. Computer Science and Engineering Director, Calit2 (UC.
Cloud Computing Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2012.
Introduction to Data Analysis with R on HPC Texas Advanced Computing Center Feb
Department of Computer Science Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING Submitted By.
EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) Giuseppe Andronico INFN Sez. CT & Consorzio COMETA Workshop Clouds.
Babbage’s Difference Engine #2 Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2011.
Grid Activities in the Philippines Rey Vincent P. Babilonia Advanced Science and Technology Institute Department of Science and Technology PHILIPPINES.
Tools and Services Workshop
Joslynn Lee – Data Science Educator
Cloud Computing Ed Lazowska August 2011 Bill & Melinda Gates Chair in
What is Pattern Recognition?
Nanotechnology/Nanoscience Education Connections
TeraScale Supernova Initiative
Human computation, and the wisdom of crowds
Kindle, Newspapers, Books
CS246: Search-Engine Scale
Presentation transcript:

Data-Intensive Science (eScience) Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington August 2011

eScience: Sensor-driven (data-driven) science and engineering Transforming science (again!) Jim Gray

Theory Experiment Observation

[John Delaney, University of Washington]

Theory Experiment Observation Computational Science

Theory Experiment Observation Computational Science eScience

eScience is driven by data more than by cycles z Massive volumes of data from sensors and networks of sensors Apache Point telescope, SDSS 80TB of raw image data (80,000,000,000,000 bytes) over a 7 year period

Large Synoptic Survey Telescope (LSST) 40TB/day (an SDSS every two days), 100+PB in its 10-year lifetime 400mbps sustained data rate between Chile and NCSA

Large Hadron Collider 700MB of data per second, 60TB/day, 20PB/year

Illumina HiSeq 2000 Sequencer ~1TB/day Major labs have of these machines

Regional Scale Nodes of the NSF Ocean Observatories Initiative 1000 km of fiber optic cable on the seafloor, connecting thousands of chemical, physical, and biological sensors

The Web 20+ billion web pages x 20KB = 400+TB One computer can read MB/sec from disk => 4 months just to read the web

eScience is about the analysis of data z The automated or semi-automated extraction of knowledge from massive volumes of data y There’s simply too much of it to look at z It’s not just a matter of volume y Volume y Rate y Complexity / dimensionality

eScience utilizes a spectrum of computer science techniques and technologies z Sensors and sensor networks z Backbone networks z Databases z Data mining z Machine learning z Data visualization z Cluster computing at enormous scale

eScience will be pervasive z Simulation-oriented computational science has been transformational, but it has been a niche y As an institution (e.g., a university), you didn’t need to excel in order to be competitive z eScience capabilities must be broadly available in any institution y If not, the institution will simply cease to be competitive