
1 “Big Data” and Data-Intensive Science (eScience) Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington July 2013

2 Exponential improvements in technology and algorithms are enabling the “big data” revolution
- A proliferation of sensors
  - Think about the sensors on your phone
- More generally, the creation of almost all information in digital form
  - It doesn’t need to be transcribed in order to be processed
- Dramatic cost reductions in storage
  - You can afford to keep all the data
- Dramatic increases in network bandwidth
  - You can move the data to where it’s needed

3
- Dramatic cost reductions and scalability improvements in computation
  - With Amazon Web Services, or Google App Engine, or Microsoft Azure, 1000 computers for 1 day cost the same as 1 computer for 1000 days!
- Dramatic algorithmic breakthroughs
  - Machine learning, data mining – fundamental advances in computer science and statistics
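The cost claim on this slide follows from pay-per-use pricing: the bill depends only on total machine-days, not on how they are spread over time. A minimal sketch, assuming a flat price per machine-day (the price below is a hypothetical placeholder, not a quote from any provider, and real pricing has tiers and discounts this ignores):

```python
def rental_cost(machines: int, days: int, price_per_machine_day: float) -> float:
    """Cost of renting `machines` computers for `days` days at a flat rate."""
    return machines * days * price_per_machine_day


# Hypothetical flat price, chosen only for illustration.
PRICE_PER_MACHINE_DAY = 2.40

wide = rental_cost(1000, 1, PRICE_PER_MACHINE_DAY)    # 1000 computers for 1 day
narrow = rental_cost(1, 1000, PRICE_PER_MACHINE_DAY)  # 1 computer for 1000 days

print(f"1000 machines x 1 day:  ${wide:,.2f}")
print(f"1 machine x 1000 days:  ${narrow:,.2f}")
print("same cost:", wide == narrow)
```

Under that assumption the two bills are identical, but the first job finishes in one day instead of roughly three years.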

4 Some examples of “big data” in action
- Collaborative filtering

5 Fraud detection

6 Price prediction

7 Hospital re-admission prediction

8 Travel time prediction under specific circumstances

9 Sports

10 Home energy monitoring

11 Larry Smarr, UCSD Gordon Bell, Microsoft Research John Guttag & Collin Stultz, MIT Google self-driving car

12 Speech recognition

13 Machine translation
- Speech -> text
- Text -> text translation
- Text -> speech in speaker’s voice
http://www.youtube.com/watch?v=Nu-nlQqFCKg&t=7m30s (7:30 – 8:40)

14 Scientific discovery: Ocean Observatories Initiative, Gene Sequencing, Large Hadron Collider, Large Synoptic Survey Telescope

15 Presidential campaigning

16 Electoral forecasting

17 Real data-driven decision-making (vs. MBA baloney) for every sector!

18 eScience: Sensor-driven (data-driven) science and engineering Transforming science (again!) Jim Gray

19 Theory Experiment Observation

20

21 [John Delaney, University of Washington]

22 Theory Experiment Observation Computational Science

23 Theory Experiment Observation Computational Science eScience

24 eScience is driven by data more than by cycles
- Massive volumes of data from sensors and networks of sensors
Apache Point telescope, SDSS: 80TB of raw image data (80,000,000,000,000 bytes) over a 7-year period

25 Large Synoptic Survey Telescope (LSST): 40TB/day (an SDSS every two days), 100+PB in its 10-year lifetime; 400 Mbps sustained data rate between Chile and NCSA
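The LSST figures can be checked with a few lines of arithmetic. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes, matching the byte count given for SDSS) and 365-day years; the variable names are illustrative only.

```python
TB = 10**12  # decimal terabyte, as in "80,000,000,000,000 bytes" for 80TB
PB = 10**15

sdss_total_bytes = 80 * TB    # SDSS raw image data over 7 years
lsst_bytes_per_day = 40 * TB  # LSST daily volume

# "An SDSS every two days"
days_per_sdss = sdss_total_bytes / lsst_bytes_per_day

# "100+PB in its 10-year lifetime"
lifetime_pb = lsst_bytes_per_day * 365 * 10 / PB

# What a sustained 400 Mbps link carries per day (8 bits per byte)
link_tb_per_day = 400 * 10**6 / 8 * 86_400 / TB

print(f"days to collect one SDSS: {days_per_sdss:.1f}")
print(f"10-year total: {lifetime_pb:.0f} PB")
print(f"400 Mbps link: {link_tb_per_day:.2f} TB/day")
```

The first two results agree with the slide (2 days, about 146 PB). The link carries about 4.3 TB/day, well under 40TB/day; the slide does not say which portion of the data crosses that link, so no conclusion is drawn here.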

26 Large Hadron Collider: 700MB of data per second, 60TB/day, 20PB/year

27 Illumina HiSeq 2000 Sequencer: ~1TB/day; major labs have 25-100 of these machines

28 Regional Scale Nodes of the NSF Ocean Observatories Initiative 1000 km of fiber optic cable on the seafloor, connecting thousands of chemical, physical, and biological sensors

29 The Web
- 20+ billion web pages x 20KB = 400+TB
- One computer can read 30-35 MB/sec from disk => 4 months just to read the web
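The "4 months" estimate can be reproduced from the slide's own numbers. A back-of-the-envelope sketch, assuming decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes), uninterrupted sequential reads, and 30-day months:

```python
SECONDS_PER_DAY = 86_400

pages = 20e9          # 20 billion web pages
bytes_per_page = 20e3 # 20 KB per page
total_bytes = pages * bytes_per_page  # 4e14 bytes = 400 TB


def days_to_read(total: float, bytes_per_second: float) -> float:
    """Days one machine needs to read `total` bytes at a constant rate."""
    return total / bytes_per_second / SECONDS_PER_DAY


days_slow = days_to_read(total_bytes, 30e6)  # at 30 MB/sec
days_fast = days_to_read(total_bytes, 35e6)  # at 35 MB/sec

print(f"total size: {total_bytes / 1e12:.0f} TB")
print(f"at 30 MB/sec: {days_slow:.0f} days (~{days_slow / 30:.1f} months)")
print(f"at 35 MB/sec: {days_fast:.0f} days (~{days_fast / 30:.1f} months)")
```

This gives roughly 132 to 154 days, so "4 months" is the optimistic end of the range, and it is the reason a single machine is not a practical way to process data at this scale.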

30 eScience is about the analysis of data
- The automated or semi-automated extraction of knowledge from massive volumes of data
  - There’s simply too much of it to look at
- It’s not just a matter of volume
  - Volume
  - Rate
  - Complexity / dimensionality

31 eScience utilizes a spectrum of computer science techniques and technologies
- Sensors and sensor networks
- Backbone networks
- Databases
- Data mining
- Machine learning
- Data visualization
- Cluster computing at enormous scale

32 eScience will be pervasive
- Simulation-oriented computational science has been transformational, but it has been a niche
  - As an institution (e.g., a university), you didn’t need to excel in order to be competitive
- eScience capabilities must be broadly available in any institution
  - If not, the institution will simply cease to be competitive

