Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data-Intensive Computing at NSF Corporate Alliance June 18, 2008 Jeannette M. Wing Assistant Director Computer and Information Science and Engineering.

Similar presentations


Presentation on theme: "Data-Intensive Computing at NSF Corporate Alliance June 18, 2008 Jeannette M. Wing Assistant Director Computer and Information Science and Engineering."— Presentation transcript:

1 Data-Intensive Computing at NSF Corporate Alliance June 18, 2008 Jeannette M. Wing Assistant Director Computer and Information Science and Engineering Directorate Thanks to the NSF team: Dan Atkins, Debbie Crawford, Haym Hirsh, Jim French, Stephen Meacham, …

2 2Data-Intensive ComputingJeannette M. Wing Science Story

3 3Data-Intensive ComputingJeannette M. Wing How Much Data? NOAA has ~1 PB climate data (2007) Wayback machine has ~2 PB (2006) CERN’s LHC will generate 15 PB a year (2008) HP building WalMart a 4PB data warehouse (2007) Google processes 20 PB a day (2008) “all words ever spoken by human beings” ~ 5 EB Int’l Data Corp predicts 1.8 ZB of digital data by 2011 640K ought to be enough for anybody.

4 4Data-Intensive ComputingJeannette M. Wing Convergence in Trends Drowning in data Data-driven approach in computer science research –graphics, animation, language translation, search, …, computational biology Cheap storage –Seagate Barracuda 1TB hard drive for $195 Growth in huge data centers Open Source “MapReduce” programming model

5 5Data-Intensive ComputingJeannette M. Wing “Work” w1w1 w2w2 w3w3 r1r1 r2r2 r3r3 “Result” “worker” Partition Combine Master Divide and Conquer

6 6Data-Intensive ComputingJeannette M. Wing Data-Intensive Computing Sample Research Questions Science –What are the fundamental capabilities and limitations of this paradigm? –What new programming abstractions (including models, languages, algorithms) can accentuate these fundamental capabilities? –What are meaningful metrics of performance and QoS? Engineering –How can we automatically manage the hardware and software of these systems at scale? –How can we provide security and privacy for simultaneous mutually untrusted users, for both processing and data? –How can we reduce these systems’ power consumption? Users –What (new) applications can best exploit this computing paradigm?

7 7Data-Intensive ComputingJeannette M. Wing NSF’s Interest in Data-Intensive Computing Broad interest, (potentially) long-term CISE –Cross-directorate: CCF, CNS, IIS –Short-term: CluE To provide the broad academic community access to large-scale computing cluster and massive data sets –Longer-term: Look for cross-cutting theme in FY09 solicitation NSF –Potentially cross-foundational, e.g., via Cyber-enabled Discovery and Innovation (CDI); CISE, OCI, MPS, ENG, … –Why? Scientists are drowning in data!

8 8Data-Intensive ComputingJeannette M. Wing CluE: Cluster Exploratory Google+IBM cluster software and services –Same as Academic Computing Cluster provided for six universities (announced last October) Seed program by NSF –$5M will fund SGERs and regular awards –Solicitation released; July 17 proposal deadline. –Jim French (IIS Program Director) Hope: CluE will be a wild success and community interest and demand will be high

9 9Data-Intensive ComputingJeannette M. Wing Google+IBM Cluster Cluster –1600+ processors, terabytes of memory, hundreds of terabytes of storage, internal networking –External network connection Software –Linux –Hadoop (written by Yahoo!): Open Source version of Google’s MapReduce, Google File System –IBM Tivoli: management, monitoring and dynamic resource provisioning of the cluster Services –Operations and maintenance, including staff, loading data and programs, energy costs

10 10Data-Intensive ComputingJeannette M. Wing Legal Issues

11 11Data-Intensive ComputingJeannette M. Wing The Partnership: Roles Google and IBM –Provide data cluster, user support, scheduling, NSF –Review proposals, identify awardees, funding Universities –Propose and execute research plans on data cluster

12 12Data-Intensive ComputingJeannette M. Wing The MOU Codify the roles Establish restrictions to comply with export law Prescribe the need for “usage agreement” –Remove NSF from this industry/university process and raise awareness of university sensitivities

13 13Data-Intensive ComputingJeannette M. Wing The Usage Agreement Sets out terms and conditions for use of the hardware/software suite Three significant issues –Indemnification State universities prevented by constitution or law from signing Private universities will not sign as a matter of policy –Export control Barrier to university mission. May prohibit access by some students. –Intellectual Property Jury is out on this. Part of 1 on 1 negotiation.

14 14Data-Intensive ComputingJeannette M. Wing Indemnification Example University and Corporation each agree to defend, indemnify and hold harmless the other respective parties for and against any losses damages or claims for damages arising from the wrongful acts or omissions of their respective officers, employees, students or agents (including, without limitation, University Students and University Personnel) in connection with the exercise of their rights and the performance of their obligations under this Agreement, including but not limited to … Asymmetric: We agree not to sue each other but University pays cost of defending Corporation should it be sued based on something a University person did.

15 15Data-Intensive ComputingJeannette M. Wing Export Control Example Specifically, unless authorized by appropriate government license or regulations, you agree not to export, directly or indirectly, any technology, software or commodities provided by Corporation or their direct product (including software developed by you on the Corporate systems) to any of the following countries or to the nationals of any of the following countries, wherever they may be located: Cuba, Iran, Sudan, Syria, and North Korea. Explicit Country List discriminates against students from those countries who may be enrolled in University.

16 16Data-Intensive ComputingJeannette M. Wing Interesting Logistical Issues How do you allocate the resource? –A “rack week” is not a unit of measurement a typical researcher could relate to How do you get the data on/off the resource? –Sending terabytes of data over the net is slow.

17 17Data-Intensive ComputingJeannette M. Wing Academia-Industry-Government Partnership Win-win-win for all New model for NSF –CISE is breaking new ground at NSF (in many ways) NSF/CISE welcomes –Other corporations to participate in Data-Intensive Computing effort and other efforts in the future –This and other new models of A-I-G partnerships

18 18Data-Intensive ComputingJeannette M. Wing Thank you!

19 19Data-Intensive ComputingJeannette M. Wing Credits Copyrighted material used under Fair Use. If you are the copyright holder and believe your material has been used unfairly, or if you have any suggestions, feedback, or support, please contact: jsoleil@nsf.gov Except where otherwise indicated, permission is granted to copy, distribute, and/or modify all images in this document under the terms of the GNU Free Documentation license, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation license” (http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License)http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License


Download ppt "Data-Intensive Computing at NSF Corporate Alliance June 18, 2008 Jeannette M. Wing Assistant Director Computer and Information Science and Engineering."

Similar presentations


Ads by Google