GriPhyN Project

Review Criteria
- Relevance to Information Technology
- Intellectual Merit
- Broader Impacts
- ITR Evaluation Criteria (innovation in approach, scientific excitement and promise, justification for large size, community-extending activities)

Reviews: What They Liked
- “Considerable intellectual merit in the CS and physics aspects which will lead to major impacts in the community”
- Strong team
- Strong focus upon students
- Etc. etc. etc.

Reviews: Concerns
- Software planning needs some enhancement
- Project planning needs enhancement
- No focus upon underrepresented groups
- Huge amount of money going towards this effort - what leveraging of funds?
- How is the current knowledge of database technology being integrated?
- Better integration between CS and physical sciences required

Increasing Diversity: Thoughts
- “Broadening opportunities and enabling the participation of all citizens — women and men, underrepresented minorities, and persons with disabilities — are essential to the health and vitality of science and engineering. NSF is committed to this principle of diversity and deems it central to the programs, projects, and activities it considers and supports.”
  - Increase diversity of senior personnel: add UT Brownsville
  - Outreach to faculty and students from minority institutions and other countries: tutorials, access to infrastructure for analysis, access to infrastructure for experimental computer science, education in how to participate in large collaborative experiments
  - Need to dedicate a person (postdoc?) to this purpose: coordination, teaching, etc.
  - Florida State, UT Brownsville, Clark Atlanta, LaTech (Louisiana), Occidental (LA)
  - Visitor program: summer students, faculty?
  - Leverage LIGO outreach activities: extend to analysis

A Proposed GriPhyN Task Breakdown

The Three Project Components
- CS research: new and exciting CS research motivated and guided by challenges encountered by domain scientists in meeting their computational and data management needs
- Application experiments: prototyping new information technology and interfacing it with scientific applications in real-life test-bed environments defined by CMS, ATLAS, LIGO, and SDSS requirements
- Tool building and deployment: turn “winning” prototypes into production-quality tools to be used by the scientific community (Virtual Data Grid Toolkit: VDGT)

Roles
[Diagram: CS Research, Application Experiments, and Toolkit Development, spanning Computer Science and Physics. Box labels: ideas, expressed in prototypes; validated ideas; “failure”; CS publications; software tools; trained professionals; applications requirements; testbed development; scientific analyses. Arrow labels: enables, informs, define, enables, exploits, refines.]

Overall Goal: A Petascale Virtual Data Grid
- User requests data V; system determines:
  - Whether V is stored, staged, cached, and/or how and where it can be computed
  - Hence, the cost of the various “access” options
  - And hence, an “execution” plan
  - (Note that this can be recursive)
- With behavior satisfying local & global resource management policies
- Can deal with Petascale data & computing
- Plus monitoring, fault recovery, security, etc.

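The request-planning idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's design: the names `Derivation`, `Plan`, and `plan_request`, the dictionary layouts, and the cost numbers are all hypothetical. The sketch compares fetching an existing copy of V against recomputing it, and recurses when the inputs of a derivation are themselves virtual. Policy enforcement, monitoring, and fault recovery are left out.

```python
from dataclasses import dataclass


@dataclass
class Derivation:
    """Hypothetical recipe: a named transformation applied to input data products."""
    transformation: str
    inputs: list
    compute_cost: float  # illustrative cost units, not a real metric


@dataclass
class Plan:
    cost: float
    steps: list


def plan_request(name, replicas, derivations, _active=frozenset()):
    """Return the cheapest Plan that materializes `name`, or None if impossible.

    replicas:    name -> {location: access_cost} for stored/staged/cached copies
    derivations: name -> Derivation describing how the product can be computed
    """
    if name in _active:  # guard against cyclic derivation chains
        return None
    options = []
    # Option 1: access an existing copy at one of its locations.
    for location, access_cost in replicas.get(name, {}).items():
        options.append(Plan(access_cost, [f"fetch {name} from {location}"]))
    # Option 2: compute the product, planning each input recursively.
    deriv = derivations.get(name)
    if deriv is not None:
        sub_plans = [
            plan_request(i, replicas, derivations, _active | {name})
            for i in deriv.inputs
        ]
        if all(p is not None for p in sub_plans):
            cost = deriv.compute_cost + sum(p.cost for p in sub_plans)
            steps = [s for p in sub_plans for s in p.steps]
            steps.append(f"run {deriv.transformation} -> {name}")
            options.append(Plan(cost, steps))
    return min(options, key=lambda p: p.cost, default=None)


# Hypothetical example: V exists remotely, but recomputing from a cached input is cheaper.
replicas = {
    "raw": {"tape": 10.0, "disk_cache": 1.0},
    "V": {"remote_site": 50.0},
}
derivations = {"V": Derivation("reconstruct", ["raw"], 5.0)}

plan = plan_request("V", replicas, derivations)
print(plan.cost)   # 6.0
print(plan.steps)  # ['fetch raw from disk_cache', 'run reconstruct -> V']
```

A real planner would also have to fold local and global policies into the cost of each option; here cost is a single number purely to keep the recursion visible.
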
Key Problems (The Focus of CS Research)
- Virtual data catalog(s): independent representations of data type, derivation, location; software for performing derivations
- Storage resource management: high-performance, predictable, managed access to mucho data
- Policy-driven resource allocation: representing & enforcing local and global policies
- Request planning: cost estimation, co-scheduling, based on predictions and policy information
- Request execution: monitoring, recovery, …

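To make "independent representations of data type, derivation, location" concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the catalog the project would build: the class `VirtualDataCatalog`, its method names, and the sample entries are all hypothetical. The point it shows is that a data product can be known to the catalog by type and derivation while having no physical location at all.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDataCatalog:
    """Hypothetical catalog keeping type, derivation, and location in separate maps."""
    types: dict = field(default_factory=dict)        # name -> data type label
    derivations: dict = field(default_factory=dict)  # name -> (transformation, inputs)
    locations: dict = field(default_factory=dict)    # name -> set of sites holding a copy

    def declare(self, name, data_type):
        self.types[name] = data_type

    def record_derivation(self, name, transformation, inputs):
        self.derivations[name] = (transformation, tuple(inputs))

    def add_replica(self, name, site):
        self.locations.setdefault(name, set()).add(site)

    def drop_replica(self, name, site):
        self.locations.get(name, set()).discard(site)

    def is_materialized(self, name):
        """True if at least one physical copy is registered."""
        return bool(self.locations.get(name))

    def is_derivable(self, name):
        """True if the catalog knows how to compute the product."""
        return name in self.derivations


# Hypothetical usage: V is purely virtual until a replica is registered.
catalog = VirtualDataCatalog()
catalog.declare("V", "event_summary")
catalog.record_derivation("V", "reconstruct", ["raw"])
virtual_before = catalog.is_materialized("V")   # no copy exists yet
catalog.add_replica("V", "site_a")
materialized = catalog.is_materialized("V")     # a copy now exists
catalog.drop_replica("V", "site_a")
still_derivable = catalog.is_derivable("V")     # the recipe survives the deletion
```

Keeping the three maps separate means a cached copy can be deleted without losing the knowledge needed to regenerate it, which is the property the planner relies on.
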
Other Candidate Challenges
- Security
- Data integrity

Infrastructure Requirements (Many: Most We Will Have to Steal)
- Data repositories
- Caching services
- Security mechanisms
- Metadata catalogs
- Information service
- Data movers
- CPU schedulers and managers: instantiate a virtual service
- …

Applications
- LIGO: Big opportunistic computation, good as a background load
- SDSS: Gravitational lensing is CPU limited, other things tend to be I/O limited

Categorizing Applications …
- Table listing:
  - Volume of data
  - Data rate
  - Computation rate
  - Derived data rate
  - Computation/data access rate

Observations
- Versioning of software used to compute data
- Can use background cycles to compute things that we expect to need
- SDSS already precomputes some stuff: e.g., images of galaxies …
- Adaptive techniques for moving data around …
- We need to define the architecture

Proposed Resource Allocations (As Stated in the Proposal)
+ 2 management positions

Resource Requirements
- Research
  - 4 key problems => ~2.5 GS, 1 postdoc
  - => Hard to involve > 1 institution/problem
  - Should we stage?
- Application experiments
  - 4 application areas => 1 GS, 0.5 postdoc
  - => Hard to involve > 1 institution/application area
  - => Need to stage application work?
- VDTK
  - Packaging: 2 GS, 2 staff
  - Infrastructure: 1 GS, 2 staff
  - 2, at most 3 institutions?

Tasks
- Generate list of tasks, seek candidates
- Technical content: Ian Foster, corrected by Harvey
- Collect biographies: Bruce Allen
- Available facilities: (Paul Avery)
- Budget + justification: Larry Price
- Letters of support: Richard Mount
- Technical coordinator: Dave Cassel
- Conflicts of interest: (Paul Avery)
- Industrial support: Paul Avery
- Matching funds: Paul Avery
- Outreach/diversity: Joseph Romano
- Budget submission process: Paul Avery
- Write international collab text: Harvey Newman
- Management plan: Larry Price

Schedule
- Today: March 9
- Budgets to UF: ?? Need to find out when ??
- Full proposal: April 7th
- Meet in Chicago: April 10th
- Transmitted: April 17th

Letters of Support (Electronic)
- LIGO, CMS, ATLAS, SDSS: Paul
- NCSA (Ian), SDSC (Reagan)
- NVO: Tom
- EU Grid: Harvey
- PPDG/DOE: Dan Hitchcock (Ian, copy Mount)
- Industrial: IBM (Paul, Albert, Ian[UC]), Compaq (Paul), HP (Harvey), Sun (Albert)
- Networks: EVL/STAR-TAP (Ian), Internet-2 (Ian)
- NSF MPS (Paul)