Presentation is loading. Please wait.

Presentation is loading. Please wait.

Campus Cyberinfrastructure

Similar presentations


Presentation on theme: "Campus Cyberinfrastructure"— Presentation transcript:

1 Campus Cyberinfrastructure
ACTI Meetings 17 April 2013

2 Past Work From early 2011 through May 2012, we studied all six of the Task Force Reports to NSF’s Advisory Committee for Cyberinfrastructure. Grand Challenges HPC Data and Visualization Software Learning and Workforce Campus Bridging and wrote a white paper that, for each Report, summarized, reinforced, critiqued, and offered applications to ourselves.

3 Current Work We are now studying the relationship between “big data” and Campus Cyberinfrastructure. It is relatively easy to stand up thousands of compute nodes and run jobs. The barriers to supporting science, however, are more likely to come from the “big data” needs of a wide variety of disciplines. The point is not to brag about our “big data” expertise, but to note the implications for CCI of the coming big data reality.

4 Examples (Canonical) High-energy Physics community shares a large number of multi-TByte datasets across its international set of labs and universities. A typical pattern is (a) identify a dataset, (b) pull a replica over the net, (c) attack it with thousands of jobs, (d) discard, (e) repeat. The Center for Multiscale Modeling of Atmospheric Processes (CMMAP) is an NSF Science and Technology Center focused on improving the representation of cloud processes in climate models. Even simulating one year in detail generates 35 PBytes. 10% of this will be transferred to SDSC’s Gordon system for analysis. (Gordon has unusually strong SSD-based ‘scratch i/o’ capabilities.)

5 Examples, continued The Digital Humanities field includes efforts to organize all 18th-century British literature (as page images) and then use OCR techniques to reduce these to text. The total storage space is moderate, but their community is facing data management challenges new to them. Supporting health (including human medical) research, e.g., at Columbia University, requires storing/managing large complex data with stringent rules enforced by IRBs. Can campus CI support this research in a scalable way?

6 Challenges Cost of storage. Cost of (currently) 3 TByte disks going down, but there are SANs, interconnects, and very high demand. Data integrity, including some form of backup or multiple coordinated replicas. Metadata, provenance, and ontologies. Usually highly discipline-specific. Scalability threat relates more to software and human expertise. Security issues. Can we support controlled sharing?

7 Challenges, continued Wide-area network performance. Think of large datasets shared among an international community. Local network performance. Making these data immediate to researchers. Interconnect performance. Enabling computations to access the data at high speed to harness our compute node investment. Very-long-term preservation. Will the key data be around in 2050? Data analytics and visualization.

8 Other CCI Topics: IPv6 What are the barriers to widespread support for IPv6? The consequences of creeping NATs. Barriers to innovation in applications. Federal mandates. Barriers to international collaboration.

9 Broad CCI emphases Enabling campus CI as an enabler of research by faculty, students, and staff Whatever, of a conceptual, technical, or practical nature, inhibits this enabling is an enemy


Download ppt "Campus Cyberinfrastructure"

Similar presentations


Ads by Google