Cyberinfrastructure for Distributed Rapid Response to National Emergencies
Henry Neeman, Director
Horst Severini, Associate Director
OU Supercomputing Center for Education & Research
University of Oklahoma
Condor Week 2006, University of Wisconsin

Disasters
[Slide images with captions: Oklahoma, May; Congressional anthrax; OKC wildfires, January (grass-fires_x.htm?POE=WEAISVA); Indian Ocean tsunami, 2004]

The Problem and the Solution
The Problem: Problems will happen, but we don't know in advance what they will be, so we must be able to respond to unknown problems with unknown solutions. Unknown problems with unknown solutions may require lots of resources, yet we don't want to buy resources just for unknown solutions to unknown problems that might never happen.
The Solution: Be able to use existing resources for emergencies.

Who Knew?

National Emergencies
Natural:
- Severe storms (e.g., hurricanes, tornadoes, floods)
- Wildfires
- Tsunamis
- Earthquakes
- Plagues (e.g., bird flu)
Intentional:
- Dirty bombs
- Bioweapons (e.g., anthrax in the mail)
- Poisoning the water supply
(See Bruce Willis/Harrison Ford movies for more ideas.)

How to Handle a Disaster?
Prediction:
- Forecast the phenomenon's behavior, path, etc.
Amelioration:
- Genetic analysis of a biological agent (to find a cure)
- Forecasting of contaminant spread (whom to evacuate?)

OSCER's Project
NSF Small Grant for Exploratory Research (SGER):
- Configure machines for rapid switch to Condor
- Maintain resources in a state of readiness
- Train operational personnel to maintain, react, and analyze
- Fire drills: generate, conduct, and analyze scenarios of possible incidents

OU: Available for Emergencies
- 512-node Xeon64 cluster (6.5 TFLOPs peak)
- 135-node Xeon32 cluster (1.08 TFLOPs peak)
- 32-node Itanium2 cluster (256 GFLOPs peak)
- Desktop Condor pool, growing to 750 Pentium4 PCs (4.5 TFLOPs peak)
TOTAL: 12.4 TFLOPs

Dell Xeon64 Cluster (topdawg.oscer.ou.edu)
- 1,024 Pentium4 Xeon64 CPUs
- 2,180 GB RAM
- 14 TB disk (SAN + IBRIX)
- InfiniBand & Gigabit Ethernet
- Red Hat Enterprise Linux
- Peak speed: 6.5 TFLOPs
- Usual scheduler: LSF
- Emergency scheduler: Condor
Debuted at #54 worldwide, #9 among US universities, and #4 excluding the big 3 NSF centers.

Aspen Systems Xeon32 Cluster (boomer.oscer.ou.edu)
- Xeon32 CPUs
- 270 GB RAM
- ~10 TB disk
- Myrinet2000
- Red Hat Enterprise Linux
- Peak speed: 1.08 TFLOPs
- Scheduler: Condor
- Will be owned by the High Energy Physics group
Debuted at #197 on the Top500 list in November 2002.

Aspen Systems Itanium2 Cluster (schooner.oscer.ou.edu)
- Itanium2 1.0 GHz CPUs
- 128 GB RAM
- 5.7 TB disk
- InfiniBand & Gigabit Ethernet
- Red Hat Enterprise Linux 3
- Peak speed: 256 GFLOPs
- Usual scheduler: LSF
- Emergency scheduler: Condor

Dell Desktop Condor Pool
OU IT is deploying a large Condor pool (750 desktop PCs) over the course of 2006: each PC is a 3 GHz Pentium4 (32-bit) with 1 GB RAM and a 100 Mbps network connection. When deployed, it will provide 4.5 TFLOPs (peak) of additional computing power, more than is currently available at most supercomputing centers. Currently the pool consists of 136 PCs in a few of the student labs.
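As a sanity check on the 4.5 TFLOPs figure, here is a minimal sketch of the arithmetic, assuming each 3 GHz Pentium4 can retire 2 double-precision floating-point operations per clock cycle; that per-cycle figure is an assumption, not something stated on the slide.

    # Peak-FLOPs estimate for the desktop Condor pool.
    pcs = 750                  # number of desktop PCs when fully deployed
    clock_hz = 3.0e9           # 3 GHz Pentium4
    flops_per_cycle = 2        # assumed SSE2 double-precision throughput

    peak_tflops = pcs * clock_hz * flops_per_cycle / 1e12
    print(f"Peak: {peak_tflops:.1f} TFLOPs")   # -> Peak: 4.5 TFLOPs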

National Lambda Rail at OU
Oklahoma has just gotten onto NLR (National Lambda Rail); the pieces are all in place, but we're still configuring.

MPI Capability
Many kinds of national emergencies (weather forecasting, floods, contaminant distribution, etc.) use fluid flow and related methods, which are tightly coupled and therefore require MPI. Condor provides the MPI universe. Most of the available resources (7.9 TFLOPs out of 12.8) are clusters, ranging from ¼ TFLOP to 6.5 TFLOPs. So providing MPI capability is straightforward.
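For concreteness, here is a minimal sketch of how an emergency MPI job might be handed to Condor's MPI universe. The executable name, node count, and file names are purely illustrative (not from the slides); universe, machine_count, and $(NODE) are standard Condor MPI-universe submit keywords of that era.

    # Hypothetical example: build a Condor MPI-universe submit description for a
    # tightly coupled model and hand it to condor_submit. "flood_model" and the
    # node count are placeholders.
    import subprocess
    import textwrap

    submit_description = textwrap.dedent("""\
        universe      = MPI
        executable    = flood_model
        machine_count = 16
        output        = flood_model.out.$(NODE)
        error         = flood_model.err.$(NODE)
        log           = flood_model.log
        should_transfer_files   = YES
        when_to_transfer_output = ON_EXIT
        queue
    """)

    with open("flood_model.submit", "w") as f:
        f.write(submit_description)

    # Requires condor_submit on the PATH of a Condor submit host.
    subprocess.run(["condor_submit", "flood_model.submit"], check=True)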

Fire Drills
Switchover from production to emergency Condor:
1. Shut down all user jobs on the production scheduler.
2. Shut down the production scheduler (if it isn't Condor; e.g., LSF).
3. Start Condor (if necessary).
Condor jobs for the national emergency then discover these resources and start themselves. We've done this several times at OU, but only during scheduled downtimes. Switchover times range from 9 minutes down to 2.5 minutes; we pretty much have this down to a science.
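A minimal sketch of what such a switchover script might look like, assuming LSF is the production scheduler: the commands used (bkill, badmin, lsadmin, condor_master) are standard LSF and Condor administration tools, but the actual procedure and privileges used at OSCER may well differ.

    # Hypothetical fire-drill switchover: drain LSF, stop its daemons, start Condor.
    # Must run with scheduler-administrator privileges.
    import subprocess
    import time

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    start = time.time()

    # 1. Shut down all user jobs on the production scheduler (LSF).
    run(["bkill", "-u", "all", "0"])        # kill every job, all users

    # 2. Shut down the production scheduler itself.
    run(["badmin", "hshutdown", "all"])     # stop LSF batch daemons on all hosts
    run(["lsadmin", "limshutdown", "all"])  # stop LSF base daemons

    # 3. Start Condor; emergency jobs then match against the freed nodes.
    run(["condor_master"])

    print(f"Switchover took {time.time() - start:.1f} seconds")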

Thanks for your attention! Questions?