Infiniband in EDA (Chip Design)
Glenn Newell, Sr. Staff IT Architect, Synopsys
2 Agenda
- Synopsys + Synopsys computing
- EDA design flow vs. data size and communication
- High Performance Linux Clusters at Synopsys
- Storage is dominant vs. inter-process communication
- Performance increases with Infiniband
- Tell the world
- Next steps
3 Synopsys
"A world leader in semiconductor design software"
- Company founded: 1986
- Revenue for FY 2006: $1.096 billion
- Employees for FY 2006: ~5,100
- Headquarters: Mountain View, California
- Locations: more than 60 sales, support and R&D offices worldwide in North America, Europe, Japan, the Pacific Rim and Israel
4 Synopsys IT (2007)
- Over 60 offices worldwide
- Major data centers: 5 at HQ; Hillsboro, OR; Austin, TX; Durham, NC; Nepean, Canada; Munich, Germany; Hyderabad, India; Yerevan, Armenia; Shanghai, China; Tokyo, Japan; Taipei, Taiwan
- 2 petabytes of NFS storage
- ~15,000 compute servers (Linux: 4,000; Solaris: 700; HPUX: 300; AIX)
- GRID farms: 65 farms composed of 7,000 machines (75% SGE, 25% LSF)
- Interconnect: GigE to storage, Fast Ethernet to clients
- #242 on the Nov. '06 Top500.org list (TFlops on 1,200 processors)
5 Design Flow vs. Data Size and Communication (2007)
- RTL: "relatively" small data sets
- Physical layout: data up to 300 GB; inter-process communication "small" compared to file I/O
- Post Optical Proximity Correction (OPC): 300 GB to >1 TB
  - OPC adds complex polygons
  - Mask machines need flat data (no hierarchy or pattern replication)
  - The physical world is "messy" (FFT + FDTD)
6 High Performance Linux Clusters: Progression

Name             Interconnect        Storage              Nodes
HPLC1            Non-blocking GigE   Dedicated NFS        112 cores / 52 nodes
HPLC2 (mobile)   Myrinet             GPFS (IP)            64 cores / 29 nodes
HPLC3            4X SDR IB           Lustre (native IB)   76 cores / 19 nodes
7 HPLC3 vs. HPLC1: Why IB + Lustre?
8 Why? HPLC1: NFS + GigE
- Large CPU-count Fractures overtax the NFS server CPU
- Maximum read bandwidth is 90 MB/sec
(Diagram: Explode / Fracture jobs hitting the NFS server.)
9 Why? HPLC3: Lustre + IB
- 64-CPU Fracture
- Lustre splits traffic across resources
- 250 MB/sec maximum read bandwidth
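The practical impact of the two measured bandwidths above can be put in wall-clock terms. A quick back-of-the-envelope calculation (my arithmetic, using the slides' 300 GB layout-data figure and decimal units):

```python
# Back-of-the-envelope read times for a 300 GB post-layout file,
# at the bandwidths measured on HPLC1 (NFS + GigE) and HPLC3
# (Lustre + IB). Decimal units: 1 GB = 1000 MB.

def read_minutes(file_gb: float, bw_mb_per_s: float) -> float:
    """Sequential read time in minutes at a given bandwidth."""
    return file_gb * 1000 / bw_mb_per_s / 60

nfs_gige = read_minutes(300, 90)    # HPLC1: single NFS server
lustre_ib = read_minutes(300, 250)  # HPLC3: reads striped across OSSes
print(f"NFS+GigE:  {nfs_gige:.0f} min")   # ~56 min
print(f"Lustre+IB: {lustre_ib:.0f} min")  # ~20 min
```

Roughly an hour of pure I/O per pass drops to about twenty minutes, before any CPU-contention relief on the file server is counted.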
10 Storage + Interconnect Option Comparisons
(Chart comparing storage and interconnect options, including Myrinet.)
11 High Performance Linux Clusters: Production Model

Name    Interconnect                                 Storage                                              Nodes
HPLC5   4X Infiniband, 192-port DDR-capable switch   DataDirect Networks + Lustre (8 OSS, 2 MDS, 16 TB)   200 cores / 50 nodes

- 17x the storage performance of HPLC1
- 6x the storage performance of HPLC3
- IB gateway to the Enterprise network (6x GigE) means no dual-homed hosts
- "Fantastic" performance
- More linear scaling for distributed-processing applications
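For reference, clients on a cluster like HPLC5 mount Lustre directly over the IB fabric (the o2ib LNET transport) rather than through an NFS head. A minimal sketch; the MGS hostname and file system name below are made-up, not the actual HPLC5 configuration:

```shell
# Hypothetical Lustre client mount over native IB (o2ib LNET).
# "mgs1" and the fsname "hplc5" are illustrative placeholders.
mount -t lustre mgs1@o2ib:/hplc5 /mnt/hplc5

# Confirm the client sees every OST (one per stripe target on the
# 8 OSS nodes) and the aggregate capacity:
lfs df -h /mnt/hplc5
```

Because every client talks to all OSSes in parallel, adding OSS nodes scales aggregate bandwidth, which is where the 6x-over-HPLC3 storage figure comes from.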
12 State of the IB Cluster
- Our customer-facing engineers typically see:
  - 10x improvement for post-layout tools over Fast Ethernet
  - 3x improvement for post-OPC tools over GigE NFS or direct-attached storage
- So, we are evangelizing IB + parallel file systems (Lustre) with our customers:
  - User manuals
  - Papers, posters, and presentations at conferences
  - Presentations and site visits with customers
  - Pushing storage vendors to support IB to the client (vs. 10GigE + TOE)
- But…
13 Estimated EDA Relative CPU Cycles Required
- 2007 (65 nm) to 2009 (45 nm): ~10x
- 2007 (65 nm) to 2012 (22 nm, via 45 nm): ~100x
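The two estimates above imply a steep compound growth rate in required compute. A quick check of the implied annual multipliers (my arithmetic, not figures from the slide):

```python
# Implied compound annual growth in required EDA CPU cycles,
# from the slide's estimates: ~10x over 2007-2009 and ~100x
# over 2007-2012.

growth_to_45nm = 10 ** (1 / (2009 - 2007))    # ~3.2x per year
growth_to_22nm = 100 ** (1 / (2012 - 2007))   # ~2.5x per year
print(f"{growth_to_45nm:.2f}x/yr to 45 nm, {growth_to_22nm:.2f}x/yr to 22 nm")
```

Either rate far outpaces single-core performance gains in that era, which is the "But…" behind the next-steps slide.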
14 Next Steps
- "Inside the box" is changing:
  - Multicore
  - Hardware acceleration (GPU/co-processor merges with CPU)
  - Micro-architecture
- Applications are changing to deal with the above and with increased data set sizes
- Things for IT to explore:
  - Other "new" parallel file systems (e.g. Gluster)
  - 12X DDR IB
  - 10Gb uplinks
  - An IB Top500 entry? ;-)