Published by Ashley Gilbert. Modified over 6 years ago.
1
My Research Experiences on Computer Performance Optimization
Shih-Hao Hung, Ph.D.
Sun Microsystems, Inc.
Confidential Information – Do Not Forward
2
Computer Performance
Demands for high performance:
- Getting jobs done faster
- Getting more jobs done at the same time
- Getting complex jobs done in time
Price-performance trade-offs:
- Getting jobs done efficiently
- Getting jobs done with limited resources
- Capacity planning
2018/11/20
3
Performance Optimization
4
My Research Background
High-Performance/Technical Computing
- Parallel Performance Project, University of Michigan
  - Parallelization for high-performance applications
  - Performance characterization tools
  - Performance optimization methodologies
Commercial Applications Optimization
- Performance and Availability Engineering, Sun Microsystems, Inc., 2000-present
  - Database server optimization
  - Network stack optimization
  - Webserver optimization
  - Security infrastructure optimization
5
Parallel Performance Project
- Started by Prof. Edward S. Davidson of the University of Michigan in 1990; funded by NSF, Ford Motor Co., UM Center of Parallel Computing, IBM, DoD, etc.
- Produced 11 Ph.D.s in 10 years.
- Worked on state-of-the-art parallel supercomputers and realistic applications.
- Covers many aspects of computer architecture, from CPU pipelines to clustered systems.
- Optimization by all means: instruction scheduling, memory locality, parallelization, etc., via compiler techniques and hand-tuning.
6
Parallel Computing
Very hot in the 90’s:
- People rushed to build large MPPs.
- Looked good in theory, but lacked practical tools and experience.
- Most existing apps are difficult to parallelize.
Failed to keep pace with Moore’s Law:
- R&D cycle too expensive and too long to keep up with CPU clock rates rising from MHz to GHz.
Looking ahead:
- Throughput computing and commercial workloads drive MP.
- Chip density and area favor SMT & CMP designs.
- Struggling to find ways to sustain the same growth in GHz.
- Multi-core processors and multiprocessor systems will become the norm in the coming years.
7
Optimizing Parallel Applications
Very complex, difficult problems:
- Program parallelization
- Load balance
- Scheduling
- Minimizing interprocessor communications
- Architecture-dependent optimization
Today:
- Still lots of open problems.
- Parallelizing compilers are far from automatic solutions.
Tomorrow:
- Further research and practical solutions will be in high demand as MP systems become popular at all levels.
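As a rough illustration of the load-balance problem listed above, the sketch below assigns tasks of unequal cost to processors in two ways and compares the resulting imbalance (maximum load divided by mean load). The task costs, the function names, and the greedy longest-task-first heuristic are illustrative assumptions, not something taken from the slides.

```python
import heapq


def block_assign(costs, nprocs):
    """Static block partition: contiguous chunks with (nearly) equal task counts."""
    n = len(costs)
    loads = []
    for p in range(nprocs):
        lo = p * n // nprocs
        hi = (p + 1) * n // nprocs
        loads.append(sum(costs[lo:hi]))
    return loads


def greedy_assign(costs, nprocs):
    """Longest-task-first: give each task to the currently least-loaded processor."""
    heap = [(0, p) for p in range(nprocs)]  # (current load, processor id)
    heapq.heapify(heap)
    for c in sorted(costs, reverse=True):
        load, p = heapq.heappop(heap)
        heapq.heappush(heap, (load + c, p))
    # Report loads in processor-id order.
    return [load for load, _ in sorted(heap, key=lambda t: t[1])]


def imbalance(loads):
    """Max load over mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


costs = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical per-task costs
block_loads = block_assign(costs, 4)
greedy_loads = greedy_assign(costs, 4)
print(block_loads, imbalance(block_loads))
print(greedy_loads, imbalance(greedy_loads))
```

With these made-up costs, the block partition leaves one processor with far more work than the others, while the greedy assignment evens the loads out. Real applications also have to weigh the communication cost that a given assignment introduces, which this sketch ignores.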
8
Hierarchical Performance Bounds
9
Example: FCRASH
- Vehicle crash simulation at Ford.
- Finite-element code contains over 10,000 lines of Fortran and 14 parallel loops.
- Profiled on a NUMA system (HP/Convex SPP-1000).
- P-gap: imperfect parallelization
- C’-gap: inter-cluster communications
- L & M’-gaps: load balancing issues
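To make the gap terminology concrete, here is a minimal sketch. It assumes that each gap is the difference between two successive run-time bounds in the hierarchy, and that whatever remains between the last bound and the measured run time is the unmodeled gap. The ordering of the bounds and all timing numbers are made up for illustration; they are not FCRASH measurements.

```python
def gaps(bounds, measured):
    """Compute gaps between successive run-time bounds.

    bounds:   ordered list of (name, lower bound on run time), each bound
              at least as large as the previous one; the first entry is
              the ideal (most optimistic) bound.
    measured: actual run time, in the same units.
    Returns a list of (gap name, size), ending with the unmodeled gap.
    """
    result = []
    for (_, prev_t), (name, t) in zip(bounds, bounds[1:]):
        if t < prev_t:
            raise ValueError("bounds must be non-decreasing")
        result.append((f"{name}-gap", t - prev_t))
    result.append(("Unmodeled-gap", measured - bounds[-1][1]))
    return result


# Hypothetical bounds in seconds (assumed ordering: ideal, P, C', L, M').
bounds = [("ideal", 10), ("P", 12), ("C'", 15), ("L", 16), ("M'", 18)]
result = gaps(bounds, measured=20)
for name, size in result:
    print(name, size)
```

Under this assumption the gaps add up to the difference between the measured run time and the ideal bound, so the largest gap points at the factor that costs the most time.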
10
Goal-directed Optimization
11
Performance Tuning
12
Modeling a Parallel Application
13
Model-Driven Simulation
14
Performance Tuning Results
- SP: initial parallel version
- SD: changing domain decomposition to reduce load imbalance (L-gap) and communications (C’-gap)
- SD2: SD + array padding to reduce false-sharing communications (Unmodeled-gap)
- SD3: SD2 + eliminating thread migration to reduce communications (Unmodeled-gap)
- SD4: SD3 + eliminating unnecessary synchronization barriers (S’-gap)
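The array padding step in SD2 targets false sharing, where data written by different threads falls on the same cache line. The sketch below only does the address arithmetic. It assumes a 64-byte cache line, 8-byte per-thread elements, and an array that starts on a line boundary; these values are hypothetical, since the slides do not give the actual parameters. It counts how many threads' elements land on each cache line with and without padding.

```python
from collections import Counter

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 8   # assumed size of one per-thread element


def threads_per_line(nthreads, stride_bytes):
    """Place thread t's element at byte offset t * stride_bytes and
    return {cache-line index: number of threads writing to that line}."""
    return dict(Counter((t * stride_bytes) // LINE_BYTES for t in range(nthreads)))


# Unpadded: elements are packed back to back.
unpadded = threads_per_line(8, ELEM_BYTES)
# Padded: each element is given a full cache line to itself.
padded = threads_per_line(8, LINE_BYTES)

print("unpadded:", unpadded)
print("padded:  ", padded)
```

In the unpadded layout every thread writes to the same line, so each write can invalidate the other processors' copies. Padding trades memory for one writer per line.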
15
Sun Microsystems
Proud of its vision and innovative technologies.
Faces fierce competition in the server business:
- OS: Microsoft, Linux
- CPU: Intel, IBM
- High-end market: IBM, HP
- Low-end market: Dell and other x86 vendors
Still going for the next big thing:
- Network computing (Java, JDS, JES, GridEngine)
- Throughput computing (Niagara 1 & 2, Rock)
- Solaris 10 & x86 support
16
Performance Engineering
Performance problems everywhere…
Deal with important commercial applications:
- Database
- Network infrastructure & applications
- Throughput computing
- Security infrastructure
Solve problems by:
- Identifying issues
- Improving products
- Influencing future development
17
Networking Infrastructure
- Gigabit Ethernet driver optimization
- TCP/IP stack optimization
- Multi-data transmission and jumbo frames
- TCP Offload Engine (TOE)
- InfiniBand vs. 10GE
- On-chip high-speed Ethernet support
18
Networking Applications
Optimizing SunOne servers:
- Webserver
- Directory server
- Application server
- Portal server
Tweaking benchmarks:
- SPECweb99 & 2004
- SPECweb99_SSL
- TPC-W (W = Web commerce)
19
Security Infrastructure
- Crypto accelerators
- On-chip crypto support
- Secure Socket Layer (SSL) & HTTPS acceleration
- IPsec & VPN acceleration
- Crypto optimization
- Solaris Cryptographic Framework
20
Crypto Acceleration
21
HTTP/SSL Performance
Test configurations:
- HTTP, 100% Keep Alive
- HTTP, 0% Keep Alive
- HTTPS, 100% Keep Alive, no encryption, SHA1 hashing
- HTTPS, 100% Keep Alive, RC4 encryption, SHA1 hashing
- HTTPS, 0% Keep Alive, 100% session creation (RSA), RC4, SHA1
- HTTPS, 0% Keep Alive, 100% session resumption (RSA-reuse), RC4, SHA1
Chart legend: http, tcp, sha1, rc4, rsa_reuse, rsa
22
23
IPsec Performance
24
Solaris Cryptographic Framework
25
Throughput Computing - Niagara
26
Niagara-2 4-Core Server Competition – Nov. 2007
27
Rock
28
Conclusion
- Computer systems will see radical changes in the near future, and system-wide hardware-software co-optimization is key to unleashing their potential:
  - High-density chips
  - Multi-core CPUs
  - Highly scalable systems
  - Enormous network & I/O capacity
  - Built-in security support
- Performance expertise is best acquired through experience.
- Methodology and collaboration are our formulas for success.