My Research Experiences on Computer Performance Optimization


1 My Research Experiences on Computer Performance Optimization
Shih-Hao Hung, Ph.D. Sun Microsystems Inc. Confidential Information – Do Not Forward

2 Computer Performance
Demands for high performance:
  Getting jobs done faster
  Getting more jobs done at the same time
  Getting complex jobs done in time
Price-performance trade-offs:
  Getting jobs done efficiently
  Getting jobs done with limited resources
  Capacity planning
2018/11/20 Confidential - Do Not Forward

3 Performance Optimization

4 My Research Background
High-Performance/Technical Computing
  Parallel Performance Project, University of Michigan
  Parallelization for high-performance applications
  Performance characterization tools
  Performance optimization methodologies
Commercial Applications Optimization
  Performance and Availability Engineering, Sun Microsystems, Inc., 2000-present
  Database server optimization
  Network stack optimization
  Webserver optimization
  Security infrastructure optimization

5 Parallel Performance Project
Started by Prof. Edward S. Davidson at the University of Michigan in 1990; funded by NSF, Ford Motor Co., UM Center of Parallel Computing, IBM, DoD, etc.
Produced 11 Ph.D.s in 10 years.
Worked on state-of-the-art parallel supercomputers and realistic applications.
Covered many aspects of computer architecture, from CPU pipelines to clustered systems.
Optimization by all means: instruction scheduling, memory locality, parallelization, etc., via compiler techniques and hand-tuning.

6 Parallel Computing
Very hot in the '90s:
  People rushed to build large MPPs.
  Looked good in theory, but practical tools and experience were lacking.
  Most existing applications are difficult to parallelize.
Failed to keep pace with Moore's Law:
  The R&D cycle was too expensive and too long to keep up with CPU clock rates rising from MHz to GHz.
Looking ahead:
  Throughput computing and commercial workloads drive MP.
  Chip density and area favor SMT and CMP designs.
  Vendors are struggling to sustain the same growth in GHz.
  Multi-core processors and multiprocessor systems will become the norm in the coming years.

7 Optimizing Parallel Applications
Very complex, difficult problems:
  Program parallelization
  Load balance
  Scheduling
  Minimizing interprocessor communication
  Architecture-dependent optimization
Today:
  Still lots of open problems.
  Parallelizing compilers are far from automatic solutions.
Tomorrow:
  Further research and practical solutions will be in high demand as MP systems become popular at all levels.

8 Hierarchical Performance Bounds

9 Example: FCRASH
Vehicle crash simulation at Ford.
Finite-element code contains over 10,000 lines of Fortran and 14 parallel loops.
Profiled on a NUMA system (HP/Convex SPP-1000).
P-gap: imperfect parallelization
C’-gap: inter-cluster communications
L- and M’-gaps: load balancing issues

10 Goal-directed Optimization

11 Performance Tuning

12 Modeling a Parallel Application

13 Model-Driven Simulation

14 Performance Tuning Results
SP - initial parallel version
SD - changing domain decomposition to reduce load imbalance (L-gap) and communications (C’-gap)
SD2 - SD + array padding to reduce false-sharing communications (Unmodeled-gap)
SD3 - SD2 + eliminating thread migration to reduce communications (Unmodeled-gap)
SD4 - SD3 + eliminating unnecessary synchronization barriers (S’-gap)
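
The array-padding step (SD2) can be illustrated with a small layout model. This is only a sketch, not the FCRASH code: the 64-byte cache line, the 8-byte element size, and all function names below are assumptions chosen for illustration. It shows why per-thread elements stored contiguously fall on the same cache line, and how padding the stride moves each thread onto its own line.

```python
# Hypothetical layout model (not taken from FCRASH): which cache line does
# each thread's element of a shared array land on?
CACHE_LINE_BYTES = 64  # assumed line size; the real machine's may differ
ELEMENT_BYTES = 8      # assumed 8-byte array elements


def cache_lines(num_threads, stride_elements):
    """Return the cache-line index written by each thread.

    Thread t writes element t * stride_elements of a shared array that is
    assumed to start on a cache-line boundary.
    """
    return [
        (t * stride_elements * ELEMENT_BYTES) // CACHE_LINE_BYTES
        for t in range(num_threads)
    ]


def padded_stride():
    """Smallest stride (in elements) that gives each thread its own line."""
    return CACHE_LINE_BYTES // ELEMENT_BYTES


def shared_line_count(lines):
    """Count threads whose cache line is also written by another thread."""
    return sum(1 for line in lines if lines.count(line) > 1)


if __name__ == "__main__":
    unpadded = cache_lines(4, 1)
    padded = cache_lines(4, padded_stride())
    print("unpadded lines:", unpadded, "threads sharing:", shared_line_count(unpadded))
    print("padded lines:  ", padded, "threads sharing:", shared_line_count(padded))
```

Under these assumptions, four threads with a stride of one element all write the same line, while a stride of eight elements gives each thread a separate line at the cost of unused space between elements.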

15 Sun Microsystems
Proud of its vision and innovative technologies.
Faces fierce competition in the server business:
  OS: Microsoft, Linux
  CPU: Intel, IBM
  High-end market: IBM, HP
  Low-end market: Dell and other x86 vendors
Still going for the next big thing:
  Network computing (Java, JDS, JES, GridEngine)
  Throughput computing (Niagara 1 & 2, Rock)
  Solaris 10 & x86 support

16 Performance Engineering
Performance problems everywhere…
Deal with important commercial applications:
  Database
  Network infrastructure & applications
  Throughput computing
  Security infrastructure
Solve problems by:
  Identifying issues
  Improving products
  Influencing future development

17 Networking Infrastructure
Gigabit Ethernet driver optimization
TCP/IP stack optimization
Multi-data transmission and Jumbo Frames
TCP Offload Engine (TOE)
InfiniBand vs. 10GE
On-chip high-speed Ethernet support
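
A back-of-the-envelope calculation shows why jumbo frames matter for driver and stack optimization: fewer, larger frames mean fewer per-frame costs for the same data. The sketch below is illustrative only; it assumes standard Ethernet framing (38 bytes per frame for preamble, header, FCS, and inter-frame gap), IPv4 and TCP headers without options (40 bytes), and a fully utilized link. The function names are hypothetical.

```python
# Per-frame wire overhead on Ethernet, in bytes:
# preamble + start delimiter (8) + MAC header (14) + FCS (4) + inter-frame gap (12)
ETH_OVERHEAD_BYTES = 38
# IPv4 header (20) + TCP header (20), assuming no options
IP_TCP_HEADER_BYTES = 40


def frames_per_second(link_bps, mtu):
    """Frames per second needed to saturate the link with full-size frames."""
    return link_bps / 8 / (mtu + ETH_OVERHEAD_BYTES)


def payload_efficiency(mtu):
    """Fraction of wire bytes that carry TCP payload."""
    return (mtu - IP_TCP_HEADER_BYTES) / (mtu + ETH_OVERHEAD_BYTES)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(
            f"MTU {mtu}: {frames_per_second(1e9, mtu):,.0f} frames/s at 1 Gb/s, "
            f"payload efficiency {payload_efficiency(mtu):.1%}"
        )
```

Under these assumptions, a saturated Gigabit link carries about 81,000 frames per second at a 1500-byte MTU but only about 13,800 at a 9000-byte MTU, so any fixed per-frame cost in the driver or stack is paid roughly six times less often.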

18 Networking Applications
Optimizing SunOne servers:
  Webserver
  Directory server
  Application server
  Portal server
Tweaking benchmarks:
  SPECweb99 & 2004
  SPECweb99_SSL
  TPC-W (W = Web commerce)

19 Security Infrastructure
Crypto accelerators
On-chip crypto support
Secure Sockets Layer (SSL) & HTTPS acceleration
IPsec & VPN acceleration
Crypto optimization
Solaris Cryptographic Framework

20 Crypto Acceleration

21 HTTP/SSL Performance
HTTP, 100% Keep Alive
HTTP, 0% Keep Alive
HTTPS, 100% Keep Alive, no encryption, SHA1 hashing
HTTPS, 100% Keep Alive, RC4 encryption, SHA1 hashing
HTTPS, 0% Keep Alive, 100% session creation (RSA), RC4, SHA1
HTTPS, 0% Keep Alive, 100% session resumption (RSA-reuse), RC4, SHA1
Chart legend: http, tcp, sha1, rc4, rsa_reuse, rsa
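
The configurations above separate the cost of hashing, bulk encryption, and RSA session setup. As a rough illustration of measuring one of those components in isolation, the sketch below times SHA-1 hashing using Python's standard hashlib. It is not the benchmark behind the slide; the payload size, round count, and function name are assumptions, and results depend entirely on the machine it runs on.

```python
import hashlib
import time


def sha1_throughput_mb_per_s(payload_bytes=1 << 20, rounds=50):
    """Measure SHA-1 hashing throughput in MB/s over a zero-filled buffer.

    payload_bytes and rounds are arbitrary illustration values.
    """
    data = bytes(payload_bytes)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha1(data).digest()
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return payload_bytes * rounds / elapsed / 1e6


if __name__ == "__main__":
    print(f"SHA-1: {sha1_throughput_mb_per_s():.0f} MB/s")
```

Timing each component separately in this way is what lets a per-request cost be attributed to hashing, the cipher, or the RSA handshake, as the chart legend suggests.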

23 IPsec Performance

24 Solaris Cryptographic Framework

25 Throughput Computing - Niagara

26 Niagara-2 4-Core Server Competition – Nov. 2007

27 Rock

28 Conclusion
We will see radical changes in computer systems in the near future, and system-wide hardware-software co-optimization is key to unleashing their potential:
  High-density chips
  Multi-core CPUs
  Highly scalable systems
  Enormous network & I/O capacity
  Built-in security support
Performance is an expertise best acquired through experience.
Methodology and collaboration are our formulas for success.

