Architectural Impact of SSL Processing
Jingnan Yao

References
“Architectural Impact of Secure Socket Layer on Internet Servers”, Krishna Kant, Ravishankar Iyer and Prasant Mohapatra.
“Anatomy and Performance of SSL Processing”, Li Zhao, Ravi Iyer, Srihari Makineni and Laxmi Bhuyan.

Two Major Approaches
IPsec (Internet Protocol Security): IP level; implemented in NICs (network interface cards)
SSL (Secure Sockets Layer): transport level; secures an individual communication session
Secure HTTP (HTTPS) uses SSL for security and is widely used in e-commerce environments.

Performance Impact
Server: the number of simultaneous connections drops significantly
Client: unduly long client response time (10-25% of e-commerce transactions are aborted)

Simultaneous Connections for SPECweb99 and SPECweb99_SSL
SPECweb99 achieves much higher throughput than SPECweb99_SSL.

Overview of SSL
Privacy, Integrity & Authentication
Session Negotiation Phase: authentication of the server and client at the beginning of the session
Bulk Data Transfer Phase: encryption/decryption of data exchanged between the two parties during the session

Execution Time Breakdown in Web Server (1 KB web page)
SSL processing (libcrypto & libssl) takes 71.6% of the execution time.
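
One way to read the 71.6% figure is through Amdahl's law: if only the SSL share of execution time is made faster, the rest of the request path limits the overall gain. The sketch below is illustrative only. It assumes the whole libcrypto and libssl share can be accelerated uniformly, which the slides do not claim, and the function name `overall_speedup` is hypothetical.

```python
def overall_speedup(ssl_fraction: float, crypto_speedup: float) -> float:
    """Amdahl's-law speedup when only the SSL share of execution time gets faster."""
    if not 0.0 <= ssl_fraction <= 1.0:
        raise ValueError("ssl_fraction must be between 0 and 1")
    if crypto_speedup <= 0:
        raise ValueError("crypto_speedup must be positive")
    # Time left = untouched share + accelerated SSL share.
    return 1.0 / ((1.0 - ssl_fraction) + ssl_fraction / crypto_speedup)


SSL_FRACTION = 0.716  # share of execution time quoted on the slide (1 KB page)

# Assumed acceleration factors for the SSL share (hypothetical values).
for factor in (2, 10, 100):
    print(f"SSL {factor}x faster -> overall {overall_speedup(SSL_FRACTION, factor):.2f}x")

# Limit as the SSL share approaches zero cost.
upper_bound = 1.0 / (1.0 - SSL_FRACTION)
print(f"upper bound: {upper_bound:.2f}x")
```

Under these assumptions, even an arbitrarily fast crypto path leaves the remaining 28.4% of the time in place, so the overall gain stays bounded.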

Further Breakdown of Crypto Operations
Public key encryption
Private key encryption
Hashing
Other operations

Configurations
Number of processors in the SMP server: uniprocessor, dual processor, quad processor
Three different L2 cache sizes: 512 KB, 1 MB, 2 MB
Three different file sizes: 30 bytes (handshake performance), 1 MB (bulk data encryption performance), 36 KB (average web-page transfer)
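
Why a 30-byte file isolates handshake performance while a 1 MB file isolates bulk encryption can be illustrated with a simple cost model: a fixed per-session handshake cost plus a per-byte encryption cost. This is a minimal sketch; `HANDSHAKE_COST` and `COST_PER_BYTE` are hypothetical values in arbitrary units, not measurements from the referenced papers, and `handshake_share` is an illustrative name.

```python
def handshake_share(file_bytes: int, handshake_cost: float, cost_per_byte: float) -> float:
    """Fraction of per-transaction SSL cost spent in the handshake."""
    bulk_cost = file_bytes * cost_per_byte
    return handshake_cost / (handshake_cost + bulk_cost)


# Hypothetical costs in arbitrary units (assumptions, not measured data).
HANDSHAKE_COST = 1_000_000.0
COST_PER_BYTE = 50.0

# The three file sizes listed on the Configurations slide.
FILE_SIZES = {"30 bytes": 30, "36 KB": 36 * 1024, "1 MB": 1024 * 1024}

shares = {
    label: handshake_share(size, HANDSHAKE_COST, COST_PER_BYTE)
    for label, size in FILE_SIZES.items()
}

for label, share in shares.items():
    print(f"{label:>8}: handshake is {share:.1%} of SSL cost")
```

With any fixed handshake cost, the handshake share shrinks as the file grows, which is why the smallest and largest files serve as the two extreme cases.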

Overall Performance

Observation 1: “SSL increases path length 10-15 fold over non-SSL case” + “CPI drops by more than a factor of 2”
“The use of SSL increases computational cost of the transactions by a factor of 5-7.”
“As the number of processors increases, the ratio goes down.”
“More processors mean more coherency traffic in both SSL and non-SSL cases.”
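
The arithmetic behind this observation can be checked if one assumes that cost in cycles is path length (instructions executed) multiplied by CPI (cycles per instruction). The sketch below uses only the ratios quoted on the slide; the function name `cost_ratio` is hypothetical.

```python
def cost_ratio(path_length_ratio: float, cpi_ratio: float) -> float:
    """SSL/non-SSL cycle-count ratio, assuming cycles = path length * CPI."""
    return path_length_ratio * cpi_ratio


# "CPI drops by more than a factor of 2": 1/2 is therefore an upper bound
# on the SSL/non-SSL CPI ratio.
CPI_RATIO = 1 / 2

low = cost_ratio(10, CPI_RATIO)   # path length 10x longer
high = cost_ratio(15, CPI_RATIO)  # path length 15x longer

print(f"cost increases by roughly {low:g}x to {high:g}x")
```

Because the CPI drop is stated as more than a factor of 2, the resulting range is an upper estimate, which is consistent with the quoted factor of 5-7.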

Observation 2: “Small CPI for SSL”
A faster CPU core would not be very helpful in improving SSL performance so long as L1 is large enough to supply much of the code and data needed.
“Bulk data encryption/decryption algorithms highly sequential in nature”
A wider issue width would not help, but a longer pipeline would.

L1 Cache Characteristics
Separate instruction and data L1 caches: 16 KB
Single unified L2 cache

Observation 1: “L1 instruction miss ratios are very low in all cases. L1 data miss ratios are more significant.”
“The instruction miss ratio generally decreases with number of processors, but the data miss ratio goes up.”
“More processors allow a better sharing of code, but the coherency misses in data cache increase.”

Observation 2: “30 byte file sizes: the miss ratio for both instruction and data are much lower in the SSL case than non-SSL case.”
“The data miss ratio retains the same behavior for all file sizes and processor configurations.”
“The frequent reuse of the data during the encryption and decryption process.”
“The instruction locality relating to handshaking process is very high.”

Observation 3: “1 MB file sizes: the instruction miss ratio becomes very poor with the SSL traffic for bulk data transfers.”
“Low instruction locality in the bulk data transfer case.”
“Working set of instructions in the bulk transfer case does not fit within L1 cache.”
“Larger instruction L1 cache would help to improve bulk data encryption performance.”
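
The effect of a working set that does not fit in L1 can be sketched with a toy cache simulator. Everything about the model is an assumption made for illustration: a direct-mapped cache, a 32-byte line size, one access per line, and a purely sequential loop over the working set. Only the 16 KB capacity comes from the slides, and `miss_ratio` is a hypothetical helper.

```python
def miss_ratio(working_set_bytes: int, cache_bytes: int,
               line_bytes: int = 32, passes: int = 20) -> float:
    """Miss ratio of a direct-mapped cache when looping over a working set."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # block currently held by each cache line
    misses = 0
    accesses = 0
    for _ in range(passes):
        # Touch each line-sized block of the working set once per pass.
        for addr in range(0, working_set_bytes, line_bytes):
            block = addr // line_bytes
            index = block % num_lines
            accesses += 1
            if tags[index] != block:
                misses += 1
                tags[index] = block
    return misses / accesses


L1_BYTES = 16 * 1024  # 16 KB L1, as listed on the slide

fits = miss_ratio(8 * 1024, L1_BYTES)        # working set smaller than L1
overflows = miss_ratio(32 * 1024, L1_BYTES)  # working set twice the size of L1

print(f"8 KB working set:  miss ratio {fits:.2f}")
print(f"32 KB working set: miss ratio {overflows:.2f}")
```

In this model the small working set misses only on its first pass, while the oversized one evicts its own blocks before they are reused. Real L1 caches are usually set-associative and real instruction streams are not purely sequential, so the sketch shows the direction of the effect, not its magnitude.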

L2 Cache Characteristics

Observation 1: “High L2 miss ratios, especially for large size webpages (1 MB sizes)”
“High degree of locking/contention in TCP processing.”
“Cache pollution because of TCP checksum.”

Encryption Dominated (1 MB files) & SSL Handshake Dominated (30 byte files)

Observation 1: “1 MB case: SSL bulk data transfer shows very good L2 miss ratios.”
“The heavy computational workload of SSL helps in reducing the L2 cache miss ratio.”
“SSL processing itself has certain features that would lead to high L2 cache miss ratios.”
“30 byte case: SSL Handshake shows very high L2 miss ratios.”

Branch and Prediction Behavior

Observation 1:
“Branch frequency with SSL is about 30%-50% of that without SSL.”
“There are fewer control dependencies in the SSL-based transactions.”
“Low branch frequency in SSL encourages high degree of pipelining in the processor architecture.”
“Lower control dependency is another reason for high hit rate in L1 and low CPI in case of SSL.”

Observation 2:
“For 1P/2P configuration: the misprediction rate with SSL is lower.”
“For 4P configuration: the misprediction rate with SSL is always higher.”
“For 4P configuration: BTB is highly inefficient.”
“Better branch prediction algorithms can be investigated.”
“Avoid overly complex branch predictor for SSL transactions since the branch frequency is very low.”
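
Why a low branch frequency limits the payoff of a complex predictor can be illustrated with a back-of-the-envelope model: the CPI lost to mispredictions is branch frequency times misprediction rate times penalty. Only the 30%-50% branch-frequency ratio comes from the slides. The baseline branch frequency, the misprediction rate (held equal for both workloads here, unlike the measured results above) and the penalty are hypothetical values, and `mispredict_cpi` is an illustrative name.

```python
def mispredict_cpi(branch_freq: float, mispredict_rate: float,
                   penalty_cycles: float) -> float:
    """Cycles per instruction lost to branch mispredictions."""
    return branch_freq * mispredict_rate * penalty_cycles


# Hypothetical values chosen only for illustration (not from the slides).
NON_SSL_BRANCH_FREQ = 0.20  # branches per instruction without SSL
MISPREDICT_RATE = 0.05      # fraction of branches mispredicted
PENALTY_CYCLES = 10         # pipeline refill cost per misprediction

baseline = mispredict_cpi(NON_SSL_BRANCH_FREQ, MISPREDICT_RATE, PENALTY_CYCLES)

# SSL branch frequency is about 30%-50% of the non-SSL value (from the slide).
ssl_low = mispredict_cpi(0.30 * NON_SSL_BRANCH_FREQ, MISPREDICT_RATE, PENALTY_CYCLES)
ssl_high = mispredict_cpi(0.50 * NON_SSL_BRANCH_FREQ, MISPREDICT_RATE, PENALTY_CYCLES)

print(f"non-SSL misprediction CPI: {baseline:.3f}")
print(f"SSL misprediction CPI:     {ssl_low:.3f} to {ssl_high:.3f}")
```

Under these assumptions the cycles at stake scale directly with branch frequency, so any improvement in prediction accuracy buys proportionally less for the SSL workload.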

Conclusion
SSL overhead increases the computational cost of transactions by a factor of 5-7.
SSL transactions do not benefit much from a larger L2 cache, but a larger L1 cache would be helpful.
Complex logic for handling control dependencies is not useful for SSL transactions, as the frequency of branches is very low.
