Differential (De)Serialization for Optimized SOAP Performance Michael J. Lewis Grid Computing Research Laboratory Department of Computer Science Binghamton.

Presentation transcript:

Differential (De)Serialization for Optimized SOAP Performance Michael J. Lewis Grid Computing Research Laboratory Department of Computer Science Binghamton University State University of New York (with Nayef Abu-Ghazaleh, Madhu Govindaraju)

Motivation
SOAP is an XML-based protocol for Web Services that (usually) runs over HTTP
Advantages
  – extensible, language and platform independent, simple, robust, expressive, and interoperable
The adoption of Web Services standards for Grid computing requires high performance

The SOAP Bottleneck
Serialization and deserialization
  – The in-memory representation of data must be converted to ASCII and embedded within XML
  – Serialization and deserialization conversion routines can account for 90% of end-to-end time for a SOAP RPC call [HPDC 2002, Chiu et al.]
Our approach
  – Avoid serialization and deserialization altogether, whenever possible
bSOAP: Binghamton's SOAP implementation

Overview of the Optimizations
Differential Serialization (DS) (sender side)
  – Save a copy of the last outgoing message
  – If the next call's message would be similar, use the previous message as a template and serialize only the differences from the last message
Differential Deserialization (DDS) (receiver side)
  – Checkpoint incoming message portions
  – If the next incoming message is similar, reuse the deserialized values from the last message and deserialize only the differences from the last message

DS and DDS
DS and DDS are separate, disjoint optimization techniques
  – sender side (DS) vs. receiver side (DDS)
  – data update tracking (DS) vs. parser checkpointing (DDS)
  – neither depends on the other
  – each takes advantage of sequences of similar messages to avoid expensive SOAP message processing
  – neither changes the protocol, what goes in the SOAP message, or what goes on the wire
  – each remains interoperable with other SOAP implementations

DS: Update Tracking
How do we know if the data in the next message will be the same as in the previous one?
If it is different, how do we know which parts must be reserialized?
How can we ensure that reserialization of message parts does not corrupt other portions of the message?

Data Update Tracking (DUT) Table

Field  TPointer  SLength  FWidth  Dirty?
X                5        5       YES
Y                3        7       YES
Z                5        10      NO

POST /mioExample HTTP/
struct MIO { int a; int b; double val; };
int mioArray(MIO[] mios)

Problems and Approaches
Problems
  – Some fields require reserialization
  – The current field width may be too small for the next value
  – The current message (or chunk) size may be too small
Solving these problems enables DS, but incurs overhead
Approaches
  – shifting
  – chunking
  – stuffing
  – stealing
  – chunk overlaying

Shifting
Shifting: Expand the message on-the-fly when the serialized form of a new value exceeds its field width
  – Shift the bytes of the template message to make room
  – Update DUT table entries for all shifted data
  – Performance penalty: DUT table updating, memory moves, possible memory reallocation
… 1.2 <y xsi:type=…. becomes … <y xsi:type=….
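The shifting step can be sketched in a few lines. This is a minimal illustration, not bSOAP's code: the function name `shift_and_write` and the list of `offset`/`width` dicts standing in for the DUT table are assumptions made for the example.

```python
def shift_and_write(template, fields, index, new_text):
    """Write new_text into field `index`, shifting later bytes if it does not fit.

    `fields` is a hypothetical stand-in for the DUT table: a list of dicts
    holding each value's 'offset' and allocated 'width' in the template.
    Returns the number of bytes the rest of the message was shifted by.
    """
    field = fields[index]
    grow = len(new_text) - field["width"]
    if grow > 0:
        end = field["offset"] + field["width"]
        # Memory move: inserting here pushes every later byte to the right.
        template[end:end] = b" " * grow
        field["width"] += grow
        # Every entry after the widened field now points at a stale offset.
        for later in fields[index + 1:]:
            later["offset"] += grow
    start = field["offset"]
    # Shorter values are padded with spaces (a simplification for this sketch).
    template[start:start + field["width"]] = new_text.ljust(field["width"])
    return max(grow, 0)


template = bytearray(b"<x>1.2</x><y>3</y>")
fields = [{"offset": 3, "width": 3}, {"offset": 13, "width": 1}]
shifted = shift_and_write(template, fields, 0, b"1.23456")
print(bytes(template), fields[1]["offset"], shifted)
```

The loop over the later entries is where the slide's "DUT table updating" penalty shows up: the cost grows with the number of fields that follow the one being widened.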

Stuffing
Stuffing: Allocate more space than necessary for a data element
  – explicitly when the template is first created, or after serializing a value that requires less space
  – Helps avoid shifting altogether
  – Doesn't work for strings, base64 encoding
… 678 <z xsi:type=… can be represented as … 678 <z xsi:type=…
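A rough sketch of stuffing follows, assuming (as the later "Closing Tag Shift" slide states) that the stuffed whitespace sits after the closing tag. The helper names `stuff` and `restuff` are hypothetical and chosen only for this example.

```python
def stuff(tag, text, width):
    """Serialize one element and pad after the closing tag up to `width` value chars."""
    closing = f"</{tag}>"
    return f"<{tag}>" + (text + closing).ljust(width + len(closing))


def restuff(message, value_start, tag, text, width):
    """Overwrite a stuffed field in place; the message length never changes."""
    if len(text) > width:
        raise ValueError("value exceeds field width; shifting or stealing needed")
    closing = f"</{tag}>".encode()
    region = width + len(closing)
    # The closing tag moves with the value, and the padding absorbs the difference.
    message[value_start:value_start + region] = (text.encode() + closing).ljust(region)


message = bytearray(stuff("z", "678", 8).encode())
restuff(message, 3, "z", "1234567", 8)
print(bytes(message))
```

Because the allocated region is fixed, a larger value only moves the closing tag within the region, and no DUT offsets for other fields need to change.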

Stealing
Stealing: Take space from nearby stuffed fields
  – Can be less costly than shifting [ISWS 04]
Performance depends on several factors
  – Halting Criteria: When to stop stealing?
  – Direction: Left, right, or back-and-forth?
…'> y can steal from z to yield… …'>
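The idea of y stealing from z can be sketched as below. This is an illustrative model only: the field dicts (`offset`, `length`, `width`, `close`) and the function names are assumptions, and only stealing from a right-hand neighbour is shown.

```python
def region(field):
    """Bytes covered by the value, its closing tag, and the stuffed padding."""
    return field["width"] + len(field["close"])


def write(msg, field, text):
    """Rewrite a stuffed field in place, padding after the closing tag."""
    if len(text) > field["width"]:
        raise ValueError("value does not fit")
    start = field["offset"]
    msg[start:start + region(field)] = (text + field["close"]).ljust(region(field))
    field["length"] = len(text)


def steal_from_right(msg, taker, donor, need):
    """Move `need` padding bytes from `donor` to `taker` (taker lies to the left)."""
    if need > donor["width"] - donor["length"]:
        raise ValueError("donor has too little padding")
    donor_end = donor["offset"] + region(donor)
    del msg[donor_end - need:donor_end]      # drop padding at the donor's tail
    taker_end = taker["offset"] + region(taker)
    msg[taker_end:taker_end] = b" " * need   # give it to the taker
    taker["width"] += need
    donor["width"] -= need
    donor["offset"] += need                  # only the donor moved


msg = bytearray(b"<y>12</y><z>7</z>    ")
y = {"offset": 3, "length": 2, "width": 2, "close": b"</y>"}
z = {"offset": 12, "length": 1, "width": 5, "close": b"</z>"}
steal_from_right(msg, y, z, 3)
write(msg, y, b"12345")
print(bytes(msg))
```

Only the bytes between the two fields move and the message length stays the same, which is one plausible reason stealing can be cheaper than shifting the whole tail of the message.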

Performance
Performance depends on
  – which techniques are invoked
  – how different the next message is
Message Content Matches
  – identical messages, no dirty bits
Perfect Structural Matches
  – data elements and their sizes persist
Partial Structural Matches
  – some data elements change size
  – requires shifting, stealing, stuffing, etc.
We study the performance of all our techniques on synthetic workloads of scientific data
  – summary: 17% to 10X improvement

Perfect Structural Matches
Perfect Structural Matches:
  – Some data items must be overwritten (DUT table dirty bits)
  – No shifting required
Performance study:
  – vary the message size
  – vary the reserialization percentage
  – vary the data type: doubles and Message Interface Objects (MIOs) (not shown)
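For a perfect structural match, the sender only has to overwrite the dirty fields in place. The sketch below assumes a simplified DUT table (a list of dicts with `offset`, `width`, and `dirty`); the function name `overwrite_dirty` is hypothetical.

```python
def overwrite_dirty(template, entries, new_values):
    """Overwrite only the dirty fields of a saved template, in place.

    Each entry is a dict with 'offset', 'width', and 'dirty'. In a perfect
    structural match every new value has exactly the old width, so no other
    byte of the template has to move. Returns the number of fields rewritten.
    """
    rewritten = 0
    for entry, value in zip(entries, new_values):
        if not entry["dirty"]:
            continue  # clean field: the bytes already in the template are reused
        text = str(value).encode()
        if len(text) != entry["width"]:
            raise ValueError("not a perfect structural match")
        start = entry["offset"]
        template[start:start + entry["width"]] = text
        entry["dirty"] = False
        rewritten += 1
    return rewritten


template = bytearray(b"<a>111</a><b>222</b>")
entries = [
    {"offset": 3, "width": 3, "dirty": False},
    {"offset": 13, "width": 3, "dirty": True},
]
count = overwrite_dirty(template, entries, [111, 999])
print(bytes(template), count)
```

The work done is proportional to the number of dirty fields, which matches the observation that send time depends directly on the percentage reserialized.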

– Send Time depends directly on % serialized – Important to avoid reserializing

DDS: The Approach
As an incoming message is being processed
  – store parser states periodically
  – compute corresponding message portion checksums
For subsequent (hopefully similar) messages
  – compare incoming message checksums with stored checksums (fast mode parsing)
  – if checksums match, the parser can skip to the next parser state without actually generating it from the incoming message
  – on checksum mismatch, revert to regular mode parsing
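The checkpoint-and-checksum idea can be illustrated with a toy deserializer. Everything here is an assumption made for the example: the "message" is a space-separated list of doubles rather than SOAP XML, the parser state is just the pending partial token, the portion size is 16 bytes, and CRC-32 from Python's `zlib` stands in for whatever checksum bSOAP uses.

```python
import zlib

PORTION = 16  # hypothetical message portion size in bytes


def parse_portion(pending, chunk):
    """Regular mode: parse one portion, returning (values, new pending token)."""
    values = []
    for ch in chunk.decode():
        if ch == " ":
            if pending:
                values.append(float(pending))
                pending = ""
        else:
            pending += ch
    return values, pending


def deserialize(message, checkpoints=None):
    """Return (values, checkpoints for the next message, portions skipped)."""
    values, pending, skipped, saved = [], "", 0, []
    for i in range(0, len(message), PORTION):
        chunk = message[i:i + PORTION]
        checksum = zlib.crc32(chunk)
        index = i // PORTION
        old = checkpoints[index] if checkpoints and index < len(checkpoints) else None
        # Fast mode needs both the same starting parser state and the same checksum.
        if old and old["before"] == pending and old["checksum"] == checksum:
            produced, after = old["values"], old["after"]
            skipped += 1
        else:
            produced, after = parse_portion(pending, chunk)
        saved.append({"checksum": checksum, "before": pending,
                      "values": produced, "after": after})
        values.extend(produced)
        pending = after
    if pending:
        values.append(float(pending))
    return values, saved, skipped


first = b"1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 1.0 2.0 3.0 "
second = b"1.5 2.5 3.5 4.5 5.5 6.7 7.5 8.5 9.5 1.0 2.0 3.0 "
_, checkpoints, _ = deserialize(first)
values, _, skipped = deserialize(second, checkpoints)
print(values, skipped)
```

A matching checksum does not strictly prove that two portions are identical, so a real implementation has to weigh checksum strength against the cost of computing it.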

Effectiveness Depends on…
Similarity in consecutive messages
  – determines how often the parser is in fast mode
How much faster fast mode is
  – deserialization vs. checksum calculation and comparison
Efficiency in identifying mode switches
Checkpoint and checksum overhead

Creating Checkpoints
First one right after the start tag that contains the name of the back end element
Thereafter, checkpoints are created periodically
  – based on number of bytes processed
  – configurable parameter of bSOAP
  – Tradeoff: overhead vs. fast mode processing time
  – standard implementation: full parser checkpoints
  – optimization [Grid 2005]: differential checkpoints

Fast mode parsing
Parser reads messages and computes checksums on message portion boundaries
  – a match allows the parser to skip to the next saved state
Switching back to fast mode
  – must compare current and stored parser states
  – matching stack contains necessary structural info
  – namespace aliases must also be the same; they are stored and checked separately

Performance Summary
Without DDS: comparable to gSOAP
DDS Overhead
  – message portions of 256 to 4K bytes: overhead < 10%
  – a message portion size of 32 bytes is too small
DDS improvement
  – large, hard-to-deserialize, very similar messages: 25X speedup (upper bound)
  – Dual mode performance depends on message portion size and on where and how often mode switches take place; it can be faster by a factor of 3, or slightly slower

Benchmark Suite for SOAP-Based Grid Web Services
Motivation
  – Web services based applications have diverse requirements
  – SOAP and XML present design and implementation challenges
  – Several novel efforts exist to address key bottlenecks; examples can be found in gSOAP, bSOAP, .NET
A benchmark suite
  – can help determine the best available toolkit
  – based on communication patterns and data structures in use
Benchmarks and performance evaluation framework
  – Drivers, WSDL files and Java code
  – helps provide insights on opportunities for optimization
Madhu Govindaraju: mgovinda at cs.binghamton.edu
grid.cs.binghamton.edu / projects / soap_bench

Thank You For More information – Grid Computing Research Laboratory – SUNY – Binghamton: Computer Science Department – grid.cs.binghamton.edu – mlewis at binghamton.edu – DS: [HPDC 04], [IC 04] – DDS: [SC 05], [Grid2005]

Extra Slides

Experimental Setup
Machines
  – Dual Pentium 4 Xeon 2.0 GHz, 1 GB DDR RAM, 15K RPM 18 GB Ultra-160 SCSI drive
Network
  – Gigabit Ethernet
OS
  – Debian Linux. Kernel version
SOAP implementations
  – bSOAP and gSOAP v2.4 compiled with gcc version , flags: -O2
  – XSOAP RC1 compiled with JDK
  – bSOAP/gSOAP socket options: SO_KEEPALIVE, TCP_NODELAY, SO_SNDBUF = SO_RCVBUF =
  – Dummy SOAP Server (no deserialization)

Message Content Matches
Message Content Match:
  – The entire stored message template can be reused without change
  – No dirty bits in the DUT table
  – Best case performance improvement
Performance Study
  – compare gSOAP, XSOAP, and bSOAP, with differential serialization on and off
  – vary the message size
  – vary the data type: doubles and MIOs (not shown)

– bSOAP ~= gSOAP
– 10X improvement in DS (expected result)
Upper bound

Shifting
Partial Structural Match:
  – Not all of the array elements are reserialized
Performance Study
  – Intermediate size values to maximum size values
  – Array of doubles (18 to 24)
  – Array of MIOs (36 to 46) (not shown)

100% to 75%: improvement 23%
75% to 50%: improvement 31%
50% to 25%: improvement 46%

Stuffing
Closing Tag Shift:
  – Stuffed whitespace comes after the closing tag
  – Must move the tag to accommodate smaller values
Performance Study
  – send smallest values (1 char)
  – vary field size: smallest, intermediate, maximum
  – Array of doubles (max = 24, intermediate = 18, min = 1)
  – Array of MIOs (max = 46, intermediate = 38, min = 3) (not shown)

Closing tag shift, not increased message size, affects stuffing performance

Summary
SOAP performance is poor, due to serialization and deserialization
Differential serialization
  – Save a copy of outgoing messages, and serialize changes only, to avoid the observed SOAP bottleneck
Techniques:
  – Shifting, chunking, chunk padding, stuffing, stealing, chunk overlaying
Performance is promising (17% to 10X improvement) and depends on the similarity of messages

Other Approaches
SOAP performance improvements
  – Compression
  – Base-64 encoding
  – External encoding: Attachments (SwA), DIME
These approaches may be necessary and can be effective. However
  – they undermine SOAP's beneficial characteristics
  – interoperability suffers
The goal
  – improve performance, retain SOAP's benefits

Applications that can Benefit
Differential Serialization is only beneficial for applications that repeatedly resend similar messages
Such applications do exist:
  – Linear system analyzers
  – Resource information dissemination systems
  – Google & Amazon query responses
  – etc.

Data Update Tracking (DUT) Table
Each saved message has its own DUT table
Each data element in the message has its own DUT table entry, which contains:
  – Location: A pointer to the data item's current location in the template message
  – Type: A pointer to a data structure that contains information about the data item's type
  – Serialized Length: The number of characters needed to store the last written value
  – Field Width: The number of allocated characters in the template
  – Dirty Bit: Indicates whether the data item has been changed since the template value was written
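One way to picture a DUT table entry is as a small record per data element. The sketch below is an assumption-laden illustration: the class name `DUTEntry`, its attribute names, and the `build_template` helper are invented here, a byte offset replaces the location pointer, and a type name string replaces the pointer to type information.

```python
from dataclasses import dataclass


@dataclass
class DUTEntry:
    """One row of a hypothetical DUT table (names are illustrative only)."""
    location: int           # offset of the value in the template message
    type_name: str          # stands in for the pointer to type information
    serialized_length: int  # characters used by the last written value
    field_width: int        # characters allocated in the template
    dirty: bool = False     # True once the in-memory value has changed


def build_template(values):
    """Serialize every (name, value) pair once and record a DUT entry for each."""
    template = bytearray()
    table = []
    for name, value in values:
        template += f"<{name}>".encode()
        text = str(value).encode()
        # With no stuffing, the field width equals the serialized length.
        table.append(DUTEntry(len(template), type(value).__name__, len(text), len(text)))
        template += text + f"</{name}>".encode()
    return template, table


template, table = build_template([("x", 12345), ("y", 678)])
print(bytes(template))
print(table)
```

Keeping serialized length and field width as separate fields is what lets the sender tell a value that still fits (overwrite in place) from one that needs shifting or stealing.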

Updating the DUT Table
DUT table dirty bits must be updated whenever in-memory data changes
  – Current implementation: explicit programmer calls whenever data changes
  – Eventual intended implementation: more automatic; variables are registered with our bSOAP library, data will have accessor functions through which changes must be made, and when data is written, the DUT table dirty bits can be updated accordingly
  – disallows "back door" pointer-based updates
  – requires calling the client stub with the same input param variables
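The accessor-based approach might look roughly like the following. This is a sketch of the general idea only, not bSOAP's interface: the class `TrackedArray` and its methods are hypothetical.

```python
class TrackedArray:
    """Array whose writes go through an accessor that marks dirty bits.

    A hypothetical illustration of registering data with the library so that
    the DUT table dirty bits are updated automatically on every write.
    """

    def __init__(self, values):
        self._values = list(values)
        self.dirty = [False] * len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def __setitem__(self, index, value):
        # Writing an unchanged value leaves the serialized template valid.
        if self._values[index] != value:
            self._values[index] = value
            self.dirty[index] = True

    def dirty_indices(self):
        """Indices that must be reserialized before the next send."""
        return [i for i, flag in enumerate(self.dirty) if flag]

    def mark_clean(self):
        """Call after the template has been brought up to date."""
        self.dirty = [False] * len(self._values)


data = TrackedArray([1.0, 2.0, 3.0])
data[1] = 5.0
data[2] = 3.0  # same value, so it stays clean
print(data.dirty_indices())
```

Any write that bypasses `__setitem__` would go unnoticed, which is the reason the slide rules out "back door" pointer-based updates.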

Worst Case Shifting
Worst case shifting:
  – All values are reserialized from smallest size values to largest size values
Performance Study
  – vary the chunk size (8K and 32K)
  – Array of doubles (1 to 24)
  – Array of MIOs (3 to 46) (not shown)

Worst case shifting is 4X slower
Reducing chunk size doesn't help