2/14/01 RightOrder: Telegraph & Java
Slide 1: Telegraph Java Experiences
Sam Madden, UC Berkeley
Slide 2: Telegraph Overview
- 100% Java
- In-memory database
- Query engine for alternative sources
  - Web
  - Sensors
- Testbed for adaptive query processing
Slide 3: Telegraph & WWW: FFF
- Federated Facts and Figures
- Collects data on the election
- Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies
  - Route tuples dynamically based on source loads and selectivities
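The routing idea on this slide can be illustrated with a minimal sketch. It is a greedy simplification and not the scheme from the Eddies paper or Telegraph's code: tuples are plain ints, modules are filter predicates only, and each tuple is sent first to the module with the lowest observed pass rate. The class name `EddySketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical, simplified eddy over int "tuples" and filter-only modules. */
final class EddySketch {
    private final List<IntPredicate> modules = new ArrayList<>();
    private final List<long[]> stats = new ArrayList<>(); // {seen, passed} per module

    void addModule(IntPredicate predicate) {
        modules.add(predicate);
        stats.add(new long[] {0, 0});
    }

    /** Observed pass rate; a module that has seen nothing is treated as passing everything. */
    double selectivity(int module) {
        long[] s = stats.get(module);
        return s[0] == 0 ? 1.0 : (double) s[1] / s[0];
    }

    /** Routes one tuple through every module, most selective first; true if it survives all. */
    boolean route(int tuple) {
        boolean[] visited = new boolean[modules.size()];
        for (int step = 0; step < modules.size(); step++) {
            int best = -1;
            for (int m = 0; m < modules.size(); m++) {
                if (!visited[m] && (best < 0 || selectivity(m) < selectivity(best))) {
                    best = m;
                }
            }
            visited[best] = true;
            long[] s = stats.get(best);
            s[0]++;                                   // module saw the tuple
            if (!modules.get(best).test(tuple)) {
                return false;                         // dropped; skip remaining modules
            }
            s[1]++;                                   // module passed the tuple
        }
        return true;
    }
}
```

Because the ordering is recomputed for every tuple from running statistics, the order can change mid-query as the observed pass rates drift, which is the property the slide attributes to eddies. Source load is not modeled here.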
Slide 4: fff.cs.berkeley.edu
Slide 5: Architecture Overview
- Query Parser
  - JLex & CUP
- Preoptimizer
  - Chooses access paths
- Eddy
  - Routes tuples to modules
Slide 6: Modules
- Doubly-pipelined hash joins
- Index joins
  - For probing into web pages
- Aggregates & group-bys
- Scans
  - Telegraph Screen Scraper: view web pages as relations
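As an illustration of the first module type, here is a minimal sketch of a doubly-pipelined (symmetric) hash join. It is not Telegraph's implementation: keys are ints, payloads are strings, everything stays in memory, and the class and method names are hypothetical. Each arriving tuple is inserted into its own side's hash table and immediately probed against the other side's table, so results stream out without waiting for either input to finish.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory symmetric hash join on an int key. */
final class SymmetricHashJoinSketch {
    private final Map<Integer, List<String>> leftTable = new HashMap<>();
    private final Map<Integer, List<String>> rightTable = new HashMap<>();

    /** Accepts a left tuple; returns the join results it produces right away. */
    List<String> insertLeft(int key, String payload) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> results = new ArrayList<>();
        for (String match : rightTable.getOrDefault(key, Collections.emptyList())) {
            results.add(payload + "|" + match);   // output as left|right
        }
        return results;
    }

    /** Accepts a right tuple; returns the join results it produces right away. */
    List<String> insertRight(int key, String payload) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> results = new ArrayList<>();
        for (String match : leftTable.getOrDefault(key, Collections.emptyList())) {
            results.add(match + "|" + payload);   // output as left|right
        }
        return results;
    }
}
```

Tuples may arrive from either side in any order, which suits slow or bursty web sources where neither input can be assumed to complete first.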
Slide 7: Execution Framework
- One thread per query
  - Iterator model for queries
  - Experimented with thread per module
  - Linux threads are expensive
- Two memory management models
  - Java objects
  - Home-rolled byte arrays
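The iterator model mentioned above can be sketched with the standard `java.util.Iterator` interface. This is an assumption-laden illustration, not Telegraph's operator interface: `FilterIterator` is a hypothetical selection operator that pulls tuples from a child operator on demand, so a whole plan can run on the single thread that calls `next()` at the root.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical pull-based selection operator wrapping a child iterator. */
final class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> child;
    private final Predicate<T> predicate;
    private T buffered;        // next qualifying tuple, if one has been found
    private boolean ready;     // true when 'buffered' holds a valid tuple

    FilterIterator(Iterator<T> child, Predicate<T> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        // Pull from the child until a tuple passes or the child is exhausted.
        while (!ready && child.hasNext()) {
            T candidate = child.next();
            if (predicate.test(candidate)) {
                buffered = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        T out = buffered;
        buffered = null;       // do not retain the tuple after handing it out
        return out;
    }
}
```

Operators compose by nesting, for example a filter over a scan's iterator, and no operator needs its own thread.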
Slide 8: Tuples as Java Objects
- Tuple data stored as a Java object
  - Each in a separate byte array
- Tuples copied on joins, aggregates
- Issues
  - Memory management between modules, queries
  - Garbage collector control
  - Allocation overhead
- Performance: 30, byte tuples / sec -> 5.9 MB / sec
Slide 9: Tuples as Byte Array
- All tuples stored in the same byte array, one per query
- Surrogate Java objects
  - Offset, size
- [Diagram: surrogate objects, byte array, directory]
Slide 10: Byte Array (cont.)
- Allows explicit control over memory per query (or module)
- Compaction eliminates garbage collection randomness
- Lower throughput: 15,000 tuples/sec
  - No surrogate object reuse
  - Synchronization costs
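The byte-array model from the last two slides can be sketched as follows. This is a guess at the general shape, not Telegraph's code: `TupleArena`, `Surrogate`, and all method names are hypothetical. One fixed-size array holds every tuple for a query, surrogates record only offset and size, and compaction slides live tuples forward under explicit program control.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-query tuple arena: one shared byte array plus surrogate handles. */
final class TupleArena {
    /** Surrogate object: records only where a tuple lives inside the arena. */
    static final class Surrogate {
        int offset;
        final int size;
        boolean live = true;
        Surrogate(int offset, int size) { this.offset = offset; this.size = size; }
    }

    private final byte[] data;
    private final List<Surrogate> directory = new ArrayList<>(); // kept in offset order
    private int used;

    TupleArena(int capacity) { data = new byte[capacity]; }

    /** Copies the tuple into the arena; returns null when the query's budget is exhausted. */
    synchronized Surrogate store(byte[] tuple) {
        if (used + tuple.length > data.length) {
            return null;
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        Surrogate s = new Surrogate(used, tuple.length);
        directory.add(s);
        used += tuple.length;
        return s;
    }

    /** Returns a copy of the tuple bytes a surrogate points at. */
    synchronized byte[] read(Surrogate s) {
        byte[] out = new byte[s.size];
        System.arraycopy(data, s.offset, out, 0, s.size);
        return out;
    }

    /** Marks a tuple as garbage; its space is reclaimed at the next compact(). */
    synchronized void free(Surrogate s) { s.live = false; }

    /** Slides live tuples toward the front and drops dead directory entries. */
    synchronized void compact() {
        int writePos = 0;
        List<Surrogate> survivors = new ArrayList<>();
        for (Surrogate s : directory) {
            if (!s.live) {
                continue;
            }
            System.arraycopy(data, s.offset, data, writePos, s.size);
            s.offset = writePos;
            writePos += s.size;
            survivors.add(s);
        }
        directory.clear();
        directory.addAll(survivors);
        used = writePos;
    }

    synchronized int bytesUsed() { return used; }
}
```

The sketch also shows where the two costs named on the slide come from: every `store` allocates a fresh surrogate (no reuse), and every access takes the arena's lock.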
Slide 11: Other System Pieces
- XML-based catalog
  - Java introspection helps
- Applet-based front end
- JDBC interface
- Fault tolerance / multiple servers
  - Via simple UNIX tools
Slide 12: RightOrder Questions
- Performance vs. C
- JNI issues
- Garbage collection issues
- Serialization costs
- Lots of Java objects
- JDBC vs. ODI
Slide 13: Performance vs. C
- JVM + JIT performance encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low-level benchmarks
- IBM JIT 2x faster than HotSpot for Telegraph scans
- Stability issues
Slide 14: JIT Performance vs. C
[Chart series: IBM JIT, Optimized Intel, Optimized MS]
Source:
Slide 15: Performance Gotchas
- Synchronization
  - ~2x function call overhead in HotSpot
  - Used in libraries: Vector, StringBuffer
- String allocation
  - Single most intensive operation in Telegraph
  - Mercator: 20% initial CPU cost
- Garbage collection
  - Java dumb about reuse
  - Mercator: 15% cost
  - OceanStore: 30 ms avg latency, 1 s peak
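One common way to reduce both costs on this slide is to reuse a single unsynchronized buffer when building strings. The sketch below is only an illustration: `RowFormatter` is a hypothetical name, and it uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer` that was added in Java 5, after this talk was given.

```java
/** Hypothetical formatter that reuses one buffer instead of allocating per row. */
final class RowFormatter {
    // Not thread-safe: each thread or query would own its own formatter.
    private final StringBuilder scratch = new StringBuilder(64);

    String format(int id, String name) {
        scratch.setLength(0);                 // reset contents, keep the backing array
        scratch.append(id).append(',').append(name);
        return scratch.toString();            // still allocates one String per row
    }
}
```

The final `toString()` allocation remains, so this trims intermediate garbage and lock acquisitions rather than eliminating string allocation.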
Slide 16: More Gotchas
- Finalization
  - Declaring methods final allows inlining
- Serialization
  - RMI, JNI use serialization
  - Philippsen & Haumacher show that it is slow
Slide 17: Performance Tools
- Tools to address some issues
  - JAX, Jopt: make bytecode smaller, faster
  - Bytecode optimizer
  - Good profiler, memory allocation and garbage collection monitor
Slide 18: JNI Issues
- Not a part of Telegraph
- JNI overhead quite large (JDK 1.1.8, PII 300 MHz)
- Source: Matt Welsh. A System Supporting High Performance Communication and I/O in Java. Master’s Thesis, UC Berkeley, 1999.
Slide 19: More JNI
- But this is being worked on
  - IBM JDK 100,000 B copy in 5 ms, vs 23 ms for (500 MHz PIII)
- JNI allows synchronization (pin / unpin), thread management
- See GCJ + CNI: access Java objects via C++ classes
Slide 20: Garbage Collection Performance
- Big problem: 1 s or longer to GC lots of objects
- Most Java GCs are blocking (not concurrent or multi-threaded)
- Unexpected latencies
  - OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC
- In high-concurrency apps, such delays are disastrous
Slide 21: Garbage Collection (cont.)
- Limited control
  - Runtime.gc() only a hint
  - Runtime.freeMemory() unreliable
  - No way to disable
- No object reuse
  - Lots of unnecessary memory allocations
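The limited-control point can be seen with the standard `java.lang.Runtime` calls named above. In this minimal sketch, `GcProbe` and its methods are hypothetical names; the readings it returns vary by JVM, collector, and run, and nothing guarantees that the `gc()` request collects anything before the last reading is taken.

```java
/** Hypothetical probe showing how little the program can control or observe about GC. */
final class GcProbe {
    /** Approximate heap in use; a snapshot that may already be stale when returned. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Returns {before, afterAllocating, afterGcRequest} heap readings in bytes. */
    static long[] allocateAndRequestGc(int blocks) {
        long before = usedBytes();
        byte[][] garbage = new byte[blocks][];
        for (int i = 0; i < blocks; i++) {
            garbage[i] = new byte[1024];
        }
        long afterAllocating = usedBytes();
        garbage = null;                    // make the blocks unreachable
        Runtime.getRuntime().gc();         // a request only; the JVM may ignore it
        long afterGcRequest = usedBytes();
        return new long[] {before, afterAllocating, afterGcRequest};
    }
}
```

Since the third reading may or may not drop, a program cannot use these calls to bound its own memory use, which is the motivation the talk gives for managing tuples in byte arrays.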
Slide 22: Serialization
- Not in Telegraph
- Philippsen and Haumacher, “More Efficient Object Serialization.” International Workshop on Java for Parallel and Distributed Computing. San Juan, April,
  - Serialization costs for RMI are 50% of total RMI time
  - Discard longevity for 7x speedup
- Sun serialization provides versioning
  - Complete class description stored with each serialized object
  - Most standard classes forward compatible (JDK docs note special cases)
  - See
Slide 23: Lots of Objects
- GC issues serious
- Memory management
  - GC makes programmers allocate willy-nilly
  - Hard to partition memory space
  - Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries
Slide 24: Storage Overheads
- Java Object class is big:
  - Integer requires 23 bytes in JDK 1.3
  - int requires 4.3 bytes
- No way to circumvent object fields
- Use primitives or hand-written serialization whenever possible
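The advice to prefer primitives and hand-written serialization can be checked with a small size comparison. This sketch uses only standard `java.io` classes; the class `SerializationSizes` and its method names are hypothetical. It measures serialized stream sizes, not in-heap object sizes, so it does not reproduce the byte counts quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical comparison of hand-written vs. default serialization of integers. */
final class SerializationSizes {
    /** Hand-written format: exactly 4 bytes per int, no class metadata. */
    static byte[] writeByHand(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back the hand-written format. */
    static int[] readByHand(byte[] encoded) {
        int[] values = new int[encoded.length / 4];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    /** Default Java serialization of boxed Integers, including class descriptions. */
    static byte[] writeBoxed(int[] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < values.length; i++) {
            boxed[i] = values[i];
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(boxed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Comparing `writeByHand(v).length` with `writeBoxed(v).length` for the same array shows the per-object and class-description overhead of the default format. The trade-off is that the hand-written format carries no versioning information.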
Slide 25: JDBC vs. ODI
- No experience with Oracle
- JDBC overheads are high, but we don’t have specific performance numbers
Slide 26: Bottom Line
- Java is great for many reasons
  - GC, standard libraries, type safety, introspection, etc.
  - Significant reductions in development and debugging time
- Java performance isn’t bad
  - Especially with some tuning
- Memory management is an issue
- Lack of control over JVMs is bad
  - When to garbage collect, how to serialize, etc.