Titanium/Java Performance Analysis Ryan Huebsch Group: Boon Thau Loo, Matt Harren Joe Hellerstein, Ion Stoica, Scott Shenker P I E R Peer-to-Peer.


Titanium/Java Performance Analysis
Ryan Huebsch
Group: Boon Thau Loo, Matt Harren, Joe Hellerstein, Ion Stoica, Scott Shenker
PIER: Peer-to-Peer Infrastructure for Information Exchange and Retrieval
1/29/02

SciMark – All Compilers Summary
Testing done on Millennium (550MHz, Katmai), Titanium version
Except for Java testing, data collected 11/2/01; Java data collected on 1/23/02

SciMark – Selected Compilers, Small Dataset
Testing done on Millennium (550MHz, Katmai), Titanium version
Except for Java testing, data collected 11/2/01; Java data collected on 1/23/02

SciMark – Selected Compilers, Large Dataset
Testing done on Millennium (550MHz, Katmai), Titanium version
Except for Java testing, data collected 11/2/01; Java data collected on 1/23/02

SciMark – Titanium Version Comparisons
Panels: Small Dataset, Large Dataset
All data collected on mm62 (550MHz, Katmai) on 1/23/02

PIER – Application Details
Network/Database Discrete Event Simulator
–A query engine (relational join & group by) on top of a distributed hash table
–Simulates end-to-end network communication (latency, bandwidth divided among flows, etc.)
Application written in Java for compatibility with other Berkeley database research projects
–Software Engineering
  Over 200 class files, heavy use of inheritance, polymorphism, etc.
  About 25,000 lines of code (and not too many comments yet)
  Layered, easily ported to a real, working implementation
Some parts of the simulation are faked for performance reasons; tuples are kept small (1Kb)
–Primarily an object-moving program with some processing (string manipulations, basic math, etc.)
–All objects are kept in memory; disk I/O is minimal (for result logging) and not timed in the following slides
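For readers unfamiliar with the term, the core of a discrete event simulator can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not PIER's actual code: the class name `EventSimSketch`, the `Event` fields, the "ping"/"pong" payloads, and the 20 ms reply latency are all hypothetical. It shows only the central idea that events are processed in timestamp order from a priority queue, so simulated network latency is just an offset added to the simulated clock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Minimal discrete-event loop (illustrative only; not PIER's implementation). */
final class EventSimSketch {
    /** One scheduled message delivery at a simulated time. */
    static final class Event implements Comparable<Event> {
        final long timeMs;    // simulated delivery time
        final long seq;       // tie-breaker: equal-time events stay in FIFO order
        final int node;       // hypothetical destination node id
        final String payload; // hypothetical message body

        Event(long timeMs, long seq, int node, String payload) {
            this.timeMs = timeMs;
            this.seq = seq;
            this.node = node;
            this.payload = payload;
        }

        @Override
        public int compareTo(Event other) {
            int byTime = Long.compare(timeMs, other.timeMs);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private final List<String> log = new ArrayList<>();
    private long nowMs = 0;
    private long nextSeq = 0;

    /** Schedule a message that arrives latencyMs after the current simulated time. */
    void send(int toNode, String payload, long latencyMs) {
        queue.add(new Event(nowMs + latencyMs, nextSeq++, toNode, payload));
    }

    /** Process events in timestamp order, jumping the clock to each event's time. */
    List<String> run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            nowMs = e.timeMs;
            log.add(nowMs + "ms node" + e.node + " <- " + e.payload);
            // Assumed behavior for the demo: a "ping" triggers a reply to node 0
            // after a fixed 20 ms hop.
            if (e.payload.equals("ping")) {
                send(0, "pong", 20);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        EventSimSketch sim = new EventSimSketch();
        sim.send(2, "ping", 50);
        sim.send(1, "tuple", 10);
        sim.run().forEach(System.out::println);
    }
}
```

Because the clock jumps from event to event rather than ticking, runtime depends on the number of events (objects moved), which fits the slide's description of an object-moving program.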

PIER – Language Summary
Chart labels: 63.4% faster, 62.7%, 77.7%, 83.1%, 84.0% (5.1% faster Java), 83.3% (0.8% faster Java)
Small simulation – 64 Simulated Nodes, Tuples per table
Testing done on Millennium (600MHz, 2G RAM), collected on 1/28/02

PIER – Memory Footprint
Memory usage & runtime grow exponentially with the primary simulation parameters (test parameters same as previous slide)

PIER – Parallel Attempts
Parallel attempt with Titanium failed miserably
–Negative speedup (our best version almost matched sequential execution)
–Simulated nodes were divided among processes. The best version used out-of-order execution to improve performance; earlier versions used small time steps to keep all processes synchronized.
–Problems we encountered
  Lots of small remote accesses (when using 8 processes on 2 hosts, the MPI performance counters rolled over at least once)
  –All small accesses, due to the movement of our objects, with sub objects, and sub objects, and more sub objects
  Globally, processes were load balanced, but within time steps they were not; various allocations of simulated nodes to processes were attempted
  Application is more memory intensive than computationally bound
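The point about global versus per-time-step load balance can be made concrete with a small Java sketch. The work matrix below is invented for illustration (it is not measured PIER data), and the class and method names are hypothetical. It assumes a barrier after every time step, so each step costs as much as its slowest process.

```java
/** Illustrates that balanced totals do not imply balanced time steps. */
final class StepImbalanceSketch {
    /** Sum each process's work over all steps; work[step][process] is in arbitrary units. */
    static long[] totalPerProcess(long[][] work) {
        long[] totals = new long[work[0].length];
        for (long[] step : work) {
            for (int p = 0; p < step.length; p++) {
                totals[p] += step[p];
            }
        }
        return totals;
    }

    /** With a barrier after every step, a step takes as long as its slowest process. */
    static long synchronizedRuntime(long[][] work) {
        long runtime = 0;
        for (long[] step : work) {
            long slowest = 0;
            for (long w : step) {
                slowest = Math.max(slowest, w);
            }
            runtime += slowest;
        }
        return runtime;
    }

    /** Sequential runtime is simply all work added together. */
    static long sequentialRuntime(long[][] work) {
        long runtime = 0;
        for (long t : totalPerProcess(work)) {
            runtime += t;
        }
        return runtime;
    }

    public static void main(String[] args) {
        // Hypothetical workload: the two processes alternate being busy.
        long[][] work = { {10, 0}, {0, 10}, {10, 0}, {0, 10} };
        long[] totals = totalPerProcess(work);
        System.out.println("per-process totals: " + totals[0] + ", " + totals[1]);
        System.out.println("synchronized runtime: " + synchronizedRuntime(work));
        System.out.println("sequential runtime: " + sequentialRuntime(work));
    }
}
```

In this made-up case both processes do 20 units in total, yet the synchronized run takes 40 units, the same as sequential execution, before any communication cost is counted.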

Parallel Speedup Graph

Parallel Execution Time Breakup (chart)
Bars: 10ms, 300ms Heap, 300ms Vect, 300ms List, Region, Async
Legend: Execution, Pre Communication (Execution imbalance), Post Communication (Comm imbalance), Communication

Titanium Wish List
Titanium features that would be nice for our application (yes, you can laugh at them)
–Serialization to move objects with encapsulated objects
–Better memory management (regions just were not enough)
  Global garbage collection
  Directed memory deletion (i.e. delete object x)
–Performance counters/profiling
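As an illustration of the first wish, here is how standard Java (`java.io`, not Titanium) packs an object and everything it encapsulates into one contiguous byte array that could be shipped in a single transfer. The `Tuple` and `Field` classes are hypothetical stand-ins for the simulator's objects, and nothing here is a Titanium API; it is only a sketch of the kind of facility the slide asks for.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Moving an object graph as one buffer using standard Java serialization. */
final class SerializationSketch {
    /** Hypothetical sub-object held inside a tuple. */
    static final class Field implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Hypothetical tuple that encapsulates a list of sub-objects. */
    static final class Tuple implements Serializable {
        private static final long serialVersionUID = 1L;
        final String table;
        final List<Field> fields = new ArrayList<>();

        Tuple(String table) {
            this.table = table;
        }
    }

    /** Flatten the object and all reachable sub-objects into one byte array. */
    static byte[] pack(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuild an independent copy of the object graph from the byte array. */
    static Object unpack(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Tuple t = new Tuple("R");
        t.fields.add(new Field("key", "42"));
        byte[] wire = pack(t); // one buffer instead of one access per sub-object
        Tuple copy = (Tuple) unpack(wire);
        System.out.println(copy.table + "." + copy.fields.get(0).name
                + " = " + copy.fields.get(0).value);
    }
}
```

The design point is that the receiver gets the whole nested structure in one message, instead of fetching each sub-object with a separate small remote access.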