Parallel Computation of Skyline Queries Verification COSC6490A Fall 2007 Slawomir Kmiec.

Presentation Outline
- Skyline Concepts
- The Parallel Algorithm
- JPF Experience
- JPF Issues
- Abstraction
- Results
- Future Work
- Summary
- Questions

Skyline Concepts

In a set of points (or records), identify the points that no other point beats on a given set of attributes, i.e. the points that are not worse than any of the others.

Name       Rating  Avg. Price
Parthenon  5       $45.00
Olympus    4       $40.00
Coliseum   4       $30.00
Pyramid    3       $25.00
Bombay     5       $35.00
Paris      5       $40.00
Roma       4       $35.00
Palermo    3       $30.00

Point $p_a$ is said to dominate point $p_b$ if $x_i(p_a) \le x_i(p_b)$ for all $i$ such that $1 \le i \le d$, and at least one of those inequalities is strict. A point $p$ in a set $S$ is a skyline point if it is not dominated by any other point in $S$. The skyline of $S$ is denoted $\mathrm{sky}(S)$.
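The dominance test and the skyline definition above can be sketched in Java as follows. This is a minimal illustration, not the presenter's code: the class and method names are hypothetical, the skyline is computed with a naive quadratic scan, and the restaurant table is encoded under the assumption that a higher rating and a lower price are preferred (the rating is negated so that smaller is better in every attribute, as the definition requires).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and do not come from the slides.
class SkylineSketch {

    static final class Point {
        final String name;
        final double[] x; // attributes; smaller is better in every dimension

        Point(String name, double... x) {
            this.name = name;
            this.x = x;
        }
    }

    // True if a dominates b: a is no worse in every attribute and strictly better in at least one.
    static boolean dominates(Point a, Point b) {
        boolean strict = false;
        for (int i = 0; i < a.x.length; i++) {
            if (a.x[i] > b.x[i]) return false;
            if (a.x[i] < b.x[i]) strict = true;
        }
        return strict;
    }

    // Naive O(n^2) skyline: keep every point that no other point dominates.
    static List<Point> skyline(List<Point> s) {
        List<Point> result = new ArrayList<>();
        for (Point p : s) {
            boolean dominated = false;
            for (Point q : s) {
                if (dominates(q, p)) { dominated = true; break; }
            }
            if (!dominated) result.add(p);
        }
        return result;
    }

    // Table from the slide, encoded as (-rating, price). Assumption: high rating and low price are preferred.
    static List<Point> restaurants() {
        List<Point> s = new ArrayList<>();
        s.add(new Point("Parthenon", -5.0, 45.00));
        s.add(new Point("Olympus", -4.0, 40.00));
        s.add(new Point("Coliseum", -4.0, 30.00));
        s.add(new Point("Pyramid", -3.0, 25.00));
        s.add(new Point("Bombay", -5.0, 35.00));
        s.add(new Point("Paris", -5.0, 40.00));
        s.add(new Point("Roma", -4.0, 35.00));
        s.add(new Point("Palermo", -3.0, 30.00));
        return s;
    }

    static List<String> names(List<Point> points) {
        List<String> out = new ArrayList<>();
        for (Point p : points) out.add(p.name);
        return out;
    }

    public static void main(String[] args) {
        // Prints [Coliseum, Pyramid, Bombay] under the stated preference assumption.
        System.out.println(names(skyline(restaurants())));
    }
}
```

Under that assumption, Bombay dominates Parthenon, Paris, Olympus and Roma, and Pyramid dominates Palermo, which leaves Coliseum, Pyramid and Bombay as the skyline.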

The Parallel Algorithm (A)
Principles:
→ data is divided equally and distributed
→ a local skyline is computed at each peer
→ the size of the local skyline is shared with peers
→ if the combined results fit on every processor then
   → local skylines are exchanged with peers
   → processor p_i picks the i-th chunk of the combined skyline and eliminates the points in it that the combined skyline dominates
   → local results are sent to the central process
→ end // of processing
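The steps of branch (A) can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the processors are simulated by a loop rather than by threads or sockets, the data is split round-robin, the "fits on every processor" check is assumed to have succeeded, and all names (`AlgorithmASketch`, `filter`, `run`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential simulation of branch (A); no real message passing is performed.
class AlgorithmASketch {

    // Smaller is better in every dimension.
    static boolean dominates(double[] a, double[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strict = true;
        }
        return strict;
    }

    // Keep the candidates that no point in 'against' dominates.
    // filter(s, s) is the skyline of s.
    static List<double[]> filter(List<double[]> candidates, List<double[]> against) {
        List<double[]> kept = new ArrayList<>();
        for (double[] p : candidates) {
            boolean dominated = false;
            for (double[] q : against) {
                if (dominates(q, p)) { dominated = true; break; }
            }
            if (!dominated) kept.add(p);
        }
        return kept;
    }

    static List<double[]> run(List<double[]> data, int processors) {
        // Divide the data equally among the processors (round-robin is an assumption).
        List<List<double[]>> parts = new ArrayList<>();
        for (int i = 0; i < processors; i++) parts.add(new ArrayList<>());
        for (int k = 0; k < data.size(); k++) parts.get(k % processors).add(data.get(k));

        // Each peer computes its local skyline; exchanging them yields the combined skyline.
        List<double[]> combined = new ArrayList<>();
        for (List<double[]> part : parts) combined.addAll(filter(part, part));

        // Processor i takes the i-th chunk of the combined skyline, removes the points
        // that the combined skyline dominates, and sends the rest to the central process.
        List<double[]> central = new ArrayList<>();
        int chunk = (combined.size() + processors - 1) / processors;
        for (int i = 0; i < processors; i++) {
            int from = Math.min(i * chunk, combined.size());
            int to = Math.min(from + chunk, combined.size());
            central.addAll(filter(combined.subList(from, to), combined));
        }
        return central;
    }
}
```

The sketch relies on the fact that a point dominated within its own partition is also dominated in the full data set, so the skyline of the combined local skylines equals the skyline of the whole input.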

The Parallel Algorithm (A cont.)

The Parallel Algorithm (B)
Principles (continued):
→ else // combined results do not fit on some p_i
   → loop until the required number of results is available or all p_i have finished
      → each processor p_i picks a random set of points (in proportion to its local skyline)
      → this set is submitted to all peers, which mark the points that they dominate; the marked points are returned to the sender
      → each processor p_i collects the points it submitted to peers, removes the marked ones from the original set, and sends the remaining ones to the central processor
   → end loop
→ end // of processing
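Branch (B) can be sketched in the same sequential style. Again this is an assumption-laden illustration rather than the presenter's implementation: the random pick is realized by shuffling each local skyline once and taking batches from the front, the batch size is a hypothetical `fraction` of the local skyline size, peers mark against their full local skyline, and the stopping rule is checked once per round.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sequential simulation of branch (B); names and parameters are illustrative.
class AlgorithmBSketch {

    // Smaller is better in every dimension.
    static boolean dominates(double[] a, double[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strict = true;
        }
        return strict;
    }

    // A peer marks every submitted point that some point of its local skyline dominates.
    static boolean[] mark(List<double[]> peerSkyline, List<double[]> submitted) {
        boolean[] marked = new boolean[submitted.size()];
        for (int k = 0; k < marked.length; k++) {
            for (double[] q : peerSkyline) {
                if (dominates(q, submitted.get(k))) { marked[k] = true; break; }
            }
        }
        return marked;
    }

    static List<double[]> run(List<List<double[]>> localSkylines, double fraction,
                              int required, Random rnd) {
        // Points each processor still has to submit, in random order.
        List<List<double[]>> pending = new ArrayList<>();
        for (List<double[]> local : localSkylines) {
            List<double[]> copy = new ArrayList<>(local);
            Collections.shuffle(copy, rnd);
            pending.add(copy);
        }
        List<double[]> central = new ArrayList<>();
        boolean anyWork = true;
        // Loop until enough results are available or all processors have finished.
        while (central.size() < required && anyWork) {
            anyWork = false;
            for (int i = 0; i < pending.size(); i++) {
                List<double[]> mine = pending.get(i);
                if (mine.isEmpty()) continue;
                anyWork = true;
                // Batch size in proportion to the local skyline (at least one point).
                int quota = Math.max(1, (int) Math.ceil(fraction * localSkylines.get(i).size()));
                int n = Math.min(quota, mine.size());
                List<double[]> sample = new ArrayList<>(mine.subList(0, n));
                mine.subList(0, n).clear();
                // Submit the sample to all peers and collect their marks.
                boolean[] marked = new boolean[n];
                for (int j = 0; j < localSkylines.size(); j++) {
                    if (j == i) continue;
                    boolean[] m = mark(localSkylines.get(j), sample);
                    for (int k = 0; k < n; k++) marked[k] |= m[k];
                }
                // Unmarked points are global skyline points and go to the central processor.
                for (int k = 0; k < n; k++) {
                    if (!marked[k]) central.add(sample.get(k));
                }
            }
        }
        return central;
    }
}
```

Because the stopping rule is only evaluated between rounds, this sketch may return slightly more than `required` points; a real implementation would have to decide how to coordinate that cutoff across peers.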

The Parallel Algorithm (B cont.)

JPF Experience
- getting JPF
- getting JPF to run
- the Eclipse way
- the Linux way
- incremental examples
- configuration options
- JPF value-added services

JPF Issues
- independent processors
  - restricted to threads
- eliminate native code classes
  - no Swing, Sockets, NIO, Regex (Eclipse)
  - out of 15, only java.util.ArrayList is left
  - eliminate Socket-oriented developed classes
- search-state-space reduction
  - input: 10 points
  - 2 worker threads
  - operation abstraction
  - output discarded

Abstraction
2 types of developed classes are left:
- SkylineMain and SkylineWorker
  - workflow classes
- "Handler" classes
  - request handling classes
[Class diagram: SkylineMain, SkylineMainListener, SkylineMainHandler; SkylineWorker, SkylineWorkerListener, SkylineWorkerHandler; each group shown with Thread, Socket, ServerSocket]

Abstraction (cont.)
- high volume of work:
  - due to a lot of original code being removed
- all GUI:
  - remove Swing and AWT elements
- asynchronous Socket messaging done as:
  - keep references to workers instead of addresses
  - eliminate the "Listener" classes
  - each message is done as an instance of the handler
  - create a handler for the destination worker
  - execute the synchronous (blocking) part of data sending
  - start the handler to execute the asynchronous processing
  - each message type is split into a synchronous and an asynchronous part
- file IO done as:
  - store parameters as static constants
  - store input data as an array
  - replace input scanning with referencing the array
  - display or discard output
- String.split() method (Regex) done as:
  - re-done as a String manipulation method
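The socket-to-handler abstraction described above might look roughly like the following. This is a hedged sketch only: `Worker`, `Handler`, `send` and `sendMessage` are hypothetical stand-ins, not the SkylineWorker or SkylineWorkerHandler classes from the presentation, and the message payload is reduced to a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of replacing Socket messaging with handler threads.
class HandlerSketch {

    // Stand-in for a worker; the sender keeps a reference to it instead of a network address.
    static class Worker {
        private final List<String> inbox = new ArrayList<>();

        synchronized void accept(String message) {
            inbox.add(message);
        }

        synchronized List<String> received() {
            return new ArrayList<>(inbox);
        }
    }

    // One handler instance per message; no Listener class is needed.
    static class Handler extends Thread {
        private final Worker destination;
        private String payload;

        Handler(Worker destination) {
            this.destination = destination;
        }

        // Synchronous (blocking) part: runs in the sender's thread and replaces the Socket write.
        void send(String message) {
            this.payload = message;
        }

        // Asynchronous part: runs in the handler's own thread on behalf of the destination.
        @Override
        public void run() {
            destination.accept(payload);
        }
    }

    // Create a handler for the destination, run the synchronous part, then start the asynchronous part.
    static Handler sendMessage(Worker destination, String message) {
        Handler handler = new Handler(destination);
        handler.send(message);
        handler.start();
        return handler;
    }

    public static void main(String[] args) throws InterruptedException {
        Worker worker = new Worker();
        Handler handler = sendMessage(worker, "LOCAL_SKYLINE_SIZE 3");
        handler.join();
        System.out.println(worker.received());
    }
}
```

Writing the payload before `start()` is what makes the hand-off safe here: `Thread.start()` establishes a happens-before relationship between the sender's writes and the handler's `run()`.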

Results
- issues reported
  - different issues at different settings
  - large volume of output to be analyzed
- uncaught-exception conditions
  - issues regarding un-synchronized access
  - these surfaced as IllegalMonitorStateException
- dead-lock conditions
  - issues regarding termination conditions
- PreciseRaceDetector
  - "Unprotected Variable Access" severe warnings
- possibly more
  - it ran for a long time with no other errors
  - it did not finish in the time given
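The slides do not say which call raised the IllegalMonitorStateException. One common cause in Java, shown here purely as an illustration with hypothetical names, is calling `wait()` or `notify()` on an object whose monitor the current thread does not hold.

```java
// Hypothetical illustration of one way IllegalMonitorStateException can arise.
class MonitorSketch {

    // notify() without holding the object's monitor throws IllegalMonitorStateException.
    static boolean unsynchronizedNotifyFails(Object lock) {
        try {
            lock.notify();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    // The same call inside a synchronized block on the same object is legal.
    static boolean synchronizedNotifySucceeds(Object lock) {
        synchronized (lock) {
            lock.notify();
            return true;
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();
        System.out.println(unsynchronizedNotifyFails(lock));  // prints true
        System.out.println(synchronizedNotifySucceeds(lock)); // prints true
    }
}
```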

Future Work
- atomize code
  - wrap code fragments into atomic operations
- protect shared variable access
  - use locks or synchronized blocks
  - re-run PreciseRaceDetector
- run it for an extended period of time
  - to search the complete state space
- analyze the applicability of the issues found
  - with respect to the original application
  - not as a result of the abstraction or transformation
- reduce shared data interaction
  - handlers create private data structures that can be quickly accepted by the corresponding main process
  - this will allow greater robustness and redundancy
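The two protection options named above, locks and synchronized blocks, can be sketched as follows. The shared fields and the `runWorkers` driver are hypothetical and only show the pattern; they are not taken from the application under test.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of protecting shared variables with a synchronized method and with an explicit lock.
class SharedStateSketch {

    private int finishedWorkers = 0; // guarded by the object's monitor
    private int results = 0;         // guarded by 'lock'
    private final ReentrantLock lock = new ReentrantLock();

    // Option 1: synchronized methods.
    synchronized void workerFinished() {
        finishedWorkers++;
    }

    synchronized int finishedWorkers() {
        return finishedWorkers;
    }

    // Option 2: an explicit lock, always released in a finally block.
    void addResults(int n) {
        lock.lock();
        try {
            results += n;
        } finally {
            lock.unlock();
        }
    }

    int results() {
        lock.lock();
        try {
            return results;
        } finally {
            lock.unlock();
        }
    }

    // Starts the given number of worker threads, each performing 'updates' protected increments.
    static SharedStateSketch runWorkers(int workers, int updates) {
        SharedStateSketch shared = new SharedStateSketch();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < updates; k++) shared.addResults(1);
                shared.workerFinished();
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }
}
```

With both counters protected, two workers doing 1000 updates each always leave `results()` at 2000; without the lock, the unsynchronized `results += n` could lose updates.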

Summary
- JPF is a flexible and complex tool
- JPF is memory- and time-intensive
- JPF is a valuable verification tool
- the application had to be changed extensively to work with JPF
- potential issues were found by JPF
- verification = value-added service
  - extra testing
  - code refinement (robustness)

Questions???