Parallel Computation of Skyline Queries Implementation


Parallel Computation of Skyline Queries Implementation COSC6490A Fall 2007 Slawomir Kmiec

Presentation Outline
  - Skyline Concepts
  - The Parallel Algorithm
  - Data & Configuration
  - Implementation Details
  - Goals and Objectives
  - Demonstration & Questions

Skyline Concepts
In a set of points (or records), identify those that are not dominated by any other point with respect to a given set of attributes.

  Name        Rating   Avg. Price
  Parthenon   5        $45.00
  Olympus     4        $40.00
  Coliseum             $30.00
  Pyramid     3        $25.00
  Bombay               $35.00
  Paris
  Roma
  Palermo

Point p_a is said to dominate point p_b if x_i(p_a) ≤ x_i(p_b) for all i such that 1 ≤ i ≤ d, and at least one of those inequalities is strict. A point p is a skyline point of a set S if it is not dominated by any other point in S. The skyline of S is denoted sky(S).
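The dominance definition above translates almost directly into code. The following is a minimal sketch, assuming points are stored as int arrays and that smaller values are better in every dimension (as in the definition); the class and method names are illustrative and are not taken from the presented application.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; SkylineSketch is a hypothetical name, not a class from the slides.
final class SkylineSketch {

    /** True if a dominates b: a[i] <= b[i] for every i, and a[i] < b[i] for at least one i. */
    static boolean dominates(int[] a, int[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;   // a is worse in dimension i
            if (a[i] < b[i]) strict = true;  // a is strictly better in dimension i
        }
        return strict;
    }

    /** Nested-loops skyline: keep every point that no other point dominates. */
    static List<int[]> skyline(List<int[]> points) {
        List<int[]> result = new ArrayList<>();
        for (int[] p : points) {
            boolean dominated = false;
            for (int[] q : points) {
                if (dominates(q, p)) { dominated = true; break; }
            }
            if (!dominated) result.add(p);
        }
        return result;
    }
}
```

For the points (1,5), (2,2), (3,3) and (5,1), only (3,3) is dropped, because (2,2) is smaller in both dimensions.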

The Parallel Algorithm
  - the nested-loops skyline computation is O(d·n²): for 10 attributes and 100k points that is 10 × (10^5)² = 10^11
  - the choice of the local skyline algorithm is orthogonal and can even be dynamic
  - d-dimensional data space; the skyline size is O(d!)
  - p interconnected and independent processors, each with O(n/p) memory
  - processors can be physically separate nodes

The Parallel Algorithm (cont.)
Principles:
  → the data is divided equally and distributed among the processors
  → a local skyline is computed at each peer
  → the size of each local skyline is shared with all peers
  → if the combined results fit on every processor then
      → local skylines are exchanged with peers
      → processor p_i picks the i-th chunk of the combined skyline and eliminates the points in it that are dominated by points of the combined skyline
      → local results are sent to the central process
  → end // of processing
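The case where the combined local skylines fit in memory can be sketched in a single process. The code below is an illustration under stated assumptions: points are int arrays with smaller values preferred, processors are numbered from 0, chunks are contiguous and equally sized, and the network exchange is replaced by a plain list of local skylines. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; CombinedSkylineSketch is a hypothetical name, not a class from the slides.
final class CombinedSkylineSketch {

    static boolean dominates(int[] a, int[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strict = true;
        }
        return strict;
    }

    /** Work of processor i (0-based) once it has received every peer's local skyline. */
    static List<int[]> filterChunk(List<List<int[]>> localSkylines, int i) {
        // Consolidate all local skylines into one combined list.
        List<int[]> combined = new ArrayList<>();
        for (List<int[]> local : localSkylines) combined.addAll(local);

        // Select the i-th contiguous chunk (assumed chunking scheme).
        int p = localSkylines.size();
        int chunkSize = (combined.size() + p - 1) / p;   // ceiling division
        int from = Math.min(i * chunkSize, combined.size());
        int to = Math.min(from + chunkSize, combined.size());

        // Keep only the chunk's points that nothing in the combined skyline dominates.
        List<int[]> survivors = new ArrayList<>();
        for (int[] candidate : combined.subList(from, to)) {
            boolean dominated = false;
            for (int[] other : combined) {
                if (dominates(other, candidate)) { dominated = true; break; }
            }
            if (!dominated) survivors.add(candidate);
        }
        return survivors;
    }
}
```

The union of the survivors over all i is the global skyline, since every candidate is checked against the whole combined set exactly once.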

The Parallel Algorithm (cont.)

The Parallel Algorithm (cont.)
Principles (continued):
  → else // combined results do not fit on some p_i
      → loop until the required number of results is available or all p_i have finished
          → each processor p_i picks a random set of points (in proportion to the size of its local skyline)
          → this set is submitted to all peers, which mark the points that they dominate and return the marked points to the sender
          → each processor p_i collects the points it submitted to its peers, removes the marked ones from the original set and sends the remaining ones to the central processor
      → end loop
  → end // of processing
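One round of this loop can be sketched in a single process as follows. This is an illustration only: peers are modelled as in-memory lists instead of remote processes, the sample size is passed in rather than derived from the local skyline size, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch; SampleRoundSketch is a hypothetical name, not a class from the slides.
final class SampleRoundSketch {

    static boolean dominates(int[] a, int[] b) {
        boolean strict = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strict = true;
        }
        return strict;
    }

    /** Peer side: the submitted points that some point of this peer's local skyline dominates. */
    static List<int[]> markDominated(List<int[]> peerLocalSkyline, List<int[]> submitted) {
        List<int[]> marked = new ArrayList<>();
        for (int[] s : submitted) {
            for (int[] q : peerLocalSkyline) {
                if (dominates(q, s)) { marked.add(s); break; }
            }
        }
        return marked;
    }

    /** Sender side: runs one round and returns the points to report to the central process. */
    static List<int[]> runRound(List<int[]> remaining, int sampleSize,
                                List<List<int[]>> peers, Random rnd) {
        Collections.shuffle(remaining, rnd);              // random selection
        int k = Math.min(sampleSize, remaining.size());
        List<int[]> sample = new ArrayList<>(remaining.subList(0, k));
        remaining.subList(0, k).clear();                  // sampled points leave the pending set

        List<int[]> survivors = new ArrayList<>(sample);
        for (List<int[]> peer : peers) {
            // Arrays compare by identity here, which works because everything is in one process.
            survivors.removeAll(markDominated(peer, sample));
        }
        return survivors;
    }
}
```

In a networked version the marked points would come back as copies, so the sender would need to match them by value or by index instead of by reference.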

The Parallel Algorithm (cont.)

Data and Configuration
Configuration file:
    localhost,40000
    ../data/set1.txt
    ../data/set1.sky
    5
    localhost,40001
    localhost,40002
    localhost,40003
    localhost,40004
    localhost,40005
Input text file set1.txt:
    100000 9
    441084,675002,105152,606616,90578,963749,748812,739998,625168
    679542,662041,183694,274049,571353,513841,673841,136017,348093
    913693,908848,273936,405560,228917,540670,8469,549431,868311
    …
Output text file set1.sky:
    990161,447432,254614,908555,355890,594119,35340,149796,191499
    178453,428473,872989,121626,57614,318734,748950,287311,124463
    673578,11433,327204,110384,946426,887381,714928,51188,511141
    170357,699425,57272,468474,988612,425985,193800,234079,641191
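Reading the input file might look like the sketch below. It rests on an assumption that the slide does not state explicitly: the two leading numbers (100000 and 9) are read as the number of points and the number of dimensions, which is consistent with each sample row holding nine values. The reader splits on any whitespace, so it does not matter whether the two numbers share a line. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; DataSetReaderSketch is a hypothetical name, not a class from the slides.
final class DataSetReaderSketch {

    /** Parses "n d" followed by n rows of d comma-separated integers. */
    static List<int[]> read(Reader source) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            for (String line; (line = in.readLine()) != null; ) {
                text.append(line).append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        String[] tokens = text.toString().trim().split("\\s+");
        if (tokens.length < 2) throw new IllegalArgumentException("missing header");
        int n = Integer.parseInt(tokens[0]);   // assumed: number of points
        int d = Integer.parseInt(tokens[1]);   // assumed: number of dimensions
        if (tokens.length < 2 + n) throw new IllegalArgumentException("expected " + n + " rows");

        List<int[]> points = new ArrayList<>(n);
        for (int r = 0; r < n; r++) {
            String[] fields = tokens[2 + r].split(",");
            if (fields.length != d) {
                throw new IllegalArgumentException("row " + r + " does not have " + d + " values");
            }
            int[] point = new int[d];
            for (int i = 0; i < d; i++) point[i] = Integer.parseInt(fields[i]);
            points.add(point);
        }
        return points;
    }
}
```

A call such as `DataSetReaderSketch.read(new FileReader("../data/set1.txt"))` would load the whole data set before it is split among the workers.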

Implementation Details
Java classes used:
  - java.util.List; java.util.ArrayList;
  - java.io.InputStreamReader; java.io.BufferedReader; java.io.FileReader; java.io.FileWriter; java.io.PrintStream;
  - java.net.InetAddress; java.net.Socket; java.net.ServerSocket;
  - javax.swing.JFrame; javax.swing.JLabel; javax.swing.JProgressBar; javax.swing.JScrollPane; javax.swing.JTextArea;
The developed classes:
  - SkylineMain, SkylineMainListener, SkylineMainHandler
  - SkylineWorker, SkylineWorkerListener, SkylineWorkerHandler

Implementation Details (cont.)
3 types of classes:
  - SkylineMain and SkylineWorker: workflow classes
  - "Listener" classes: request managing classes
  - "Handler" classes: request handling classes
(Class diagram: SkylineMain, SkylineMainListener and SkylineMainHandler; SkylineWorker, SkylineWorkerListener and SkylineWorkerHandler; both groups shown with Thread, Socket and ServerSocket.)

Implementation Details (cont.)
SkylineWorkerListener
Members: SkylineWorker parent; int port; void run( );

    public void run( ) {
        try {
            ServerSocket listener = new ServerSocket( port );
            while ( true ) {
                // block until a peer connects, then serve it on its own thread
                Socket data = listener.accept( );
                SkylineWorkerHandler handler = new SkylineWorkerHandler( parent, data );
                handler.start( );
            }
        } catch ( IOException e ) {
            e.printStackTrace( );
        }
    }

Implementation Details (cont.)
SkylineWorkerHandler
Members: SkylineWorker parent; Socket data; void run( ); void receiveData( ); void receiveLocalSkylineSize( ); void receiveLocalSkyline( ); void receiveChunk( ); void mergeChunk( ); void doTerminate( ); void doStop( );

    public void run( ) {
        try {
            // dataInp is not declared on the slide; presumably a BufferedReader over data.getInputStream( )
            String dataType = dataInp.readLine( );
            // the first line of every message names its type and selects the handler method
            if ( dataType.equals( "data" ) ) receiveData( );
            else if ( dataType.equals( "local_skyline_size" ) ) receiveLocalSkylineSize( );
            else if ( dataType.equals( "local_skyline" ) ) receiveLocalSkyline( );
            else if ( dataType.equals( "chunk_data" ) ) receiveChunk( );
            else if ( dataType.equals( "chunk_result" ) ) mergeChunk( );
            else if ( dataType.equals( "stop" ) ) doStop( );
            else if ( dataType.equals( "termination" ) ) doTerminate( );
            else System.out.println( "Unsupported data: " + dataType );
            data.close( );
        } catch ( IOException e ) {
            e.printStackTrace( );
        }
    }
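The handler above dispatches on the first line of each message. A sender and receiver for such a message could be sketched as follows. Only the leading type line (for example "chunk_data") comes from the slide; the rest of the wire format used here, a count line followed by one comma-separated row per point, is an assumption made for illustration, as are all names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; MessageFramingSketch is a hypothetical name, not a class from the slides.
final class MessageFramingSketch {

    /** Writes the type line, then (assumed format) a count line and one row per point. */
    static void send(Writer out, String dataType, List<int[]> points) {
        PrintWriter w = new PrintWriter(out);
        w.println(dataType);
        w.println(points.size());
        for (int[] p : points) {
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < p.length; i++) {
                if (i > 0) row.append(',');
                row.append(p[i]);
            }
            w.println(row);
        }
        w.flush();
    }

    /** Reads the type line that the handler dispatches on. */
    static String readType(BufferedReader in) {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the payload written by send(), after the type line has been consumed. */
    static List<int[]> receivePoints(BufferedReader in) {
        try {
            int count = Integer.parseInt(in.readLine().trim());
            List<int[]> points = new ArrayList<>(count);
            for (int r = 0; r < count; r++) {
                String[] fields = in.readLine().trim().split(",");
                int[] p = new int[fields.length];
                for (int i = 0; i < fields.length; i++) p[i] = Integer.parseInt(fields[i]);
                points.add(p);
            }
            return points;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With real sockets, `out` would wrap `socket.getOutputStream()` and `in` would wrap `socket.getInputStream()`; the sketch takes a Writer and a BufferedReader so that it can be exercised without a network.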

Implementation Details (cont.)
SkylineWorker …

    public void run( ) {
        listener.start( );
        waitForData( );
        calculateLocalSkyline( );
        sendLocalSkylineSizeToAll( );
        waitForLocalSkylineSizesFromAll( );
        if ( niTotal <= npMax ) {
            // the combined local skylines fit in memory: exchange them and filter one chunk
            sendLocalSkylineToAll( );
            waitForLocalSkylinesFromAll( );
            consolidateLocalSkylines( );
            selectIthConsolidatedSkylineChunk( );
            filterSelectSkylineChunk( );
            reportFilterSelectSkylineChunk( );
        } else {
            // otherwise submit the local skyline to the peers chunk by chunk
            chunkLocalSkyline( );
            while ( !stopped && !terminated && siChunkIndex * siChunkSize < siLocal.length ) {
                sendChunkToAll( siChunkIndex );
                waitForChunkFromAll( );
                // the slide does not show where siChunkIndex advances; presumably in the call below
                reportFilterSelectSkylinePart( );
            }
        }
        reportEndOfProcess( );
        waitForTermination( );
    }
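The `waitFor…FromAll( )` steps require the workflow thread to block until handler threads have received a message from every peer. One way to coordinate this with plain monitors is sketched below. It is an assumption about how such a wait could be built, not the presented implementation, and the class name is hypothetical.

```java
// Illustrative sketch; SizeBarrierSketch is a hypothetical name, not a class from the slides.
final class SizeBarrierSketch {
    private final int expected;   // number of peers that must report
    private int received;         // number of reports seen so far
    private long total;           // sum of the reported local skyline sizes

    SizeBarrierSketch(int expected) {
        this.expected = expected;
    }

    /** Called by a handler thread when a "local_skyline_size" message arrives. */
    synchronized void record(int size) {
        received++;
        total += size;
        notifyAll();   // wake the workflow thread so it can re-check the condition
    }

    /** Called by the workflow thread; blocks until every peer has reported, then returns the sum. */
    synchronized long awaitTotal() {
        try {
            while (received < expected) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for peers", e);
        }
        return total;
    }
}
```

The returned sum plays the role of `niTotal` in the workflow above: comparing it with the memory limit decides which branch of the algorithm runs.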

Implementation Details (cont.)
  - the volume of work: the steps described in the algorithm are very high level, and implementing them took a lot of actual work and code
  - stopping and termination: processing has to end gracefully in two cases. When the app stops, it stops processing its own data but stays open to outside queries; when the app terminates, it stops both processing its own data and answering outside queries
  - the application was developed so that the worker processes can run on separate machines, so the SkylineWorker class needed to be developed and tested as a standalone application
  - flexibility: it needed to run with the peer and limit configurations given at runtime
  - asynchronous communications, and coordination of message broadcast and receipt
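The distinction between stopping and terminating can be captured with two flags, matching the `stopped` and `terminated` variables that appear in the worker loop. The sketch below is a hypothetical illustration of that state handling; the method names are invented.

```java
// Illustrative sketch; WorkerStateSketch is a hypothetical name, not a class from the slides.
final class WorkerStateSketch {
    // volatile so that handler threads and the workflow thread see each other's updates
    private volatile boolean stopped;
    private volatile boolean terminated;

    /** Stop: give up processing own data, keep answering peers. */
    void stop() {
        stopped = true;
    }

    /** Terminate: give up processing own data and stop answering peers. */
    void terminate() {
        stopped = true;
        terminated = true;
    }

    boolean mayProcessOwnData() {
        return !stopped && !terminated;
    }

    boolean mayAnswerPeers() {
        return !terminated;
    }
}
```

A stopped worker still has to answer dominance queries because its peers may not have finished filtering their own points against its local skyline.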

Further Goals and Objectives
  - Can generic, reusable higher-level operations be developed that could be used in other parallel computations?
      - all-to-all messaging
      - all-peer result consolidation
      - 3-threaded processors
      - transmission of large datasets
      - process state maintenance and synchronization
  - Can a template design pattern be generalized for similar divide-distribute-and-conquer parallel computations?
  - Can the count of dominated points be incorporated in the result?
  - Can idle time on processors be utilized to assist peers or to do work-ahead or speculative preprocessing?

Demonstration & Questions ???