

DataWarp: Making Progress Despite Inconsistent Data Stephen Crouch Peter Henderson Robert John Walters School of Electronics and Computer Science, University of Southampton, UK

Outline
- Background
- Traditional Philosophy
- DataWarp
- Example
- Conclusion

Modern Systems:
- No longer exist in private environments
- Are connected to each other
- Use data which
  - Is (at least partially) replicated
  - Can be out of date
  - Contains errors
  - They don't own

Traditional approach
- "Everything must have a correct value"
- We must drive out the imperfections
- Implement systems to make sure data remains consistent
- Don't do anything unless sure it is right

Examples
- Transactions
  - Elaborate schemes which ensure data remains consistent
- Compensations
  - Less elaborate and restrictive
  - Relax some restrictions of transactions but expose intermediate states

Single Datum World
- Transactional systems never leave the left-most column
- Compensation systems can, but
  - Temporarily
  - Make sure they know how to get back

But we can never achieve full consistency
- "Inconsistencies" which are deliberate
- Different notions of consistency
- Ownership
- Cost
- The accumulated body of data is too big

DataWarp, an alternative
- We can't "fix" the data, so we have to "fix" the applications
- DataWarp
  - Can't give up when an inconsistency is found
  - Do the best you can with what you have
  - Be prepared to make corrections

Single Datum World
- DataWarp: accepts that being in the left-most column is unlikely

Grid Scheduling Example
- Classical approach to any workflow
  - Find and execute the first task
  - Wait for it to complete
  - Execute the next task
  - …
- Works, but time is wasted waiting

Example Workflow as Text

    Data DI              # Input
    Data DJK             # Output of J or K
    Data DA,DB,DC,DH     # Other output
    Job A,B,C,H,J,K      # Tasks

    A.submitJob(DI)
    A.waitFor()
    DA = A.getResults()
    parallel {
        B.submitJob(DA)
        B.waitFor()
        DB = B.getResults()
    } and {
        H.submitJob(DA)
        H.waitFor()
        DH = H.getResults()
        if ( some_predicate(DH) ) {
            J.submitJob(DH)
            J.waitFor()
            DJK = J.getResults()
        } else {
            K.submitJob(DH)
            K.waitFor()
            DJK = K.getResults()
        }
    }
    C.submitJob(DB, DJK)
    C.waitFor()
    DC = C.getResults()

Example Workflow as Diagram

Notice
- Both B and H can start as soon as A completes and can run at the same time
- Whether we do J or K depends on the result of H
- C needs output from B and from J or K
- Processing time for each job includes waiting in the queue
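
The effect of these dependencies on the finish time can be sketched with a small calculation. This is a minimal illustration, not the authors' tooling: the durations below are invented placeholder values (each standing for queue wait plus run time) and are not the figures from the talk, and the function names are hypothetical.

```python
# Hypothetical per-job durations (queue wait + run time), arbitrary units.
# Illustrative values only; they are NOT the figures used in the talk.
DURATION = {"A": 4, "B": 10, "H": 3, "J": 6, "K": 5, "C": 8}


def sequential_finish(branch):
    """Classical approach: A, B, H, the chosen branch (J or K), then C, one at a time."""
    order = ["A", "B", "H", branch, "C"]
    return sum(DURATION[job] for job in order)


def parallel_finish(branch):
    """B runs alongside the H -> J/K chain; C starts once both have finished."""
    a_done = DURATION["A"]
    b_done = a_done + DURATION["B"]
    jk_done = a_done + DURATION["H"] + DURATION[branch]
    return max(b_done, jk_done) + DURATION["C"]


if __name__ == "__main__":
    for branch in ("J", "K"):
        print(branch, sequential_finish(branch), parallel_finish(branch))
```

With these made-up durations, C cannot start until the slower of B and the H-then-J/K chain has finished, so the parallel finish time is governed by whichever path is longer.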

Execution times:

Optimisations 1
- Anticipation
  - Put jobs in the queue so they come to the head of the queue just as we have the data to execute them
  - Run more than one job at a time
- Users do this manually
  - Jobs put in slow moving queues ready for when needed
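
One way to picture anticipation is a simple timing model: submit a placeholder early enough that its predicted queue wait elapses just as the input data becomes available. The sketch below assumes the queue wait can be predicted as a single number and that a placeholder reaching the head of the queue early can simply wait there for its data; both are simplifying assumptions, and all names and values are hypothetical.

```python
def placeholder_submit_time(data_ready_at, predicted_queue_wait):
    """Time at which to enqueue the placeholder (never earlier than time 0)."""
    return max(0, data_ready_at - predicted_queue_wait)


def job_finish(data_ready_at, queue_wait, run_time, anticipate):
    """Finish time of one job, with or without an anticipatory placeholder."""
    if anticipate:
        submitted = placeholder_submit_time(data_ready_at, queue_wait)
    else:
        # Classical approach: only join the queue once the input data exists.
        submitted = data_ready_at
    head_of_queue = submitted + queue_wait
    # The job cannot start before its input data is available.
    start = max(head_of_queue, data_ready_at)
    return start + run_time


if __name__ == "__main__":
    print(job_finish(10, 4, 5, anticipate=False))
    print(job_finish(10, 4, 5, anticipate=True))
```

In this model the saving per job is at most its queue wait, which is why the benefit depends on how well the queue can be predicted.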

The Schedule

Process   Execution Time   Delay for placeholder job
A         7                0
B         20               7
C         43               27
H         5                7
J         11               12
K         8

Features
- Start B and H together
- Run sequentially, C finishes at 116
- Running B in parallel with H, J, K improves this to 88
- Anticipating the need for jobs improves it further to 76

Optimisations 2
- Suppose queue prediction is too pessimistic
  - Jobs for J, K arrive at head of queue while H still working
  - Start both
  - Abandon one when H completes
- Suppose H fails/still working when B finishes and C ready
  - Pick output from one of J, K
  - Complete the workflow
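
The decision logic in this slide might be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: the function name, the `default` choice, and the `provisional` flag are hypothetical, and the slides do not say what input J and K are given when they start before H has completed.

```python
def choose_djk(h_result, j_output, k_output, predicate, default="J"):
    """Pick which speculative output (J's or K's) feeds C.

    h_result is None when H has failed or is still running.
    Returns (chosen_job, output, provisional).
    """
    if h_result is None:
        # No verdict from H: carry on with a provisional choice and
        # be prepared to correct it later (the DataWarp attitude).
        chosen = default
        provisional = True
    else:
        # H has completed: keep the branch it selects, abandon the other.
        chosen = "J" if predicate(h_result) else "K"
        provisional = False
    output = j_output if chosen == "J" else k_output
    return chosen, output, provisional


if __name__ == "__main__":
    is_positive = lambda dh: dh > 0  # stand-in for some_predicate(DH)
    print(choose_djk(5, "DJ", "DK", is_positive))
    print(choose_djk(None, "DJ", "DK", is_positive))
```

Marking the result as provisional keeps a record that C ran on an unconfirmed input, so the workflow can be corrected if H later selects the other branch.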

Conclusion
- Applications have to manage in a connected environment
- Insisting on complete, consistent data is no longer acceptable
- DataWarp applications can live with uncertain data
- They continue where others fail