1
DataWarp: Making Progress Despite Inconsistent Data
Stephen Crouch, Peter Henderson, Robert John Walters
School of Electronics and Computer Science, University of Southampton, UK
2
Outline
- Background
- Traditional Philosophy
- DataWarp
- Example
- Conclusion
3
Modern Systems:
- No longer exist in private environments
- Are connected to each other
- Use data which
  - Is (at least partially) replicated
  - Can be out of date
  - Contains errors
  - They don’t own
4
Traditional approach
- “Everything must have a correct value”
- We must drive out the imperfections
- Implement systems to make sure data remains consistent
- Don’t do anything unless sure it is right
5
Examples
- Transactions
  - Elaborate schemes which ensure data remains consistent
- Compensations
  - Less elaborate and restrictive
  - Relax some restrictions of transactions but expose intermediate states
6
Single Datum World
- Transactional systems never leave left-most column
- Compensation systems can, but
  - Temporarily
  - Make sure they know how to get back
7
But we can never achieve full consistency
- “Inconsistencies” which are deliberate
- Different notions of consistency
- Ownership
- Cost
- The accumulated body of data is too big
8
DataWarp, an alternative
- We can’t “fix” the data so:
  - We have to “fix” the applications
- DataWarp
  - Can’t give up when inconsistency found
  - Do the best you can with what you have
  - Be prepared to make corrections
9
Single Datum World
- DataWarp: accepts that being in the left-most column is unlikely
10
Grid Scheduling Example
- Classical approach to any workflow
  - Find and execute the first task
  - Wait for it to complete
  - Execute the next task
  - …
- Works, but time wasted waiting
11
Example Workflow as Text

    Data DI                 # Input
    Data DJK                # Output J or K
    Data DA,DB,DC,DH        # Other Output
    Job A,B,C,H,J,K         # Tasks

    A.submitJob(DI)
    A.waitFor()
    DA = A.getResults()
    parallel {
        B.submitJob(DA)
        B.waitFor()
        DB = B.getResults()
    } and {
        H.submitJob(DA)
        H.waitFor()
        DH = H.getResults()
        if ( some_predicate(DH) ) {
            J.submitJob(DH)
            J.waitFor()
            DJK = J.getResults()
        } else {
            K.submitJob(DH)
            K.waitFor()
            DJK = K.getResults()
        }
    }
    C.submitJob(DB, DJK)
    C.waitFor()
    DC = C.getResults()
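The workflow text above can be approximated in ordinary Python. This is only a sketch: `run_job` is a hypothetical stand-in for the `submitJob` / `waitFor` / `getResults` sequence, and `some_predicate` is a placeholder, since the slides do not say what test is applied to H's output. What it shows is the shape of the workflow: B runs alongside the H-then-J-or-K branch, and C waits for both.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(name, *inputs):
    """Hypothetical stand-in for submitJob/waitFor/getResults on one grid job."""
    return f"{name}({', '.join(inputs)})"


def some_predicate(dh):
    """Placeholder test on H's output; the real predicate is not given."""
    return len(dh) % 2 == 0


def branch_h(da):
    """The second parallel branch: run H, then J or K depending on H's result."""
    dh = run_job("H", da)
    return run_job("J", dh) if some_predicate(dh) else run_job("K", dh)


def run_workflow(di):
    da = run_job("A", di)
    # B and the H branch both need only DA, so they can run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_b = pool.submit(run_job, "B", da)
        fut_jk = pool.submit(branch_h, da)
        db, djk = fut_b.result(), fut_jk.result()
    # C needs the output of B and of whichever of J or K ran.
    return run_job("C", db, djk)


dc = run_workflow("DI")
```

A thread pool is used here only because the stand-in jobs are trivial; a real grid client would block on remote jobs instead.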
12
Example Workflow as Diagram
13
Notice
- Both B and H can start as soon as A completes and can run at the same time
- Whether we do J or K depends on the result of H
- C needs output from B and from either J or K
- Processing time for each job includes waiting in the queue
14
Execution times:
15
Optimisations 1
- Anticipation
  - Put jobs in the queue so they come to the head of the queue just as we have the data to execute them
  - Run more than one job at a time
- Users do this manually
  - Jobs put in slow moving queues ready for when needed
16
The Schedule

    Process   Execution Time   Delay for placeholder job
    A          7                0
    B         20                7
    C         43               27
    H          5                7
    J         11               12
    K          8
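One way to read the delay column is as the earliest time at which all of a job's inputs are available. The sketch below works under that assumption, which the slides do not state; the names `EXEC_TIME`, `DEPENDS_ON` and `placeholder_delay` are illustrative. The dependency map is read off the workflow, with C modelled as waiting for B and J. Under these assumptions the computed values for A, B, C, H and J agree with the table; the value it produces for K is only what the assumption yields, not a figure from the slides.

```python
# Execution times as listed in the schedule table.
EXEC_TIME = {"A": 7, "B": 20, "C": 43, "H": 5, "J": 11, "K": 8}

# Dependencies read off the workflow (assumption: C waits for B and J).
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "H": ["A"],
    "J": ["H"],
    "K": ["H"],
    "C": ["B", "J"],
}


def placeholder_delay(job, memo=None):
    """Earliest time at which every input of `job` has been produced."""
    memo = {} if memo is None else memo
    if job not in memo:
        memo[job] = max(
            (placeholder_delay(dep, memo) + EXEC_TIME[dep]
             for dep in DEPENDS_ON[job]),
            default=0,  # a job with no inputs can be queued immediately
        )
    return memo[job]


delays = {job: placeholder_delay(job) for job in EXEC_TIME}
```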
17
Features
- Start B, H together
- Sequentially, C finishes at 116
- By running B in parallel with H, J, K, this improves to 88
- By anticipating the need for jobs, this improves to 76
18
Optimisations 2
- Suppose queue prediction is too pessimistic
  - Jobs for J, K arrive at head of queue while H still working
  - Start both
  - Abandon one when H completes
- Suppose H fails/still working when B finishes and C ready
  - Pick output from one of J, K
  - Complete the workflow
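The two cases above might be sketched as follows. Everything here is hypothetical: the slides do not say what input J and K receive when they start before H has finished, so this sketch assumes a provisional estimate of H's output (`dh_provisional`). `wait_for_h` is an assumed callback returning H's output, or `None` if H failed or is not ready in time, and the choice of J as the fallback is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(name, data):
    """Hypothetical stand-in for running one grid job to completion."""
    return f"{name}({data})"


def speculate(dh_provisional, wait_for_h, choose_j):
    """Start J and K early, then keep the one that H's result selects."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            "J": pool.submit(run_job, "J", dh_provisional),
            "K": pool.submit(run_job, "K", dh_provisional),
        }
        dh = wait_for_h()
        if dh is None:
            keep = "J"  # H unavailable: pick one output so C can proceed
        else:
            keep = "J" if choose_j(dh) else "K"
        abandon = "K" if keep == "J" else "J"
        # Best effort only: cancel() has no effect once a job is running.
        futures[abandon].cancel()
        return keep, futures[keep].result()


kept_when_h_ok = speculate("DH?", lambda: "DH", lambda dh: False)
kept_when_h_fails = speculate("DH?", lambda: None, lambda dh: True)
```

Because the result was computed from provisional data, a caller following the DataWarp approach would still need to be prepared to correct it later.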
19
Conclusion
- Applications have to manage in a connected environment
- Insisting on complete, consistent data is no longer acceptable
- DataWarp applications can live with uncertain data
- They continue where others fail