Published by Madisyn Sewall. Modified over 9 years ago.
1
Paraglide
Martin Vechev, Eran Yahav
2
Synthesizing System-Level Software
Requirements: correctness, scalability, response time
Challenges: crossing abstraction levels, hardware complexity, time to market
3
Highly Concurrent Algorithms
Parallel pattern matching, anomaly detection
Voxel trees, polyhedrons, …
Scene graph traversal, physics simulation, collision detection, …
Cartesian tree (fast fits), lock-free queue, garbage collection, …
4
Goal
Generate efficient, provably correct components of concurrent systems from higher-level specs
Verification/checking integrated into the design process
Automatic exploration of implementation details
Synthesize critical components
System-level code
Explore tradeoffs
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." – D. Knuth
5
Current Approach: Manual Construction
System spec: high(er)-level description
Environment: memory model, thread model, concurrency primitives, CPU primitives, …
Requirements: throughput, memory consumption, pause time, …
Bag of tricks: optimistic concurrency, adding metadata, adding space, …
Implementation (??) by manual construction
  Hard to verify/test
  Often buggy
  Did the programmer choose well?
  One-time deal
6
Our Vision
System spec: high(er)-level description
Environment: memory model, thread model, concurrency primitives, CPU primitives, …
Requirements: throughput, memory consumption, pause time, …
Bag of tricks: optimistic concurrency, adding metadata, adding space, …
Machine assistance
  Auto checking/verification
  Auto exploration of implementation details
  Repeatable
Output: implementation, plus alternative implementations
7
Example: Concurrent Set Algorithm
Systematically derived with machine assistance
Correctness: automatically verified
Performance: only uses CAS
8
Why Should You Care?
Correctness
  Checking/verification integrated into the design process
Performance
  Systematic exploration beats humans at crossing levels of abstraction, leveraging non-intuitive memory models, etc.
  Systematic exploration produces many candidates with varying tradeoffs
Adaptability
  Shorter development cycle for adapting the system to a new environment
10
Why Is There Hope?
Designer effort
  Provide insights that are also required in manual construction
Correctness
  Checking helps eliminate a large number of incorrect candidates
  Designer can focus on the remaining candidates
Performance …
Adaptability …
11
Why Is There Hope? (II)
Transformational derivation
  Concurrent garbage collection algorithms [PLDI’06]
Combinatorial exploration
  Concurrent GC algorithms [PLDI’07]
  Concurrent set algorithms [PLDI’08]
Automatic verification
  Comparison under Abstraction for Verifying Linearizability [Amit, CAV’07]
  Shape Analysis for Concurrent Programs [TAU]
  …
12
Risk Summary
Designer effort
  Return on designer “investment”
Is the result competitive with a manually crafted system?
Is the tool working at the right level of abstraction?
Verification scalability
13
Outline
Technical details
  Commonalities between concurrent algorithms
  Adapting to a changing environment
  Preliminary experience: our combinatorial approach
Plan
  Succeed early
Many open questions
  Common representation
  “More efficient”
  …
14
Example: “The Origin of GCs”
[Figure: family of concurrent GC algorithms and their proofs. Legend: Incorrect, Correct, (C) Corrected. Labels appearing under ALGORITHMS and PROOFS: Steele(C) ’75, Dijkstra(C) ’78, Ben-Ari Base ’84, Ben-Ari Extended ’84, Pixley ’88, Yuasa ’90, Boehm ’91, Doligez(C) ’93, Doligez ’94, Azatchi ’03, Barabash ’03, Domani ’03.]
15
Example: Concurrent Set Algorithms
Harris ’01, Michael ’02, Heller ’05, Valois ’95, Ruppert ’04, Massalin ’91, Greenwald ’99
16
Adapting to a Changing Environment
Algorithm
Synch primitives, memory model, thread model, memory manager, scheduler, …
17
Machine Assisted Design Process
Families of algorithms sharing a common skeleton with parametric functions
Trace Step, Mutator Step, Expose
Mutator, Collector
19
Overview
Generation
  High-level design: find algorithm outline; find building blocks
  Low-level search: explore algorithm space
Verification
  High-level design: find a sufficient local invariant; find a sufficient abstraction
  Low-level search: verify local invariant
20
Coarse-Grained to Fine-Grained Synchronization

Mutator Step (source, field, new)
atomic {
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
  M6: source.field = new
}

Trace Step (source, field)
atomic {
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Set Expose (log)
atomic {
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
  return V
}

What now? Can we remove atomics?
Result is incorrect, may lose objects!
21
Coarse-Grained to Fine-Grained Synchronization

Mutator Step (source, field, new)
{
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
  M6: source.field = new
}

Trace Step (source, field)
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
  return V
}

What now? Can we remove atomics?
22
Coarse-Grained to Fine-Grained Synchronization

Trace Step (source, field)
{
  C1: dst = source.field
  C2: source.field.WF = true
  C3: mark dst
}

Mutator Step (source, field, new)
{
  M1: old = source.field
  M2: w = source.field.WF
  M5: w → old.MC--
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M6: source.field = new
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
  return V
}

What now? Can we remove atomics?
“When in doubt, use brute force.” (Ken Thompson)
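The brute-force idea can be illustrated on a much smaller problem than the collector. The sketch below is not the Paraglide tool and does not model the mutator, trace, or expose steps. It swaps in a toy shared counter, enumerates every interleaving of two threads, and compares an atomic increment with one split into separate read and write steps. All names (interleavings, run, final_values) are hypothetical.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving on a toy shared counter and return its final value."""
    shared = 0
    local = {}  # per-thread register, keyed by thread id
    for tid, op in schedule:
        if op == "read":
            local[tid] = shared
        elif op == "write":
            shared = local[tid] + 1
        else:  # "inc": read and write fused into one atomic step
            shared += 1
    return shared

def final_values(atomic):
    """Set of final counter values over all interleavings of two incrementing threads."""
    steps = ["inc"] if atomic else ["read", "write"]
    t0 = [(0, s) for s in steps]
    t1 = [(1, s) for s in steps]
    return {run(s) for s in interleavings(t0, t1)}

print(final_values(atomic=True))   # only the correct result
print(final_values(atomic=False))  # also contains a lost update
```

With the atomic step every interleaving ends at 2. Once the step is split, some interleavings end at 1, the toy analogue of removing an atomic and losing an object.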
23
System Input: Building Blocks

Mutator building blocks
  M1: old = source.field
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
  M6: source.field = new

Tracing step building blocks
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true

Expose building blocks
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}

Input constraints
  Mutator blocks: [M3, M4]
  Tracing blocks: [C1, C3]
  Expose blocks: [E1, E2, E3, E4]
  Dataflow, e.g. M2 < M3
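A minimal sketch of how such input constraints could prune the search space, for the mutator blocks only. Two assumptions are not confirmed by the slides: that [M3, M4] means M3 must be immediately followed by M4, and that M2 < M3 means M2 must come somewhere before M3. The slides give the dataflow constraint only as an example, so the real input may contain more. The names BLOCKS, BEFORE, ADJACENT, satisfies, and candidate_orders are hypothetical.

```python
from itertools import permutations

BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]
BEFORE = [("M2", "M3")]    # dataflow example from the slide: M2 < M3
ADJACENT = [("M3", "M4")]  # assumed reading of the constraint [M3, M4]

def satisfies(order):
    """True if an ordering of the blocks respects every constraint."""
    pos = {block: i for i, block in enumerate(order)}
    ordered = all(pos[a] < pos[b] for a, b in BEFORE)
    glued = all(pos[b] == pos[a] + 1 for a, b in ADJACENT)
    return ordered and glued

def candidate_orders():
    """All orderings of the mutator blocks that survive the constraints."""
    return [order for order in permutations(BLOCKS) if satisfies(order)]

candidates = candidate_orders()
print(len(candidates), "of", 720, "orderings remain")
```

Under these two assumptions the 720 raw orderings of six blocks shrink to 60, and the mutator ordering M1, M6, M2, M3, M4, M5 reported on the output slide is among them. Each surviving candidate would still have to be passed to a verifier; this sketch only generates them.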
24
System Output: (Verified) Algorithms

Mutator Step (source, field, new)
{
  M1: old = source.field
  M6: source.field = new
  M2: w = source.field.WF
  M3: w → new.MC++
  M4: w → log = log ∪ {new}
  M5: w → old.MC--
}

Set Expose (log)
{
  E1: o = remove element from log
  E2: mc = o.MC
  E3: (mc > 0) → mark o
  E4: (mc > 0) → V = V ∪ {o}
}

Trace Step (source, field)
{
  C1: dst = source.field
  C3: mark dst
  C2: source.field.WF = true
}

Explored 306 variations in around 2 minutes
Least atomic (verified) algorithm with given blocks
25
But What Now?
How do we get further improvement?
  Need more insights
  Need new building blocks
  Example: start and end of collector reading a field
Coordination meta-data, atomicity, ordering
26
Continuing the Search…
We derived a non-atomic algorithm (at the granularity of blocks)
  Non-atomic write barrier, collector step, and expose
System explored over 1,600,000 algorithms (took ~34 hours)
All experiments took ~41 machine hours and ~3 human hours
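A rough back-of-the-envelope check of the exploration rates implied by the reported figures. The inputs are the approximate numbers from the slides ("around 2 mins", "over 1,600,000", "~34 hours"), so the results are order-of-magnitude estimates only. The two runs used different building blocks and may not be directly comparable. The helper name per_second is hypothetical.

```python
def per_second(count, hours=0.0, minutes=0.0):
    """Average number of candidate algorithms explored per second."""
    seconds = hours * 3600 + minutes * 60
    return count / seconds

small_run = per_second(306, minutes=2)       # first search: 306 variations
large_run = per_second(1_600_000, hours=34)  # extended search

print(f"small run: {small_run:.2f} candidates/s")
print(f"large run: {large_run:.2f} candidates/s")
```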
27
Plan
Identify application domain
Case studies
  Concurrent garbage collection algorithms
  Concurrent set algorithms
  Concurrent memory allocator (used in Metronome)
  …
Dynamic tool for testing systems (ParaDyn)
Abstraction-guided synthesis
Automatic verification using local abstractions
Representation
Choosing the right starting point
28
Highly Concurrent Plan
Identify application domain
Case studies
  Concurrent garbage collection algorithms
  Concurrent set algorithms
  Concurrent memory allocator (used in Metronome)
  …
Dynamic tool for testing systems (ParaDyn)
Representation
Choosing the right starting point
…
Abstraction-guided synthesis
Automatic verification using local abstractions
29
Succeed Early
Choose “the right” domain
  Correctness is critical
  High performance
  Highly dynamic (concurrent changes)
  Custom architecture (?)
  Irregular structures (?)
  Workloads unknown at compile time
Examples: VM components, drivers for embedded devices, …
30
Longer-term Questions
31
Representation
Appropriate for transformation?
Makes concurrency apparent?
32
Choosing the Right Starting Point?
“Higher-level specification”?
A sequential program?
Start with something else?
Add(S, x): S’ = S ∪ {x}
Remove(S, x): S’ = S \ {x}
Contains(S, x): x ∈ S
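The set specification above can be written as an executable sequential reference, which is one possible reading of "a sequential program" as a starting point. This is only a sketch: it uses immutable Python frozensets to mirror the S to S’ style of the specification, and the function names are illustrative.

```python
def add(s: frozenset, x) -> frozenset:
    """Add(S, x): S' = S ∪ {x}"""
    return s | {x}

def remove(s: frozenset, x) -> frozenset:
    """Remove(S, x): S' = S \\ {x}"""
    return s - {x}

def contains(s: frozenset, x) -> bool:
    """Contains(S, x): x ∈ S"""
    return x in s

s0 = frozenset()
s1 = add(s0, 3)
s2 = remove(s1, 3)
print(contains(s1, 3), contains(s2, 3))
```

A concurrent implementation could then be checked against this reference, for example by requiring every concurrent history to match some sequential order of these three operations.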
33
What is “More Efficient”?
Multiple dimensions
  Scalability
  Response time
  …
Theoretical models exist
  Disjoint-access parallelism
  …
Not clear whether existing theoretical models capture reality
34
Abstraction-Guided Synthesis
Guarantee correctness: synthesize only programs that can be proved correct with your abstraction
35
Summary
Machine-assisted design and implementation of correct, efficient, highly concurrent algorithms
Designer provides insights, system explores implementation details
Business impact
  Change the way concurrent systems are built
  (More) reliable high-performance systems
  Shorter time to market
Scientific impact
  Realistic semi-automated synthesis of concurrent systems
36
Why us?
Our team has expertise in concurrency and verification of concurrent systems
We have preliminary experience with synthesizing concurrent algorithms in the domain of concurrent garbage collectors
We have ongoing collaborations with world experts on verification of concurrent programs, and with researchers working on parallel computing
37
THE END
38
Parallelization
Higher-level
Underlying structure does not change during computation
System can be broken into independent parts
39
Synthesizing Concurrent Systems
Designing practical and efficient concurrent systems is hard
  Trading off simplicity for performance
  Fine-grained coordination
Result: sub-optimal, buggy algorithms
Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications
"Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance." – D. Knuth