FAST : a Transducer Based Language for Manipulating Trees Presented By: Loris D’Antoni Joint work with: Margus Veanes, Ben Livshits, David Molnar.

Slides:



Advertisements
Similar presentations
Equivalence of Extended Symbolic Finite Transducers Presented By: Loris D’Antoni Joint work with: Margus Veanes.
Advertisements

C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
Introduction to Computability Theory
1 Introduction to Computability Theory Lecture7: PushDown Automata (Part 1) Prof. Amos Israeli.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.
9/27/2006Prof. Hilfinger, Lecture 141 Syntax-Directed Translation Lecture 14 (adapted from slides by R. Bodik)
Proof-system search ( ` ) Interpretation search ( ² ) Main search strategy DPLL Backtracking Incremental SAT Natural deduction Sequents Resolution Main.
Chapter 4 Lexical and Syntax Analysis Sections 1-4.
Validating Streaming XML Documents Luc Segoufin & Victor Vianu Presented by Harel Paz.
Parsing III (Eliminating left recursion, recursive descent parsing)
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
CS 490: Automata and Language Theory Daniel Firpo Spring 2003.
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Normal forms for Context-Free Grammars
Lexical and Syntax Analysis
MIT Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
YOLT Yuan Zheng Omar Ahmed Lukas Dudkowski T. Mark Kuba.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
Streaming Tree Transducers Loris D'Antoni University of Pennsylvania Joint work with Rajeev Alur 1.
Static Analysis of String Encoders and Decoders Presented By: Loris D’Antoni Joint work with: Margus Veanes.
PROGRAMMING USING AUTOMATA AND TRANSDUCERS Loris D’AntoniMargus Veanes.
Query Processing Presented by Aung S. Win.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Xpath Query Evaluation. Goal Evaluating an Xpath query against a given document – To find all matches We will also consider the use of types Complexity.
Digital Electronics Data Structures LISP
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
Streaming Tree Transducers Loris D'Antoni University of Pennsylvania Joint work with Rajeev Alur 1.
INTRODUCTION TO THE THEORY OF COMPUTATION INTRODUCTION MICHAEL SIPSER, SECOND EDITION 1.
Minimization of Symbolic Automata Presented By: Loris D’Antoni Joint work with: Margus Veanes 01/24/14, POPL14.
Proof Carrying Code Zhiwei Lin. Outline Proof-Carrying Code The Design and Implementation of a Certifying Compiler A Proof – Carrying Code Architecture.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 2.
Grammars CPSC 5135.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
BizTalk Flat File Parsing Annotations. Flat File Parsing = LL(k) Parser The flat file parser is entirely grammar driven and is implemented as an LL(k)
Fast and Precise Sanitizer Analysis with B EK Pieter Hooimeijer Ben Livshits David Molnar Prateek Saxena Margus Veanes USENIX Security.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
CPS 506 Comparative Programming Languages Syntax Specification.
QED: A Novel Quaternary Encoding to Completely Avoid Re-labeling in XML Updates Changqing Li,Tok Wang Ling.
Introduction to Compiling
Learning Symbolic Interfaces of Software Components Zvonimir Rakamarić.
Compiler Introduction 1 Kavita Patel. Outlines 2  1.1 What Do Compilers Do?  1.2 The Structure of a Compiler  1.3 Compilation Process  1.4 Phases.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Grammars CS 130: Theory of Computation HMU textbook, Chap 5.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Functions Discrete Structure. L62 Functions. Basic-Terms. DEF: A function f : A  B is given by a domain set A, a codomain set B, and a rule which for.
C H A P T E R T W O Linking Syntax And Semantics Programming Languages – Principles and Paradigms by Allen Tucker, Robert Noonan.
Exchange Intensional XML Data Tova MiloSerge Abiteboul Tova Milo INRIA & Tel-Aviv U. ; Serge Abiteboul INRIA ; Bernd AmannOmar Benjelloun Bernd Amann Cedric-CNAM.
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
A Simple Syntax-Directed Translator
Introduction to Parsing
Table-driven parsing Parsing performed by a finite state machine.
Syntax Analysis Chapter 4.
Ch. 4 – Semantic Analysis Errors can arise in syntax, static semantics, dynamic semantics Some PL features are impossible or infeasible to specify in grammar.
Syntax-Directed Translation
Top-Down Parsing CS 671 January 29, 2008.
Compiler design.
Chapter 4: Lexical and Syntax Analysis Sangho Ha
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
Presentation transcript:

FAST : a Transducer Based Language for Manipulating Trees Presented By: Loris D’Antoni Joint work with: Margus Veanes, Ben Livshits, David Molnar

Motivation Trees are common input/output data structures – XML query, type-checking, etc… – Compilers/optimizers (from parse tree to parse tree) – Tree manipulating programs: data structures algorithms, ontologies, etc… 2

HTML Sanitization Removing malicious active code from HTML documents is a tree transformation body script malicious code div p “Today I’m happy” body div p “Today I’m happy” SANITIZE 3

What do we Need? We want to write these single transformations separately to avoid errors 4 Remove malicious URLs Replace deprecated tags Remove bad elements (scripts…)

Interesting Properties Composition: T(x) = T 2 (T 1 (x)) Type-checking: given two languages I,O T(I) is always in O Pre-image: compute the input that produces a particular output To achieve speed Check if the sanitizer ever produces a malicious output Produce counterexamples if type-checking fails 5 DEMODEMO:

FAST Compiler FAST code Transducers Analysis and optimization C# 6 SMT solver

Stages by Example 7 mapCmapC2 Transducers

CHOOSING THE RIGHT FORMALISM 8

Semantics as Transducers Goal: find a decidable class of tree transducers that can express the previous examples 9

Top Down Tree Transducers [Engelfriet75] q(a(x 1,x 2 ))  b(c,q 1 (x 1 )) Decidable properties: type-checking, etc… Domain expressiveness: only finite alphabets ab c q q1q1 x1x1 x2x2 x1x1 10

Symbolic Tree Transducers [PSI11] q(λa.a>3,(x 1,x 2 ))  λa.a+1,(λa.a-2,q 1 (x 1 )) Decidable properties: type-checking, etc… Domain expressiveness: infinite alphabets using predicates and functions Structural expressiveness: can’t delete a node without reading it first q q1q1 x1x1 x2x2 x1x1 Such that 5>3 is true 11 Alphabet theory has to be DECIDABLE We’ll use Z3 to check predicate satisfiability

Improving structural expressiveness Transformation: delete the left child if its root greater than 5 If we delete the node we can’t check that the left child was actually greater than q 1 3 q 12 Regular Look-Ahead (RLA) ??

Regular Look Ahead (TOP R ) : Transformation: delete a node if its left child is greater than 5 Rules can ask whether the children are in particular languages – p 1 : the language of trees whose root is greater than 5 – p 2 : the language of all trees Decidable properties: type-checking, etc… Domain expressiveness: infinite alphabets Structural expressiveness: good enough to express our examples q p1p1 p2p2 1 3 q Transformation now is safe 13

DecidabilityComplexityStructuralExpressiveness Infinite alphabets Top Down Tree Transducers [Engelfriet75]VVXX Top Down Tree Transducers with Regular Look-ahead [Engelfriet76]VV~X Streaming Tree Transducers [AlurDantoni12]VXVX Data Automata [Bojanczyk98]~XXV Symbolic Tree Transducers [VeanesBjoerner11]VVXV Symbolic Tree Transducers RLAVV~V 14

COMPOSITION OF SYMBOLIC TRANSDUCERS WITH REGULAR LOOKAHEAD 15

Composition of STT R This is not always possible!! Find the biggest class for which it is possible 16 T1T1 T1T1 T2T2 T2T2 T 1 o T 2

Classes of STT R DETERMINISTIC: at most one transducer rule applies for each input tree LINEAR: each child appear at most once in the right hand side of each rule 17 xq x+1 q2q2 q1q1 linear nonlinear x+1 q2q2 q1q1

When can we Compose? Theorem: T(x) = T 2 (T 1 (x)) definable by a Symbolic Tree Transducers with RLA if – T 1 is deterministic, OR – T 2 is linear All our examples fall in this category 18 Alphabet theory has to be DECIDABLE We’ll use Z3 to check predicate satisfiability

Pre-image as Composition 19 T O ? Domain(T o O)

FAST: Decidable by Design 20 Composition Type-checking Pre-image Symbolic Tree Transducers with RLA SMT Solver for Alphabet Theory

CASE STUDIES AND EXPERIMENTS 21

Case Studies and Experiments Program Optimization: Deforestation of functional programs Verification: HTML sanitization Analysis of functional programs Augmented reality app store 22 Infinite Alphabets: Integer Data types

Deforestation Removing intermediate data structures from programs ADVANTAGE: the program is a single transducer reads the input list only once, thanks to transducers composition 23 alphabet ILIst [i : int] { nil(0), cons(1) } trans mapC: IList  IList { nil() to nil [0] | cons(x) to cons [(i+5)%26] (mapC x) } def mapC 2 : IList  IList := compose mapC mapC

Deforestation: Speedup 24 f(f(f(…f(x)...) (f;f;f;…;f)(x)

Analysis of Functional Programs 25

AR Interference Analysis Recognizers output data that can be seen as a tree structure Spine Hip Neck HeadKnee Ankle Foot …. 26

Apps as Tree Transformations Applications that use recognizers can be modeled as FAST programs 27 trans addHat: STree -> STree Spine(x,y) to Spine(addHat(x), y) | Neck(h,l,r) to Neck(addHat(h), l, r) | Head(a) to Head(Hat(a))

Composition of Programs Two FAST programs can be composed into a single FAST program p1p1p1p1 p2p2p2p2 p 1 ;p 2 28

Interference analysis Apps can be malicious: try to overwrite outputs of other apps Apps interfere when they annotate the same node of a recognizer’s output We can compose them and check if they interfere statically!! – Put checker in the AppStore and analyze Apps before approval Interfering apps Add cat earsAdd hat Add pin to a cityBlur a city Amazon Buy Now button Malicious Buy Now button 29

Interference Analysis in Practice 100 generated FAST programs, up to 85 functions each Check statically if they conflict pairwise for ANY possible input Checked 99% of program pair in less than 0.5 sec! For an App store these are perfectly fine

No Cheap Talk 31

Conclusion FAST: a versatile language for tree manipulating programs with decidable analysis Symbolic tree transducers with RLA FAST is online: 32

DecidabilityComplexityStructuralExpressiveness Infinite alphabets Top Down Tree Transducers [Engelfriet75]VVXX Top Down Tree Transducers with Regular Look-ahead [Engelfriet76]VV~X Streaming Tree Transducers [AlurDantoni12]VXVX Data Automata [Bojanczyk98]~XXV Symbolic Tree Transducers [VeanesBjoerner11]VVXV 33