Kathleen Fisher AT&T Labs Research Yitzhak Mandelbaum, David Walker Princeton The Next 700 Data Description Languages.

Slides:



Advertisements
Similar presentations
1 Programming Languages (CS 550) Mini Language Interpreter Jeremy R. Johnson.
Advertisements

XDuce Tabuchi Naoshi, M1, Yonelab.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
1 How to transform an analyzer into a verifier. 2 OUTLINE OF THE LECTURE a verification technique which combines abstract interpretation and Park’s fixpoint.
1 Today’s lecture  Last lecture we started talking about control flow in MIPS (branches)  Finish up control-flow (branches) in MIPS —if/then —loops —case/switch.
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
Programming Languages Marjan Sirjani 2 2. Language Design Issues Design to Run efficiently : early languages Easy to write correctly : new languages.
INF 212 ANALYSIS OF PROG. LANGS Type Systems Instructors: Crista Lopes Copyright © Instructors.
CS 355 – Programming Languages
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
Foundations of Programming Languages: Introduction to Lambda Calculus
Chapter 2: Communications
Typed Assembly Languages COS 441, Fall 2004 Frances Spalding Based on slides from Dave Walker and Greg Morrisett.
Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.
Cse321, Programming Languages and Compilers 1 6/19/2015 Lecture #18, March 14, 2007 Syntax directed translations, Meanings of programs, Rules for writing.
Program Design and Development
Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.
Chapter 2 A Simple Compiler
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Programming Language Semantics Denotational Semantics Chapter 5 Part III Based on a lecture by Martin Abadi.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
Principles of Programming Chapter 1: Introduction  In this chapter you will learn about:  Overview of Computer Component  Overview of Programming 
Elements of Computing Systems, Nisan & Schocken, MIT Press, 2005, Chapter 10: Compiler I: Syntax Analysis slide 1www.idc.ac.il/tecs.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Abstract Types Defined as Classes of Variables Jeffrey Smith, Vincent Fumo, Richard Bruno.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Simple Program Design Third Edition A Step-by-Step Approach
INTRODUCTION TO COMPUTING CHAPTER NO. 06. Compilers and Language Translation Introduction The Compilation Process Phase 1 – Lexical Analysis Phase 2 –
1 Week 4 Questions / Concerns Comments about Lab1 What’s due: Lab1 check off this week (see schedule) Homework #3 due Wednesday (Define grammar for your.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Mathematical Modeling and Formal Specification Languages CIS 376 Bruce R. Maxim UM-Dearborn.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
(Business) Process Centric Exchanges
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Chapter 7 File I/O 1. File, Record & Field 2 The file is just a chunk of disk space set aside for data and given a name. The computer has no idea what.
ISBN Chapter 3 Describing Semantics -Attribute Grammars -Dynamic Semantics.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 8: Semantic Analysis and Symbol Tables.
Unit-1 Introduction Prepared by: Prof. Harish I Rathod
PADS: Processing Arbitrary Data Streams Kathleen Fisher Robert Gruber.
Introduction to Parsing
Introducing Python CS 4320, SPRING Lexical Structure Two aspects of Python syntax may be challenging to Java programmers Indenting ◦Indenting is.
Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.
Chapter 3 Part II Describing Syntax and Semantics.
Programming Languages and Design Lecture 3 Semantic Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
1 Compiler & its Phases Krishan Kumar Asstt. Prof. (CSE) BPRCE, Gohana.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
The Next 700 Data Description Languages Yitzhak Mandelbaum Princeton University Computer Science Collaborators: Kathleen Fisher and David Walker.
 Fall Chart 2  Translators and Compilers  Textbook o Programming Language Processors in Java, Authors: David A. Watts & Deryck F. Brown, 2000,
POPL 2006 The Next 700 Data Description Languages Yitzhak Mandelbaum, David Walker Princeton University Kathleen Fisher AT&T Labs Research.
CSC 4181 Compiler Construction
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Chapter 1: Preliminaries Lecture # 2. Chapter 1: Preliminaries Reasons for Studying Concepts of Programming Languages Programming Domains Language Evaluation.
© 2010 IBM Corporation RESTFul Service Modelling in Rational Software Architect April, 2011.
LLVM IR, File - Praakrit Pradhan. Overview The LLVM bitcode has essentially two things A bitstream container format Encoding of LLVM IR.
CSE 420 Lecture Program is lexically well-formed: ▫Identifiers have valid names. ▫Strings are properly terminated. ▫No stray characters. Program.
COMP 412, FALL Type Systems C OMP 412 Rice University Houston, Texas Fall 2000 Copyright 2000, Robert Cartwright, all rights reserved. Students.
Advanced Computer Systems
A Simple Syntax-Directed Translator
Information Science and Engineering
Introduction to Parsing
Introduction to Parsing (adapted from CS 164 at Berkeley)
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Abstract Types Defined as Classes of Variables
Presentation transcript:

Kathleen Fisher AT&T Labs Research Yitzhak Mandelbaum, David Walker Princeton The Next 700 Data Description Languages

Summer School 2007 Review: Technical Challenges of Ad Hoc Data Data arrives “as is.” Documentation is often out-of-date or nonexistent. –Hijacked fields. –Undocumented “missing value” representations. Data is buggy. –Missing data, human error, malfunctioning machines, race conditions on log entries, “extra” data, … –Processing must detect relevant errors and respond in application- specific ways. –Errors are sometimes the most interesting portion of the data. Data sources often have high volume. –Data may not fit into main memory.

Summer School 2007 Many Data Description Languages PacketTypes (SIGCOMM ‘00) –Packet processing DataScript (GPCE ‘02) –Java jar files, ELF object files Erlang Binaries (ESOP ‘04) –Packet processing PADS (PLDI ‘05) –General ad hoc data

Summer School 2007 The Next 700 Programming Languages The languages people use to communicate with computers differ in their intended aptitudes, towards either a particular application area, or a particular phase of computer use (high level programming, program assembly, job scheduling, etc.). They also differ in physical appearance, and more important, in logical structure. The question arises, do the idiosyncrasies reflect basic logical properties of the situation that are being catered for? Or are they accidents of history and personal background that may be obscuring fruitful developments? This question is clearly important if we are trying to predict or influence language evolution. Continued…

Summer School 2007 The Next 700 Programming Languages, cont. To answer it we must think in terms, not of languages, but families of languages. That is to say we must systematize their design so that a new language is a point chosen from a well- mapped space, rather than a laboriously devised construction. — J. P. Landin The Next 700 Programming Languages, 1965.

Summer School 2007 The Next 700 Data Description Languages What is the family of data description languages? How do existing languages related to each other? What differences are crucial, which “accidents of history”? What do the existing languages mean, precisely? To answer these questions, we introduce a semantic framework for understanding data description languages.

Summer School 2007 PacketTypes PADS DataScript DDC Contributions A core data description calculus (DDC) –Based on dependent type theory –Simple, orthogonal, composable types –Types transduce external data source to internal representation. Encodings of high-level DDLs in low-level DDC

Summer School 2007 Outline Introduction A Data Description “Calculus” (DDC) But what does DDC mean? –Well-kinding judgment –Representation, parse descriptor, and parser generation But what do data description languages (DDLs) mean? –Idealized PADS (IPADS) –Features from other DDLs. Applications of the semantics

Summer School 2007 A Data Description Calculus ?

Summer School 2007 Candidate DDC Primitives Base types parameterized by expressions (Pstring(:`|`:)) –Type constructor constants Pair of fields with cascading scope (Pstruct) –Dependent products Additional constraints (Ptypedef, Pwhere, field constraints). –Set types Alternatives (Punion, Popt) –Sums Open-ended sequences (Parray) –Some kind of list? User-defined parameterized types –Abstraction and application “Active types”: compute, absorb, and scanning –Built-in functions

Summer School 2007 Base Types and Sequences C(e): base type parameterized by expression e.  x: .  ’: dependent product describes sequence of values. –Variable x gives name to first value in sequence. –Note syntactic sugar:  *  ’ if x not in  ’. Examples: “123hello|”int * string(‘|’) * char(123, “hello”, ‘|’) “3513”  width:int_fw(1). int_fw(width) (3,513) “:hello:”  term:char.string(term) * char (‘:’,“hello”,‘:’)

Summer School 2007 Constraints {x:  | e}: set types add constraints to the type  and express relationships between elements of the data. Examples: ‘a’{c : char | c = ‘a’} (abbrev: S c (‘a’))inl ‘a’ “101”, “82” {x : int | x > 100} inl 101, inr 82 “43|105|67”  min:int.S c (‘|’) *  max:{m:int | min ≤ m}.S c (‘|’) * {mid:int | min ≤ mid & mid ≤ max} (43, inl ‘|’, inl 105, inl ‘|’, inl 67)

Summer School 2007 Unions and the Empty String  +  ’ : deterministic, exclusive or – try  ; on failure, try  ’. unit: matches the empty string. Examples: “54”, “n/a”int + S s (“n/a”)inl 54, inr “n/a” “2341”, “”int + unitinl 2341, inr ()

Summer School 2007 Array Features What features do we need to handle data sequences? –Elements –Separator between elements –Termination condition (“Are we done yet?”) –Terminator after sequence Examples: “ ” “Harry|Ron|Hermione|Ginny;”

Summer School 2007 Bottom and Arrays  seq(  s ; e,  t ) specifies: –Element type  –Separator types  s. –Termination condition e. –Terminator type  t. bottom: reads nothing, flagging an error. Example: IP address. “ ”int seq(S c (‘.’); len 4, bottom)[192,168,1,1]

Summer School 2007 Abstraction and Application Can parameterize types over values: x.  Correspondingly, can apply types to values:  e Example: IP address with terminator none term.int seq(S c (‘.’); len 4, S c (term)) none “ |”IP_addr ‘|’ * S c (‘|’)([1,2,3,4], inl ‘|’)

Summer School 2007 Absorb, Compute and Scan Absorb, Compute and Scan are active types. –absorb(  ) : consume data from source; produce nothing. –compute(e:  ) : consume nothing; output result of computation e. –scan(  ) : scan data source for type . Examples: “|”absorb(S c (‘|’))() “10|12”  width:int.S c (‘|’) *  length:int. area:compute(width  length:int) (10,12,120) “^%$!&_|”scan(S c (‘|’))(6,inl ‘|’)

Summer School 2007 DDC Example: Idealized Web Server Log S = st.{s : string | s = st} authid_t = S(“-”) + string(‘’) response_t = x.{y : int16_FW(x) | 100 <= y and y < 600 } entry_t =  client : ip. S(“ ”) *  remoteid : authid_t. S(“ ”) *  response : response_t 3. compute(getdomain client = “edu” : bool) entry_t seq(S(“\n”). x.false, bottom) kfisher 208

Summer School 2007 A data description calculus C(e)Atomic type parameterized by expression e  x: .  ’ Field sequence with cascading scope {x:  | e} Adding constraints to existing descriptions  +  ’ Alternatives  seq(  s ; e,  t ) Open ended sequences x.  Parameterizing types by expressions.  e Supplying values to parameterized types. unit/bottomEmpty strings: ok/error absorb, compute, scan “Active types”

Summer School 2007 Semantics Overview Well formed DDC type:  |-  :  Representation for type  : [  ] rep Parse descriptor for type  : [  ] PD Parsing function for type  : [  ] –[  ] : bits * offset  offset * [  ] rep * [  ] PD

Summer School 2007 Type Kinding Kinding ensures types are well formed.  |-  :    |- e :   |-  e:   |-  : type  |-  ’ : type  |-  +  ’: type  |-  : type , x:[  rep  pd |- e : bool  |- {x:  | e}: type

Summer School 2007 Selected Representation Types DDCHost Language [C(e)] rep I ( C) + noval [  x: .  ’] rep [  ] rep * [  ’] rep [{x:  | e}] rep [  ] rep + ([  ] rep error) [  +  ’] rep [  ] rep + [  ’] rep + noval [  seq(  s ; e,  t ) ] rep int * ([  ] rep seq) [ x.  ] rep, [  e] rep [  ] rep [unit] rep unit unrecoverable error semantic error Note that we erase all dependencies.

Summer School 2007 Selected Parse Descriptor Types DDCHost Language [C(e)] PD pd_hdr [  x: .  ’] PD pd_hdr * [  ] PD * [  ’] PD [{x:  | e}] PD pd_hdr * [  ] PD [  +  ’] PD pd_hdr * ([  ] PD + [  ’] PD ) [  seq(  s ; e,  t ) ] PD pd_hdr * int * int * ( [  ] PD seq ) [ x.  ] PD, [  e] PD [  ] PD [unit] PD pd_hdr pd_hdr = int * errcode * span

Summer School 2007 Parsing Semantics of Types Semantics expressed as parsing functions written in the polymorphic -calculus. –[  ] : bits * offset  offset * [  ] rep * [  ] PD Product case: [  x: .  ’] = (B,  ). let (  1, r 1, p 1 ) = [  ] (B,  ) in let x = (r 1, p 1 ) in let (  2, r 2, p 2 ) = [  ’] (B,  2 ) in (  2, R  (r 1, r 2 ), P  (p 1, p 2 ))

Summer School 2007 Properties of the Calculus Theorem: If  |-  :  then –  |- [  ] : bits * offset  offset * [  ] rep * [  ] pd “Well-formed type  yields a parser that returns values with types corresponding to .” Theorem: Parsers report errors accurately. –Errors in parse descriptor correspond to errors in representation. –Parsers check all semantic constraints.

Summer School 2007 Making Use of the Calculus IPADS DDC t   IPADS t ::= C(e) | Pfun(x:s) = t | t e | Pstruct{fields} | Punion{fields} | Pswitch e of {alts t def ;} | Popt t | t Pwhere x.e | Palt{fields} | t Parray [t, t] | Pcompute e | Plit c fields ::= | fields x : t; alts ::= | alts e => t;

Summer School 2007 IPADS Example authid_t = Punion { unauth : “-”; id : Pstring(“ ”)}; response_t = Pfun(x:int) Puint16_FW(x) Pwhere y.100 <= y and y < 600; entry_t = Pstruct { client : Pip; “ ”; remoteid : authid_t; “ ”; response : response_t 3; academic : Pcompute(getdomain client = “edu” : bool); }; entry_t Parray(Peor, Peof) kfisher 208

Summer School 2007 Example: Popt and Plit Popt t   + unit t   Plit c  scan(absorb({x:char | x = c})) c : char unit  1 +  2 C(e) {x:  | e} absorb(  ) scan(  )

Summer School 2007 Example: Pswitch Pswitch e of {e 1 => t 1 ; e 2 => t 2 ; … t def }  ( c.{x:  1 | c = e 1 } + {x:  2 | c = e 2 } + …+  def ) e t i   i (i = 1…n)  +  ’ x.  {x:  |e} t def   def

Summer School 2007 Example: Pswitch Pswitch e of {e 1 => t 1 ; e 2 => t 2 ; … t def }  ( c.{x:  1 | c = e 1 } + {x:  2 | c = e 2 } + …+  def ) e t i   i (i = 1…n)  +  ’ x.  {x:  |e} t def   def But this encoding isn’t exactly right, as it parses the data as each branch until it reaches the matching tag.

Summer School 2007 Encoding Conditionals if e then t 1 else t 2  ({x:unit | !e} +  1 ) * ({x:unit | e} +  2 ) t 1   1 if e then t 1 else t 2 t 2   2

Summer School 2007 Pswitch e { e 1 => x 1 : t 1 … e n => x n : t n t def } Pswitch Revisted Encode Pswitch as a sequence of conditionals ( Pfun (x : int) = if x = e 1 then t 1 else … if x = e n then t n else t def ) e =

Summer School 2007 Other Features PacketTypes: arrays, where clauses, structures, overlays, and alternation. DataScript: set types (enumerations and bitmask sets), arrays, constraints, value-parameterized types, and (monotonically increasing labels).

Summer School 2007 Other Uses of the Semantics Bug hunting! –Non-termination of array parsing if no progress made. –Inconsistent parse descriptor construction. Principled extensions –Adding recursion (done) –Adding polymorphism (done in PADS/ML) Distinguishing the essential from the accidental –Highlights places where PADS/C sacrifices safety. –Pomit and Pcompute : much more useful than originally thought –Punion : what if correct branch has an error?

Summer School 2007 Summary Data description languages are well-suited to describing ad hoc data. No one DDL will ever be right. Different domains and applications will demand different languages with differing levels of expressiveness and abstraction. Our work defines the first semantics for data description languages. For more information, visit