Introduction Current Work Design & Implementation Conclusions
PQLite: Provenance Query Language

PQLite: An Overly Simplistic Query Language for Data Provenance
CMPS203 Final Project
University of California, Santa Cruz
Jack Baskin School of Engineering
Michael {Leece, Sevilla}

Overview
– Introduction
– Current Work
– Design and Implementation
– Conclusions

Introduction: Terminology
Provenance: history + ancestry of an object [1]
– Processes
– Data
Provenance Aware Storage (PASS)
– Transparent collection
PQL: Path Query Language
– Useful for provenance
Ancestry Graph

Introduction: Applications
– Security
– File System Search
– The Cloud
– New Hierarchical File Systems
– Yan Li’s Photo Album

Current Work: PQL Broken
Obtained PASSv2
Ran PQL query on provenance database
– Infinite loops
– {}
“The problem with PQL and Sage is that the implementation… is really slow, and it’s perhaps too easy to generate PQL queries that do not return any data.” – PASS Team

Current Work: PQL Undocumented

Current Work: Overview
[Figure: PASSv2 architecture diagram. Labels: User Space, App1, App2, Waldo, Kernel Space, VFS, Lasagna FS, PASSv2 Modules, .twig, BDB, Database Dump]

Design & Implementation: Use Case
What we have
    [ P ] 1.0 INODE 4 INODE 12
    [ P ] 1.0 NAME 9 "/file.txt"
    [ P ] 1.0 TYPE 4 "FILE"
    [ P ] 1.0 FREEZETIME 8 TIME
    [ P ] 1.0 FREEZETIME 8 TIME
    [ P ] 1.0 FREEZETIME 8 TIME
    [AP ] 1.1 INPUT 12 --> 2.1
    [AP ] 1.2 INPUT 12 --> 8.1
    [AP ] 1.3 INPUT 12 --> 16.2
    [ PT] 2.0 ARGV 4 [1]"cat"
    [ PT] 2.0 ENV 64
        [2]"SHELL=/bin/bash"
        [3]"TERM=xterm"
        [4]"XDG_SESSION_COOKIE=06c3f2775eb071081dfacb984bf6c "
        [5]"USER=root"
        [6]"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:"
        [7]"MAIL=/var/mail/root"
        [8]"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        [9]"PWD=/test"
        [10]"LANG=en_US.UTF-8"
        [11]"SHLVL=1"
        [12]"HOME=/root"
        [13]"LOGNAME=root"
        [14]"LESSOPEN=| /usr/bin/lesspipe %s"
        [15]"LESSCLOSE=/usr/bin/lesspipe %s %s"
        [16]"_=/bin/cat"
        [17]"OLDPWD=/"
    [ ] 2.0 EXECTIME 8 TIME
    [ P ] 2.0 TYPE 4 "PROC"
    [ ] 2.0 PID 4 INT 13739
    [ P ] 2.0 NAME 8 "/bin/cat"
    [A ] 2.0 FORKPARENT 12 -->
    [ P ] 2.0 FREEZETIME 8 TIME
What we want
– A list of files or processes that are one-step ancestors of "/file.txt"
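
To make the gap between "what we have" and "what we want" concrete, here is a minimal Haskell sketch of reading such dump lines. It is not the project's Dump Parser: the record layout (flags in brackets, then `pnode.version`, attribute name, size, value) is inferred from the sample above, the reading of `-->` as "has this ancestor" is an assumption, and the names `NodeId`, `Record`, `parseRecord`, and `dumpEdges` are hypothetical.

```haskell
import Data.Char (isDigit)
import Data.Maybe (mapMaybe)

-- A pnode.version pair such as 1.0 (hypothetical naming).
data NodeId = NodeId Int Int deriving (Eq, Show)

-- The two record shapes that matter for an ancestry graph.
data Record
  = Attr NodeId String String   -- node, attribute name, raw value text
  | Edge NodeId String NodeId   -- node, edge kind (e.g. INPUT), target node
  deriving (Eq, Show)

-- Parse "1.0" into NodeId 1 0; reject anything else.
parseNodeId :: String -> Maybe NodeId
parseNodeId s = case break (== '.') s of
  (p, '.' : v)
    | digits p && digits v -> Just (NodeId (read p) (read v))
  _ -> Nothing
  where
    digits t = not (null t) && all isDigit t

-- Parse one dump line. The bracketed flags are skipped; the size
-- column is ignored. Lines that do not fit yield Nothing.
parseRecord :: String -> Maybe Record
parseRecord line =
  case words (drop 1 (dropWhile (/= ']') line)) of
    (nid : kind : _size : "-->" : target : _) ->
      Edge <$> parseNodeId nid <*> pure kind <*> parseNodeId target
    [_, _, _, "-->"] -> Nothing           -- edge whose target is missing
    (nid : name : _size : rest) ->
      (\n -> Attr n name (unwords rest)) <$> parseNodeId nid
    _ -> Nothing

-- Collect (node, ancestor) pairs, assuming "-->" points at an ancestor.
dumpEdges :: String -> [(NodeId, NodeId)]
dumpEdges txt = [(c, a) | Edge c _ a <- mapMaybe parseRecord (lines txt)]
```

Loaded in GHCi, `parseRecord "[AP ] 1.1 INPUT 12 --> 2.1"` gives an `Edge` from node 1.1 to node 2.1, which is the kind of one-step ancestry link the use case asks for.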

Design & Implementation: Use Case (cont.)
[Figure: query pipeline. Components: Waldo Database Dump, Dump Parser, Ancestry Graph, Label Map, Query Parser, Abstract Syntax Tree, Evaluator]
Query: SELECT r FROM Graph AS r WHERE r.child = "/file.txt"
Abstract Syntax Tree: Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]
Label Map: 1 -> file.txt, 2 -> jazz.jpg, 3 -> bacon.txt, …
Response: [(MyNode "/usr/bin/pico" 1,1,[2]), (MyNode "/usr/bin/vi" 2,3,[17,16,15]), (MyNode "/bin/cat" 1,4,[0])]
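
The abstract syntax tree on this slide suggests a small set of Haskell data types. The sketch below is a guess at types that would make that exact expression type-check, plus a toy evaluator. The constructor names come from the slide; the `Row` type, the field names, the sample graph, and `runQuery` are assumptions made for illustration, and the result is a plain list of names rather than the `MyNode` values shown in the response.

```haskell
-- Types inferred from the AST printed on the slide (an assumption).
data Query = Select String [From] [Expr] deriving (Eq, Show)
newtype From = From [Alias] deriving (Eq, Show)
data Alias = Alias String String deriving (Eq, Show)
data Op = Eq deriving (Eq, Show)   -- constructor Eq; the derived Eq is the class
data Expr
  = Duo Op Expr Expr               -- binary operator applied to two operands
  | PathType String [String]       -- variable plus field path, e.g. r.child
  | Str String                     -- string literal
  deriving (Eq, Show)

-- One ancestry edge: `ancestor` is a one-step ancestor of `child`.
-- (Hypothetical row shape; the real graph representation is not shown.)
data Row = Row { ancestor :: String, child :: String } deriving (Eq, Show)

-- Resolve a field path against a row.
field :: Row -> [String] -> Maybe String
field r ["child"]    = Just (child r)
field r ["ancestor"] = Just (ancestor r)
field _ _            = Nothing

-- Evaluate an operand to a string for the row bound to variable `var`.
value :: String -> Row -> Expr -> Maybe String
value _ _ (Str s) = Just s
value var r (PathType v path) | v == var = field r path
value _ _ _ = Nothing

-- Decide whether one WHERE condition holds for a row.
holds :: String -> Row -> Expr -> Bool
holds var r (Duo Eq a b) =
  case (value var r a, value var r b) of
    (Just x, Just y) -> x == y
    _                -> False
holds _ _ _ = False

-- Return the ancestors of every row that satisfies all conditions.
runQuery :: [Row] -> Query -> [String]
runQuery graph (Select var _from conds) =
  [ancestor r | r <- graph, all (holds var r) conds]

-- The AST from the slide, and a made-up graph to run it against.
sampleQuery :: Query
sampleQuery =
  Select "r" [From [Alias "Graph" "r"]]
    [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]

sampleGraph :: [Row]
sampleGraph =
  [ Row "/usr/bin/pico" "/file.txt"
  , Row "/bin/cp"       "/jazz.jpg"
  , Row "/usr/bin/vi"   "/file.txt"
  , Row "/bin/cat"      "/file.txt"
  ]
```

With these definitions, `runQuery sampleGraph sampleQuery` evaluates to `["/usr/bin/pico","/usr/bin/vi","/bin/cat"]`: the rows whose `child` equals `"/file.txt"`, in graph order.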

Design & Implementation: Language Specification (Select Statement)

Design & Implementation: Language Specification (Expression)

Conclusions: What We Did Well
Functional
– It works. (PQLite > PQL)
Easy to use
– Intuitive (SQL-like) way of querying a provenance graph
– Getting stuff we care about

Conclusions: Lessons Learned
Infinite recursion in parsing
– Left recursion in a recursive descent parser
– Refined syntax
Began coding too soon
Monads are useful
– IO(), Maybe, State, Parsec
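
The left-recursion lesson can be illustrated with a small Parsec example. This is a generic sketch, not the PQLite grammar: the slide says the authors refined their syntax, whereas the code below shows a different, standard remedy, Parsec's `chainl1` combinator, on a toy subtraction grammar. The `Expr` type and the parser names are made up for the example.

```haskell
import Text.Parsec (ParseError, chainl1, char, digit, eof, many1, parse, spaces)
import Text.Parsec.String (Parser)

-- Toy expression tree for a left-associative operator.
data Expr = Num Int | Sub Expr Expr deriving (Eq, Show)

-- A direct transcription of the left-recursive rule
--     expr ::= expr '-' num | num
-- would be
--     expr = Sub <$> expr <* char '-' <*> num
-- which calls itself before consuming any input and so never terminates.

-- A number followed by optional whitespace.
num :: Parser Expr
num = Num . read <$> many1 digit <* spaces

-- chainl1 parses "num (- num)*" and folds the results to the left,
-- giving the same trees as the left-recursive rule without the loop.
expr :: Parser Expr
expr = num `chainl1` (Sub <$ (char '-' <* spaces))

-- Parse a whole input string.
parseExpr :: String -> Either ParseError Expr
parseExpr = parse (spaces *> expr <* eof) "<input>"
```

For the input `"1 - 2 - 3"`, `parseExpr` returns `Right (Sub (Sub (Num 1) (Num 2)) (Num 3))`, so left associativity is preserved even though the parser itself is not left-recursive.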

Conclusions: References
1) Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie. Provenance-Aware Storage Systems. Harvard University Computer Science Technical Report TR-18-05, July
2) Stephanie Jones, Christina Strong, Darrell D. E. Long, and Ethan L. Miller. Tracking Emigrant Data via Transient Provenance. Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP '11), June
3) Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor. Layering in Provenance Systems. Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June
4) PQL Language Guide and Reference