Published by Aiden Germain. Modified over 9 years ago.
1 Jaql → pipes
Unix pipes for the JSON data model
Kevin Beyer, Vuk Ercegovac, Eugene Shekita, Jun Rao, Ning Li, Sandeep Tata
IBM Almaden Research Center
http://code.google.com/p/jaql/
http://jaql.org/
Open Source!
2 Goals for Jaql
Provide a simple, yet powerful language to manipulate semi-structured data.
Use JSON as a data model
  Data is usually converted to/from JSON view
  Most data has a natural JSON representation
Easily extended using Java, Python, JavaScript, …
Exploit massive parallelism using Hadoop
3 What is in the upcoming release?
User feedback on previous release
  Too XQuery-like (yuck factor)
  Too complex
  Too composable, too nested, too verbose
  Unclear what is parallelized
Next release (planned 10/30/2008)
  Vastly simplified syntax
  Inspired by Unix Pipes
4 A query is a pipeline
Diagram labels: source, operator, sink
  $people = file …;                  // declare files
  $greetings = file …;
  $people                            // read input (json array)
  -> filter $.type = 'friendly'      // find friendly people
  -> map { hello: $.name }           // keep just name
  -> write $greetings;               // write output
Operations listed in natural order vs last operation first
one map job
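The pipeline on this slide can be imitated in plain Python to show how items stream from a source through operators to a sink. This is only a sketch of the idea, not Jaql's implementation: `pipe`, `jaql_filter`, `jaql_map`, and the sample records are names and data assumed for illustration.

```python
from typing import Callable, Iterable, Iterator

Item = dict
Operator = Callable[[Iterator[Item]], Iterator[Item]]


def jaql_filter(pred: Callable[[Item], bool]) -> Operator:
    """Build an operator that keeps only the items matching pred."""
    def op(items: Iterator[Item]) -> Iterator[Item]:
        return (item for item in items if pred(item))
    return op


def jaql_map(fn: Callable[[Item], Item]) -> Operator:
    """Build an operator that transforms every item with fn."""
    def op(items: Iterator[Item]) -> Iterator[Item]:
        return (fn(item) for item in items)
    return op


def pipe(source: Iterable[Item], *operators: Operator) -> Iterator[Item]:
    """Chain operators left to right, in the order they are written."""
    stream = iter(source)
    for op in operators:
        stream = op(stream)
    return stream


# Sample input standing in for the $people file (assumed data).
people = [
    {"name": "Ann", "type": "friendly"},
    {"name": "Bob", "type": "grumpy"},
    {"name": "Cy", "type": "friendly"},
]

# The list() call plays the role of the sink ("write $greetings").
greetings = list(
    pipe(
        people,
        jaql_filter(lambda p: p["type"] == "friendly"),
        jaql_map(lambda p: {"hello": p["name"]}),
    )
)
```

Because every operator is a lazy generator, items flow through one at a time, which mirrors the "natural order" reading of the pipeline.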
5 Aggregate
  $people
  -> filter by $.birthdate < date('1990-01-01')
  -> aggregate count($);             // count the older people
Aggregate the input into a single value
Uses a push-based, streaming, combining API for aggregate functions
one map / combine / reduce job
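A push-based, combining aggregate can be sketched as an object that accepts items one at a time and can merge with other partial states. The `Count` class, its method names, and the two input splits below are assumptions made for illustration; the slide does not show Jaql's actual aggregate API.

```python
from datetime import date


class Count:
    """Combinable count: partial states merge, so a combiner can pre-aggregate."""

    def __init__(self) -> None:
        self.n = 0

    def push(self, item: dict) -> None:
        # Called once per streamed item; the item itself is never stored.
        self.n += 1

    def combine(self, other: "Count") -> None:
        # Merge a partial state produced elsewhere (for example by another mapper).
        self.n += other.n

    def result(self) -> int:
        return self.n


def aggregate_split(items: list, cutoff: date) -> Count:
    """Map side: filter one input split and push the survivors into a partial count."""
    partial = Count()
    for person in items:
        if person["birthdate"] < cutoff:
            partial.push(person)
    return partial


# Two input splits standing in for pieces of the $people file (assumed data).
splits = [
    [
        {"name": "Ann", "birthdate": date(1985, 3, 1)},
        {"name": "Bob", "birthdate": date(1995, 7, 9)},
    ],
    [
        {"name": "Cy", "birthdate": date(1979, 1, 2)},
        {"name": "Di", "birthdate": date(1989, 12, 31)},
    ],
]

cutoff = date(1990, 1, 1)
partials = [aggregate_split(split, cutoff) for split in splits]

# Reduce side: combine the partial counts into the single final value.
total = Count()
for partial in partials:
    total.combine(partial)
older = total.result()
```

Keeping `push` and `combine` separate is what lets the work be spread over a map, a combine, and a reduce step.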
6 Partition
  $people
  -> filter by $.birthdate < date('1990-01-01')
  -> partition by $t = $.type                     // partition the older people by type
     |- aggregate { type: $t, n: count($) } -|;   // aggregate per partition
Partition one or more inputs
Send each individual partition through a sub-pipe
Merge the results
one map / combine / reduce job
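The partition-then-aggregate step can be approximated in Python by grouping items on a key and producing one record per group. This sketch leaves out the birthdate filter for brevity; `partition_by`, `count_by_type`, and the sample data are illustrative names, not part of Jaql.

```python
from collections import defaultdict
from typing import Callable, Hashable


def partition_by(items: list, key: Callable[[dict], Hashable]) -> dict:
    """Group items into partitions keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return groups


def count_by_type(people: list) -> list:
    """Run the sub-pipe (an aggregate) once per partition and merge the results."""
    partitions = partition_by(people, lambda p: p["type"])
    return [{"type": t, "n": len(members)} for t, members in partitions.items()]


# Assumed sample data.
people = [
    {"name": "Ann", "type": "friendly"},
    {"name": "Bob", "type": "grumpy"},
    {"name": "Cy", "type": "friendly"},
]

# Sort only to make the merged output order deterministic for inspection.
counts = sorted(count_by_type(people), key=lambda r: r["type"])
```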
7 User-defined operators
  $people
  -> myBestMatches($, 3);   // pass "standard input" to external code
Call user code
Similar to calling user program / script in Unix
Input and output are pipelined
Like "Hadoop streaming"
Not Parallel!
8 Per partition sub-pipe
Partition one or more inputs on a key
Send each partition through (duplicate) sub-pipe
Merge the results
Diagram labels: partition, "split", merge
  $people
  -> partition by $.type          // partition people by type
     |- sort by $.rating          // sort partition by rating
     -> top 100                   // keep just the first 100 in partition
     -> myBestMatches($,3) -|;    // find best matches per partition
one map / reduce job
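A rough Python rendering of the per-partition sub-pipe is given here. It assumes that `sort by` orders ascending and that `top n` keeps the first n items; `my_best_matches` is a hypothetical stand-in for the slide's user-defined operator and simply keeps the first k items.

```python
from collections import defaultdict
from typing import Callable, Hashable


def my_best_matches(items: list, k: int) -> list:
    # Hypothetical stand-in for the user-defined operator: keep the first k items.
    return items[:k]


def sub_pipe(members: list, top_n: int = 2, k: int = 1) -> list:
    """sort by rating -> top n -> user-defined operator, for ONE partition."""
    ranked = sorted(members, key=lambda p: p["rating"])  # ascending order is assumed
    return my_best_matches(ranked[:top_n], k)


def per_partition(items: list, key: Callable[[dict], Hashable], pipe_fn: Callable) -> list:
    """Partition on key, send each partition through its own copy of pipe_fn, merge."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    merged = []
    for members in groups.values():
        merged.extend(pipe_fn(members))
    return merged


# Assumed sample data; the small top_n and k defaults keep the example readable.
people = [
    {"name": "Ann", "type": "friendly", "rating": 3},
    {"name": "Bob", "type": "grumpy", "rating": 1},
    {"name": "Cy", "type": "friendly", "rating": 2},
    {"name": "Di", "type": "friendly", "rating": 5},
]

result = per_partition(people, lambda p: p["type"], sub_pipe)
```

Since each partition is processed independently, the user-defined operator runs once per partition rather than once over the whole input.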
9 Partition by default
Run sub-pipe on each partition of the input
If input is a file, use its partition, else arbitrary
Expresses parallelism of user-defined operator
  $file
  -> partition by default           // run per file partition
     |- buildPartialModel($) -|     // partial model built per partition
  -> unifyModels($);                // unify all the partial models into one
one map job + serial unify
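The "parallel partial models, then serial unify" pattern can be sketched with a thread pool. Here `build_partial_model` and `unify_models` are hypothetical stand-ins for the slide's user functions (the model is just a count per type), and the two lists stand in for the file's existing partitions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def build_partial_model(partition: list) -> Counter:
    # Hypothetical model: count how often each type occurs in this partition.
    return Counter(item["type"] for item in partition)


def unify_models(models: list) -> Counter:
    # Serial step: fold all partial models into one.
    total = Counter()
    for model in models:
        total.update(model)
    return total


# Assumed data: each inner list is one existing partition of the input file.
file_partitions = [
    [{"type": "friendly"}, {"type": "grumpy"}],
    [{"type": "friendly"}],
]

# Parallel step: one sub-pipe invocation per partition.
with ThreadPoolExecutor() as pool:
    partial_models = list(pool.map(build_partial_model, file_partitions))

# Serial step: a single unify over all partial models.
model = unify_models(partial_models)
```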
10 Join
  $people = file …;
  $children = file …;
  join $people on $people.id,
       $children on $children.mother;
People:
  [ { id: 1, name: 'Jack' }, { id: 2, name: 'Jill' }, … ]
Children:
  [ { id: 3, name: 'Becky', father: 1, mother: 2 }, … ]
Result:
  [ { people: { id: 2, name: 'Jill' },
      children: { id: 3, name: 'Becky', father: 1, mother: 2 } }, … ]
one map / reduce job
result is record with inputs as values
joins on multiple inputs with multiple conditions
Inner, left-, right-, full-outer joins
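The shape of the join result, a record whose fields are the matched input items, can be illustrated with a small hash join in Python. This covers only the inner-join case using the slide's sample data; `inner_join` is an illustrative helper, not Jaql's implementation.

```python
from collections import defaultdict


def inner_join(people: list, children: list) -> list:
    """Inner join on people.id == children.mother; each output keeps both inputs."""
    # Build side: index people by the join key.
    by_id = defaultdict(list)
    for person in people:
        by_id[person["id"]].append(person)
    # Probe side: look up each child's mother and emit one record per match.
    return [
        {"people": person, "children": child}
        for child in children
        for person in by_id.get(child["mother"], [])
    ]


people = [{"id": 1, "name": "Jack"}, {"id": 2, "name": "Jill"}]
children = [{"id": 3, "name": "Becky", "father": 1, "mother": 2}]

joined = inner_join(people, children)
```

An outer join would additionally emit records for unmatched items with the missing side left empty; that case is not shown here.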
11 Composite Operators
Diagram label: composite operator
One input can come from current pipe.
Remaining inputs are pipe variables or nested pipes.
Examples:
  Join
    Join two or more inputs on a key
    Inner/outer/full
    Multi-predicate, multi-way
  Merge
    Concatenate all inputs in any order
  User-defined operator (function)
    Union, Intersect, Difference…
12 Composite sinks
Tee
  Send each input item to all output pipes
  $people
  -> tee |- filter $.gender == 'F' -> write $women
         |- map { $.name } -> write $names -|;
Split
  Send each input item to one pipe
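The difference between the two composite sinks can be sketched in Python: `tee` hands every item to every output pipe, while `split` hands each item to exactly one. The slide does not say how `split` chooses its pipe, so routing to the first matching predicate is an assumption here, as are the helper names and sample data.

```python
from typing import Callable


def tee(items: list, *sinks: Callable[[dict], None]) -> None:
    """Send each input item to ALL output pipes."""
    for item in items:
        for sink in sinks:
            sink(item)


def split(items: list, *routes: tuple) -> None:
    """Send each input item to ONE pipe: the first route whose predicate matches (assumed rule)."""
    for item in items:
        for predicate, sink in routes:
            if predicate(item):
                sink(item)
                break


# Assumed sample data.
people = [
    {"name": "Ann", "gender": "F"},
    {"name": "Bob", "gender": "M"},
]

# tee: one pipe filters on gender, the other keeps just the name.
women, names = [], []
tee(
    people,
    lambda p: women.append(p) if p["gender"] == "F" else None,
    lambda p: names.append({"name": p["name"]}),
)

# split: every person lands in exactly one of the two lists.
females, others = [], []
split(
    people,
    (lambda p: p["gender"] == "F", females.append),
    (lambda p: True, others.append),
)
```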
13 Rough Unix analogs of Jaql
Unix                   Jaql
cat                    var ->
merge
join
grep                   filter
cut, paste, sed, tr    map
sort
head                   top
uniq                   distinct
sort
> filename             write
tee
Unix: stream of bytes / lines
Jaql: stream of JSON items
  more structure / types
14 Summary
Unix pipes revolutionized scripting
If you know Unix pipes, you understand Jaql
15 Questions? Comments?