Data-centric computing with Netezza Architecture
DISC reading group, September 24, 2007

High Level Points
Supercomputer use model today:
– Compile, submit, wait
– Does a poor job of taking advantage of the human insight available in interactive models
Large datasets can be processed interactively using Netezza

What is Netezza?
Essentially: a big, fast SQL database

What is Netezza?
Frontend provides the SQL interface
Backend is a large rack of specialized blades

Custom Backend Blades
Commodity CPU, NIC, disk
Custom FPGA replaces the disk interface
– Can do basic filtering in hardware, i.e., stream processing before data hits main memory

Division of Data
Database is distributed across multiple (100+) SPUs (Snippet Processing Units, the backend blades)
Each SPU controls and manages its slice of the DB
The paper gives no information on data management, replication, etc.

Division of Labor
SPU FPGA handles basic filtering tasks
SPU CPU handles record-level processing: filtering, parsing, projecting, logging, etc.
SPU CPU handles most operations on intermediate results: sorts, joins, aggregates
Frontend CPU handles the remaining operations
Net effect: processing happens close to the disk
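
The three-stage split above can be sketched in a few lines of Python. This is only an illustration of the idea, not Netezza's implementation: the function names, the modulo-based row distribution, and the 4-slice configuration are all assumptions made for the example (the slides say nothing about how rows are actually assigned to SPUs).

```python
from collections import Counter

NUM_SPUS = 4  # illustrative only; the slides mention 100+ SPUs


def distribute(rows, num_spus=NUM_SPUS):
    """Assign each row to one slice by its id (assumed policy, not from the paper)."""
    slices = [[] for _ in range(num_spus)]
    for row in rows:
        slices[row["id"] % num_spus].append(row)
    return slices


def fpga_filter(slice_rows, predicate):
    """Stage 1 (stands in for the FPGA): drop non-matching records as they stream by."""
    return (row for row in slice_rows if predicate(row))


def spu_partial_count(filtered_rows, group_key):
    """Stage 2 (stands in for the SPU CPU): project one column, aggregate locally."""
    return Counter(row[group_key] for row in filtered_rows)


def frontend_merge(partials):
    """Stage 3 (stands in for the frontend): combine the per-slice partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def run_query(rows, predicate, group_key):
    """Roughly: SELECT group_key, COUNT(*) ... WHERE predicate GROUP BY group_key."""
    partials = [
        spu_partial_count(fpga_filter(s, predicate), group_key)
        for s in distribute(rows)
    ]
    return frontend_merge(partials)


rows = [{"id": i, "year": 2000 + i % 3, "cites": i % 5} for i in range(20)]
result = run_query(rows, lambda r: r["cites"] >= 3, "year")
```

Only the small per-slice counters reach the merge step, which is the point of filtering and aggregating near the disk.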

What can this be used for?
Paper gives 3 examples:
– Citation graph processing
– Search for a particular structure in an electrical netlist
– Word meaning disambiguation through search of an ontology

Citation graph example
Look through a large, sparse graph (16 million nodes, 388 million edges)
Find both strong couplings (direct edge) and weak couplings (e.g., two papers cite the same work)
Essentially the same code for workstation and Netezza; no need to expose the parallel architecture
Workstation did not finish (DNF); 80-100x speedup over the workstation on smaller tests
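
As a rough illustration of what such queries might look like, the sketch below uses Python's built-in sqlite3 module on a toy edge table. The table name `cites`, its columns, and the sample edges are hypothetical; the paper's actual schema and SQL are not given in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per citation edge, "src cites dst".
conn.execute("CREATE TABLE cites (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO cites VALUES (?, ?)",
    [(1, 3), (2, 3), (1, 2), (4, 5), (2, 5)],
)

# Strong coupling: a direct citation edge between two papers.
strong = conn.execute("SELECT src, dst FROM cites ORDER BY src, dst").fetchall()

# Weak coupling: two distinct papers citing the same work (self-join on dst).
# The a.src < b.src condition reports each unordered pair once.
weak = conn.execute(
    """
    SELECT a.src, b.src, COUNT(*) AS shared
    FROM cites AS a
    JOIN cites AS b ON a.dst = b.dst AND a.src < b.src
    GROUP BY a.src, b.src
    ORDER BY a.src, b.src
    """
).fetchall()
```

The same SQL text would run unchanged on a single machine or a parallel backend, which is the sense in which the parallel architecture need not be exposed.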

IC netlist example
Flattened netlist of 3.5 million transistors, 10 million wires
Search for an AND structure

IC example results
Combinatorial explosion makes directly joining all possibilities for each element impossible
Can constrain the search better using the fanouts of signals internal to the circuit
Individual SQL queries for finding possible matches for the individual transistors took under 10 seconds
Found all uses of the AND macro, as well as many other (1300+) identical structures generated through other means
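
The fanout constraint can be illustrated on a toy netlist. The sketch below is a simplified stand-in, not the paper's queries: the `fet` table, its columns, the net names, and the rule "an internal net is driven by some drain and feeds exactly two gates" are all assumptions chosen for the example. It looks only for the inverter stage of a CMOS AND (a NAND followed by an inverter) and shows how restricting the inverter input to low-fanout internal nets prunes candidates before the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per transistor; kind is 'p' or 'n'.
conn.execute("CREATE TABLE fet (id INTEGER, kind TEXT, gate TEXT, src TEXT, drain TEXT)")
conn.executemany(
    "INSERT INTO fet VALUES (?, ?, ?, ?, ?)",
    [
        (1, "p", "a", "vdd", "n1"),   # 2-input NAND driving net n1
        (2, "p", "b", "vdd", "n1"),
        (3, "n", "a", "x", "n1"),
        (4, "n", "b", "gnd", "x"),
        (5, "p", "n1", "vdd", "y"),   # inverter n1 -> y (completes the AND)
        (6, "n", "n1", "gnd", "y"),
        (7, "p", "a", "vdd", "z"),    # unrelated inverter on primary input a
        (8, "n", "a", "gnd", "z"),
    ],
)

# An inverter: a p/n pair sharing gate and drain, tied to vdd and gnd.
INVERTER = """
    SELECT p.id, n.id, p.gate, p.drain
    FROM fet AS p
    JOIN fet AS n
      ON p.kind = 'p' AND n.kind = 'n'
     AND p.gate = n.gate AND p.drain = n.drain
     AND p.src = 'vdd' AND n.src = 'gnd'
"""

unconstrained = conn.execute(INVERTER + " ORDER BY p.id").fetchall()

# Constrained: the inverter input must be an internal net with fanout 2,
# i.e. it is driven by some transistor drain and feeds exactly two gates.
constrained = conn.execute(
    """
    WITH fanout AS (SELECT gate AS net, COUNT(*) AS cnt FROM fet GROUP BY gate),
         internal AS (SELECT net FROM fanout
                      WHERE cnt = 2 AND net IN (SELECT drain FROM fet))
    """
    + INVERTER
    + " WHERE p.gate IN (SELECT net FROM internal) ORDER BY p.id"
).fetchall()
```

On this toy data the unconstrained query returns both inverters, while the constrained one keeps only the pair fed by the NAND output `n1`.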

Ontology example
Expand out all possible interpretations of a phrase
Ontology specifies lexical elements, IS-A relations, concepts, and constraints on concepts
Goal is to search the space, expanding concepts to find all matches to a given phrase

Ontology results
Partially unfolded the ontology
– Greatly expands database size, but reduces iterations / recursions
Recoded ontology triples as integers
5.58 sec. vs. 262 sec.
Can pipeline multiple queries
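
Both optimizations can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names `encode_triples` and `unfold_is_a`, the sample triples, and the depth-limited unfolding rule are hypothetical and are not taken from the paper.

```python
def encode_triples(triples):
    """Replace every string in (subject, relation, object) triples with an integer id."""
    ids = {}

    def code(term):
        # First sighting of a term gets the next unused id; later sightings reuse it.
        return ids.setdefault(term, len(ids))

    encoded = [(code(s), code(r), code(o)) for s, r, o in triples]
    return encoded, ids


def unfold_is_a(pairs, depth):
    """Materialize IS-A pairs implied by chains of up to `depth` extra hops."""
    base = set(pairs)
    known = set(base)
    frontier = set(base)
    for _ in range(depth):
        # Extend every frontier pair (a, b) by one base edge (b, d).
        frontier = {(a, d) for a, b in frontier for c, d in base if b == c} - known
        if not frontier:
            break
        known |= frontier
    return known


triples = [
    ("beagle", "is_a", "dog"),
    ("dog", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
]
encoded, ids = encode_triples(triples)
is_a_pairs = [(s, o) for s, r, o in encoded if r == ids["is_a"]]
one_hop = unfold_is_a(is_a_pairs, depth=1)
two_hops = unfold_is_a(is_a_pairs, depth=2)
```

Storing the unfolded pairs makes the table larger, but a lookup such as "is a beagle an animal?" becomes a single match instead of several recursive steps, which is the trade-off the slide describes.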

Issues
Works if you can reduce your problem to SQL queries
All of the example problems were based on graph expansion / exploration; how about other domains?
Issues of database partitioning?
How does arbitrary slicing across 108 blades affect performance / scalability, esp. for non-sparse problems?
Strawman comparison to a workstation-class machine: how does a traditional DB server / storage cluster compare?
