1
Data-centric computing with Netezza Architecture
DISC reading group
September 24, 2007
2
High Level Points
Supercomputer use model today:
  –Compile, submit, wait
  –Does a poor job of taking advantage of the human insight available in interactive models
Large datasets can be interactively processed using Netezza
3
What is Netezza?
Essentially: a big, fast SQL database
4
What is Netezza?
Frontend provides SQL interface
Backend is a large rack of specialized blades
5
Custom Backend Blades
Commodity CPU, NIC, disk
Custom FPGA replaces disk interface
  –Can do basic filtering in hardware, i.e., stream processing before data hits main memory
6
Division of Data
Database distributed across multiple (100+) SPUs
Each SPU controls and manages its slice of the DB
No info given on data management, replication, etc.
7
Division of Labor
SPU FPGA handles basic filtering tasks
SPU CPU handles record-level processing: filtering, parsing, projecting, logging, etc.
SPU CPU handles most operations on intermediate results: sorts, joins, aggregates
Frontend CPU handles the remaining operations
Net effect: processing happens close to the disk
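The division of labor above can be sketched in a few lines of Python. This is only an illustration of the filter, partial-aggregate, merge pattern, not Netezza's implementation: the function names, the round-robin slicing, the record layout, and the example query are all assumptions made for the sketch.

```python
from collections import Counter

def fpga_filter(stream, predicate):
    """Stand-in for the FPGA stage: drop records while streaming them."""
    return (rec for rec in stream if predicate(rec))

def spu_partial_count(slice_records, predicate, key):
    """Stand-in for one SPU: filter its own slice, then aggregate locally."""
    counts = Counter()
    for rec in fpga_filter(slice_records, predicate):
        counts[rec[key]] += 1
    return counts

def frontend_merge(partials):
    """Stand-in for the frontend: combine the per-SPU partial results."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Hypothetical table of (year, venue) records, sliced round-robin over 4 "SPUs".
records = [{"year": 2000 + i % 5, "venue": "A" if i % 3 else "B"} for i in range(100)]
NUM_SPUS = 4
slices = [records[i::NUM_SPUS] for i in range(NUM_SPUS)]

# Similar in spirit to:
#   SELECT venue, COUNT(*) FROM t WHERE year >= 2003 GROUP BY venue
partials = [spu_partial_count(s, lambda r: r["year"] >= 2003, "venue") for s in slices]
result = frontend_merge(partials)
```

Only the small per-slice counters travel to the frontend; the raw records never leave the slice that owns them.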
8
What can this be used for?
Paper gives 3 examples:
  –Citation graph processing
  –Search for a particular structure in an electrical netlist
  –Word meaning disambiguation through search of an ontology
9
Citation graph example
Look through a large, sparse graph (16 million nodes, 388 million edges)
Find both strong couplings (direct edge) and weak couplings (e.g., two papers cite the same work)
Essentially the same code for workstation and Netezza: no need to expose the parallel architecture
Workstation did not finish (DNF) on the full graph; 80-100x speedup on smaller tests
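As a rough illustration of how strong and weak couplings reduce to SQL, the sketch below uses Python's built-in sqlite3 module on a five-edge toy graph. The table name `cites`, its columns, and the data are assumptions for this example; the paper's actual schema and queries are not given in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table: paper `src` cites paper `dst`.
conn.execute("CREATE TABLE cites (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO cites VALUES (?, ?)",
    [(1, 3), (2, 3), (2, 4), (5, 4), (1, 2)],
)

# Strong coupling: a direct citation edge.
strong = conn.execute("SELECT src, dst FROM cites ORDER BY src, dst").fetchall()

# Weak coupling: two different papers cite the same work.
# A self-join on `dst` pairs them up; a.src < b.src keeps each pair once.
weak = conn.execute("""
    SELECT a.src, b.src, COUNT(*) AS shared
    FROM cites AS a
    JOIN cites AS b ON a.dst = b.dst AND a.src < b.src
    GROUP BY a.src, b.src
    ORDER BY a.src, b.src
""").fetchall()
```

The query text says nothing about how the table is partitioned, which is the sense in which the same code can run on a workstation or on a parallel backend.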
10
IC netlist example
Flattened netlist of 3.5 million transistors, 10 million wires
Search for AND structure
11
IC example results
Combinatorial explosion makes directly joining all possibilities for each element impossible
Can constrain better using fanouts of signals internal to the circuit
Individual SQL queries for finding possible matches for the individual transistors took under 10 seconds
Found all uses of the AND macro, as well as many other (1300+) identical structures generated through other means
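The fanout constraint can be illustrated with a small sqlite3 sketch. Everything here is an assumption for illustration: the `fet` table layout, the wire names, and the choice of a two-transistor series stack as a stand-in pattern (the slides do not give the actual AND template or queries). The point is only that requiring an internal node to have a known fanout prunes candidate matches before they multiply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened netlist: one row per transistor.
conn.execute("CREATE TABLE fet (id INTEGER, kind TEXT, gate TEXT, src TEXT, drn TEXT)")
conn.executemany("INSERT INTO fet VALUES (?, ?, ?, ?, ?)", [
    (1, "n", "a", "gnd", "mid"),  # bottom of a series stack
    (2, "n", "b", "mid", "out"),  # top of the same stack
    (3, "n", "c", "gnd", "bus"),  # 'bus' is shared by several devices
    (4, "n", "d", "bus", "y"),
    (5, "n", "e", "bus", "z"),
])

# Fanout of a wire = number of transistor terminals attached to it.
conn.execute("""
    CREATE TABLE fanout AS
    SELECT wire, COUNT(*) AS n FROM (
        SELECT gate AS wire FROM fet
        UNION ALL SELECT src FROM fet
        UNION ALL SELECT drn FROM fet
    ) AS t GROUP BY wire
""")

# Unconstrained: any transistor whose source is another transistor's drain.
unconstrained = conn.execute("""
    SELECT lo.id, hi.id FROM fet AS lo
    JOIN fet AS hi ON hi.src = lo.drn
    ORDER BY lo.id, hi.id
""").fetchall()

# Constrained: the shared node must be internal, i.e. touch exactly 2 terminals.
constrained = conn.execute("""
    SELECT lo.id, hi.id FROM fet AS lo
    JOIN fet AS hi ON hi.src = lo.drn
    JOIN fanout AS f ON f.wire = lo.drn
    WHERE f.n = 2
    ORDER BY lo.id, hi.id
""").fetchall()
```

On this toy netlist the unconstrained join returns three candidate pairs, while the fanout-constrained join keeps only the genuine stack.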
12
Ontology example
Expand out all possible interpretations of a phrase
Ontology specifies lexical elements, IS-A relations, concepts, and constraints on concepts
Goal is to search the space and expand concepts to find all matches to a given phrase
13
Ontology results
Partially unfolded ontology
  –Greatly expands database size, but reduces iterations / recursions
Recoded ontology triples as integers
5.58 sec. vs. 262 sec.
Can pipeline multiple queries
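The two transformations named above, integer recoding and partial unfolding, can be sketched as follows. This is a minimal illustration under assumed names and toy data (`encode`, `unfold_is_a`, and the three-triple ontology are hypothetical); the slides do not describe how the paper actually performed either step.

```python
def encode(triples):
    """Replace every string in (subject, predicate, object) with an integer id."""
    ids = {}

    def to_id(term):
        # First occurrence of a term gets the next unused integer.
        return ids.setdefault(term, len(ids))

    return [(to_id(s), to_id(p), to_id(o)) for s, p, o in triples], ids

def unfold_is_a(encoded, is_a, rounds):
    """Precompute implied IS-A edges by composing the edge set with itself."""
    edges = {(s, o) for s, p, o in encoded if p == is_a}
    for _ in range(rounds):
        edges |= {(a, d) for a, b in edges for c, d in edges if b == c}
    return edges

# Hypothetical toy ontology.
triples = [
    ("poodle", "is_a", "dog"),
    ("dog", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
]
encoded, ids = encode(triples)

# One round: stores more edges, but a later query needs fewer recursive steps.
unfolded = unfold_is_a(encoded, ids["is_a"], rounds=1)
```

After one round the edge set grows from 3 to 5 entries: poodle-to-mammal and dog-to-animal are now single lookups, while poodle-to-animal still needs one more hop. That is the size-versus-iterations trade-off the slide describes.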
14
Issues
Works if you can reduce your problem to SQL queries
All of the problems were based on graph expansion / exploration; how about other domains?
Issues of database partitioning?
How does arbitrary slicing across 108 blades affect performance / scalability, esp. for non-sparse problems?
Strawman comparison to a workstation-class machine: how does a traditional DB server / storage cluster compare?