Parallel Databases
Michael French, Spencer Steele, Jill Rochelle
Reference: "When Parallel Lines Meet" by Ken Rudin (BYTE, May 1998)
What Are Parallel/Scalable Databases?
Parallel/scalable databases combine:
• Hardware architecture
  – Multiple processors
  – Multiple disk drives
  – Large memory banks
• Software architecture
  – Capable of processing queries in parallel
  – Data-shipping capabilities (moving data to the processor that needs it)
What makes Parallel Databases different from previous technologies?
Previous Technology
• Hardware
  – Single processor
  – Small disk capacity
  – Less memory
• Software
  – Sequential queries
  – No partitioning of queries
Parallel Query
• A query that partitions its data across multiple processors and can also pipeline data from one processing stage to the next
Information Partitioning
• Divide the work into smaller pieces that can be processed independently
• The term can mean either:
  – Distributing the data across multiple CPUs
  – Dividing disk space so that each disk holds a specific part of the data
Information Partitioning (continued)
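To make the two meanings of partitioning concrete, here is a minimal Python sketch (not from the original slides) that hash-partitions rows across workers and lets each worker scan only its own partition, plus a range partitioner of the kind used to assign key ranges to separate disks. The function names, the row layout, and the use of threads are illustrative assumptions: threads show the structure, while a real parallel database would run each partition on its own processor.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, num_partitions):
    """Distribute rows across num_partitions lists by hashing the key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def range_partition(rows, key, boundaries):
    """Assign rows to partitions by key range (e.g. one range per disk).

    boundaries must be sorted: keys below boundaries[0] go to partition 0,
    keys from boundaries[0] up to boundaries[1] go to partition 1, and so on.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect.bisect_right(boundaries, row[key])].append(row)
    return partitions


def scan_partition(partition, predicate):
    """Work done by one processor: scan only its own partition."""
    return [row for row in partition if predicate(row)]


def parallel_scan(rows, key, predicate, num_workers=4):
    """Partition the rows, scan each partition concurrently, then merge."""
    partitions = hash_partition(rows, key, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(
            pool.map(scan_partition, partitions, [predicate] * num_workers)
        )
    # Merge step: concatenate each worker's partial result.
    return [row for part in partial_results for row in part]


orders = [{"id": i, "amount": i * 10} for i in range(1, 101)]
big_orders = parallel_scan(orders, "id", lambda r: r["amount"] > 900)
print(sorted(r["id"] for r in big_orders))  # ids 91 through 100
```

Hash partitioning spreads rows evenly when keys are varied; range partitioning keeps related keys together, which suits "this disk holds these rows" layouts but can leave one partition much busier than the others.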
Information Pipelining
• Lets separate processors work on separate stages of a query at the same time:
  – Scan
  – Join
  – Sort
• The concept is similar to an assembly line
• Allows multiple queries to be in progress at the same time
Information Pipelining (continued)
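The assembly-line idea can be sketched with one thread per stage, connected by queues: the scan stage hands each row to the join stage as soon as it reads it, so scanning and joining overlap instead of running one after the other. This is an illustrative Python sketch; the table layouts, the `cust_id` join column, and the function names are assumptions. Note that sort is a blocking stage: it must see all of its input before it can emit anything, so it buffers rows until the end of the stream.

```python
import queue
import threading

END = object()  # sentinel that marks the end of a stream of rows


def scan_stage(rows, out_q):
    """Stage 1: read rows and pass each one downstream immediately."""
    for row in rows:
        out_q.put(row)
    out_q.put(END)


def join_stage(in_q, out_q, customers):
    """Stage 2: join each row with the customer table as soon as it arrives."""
    while (row := in_q.get()) is not END:
        name = customers.get(row["cust_id"])
        if name is not None:  # inner join: drop rows with no matching customer
            out_q.put({**row, "name": name})
    out_q.put(END)


def sort_stage(in_q, result, sort_key):
    """Stage 3: sort is blocking, so buffer every row, then sort at the end."""
    buffered = []
    while (row := in_q.get()) is not END:
        buffered.append(row)
    result.extend(sorted(buffered, key=sort_key))


def run_pipeline(orders, customers):
    scan_to_join, join_to_sort = queue.Queue(), queue.Queue()
    result = []
    stages = [
        threading.Thread(target=scan_stage, args=(orders, scan_to_join)),
        threading.Thread(target=join_stage,
                         args=(scan_to_join, join_to_sort, customers)),
        threading.Thread(target=sort_stage,
                         args=(join_to_sort, result, lambda r: r["amount"])),
    ]
    for stage in stages:  # all three stages run concurrently
        stage.start()
    for stage in stages:
        stage.join()
    return result


orders = [
    {"order_id": 1, "cust_id": 7, "amount": 30},
    {"order_id": 2, "cust_id": 8, "amount": 10},
    {"order_id": 3, "cust_id": 9, "amount": 20},
]
customers = {7: "Ada", 8: "Lin"}
for row in run_pipeline(orders, customers):
    print(row["order_id"], row["name"], row["amount"])
```

The sketch overlaps the stages of a single query; in the assembly-line picture from the slide, a stage that finishes its part of one query could move on to the next query while later stages are still busy.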
Sequential Query Example
• Two tables with 20 million rows each, on a uniprocessor machine
  – A query that scans, joins, and sorts them takes 12 minutes
• Add partitioning
  – The same query takes 3 minutes
• Add pipelining
  – 12 queries can be run in 12 minutes
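The example's figures can be turned into throughput numbers (queries completed per minute). This small Python calculation uses only the three figures given in the example; comparing the pipelined case with partitioning alone assumes that, without pipelining, the partitioned queries would run one after another at 3 minutes each.

```python
from fractions import Fraction

# (queries completed, minutes), taken from the example
SEQUENTIAL = (1, 12)
PARTITIONED = (1, 3)
PARTITIONED_AND_PIPELINED = (12, 12)


def throughput(queries, minutes):
    """Queries completed per minute, as an exact fraction."""
    return Fraction(queries, minutes)


def gain(new, baseline):
    """How many times higher the new throughput is than the baseline's."""
    return throughput(*new) / throughput(*baseline)


print(gain(PARTITIONED, SEQUENTIAL))                 # 4
print(gain(PARTITIONED_AND_PIPELINED, SEQUENTIAL))   # 12
print(gain(PARTITIONED_AND_PIPELINED, PARTITIONED))  # 3
```

Partitioning makes a single query 4 times faster; adding pipelining raises throughput to one query per minute, 12 times the uniprocessor rate.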
Kinds of Parallel Architecture
• Shared-everything
  – Hardware
  – Software
• Shared-disk
  – Hardware
  – Software
• Shared-nothing
  – Hardware
  – Software
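In a shared-nothing system each node has its own processor, memory, and disk, so a join whose matching rows live on different nodes needs data shipping: rows are sent to the node responsible for their join key, after which each node joins its local rows independently. The Python sketch below simulates that repartition join in a single process. The three-node cluster, the table layouts, and the function names are hypothetical, and the per-node joins run in a loop here rather than in parallel on separate machines.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def owner(key):
    """The node responsible for a given join-key value."""
    return hash(key) % NUM_NODES


def ship_by_key(rows_per_node, key):
    """Data shipping: each node sends every local row to the node that owns
    that row's key, so equal keys from both tables meet on one node."""
    inboxes = [[] for _ in range(NUM_NODES)]
    for local_rows in rows_per_node:
        for row in local_rows:
            inboxes[owner(row[key])].append(row)
    return inboxes


def local_hash_join(left_rows, right_rows, key):
    """Runs on a single node using only that node's own data."""
    index = defaultdict(list)
    for right in right_rows:
        index[right[key]].append(right)
    return [{**left, **right}
            for left in left_rows
            for right in index.get(left[key], [])]


def shared_nothing_join(orders_per_node, customers_per_node, key):
    orders_in = ship_by_key(orders_per_node, key)
    customers_in = ship_by_key(customers_per_node, key)
    results = []
    for node in range(NUM_NODES):  # in a real cluster these run in parallel
        results.extend(local_hash_join(orders_in[node], customers_in[node], key))
    return results


orders_per_node = [
    [{"cust_id": 1, "amount": 5}],
    [{"cust_id": 2, "amount": 7}, {"cust_id": 1, "amount": 9}],
    [],
]
customers_per_node = [
    [{"cust_id": 2, "name": "Bo"}],
    [],
    [{"cust_id": 1, "name": "Al"}],
]
joined = shared_nothing_join(orders_per_node, customers_per_node, "cust_id")
for row in sorted(joined, key=lambda r: r["amount"]):
    print(row)
```

The shipping step is the price of sharing nothing: no node can read another node's disk, so data moves over the network instead.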
Conclusion
• Pros
  – Allows you to process more information
  – Provides faster query processing
• Cons
  – Expensive hardware and software
  – Much higher maintenance effort
• Is a parallel database right for your organization?