Presented by Marie-Gisele Assigue Hon Shea, Thursday, March 31st, 2011.

1 Presented by Marie-Gisele Assigue Hon Shea, Thursday, March 31st, 2011

2
- VectorWise is relational database software for reporting, data analysis, and business intelligence
- These workloads require it to analyze terabytes of data
- VectorWise recently set a new record on the TPC-H benchmark

3
- Financial services, such as banks and Wall Street firms
  - It is desirable to query historical data as well as current positions
  - The data volume is simply too large to store cost-effectively in memory
  - VectorWise delivers in-memory performance with data stored on disk
- Social media, for example for advertising
- E-commerce
  - Database performance suffers as the amount of historical data grows
  - VectorWise is able to deliver good performance even when analyzing large amounts of data

4 SIMD (SINGLE INSTRUCTION, MULTIPLE DATA)
- SIMD instructions allow the same operation to be performed on multiple data items simultaneously
- Traditional databases process data one tuple at a time
- VectorWise processes vectors of hundreds of elements at once
- It uses the large CPU cache as execution memory
- The vector size is tuned to fit into the cache
HARDWARE-ACCELERATED STRING-BASED OPERATIONS
- Supported by Intel Xeon processors
- Speed up operations such as:
  - Selections on strings using wildcard matching
  - Aggregations on string-based values
  - Joins or sorts using string keys
- Up to 2–4 times faster
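The contrast between tuple-at-a-time and vector-at-a-time processing can be sketched in Python. This is only an illustration of the control flow, not VectorWise code: Python does not issue SIMD instructions, and `VECTOR_SIZE`, `vectors`, `select_greater_than`, and `vectorized_sum` are hypothetical names chosen for this example.

```python
from typing import Iterator, List, Sequence

# Hypothetical vector size; the slides only say "hundreds of elements",
# tuned so that a vector fits in the CPU cache.
VECTOR_SIZE = 1024


def vectors(column: Sequence[int], size: int = VECTOR_SIZE) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-size slices (vectors) of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_greater_than(vec: Sequence[int], threshold: int) -> List[int]:
    """Selection primitive: positions inside the vector that qualify."""
    return [i for i, value in enumerate(vec) if value > threshold]


def vectorized_sum(price: Sequence[int], quantity: Sequence[int],
                   min_quantity: int, size: int = VECTOR_SIZE) -> int:
    """SUM(price) WHERE quantity > min_quantity, one vector at a time."""
    total = 0
    for p_vec, q_vec in zip(vectors(price, size), vectors(quantity, size)):
        selected = select_greater_than(q_vec, min_quantity)  # one call per vector
        total += sum(p_vec[i] for i in selected)
    return total


def tuple_at_a_time_sum(price: Sequence[int], quantity: Sequence[int],
                        min_quantity: int) -> int:
    """The same query evaluated one tuple at a time, for comparison."""
    total = 0
    for p, q in zip(price, quantity):
        if q > min_quantity:
            total += p
    return total
```

Both functions return the same result; the vectorized form amortizes per-call overhead over a whole vector, which is the property that lets a compiled engine apply SIMD instructions inside each primitive.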

5 Use of COLUMN-BASED storage
- For data warehouse databases, most queries retrieve many rows but only a few columns
- Row-based storage would generate a lot of unnecessary I/O, because every column of each row is read even when the query needs only some of them
- Column-based storage is generally accepted as a superior storage model for this type of workload
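The I/O argument for column-based storage can be made concrete with a small Python sketch. It counts how many field values each layout has to touch to sum one column. The table, the column names, and the helper functions are invented for illustration, and a real engine counts disk blocks rather than individual values.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[tuple], names: Sequence[str]) -> Dict[str, list]:
    """Convert a row-oriented table into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def scan_row_store(rows: Sequence[tuple], names: List[str], wanted: str) -> Tuple[int, int]:
    """Sum one column in a row store; returns (sum, field values touched)."""
    idx = names.index(wanted)
    total = 0
    touched = 0
    for row in rows:
        touched += len(row)  # the whole row is fetched to reach one field
        total += row[idx]
    return total, touched


def scan_column_store(columns: Dict[str, list], wanted: str) -> Tuple[int, int]:
    """Sum one column in a column store; only that column is read."""
    column = columns[wanted]
    return sum(column), len(column)
```

For a three-column table, the row store touches three times as many values as the column store for the same answer, and the gap widens as tables get wider.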

6 VectorWise's Hybrid Column Store
- By default, data is stored column by column
- For tables that are indexed on more than one column, the indexed columns are stored together in the same block
- This data storage model is known as PAX (Partition Attributes Across)
- PAX delivers better cache performance
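A minimal Python sketch of the PAX idea follows: rows are first partitioned into blocks (pages), and inside each block the values are regrouped by column. The function name `pax_pages` and the `rows_per_page` parameter are assumptions made for this example; the slides do not describe VectorWise's actual block format.

```python
from typing import List, Sequence


def pax_pages(rows: Sequence[tuple], rows_per_page: int) -> List[List[list]]:
    """Split rows into pages; within each page, keep one list per column."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        # zip(*chunk) transposes the rows of this page into columns.
        pages.append([list(column) for column in zip(*chunk)])
    return pages
```

All columns of a given row stay in the same block, so reconstructing a row needs one block, while a scan of a single column still reads contiguous values within each block.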

7
- Guarantees ACID properties
- Supports multi-version read consistency
- Use of POSITIONAL DELTA TREES (PDTs)
  - Small inserts, updates, or deletes are expensive in a column-based database (as opposed to large bulk data load operations)
  - A PDT is an in-memory structure that stores the position of a change and the change (delta) at that position
  - PDTs use a configurable amount of memory. Once the memory pool is exhausted, updates are written to disk
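The idea of keeping changes in memory, keyed by position, and merging them into the stored column at scan time can be sketched as follows. This is a deliberately simplified stand-in: it uses flat dictionaries instead of a tree, and the class `PositionalDeltas` and its methods are hypothetical names, not the PDT interface.

```python
from typing import Dict, Iterator, List, Sequence, Set


class PositionalDeltas:
    """In-memory changes keyed by position in the stable (on-disk) column."""

    def __init__(self) -> None:
        self.modified: Dict[int, object] = {}        # position -> new value
        self.deleted: Set[int] = set()               # positions removed
        self.inserted: Dict[int, List[object]] = {}  # position -> values placed before it

    def modify(self, pos: int, value: object) -> None:
        self.modified[pos] = value

    def delete(self, pos: int) -> None:
        self.deleted.add(pos)
        self.modified.pop(pos, None)

    def insert_before(self, pos: int, value: object) -> None:
        self.inserted.setdefault(pos, []).append(value)

    def scan(self, stable: Sequence[object]) -> Iterator[object]:
        """Merge the deltas into the stable column without rewriting it."""
        for pos, value in enumerate(stable):
            yield from self.inserted.get(pos, [])
            if pos in self.deleted:
                continue
            yield self.modified.get(pos, value)
        # Values inserted at position len(stable) are appended at the end.
        yield from self.inserted.get(len(stable), [])
```

The stable column is never touched by small updates; readers see the merged view, and the deltas can later be written out in bulk.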

8 DATA COMPRESSION
- VectorWise compresses data on a column-by-column basis using one of these algorithms: RLE (Run Length Encoding), PFOR (Patched Frame Of Reference), or delta encoding on top of PFOR
- VectorWise uses data compression to improve performance. It allocates a portion of physical memory to a memory-based disk buffer called the CBM (Column Buffer Manager). Data is automatically prefetched from disk and stored in the CBM.
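Two of the named schemes can be sketched in a few lines of Python. The RLE functions are standard run-length encoding. The `pfor_encode` and `pfor_decode` functions are a simplified take on the patched frame-of-reference idea (small offsets from a base value, with outliers stored separately as exceptions). They assume a non-empty list of integers, use the minimum as the base, and do not bit-pack the codes, so they should not be read as VectorWise's actual format.

```python
from typing import Dict, List, Sequence, Tuple


def rle_encode(values: Sequence[int]) -> List[Tuple[int, int]]:
    """Run-length encode a column into (value, run length) pairs."""
    runs: List[List[int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def rle_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand (value, run length) pairs back into the original column."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def pfor_encode(values: Sequence[int], bits: int) -> Tuple[int, List[int], Dict[int, int]]:
    """Store offsets from a base in `bits` bits; outliers become exceptions."""
    base = min(values)  # assumption: non-empty input, minimum used as the frame
    limit = 1 << bits
    codes: List[int] = []
    exceptions: Dict[int, int] = {}
    for position, value in enumerate(values):
        offset = value - base
        if offset < limit:
            codes.append(offset)
        else:
            codes.append(0)               # placeholder, patched on decode
            exceptions[position] = value  # full value kept separately
    return base, codes, exceptions


def pfor_decode(base: int, codes: Sequence[int], exceptions: Dict[int, int]) -> List[int]:
    """Rebuild the column, patching in the exception values."""
    return [exceptions.get(i, base + code) for i, code in enumerate(codes)]
```

RLE pays off on sorted or low-cardinality columns with long runs, while the frame-of-reference approach pays off when most values cluster near a common base.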

9

10 STORAGE INDEXES
- In extreme cases, storage indexes can provide the same benefit that data partitioning provides in other databases, without the overhead of multiple database objects or of maintaining a partitioning strategy
- VectorWise automatically maintains a storage index per column that stores the minimum and maximum values for each data block
- This makes it very efficient to determine whether a database block is a candidate block for a particular query
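A min/max storage index and the block-pruning check it enables can be sketched as below. The function names and the `block_size` parameter are assumptions for this example; only the idea of keeping a minimum and maximum per block comes from the slide.

```python
from typing import List, Sequence, Tuple


def build_storage_index(column: Sequence[int], block_size: int) -> List[Tuple[int, int]]:
    """Record (min, max) for each fixed-size block of a column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append((min(block), max(block)))
    return index


def candidate_blocks(index: Sequence[Tuple[int, int]], low: int, high: int) -> List[int]:
    """Block numbers whose [min, max] range overlaps the query range [low, high]."""
    return [
        block_no
        for block_no, (block_min, block_max) in enumerate(index)
        if block_max >= low and block_min <= high
    ]
```

A query such as `WHERE value BETWEEN 11 AND 12` only needs to read the blocks returned by `candidate_blocks`; every other block is skipped without any I/O. The benefit is largest when the data is stored in an order that correlates with the filtered column.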

11

12 PARALLEL EXECUTION
- Parallel execution provides the greatest performance improvements in DSS (decision support system) and data warehousing environments. The VectorWise engine is able to sustain a large number of concurrent queries efficiently on a multi-core system
- Example of parallel execution: server connections and buffers
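The basic pattern behind a parallel aggregate (split the data, aggregate each part on its own worker, combine the partial results) can be sketched with Python's standard `concurrent.futures` module. The function `parallel_sum` and its `workers` parameter are illustrative only. Python threads do not give true multi-core speedup for this CPU-bound work, so the sketch shows the structure rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def parallel_sum(column: Sequence[int], workers: int = 4) -> int:
    """Sum a column by aggregating partitions independently, then combining."""
    if not column:
        return 0
    chunk = -(-len(column) // workers)  # ceiling division: rows per partition
    parts = [column[i:i + chunk] for i in range(0, len(column), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum, parts)  # one partial aggregate per partition
        return sum(partial_sums)             # final combine step
```

The same split, partial aggregate, and combine structure applies to counts, minimums, and maximums, since each can be merged from per-partition results.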

13
- The new record set by Ingres for the TPC-H benchmark at the 100 GB scale factor is 3.4 times faster than the old mark
- The new record of 251,561 QphH (TPC-H's composite queries-per-hour metric) for 100 GB of data was set by Ingres's VectorWise database running on one HP ProLiant DL380 G7

14
- Enables you to a workload on a server
- Can lower cost instantly by better utilizing your hardware (dynamic)
- Achieves extremely fast performance for typical data warehouse and data mart workloads

15
- http://www.itwire.com/business-it-news
- INGRES VectorWise Whitepaper, a technical whitepaper
- http://uhesse.wordpress.com
- http://kerryosborne.oracle-guy.com/2010/08/oracle-exadata-storage-indexes/
- http://www.openexpo.ch/fileadmin/documents/2010Bern/Slides/25_OlafLaber.pdf
- http://www.wikipedia.org
- Ailamaki, Anastassia. A Storage Model to Bridge the Processor/Memory Speed Gap. Carnegie Mellon University, 2001
ANY QUESTIONS?

