
1 Ingres Plus X100 Equals Ingres Vectorwise

2 Agenda
• Why?
• Introduction to Vectorwise
• Groundwork
• Vectorwise and OPF
• Vectorwise and QEF

3 Names
• X100 was the research project name
  – “faster times 100”
• VectorWise (or Ingres VectorWise) is the product name
• The abbreviations X100, IVW and VW are used internally
• It's all pretty much interchangeable

4 Why Ingres?
• Ingres – enterprise RDBMS
  – Full-function database server
  – User interfaces: SQL, embedded SQL, API, ODBC, JDBC, etc.
  – Application interfaces: OpenROAD, ABF, etc.
  – Utilities: backup, restore, rollforward, etc.
  – …
• VectorWise – experimental RDBMS
  – Very (very, very, …) fast
  – But only QEF/DMF equivalent
Confidential - © 2009 Ingres Corporation

5 Why Ingres?
• Developing “required” components would take many years
• Established sales force, customer base
• Right-sized company
• Agreeable business arrangements

6 What’s in VectorWise?
• Column store database
  – Hybrid row store capability is coming
• Internal catalog
  – Stores definitions of tables, columns, indexes, etc.
• Relational algebra language interface
  – Used for DDL and DML operations, transaction management
  – Basis of Ingres interface
• Various tools & utilities
  – iivwinfo, iivwfastload, iivwstats, x100pp, x100profgraph, x100_client

7 VectorWise Algebra
• Actual queries that go to VectorWise are recorded in the VectorWise log
  – DML can be seen by using trace point op207 (don’t forget to use x100pp)
• Simple queries: CreateTable, MinMaxIndex, Savepoint, Commit
• DML: Append, Update, Delete

8 VectorWise Algebra
• Retrievals
  – Queries consist of nested operators, built by the OPF cross compiler
  – Project, Select, TopN, Window, Sort, Aggr, OrdAggr, MScan, MergeJoin1, HashJoin01, HashJoinN, CartProd
  – “select sno, city from s where status > 50” generates:

    Project(
      Select(
        MScan(_s = '_s', ['_status', '_city', '_sno'] ), ['est_card' = '5'],
        >(_s._status, sint('50'))
      ), ['est_card' = '1'],
      [_s._sno, _s._city]
    )

9 VectorWise Algebra
• One more: “select r_name, n_name from region, nation where r_regionkey = n_regionkey order by r_name” generates:

    Sort (
      Project (
        HashJoin01 (
          MScan ( _nation = '_nation', [ '_n_regionkey', '_n_name'] ) [ 'est_card' = '25' ],
          [ _nation._n_regionkey ],
          MScan ( _region = '_region', [ '_r_regionkey', '_r_name'] ) [ 'est_card' = '5' ],
          [ _region._r_regionkey ],
          0
        ) [ 'est_card' = '25' ],
        [_region._r_name, _nation._n_name]
      ),
      [_region._r_name]
    )

10 VectorWise – Why is it so Fast?
• Primarily a column store – much less data read from disk
• Compression techniques highly tuned to modern computer architectures (multi-level caching, etc.)
• Lightweight indexing technique
• Other column stores operate on re-constituted rows
• VectorWise uses a novel execution architecture that retains column structure and processes vectors of data at a time
  – SIMD, CPU cache, …
• Research is ongoing and there’s more to come!
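To make the vector-at-a-time idea concrete, here is a minimal Python sketch. It is not Vectorwise code: the vector size, the function names and the sample data are assumptions chosen for illustration, and Python lists stand in for the cache-resident arrays a real engine would use. It shows a filter that reads only the two columns a query touches and processes them in fixed-size vectors rather than row by row.

```python
from typing import Iterator, List

VECTOR_SIZE = 1024  # assumed vector length, chosen only for illustration


def scan(column: List[int], size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Yield one column as a sequence of fixed-size vectors."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_gt(status: List[int], sno: List[int], bound: int,
              size: int = VECTOR_SIZE) -> Iterator[List[int]]:
    """Return sno values whose status exceeds bound, one vector at a time.

    Only the two columns involved are read; any other columns of the
    table are never visited.
    """
    for status_vec, sno_vec in zip(scan(status, size), scan(sno, size)):
        # Selection vector: positions inside this vector that qualify.
        selected = [i for i, value in enumerate(status_vec) if value > bound]
        yield [sno_vec[i] for i in selected]


status = [10, 60, 55, 20, 90]
sno = [1, 2, 3, 4, 5]
result = [v for vec in select_gt(status, sno, 50, size=2) for v in vec]
print(result)
```

The inner comprehension runs over a whole vector with no per-row operator dispatch, which is the property that lets a compiled engine use SIMD instructions and stay inside the CPU cache.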

11 Implementation Goals
• Minimize special X100-only syntax
• Minimize effect on server facilities not directly involved (e.g. SCF, DMF)
• Localize changes to affected facilities as much as possible
• (Optimizer in particular) Write new algorithms in such a way that Ingres tables could eventually take advantage of them

12 Groundwork
• Initial thought was to add X100 as a new built-in Gateway, similar to IMA etc.
  – Presumably would cause minimal changes to QEF
  – Would probably be too slow (every result row filtered one at a time from GWF to DMF then QEF)
• Better idea: add Vectorwise as a new table storage type
  – OPF will generate special plans for IVW tables
  – Easy to do, minimal new syntax required
• Ingres catalogs carry Vectorwise table definitions

13 Groundwork
• Parser changes:
  – WITH STRUCTURE=VECTORWISE
    – Config default for IVW installations
    – SET RESULT_STRUCTURE …
  – Various checks to disallow VW tables in contexts where they aren't supported (e.g. DB procedure)
  – New “CALL VECTORWISE” statement to send non-SQL-related requests (e.g. combine)
  – New statement types for VW DDL statements
    – Especially for constraint operations
  – New query-uses-VW-tables flag(s)

14 Groundwork
• Front-end utilities changed to recognize VW table types
  – Some additional work required by VW restrictions, such as create index only allowed on empty tables
• Essentially no sequencer changes
  – Minor change to recognize X100 interface facility
• DMF changes to permit tables with no underlying table file
• DMF changes for backup/restore
• RDF changes to support (hidden) VW join indexes

15 VectorWise & OPF
• Parsing is “easy”, but why did we think OPF could compile VectorWise queries?
• Optimization is all about row sizes, cardinalities and the comparison of different plans
• optimizedb works on VectorWise tables and produces good cardinality estimates
• The optimizer architecture is designed to work for any target database engine

16 VectorWise & OPF - Changes
• Functional dependencies
• Dependence relationships can be derived from unique constraints, referential relationships, joins, group by
• Allows identification of DERIVED columns not needed for duplicate elimination & grouping operations – very important to VectorWise

17 VectorWise & OPF - Functional Dependencies

    select c_custkey, c_name,
           sum(l_extendedprice * (1 - l_discount)) as revenue,
           c_address, c_phone, c_comment
    from...
    group by c_custkey, c_name, c_phone, c_address, c_comment

generates:

    Project (
      As (
        Aggr (
          Project (... ) [ 'est_card' = '56357' ],
            [_TRSDM_1 = *(_lineitem._l_extendedprice,+( - (_lineitem._l_discount), decimal('1'))),
             _customer._c_comment, _customer._c_address, _customer._c_phone,
             _customer._c_name, _customer._c_custkey] ),
          [_customer._c_comment DERIVED, _customer._c_address DERIVED,
           _customer._c_phone DERIVED, _customer._c_name DERIVED, _customer._c_custkey],
          [_revenue_3 = sum(_TRSDM_1)],
          5636
        ),
        __VT_3_0_2_1
      ),
      [_c_custkey_1 = __VT_3_0_2_1._c_custkey, _c_name_2 = __VT_3_0_2_1._c_name,
       __VT_3_0_2_1._revenue_3, _c_address_4 = __VT_3_0_2_1._c_address,
       _c_phone_5 = __VT_3_0_2_1._c_phone, _c_comment_6 = __VT_3_0_2_1._c_comment]
    )
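The DERIVED markers in the plan above can be understood through attribute closure. The following Python sketch is only an illustration of that textbook technique, not the OPF implementation. It assumes a single functional dependency (c_custkey determines the other customer columns, as a unique constraint would imply), and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

FDs = Dict[FrozenSet[str], Set[str]]


def closure(columns: Set[str], fds: FDs) -> Set[str]:
    """Return every column functionally determined by the given columns."""
    result = set(columns)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds.items():
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def split_group_by(group_by: List[str], fds: FDs) -> Tuple[List[str], List[str]]:
    """Split GROUP BY columns into real grouping keys and DERIVED columns.

    A column is DERIVED when the remaining key columns already determine it,
    so it need not take part in the grouping comparison.
    """
    keys = list(group_by)
    derived: List[str] = []
    for column in group_by:
        others = {k for k in keys if k != column}
        if column in closure(others, fds):
            keys.remove(column)
            derived.append(column)
    return keys, derived


# Assumed dependency: c_custkey is unique in customer.
fds: FDs = {frozenset({"c_custkey"}): {"c_name", "c_address", "c_phone", "c_comment"}}
keys, derived = split_group_by(
    ["c_custkey", "c_name", "c_phone", "c_address", "c_comment"], fds)
print(keys, derived)
```

Under that assumption only c_custkey remains a grouping key, which matches the one column in the Aggr() group list that carries no DERIVED marker.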

18 VectorWise & OPF - Changes
• Clustering
• Aggregation doesn’t need input sorted on GROUP BY, just clustered
• Indexing, joins, other aggregations have clustering properties
• OrdAggr() is much faster than Aggr()
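The difference between clustered and sorted input can be shown with a small Python sketch. This is an illustration of the general technique, not the Vectorwise operators themselves. The function names are hypothetical, and the comments on cost are a plausible reading, not a claim from the slides.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]


def hash_aggr(rows: Iterable[Row]) -> Dict[str, int]:
    """Sum per key with a hash table; works on input in any order."""
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def ordered_aggr(rows: Iterable[Row]) -> Iterator[Row]:
    """Sum per key assuming equal keys are adjacent (clustered input).

    Only the current group is held in memory, and each group can be
    emitted as soon as the key changes.
    """
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key, sum(value for _, value in group)


# Clustered on the key, but not sorted: "b" precedes "a".
rows = [("b", 1), ("b", 2), ("a", 5), ("c", 1), ("c", 1)]
print(list(ordered_aggr(rows)))
print(hash_aggr(rows))
```

Both functions give the same totals on this input. The ordered variant would give wrong answers if equal keys were scattered, which is why the optimizer has to track clustering properties before choosing it.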

19 VectorWise & OPF - Changes
• Referential relationships
• OPF historically ignored referential relationships
• Joins across referential relationships have additional properties
  – Cardinalities, clustering/functional dependencies
• VectorWise join indexes enable very fast MergeJoin
• New iirefrel catalog to track referential relationships

20 VectorWise & OPF – Referential Relationships
“select r_name, n_name from region, nation where r_regionkey = n_regionkey order by r_name” generates:

    Sort (
      Project (
        MergeJoin1 (
          MScan ( _nation = '_nation', [ '_n_regionkey', '_n_name', '__jnation'] ) [ 'clusterid' = '1', 'est_card' = '25' ],
          [ _nation.__jnation ],
          MScan ( _region = '_region', [ '_r_regionkey', '_r_name', '__tid__'] ) [ 'clusterid' = '1', 'est_card' = '5' ],
          [ _region.__tid__ ]
        ) [ 'est_card' = '25' ],
        [_region._r_name, _nation._n_name]
      ),
      [_region._r_name, _nation._n_name]
    )
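The plan above joins _nation.__jnation to _region.__tid__. One reading, which is an assumption here and not stated on the slide, is that the hidden join index column holds the tuple id of the referenced row and that both inputs arrive ordered on it. Under that assumption a single forward pass suffices, as the Python sketch below illustrates. The function name and sample data are hypothetical.

```python
from typing import List, Tuple


def merge_join_on_tid(children: List[Tuple[int, str]],
                      parents: List[Tuple[int, str]]) -> List[Tuple[str, str]]:
    """Join child rows to parent rows in one forward pass.

    children: (join_index, payload) pairs, ordered by join_index.
    parents:  (tid, payload) pairs, ordered by tid, each tid unique.
    """
    joined: List[Tuple[str, str]] = []
    p = 0
    for join_index, child in children:
        # Advance the parent cursor; it never moves backwards.
        while p < len(parents) and parents[p][0] < join_index:
            p += 1
        if p < len(parents) and parents[p][0] == join_index:
            joined.append((parents[p][1], child))
    return joined


regions = [(0, "AFRICA"), (1, "AMERICA")]
nations = [(0, "ALGERIA"), (0, "KENYA"), (1, "BRAZIL")]
pairs = merge_join_on_tid(nations, regions)
print(pairs)
```

No hash table is built and neither input is revisited, which is consistent with the slide's claim that join indexes enable a very fast MergeJoin.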

21 VectorWise & OPF – Reuse Segments
• VectorWise can cache partial results for later reuse in same query
• OPF searches for common table subexpressions
• VectorWise materializes them once and caches them for later reuse

22 VectorWise & OPF – Reuse Segments

    select s_acctbal, s_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
    from part, supplier, partsupp
    where p_partkey = ps_partkey and s_suppkey = ps_suppkey and...
      and ps_supplycost = (
        select min(ps_supplycost) from partsupp, supplier
        where p_partkey = ps_partkey and s_suppkey = ps_suppkey)

generates:

    Project (
      HashJoin01 (
        As (
          IIREUSESQ6 = Project (
            HashJoin01 (
              MergeJoin1 (
                MScan ( _partsupp000 = '_partsupp', [ '_ps_suppkey', '_ps_partkey', '_ps_supplycost', '__jpartsupp']...
          ), __VT_6_1_3_1 ),
        [ __VT_6_1_3_1._p_partkey, __VT_6_1_3_1._ps_supplycost ],
        As ( Aggr ( As ( IIREUSESQ6, __VT_6_0_3_2...
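Finding a reusable segment amounts to spotting a subtree that occurs more than once in the plan. The Python sketch below illustrates that idea on a toy plan made of nested tuples. It is not how OPF represents or searches plans, and the helper names are hypothetical. It reports only the largest repeated subtrees, on the simplifying assumption that a smaller repeated piece inside one of them needs no separate name.

```python
from collections import Counter
from typing import Iterator, List


def subtrees(plan: tuple) -> Iterator[tuple]:
    """Yield the plan and every operator subtree beneath it."""
    yield plan
    for child in plan[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def find_reusable(plan: tuple) -> List[tuple]:
    """Return subtrees that occur more than once, largest ones only."""
    counts = Counter(subtrees(plan))
    repeated = [tree for tree, n in counts.items() if n > 1]
    return [tree for tree in repeated
            if not any(tree != other and tree in subtrees(other)
                       for other in repeated)]


# Toy version of the query above: the same join feeds both the outer
# query and the min() subquery.
join = ("HashJoin", ("MScan", "partsupp"), ("MScan", "supplier"))
plan = ("HashJoin", join, ("Aggr", join))
reusable = find_reusable(plan)
print(reusable)
```

In the slide's plan the shared segment is given a name (IIREUSESQ6) at its first use and referred to by that name afterwards; the sketch stops at detection and does not rewrite the plan.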

23 VectorWise & OPF – Cross Compiler
• Optimizer compiles QEP
• Code generator converts it to a form executable by QEF
• Native Ingres query plans are fairly simple transformations from QEP
• VectorWise query plans are algebra syntax built from QEP
• QEF sees trivial query plan with single action – the VectorWise syntax

24 VectorWise & OPF – Cross Compiler
• Cross compiler analyzes QEP much like code generation does for native Ingres queries
• QEP nodes result in VectorWise operators
  – ORIG nodes produce MScan() operators
  – Join nodes produce Merge/HashJoin() operators
  – Etc.
• Generates functions supported by VectorWise
  – “not like” becomes “!(like(...”
• Challenging issues of scope in name management
  – Lots of “invented” table, column, partial result names
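The node-to-operator mapping can be sketched as a recursive walk that emits algebra text. The Python below is a toy illustration under stated assumptions: the Orig, Filter and Proj classes are hypothetical stand-ins for QEP nodes, predicate arguments are passed in already rendered, and the est_card annotations seen in real plans are omitted.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Orig:            # hypothetical stand-in for an ORIG node
    table: str
    columns: List[str]


@dataclass
class Filter:          # hypothetical stand-in for a restriction
    child: "Node"
    op: str
    args: List[str]    # arguments already rendered as algebra text


@dataclass
class Proj:            # hypothetical stand-in for the result column list
    child: "Node"
    columns: List[str]


Node = Union[Orig, Filter, Proj]


def function_call(op: str, args: List[str]) -> str:
    """Render a predicate, rewriting 'not like' as a negated like()."""
    if op == "not like":
        return "!(" + function_call("like", args) + ")"
    return f"{op}({', '.join(args)})"


def compile_node(node: Node) -> str:
    """Emit algebra text for a plan node and everything below it."""
    if isinstance(node, Orig):
        cols = ", ".join(f"'_{c}'" for c in node.columns)
        return f"MScan(_{node.table} = '_{node.table}', [{cols}])"
    if isinstance(node, Filter):
        return f"Select({compile_node(node.child)}, {function_call(node.op, node.args)})"
    if isinstance(node, Proj):
        return f"Project({compile_node(node.child)}, [{', '.join(node.columns)}])"
    raise TypeError(f"unsupported node: {node!r}")


qep = Proj(Filter(Orig("s", ["status", "city", "sno"]),
                  ">", ["_s._status", "sint('50')"]),
           ["_s._sno", "_s._city"])
text = compile_node(qep)
print(text)
```

For the simple query on slide 8 this produces the same operator nesting as the plan shown there. The name management problem the slide mentions does not arise in so small an example, because no intermediate results need invented names.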

25 Vectorwise and QEF

    select t2.str, count(t1.i) from t1 join t2 on t1.i = t2.i group by t2.str

[Diagram: the full native Ingres QP tree in brief, with nodes GET, QP, HAGF, HJN and ORIG from top to bottom, vs. the full Vectorwise QP tree in brief, a single X100Q node.]

26 Vectorwise and QEF
• New QEA_X100_QRY QP action header
• Handles select, update, delete
• X100 algebra text is sent to the X100 server, and reply rows (if any) are returned to the user in the usual QEF manner
  – QEF arranges for X100 interface to materialize rows directly into SCF buffers
• REPEAT query parameters substituted as text into the X100 algebra

27 Vectorwise and QEF
• New action types for X100 create table, create/drop constraint
• New X100 control blocks attached to existing QEF cb's for CREATE TABLE, COPY, INSERT
• Little effect on existing QP nodes
  – QEN_QP extended to understand QEA_X100_QRY for VW → Ingres CTAS, VW scrollable cursors, future mixed query support
• Transaction interface (start, abort, commit)
  – X100 done first as it's more likely to fail
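The ordering rule for transactions can be illustrated with a short Python sketch. The classes and the recovery step are assumptions: the slide only says X100 goes first because it is more likely to fail, and the sketch infers that the point is to keep the Ingres side uncommitted, and therefore still abortable, when that happens.

```python
class CommitError(Exception):
    """Raised by a participant that cannot commit."""


class Participant:
    """Hypothetical stand-in for one side of the transaction."""

    def __init__(self, name: str, fail_on_commit: bool = False) -> None:
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.state = "active"

    def commit(self) -> None:
        if self.fail_on_commit:
            raise CommitError(f"{self.name} could not commit")
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def commit_transaction(x100: Participant, ingres: Participant) -> bool:
    """Commit the side more likely to fail first.

    If X100 fails, nothing has been committed on the Ingres side yet,
    so both sides can still be aborted together.
    """
    try:
        x100.commit()
    except CommitError:
        x100.abort()
        ingres.abort()
        return False
    ingres.commit()
    return True


ok = commit_transaction(Participant("x100"), Participant("ingres"))
bad_x100 = Participant("x100", fail_on_commit=True)
bad_ingres = Participant("ingres")
failed = commit_transaction(bad_x100, bad_ingres)
print(ok, failed, bad_ingres.state)
```

This is not a two-phase commit: a failure on the Ingres side after X100 has committed is not handled here, and the slides do not say how the real server deals with that case.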

28 COPY and INSERT
• COPY uses Ingres COPY client side code but sends rows to X100 rather than DMF
  – No worker threads used
  – VW COPY obeys constraints, may fail at end
• Bulkload (APPEND) vs single row (INSERT)
  – Both use X100 BinaryScan operator to read rows coming from Ingres
  – Append is used for COPY, INSERT/SELECT
  – Insert is used for INSERT VALUES

29 Vectorwise DDL
• DDL does Ingres catalog DDL first, then VW DDL
  – Allows Ingres-style name checking, locking (sort of)
  – DMF knows that VW tables have no disk file
• Constraints implemented directly in X100 rather than as system generated rules/DB procedures
  – New iirefrel catalog updated for referential constraints
  – Will someday be useful for Ingres constraints too
• VW CREATE INDEX is really a MODIFY

30 X100 Interface
• back/x100/x100 contains low level Vectorwise server interface:
• (X100) Server and session control
• NetBuffer Ingres ↔ X100 protocol
• Ingres ↔ X100 data type and null translation
• X100 → E_VWxxxx error code translation
• Generation of simple canned queries for some operations
  – e.g. generates Append(BinaryScan(...)) for COPY

31 X100 Interface
• X100 Server runs as a separate process, one per database
  – Ingres session does not connect until a VW query is issued
  – X100 Server doesn't start until it's needed
  – Once started, server persists until shutdown, terminate request, or destroydb
  – Active servers are tracked globally (in lock-log shared memory) so dmfjsp can access them (this is new)
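The server lifecycle described above (one server per database, started on first use, then kept until shut down) can be sketched as a small registry. This Python is illustrative only: the class and method names are hypothetical, a dictionary stands in for the lock-log shared memory, and a callback stands in for launching the server process.

```python
from typing import Callable, Dict, List


class ServerRegistry:
    """Tracks at most one running server per database."""

    def __init__(self, start_server: Callable[[str], object]) -> None:
        self._start_server = start_server
        self._active: Dict[str, object] = {}

    def connect(self, database: str) -> object:
        """Return the server for a database, starting it only if needed."""
        if database not in self._active:
            self._active[database] = self._start_server(database)
        return self._active[database]

    def shutdown(self, database: str) -> None:
        """Forget a server, e.g. on terminate request or destroydb."""
        self._active.pop(database, None)

    def active(self) -> List[str]:
        """List databases with a running server, for other tools to inspect."""
        return sorted(self._active)


started: List[str] = []


def fake_start(database: str) -> str:
    started.append(database)
    return f"server-for-{database}"


registry = ServerRegistry(fake_start)
first = registry.connect("sales")
second = registry.connect("sales")   # reuses the running server
registry.connect("hr")
print(started, registry.active())
```

Nothing is started when the registry is created, mirroring the rule that a session does not connect until a VW query is issued.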

32 X100 Interface
• Simple “NetBuffer” protocol talks to X100 server
  – Send (text) X100 Algebra query to X100
    – “password” trailer validates the sender
  – Receive ack (after query successfully parsed)
  – Receive or send rows (if select or insert)
    – Receive-with-timeout to poll for FE interrupts
  – Receive end-of-query
  – Error message packet terminates query
• Interrupt to X100 handled with VW syscall from a transient connection
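The message sequence for a select can be sketched as a small client loop. The Python below models only the ordering of steps listed above. The packet kinds ("QUERY", "ACK", "ROW", "END", "ERROR") and the tuple layout are invented for illustration and say nothing about the real NetBuffer wire format. The receive-with-timeout polling for front-end interrupts is left out.

```python
from collections import deque
from typing import Callable, List, Tuple

Packet = Tuple[str, object]


class QueryError(Exception):
    """Raised when an error packet or an unexpected packet ends the query."""


def run_select(send: Callable[[tuple], None],
               receive: Callable[[], Packet],
               algebra: str, password: str) -> List[object]:
    """Send algebra text, wait for the ack, then collect rows until end-of-query."""
    send(("QUERY", algebra, password))     # password trailer validates the sender
    kind, payload = receive()
    if kind == "ERROR":
        raise QueryError(payload)
    if kind != "ACK":                      # ack arrives once the query has parsed
        raise QueryError(f"unexpected packet {kind}")
    rows: List[object] = []
    while True:
        kind, payload = receive()
        if kind == "ROW":
            rows.append(payload)
        elif kind == "END":
            return rows
        elif kind == "ERROR":              # an error packet terminates the query
            raise QueryError(payload)
        else:
            raise QueryError(f"unexpected packet {kind}")


# In-memory stand-in for the server side of the connection.
sent: List[tuple] = []
replies = deque([("ACK", None), ("ROW", (1, "London")), ("ROW", (2, "Paris")), ("END", None)])
rows = run_select(sent.append, replies.popleft, "Project(...)", "secret")
print(rows)
```

Passing send and receive in as callables keeps the sequencing logic separate from the transport, so the same loop can be exercised against a fake server as in the usage lines above.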

33 Futures
• New iivwfastload
  – Direct pipe from client to X100 server
  – Special COPY variant will be created to handle X100 server checks, table permit validation, etc.
  – Maybe hook to regular COPY too???
• Mixed Ingres/VW queries
  – Mostly an optimizer problem
  – Partition query into Ingres and VW parts
  – Subquery results pushed into Ingres or VW temp tables as needed

34 Futures
• QEF bypass (blue sky?)
  – Feed results directly to client or GCD
• Apply new optimizer algorithms to Ingres queries
  – Reuse, iirefrel, functional dependency analysis, etc.
  – Antijoins in QEF to reduce the use of Ingres Sejoins
• Merge Ingres main and IVW (codev) code lines!

35 Questions?

