Teradata Physical Implementation – Case Study
- Create Table
  - Distribution Check & PI Change
  - Fallback
- Create Index
  - USI
  - NUSI
- Create Join Index
- Create & Collect Statistics
Create Table - Copy Data

Source table definition:

    CREATE SET TABLE TPCH.Customer ,NO FALLBACK ,
         NO BEFORE JOURNAL,
         NO AFTER JOURNAL
         (
          C_CUSTKEY INTEGER NOT NULL,
          C_NAME VARCHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_ADDRESS VARCHAR(40) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_NATIONKEY INTEGER NOT NULL,
          C_PHONE CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_ACCTBAL DECIMAL(15,2) NOT NULL,
          C_MKTSEGMENT CHAR(10) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_COMMENT VARCHAR(117) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
    UNIQUE PRIMARY INDEX ( C_CUSTKEY );

Check the size of each table in the TPCH database:

    SELECT databasename, Tablename, sum(CurrentPerm)
    FROM DBC.TABLESIZE
    where databasename = 'TPCH'
    group by databasename, Tablename;

    DatabaseName  TableName  Sum(CurrentPerm)
    TPCH          ORDERTBL            7334912
    TPCH          LINEITEM           34191360
    TPCH          PARTTBL             1211392
    TPCH          PARTSUPP            5183488
    TPCH          NATION                 5632
    TPCH          REGION                 3072
    TPCH          CUSTOMER            1080832
    TPCH          SUPPLIER              67584

Display the table definition:

    show table TPCH.Customer;

If privileges are missing, grant them to your user:

    GRANT SELECT ON DBC TO TRAINER;
    GRANT SELECT ON TPCH TO TRAINER;

Log in using your own ID.
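DBC.TABLESIZE holds one row per AMP for each table, which is why the query above needs SUM. As a minimal sketch (the column aliases and the megabyte conversion are illustrative additions, not part of the original exercise), the same report can be sorted by size and shown in MB:

```sql
-- Sketch: total permanent space per TPCH table, largest first.
-- Assumes SELECT access on DBC; TotalPermBytes / TotalPermMB are illustrative aliases.
SELECT DatabaseName,
       TableName,
       SUM(CurrentPerm)                                            AS TotalPermBytes,
       CAST(SUM(CurrentPerm) / (1024 * 1024.00) AS DECIMAL(12,2))  AS TotalPermMB
FROM   DBC.TableSize
WHERE  DatabaseName = 'TPCH'
GROUP  BY DatabaseName, TableName
ORDER  BY TotalPermBytes DESC;
```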
Data Distribution Check

1) Create the Customer table in your user/database.
   - Keep the same definition (no fallback, same PI).
   - You can create the table and then load the data, or do both in one statement:

    CREATE TABLE TRAINER.CUSTOMER AS TPCH.CUSTOMER WITH DATA;

   Show the table and check the definition and the data in the table:

    show table TRAINER.Customer;

2) Check the table size by AMP.

    SELECT * FROM DBC.TABLESIZE
    where databasename = 'TRAINER' and Tablename = 'CUSTOMER';

    Vproc  DatabaseName  AccountName  TableName  CurrentPerm  PeakPerm
    0      TRAINER       DBC          CUSTOMER        540672    540672
    1      TRAINER       DBC          CUSTOMER        540160    540160

   The current PI of the table is C_CUSTKEY. Its degree of uniqueness is 100%, so the rows are spread almost evenly across the two AMPs:

    select count(distinct C_CUSTKEY), count(1) from TRAINER.CUSTOMER;

    Count(Distinct(C_CUSTKEY))  Count(1)
    6000                        6000
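The per-AMP sizes can be reduced to a single skew figure. The query below is a sketch using one common definition, 100 * (1 - average AMP size / largest AMP size); the formula and the aliases are assumptions for illustration, not part of the original exercise. For the two CUSTOMER rows shown above it works out to roughly 0.05%, i.e. practically no skew.

```sql
-- Sketch: skew of one table, computed from its per-AMP rows in DBC.TableSize.
-- SkewPct = 0 means perfectly even distribution; larger values mean more skew.
SELECT DatabaseName,
       TableName,
       SUM(CurrentPerm)                  AS TotalPermBytes,
       MAX(CurrentPerm)                  AS MaxAmpPermBytes,
       AVG(CAST(CurrentPerm AS FLOAT))   AS AvgAmpPermBytes,
       100 * (1 - AVG(CAST(CurrentPerm AS FLOAT))
                  / NULLIFZERO(MAX(CAST(CurrentPerm AS FLOAT)))) AS SkewPct
FROM   DBC.TableSize
WHERE  DatabaseName = 'TRAINER'
  AND  TableName    = 'CUSTOMER'
GROUP  BY DatabaseName, TableName;
```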
Change in PI

1) Consider a different PI for the CUSTOMER table. Check how C_MKTSEGMENT is distributed:

    select C_MKTSEGMENT, count(1)
    from TRAINER.CUSTOMER
    group by C_MKTSEGMENT;

    C_MKTSEGMENT  Count(1)
    FURNITURE         1169
    MACHINERY         1174
    BUILDING          1296
    HOUSEHOLD         1171
    AUTOMOBILE        1190

    select count(distinct C_MKTSEGMENT), count(1) from TRAINER.CUSTOMER;

    Count(Distinct(C_MKTSEGMENT))  Count(1)
    5                              6000

2) Create the table CUSTOMER_PI with the same definition as Customer, but with C_MKTSEGMENT as a non-unique PI.

    CREATE MULTISET TABLE TRAINER.Customer_PI ,NO FALLBACK ,
         NO BEFORE JOURNAL,
         NO AFTER JOURNAL
         (
          C_CUSTKEY INTEGER NOT NULL,
          C_NAME VARCHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_ADDRESS VARCHAR(40) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_NATIONKEY INTEGER NOT NULL,
          C_PHONE CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_ACCTBAL DECIMAL(15,2) NOT NULL,
          C_MKTSEGMENT CHAR(10) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_COMMENT VARCHAR(117) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
    PRIMARY INDEX ( C_MKTSEGMENT );

3) Check the size by AMP. With only 5 distinct PI values, the rows no longer split evenly across the two AMPs:

    Vproc  DatabaseName  AccountName  TableName    CurrentPerm  PeakPerm
    0      TRAINER       DBC          CUSTOMER_PI       444416    444416
    1      TRAINER       DBC          CUSTOMER_PI       635904    635904
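The slide does not show how CUSTOMER_PI is loaded, or how to preview a candidate PI before building the table. The sketch below assumes the data is copied from TRAINER.CUSTOMER with INSERT ... SELECT, and uses the Teradata hash functions HASHROW, HASHBUCKET and HASHAMP to count how many rows each AMP would receive if C_MKTSEGMENT were the PI. The aliases are illustrative.

```sql
-- Sketch: preview the row count per AMP for a candidate PI column,
-- without creating the table first.
SELECT HASHAMP(HASHBUCKET(HASHROW(C_MKTSEGMENT))) AS AmpNo,
       COUNT(*)                                   AS RowCnt
FROM   TRAINER.CUSTOMER
GROUP  BY 1
ORDER  BY 1;

-- Assumed load step: copy the rows into the table with the new PI.
INSERT INTO TRAINER.Customer_PI
SELECT * FROM TRAINER.CUSTOMER;
```

Because all rows with the same C_MKTSEGMENT value hash to the same AMP, five distinct values on two AMPs can at best split three to two, which is consistent with the uneven sizes in step 3.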
Fallback Impact

1) Create the table CUSTOMER_FB with FALLBACK on and check the table size.

    CREATE MULTISET TABLE TRAINER.Customer_FB , FALLBACK ,
         NO BEFORE JOURNAL,
         NO AFTER JOURNAL
         (
          C_CUSTKEY INTEGER NOT NULL,
          C_NAME VARCHAR(25) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_ADDRESS VARCHAR(40) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_NATIONKEY INTEGER NOT NULL,
          C_PHONE CHAR(15) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_ACCTBAL DECIMAL(15,2) NOT NULL,
          C_MKTSEGMENT CHAR(10) CHARACTER SET LATIN CASESPECIFIC NOT NULL,
          C_COMMENT VARCHAR(117) CHARACTER SET LATIN CASESPECIFIC NOT NULL)
    PRIMARY INDEX ( C_MKTSEGMENT );

2) Check the size by AMP.

    Vproc  DatabaseName  AccountName  TableName    CurrentPerm  PeakPerm
    0      TRAINER       DBC          CUSTOMER_FB      1079296   1079296
    1      TRAINER       DBC          CUSTOMER_FB      1079296   1079296

   - The total size is roughly doubled: 2,158,592 bytes, compared with 1,080,320 bytes for CUSTOMER_PI.
   - This is a two-AMP system, so each AMP holds the fallback copy of the other AMP's rows. Both AMPs therefore show the same size once FALLBACK is on.
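To see the fallback overhead next to the two earlier tables, the three sizes can be pulled in one query. This is a sketch that assumes CUSTOMER_FB is loaded from TRAINER.CUSTOMER in the same way as the other copies; the aliases are illustrative.

```sql
-- Assumed load step for the fallback table.
INSERT INTO TRAINER.Customer_FB
SELECT * FROM TRAINER.CUSTOMER;

-- Sketch: compare total and largest per-AMP size for the three Customer copies.
SELECT TableName,
       COUNT(*)         AS AmpCount,
       SUM(CurrentPerm) AS TotalPermBytes,
       MAX(CurrentPerm) AS MaxAmpPermBytes
FROM   DBC.TableSize
WHERE  DatabaseName = 'TRAINER'
  AND  TableName IN ('CUSTOMER', 'CUSTOMER_PI', 'CUSTOMER_FB')
GROUP  BY TableName
ORDER  BY TableName;
```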
Creating USI

1) Explain the query below. The PI of CUSTOMER_PI is C_MKTSEGMENT (the definition is unchanged from the Change in PI step), so a lookup on C_CUSTKEY cannot use the PI.

    EXPLAIN SELECT * FROM CUSTOMER_PI WHERE C_CUSTKEY = 1613;

   Note: it is doing a full table scan ("We do an all-AMPs RETRIEVE step from TRAINER.CUSTOMER_PI by way of an all-rows scan ...").

2) Create a USI on C_CUSTKEY.

    CREATE UNIQUE INDEX IDX_CKEY (C_CUSTKEY) ON CUSTOMER_PI;

3) Explain the same query again. Note the change in the plan to an index access and the change in the estimated response time (0.14 seconds to 0.02 seconds).

EXPLAIN - WITHOUT USI (output captured on CUSTOMER_FB, which has the same PI and no USI)

    1) First, we lock a distinct TRAINER."pseudo table" for read on a RowHash to prevent global deadlock for TRAINER.CUSTOMER_FB.
    2) Next, we lock TRAINER.CUSTOMER_FB for read.
    3) We do an all-AMPs RETRIEVE step from TRAINER.CUSTOMER_FB by way of an all-rows scan with a condition of ("TRAINER.CUSTOMER_FB.C_CUSTKEY = 1613") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 707 rows. The estimated time for this step is 0.14 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.14 seconds.

EXPLAIN - WITH USI

    1) First, we do a two-AMP RETRIEVE step from TRAINER.CUSTOMER_PI by way of unique index # 4 "TRAINER.CUSTOMER_PI.C_CUSTKEY = 1613" with no residual conditions. The estimated time for this step is 0.02 seconds.
    -> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.02 seconds.
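To confirm which indexes exist on the table, and to return to the pre-USI plan for comparison, something like the following can be used. It is a sketch: it relies on the index name IDX_CKEY from step 2 and assumes TRAINER is the default database.

```sql
-- List the indexes on the table; the USI on C_CUSTKEY should appear
-- alongside the primary index on C_MKTSEGMENT.
HELP INDEX TRAINER.CUSTOMER_PI;

-- Optional: drop the USI and re-run the EXPLAIN to see the all-rows scan again.
DROP INDEX IDX_CKEY ON TRAINER.CUSTOMER_PI;

EXPLAIN SELECT * FROM TRAINER.CUSTOMER_PI WHERE C_CUSTKEY = 1613;
```

The "two-AMP" wording in the USI plan reflects how a USI lookup works: one AMP holds the index subtable row for the key value, and that row points to the AMP holding the base row.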
Creating NUSI

1) Explain the query below.

    EXPLAIN select * from CUSTOMER_PI where C_NATIONKEY = 17;

   Note: it is doing a full table scan.

2) Create a NUSI on C_NATIONKEY.

    CREATE INDEX IDX_NTKEY (C_NATIONKEY) ON CUSTOMER_PI;

3) Explain the same query again and compare the two plans.
   - The index is not used. The plan is still an all-rows scan; only the row estimate (707 rows with no confidence becomes 283 rows with low confidence) and the estimated time (0.14 seconds becomes 0.13 seconds) change.
   - We can only create the index; we cannot enforce its usage. Whether it is used depends on the Optimizer.

EXPLAIN - WITHOUT NUSI

    1) First, we lock a distinct TRAINER."pseudo table" for read on a RowHash to prevent global deadlock for TRAINER.CUSTOMER_PI.
    2) Next, we lock TRAINER.CUSTOMER_PI for read.
    3) We do an all-AMPs RETRIEVE step from TRAINER.CUSTOMER_PI by way of an all-rows scan with a condition of ("TRAINER.CUSTOMER_PI.C_NATIONKEY = 17") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 707 rows. The estimated time for this step is 0.14 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.14 seconds.

EXPLAIN - WITH NUSI

    1) First, we lock a distinct TRAINER."pseudo table" for read on a RowHash to prevent global deadlock for TRAINER.CUSTOMER_PI.
    2) Next, we lock TRAINER.CUSTOMER_PI for read.
    3) We do an all-AMPs RETRIEVE step from TRAINER.CUSTOMER_PI by way of an all-rows scan with a condition of ("TRAINER.CUSTOMER_PI.C_NATIONKEY = 17") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 283 rows. The estimated time for this step is 0.13 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.13 seconds.
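Whether the Optimizer picks a NUSI depends largely on how selective it believes the column to be. The sketch below checks the real selectivity of C_NATIONKEY and then collects statistics on it before re-running the EXPLAIN. Collecting statistics here is a suggested follow-up, not a step from the original exercise, and the Optimizer may still prefer the full scan on a table this small.

```sql
-- How many rows does each nation key select? A NUSI pays off only
-- when a value matches a small fraction of the table.
SELECT C_NATIONKEY, COUNT(*) AS RowCnt
FROM   TRAINER.CUSTOMER_PI
GROUP  BY C_NATIONKEY
ORDER  BY RowCnt DESC;

-- Give the Optimizer accurate demographics for the indexed column.
COLLECT STATISTICS ON TRAINER.CUSTOMER_PI COLUMN (C_NATIONKEY);

-- Re-check the plan; look for an index access instead of "all-rows scan".
EXPLAIN SELECT * FROM TRAINER.CUSTOMER_PI WHERE C_NATIONKEY = 17;
```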
Creating JOIN Index

1) Explain the query below.

    EXPLAIN SELECT SUM(C_ACCTBAL) FROM CUSTOMER WHERE C_NATIONKEY = 10;

2) Create a single-table join index on CUSTOMER with C_NATIONKEY as its PI.

    CREATE JOIN INDEX CUSTOMER_JI AS
    SELECT C_CUSTKEY, C_NATIONKEY, C_ACCTBAL, C_MKTSEGMENT
    FROM CUSTOMER
    PRIMARY INDEX (C_NATIONKEY);

3) Run the same explain again. The Optimizer now answers the query from CUSTOMER_JI with a single-AMP primary index access instead of an all-AMPs scan of CUSTOMER.

EXPLAIN - WITHOUT JI

    1) First, we lock a distinct TRAINER."pseudo table" for read on a RowHash to prevent global deadlock for TRAINER.CUSTOMER.
    2) Next, we lock TRAINER.CUSTOMER for read.
    3) We do an all-AMPs SUM step to aggregate from TRAINER.CUSTOMER by way of an all-rows scan with a condition of ("TRAINER.CUSTOMER.C_NATIONKEY = 10"). Aggregate Intermediate Results are computed globally, then placed in Spool 3. The size of Spool 3 is estimated with high confidence to be 1 row. The estimated time for this step is 0.13 seconds.
    4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.03 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1.

EXPLAIN - WITH JI

    1) First, we do a single-AMP SUM step to aggregate from TRAINER.CUSTOMER_JI by way of the primary index "TRAINER.CUSTOMER_JI.C_NATIONKEY = 10" with no residual conditions, and the grouping identifier in field1. Aggregate Intermediate Results are computed locally, then placed in Spool 3. The size of Spool 3 is estimated with high confidence to be 1 row. The estimated time for this step is 0.03 seconds.
    2) Next, we do a single-AMP RETRIEVE step from Spool 3 (Last Use) by way of the primary index "TRAINER.CUSTOMER_JI.C_NATIONKEY = 10" into Spool 1 (one-amp), which is built locally on that AMP. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.03 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1.
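A join index is stored and maintained like a table, so it has its own definition and takes its own space. The sketch below shows one way to inspect it and to probe when the Optimizer can use it. The second EXPLAIN is an illustrative query, not part of the original exercise: it touches only columns held in CUSTOMER_JI but filters on a column that is not the join index PI, so whether the join index is chosen is up to the Optimizer.

```sql
-- Show the stored definition of the join index.
SHOW JOIN INDEX TRAINER.CUSTOMER_JI;

-- The query from the exercise: covered by the join index and
-- filtered on its primary index column.
EXPLAIN SELECT SUM(C_ACCTBAL) FROM TRAINER.CUSTOMER WHERE C_NATIONKEY = 10;

-- Illustrative variation: still covered, but not a PI access on the join index.
EXPLAIN SELECT SUM(C_ACCTBAL) FROM TRAINER.CUSTOMER WHERE C_MKTSEGMENT = 'BUILDING';

-- Remove the join index when the exercise is finished.
DROP JOIN INDEX TRAINER.CUSTOMER_JI;
```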
Collect Statistics

1) Run the explain below.

    EXPLAIN SELECT * FROM CUSTOMER_FB WHERE C_NATIONKEY = 10;

   Note that the estimate is made with no confidence and the estimated number of rows is 707.

2) Collect statistics.

    COLLECT STATISTICS ON CUSTOMER_FB COLUMN(C_NATIONKEY);

3) Explain again. Note the high confidence and the estimated number of rows, 290.

4) Compare with the actual count in the table.

    SELECT COUNT(1) FROM CUSTOMER_FB WHERE C_NATIONKEY = 10;

    246

EXPLAIN - WITHOUT STATISTICS

    1) First, we lock a distinct TRAINER."pseudo table" for read on a RowHash to prevent global deadlock for TRAINER.CUSTOMER_FB.
    2) Next, we lock TRAINER.CUSTOMER_FB for read.
    3) We do an all-AMPs RETRIEVE step from TRAINER.CUSTOMER_FB by way of an all-rows scan with a condition of ("TRAINER.CUSTOMER_FB.C_NATIONKEY = 10") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 707 rows. The estimated time for this step is 0.14 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.14 seconds.

EXPLAIN - WITH STATISTICS

    1) First, we lock a distinct TRAINER."pseudo table" for read on a RowHash to prevent global deadlock for TRAINER.CUSTOMER_FB.
    2) Next, we lock TRAINER.CUSTOMER_FB for read.
    3) We do an all-AMPs RETRIEVE step from TRAINER.CUSTOMER_FB by way of an all-rows scan with a condition of ("TRAINER.CUSTOMER_FB.C_NATIONKEY = 10") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 290 rows. The estimated time for this step is 0.13 seconds.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.13 seconds.
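Once statistics exist, they can be inspected and refreshed. The following is a sketch of the usual follow-up commands, assuming the statistics were collected on TRAINER.CUSTOMER_FB as in step 2; it is not part of the original exercise.

```sql
-- List the statistics defined on the table, with collection date
-- and the number of unique values recorded for each.
HELP STATISTICS TRAINER.CUSTOMER_FB;

-- After a large data change, re-collect every statistic already
-- defined on the table in one statement.
COLLECT STATISTICS ON TRAINER.CUSTOMER_FB;

-- Compare the Optimizer's estimate against reality for several values.
SELECT C_NATIONKEY, COUNT(*) AS ActualRows
FROM   TRAINER.CUSTOMER_FB
GROUP  BY C_NATIONKEY
ORDER  BY C_NATIONKEY;
```

In the run above the estimate moved from 707 rows to 290 rows against an actual count of 246, so the statistics brought the estimate much closer to the true value even though the access path stayed a full scan.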