Teradata Basics 12/4/2018 Sayrite Inc.
Features
• Capacity: terabytes of detailed data stored in billions of rows, with thousands of millions of instructions per second (MIPS) to process the data.
• Parallel processing: makes the Teradata Database faster than other relational systems.
• Single data store: can be accessed by network-attached and channel-attached systems, and supports the requirements of many diverse clients.
• Fault tolerance: automatically detects and recovers from hardware failures.
• Data integrity: ensures that transactions either complete or roll back to a stable state if a fault occurs.
• Scalable growth: allows expansion without sacrificing performance.
• SQL: serves as a standard access language that permits users to control data.
Teradata Database
The Teradata Database is an information repository supported by tools and utilities that make it, as part of the Teradata Warehouse, a complete and active relational database management system.
Attachment methods: the Teradata Database can use either of two attachment methods to connect to other operational computer systems:
a) Channel attachment - allows the system to be attached directly to an I/O channel of a mainframe computer.
b) Network attachment - allows the system to be attached to intelligent workstations through a Local Area Network (LAN).
How do you communicate with the Teradata Database? Using SQL: you can access, store, and operate on data using Teradata Structured Query Language (Teradata SQL). Teradata SQL is broadly compatible with IBM and ANSI SQL, and extends generic SQL with Teradata-specific statements and syntax.
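As a small illustration of the Teradata-specific extensions mentioned above, the sketch below shows standard ANSI SQL alongside the QUALIFY clause, a well-known Teradata extension for filtering on ordered analytic functions. The table and column names here are hypothetical.

```sql
-- ANSI-compatible SQL works unchanged:
SELECT AcctNo, AcctBal
FROM Accounts
WHERE AcctBal > 1000;

-- Teradata extension: QUALIFY filters on an ordered analytic function,
-- here keeping only the highest-balance account per branch.
SELECT BranchNo, AcctNo, AcctBal
FROM Accounts
QUALIFY ROW_NUMBER() OVER (PARTITION BY BranchNo ORDER BY AcctBal DESC) = 1;
```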
Shared Information Architecture
A design goal of the Teradata Database was to provide a single data store for a variety of client architectures. This single-source approach greatly reduces the data duplication and inaccuracies that can creep into data maintained in multiple stores. This approach to data storage is known as the single version of the truth, and Teradata uses the Shared Information Architecture (SIA) to implement it. The SIA eliminates the need to maintain duplicate databases on multiple platforms. With the SIA, mainframe clients, network-attached workstations, and personal computers can all access and manipulate the same database simultaneously.
Shared Information Architecture (continued)
The following figure illustrates the principle of the SIA. In this figure, the mainframes are attached via channel connections and the other systems are attached via network connections.
Relational Concepts
Relational Model: The relational model for database management is based on concepts derived from the mathematical theory of sets. Roughly speaking, set theory defines a table as a relation: the number of rows is the cardinality of the relation, and the number of columns is the degree.
Any manipulation of a table in a relational database has a consistent, predictable outcome, because the mathematical operations on relations are well defined. By comparison, database management products based on hierarchical, network, or object-oriented architectures are not built on such rigorous theoretical foundations, so their behavior is not as predictable as that of relational products.
The SQL Optimizer in the database uses relational algebra to build the most efficient access path to requested data, and it can readily adapt to changes in system variables by rebuilding access paths without programmer intervention.
Relational Concepts (continued)
Relational Database: Users perceive a relational database as a collection of objects (tables, views, macros, stored procedures, and triggers) that are easily manipulated using SQL directly or through specially developed applications.
Set Theory and Relational Database Terminology: Relational databases are a generalization of the relations of mathematical set theory, so the correspondence between set theory and relational database terms is not always direct:
Set Theory Term      Relational Database Term
Relation             Table
Tuple                Row (or record)
Attribute            Column
Tables, Rows, and Columns: Tables are two-dimensional objects consisting of rows and columns. Data is organized in table format and presented to the users of a relational database. References between tables define the relationships and constraints of the data inside the tables themselves.
Table Constraints: You can define conditions that must be met before the Teradata Database writes a given value to a column in a table. These conditions are called constraints. Constraints can include value ranges, equality or inequality conditions, and intercolumn dependencies. The Teradata Database supports constraints at both the column and table levels. During table creation and modification, you can specify constraints on single-column values as part of a column definition, or on multiple columns, using the CREATE TABLE and ALTER TABLE statements.
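The column- and table-level constraints described above can be sketched as follows; the table, column, and constraint names are hypothetical, and exact options vary by Teradata release:

```sql
CREATE TABLE Accounts (
    AcctNo   INTEGER NOT NULL,                          -- column-level constraint
    AcctType CHAR(2) CHECK (AcctType IN ('CK', 'SV')),  -- value-list constraint
    AcctBal  DECIMAL(12,2) DEFAULT 0.00,
    CONSTRAINT positive_bal CHECK (AcctBal >= 0)        -- named table-level constraint
)
UNIQUE PRIMARY INDEX (AcctNo);
```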
Tables
Permanent Tables
You can store the results of multiple SQL queries in tables. Permanent storage of tables is necessary when different sessions and users must share table contents.
Temporary Tables
When tables are required for only a single session, the system can use temporary tables. With this type of table, you can save query results for use in subsequent queries within the same session, and you can break complex queries into smaller ones by storing intermediate results in a temporary table. When the session ends, the system automatically drops the temporary table.
Global Temporary Tables
The contents of a global temporary table exist only for the duration of the SQL session in which the table is used. The contents are private to the session, and the system automatically drops the materialized table at the end of that session. However, the system saves the global temporary table definition permanently in the Data Dictionary. The saved definition may be shared by multiple users and sessions, with each session getting its own instance of the table.
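A global temporary table as described above might be defined as in this sketch; the table and column names are hypothetical:

```sql
-- The definition is stored permanently in the Data Dictionary;
-- each session materializes and populates its own private instance.
CREATE GLOBAL TEMPORARY TABLE SessionTotals (
    StoreNo  INTEGER,
    SalesAmt DECIMAL(12,2)
)
ON COMMIT PRESERVE ROWS;  -- keep rows across transactions within the session

INSERT INTO SessionTotals
SELECT StoreNo, SUM(SalesAmt)
FROM DailySales
GROUP BY StoreNo;
```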
Tables (continued)
Volatile Temporary Tables
If you need a temporary table for a single use only, you can define a volatile temporary table. The definition of a volatile temporary table resides in memory and does not survive a system restart. Volatile temporary tables improve performance even more than global temporary tables, because the system does not store their definitions in the Data Dictionary, and no access-rights checking is necessary because only the creator can access them.
Derived Tables
A special type of temporary table is the derived table, which you specify within an SQL SELECT statement. A derived table is obtained from one or more other tables as the result of a subquery, and its scope is limited to the level of the SELECT statement in which the subquery appears. Derived tables avoid the need for CREATE TABLE and DROP TABLE statements to store retrieved information, and they help in coding more sophisticated, complex queries.
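The two table types above can be sketched as follows; the table and column names are hypothetical:

```sql
-- Volatile table: definition kept in memory, visible only to this session,
-- dropped automatically at logoff.
CREATE VOLATILE TABLE TopAccts (
    AcctNo  INTEGER,
    AcctBal DECIMAL(12,2)
)
ON COMMIT PRESERVE ROWS;

-- Derived table: the parenthesized subquery acts as a temporary table
-- whose scope is just this SELECT statement.
SELECT b.BranchNo, b.AvgBal
FROM (SELECT BranchNo, AVG(AcctBal) AS AvgBal
      FROM Accounts
      GROUP BY BranchNo) AS b
WHERE b.AvgBal > 1000;
```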
Teradata Database Hardware and Software Architecture
The hardware that supports Teradata Database software is based on off-the-shelf symmetric multiprocessing (SMP) technology. The hardware can be combined with a communications network that connects the SMP systems to form massively parallel processing (MPP) systems. Topics include:
• SMP and MPP platforms
• Disk arrays
• Cliques
• Hot standby nodes
• Virtual processors
• Parsing engine request processing
SMP and MPP Platforms
The components of the SMP and MPP hardware platforms are:
1) Processor node: serves as the hardware platform upon which the database software operates.
SMP configuration: a hardware assembly containing several tightly coupled central processing units (CPUs). A single processor node is connected to one or more disk arrays, with the following installed on the node:
• Teradata Database software
• Client interface software
• Operating system
• Multiple processors with shared memory
• Failsafe power provisions
SMP and MPP Platforms (continued)
MPP configuration: an MPP is a configuration of two or more loosely coupled SMP nodes with shared SCSI access to multiple disk arrays.
2) BYNET: a hardware interprocessor network that links the nodes of an MPP system and implements broadcast, multicast, or point-to-point communication between processors, depending on the situation.
BYNET
At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. But the BYNET is much more than that, because the capabilities of the network range far beyond those of a simple system bus. The BYNET possesses high-speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.
A multinode system has at least two BYNETs. This creates a fault-tolerant environment and enhances interprocessor communication. Load-balancing software optimizes the transmission of messages over the BYNETs; if one BYNET fails, the second can handle the traffic.
The bandwidth for each network link to a processor node is 10 megabytes per second. The total throughput available for each node is 20 megabytes per second, because each node has two network links, and the bandwidth is linearly scalable: for example, a 16-node system has 320 megabytes per second of bandwidth for point-to-point connections. The total available broadcast bandwidth for any size system is 20 megabytes per second.
The BYNET software also provides a standard TCP/IP interface for communication among the SMP nodes.
BYNET (continued)
The following figure shows how the BYNET connects individual SMP nodes to create an MPP system.
Disk Arrays
Teradata employs Redundant Array of Independent Disks (RAID) storage technology to provide data protection at the disk level. You use the RAID Manager to group disk drives into arrays and ensure that data remains available in the event of a disk failure. Each array typically consists of one to four ranks of disks, with up to five disks per rank. "Redundant" implies that data, functions, or components are duplicated in the architecture of the array.
Logical Units
The RAID Manager uses drive groups. A drive group is a set of drives that have been configured into one or more logical units (LUNs). A LUN is a portion of every drive in a drive group, configured to represent a single disk. Each LUN is uniquely identified, and on NCR UNIX MP-RAS systems it is sliced into one or more UNIX slices. The operating system recognizes a LUN as its disk and is not aware that it is actually writing to spaces on multiple disk drives. This technique allows RAID technology to provide data availability without affecting the operating system. The PDE translates LUNs into virtual disks (vdisks) using slices (in NCR UNIX MP-RAS) or partitions (in Microsoft Windows).
Pdisks and Vdisks
A pdisk is the portion of a LUN that is assigned to an AMP. Each pdisk is uniquely identified and independently addressable. The group of pdisks assigned to an AMP is collectively identified as a vdisk. Using vdisks instead of direct connections to physical disk drives permits the use of RAID technology without affecting the Teradata Database.
Cliques
The clique is a feature of multinode systems that physically groups nodes together through multiported access to common disk array units. Internode disk array connections are made using SCSI buses. Shared SCSI-II paths provide redundancy, ensuring that the loss of a processor node or disk controller does not limit data availability. The nodes do not share data; they share only access to the disk arrays.
A clique is the mechanism that supports the migration of vprocs under PDE following a node failure. If a node in a clique fails, the AMP and PE vprocs migrate to other nodes in the clique and continue to operate while recovery occurs on their home node. PEs for channel-attached hardware cannot migrate, because they depend on the hardware physically attached to the node to which they are assigned. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMPs.
Cliques (continued)
The following figure illustrates a four-node standard clique.
Hot Standby Nodes
The Hot Standby Node feature allows spare nodes to be incorporated into the production environment, so that the Teradata Database can take advantage of the spare nodes to improve availability. A hot standby node is a node that:
• Is a member of a clique
• Does not normally participate in the trusted parallel application (TPA)
• Can be brought into the TPA to compensate for the loss of a node in the clique
Configuring a hot standby node can eliminate the system-wide performance degradation associated with the loss of a single node in a single clique. When a node fails, the Hot Standby Node feature migrates all AMP and PE vprocs on the failed node to other nodes in the system, including the node that you have designated as the hot standby.
Virtual Processors
The versatility of the Teradata Database is based on virtual processors (vprocs), which eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE), within the multitasking environment of the operating system. There are two types of vprocs:
• PE - performs session control and dispatching tasks, as well as parsing functions.
• AMP - performs the database functions that retrieve and update data on the vdisks.
A single system can support a maximum of 16,384 vprocs, and the maximum number of vprocs per node can be as high as 128. Each vproc is a separate, independent copy of the processor software, isolated from other vprocs but sharing some of the physical resources of the node, such as memory and CPUs. Multiple vprocs can run on an SMP platform or node. Vprocs, and the tasks running under them, communicate using unique-address messaging, as if they were physically isolated from one another. This message communication is done using the BYNET.
Parsing Engine
The PE is the vproc that communicates with the client system on one side and with the AMPs (via the BYNET) on the other. Each PE executes the database software that manages sessions, decomposes SQL statements into steps (possibly in parallel), and returns the answer rows to the requesting client. The PE software consists of the following elements:
• Parser - decomposes SQL into relational data management processing steps.
• Optimizer - determines the most efficient path to access data.
• Generator - generates and packages steps.
• Dispatcher - receives processing steps from the Parser and sends them to the appropriate AMPs; monitors the completion of steps and handles errors encountered during processing.
• Session Control - manages session activities, such as logon, password validation, and logoff; recovers sessions following client or server failures.
Access Module Processor
The AMP is the heart of the Teradata Database. The AMP is a vproc that controls the management of the Teradata Database and the disk subsystem, with each AMP being assigned to a vdisk. AMP functions include:
• Database management tasks: accounting, journaling, locking tables, rows, and databases, and output data conversion
• During query processing: sorting, joining data rows, and aggregation
• File-system management
• Disk space management
Each AMP, as represented in the following figure, manages a portion of the physical disk space, and stores its portion of each database table within that space.
Parsing Engine Request Processing
The SQL parser handles all incoming SQL requests in the following sequence:
1) The Parser looks in the Request cache to determine if the request is already there.
• If the request is in the Request cache, the Parser reuses the plastic steps found in the cache and passes them to gncApply (go to step 8), after checking access rights (step 4).
• If the request is not in the Request cache, the Parser begins processing the request with the Syntaxer.
2) The Syntaxer checks the syntax of the incoming request.
• If there are no errors, the Syntaxer converts the request to a parse tree and passes it to the Resolver.
• If there are errors, the Syntaxer passes an error message back to the requestor and stops.
3) The Resolver adds information from the Data Dictionary (or a cached copy of that information) to convert database, table, view, stored procedure, and macro names to internal identifiers.
Parsing Engine Request Processing (continued)
4) The security module checks access rights in the Data Dictionary.
• If the access rights are valid, the security module passes the request to the Optimizer.
• If the access rights are not valid, the security module aborts the request, passes an error message back to the requestor, and stops.
5) The Optimizer determines the most effective way to implement the SQL request.
6) The Optimizer scans the request to determine where to place locks, then passes the optimized parse tree to the Generator.
7) The Generator transforms the optimized parse tree into plastic steps and passes them to gncApply.
8) gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the AMPs that contain any needed user- or session-specific values and any needed data parcels.
9) gncApply passes the concrete steps to the Dispatcher.
The Dispatcher
The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET to be distributed to the AMP database management software, as follows:
1) The Dispatcher receives concrete steps from gncApply.
2) The Dispatcher places the first step on the BYNET; tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs; and waits for a completion response. Whenever possible, the Teradata Database performs steps in parallel to enhance performance. If there are no dependencies between a step and the following step, the following step can be dispatched before the first step completes, and the two execute in parallel. If there is a dependency (for example, the following step requires as input the data produced by the first step), the following step cannot be dispatched until the first step completes.
3) The Dispatcher receives a completion response from all expected AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.
The AMPs
The AMPs obtain the rows required to process the requests (assuming that the AMPs are processing a SELECT statement). The BYNET transmits messages to and from the AMPs. An AMP step can be sent to one of the following:
• One AMP
• A selected set of AMPs, called a dynamic BYNET group
• All AMPs in the system
The following figure is based on the example in the next section. If access is through a primary index and the request is for a single row, the PE transmits steps to a single AMP, as shown at PE1. If the request is for many rows (an all-AMP request), the PE makes the BYNET broadcast the steps to all AMPs, as shown at PE2. To minimize system overhead, the PE can send a step to a subset of AMPs.
Example: SQL Statement
As an example, consider the following Teradata SQL statements using a table containing checking account information. The example assumes that the AcctNo column is the unique primary index for Table_01.
1. SELECT * FROM Table_01 WHERE AcctNo = ;
2. SELECT * FROM Table_01 WHERE AcctBal > 1000 ;
In this example:
• PEs 1 and 2 receive requests 1 and 2.
• The data for the account is contained in table row R9 and stored on AMP1.
• Information about all account balances is distributed evenly among the disks of all four AMPs.
The sample Teradata SQL statements are processed in the following sequence:
1) PE 1 determines that the request is a primary index retrieval, which calls for the access and return of one specific row.
2) The Dispatcher in PE 1 issues a message to the BYNET containing an appropriate read step and R9/AMP 1 routing information. After AMP 1 returns the desired row, PE 1 transmits the data to the client.
3) The PE 2 Parser determines that this is an all-AMPs request, then issues a message to the BYNET containing the appropriate read step to be broadcast to all four AMPs.
4) After the AMPs return the results, PE 2 transmits the data to the client.
AMP steps are processed in the following sequence:
1) Lock - serializes access in situations where concurrent access would compromise data consistency.
2) Operation - performs the requested task. For complicated queries, there may be hundreds of operation steps.
3) End transaction - causes the locks acquired in step 1 to be released, and tells all AMPs that worked on the request that processing is complete.
Parallel Database Extensions
Parallel Database Extensions (PDE) are a software interface layer on top of the operating system, which can be either UNIX MP-RAS or Microsoft Windows. PDE provides the Teradata Database with the ability to:
• Run the Teradata Database in a parallel environment
• Execute vprocs
• Apply a flexible priority scheduler to Teradata Database sessions
Trusted Parallel Applications
The PDE provides a series of parallel operating system services to a special class of tasks called a trusted parallel application (TPA). On an SMP or MPP system, the TPA is the Teradata Database. TPA services include:
• Facilities to manage parallel execution of the TPA on multiple nodes
• Dynamic distribution of execution processes
• Coordination of all execution threads, whether on the same or on different nodes
• Balancing of the TPA workload within a clique
Software Fault Tolerance
The Teradata Database facilities for software fault tolerance are:
• Vproc migration
• Fallback tables
• AMP clusters
• Journaling
• Archive/Recovery
• Table Rebuild utility
Vproc Migration
The Parsing Engine (PE) and Access Module Processor (AMP) vprocs can migrate from their home node to another node within the same hardware clique if the home node fails for any reason. Vproc migration permits the system to remain fully functional during a node failure, with some degradation of performance due to the nonfunctional hardware.
The following figure illustrates vproc migration: the large X indicates a failed node, and the arrows pointing to the nodes still running indicate the migration of AMP3, AMP4, and PE2.
Note: PEs for channel-attached connections cannot migrate during a node failure, because they depend on the channel hardware physically attached to their node.
Fallback Tables
A fallback table contains a duplicate copy of each row of the primary table. Each fallback row is stored on an AMP different from the one to which the primary row hashes. This storage technique maintains availability should the system lose an AMP and its associated disk storage in a cluster; in that event, the system accesses the data in the fallback rows.
The disadvantage of fallback is that it doubles the storage space and the I/O (on INSERTs, UPDATEs, and DELETEs) for tables. The advantage is that data is almost never unavailable because of one down AMP. As a general rule, you should run all tables critical to your enterprise in fallback mode.
You specify whether a table has fallback protection using the CREATE TABLE (or ALTER TABLE) statement. The default is to create tables without fallback.
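Fallback protection as described above is declared among the table options in the DDL; the table name here is hypothetical:

```sql
-- FALLBACK tells the system to keep a second copy of every row
-- on a different AMP in the same cluster.
CREATE TABLE Accounts, FALLBACK (
    AcctNo  INTEGER NOT NULL,
    AcctBal DECIMAL(12,2)
)
UNIQUE PRIMARY INDEX (AcctNo);

-- Fallback can also be added to or removed from an existing table:
ALTER TABLE Accounts, NO FALLBACK;
```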
AMP Clusters
A cluster is a group of 2 to 16 AMPs that provide fallback capability for each other. A copy of each row is stored on a separate AMP in the same cluster. In a large system, you would probably create many AMP clusters. However, large or small, the concept of a cluster exists even if all the AMPs are in one cluster.
One-Cluster Configuration
The figures below illustrate AMP clustering. This figure shows fallback with one cluster, which is essentially an unclustered system. Note that the fallback copy of any row is always located on an AMP different from the AMP that holds the primary copy. The system becomes unavailable if two AMPs in a cluster go down. In this example, the data on AMP3 is fallback protected on AMPs 4, 5, and 6.
Smaller Cluster Configuration
The following figure illustrates smaller clusters. Decreasing cluster size reduces the likelihood that two AMP failures will occur in the same cluster. The illustration shows the same 8-AMP configuration, now partitioned into two clusters of four AMPs each. Compare this clustered configuration with the earlier illustration of an unclustered AMP configuration.
In this example, the (primary) data on AMP3 is backed up on AMPs 1, 2, and 4, and the data on AMP6 is backed up on AMPs 5, 7, and 8. If AMPs 3 and 6 fail at the same time, the system continues to function normally. Only if two failures occur within the same cluster does the system halt.
Journaling
The Teradata Database supports tables that are devoted to journaling. A journal is a record of some kind of activity. The capabilities of the different Teradata Database journals are:
Down-AMP recovery journal
• Is active during an AMP failure only
• Journals fallback tables only
• Is used to recover the AMP after the AMP is repaired, then is discarded
• Always occurs
Transient journal
• Logs BEFORE images for all transactions
• Is used by the system to roll back failed transactions aborted either by the user or by the system
• Captures begin/end transaction indicators
• Captures "before" row images for UPDATE and DELETE statements
• Captures row IDs for INSERT statements
• Captures control records for CREATE and DROP statements
• Keeps each image on the same AMP as the row it describes
• Discards images when the transaction or rollback completes
Permanent journal
• Is active continuously
• Is available for tables or databases
• Can contain "before" images, which permit rollback, or "after" images, which permit rollforward, or both
• Provides rollforward recovery
• Provides rollback recovery
• Provides full recovery of nonfallback tables
• Reduces the need for frequent full-table archives
• Occurs as specified by the user
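Permanent journaling as described above is requested in the DDL; in this sketch the database and table names are hypothetical, and the exact journal options vary by release:

```sql
-- A permanent journal table is declared for a database, and individual
-- tables then opt in to before- and/or after-image journaling.
CREATE DATABASE Sales AS
    PERM = 10000000000,
    DEFAULT JOURNAL TABLE = Sales.SalesJnl;

CREATE TABLE Sales.Orders, FALLBACK,
    BEFORE JOURNAL,      -- "before" images permit rollback
    DUAL AFTER JOURNAL   -- duplicated "after" images permit rollforward
(
    OrderNo INTEGER NOT NULL,
    Amount  DECIMAL(12,2)
)
UNIQUE PRIMARY INDEX (OrderNo);
```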
Teradata Archive/Recovery
The Teradata Archive/Recovery utility backs up and restores data for channel-attached and network-attached clients.
Archive data - copy all or selected:
• Tables
• Databases
• Data Dictionary tables
Note: If your system is used only for decision support and is updated regularly with data loads, you may not want to archive the data.
Restore data - copy an archive from the client or server back to the database, and restore data to all AMPs, to clusters of AMPs, or to a specific AMP (as long as the Data Dictionary contains the definitions of the table or database you want to restore).
Table Rebuild Utility
Use the Table Rebuild utility to recreate a table, database, or entire disk on a single AMP under the following conditions:
• The table structure or data is damaged because of a software problem, head crash, power failure, or other malfunction.
• The affected tables are enabled for fallback protection.
Table Rebuild can create all of the following on an AMP-by-AMP basis:
• Primary or fallback portions of a table
• An entire table (both primary and fallback portions)
• All tables in a database
• All tables on an individual AMP
Hardware Fault Tolerance
The Teradata Database provides the following facilities for hardware fault tolerance:
Multiple BYNETs - multinode Teradata Database servers are equipped with at least two BYNETs. Interprocessor traffic is never stopped unless both BYNETs fail, and within a BYNET, traffic can often be rerouted around failed components.
RAID disk units - Teradata Database servers use Redundant Arrays of Independent Disks (RAID) configured as RAID 1, RAID 5, or RAID S. Non-array storage cannot use RAID technology.
• RAID 1 arrays offer mirroring, the method of maintaining identical copies of data.
• RAID 5 or RAID S protects data from single-disk failures with a 25 percent increase in disk storage to provide parity.
• RAID 1 provides better performance and data protection than RAID 5/RAID S, but is more expensive.
Hot swap capability for node components - the Teradata Database can allow some components to be removed and replaced while the system is running. This process is known as hot swap. The Teradata Database offers hot swap capability for:
• Disks within RAID arrays
• Fans
• Power supplies
Cliques - a clique is a group of nodes sharing access to the same disk arrays. A clique supports the migration of vprocs following a node failure.
Others - battery backup, power supplies, and fans.
Communication Between the Client and the Teradata Database
Clients can connect to the Teradata Database using one of the following methods:
• Channel-attached through an IBM mainframe
• Network-attached through a Local Area Network (LAN)
Client applications that manipulate data on the Teradata Database server communicate with the database indirectly, by means of communications interfaces:
• Call Level Interface Version 2 (CLIv2) for channel-attached systems
• Call Level Interface Version 2 (CLIv2) for network-attached systems
Both versions provide the same functions. CLIv2 is a library of service routines that act as subroutines of the application; the modules in the CLIv2 library vary based on whether the client is channel- or network-attached.
Other types of communications interfaces are available, including interfaces for systems running Microsoft Windows 2000 and for systems running NCR UNIX MP-RAS:
• Windows Call Level Interface (WinCLI) (Windows-based systems)
• Open Database Connectivity (ODBC) (Windows- and UNIX MP-RAS-based systems)
• Java Database Connectivity (JDBC) (Windows- and UNIX MP-RAS-based systems)
CLIv2 for Channel-Attached Systems
CLIv2 is a collection of callable service routines that provide the interface between applications and the Teradata Director Program (TDP) on an IBM mainframe client. TDP is the interface between CLIv2 and the Teradata Database server. CLIv2 can operate with all versions of IBM operating systems, including Multiple Virtual Storage (MVS), OS/390, Customer Information Control System (CICS), Information Management System (IMS), and Virtual Machine (VM).
What CLIv2 for Channel-Attached Clients Does
By way of TDP, CLIv2 sends requests to the server and provides the application with the responses returned from the server. CLIv2 provides support for:
• Managing multiple serially executed requests in a session
• Managing multiple simultaneous sessions to the same or different servers
• Using cooperative processing, so that the application can perform operations on the client and the server at the same time
Teradata Director Program
TDP manages communications between CLIv2 and a server. Functions of TDP include:
• Session initiation and termination
• Logging, verification, recovery, and restart
• Physical input to and output from the server, including session balancing and queue maintenance
• Security
43
CLIv2 for Network-Attached Systems
CLIv2 is a collection of callable service routines that provide the interface between applications on a LAN-connected client and the Teradata Database server.
What CLIv2 for Network-Attached Clients Does
CLIv2 is the interface between the application program and the Micro Teradata Director Program (MTDP). CLIv2 can:
• Build parcels that MTDP packages for sending to the Teradata Database using the Micro Operating System Interface (MOSI)
• Provide the application with a pointer to each of the parcels returned from the Teradata Database
Micro Teradata Director Program
MTDP must be linked to applications that will be network-connected to the Teradata Database. MTDP performs many of the same functions as the channel-based TDP, including:
• Session initiation and termination
• Physical input to and output from the server
• Logging, verification, recovery, and restart
Unlike TDP, MTDP does not control session balancing.
44
Teradata Application Development
Application development for the Teradata Database falls into one of two categories:
• Explicit SQL
• Implicit SQL
Explicit SQL Development
The following tools are examples of explicit SQL application development:
• Embedded SQL
• Macros
• Teradata stored procedures
• The EXPLAIN statement
Implicit SQL Development
Teradata and third-party products are examples of implicit SQL application development.
45
Embedded SQL
What Is Embedded SQL?
When you write applications using embedded SQL, you insert SQL statements into an application program written in one of the supported host languages, such as C or COBOL. Because third-generation application development languages have no facilities for dealing with result sets, embedded SQL adds extensions to executable SQL that permit declarations. Embedded SQL declarations include:
• Code to encapsulate the SQL from the application language
• Cursor definition and manipulation
How Does an Application Program Use Embedded SQL?
The client application languages that support embedded SQL are all compiled languages, and SQL is not defined for any of them. For this reason, you must precompile your embedded SQL code to translate the SQL into native code before you can compile the source using a native compiler. The precompiler tool is called Preprocessor2, and you use it to:
• Read your application source code to find the defined SQL code fragments
• Interpret the intent of the code after it isolates all the SQL code in the application, and translate it into Call Level Interface (CLI) calls
• Comment out all the SQL source
The output of the precompiler is native-language source code with CLI calls substituted for the SQL source. After the precompiler generates the output, you can process the converted source code with the native language compiler.
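As a minimal sketch of what the Preprocessor2 input looks like, here is a hypothetical embedded SQL fragment in a C host program (the table and host-variable names are illustrative, not from the slides):

```sql
/* Hypothetical C host-program fragment; Preprocessor2 replaces
   the EXEC SQL sections with CLI calls at precompile time. */
EXEC SQL BEGIN DECLARE SECTION;
  char dept_name[11];   /* host variable receiving the result */
  long dept_no;         /* host variable used in the WHERE clause */
EXEC SQL END DECLARE SECTION;

EXEC SQL
  SELECT DeptName
  INTO   :dept_name     /* colon marks a host-variable reference */
  FROM   Department
  WHERE  DeptNo = :dept_no;
```

The DECLARE SECTION is the "code to encapsulate the SQL from the application language" mentioned above: it tells the precompiler which C variables may appear, colon-prefixed, inside SQL statements.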
46
Macro
Teradata macros are SQL statements that the server stores and executes. Macros provide an easy way to execute frequently used SQL operations, and are particularly useful for enforcing data integrity rules, providing data security, and improving performance.
SQL Used to Create a Macro
You use the CREATE MACRO statement to create Teradata macros. For example, suppose you want to define a macro for adding new employees to the Employee table and incrementing the EmpCount field in the Department table. The CREATE MACRO statement looks like this:
CREATE MACRO NewEmp (name (VARCHAR(12)),
                     number (INTEGER, NOT NULL),
                     dept (INTEGER, DEFAULT 100))
AS (INSERT INTO Employee (Name, EmpNo, DeptNo)
    VALUES (:name, :number, :dept);
    UPDATE Department
    SET EmpCount = EmpCount + 1
    WHERE DeptNo = :dept;);
This macro defines parameters that users must fill in each time they execute the macro. A leading colon (:) indicates a reference to a parameter within the macro.
47
Macro Example
Macro Usage
The following example shows how to use the NewEmp macro to insert data into the Employee and Department tables. The information to be inserted is the name, employee number, and department number for employee H. Goldsmith. The EXECUTE macro statement looks like this:
EXECUTE NewEmp ('Goldsmith H', 10015, 600);
SQL Used to Modify a Macro
The following example shows how to modify a macro. Suppose you want to change the NewEmp macro so that the default department number is 300 instead of 100. The REPLACE MACRO statement looks like this:
REPLACE MACRO NewEmp (name (VARCHAR(12)),
                      number (INTEGER, NOT NULL),
                      dept (INTEGER, DEFAULT 300))
AS (INSERT INTO Employee (Name, EmpNo, DeptNo)
    VALUES (:name, :number, :dept);
    UPDATE Department
    SET EmpCount = EmpCount + 1
    WHERE DeptNo = :dept;);
SQL Used to Delete a Macro
The example that follows shows how to delete a macro. Suppose you want to drop the NewEmp macro from the database. The DROP MACRO statement looks like this:
DROP MACRO NewEmp;
48
Stored Procedures
Teradata stored procedures are database applications created by combining SQL control statements with other SQL elements and condition handlers. They provide a procedural interface to the Teradata Database and many of the same benefits as embedded SQL.
SQL Used to Create Stored Procedures
Teradata SQL supports creating, modifying, dropping, renaming, and controlling access rights of stored procedures through DDL and DCL statements. You can create or replace a stored procedure through the COMPILE command in the Basic Teradata Query facility (BTEQ) and in BTEQ for Microsoft Windows systems (BTEQWIN); you must specify a source file as input for the COMPILE command. You can also create or modify a stored procedure using the CREATE PROCEDURE or REPLACE PROCEDURE statement from CLIv2, ODBC, and JDBC applications.
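As a hedged sketch of the BTEQ route (the file name is hypothetical), the COMPILE command takes the procedure source file as input:

```sql
.COMPILE FILE = newproc.spl
```

The file would contain the CREATE PROCEDURE text, such as the NewProc definition shown on the next slide.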
49
Stored Procedure Example
Assume you want to create a stored procedure named NewProc that you can use to add new employees to the Employee table and retrieve the name of the department to which an employee belongs. You can also report an error in case the row that you are trying to insert already exists, and handle that error condition.
The following stored procedure definition includes nested, labeled compound statements. The compound statement labeled L3 is nested within the outer compound statement L1. Note that the compound statement labeled L2 is the handler action clause of the condition handler. This stored procedure defines parameters that must be filled in each time it is called (executed).
CREATE PROCEDURE NewProc (IN name CHAR(12),
                          IN num INTEGER,
                          IN dept INTEGER,
                          OUT dname CHAR(10),
                          INOUT p1 VARCHAR(30))
L1: BEGIN
      DECLARE CONTINUE HANDLER FOR SQLSTATE VALUE '23505'
      L2: BEGIN
            SET p1 = 'Duplicate Row';
          END L2;
      L3: BEGIN
            INSERT INTO Employee (EmpName, EmpNo, DeptNo)
            VALUES (:name, :num, :dept);
            SELECT DeptName INTO :dname FROM Department
            WHERE DeptNo = :dept;
            IF SQLCODE <> 0 THEN LEAVE L3;
            ...
          END L3;
    END L1;
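Once created, the procedure is executed with a CALL statement. A hedged sketch (the argument values are hypothetical; dname names the OUT parameter, and an initial value is supplied for the INOUT parameter p1):

```sql
CALL NewProc ('Goldsmith H', 10015, 600, dname, 'OK');
```

After the call, the client receives the values of dname and p1; if the insert hit a duplicate row, p1 comes back as 'Duplicate Row' set by the condition handler.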
50
EXPLAIN Statement
Teradata SQL supplies a very powerful EXPLAIN statement that allows you to see the execution plan of a query. The EXPLAIN modifier in front of any SQL statement displays the execution plan for that statement, which is parsed and optimized in the usual fashion but is not submitted for execution. The EXPLAIN statement not only explains how a statement will be processed, but also provides an estimate of the number of rows involved and the performance impact of the request.
When you perform an EXPLAIN against any SQL statement, that statement is parsed and optimized. The access and join plans generated by the Optimizer are returned in the form of a text file that explains the (possibly parallel) steps used in the execution of the statement. Also included is the relative time required to complete the statement, given the statistics with which the Optimizer had to work. If the statistics are not reasonably accurate, the time estimate may not be accurate.
EXPLAIN helps you evaluate complex queries and develop alternative, more efficient processing strategies. You may be able to get a better plan by collecting statistics on more columns, or by defining additional secondary indexes. Your knowledge of the actual demographics may allow you to identify row count estimates that seem badly wrong, and help to pinpoint areas where additional statistics would be helpful.
51
Example of EXPLAIN Usage
EXPLAIN select b.department_name
from customer_service.department a, customer_service.department b
where a.budget_amount > b.budget_amount
and a.department_name = 'research and development';
*** Help information returned. 21 rows.
*** Total elapsed time was 1 second.
Explanation
1) First, we lock a distinct customer_service."pseudo table" for read on a RowHash to prevent global deadlock for customer_service.b.
2) Next, we lock customer_service.b for read.
3) We do a two-AMP RETRIEVE step from customer_service.a by way of unique index # 4 "customer_service.a.department_name = 'research and development'" with no residual conditions into Spool 2, which is duplicated on all AMPs. The size of Spool 2 is estimated with high confidence to be 80 to 6,400 rows. The estimated time for this step is 0.21 seconds.
4) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to customer_service.b. Spool 2 and customer_service.b are joined using a product join, with a join condition of ("Spool_2.budget_amount > customer_service.b.budget_amount"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 80 rows. The estimated time for this step is 0.20 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.40 seconds.
52
Teradata Indexes
Indexing is one of the most important features of the Teradata RDBMS. In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows; it can also be used to enforce the primary key and unique constraints for a table. The Teradata RDBMS supports five types of indexes:
• Unique Primary Index (UPI)
• Unique Secondary Index (USI)
• Non-Unique Primary Index (NUPI)
• Non-Unique Secondary Index (NUSI)
• Join Index
The typical index contains two fields: a value and a pointer to instances of that value in a data table. Because the Teradata RDBMS uses hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash, which is used as the pointer. The row hash is not the value itself, but a mathematically transformed address, and the Teradata RDBMS uses this transformed address as a retrieval index.
53
Teradata Indexes
The following rules apply to indexes used in the Teradata relational database:
• An index is a scheme used to distribute and retrieve rows of a data table. It can be based on the values in one or more columns of the table.
• A table can have a number of indexes, including one primary index and up to 32 secondary indexes.
• An index for a relational table may be primary or secondary, and may be unique or non-unique. Each kind of index affects system performance and can be important to data integrity.
• An index is usually defined on a table column whose values are frequently used in WHERE constraints or join conditions.
• An index is used to enforce PRIMARY KEY and UNIQUE constraints. The CREATE TABLE statement allows UNIQUE and PRIMARY KEY constraints to be defined on a table, and each index may be given a name, which allows Teradata SQL statements to refer to it.
54
Primary Index
The primary index determines the distribution of table rows on the disks controlled by AMPs. In the Teradata RDBMS, a primary index is required for row distribution and storage. When a new row is inserted, its hash code is derived by applying a hashing algorithm to the value in the column(s) of the primary index (as shown in the following figure). Rows having the same primary index value are stored on the same AMP.
55
Rules for defining a Primary Index
The primary index for a table should represent the data values most used by the SQL that accesses the table. Careful selection of the primary index is one of the most important steps in creating a table. Defining a primary index should follow these rules:
• A primary index should be defined to provide a nearly uniform distribution of rows among the AMPs. The more unique the index, the more even the distribution of rows and the better the space utilization.
• The index should be defined on as few columns as possible.
• A primary index can be either unique or non-unique. A unique index must have a unique value in the corresponding fields of every row; a non-unique index permits the insertion of duplicate field values. The unique primary index is more efficient.
• Once created, the primary index cannot be dropped or modified; the index must be changed by recreating the table.
56
Creating primary index
A unique primary index for a table is created using the UNIQUE PRIMARY INDEX clause of the CREATE TABLE statement. Non-unique primary indexes are created in the same way, but omit the keyword UNIQUE. If an index is defined on more than one column, all index columns must be specified in the WHERE clause of a request in order for a row or rows to be directly accessed. Once created, the primary index cannot be dropped or modified; the index must be changed by recreating the table.
Example
create table organic
  (serial_No integer,
   organic_name char(15),
   Carbon_number smallint,
   amount smallint)
unique primary index (serial_No);
57
Hashing and Data placement
Traditional file systems: rows are stored either randomly or sequentially within pre-allocated file space, with some space reserved for overflow. Rows for any given table may span one or more disk drives. This file storage technique tends to serialize access, and adding a large number of rows often requires a total reorganization or a migration.
Traditional databases: they use value-based table partitioning, which requires sizing. Pre-allocation and placement are manpower-intensive and complex, because partitions need to be monitored and adjusted over time. As such databases grow, unloading and reloading the data to re-align partitions is common.
58
Hashing and Data placement
In a Teradata database, the rows of every table are distributed randomly and evenly across all of the VPROCs (the units of parallelism). The DBA is never given a choice to populate only some selected VPROCs or nodes. This even and automatic distribution ensures equal processing effort, as well as data balance across the entire system, no matter how large it grows or what type of query activity it faces. Achieving this balance depends on the table's primary index columns being unique or nearly unique. With Teradata there is never a need for DBA-intensive activities such as database reorgs. Teradata's data placement addresses unbalanced work across parallel units or nodes in the system.
59
Teradata's hash partitioning
Teradata assigns data rows to VPROCs using a single partitioning scheme that enforces even distribution of data: hash partitioning. The value found in the table's primary index is put through the hashing algorithm, and two outputs are produced:
• A hash bucket, which maps to one VPROC in the system
• A hash-ID, which becomes the physical identifier of the row on disk
60
Teradata's hash partitioning
To retrieve a row, the primary index value is again passed to the hashing algorithm, which generates the two hash outputs: the hash bucket pointing to the VPROC, and the hash-ID locating the row on that particular VPROC's disk. In addition to being a distribution technique, this hash approach also serves as an indexing strategy. There is no space or processing overhead involved in either building a primary index or accessing a row through its primary index value, as no special index structure is built to support the primary index.
Evenly distributing up to 100 terabytes of data is the key to evenly distributing the application workload:
• Aggregate processing is shared across VPROCs using hashing
• Hashing helps balance join processing
• Hashing is the foundation of the Reconfiguration Utility
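Teradata exposes its hashing functions in SQL, which makes it easy to check how evenly a candidate primary index column would spread rows. A hedged sketch against the organic table defined earlier (HASHROW, HASHBUCKET, and HASHAMP are standard Teradata functions; the column choice is illustrative):

```sql
-- Count how many rows of organic would land on each AMP
-- if serial_No is used as the primary index column.
SELECT HASHAMP(HASHBUCKET(HASHROW(serial_No))) AS amp_no,
       COUNT(*)                                AS row_count
FROM organic
GROUP BY 1
ORDER BY 1;
```

A nearly equal row_count per AMP indicates the even distribution described above; a badly skewed count signals a poor primary index choice.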
61
PI - Hashing
Teradata's primary index is a direct hash, which is extremely fast for most operations. There are two levels of indexing for the hash codes. The first level is locked in memory, which guarantees that the worst case for access to a single row is 2 I/Os. I/O caching gives a high probability that the second-level index is in cache, and some chance that the data block is in memory as well.
Advantages
• Fast access for single-row operations
• The hashed structure is sorted by hash and optimized for use by the hash merge join algorithm. No preparation of the table is necessary if the table is joined via its primary index; the join algorithm reads it directly.
The value contained in the indexed columns is used by Teradata to determine the VPROC that will own the data, as well as a logical storage location within the VPROC's associated disk space.
62
Access data using primary index
The primary index should be based only on an equality search. When a query contains a WHERE constraint on the unique primary index value(s), the request is processed by hashing the value to locate the AMP where the row is stored, and then retrieving the row that contains a matching value in the hash code portion of its rowID.
For example, since employee_number is the unique primary index for the customer_service.employee table, assume that an employee_number value is used as an equality constraint in a request as follows:
SELECT employee_number
FROM customer_service.employee
WHERE employee_number = 1024;
This request is processed by hashing 1024 to do the following:
• Locate the AMP where the row is stored.
• Retrieve the row that contains a matching value in the hash code portion of its rowID.
The Teradata RDBMS processes data most efficiently if table rows are uniformly distributed (hashed) across the AMPs on which they are stored.
63
Primary index versus primary key
The column(s) chosen as the primary index for a table are frequently the same as the primary key identified during the data modeling process, but there are conceptual differences between the two terms:
• Definition: A primary key is a relational concept used to determine relationships among entities and to define referential constraints; a primary index is used to store rows on disk.
• Requirement: A primary key is not required unless referential integrity checks are to be performed; a primary index is required.
• Defining: Both are defined by the CREATE TABLE statement.
• Uniqueness: A primary key is unique; a primary index is unique or non-unique.
• Function: A primary key identifies a row uniquely; a primary index distributes rows.
• Values can be changed? Primary key: no; primary index: yes.
• Can be null? Primary key: no; primary index: yes.
• Related to access path? Primary key: no; primary index: yes.
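To make the distinction concrete, here is a hedged sketch (the table and column names are hypothetical) of a table whose primary key and primary index are different columns: Teradata distributes rows by dept_no, while emp_no remains the logical row identifier:

```sql
CREATE TABLE employee_demo
 (emp_no   INTEGER NOT NULL,   -- primary key: logical row identifier
  dept_no  INTEGER,            -- primary index: drives row distribution
  emp_name VARCHAR(30),
  CONSTRAINT emp_pk PRIMARY KEY (emp_no))
PRIMARY INDEX (dept_no);
```

Here the non-unique primary index on dept_no determines which AMP each row hashes to, while the PRIMARY KEY constraint on emp_no enforces uniqueness for referential purposes.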
64
Secondary Index
In addition to the primary index, up to 32 unique and non-unique secondary indexes can be defined for a table. Compared to primary indexes, secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a subtable that is stored on all AMPs, but separately from the primary table. The subtables, which are built and maintained by the system, contain the following:
• RowIDs of the subtable rows
• Base table index column values
• RowIDs of the base table rows (pointers)
65
Secondary Index
As shown in the following figure, the secondary index subtable on each AMP is associated with the base table by the rowID.
66
Defining and creating secondary index
Secondary indexes are optional. Unlike the primary index, a secondary index can be added or dropped without recreating the table. You can define one or more secondary indexes in the CREATE TABLE statement, or add them to an existing table using the CREATE INDEX statement or the ALTER TABLE statement. DROP INDEX can be used to drop a named or unnamed secondary index. Because secondary indexes require subtables, they require additional disk space and, therefore, may require additional I/Os for INSERTs, DELETEs, and UPDATEs. Generally, secondary indexes are defined on column values frequently used in WHERE constraints.
Example:
create table organic
  (serial_No integer,
   organic_name char(15),
   Carbon_number smallint,
   amount smallint)
unique primary index (serial_No),
unique index (organic_name);
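Because secondary indexes can be added and dropped without recreating the table, the organic_name index could instead be managed on the existing table. A hedged sketch, assuming the index was not already declared in the CREATE TABLE statement (the index name is illustrative):

```sql
-- Add a named secondary index to the existing table ...
CREATE UNIQUE INDEX org_name_idx (organic_name) ON organic;

-- ... and drop it again when it is no longer needed.
DROP INDEX org_name_idx ON organic;
```

This create-use-drop pattern matches the space-saving advice on a later slide: build a rarely used secondary index just before the workload that needs it, then drop it.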
67
USI - Hashing
If an index is defined as unique, it functions in Teradata as a global index. The value being indexed is hashed, and the resulting hash bucket determines which VPROC will hold the index information for that value. A given VPROC's portion of the USI subtable holds rows that point to base table rows residing on any of the VPROCs across the system.
To access a base table row using a USI, the key value is hashed, the index entry is located within the VPROC identified by the resulting hash bucket, and the rowID of the base table row is used to locate that row's VPROC and hash-ID. This operation usually involves two different VPROCs: one that holds the index entry, and a second that holds the associated base table row, as shown in the following illustration.
68
NUSI - Hashing
Non-unique secondary indexes in Teradata are always local structures and do not involve hashing. That is, a given VPROC's NUSI subtables contain entries only for base table rows owned by that VPROC. Each VPROC has a secondary index row for each distinct secondary index value found within its subset of base table rows. The rowIDs point to base table rows on this VPROC only.
69
Access data using secondary index
If a Teradata SQL request uses secondary index values in a WHERE constraint, the Optimizer may use the rowID in a secondary index subtable to access the qualifying rows in the data table. If a secondary index is used only periodically by certain applications and is not routinely used by most applications, disk space can be saved by creating the index when it is needed and dropping it immediately after use.
A unique secondary index (USI) is very efficient: it typically involves only two AMPs, requires no spool file, and has one row per value. Unique secondary indexes can thus improve performance by avoiding the overhead of scanning all AMPs. For example, if a unique secondary index is defined on the department_name column of the customer_service.department table (assuming that no two departments have the same name), then the following query is processed using two AMPs:
SELECT department_number
FROM customer_service.department
WHERE department_name = 'Education';
In this example, the request is sent to AMP n, which contains the rowID for the secondary index value 'Education'. This AMP, in turn, sends the request to AMP m, where the data row containing that value is stored. Note that the rowID and the data row may reside on the same AMP, in which case only one AMP is involved.
70
Access data using secondary index
A non-unique secondary index (NUSI) may have multiple rows per value. As a general rule, a NUSI should not be defined if the maximum number of rows per value exceeds the number of data blocks in the table. A NUSI is efficient only if the number of rows accessed is a small percentage of the total number of data rows in the table. It can be useful for complex conditional expressions or for processing aggregates.
For example, if the contact_name column is defined as a secondary index for the customer_service.contact table, the following statement can be processed using the secondary index:
SELECT *
FROM customer_service.contact
WHERE contact_name = 'Mike';
After the request is submitted, the Optimizer first determines whether it is faster to do a full-table scan of the base table rows or a full-table scan of the secondary index subtable to get the rowIDs of the qualifying base table rows; it then places those rowIDs into a spool file, and finally uses the resulting rowIDs to access the base table rows. Non-unique secondary index access is used for request processing only when it is less costly than a complete table search.
71
Join Index
A join index is an indexing structure containing columns from multiple tables, specifically the result columns from one or more tables. Rather than having to join the individual tables each time the join operation is needed, the query can be resolved via the join index and, in most cases, performance improves dramatically.
Effects of a Join Index
Depending on the complexity of the joins, the join index helps improve the performance of certain types of work. The following need to be considered when manipulating join indexes:
• Load utilities: Join indexes are not supported by the MultiLoad and FastLoad utilities; they must be dropped and recreated after the table has been loaded.
• Archive and Restore: Archive and Restore cannot be used on a join index itself.
• Fallback protection: Join index subtables cannot be Fallback-protected.
• Permanent journal recovery: The join index is not automatically rebuilt during the recovery process.
• Triggers: A join index cannot be defined on a table with triggers.
• Collecting statistics: In general, there is no benefit in collecting statistics on a join index for joining columns specified in the join index definition itself.
72
Defining, creating, accessing data using join index
Join indexes can be created and dropped using the CREATE JOIN INDEX and DROP JOIN INDEX statements. Join indexes are automatically maintained by the system when updates (UPDATE, DELETE, and INSERT) are performed on the underlying base tables; additional steps are included in the execution plan to regenerate the affected portion of the stored join result.
Example:
create join index all_chemical as
select (organic_name, carbon_number), (inorganic_name, cation, anion)
from organic
inner join inorganic on organic_name = inorganic_name;
Access data using a join index
A join index is useful for queries where the index structure contains all of the columns referenced by one or more joins in a query. The join index was developed so that frequently executed join queries could be processed more efficiently. Like the other indexes, a join index stores rowID pointers to the associated base table rows. Another use of the join index is to define it on a single table; this improves the performance of single-table scans that can be resolved without accessing the base table.
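As a hedged sketch of the single-table case (the summary columns are illustrative), an aggregate join index on organic lets the Optimizer answer matching summary queries from the index without touching the base table:

```sql
-- Pre-aggregated totals per carbon_number; queries that group by
-- carbon_number and sum amount can be satisfied from this index.
CREATE JOIN INDEX organic_by_carbon AS
SELECT carbon_number, SUM(amount) AS total_amount
FROM organic
GROUP BY carbon_number;
```

When it is no longer useful, DROP JOIN INDEX organic_by_carbon; removes it.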
73
Using Index to Process SQL Statement or Access Data
Each type of index has a specific effect on system performance:
• Row selection is more efficient using a unique index. When a SELECT statement uses a unique index in a WHERE clause, no spool file need be created for intermediate storage of the result, because only one row is expected.
• An index that is not unique allows more than one row to have the same index value. Therefore, row selection using a non-unique index may require a spool file to hold intermediate rows for final processing.
The Teradata relational database does not permit explicit use of indexes in SQL queries. When a request is entered, the Optimizer examines the following available information about the table to determine whether an index is used during processing:
• Number of rows in the table
• Statistics collected for the table
• Number and types of indexes defined for the table
• UPDATEs, DELETEs, and PRIMARY KEY and UNIQUE constraints
The Optimizer decides which index or indexes to use to optimize the query, selecting whichever will return the query result most quickly.
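Since the Optimizer's index choices depend on the statistics it has collected, it helps to gather them for the indexes and columns used in WHERE constraints. A hedged sketch against the organic table (the column choices are illustrative):

```sql
-- Give the Optimizer demographics for the unique primary index ...
COLLECT STATISTICS ON organic INDEX (serial_No);

-- ... and for a non-indexed column used in WHERE constraints.
COLLECT STATISTICS ON organic COLUMN (Carbon_number);
```

With accurate statistics, the row-count and time estimates shown by EXPLAIN become more trustworthy, and the Optimizer is more likely to pick the cheapest access path.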
74
Example of using Index to process SQL statement and access data
Process SQL statement and access data using the primary index (using EXPLAIN to display how data is accessed).
*The rxw05.order_log table has unique primary index log_No and secondary index student_name.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
select * from rxw05.order_log where log_No = 15;
*** Query completed. 1 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
log_No student_name order_date checkin_date
15 Jone Smith 12/02/99 12/16/99
BTEQ -- Enter your DBC/SQL request or BTEQ command:
explain select * from rxw05.order_log where log_No = 15;
*** Help information returned. 6 rows.
*** Total elapsed time was 1 second.
Explanation
1) First, we do a single-AMP RETRIEVE step from rxw05.order_log by way of the unique primary index "rxw05.order_log.log_No = 15" with no residual conditions. The estimated time for this step is 0.03 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.03 seconds.
75
Cont..
Process SQL statement and access data using a secondary index (using EXPLAIN to display how data is accessed).
*The rxw05.organic table has unique primary index serial_No and secondary index organic_name.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
select * from rxw05.organic where organic_name = 'methanol';
*** Query completed. 1 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
serial_No organic_name Carbon_number amount
1 methanol 1 500
BTEQ -- Enter your DBC/SQL request or BTEQ command:
explain select * from rxw05.organic where organic_name = 'methanol';
*** Help information returned. 6 rows.
*** Total elapsed time was 1 second.
Explanation
1) First, we do a two-AMP RETRIEVE step from rxw05.organic by way of unique index # 4 "rxw05.organic.organic_name = 'methanol'" with no residual conditions. The estimated time for this step is 0.07 seconds.
-> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.07 seconds.
76
Cont..
Process SQL statement and access data by full base table scan (using EXPLAIN to display how data is accessed).
*The rxw05.order_log table has unique primary index log_No and secondary index student_name.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
select * from rxw05.order_log where log_no > 10;
*** Query completed. 2 rows found. 4 columns returned.
*** Total elapsed time was 1 second.
log_No student_name order_date checkin_date
15 Jone Smith 12/02/99 12/16/99
11 Adam 11/01/99 11/15/99
BTEQ -- Enter your DBC/SQL request or BTEQ command:
explain select * from rxw05.order_log where log_no > 10;
*** Help information returned. 12 rows.
*** Total elapsed time was 1 second.
Explanation
1) First, we lock a distinct rxw05."pseudo table" for read on a RowHash to prevent global deadlock for rxw05.order_log.
2) Next, we lock rxw05.order_log for read.
3) We do an all-AMPs RETRIEVE step from rxw05.order_log by way of an all-rows scan with a condition of ("rxw05.order_log.log_No > 10") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 27 rows. The estimated time for this step is 0.15 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.15 seconds.
77
Full Table Scans
Is a full table scan a good or a bad thing to do? Do databases allow FTS? A full table scan is another way to access data, without using the primary index. In evaluating a SQL request, the Parser examines all possible access methods and chooses the one it believes to be the most efficient. The coding of the SQL request, the demographics of the table, and the availability of indexes all play a role in the Parser's decision. Some coding constructs always cause a full table scan; in other cases, it might be chosen because it is the most efficient method. In general, if the number of physical reads would exceed the number of data blocks, then a full table scan may be faster. In the Teradata Database system, full table scans are always an all-VPROC operation, because every VPROC reads every row of the table.
Some examples of when a full table scan is performed:
• The SQL statement does not contain a WHERE clause.
• The WHERE clause does not use the primary or a secondary index.
RI: Defining Referential Constraints
Referential constraints provide a means of ensuring that changes made to the database by authorized users do not result in a loss of data consistency. The constraints take the following forms:

Key declarations - The stipulation that certain attributes form a candidate key for a given entity set. The set of legal insertions and updates is constrained to those that do not create two entities with the same value on a candidate key.

Form of a relationship - Many-to-many, one-to-many, or one-to-one. A one-to-one or one-to-many relationship restricts the set of legal relationships among entities of a collection of entity sets.

In general, then, a referential constraint is the combination of a foreign key, a parent key, and the relationship between them. Referential constraints must meet the following criteria:
- The parent key must exist when the referential constraint is defined, and it must be either a unique primary index or a unique secondary index.
- The foreign and parent keys must have the same number of columns (no more than 16), and their data types must match.
- Duplicate referential constraints are not allowed.
- Self-reference is allowed, but the foreign and parent keys cannot consist of identical columns.
- There can be no more than 64 referential constraints per table.
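A minimal sketch of defining such a constraint (database, table, and column names are hypothetical):

```sql
-- Parent table: the parent key (dept_no) is the unique primary index.
CREATE TABLE hr.department (
    dept_no   INTEGER NOT NULL,
    dept_name VARCHAR(30)
) UNIQUE PRIMARY INDEX (dept_no);

-- Child table: dept_no is a foreign key referencing the parent key.
-- Its data type must match the parent column exactly.
CREATE TABLE hr.employee (
    emp_no  INTEGER NOT NULL,
    dept_no INTEGER,
    FOREIGN KEY (dept_no) REFERENCES hr.department (dept_no)
) UNIQUE PRIMARY INDEX (emp_no);
```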
Referential Constraint Checks
The CHECK clause in the Teradata RDBMS permits data to be restricted in powerful ways that most programming-language type systems do not allow. A referential constraint can be defined with the CREATE TABLE statement. The Teradata RDBMS performs referential constraint checks whenever any of the following occurs:

A referential constraint is added to a populated table - The table is checked against the new referential constraint; if referential integrity is violated, an error message appears and the request is aborted.

A row is inserted, deleted, or updated / A parent or foreign key is modified - The system checks for violations of the referential constraints; if there are any, an error message appears and the request is aborted.
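As a sketch of the second case, assuming a hypothetical child table hr.employee whose dept_no column references hr.department (dept_no):

```sql
-- Assuming hr.employee.dept_no is a foreign key referencing
-- hr.department (dept_no), and department 999 does not exist:
INSERT INTO hr.employee (emp_no, dept_no) VALUES (1001, 999);
-- The insert violates the referential constraint, so the system
-- returns an error and aborts the request; no row is stored.
```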
Concurrency Control
Concurrency control involves preventing concurrently running processes from improperly inserting, deleting, or updating the same data. A system maintains concurrency control through two mechanisms:
• Transactions
• Locks

Transaction
A transaction is a logical unit of work and the unit of recovery. The statements nested within a transaction must either all happen or not happen at all. Transactions are atomic; a partial transaction cannot exist.

ANSI Mode Transactions
All ANSI transactions are opened implicitly. Either of the following events opens an ANSI transaction:
- Execution of the first SQL statement in a session.
- Execution of the first statement following the close of a previous transaction.
Transactions close when the application performs a COMMIT, ROLLBACK, or ABORT statement. In ANSI mode, the system rolls back the entire transaction if the current request:
- Results in a deadlock.
- Performs a DDL statement that aborts.
- Executes an explicit ROLLBACK or ABORT statement.

Teradata Mode Transactions
Teradata mode transactions can be either implicit or explicit. An explicit, or user-generated, transaction is a single set of BEGIN TRANSACTION/END TRANSACTION statements surrounding one or more requests. All other requests are implicit transactions. If an error occurs during a Teradata transaction, the system rolls back the entire transaction.
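A minimal sketch of an explicit Teradata-mode transaction (table and column names are hypothetical):

```sql
-- Explicit Teradata-mode transaction: both updates commit together
-- at END TRANSACTION, or neither does. An error in either request
-- rolls back the entire transaction.
BT;  -- BEGIN TRANSACTION
UPDATE bank.account SET balance = balance - 100 WHERE acct_no = 1;
UPDATE bank.account SET balance = balance + 100 WHERE acct_no = 2;
ET;  -- END TRANSACTION

-- In ANSI mode, the transaction opens implicitly with the first
-- UPDATE and is closed explicitly with COMMIT (or ROLLBACK).
```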
Locks
A lock is a means of claiming usage rights to some resource.
Most locks on Teradata resources are obtained automatically. Users can override some locks by making certain lock specifications, but the Teradata Database only allows overrides when it can assure data integrity. The data-integrity requirement of a request determines the type of lock that the system uses. A request for a locked resource by another user is queued (in the case of a conflicting lock level) until the process using the resource releases its lock on that resource.

Lock Levels
The Teradata Lock Manager implicitly locks the following objects:
- Database - Locks rows of all tables in the database.
- Table - Locks all rows in the table and any index and fallback subtables.
- Row hash - Locks the primary copy of a row and all rows that share the same hash code within the same table.
Locks: Lock Types
- Exclusive - The requester has exclusive rights to the locked resource. No other process can read from, write to, or access the locked resource in any way.
- Write - The requester has exclusive rights to the locked resource, except for readers not concerned with data consistency.
- Read - Several users can hold Read locks on a resource, during which the system permits no modification of that resource. Read locks ensure consistency during read operations such as those that occur during a SELECT statement.
- Access - The requester is willing to accept minor inconsistencies in the data while accessing the database (an approximation is good enough). An Access lock permits modifications to the underlying data while the SELECT operation is in progress.
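Where an approximate answer set is acceptable, the default Read lock can be downgraded explicitly with the LOCKING request modifier (table and column names are hypothetical):

```sql
-- "Dirty read": request an Access lock so this SELECT does not block
-- (and is not blocked by) concurrent writers. Minor inconsistencies
-- in the answer set are possible while updates are in progress.
LOCKING TABLE sales.daily_totals FOR ACCESS
SELECT store_no, SUM(amount)
FROM   sales.daily_totals
GROUP  BY store_no;
```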
Teradata Space
There are three types of space in Teradata: perm, spool, and temp.

Perm Space
- Perm space holds permanent tables, secondary indexes, and permanent journals.
- It is calculated by adding up all the available space on the AMPs' attached disks; that total is the size of your Teradata warehouse.
- When the system is delivered, the user DBC owns all the permanent space.
- Teradata always calculates space on a per-AMP basis.
- As DBC gives some of its perm space to others, the space becomes owned by multiple users or databases.
- Permanent space is always defined at the user or database level.
- Permanent space is released when data is deleted or when objects are dropped.
- Permanent space defines the upper limit of space for a database or user.
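A minimal sketch of carving perm space out of an owner's allocation (the database name and size are hypothetical):

```sql
-- A new database draws its perm space from its owner (here DBC).
-- PERM is the upper limit for permanent objects and is divided
-- evenly across all AMPs.
CREATE DATABASE sales_db FROM dbc
AS PERM = 10000000000;   -- 10 GB of permanent space
```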
Teradata Space: Spool Space
- Each user who runs queries is allocated a certain amount of spool space for query answer sets; spool is allocated on a per-user basis.
- Spool space is literally unused perm space. It is system-wide, so wherever there is free perm space, it can be used for spool.
- Spool space is a limit, not a reservation: you don't add or subtract space when giving someone spool, so the total amount of spool space defined in the system will normally exceed the actual physical space.
- There are only two ways to run out of spool space: the system is completely out of free space, or your query results exceed your spool limit.
- Users who want to use global temporary tables need another kind of space, called temp space.
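A minimal sketch of a user definition with limits on all three kinds of space (names and sizes are hypothetical; SPOOL and TEMPORARY are ceilings, not reservations, so only PERM is drawn from the owner):

```sql
CREATE USER analyst FROM sales_db
AS PERM      = 1000000000,   -- 1 GB for permanent objects
   SPOOL     = 5000000000,   -- up to 5 GB of intermediate results
   TEMPORARY = 2000000000,   -- up to 2 GB for global temporary tables
   PASSWORD  = analyst_pw1;
```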