Teradata Platform Introduction


1 Teradata Platform Introduction
Hardware and Software Components in the Enterprise Data Warehouse
Derek Jones, March 2005

2 Teradata in the Enterprise
Teradata is a relational database management system
Acts as central enterprise-wide database
Contains information extracted from operational systems
Central placement minimizes data duplication and provides single view of business

The Teradata Database is a relational database management system (RDBMS) that drives a company's data warehouse. A data warehouse is a central, enterprise-wide database that contains information extracted from the operational systems. A data warehouse has a centrally located logical architecture which minimizes data synchronization and provides a single view of the business. Data warehouses have become more common in corporations where enterprise-wide detail data may be used in on-line analytical processing to make strategic and tactical business decisions. Warehouses often carry many years' worth of detail data so that historical trends may be analyzed using the full power of the data.

A database is a collection of permanently stored data that is: logically related (the data was created for a specific purpose), shared (many users may access the data), protected (access to the data is controlled), and managed (the data integrity and value are maintained).

The Teradata Database is a relational database. Relational databases are based on the relational model, which is founded on mathematical Set Theory. The relational model uses and extends many principles of Set Theory to provide a disciplined approach to data management. A relational database is designed to: represent a business and its business practices; be extremely flexible in the way that it can be selected and used; be easy to understand; model the business, not the applications; and allow businesses to quickly respond to changing conditions.

Relational databases present data as a set of tables. A table is a two-dimensional representation of data that consists of rows and columns. According to the relational model, a valid table does not have to be populated with data rows; it just needs to be defined with at least one column. A relational database is a set of logically related tables. Tables are logically related to each other by a common field, so information such as customer telephone numbers and addresses can exist in one table, yet be accessible for multiple purposes. The example below shows customer, order, and billing statement data, related by a common field. The common field of Customer ID lets you look up information such as a customer name for a particular statement number, even though the data exists in two different tables.
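As a sketch of that common-field relationship, the join below looks up a customer name for a given statement number. The table and column names are hypothetical stand-ins for the Customer ID example above, not objects from the presentation.

/* Hypothetical customer and statement tables related by cust_id */
SELECT c.cust_name
     , s.statement_num
FROM   customer  c
JOIN   statement s
  ON   s.cust_id = c.cust_id
WHERE  s.statement_num = 1001;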

3 Key Teradata Differentiators
Parallelism throughout platform Shared Nothing Architecture Proprietary intelligent system interconnect

4 Teradata Scales Linearly
Scaling achieved via ‘shared nothing’ architecture and unconditional parallelism
Power is in linear scalability, where slope = 1
Scales with data
Scales with users
Scales with work
More nodes, more work, more users, more data

"Linear scalability" means that as you add components to the system, the performance increase is linear. Adding components allows the system to accommodate increased workload without decreased throughput. Linear scalability enables the system to grow to support more users/data/queries/complexity of queries without experiencing performance degradation. As the configuration grows, performance increase is linear, with a slope of 1.

The Teradata Database was the first commercial database system to scale to and support a trillion bytes of data. The origin of the name Teradata is "tera-," which is derived from Greek and means "trillion." The Teradata Database acts as a single data store, with multiple client applications making inquiries against it concurrently. Instead of replicating a database for different purposes, with the Teradata Database you store the data once and use it for all clients. The Teradata Database provides the same connectivity for an entry-level system as it does for a massive enterprise data warehouse.

5 The Teradata Difference “Multi-dimensional Scalability”
Data Volume (Raw, User Data), Mixed Workload, Query Concurrency, Data Freshness, Query Complexity, Query Freedom, Schema Sophistication, Query Data Volume

Data Volume (Raw, User Data) - Raw data stored in the warehouse. This is the user data stored in the warehouse. It does not include generated data that also takes space within the warehouse, such as indexes, summarizations, aggregations, duplicated data, and system overhead.

Query Concurrency - The volume of work that can be done at the same time. Most commonly, the number of queries that the database can process at the same time. It can also include load and in-database transformation work and stored procedure processing activity. Logged-on users not currently executing a query do not add to the concurrency workload.

Query Complexity - The degree to which queries are complex in areas that make a query difficult or resource intensive for a database system. These areas include the number of tables involved in joins, complex "where" constraints in the SQL, aggregations and statistical functions, and the use of views in addition to base tables. Business intelligence query tools often generate very complex queries.

Schema Sophistication - The ability to choose the schema that meets the business requirements, versus limiting the complexity of the schema due to technology performance limitations of the database. It is the ability to deploy a denormalized star schema, a sophisticated and complex normalized 3NF schema, or a combination of the two, or anywhere in between, to meet the requirements of the business.

Query Data Volume - Refers to how much data must be touched to satisfy a query. Teradata features that can be cited as reducing the amount of data touched include our unsurpassed compression capabilities, efficient row storage, strong indexing capabilities, and lack of a storage requirement for the primary index. A compression sketch follows below.

Query Freedom - The ability for users to ask any question of the data at the time best for the business. This is an indication of how free the users are to ask exploratory, broad, or complex questions as well as expected and tuned queries, and to ask new types of questions associated with new applications.

Data Freshness - The ability to load data into the warehouse and to update data in the warehouse at the speed the business operates. This is an indication of whether the data in the warehouse can be kept current and in sync with business processes and operations to the degree necessary to respond to events and business activities, as well as to provide meaningful analyses.

Mixed Workload - The ability of the database to handle the broad mix of tasks for which a data warehouse is used today without impacting the effectiveness in any area. For example, data warehouses must answer complex strategic questions as well as brief tactical questions or customer inquiries. At the same time, data must be loaded and updated. Can the database handle the various workloads concurrently while meeting the very different service level agreement attributes (e.g., response time, performance consistency) of the various types of work? Does the database require separation of work (e.g., batch windows)?
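As an illustration of the compression capability mentioned under Query Data Volume, the DDL below uses Teradata's multi-value column compression. The table, columns, and compressed values are illustrative assumptions, not objects from the slides.

/* Frequently repeated city values are stored once in the table header,
   not in every row, reducing the data a query must touch */
CREATE TABLE sales_fact
( store_id INTEGER
, city     VARCHAR(20) COMPRESS ('London', 'Paris', 'New York')
, amount   DECIMAL(12,2)
)
PRIMARY INDEX (store_id);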

6 The Teradata Difference “Multi-dimensional Scalability”
Data Volume (Raw, User Data) Mixed Workload Query Concurrency Competition can be Tuned to Meet a Static Environment Business Needs Change Data Freshness Query Complexity Query Freedom Schema Sophistication Query Data Volume

7 The Teradata Difference “Multi-dimensional Scalability”
Data Volume (Raw, User Data) Mixed Workload Query Concurrency Competition can be Tuned to Meet a Static Environment Business Needs Change Desire to Increase User/ Query Concurrency Data Freshness Query Complexity Competition Scales One Dimension at the Expense of Others But At the Expense of Another Dimension Query Freedom Schema Sophistication Query Data Volume

8 The Teradata Difference “Multi-dimensional Scalability”
Data Volume (Raw, User Data) Mixed Workload Query Concurrency Teradata can Scale Simultaneously Across Multiple Dimensions Driven by Business! Competition Scales One Dimension at the Expense of Others Limited by Technology! Data Freshness Query Complexity Query Freedom Schema Sophistication Query Data Volume

9 Key Teradata Differentiators
Parallelism throughout platform Shared Nothing Architecture Proprietary intelligent system interconnect

10 Node Architecture (‘Shared Nothing’)
Each Teradata Node is made up of hardware and software
Each node has CPUs, system disk, memory and adapters
Each node runs a copy of the OS and database SW

A Teradata Database node requires three distinct pieces of software: the operating system, the Parallel Database Extensions (PDE), and the Teradata Database software itself. The Teradata Database can run on the following operating systems: UNIX MP-RAS and Microsoft Windows 2000. The Parallel Database Extensions (PDE) software layer was added to the operating system by NCR to support the parallel software environment. A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The Teradata Database is classified as a TPA. The four components of the Teradata Database TPA are: AMP (top right), PE (bottom right), Channel Driver (top left), and Teradata Gateway (bottom left).

A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the Teradata Database, once a valid session has been established. Each PE can support a maximum of 120 sessions. The AMP is a vproc that controls its portion of the data on the system. AMPs do the physical work associated with generating an answer set (output), including sorting, aggregating, formatting, and converting. The AMPs perform all database management functions on the required rows in the system. The AMPs work in parallel, each AMP managing the data rows stored on its single vdisk. AMPs are involved in data distribution and data access in different ways.

Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. There is one Channel Driver per node. Teradata Gateway software is the means of communication between an application and the PEs assigned to network-attached clients. There is one Teradata Gateway per node.

11 Node Architecture (‘Shared Nothing’)
Each Teradata Node is made up of hardware and software
Each node runs a copy of the OS, database SW, and virtual processors (above the line)
Each node has CPUs, system disk, memory and adapters (below the line)

[Diagram: PE vprocs, AMP vprocs, and Vdisks (the V2 Virtual Processors) layered above UNIX and PDE.]

12 NCR 5400 Server Value Prop
Better Price/Performance
20% Performance Improvement
12% Price/Performance Improvement
Advanced Cabinet Design
Up to 10 Nodes Per Cabinet
Up to a 40% Reduction in Floor Space
Investment Protection
Multi-Generation (5) Coexistence
32-bit/64-bit Transition Platform

Let’s do a quick review of some important dates. The release we are announcing internally today is the NCR 5400 Server with MP-RAS and existing storage. The external GCA date and supporting press release is March. Also available in this timeframe are new StorageTek Tape Libraries and the Teradata AWS with Windows Server 2003. On the heels of the February release, in April a new NCR Enterprise Storage cabinet, the 6842, will be announced along with FICON Channel Connectivity and product updates from our 3 BAR software partners. Later in the 2nd quarter, support for the Microsoft Windows 2000 and Microsoft Windows Server 2003 operating systems on the 5400 will be released, along with a new box for the AWS and a new box for the SMP. More information will be available at release time. Let’s start by looking at the 3 key messages associated with the 5400 release.

13 NCR 5400 Server Key Messages #2 – Advanced Cabinet Design
Revolutionary cabinet increases reliability and provides greater configuration flexibility:
Up to 10 nodes per cabinet enable a 20% - 40% smaller footprint than the 5380
30% increase in system storage reliability with new advanced cooling mechanisms
Extends the supported distance for large systems (65+ nodes) between cabinets to 300 – 600 meters with the new BYNET V3
Doubles the number of configurable nodes to 1,024

[Cabinet diagram callouts: BYNET V3 switches, five UPS modules, Ethernet switches, up to 10 nodes within each cabinet, Server Management Module (3GSM), FC switches.]

In addition, the new design provides greater flexibility in data center configuration options. The new design increases the number of nodes in a cabinet up to 10. Previously, we supported up to 4 nodes per cabinet; now with up to 10 nodes per cabinet, we reduce the footprint and floor space required for larger Teradata systems. And with the inclusion of the new BYNET release, BYNET V3, we’ve extended the physical distance that customers can put between the cabinets for very large systems. Customers can now split a Teradata system into 2 distinct physical locations on their data center floor with as much as 300 – 600 meters between them. Additionally, BYNET V3 doubles our system scalability, enabling up to 1,024 nodes in a single system. While we don’t expect many customers to approach this limit in the near term, it does support our unlimited scalability story and provide for future growth in the long term.

14 Key Teradata Differentiators
Parallelism throughout platform Shared Nothing Architecture Proprietary intelligent system interconnect

15 Parallelism via BYNET Interconnect
BYNET high-speed interconnect facilitates system communication
All nodes connected via BYNET
Hardware network
Software runs on each node
Vproc to Vproc
Multicast (1 to Many)
Broadcast (1 to All)
Different communication paths facilitate system parallelism: 1 to 1, 1 to Many, 1 to All

The BYNET (pronounced "bye-net") is a high-speed interconnect (network) that enables multiple nodes in the system to communicate. The BYNET handles the internal communication of the Teradata Database. All communication between PEs and AMPs is done via the BYNET. When the PE dispatches the steps for the AMPs to perform, they are dispatched onto the BYNET. The messages are routed to the appropriate AMP(s), where result sets and status information are generated. This response information is also routed back to the requesting PE via the BYNET. Depending on the nature of the dispatch request, the communication between nodes may be to all nodes (broadcast message) or to one specific node (point-to-point message) in the system.

The BYNET has several unique features:
Scalable: As you add more nodes to the system, the overall network bandwidth scales linearly. This linear scalability means you can increase system size without performance penalty -- and sometimes even increase performance.
High performance: An MPP system typically has two BYNET networks (BYNET 0 and BYNET 1). Because both networks in a system are active, the system benefits from having full use of the aggregate bandwidth of both networks.
Fault tolerant: Each network has multiple connection paths. If the BYNET detects an unusable path in either network, it will automatically reconfigure that network so all messages avoid the unusable path. Additionally, in the rare case that BYNET 0 cannot be reconfigured, hardware on BYNET 0 is disabled and messages are re-routed to BYNET 1.
Load balanced: Traffic is automatically and dynamically distributed between both BYNETs.

The BYNET hardware and software handle the communication between the vprocs and the nodes. Hardware: The nodes of an MPP system are connected with the BYNET hardware, consisting of BYNET boards and cables. Software: The BYNET driver (software) is installed on every node. This BYNET driver is an interface between the PDE software and the BYNET hardware. SMP systems do not contain BYNET hardware. The PDE and BYNET software emulate BYNET activity in a single-node environment.

Point-to-Point Messages: With point-to-point messaging between vprocs, a vproc can send a message to another vproc on the same node (using PDE and BYNET software), or on a different node using two steps: (1) Send a point-to-point message from the sending node to the node containing the recipient vproc. This is a communication between nodes using the BYNET hardware. (2) Within the recipient node, the message is sent to the recipient vproc. This is a point-to-point communication between vprocs using the PDE and BYNET software.

Multicast Messages: A vproc can send a message to multiple vprocs using two steps: (1) Send a broadcast message from the sending node to all nodes. This is a communication between nodes using the BYNET hardware. (2) Within the recipient nodes, the PDE and BYNET software determine which, if any, of its vprocs should receive the message and deliver the message accordingly. This is a multicast communication between vprocs within the node, using the PDE and BYNET software.

Broadcast Messages: A vproc can send a message to all the vprocs in the system using two steps: (1) Send a broadcast message from the sending node to all nodes. This is a communication between nodes using the BYNET hardware. (2) Within each recipient node, the message is sent to all vprocs. This is a broadcast communication between vprocs using the PDE and BYNET software.

16 MPP System Configuration
Nodes grouped to increase data availability and system uptime
Storage is not shared, but nodes within the group share access to the same arrays
Improves data availability
Improves system uptime
Allows for VPROC migration

The diagram below shows three cliques. The nodes in each clique are cabled to the same disk arrays. The overall system is connected by the BYNET. If one node goes down in a clique, the vprocs will migrate to the other nodes in the clique, so data remains available. However, system performance decreases due to the loss of a node. System performance degradation is proportional to clique size. Vprocs are distributed across all nodes in the system. Multiple cliques in the system should have the same number of nodes.

17 Teradata Clique
A clique is a group of nodes that access the same arrays
VPROC is the smallest unit of parallelism
VPROC has assigned storage within the clique
VPROCs can migrate within the clique
Improves system uptime, data availability, and ease of recovery

[Diagram: nodes cabled to shared disk arrays; each symbol represents a VPROC.]

A clique (pronounced "kleek") is a group of nodes that share access to the same disk arrays. Each multi-node system has at least one clique. The cabling determines which nodes are in which cliques -- the nodes of a clique are connected to the disk array controllers of the same disk arrays.

Cliques Provide Resiliency: In the rare event of a node failure, cliques provide for data access through vproc migration. When a node fails, the Teradata Database restarts across all remaining nodes in the system, and the vprocs (AMPs) from the failed node migrate to the operational nodes in its clique. Disks managed by the AMP remain available, and processing continues while the failed node is being repaired.

18 Teradata Clique and VPROC
VPROC is the smallest unit of parallelism or work
Data distributed by hash to all VPROCs
VPROC has assigned storage within the clique
VPROCs can migrate within the clique
Improves system uptime, data availability, and ease of recovery
Data fully available at degraded performance until the node returns

[Diagram: four nodes cabled to four disk arrays; the X marks a failed node whose VPROCs have migrated to the other nodes in the clique.]

19 Teradata Clique with Hot Standby
[Diagram: a clique with a Hot Standby node; the nodes connect to the disk arrays through Fibre Channel switches, and the X marks a failed node.]

And finally, for any system of 8 nodes or greater, implementing these two new solutions together gives you all the performance continuity benefits of the Hot Standby Node. By implementing Large Cliques with Hot Standby Nodes, your system will have fewer cliques overall, requiring fewer Hot Standby Nodes.

20 Teradata Optimizer

The Teradata Optimizer is the most robust in the industry
Optimizer is parallel-aware, understands available system components
Handles mixed workloads
Multiple complex queries
Joins per query
Unlimited ad-hoc processing
Output is the least expensive plan (in resources) to answer the request

The Optimizer is parallel-aware, meaning that it has knowledge of system components (how many nodes, vprocs, etc.). It determines the least expensive plan (time-wise) to process queries fast and in parallel.
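To see the plan the Optimizer produces, any request can be prefixed with Teradata's EXPLAIN modifier. The query below is a hypothetical example (the customer table follows the later index slides; orders is invented), and the English plan text returned depends on the configuration and statistics.

EXPLAIN
SELECT c.name
     , COUNT(*)
FROM   customer c
JOIN   orders   o
  ON   o.cust = c.cust
GROUP BY c.name;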

21 Teradata Request Cycle
[Request flow diagram: a REQUEST parcel enters the Parser; if not cached, it passes through the Syntaxer, Resolver (using DD tables DBase, AccessRights, TVM, TVFields, Indexes), Security, Optimizer (using statistics), Generator, and gncApply, producing AMP steps; if cached, it goes from Security directly to gncApply, where the DATA parcel values are bound in.]

Each request parcel contains at least one SQL statement
Six main component steps: Syntaxer, Resolver, Security, Optimizer, Generator, gncApply
AMP steps are instructions sent to AMP VPROCs to complete the request
Following completion, each request generates a success/fail parcel with any necessary records

A Request parcel contains one or more whole SQL statements. Normally, a Request parcel represents a single transaction. Some transactions may require multiple Request parcels. A REQUEST parcel is followed by zero or one DATA parcel plus one RESPOND parcel. The RESPOND parcel identifies the response buffer size. A RESPOND parcel may be sent by itself as a continuation request for additional data. A SUCCESS parcel may be followed by DATA parcels. Every REQUEST parcel generates a SUCCESS/FAIL parcel.

SQL Parser Overview (done by the PE): The flowchart provides an overview of the SQL parser. As you can see, it is composed of six main sections: Syntaxer, Resolver, Security, Optimizer, Generator, and gncApply. When the parser sees a Request parcel, it checks to see if it has parsed and cached the execution steps for it. If the answer is NO, then the Request must pass through all the sections of the parser as follows: The Syntaxer checks the Request for valid syntax. The Resolver breaks down Views and Macros into their underlying table references through use of DD information. Security determines whether the requesting UserID has the necessary permissions. The Optimizer chooses the execution plan. The Generator creates the steps for execution. gncApply binds the data values into the steps. (This phase of the Parser is known as Optapply.) Note: If the steps in the Request parcel are in cache, the Request passes directly to gncApply (after a check by Security). This is illustrated on the flow chart by the YES path from the CACHED? decision box.
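The DATA parcel and the gncApply binding step can be seen in a parameterized request. This sketch uses the standard USING request modifier with the customer table from the later index slides; treat it as an illustrative assumption rather than slide content.

/* The USING modifier ships the value in a DATA parcel; gncApply
   binds :cust_num into the (possibly cached) execution steps */
USING (cust_num INTEGER)
SELECT *
FROM   customer
WHERE  cust = :cust_num;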

22 Data Protection (Object Locks)
Locks protect data from simultaneous access
Vary by type: Exclusive, Write, Read, & Access
Vary by object locked: Database, Table, & Row Hash
Locks enforced by hierarchy

Temporary locks can be placed on data to prevent multiple users from simultaneously changing it: Exclusive Lock, Write Lock, Read Lock, Access Lock. Locks may be applied at three levels: Database Locks apply to all tables and views in the database; Table Locks apply to all rows in the table; Row Hash Locks apply to a group of one or more rows in a table.

Exclusive: Exclusive locks are applied only to databases or tables, never to rows. They are the most restrictive type of lock. With an exclusive lock, no other user can access the database or table. Exclusive locks are used rarely, most often when structural changes are being made to the database. An exclusive lock on a database or table prevents other users from obtaining exclusive, write, read, or access locks on the locked data.

Write: Write locks enable users to modify data while maintaining data consistency. While the data has a write lock on it, other users can obtain an access lock only. During this time, all other locks are held in a queue until the write lock is released. Write locks prevent other users from obtaining exclusive, write, or read locks on the locked data.

Read: Read locks are used to ensure consistency during read operations. Several users may hold concurrent read locks on the same data, during which time no data modification is permitted. Read locks prevent other users from obtaining exclusive or write locks on the locked data.

Access: Access locks can be specified by users unconcerned about data consistency. The use of an access lock allows for reading data while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small, single-row changes. Access locks are sometimes called "stale read" locks, because you may get "stale data" that has not been updated. Access locks prevent other users from obtaining only exclusive locks on the locked data.
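A request can downgrade the default lock with Teradata's LOCKING modifier. The sketch below asks for a "stale read" access lock on the customer table used in the index slides:

/* Read through in-flight updates instead of queueing behind a write lock */
LOCKING TABLE customer FOR ACCESS
SELECT cust, name, phone
FROM   customer;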

23 Data Protection (RAID-1)
RAID data protection
RAID-1 (disk mirroring)
Disk pair increases read performance and data availability
In a failure scenario, the mirrored drive is rebuilt by the array controller

Several types of data protection are available with the Teradata Database. Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level. It uses groups of disk drives called "arrays" to ensure that data is available in the event of a failed disk drive or other component. The Teradata Database also has journals that can be used for specific types of data or process recovery: Permanent Journals and Recovery Journals.

24 Data Protection (Fallback)
Fallback table data
Copy of table rows maintained by the database on a second AMP VPROC
Fallback copies grouped logically in CLUSTERS so data is fully available when a physical CLIQUE is off-line
Fallback + RAID increase data availability

Several types of data protection are available with the Teradata Database. Fallback is a Teradata Database feature that protects data against AMP failure. Fallback is accomplished by grouping AMPs into logical clusters. When a table is defined as Fallback-protected, the system stores a second copy of each row in the table on the disk space managed by an alternate "Fallback AMP" in the AMP cluster. Below is a cluster of four AMPs. Each AMP has a combination of Primary and Fallback data rows: a Primary Data Row is a record in a database table that is used in normal system operation; a Fallback Data Row is the online backup copy of a Primary data row that is used in the case of an AMP failure. No two AMP VPROCs in a cluster should reside in the same physical clique (node group), to prevent a single point of hardware failure that would disrupt data availability.

Permanent Journals are an optional feature of the Teradata Database that provides an additional level of data protection. You specify the use of Permanent Journals at the table level. It provides full-table recovery to a specific point in time. It can also reduce the need for costly and time-consuming full-table backups. The Teradata Database uses Recovery Journals to automatically maintain data integrity in the case of an interrupted transaction (Transient Journal) or an AMP failure (Down-AMP Recovery Journal). Recovery Journals are created, maintained, and purged by the system automatically, so no DBA intervention is required. Recovery Journals are tables stored on disk arrays like user data, so they take up additional disk space on the system.
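Fallback is declared per table in the DDL. A minimal sketch with hypothetical table and column names:

/* FALLBACK makes each AMP keep a second copy of every row
   on another AMP in its cluster */
CREATE TABLE orders, FALLBACK
( order_id INTEGER
, cust     INTEGER
, amount   DECIMAL(12,2)
)
PRIMARY INDEX (cust);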

25 Data Storage and Access
Data stored by hash
Primary Index is chosen for data distribution; it is not the same as the primary key
Primary Index value is hashed
Hash value creates the bucket assignment
Hash Map assigns buckets to AMP VPROCs
AMP VPROCs reside on a specific node
AMP VPROC writes the row to disk
Data and algorithm exceptions require a Uniqueness value for a guaranteed unique Row ID

[Diagram: a Primary Index value of 25 enters the Parsing Engine, runs through the Hashing Algorithm to produce a Row Hash and Bucket #, which the Hash Map resolves to an AMP via the Message Passing Layer.]

A hashing algorithm is a standard data processing technique that takes in a data value, like last name or order number, and systematically mixes it up so that the incoming values are converted to a number in a range from zero to the specified maximum value. A successful hashing scheme scatters the input evenly over the range of possible output values. It is predictable in that Smith will always hash to the same value and Jones will always hash to another (hopefully different) value. With a good hashing algorithm, any patterns in the input data should disappear in the output data. Teradata’s algorithm works predictably well over any data, typically loading each AMP with variations in the range of .1% to .5% between AMPs. For extremely large systems, the variation can be as low as .001% between AMPs.

A Row Hash is the 32-bit result of applying a hashing algorithm to an index value. The DSW (Destination Selection Word), or Hash Bucket, is represented by the high-order 16 bits of the Row Hash. Teradata also uses hashing quite differently than other data storage systems. Other hashed data storage systems equate a bucket with a physical location on disk. In Teradata, a bucket is simply an entry in a hash map. Each hash map entry points to a single AMP. Therefore, changing the number of AMPs does not require any adjustment to the hashing algorithm. Teradata simply adjusts the hash maps and redistributes any affected rows. The hash maps must always be available to the Message Passing Layer. In Teradata Version 2, a hash map has 65,536 entries. When the hash bucket has determined the destination AMP, the full 32-bit row hash plus the Table ID is used to assign the row to a cylinder and a data block on the AMP's disk storage. In Version 2 the algorithm can produce over 4,000,000,000 row hash values. Hash values will be the same for non-unique primary indexes and for hash synonyms (values where different inputs produce the same output), so a Uniqueness value is added to the Row Hash. This combined value becomes the Row ID and is truly unique for every row on the database.
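Teradata exposes this hashing path through SQL functions, so the slide's Primary Index value of 25 can be traced from row hash to bucket to AMP. A minimal sketch using the HASHROW, HASHBUCKET, and HASHAMP functions:

SELECT HASHROW (25)                        AS row_hash
     , HASHBUCKET (HASHROW (25))           AS bucket_num
     , HASHAMP (HASHBUCKET (HASHROW (25))) AS amp_num;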

26 Data Access by Primary Index
Data accessed by row hash value
Need 3 pieces of information to find a row: Table ID, Row Hash of the PI value (the output of the hash algorithm on the PI value), and the PI value itself
Operation involves only one AMP VPROC

[Diagram: the Table ID and Row Hash are looked up in AMP #3's Master Index to find the cylinder (Cyl 1 through Cyl 7), and then in the Cylinder Index to find the data block holding the data row.]

Locating a row on an AMP VPROC requires three inputs: the Table ID, the Row Hash of the PI, and the PI value. The Table ID and Row Hash are used in the Hash Map to identify the AMP VPROC that has the row (#3 in this case). The AMP VPROC uses the Table ID and Row Hash to look up the cylinder number in the Master Index (a memory-resident structure on each AMP VPROC). Then the Cylinder Index is referenced (also memory resident) to find the specific data block on disk that contains the row.

The Row ID: In order to differentiate each row in a table, every row is assigned a unique Row ID. The Row ID is a combination of the row hash value plus a uniqueness value. The AMP appends the uniqueness value to the row hash when it is inserted. The Uniqueness Value is used to differentiate between PI values that generate identical row hashes. The first row inserted with a particular row hash value is assigned a uniqueness value of 1. Each new row with the same row hash is assigned an integer value one greater than the current largest uniqueness value for this Row ID. If a row is deleted or the primary index is modified, the uniqueness value can be reused. Only the Row Hash portion is used in Primary Index operations. The entire Row ID is used for Secondary Index support.
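Tying this together: when a query supplies the full primary index value, the PE hashes straight to one AMP. A hypothetical sketch (note that the slides' own customer example instead uses a NUPI on name with a USI on cust):

/* UPI on cust: equality on the full PI value is a one-AMP operation */
CREATE TABLE customer
( cust  INTEGER
, name  VARCHAR(30)
, phone VARCHAR(15)
)
UNIQUE PRIMARY INDEX (cust);

SELECT *
FROM   customer
WHERE  cust = 25;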

27 Data Access by Unique Secondary Index (USI)
Create USI: CREATE UNIQUE INDEX (cust) ON customer;
Access via USI: SELECT * FROM customer WHERE cust = 56;

[Diagram: Customer table, Table ID = 100. The PE hashes the USI value 56 to Row Hash 602 and sends (Table ID 100, Row Hash 602, index value 56) over the Message Passing Layer to the AMP holding the matching USI subtable row; that row supplies the base-table Row ID (Row Hash 778, uniqueness value 7), which a second message uses to reach the AMP holding the base row. Base table rows (Cust, Name, Phone) are distributed across AMPs 1-4 by a NUPI on name, and each AMP holds a USI subtable mapping cust values to base-table Row IDs.]

USI data access:
Index is created on the table
SQL uses the USI by value
The PE VPROC managing the session uses the same information as primary index access (Table ID, Row Hash, Index Value)
This process involves two AMP VPROC operations
After the USI subtable lookup, the process is similar to primary index access

A secondary index is an alternate path to the data. Secondary Indexes are used to improve performance by allowing the user to avoid scanning the entire table. A Secondary Index is like a Primary Index in that it allows the user to locate rows. It is unlike a Primary Index in that it has no influence on the way rows are distributed among AMPs. A database designer typically chooses a secondary index because it provides faster set selection. Primary Index requests require the services of only one AMP to access rows, while secondary indexes require at least two and possibly all AMPs, depending on the index and the type of operation. A secondary index search will typically be less expensive than a full table scan. Secondary indexes add overhead to the table, both in terms of disk space and maintenance; however, they may be dropped when not needed and recreated whenever they would be helpful.

Just as with primary indexes, there are two types of secondary indexes -- unique (USI) and non-unique (NUSI). Secondary Indexes may be specified at table creation or at any time during the life of the table. An index may consist of up to 16 columns; however, to get the benefit of the index, the query would have to specify a value for all 16 columns.

Unique Secondary Indexes (USI) have two possible purposes. They can speed up access to a row which otherwise might require a full table scan, without having to rely on the primary index. Additionally, they can be used to enforce uniqueness on a column or set of columns. This is sometimes the case with a Primary Key which is not designated as the Primary Index; making it a USI has the effect of enforcing the uniqueness of the PK. All secondary indexes cause an AMP-local subtable to be built and maintained as column values change. Secondary index subtables consist of rows which associate the secondary index value with one or more rows in the base table. When the index is dropped, the subtable is physically removed.

28 Data Access via Non-unique Secondary Index (NUSI)
Create NUSI: CREATE INDEX (name) ON customer;
Access via NUSI: SELECT * FROM customer WHERE name = ‘Adams’;

[Diagram: Customer table, Table ID = 100. The PE hashes the NUSI value ‘Adams’ to Row Hash 567 and broadcasts (Table ID 100, Row Hash 567, index value ‘Adams’) over the Message Passing Layer to every AMP; each AMP probes its local NUSI subtable, which maps name values to the Row IDs of qualifying base rows on that same AMP. Base table rows (Cust, Name, Phone) are distributed across AMPs 1-4 by the NUPI.]

Index is created on the table
SQL uses the NUSI by value
The PE VPROC managing the session uses the same information as primary index access (Table ID, Row Hash, Index Value)
This process involves all-AMP VPROC operations

Non-Unique Secondary Indexes (NUSI) are usually specified in order to prevent full table scans. NUSIs, however, do activate all AMPs -- after all, the value being sought might well live on many different AMPs (only primary index operations send equal values to the same AMP). If the optimizer decides that the cost of using the secondary index is greater than a table scan would be, it opts for the table scan. The NUSI subtable is AMP-local: it associates each index value with the rows for that value on the same AMP, and it is maintained as column values change. Notice in all cases the data used to access the index is the same.

29 Teradata Structures
Database structures: Users, Databases, Tables, Views, Macros, Triggers, Stored Procedures, User Defined Functions

Database: The Teradata Definition. In Teradata, a "database" provides a logical grouping of information. A Teradata Database also plays a key role in space allocation and access control. A Teradata Database is a defined, logical repository that can contain objects, including:
Databases: A defined object that may contain a collection of Teradata Database objects.
Users: Databases that each have a logon ID and password for logging on to the Teradata Database.
Tables: Two-dimensional structures of columns and rows of data stored on the disk drives. (Require Perm Space)
Views: A virtual "window" to subsets of one or more tables or other views, pre-defined using a single SELECT statement. (Use no Perm Space)
Macros: Definitions of one or more Teradata SQL and report formatting commands. (Use no Perm Space)
Triggers: One or more Teradata SQL statements associated with a table and executed when specified conditions are met. (Use no Perm Space)
Stored Procedures: Combinations of procedural and non-procedural statements run using a single CALL statement. (Require Perm Space)

User: A Special Kind of Database. A User can be thought of as a collection of tables, views, macros, triggers, and stored procedures. A User is a specific type of Database, and has attributes in addition to the ones listed above: a logon ID and a password.

A table in a relational database management system is a two-dimensional structure made up of columns and physical rows stored in data blocks on the disk drives. Each column represents an attribute of the table. Attributes identify, describe, or qualify the table. Each column is named, and all the information contained within it is of the same type, for example, Department Number.

A view is like a "window" into tables that allows multiple users to look at portions of the same base data. A view may access one or more tables, and may show only a subset of columns from the table(s). A view does not exist as a real table and does not occupy disk space. It serves as a reference to existing tables or views. A view is a logical structure with no actual data -- it accesses data that is stored in a table and returns the requested rows from the table to the user.

A macro is a Teradata Database extension to ANSI SQL that defines a sequence of prewritten Teradata SQL statements. Macros are pre-defined, stored sets of one or more SQL commands and/or report-formatting (BTEQ) commands. Macros can also contain comments. Macros can be a convenient shortcut for executing groups of frequently-run or complex SQL statements (queries) or sets of operations. When you execute the macro, the statements execute as a single transaction. Macros reduce the number of keystrokes needed to perform a complex task. This saves you time, reduces the chance for errors, and reduces the communication volume to the Teradata Database.

A trigger is a set of SQL statements, usually associated with a column or table, that are programmed to be run (or "fired") when specified changes are made to the column or table. The pre-defined change is known as a triggering event, which causes the SQL statements to be processed. A stored procedure is a pre-defined set of statements invoked through a single CALL statement in SQL.
While a stored procedure may seem like a macro, it is different in that it can contain both Teradata SQL data manipulation statements (non-procedural) and procedural statements (in the Teradata Database, referred to as Stored Procedure Language).
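As a sketch of how a view and a macro wrap the same base table, the DDL below reuses the customer table from the index slides; the object names are hypothetical.

/* A view: a stored SELECT that occupies no perm space */
CREATE VIEW cust_v AS
SELECT cust, name
FROM   customer;

/* A parameterized macro: its statements run as a single transaction */
CREATE MACRO get_cust (c INTEGER) AS
( SELECT *
  FROM   customer
  WHERE  cust = :c; );

The macro would then be invoked with a single statement: EXEC get_cust(56);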

30 Teradata is an Open System
Virtually any application or middleware framework can be integrated with Teradata.

[Diagram: web and message-bus clients (ASP, JSP, EJB, JAVA, CORBA, .NET, IIOP, JSM messages) reach Teradata through standard interfaces (JDBC, ODBC, OLE-DB); Teradata utilities and Teradata adapters connect to the message bus and queues for publish and subscribe.]

As you would expect, any system of this level of maturity must be an open system. TERADATA is committed to supporting data interchange with mainframes. TERADATA is committed to supporting open standards as they emerge.

31 64-bit Teradata Solution Teradata on SuSE Linux 2H 2005
[Diagram: 32-bit and 64-bit client tiers and application server tiers (3rd-party partners and Teradata applications on Intel, IBM/Power PC, SUN/SPARC, and HP/PA-RISC; DELL, HP, and IBM servers) connecting through the Teradata Tools & Utilities and Teradata System Mgmt to the 64-bit Intel database server tier running the Teradata Database on Linux, 2H 2005.]

Many customers are asking about the availability of Teradata on Linux and/or when Teradata will be available in 64-bit. We are fulfilling both of these requirements in 2005 with the release of the Teradata on 64-bit Linux solution. Currently, the 5400 is 64-bit capable, as it uses the new Xeon EM64T chip. However, our complete Teradata 64-bit solution, which includes the operating system and the database, will be available in late 2005. As such, we are not marketing the 5400 as a 64-bit solution. We will begin marketing the 64-bit solution in its entirety in mid-2005.

While we are not yet offering this solution to our customers, if asked, you should articulate that the Teradata solution is already 90% there. As you can see from this picture, customers can run both their 32-bit and 64-bit applications from a client or server tier with the Teradata Database today. The Teradata Tools and Utilities are all available on 64-bit Linux, and any 3rd-party application using standard interfaces can connect without issue today. With the 5400, we now have the 64-bit platform, and in late 2005 we will release the complete solution with Linux and the Teradata Database. Now let’s move on to the recommended configuration options for the 5400.

The Teradata Database on Intel, 32-bit and 64-bit, will support both 32-bit and 64-bit applications and clients concurrently.

32 Teradata’s Real-Time Enterprise Reference Architecture
[Architecture diagram: enterprise users (browsers and/or portal), consumers, suppliers, internal partners, and legacy environments connect via C/S, EDI, WAN/VAN, and the Internet/intranet to two service domains joined by an Enterprise Message Bus and service brokers: Transactional Services (TX1-TX4 applications over OLTP repositories 1-4) and Analytic & Decision Making Services (strategic, tactical, and BI applications with ASP/JSP, query directors (QD), messaging middleware (MSG-MW), and data access middleware (DA-MW) over EDW A and EDW B, with RDBMS-based event processing, event detection, business rules, event notification, and business process automation). Data acquisition and integration moves data between the transactional and analytic repositories in streaming or batch fashion.]

KEY MESSAGE: The ADW (TERADATA) is an integrated part of the Real Time Enterprise.
KEY MESSAGE: The EDW is NOT simply a mirror of the transactional data models. It IS a data model specifically designed for decision support.
KEY MESSAGE: Driven by a business need for concurrent tactical & strategic decision making, a new class of applications needs to access the ADW. Traditional (legacy) applications also need access to the ADW, either directly, or indirectly via data sharing between the transactional and decision making environments.

There are four programming models illustrated in the diagram, all of which can inter-operate within the ADW architecture:
Client Server: Applications such as BTEQ, Cognos, and MicroStrategy are examples of “tightly bound CS applications” that can inter-operate with TERADATA.
Web Services: Frameworks such as SeeBeyond, WebMethods, BEA WebLogic, etc., can be used to deploy database service applications based on the emerging Web Services model.
Publish & Subscribe: Frameworks such as TIBCO can be used to deploy applications that access data based on a publish & subscribe model.
EDI: Using EDI VAN vendors such as GE Global eXchange, Get2Connect.net, and Sterling Commerce, TERADATA can participate in electronic transactions with trading partners.

Data is shared between the transactional and decision making environments:
Data Acquisition & Integration: Data from the transactional environment is captured and copied to the ADW. Based on business needs, data can be moved to the ADW in a streaming fashion, or in a more traditional batch operation.
Information Feedback: Raw data in the ADW is analyzed and transformed into actionable information, some of which is fed back to the transactional systems.

TERADATA is contained in the Decision Making Environment, and is fully integrated in the Active Enterprise using a number of standard interfaces and programming models. Not all components are required to achieve the goal of an Active Enterprise. A system designer is free to choose those components that provide business value to their overall IT mission.

33 Transactional Services
Application Scope: Applications have narrow scope, tuned for specific book-keeping or transactional services.

Transactional Application Services: Applications that perform book-keeping or transactional services for the enterprise (TX1-TX4 in the diagram), connected by messaging middleware (MSG-MW).

Data Access Middleware (DA-MW): Occurs via standards, such as ODBC, OLE-DB, and JDBC, as well as proprietary techniques.

KEY MESSAGE: Transactional systems tend to have a very narrow scope.
KEY MESSAGE: Many transactional repositories exist in the enterprise. Examples include TPF, SAP, Siebel, ORCL-financials, etc.

OLTP Data Repositories (OLTP1-OLTP4, the transactional repositories): Data that reflects the current state of various business processes. Limited history. Tuned for transaction workload.

34 Transactional User Base
Transactional User Base: Consumers, Suppliers, Internal, and Trading Partners, reaching the Transactional Services over the WAN/VAN and Internet/intranet (enterprise users with browsers and/or a portal, legacy environments, C/S, EDI).

Service Brokers: J2EE, CORBA, DCOM, Web Services.

User-level Integration: Occurs via standard EAI services, such as JAVA, WebSphere, .NET, Tibco, and SeeBeyond.

Client/Server Styles: 2-tier and 3-tier RPC-style interfaces against the transactional repositories (OLTP1-OLTP4).

35 Data Warehouse Services
Application Scope: Strategic and tactical decision-making applications, through BI tools or custom applications.

Application Services: Applications that provide predictive analysis and assisted decision making (strategic, tactical, and BI applications, connected by messaging middleware and query directors).

Data Access Middleware: Occurs via standards, such as ODBC, OLE-DB, and JDBC, as well as proprietary techniques.

Enterprise Data Warehouse (EDW A and EDW B, with RDBMS-based event processing): Consolidated enterprise data. Crosses multiple business domains. Integrated data model.

KEY MESSAGE: The ADW (TERADATA) is an integrated part of the Active Enterprise, and is fully integrated using a number of standard interfaces and programming models. Not all components are required to achieve the goal of an Active Enterprise; a system designer is free to choose those components that provide business value to their overall IT mission.

36 Decision Support User Base
Decision Support User Base — the DW user base (consumers, suppliers, internal users, and trading partners) reaches the analytic and decision-making services through browsers and/or a portal, over the Internet/intranet, WAN/VAN, EDI, and client/server connections. Requests pass through the enterprise message bus, ASP/JSP front ends, and service brokers (J2EE, CORBA, DCOM, Web Services), then through messaging and data access middleware to the strategic, tactical, and BI applications, and on through Query Directors (QD) to the analytic and decision-making repositories (EDW-A, EDW-B) with RDBMS-based event processing (RS). User-level integration occurs via standard EAI services, such as Web Services, Java, .NET, TIBCO, and SeeBeyond; client/server styles include 2-tier and 3-tier RPC-style interfaces.
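Where the enterprise message bus carries this integration traffic, a JMS producer is one standards-based way for a decision-support application to publish an event onto the bus. A minimal sketch, assuming a JMS 1.1 provider registered in JNDI; the JNDI names and message payload are placeholders, not anything Teradata-specific.

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import javax.naming.InitialContext;

    public class BusPublisher {
        public static void main(String[] args) throws Exception {
            // JNDI names are placeholders for whatever the bus provider registers.
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/BusConnectionFactory");
            Topic topic = (Topic) ctx.lookup("jms/DecisionEvents");
            Connection conn = cf.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(topic);
                TextMessage msg = session.createTextMessage("customer=42;event=CHURN_RISK");
                producer.send(msg);  // any subscriber on the bus receives the event
            } finally {
                conn.close();
            }
        }
    }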

37 Data Acquisition Services
Data Extraction: data is extracted from the OLTP systems; partner ETL tools are frequently used here.

Data Transformation Services: data cleansing, data transformation (normalization), streaming data for frequent updates, and batch data moves for bulk operations. Partner ETL tools are typically used to perform these services.

Data Load: data is loaded into the EDW system using the Teradata load utilities FastLoad, MultiLoad, and TPump.

Data Acquisition Options: traditional load utilities (bulk or continuous loads); loads through "in-flight" message passing; and replication (table-level replication from source to target).

Direct Information Access: the transactional systems can also access the ADW directly.

[Diagram: transactional repositories OLTP1–OLTP4 feed the data acquisition and integration layer over streaming and batch paths into the analytic and decision-making repositories EDW-A and EDW-B, with Query Directors (QD) and RDBMS-based event processing (RS).]
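TPump itself is a standalone utility with its own script language, so the sketch below is only a rough JDBC analog of the continuous, transaction-based loading style it represents (small committed micro-batches, as opposed to the bulk phases of FastLoad/MultiLoad). The staging table and columns are hypothetical.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;
    import java.util.List;

    public class TrickleLoader {
        // Apply one micro-batch of transaction records as ordinary row inserts,
        // committing per batch so the target table stays continuously queryable.
        public static void loadBatch(Connection conn, List<Object[]> rows) throws Exception {
            String sql = "INSERT INTO sales_stage (store_id, sku, qty, sale_ts) VALUES (?, ?, ?, ?)";
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (Object[] r : rows) {
                    ps.setInt(1, (Integer) r[0]);
                    ps.setString(2, (String) r[1]);
                    ps.setInt(3, (Integer) r[2]);
                    ps.setTimestamp(4, (Timestamp) r[3]);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();  // small, frequent commits = streaming-style acquisition
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }
    }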

38 Event-Driven Business Processes
Business Process Automation: event detection, applied business rules, and event notification; messages are passed via P2P, Web Services, or the enterprise message bus.

RDBMS-Based Event Processing: real-time events are detected through a combination of triggers, stored procedures, and UDFs; the event engine performs the query, and notification messages are passed via P2P, Web Services, or the enterprise message bus.

[Diagram: OLTP1–OLTP4 feed EDW-A and EDW-B through streaming and batch data acquisition; event detection, business rules, and event notification components connect through messaging and data access middleware to the enterprise message bus.]
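As an illustration of trigger-based event detection, the sketch below installs a trigger that records qualifying rows in an event-queue table for the event engine to pick up and forward to the bus. The DDL is ANSI-style and the table, column, and threshold are placeholders; the exact Teradata trigger syntax may differ in detail.

    import java.sql.Connection;
    import java.sql.Statement;

    public class EventTriggerSetup {
        public static void install(Connection conn) throws Exception {
            // ANSI-style row trigger: each large order enqueues an event row that
            // the event engine polls and publishes to the message bus.
            String ddl =
                "CREATE TRIGGER big_order_event " +
                "AFTER INSERT ON orders " +
                "REFERENCING NEW AS n " +
                "FOR EACH ROW " +
                "WHEN (n.order_total > 10000) " +
                "INSERT INTO event_queue (event_type, order_id, created_ts) " +
                "VALUES ('BIG_ORDER', n.order_id, CURRENT_TIMESTAMP)";
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate(ddl);
            }
        }
    }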

39 Application Integration
Decision-making applications interact with bookkeeping applications via standard enterprise services, such as Web Services, Java, and .NET, or through the use of traditional client/server technology.

[Diagram: the transactional services (TX1–TX4 applications over OLTP1–OLTP4) and the analytic and decision-making services (strategic, tactical, and BI applications over EDW-A and EDW-B) are joined by a shared enterprise message bus, service brokers, and messaging/data access middleware, with event detection, business rules, and event notification in the RDBMS-based event processing layer.]
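The receiving half of that interaction, on the transactional side, can be a bus subscriber that reacts to decision events. A hedged sketch as the counterpart to the earlier publisher, again assuming a JMS provider in JNDI with placeholder names:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;
    import javax.naming.InitialContext;

    public class FeedbackListener {
        public static void main(String[] args) throws Exception {
            // JNDI names are placeholders; the real bus configuration is site-specific.
            InitialContext ctx = new InitialContext();
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/BusConnectionFactory");
            Topic topic = (Topic) ctx.lookup("jms/DecisionEvents");
            Connection conn = cf.createConnection();
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(topic);
            consumer.setMessageListener((Message m) -> {
                try {
                    if (m instanceof TextMessage) {
                        // A bookkeeping application would act on the event here,
                        // e.g. update its system of record.
                        System.out.println("event: " + ((TextMessage) m).getText());
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            conn.start();  // begin delivery; the listener runs until the process exits
        }
    }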

40 Dual Active Solution
Replication Services: changed-data capture in V2R6, with update propagation via "GoldenGate".

Teradata Query Director: query routing control based on business rules, providing business continuity and workload sharing across the two systems (EDW-A and EDW-B).

Dual Data Load: the input data stream is split into two independent load streams, and the input data is filtered so that only critical data is loaded on the secondary "active" system; as a result, the secondary system does not need to be as large as the primary.

[Diagram: streaming and batch data acquisition feed both EDW-A and EDW-B; Query Directors (QD) route work from the analytic and decision-making services, and replication services (RS) propagate changed data between the systems.]
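Query Director's actual rule engine is a product feature and is not reproduced here. Purely to illustrate the idea of rule-based routing between two active systems, the sketch below chooses a target by query class and system health; the rules, URLs, and health signal are hypothetical.

    public class QueryRouter {
        private static final String EDW_A = "jdbc:teradata://edw-a/DATABASE=edw";
        private static final String EDW_B = "jdbc:teradata://edw-b/DATABASE=edw";

        // Route tactical (short, indexed) work to the primary when it is healthy;
        // send strategic scans, and all traffic during an outage, to the secondary.
        // Real business rules would be far richer than this two-way choice.
        public static String routeUrl(boolean tactical, boolean primaryHealthy) {
            if (!primaryHealthy) {
                return EDW_B;           // business continuity: fail over
            }
            return tactical ? EDW_A     // low-latency work stays on the primary
                            : EDW_B;    // workload sharing: offload heavy scans
        }
    }

Because only critical data is dual-loaded, a router like this must also know which subject areas exist on the secondary before sending work there.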

