1
Parallelism Concepts
2
Objectives After completing this lesson, you should be able to:
Identify the benefit of using parallel operations
Describe system conditions under which to use parallel operations
Describe how parallel operations work
Control the parallel execution server pool
List operations that can be parallelized
3
Introduction to Parallel Execution
[Diagram: a server without parallelism keeps one CPU scanning while the others sit idle; a server with parallelism keeps every CPU scanning.]
Introduction to Parallel Execution
Parallel execution can dramatically reduce response time for data-intensive operations on large databases typically associated with decision support systems (DSSs) and data warehouses. Symmetric multiprocessing (SMP), clustered systems, and massively parallel systems gain the largest performance benefits from parallel execution because statement processing can be split up among many CPUs on a single database. Parallelism is the idea of breaking down a task so that, instead of one process doing all the work in a query, many processes do part of the work at the same time. The improvement in performance can be quite high. Parallel execution helps systems scale in performance by making optimal use of hardware resources. If your system’s CPUs and disk controllers are already heavily loaded, you must alleviate the load or increase the hardware resources before using parallel execution to improve performance. Some tasks are not well-suited for parallel execution. For example, many OLTP operations are relatively fast, completing in mere seconds or fractions of seconds, and the overhead of utilizing parallel execution would be large, relative to the overall execution time.
4
System Conditions to Implement Parallelism
SMP, MPP, clusters using RAC I/O bandwidth CPUs used less than 30% Sufficient memory System Conditions to Implement Parallelism Parallel execution benefits systems that ideally have all of the following characteristics: Symmetric multiprocessors (SMP), RAC clusters, or massively parallel systems (for example, multiple CPUs) Sufficient I/O bandwidth Underused or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%) Sufficient memory to support additional memory-intensive processes such as sorts, hashing, and I/O buffers If your system lacks some of these characteristics, parallel execution may not significantly improve performance. In fact, parallel execution can reduce system performance on overused systems or systems with insufficient I/O bandwidth.
5
Operations That Can Be Parallelized
Access methods: Table scans, fast full index scans Partitioned index range scans Various SQL operations Joins: Nested loop, sort merge Hash, star transformation, partitionwise join DDL statements: CTAS, CREATE INDEX, REBUILD INDEX [PARTITION] MOVE, SPLIT, COALESCE PARTITION DML statements: INSERT SELECT, UPDATE, DELETE, MERGE SQL*Loader Operations That Can Be Parallelized The Oracle server can use parallel execution for any of the following operations: Access methods: Table scans, fast full index scans, and partitioned index range scans Joins: Nested loop, sort merge, hash, and star transformation DDL statements: CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE SPLIT COALESCE PARTITION. You can normally use parallel DDL where you use regular DDL. However, there are additional details to consider when designing your database. One restriction is that parallel DDL cannot be used on tables with object or LOB columns. DML statements: INSERT SELECT, UPDATE, MERGE, and DELETE. Parallel DML (PDML) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT ... SELECT statements to insert rows into multiple tables as part of a single statement. You can normally use parallel DML where you use regular DML. Remember that you must enable parallel DML for the session before attempting PDML. Miscellaneous SQL operations: GROUP BY, ORDER BY, NOT IN, EXISTS, IN, SELECT DISTINCT, UNION, UNION ALL, MINUS, INTERSECT, CUBE, and ROLLUP, as well as aggregate and table functions
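For example, a minimal sketch of a parallel DDL statement and a parallel DML statement; the target table names and degree values are illustrative only, and parallel DML must be enabled for the session before the insert:
CREATE TABLE sales_2000 NOLOGGING PARALLEL 4
AS SELECT * FROM sales
WHERE time_id < TO_DATE('01-JAN-2001', 'DD-MON-YYYY');

ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ PARALLEL(sales_hist, 4) */ INTO sales_hist
SELECT /*+ PARALLEL(s, 4) */ * FROM sales s;
COMMIT;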
6
Parallelization Rules
A SQL statement can be parallelized if: It includes a PARALLEL hint Parallelization is forced using the ALTER SESSION FORCE command The object operated on is or was declared with a PARALLEL clause (dictionary DOP greater than one) Degree of parallelism (DOP) is determined by looking at referenced objects: Parallel queries use the largest specified or dictionary DOP Parallel DDL sets the DOP to the one specified by the PARALLEL clause Parallelization Rules A SQL statement can be parallelized if it includes a PARALLEL hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement. In addition, a DDL statement can be parallelized by using the PARALLEL clause. It is also possible to force parallelization by using the ALTER SESSION FORCE PARALLEL command. Most of the time, the hint specification takes precedence over the ALTER SESSION command that takes precedence over parallel declaration specification. To determine the DOP, the Oracle Database server looks at the referenced objects: The parallel query looks at each table and index, in the portion of the query being parallelized, to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP. For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object that determines the DOP is the table being modified. Parallel DML also adds some limits to the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the DOP of the subquery is the same as the DML operation. For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the DOP of the subquery is the same as the DDL operation. The DOP is determined by the specification in the PARALLEL clause.
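As a sketch of these rules (the objects and DOP values are illustrative), a statement can be considered for parallel execution in any of the following ways:
-- 1. A PARALLEL hint in the statement
SELECT /*+ PARALLEL(s, 8) */ COUNT(*) FROM sales s;

-- 2. Forcing parallelism at the session level
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8;

-- 3. A dictionary DOP greater than one on the referenced object
ALTER TABLE sales PARALLEL 8;
SELECT COUNT(*) FROM sales;   -- uses the dictionary DOP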
7
Enabling Parallel DML/DDL/QUERY
The ALTER SESSION statement enables parallel mode: Used to override dictionary DOPs with FORCE ALTER SESSION ENABLE DISABLE FORCE PARALLEL n PARALLEL DML DDL QUERY Enabling Parallel DML/DDL/QUERY The PARALLEL parameter and ENABLE/DISABLE clauses specify whether all subsequent DML, DDL, and query statements in the session are considered for parallel execution. This clause enables you to override the degree of parallelism of tables during the current session without changing the tables. You can execute this clause for DML only between committed transactions. Uncommitted transactions must be committed or rolled back before executing this clause for DML. ENABLE executes subsequent statements in the session in parallel. This is the default for DDL and query statements: DML: Executes the session’s DML statements in parallel mode if a parallel hint or a PARALLEL clause is specified DDL: Executes the session’s DDL statements in parallel mode if a PARALLEL clause is specified QUERY: Executes the session’s queries in parallel mode if a parallel hint or a PARALLEL clause is specified DISABLE specifies that subsequent statements be executed serially. This is the default for DML statements: DML: Executes the session’s DML statements serially DDL: Executes the session’s DDL statements serially QUERY: Executes the session’s queries serially
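For example (the degree value is only an example):
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION DISABLE PARALLEL QUERY;
ALTER SESSION FORCE PARALLEL DDL PARALLEL 4;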
8
Enabling Parallel DML/DDL/QUERY
You can use V$SESSION to look at session status: PDML_STATUS PDDL_STATUS PQ_STATUS Values for the columns listed above can be: ENABLED DISABLED FORCED Enabling Parallel DML/DDL/QUERY (continued) There is no initialization parameter for enabling parallel DML: FORCE forces parallel execution of subsequent statements in the session. If no PARALLEL clause or hint is specified, then a default degree of parallelism is used. This clause overrides any parallel_clause specified in subsequent statements in the session, but is overridden by a parallel hint. DML: Provided that no parallel DML restrictions are violated, it executes subsequent DML statements in the session with the default degree of parallelism, unless a specific degree is specified in this clause. DDL: Executes subsequent DDL statements in the session with the default degree of parallelism, unless a specific degree is specified in this clause. Resulting database objects have associated with them the prevailing degree of parallelism. Using FORCE DDL automatically causes all tables created in this session to be created with a default level of parallelism. The effect is the same as if you had specified the parallel_clause (with default degree) with the CREATE TABLE statement. QUERY: Executes subsequent queries with the default degree of parallelism, unless a specific degree is specified in this clause PARALLEL integer: Explicitly specifies a degree of parallelism
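A quick way to check the settings for the current session, assuming Oracle Database 10g or later where SYS_CONTEXT('USERENV', 'SID') is available:
SELECT pdml_status, pddl_status, pq_status
FROM   v$session
WHERE  sid = SYS_CONTEXT('USERENV', 'SID');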
9
How Parallel Execution Works
With serial execution, only one process is used. With parallel execution: one parallel execution coordinator process is used, many parallel execution servers are used, and the table is dynamically divided into granules.
[Diagram: on the left, a single serial process runs SELECT COUNT(*) FROM sales against the SALES table; on the right, a coordinator process runs the same statement through several parallel execution servers.]
How Parallel Execution Works
When serial execution is being used, a single server process performs all necessary processing for the sequential execution of a SQL statement. In the illustration given in the slide, the graphic on the left illustrates a full table scan executed serially on the SALES table. Parallel execution performs these operations in parallel using multiple parallel processes. A process, known as the parallel execution coordinator, dispatches the execution of a statement to several parallel execution servers and coordinates the results from all of the server processes to send the results back to the user. The graphic on the right illustrates several parallel execution servers performing a scan of the SALES table. The table is divided dynamically into load units called granules and each granule is read by a single parallel execution server.
Note: The parallel execution coordinator calls upon the parallel execution servers during the execution of the SQL statement, not during the parsing of the statement. Therefore, when parallel execution is used with the shared server, the server process that processes the EXECUTE call of a user’s statement becomes the parallel execution coordinator for the statement.
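In SQL terms, the only difference is whether the statement is allowed to run in parallel; for example (the degree value is illustrative):
-- Serial: one server process scans SALES
SELECT COUNT(*) FROM sales;

-- Parallel: a coordinator plus parallel execution servers scan granules of SALES
SELECT /*+ PARALLEL(sales, 4) */ COUNT(*) FROM sales;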
10
The Granule The basic unit of work in parallelism is called the granule. Type of granules: Block range granules are dynamically generated at execution time. Partition granules are statically determined by the number of partitions. One granule is read per parallel execution server. Parallel execution servers progress from one granule to the next. The type of granule used is dependent on the kind of parallel operation being performed. The Granule The basic unit of work in parallelism is called a granule. A parallel operation (a table scan, table update, or index creation, and so on) is divided into granules. Parallel execution processes execute the operation one granule at a time. That is, when an execution server finishes reading the rows of a granule, it gets another granule from the coordinator if there are any granules remaining. This continues until all granules are exhausted. The number of granules and their sizes correlate with the number of parallel execution servers. This also affects how well the work is balanced across parallel execution servers. There is no way you can enforce a specific granule strategy because the database makes this decision internally. Block range granules are the basic units of most parallel operations. Block range granules are ranges of physical blocks from a table. The number and the size of the granules are computed during run time by the database to optimize and balance the work distribution for all affected parallel execution servers. When partition granules are used, a parallel execution process works on an entire partition or subpartition of a table or index. Because partition granules are statically determined by the structure of the table or index when a table or index is created, partition granules do not give you the flexibility in parallelizing an operation that block granules do.
11
Parallel Operations
SELECT cust_last_name, cust_first_name FROM customers ORDER BY cust_last_name;
[Diagram: with DOP=3, one set of three parallel execution servers (producers) scans granules of the CUSTOMERS table on disk while a second set of three servers (consumers) sorts the ranges A-K, L-S, and T-Z; the coordinator dispatches the combined results. The scan and the sort each show intraoperation parallelism; the data flow between the two sets shows interoperation parallelism.]
Parallel Operations
After the optimizer determines the execution plan of a statement, the parallel execution coordinator determines the parallelization method for each operation in the execution plan. The coordinator must decide whether an operation can be performed in parallel and, if so, how many parallel execution servers to enlist. The number of parallel execution servers in a set for any given operation is the degree of parallelism (DOP). To illustrate intra- and interoperation parallelism, consider the statement given in the slide. The execution plan implements a full scan of the CUSTOMERS table followed by a sort of the retrieved rows based on the value of the CUST_LAST_NAME column. For the sake of this example, assume this column is not indexed. Also, assume that the degree of parallelism for the query is set to 3, which means that three parallel execution servers can be active for any given operation. Each of the two operations (scan and sort) performed concurrently is given its own set of parallel execution servers; therefore, both operations are parallelized. Parallelization of an individual operation, where the same operation is performed on smaller sets of rows by parallel execution servers, achieves what is termed intraoperation parallelism. When two operations run concurrently on different sets of parallel execution servers with data flowing from one operation into the other, you achieve what is termed interoperation parallelism. Because of the producer/consumer nature of the Oracle server’s operations, only two operations in a given tree need to be performed simultaneously to minimize execution time.
12
Parallel Execution with Real Application Clusters
Execution slaves have node affinity for the execution coordinator but will expand to other nodes if needed.
[Diagram: four RAC nodes sharing the same disks; the execution coordinator runs on one node and its parallel execution servers run on that node, spilling over to the other nodes only when required.]
Parallel Execution with Real Application Clusters
It is possible for a query to be processed in parallel using query processes on a single server, or on multiple servers if you are using Real Application Clusters (RAC), with the decision being made on the basis of the volume of work being handled by each of the servers. If there is a minimal load on the system, the work is spread across as many servers as warranted by the query definition. If the system is fully loaded, then a few local servers are used to minimize any additional overhead required to coordinate local processes and to reduce any interinstance overhead. An Oracle data warehouse can leverage a RAC cluster in two ways: Every node of the cluster is symmetrical, and the nodes equally share all aspects of the data warehouse workload. Alternatively, nodes of the cluster are assigned to different tasks. For example, some nodes can be assigned for ETL processing, whereas other nodes are dedicated to query processing. This is a mechanism for implementing workload partitioning and guaranteeing that ETL processing does not impact query processing or vice versa. It is important to understand that in practice, the database is biased to run a parallel statement on a single server, as long as there are enough resources left on that server to do so. Otherwise, statements span nodes if there are sufficient resources. You can use parallel instance groups (the INSTANCE_GROUPS and PARALLEL_INSTANCE_GROUP parameters) to influence internode parallelism.
13
How Parallel Execution Servers Communicate
Row distribution methods: PARTITION, HASH, RANGE, ROUND-ROBIN, BROADCAST, QC(ORDER), QC(RANDOM)
[Diagram: with DOP=3, every server in parallel execution server set 1 (producers) has a connection to every server in set 2 (consumers), which in turn feed the query coordinator (QC).]
How Parallel Execution Servers Communicate
To execute a query in parallel, Oracle generally creates a producer set of parallel execution servers and a consumer set. The producer servers retrieve rows from tables and the consumer servers perform operations such as join, sort, DML, and DDL on these rows. Each server in the producer set has a connection to each server in the consumer set. This means that the number of virtual connections between parallel execution servers increases as the square of the DOP. Each communication channel has at least one, and sometimes up to three, memory buffers. Multiple memory buffers facilitate asynchronous communication among the parallel execution servers. As you will see in this lesson, the execution plan of a parallel statement stores, in the DISTRIBUTION column of PLAN_TABLE, the method used to distribute rows from producer query servers to consumer query servers. The possible values are:
PARTITION: Maps rows to query servers based on the partitioning of a table or index
HASH: Maps rows to query servers using a hash function on the join key
RANGE: Maps rows to query servers using ranges of the sort key
ROUND-ROBIN: Randomly maps rows to query servers
BROADCAST: Broadcasts the rows of the entire table to each query server
QC(ORDER): The execution coordinator consumes the input in order
QC(RANDOM): The execution coordinator consumes the input randomly
14
Degree of Parallelism Degree of parallelism (DOP) is the number of parallel execution servers used by one parallel operation. DOP applies only to intraoperation parallelism. If interoperation parallelism is used, then the number of parallel execution servers can be twice the DOP. No more than two sets of parallel execution servers can be used for one parallelized statement. When using partition granules, use a relatively high number of partitions. Degree of Parallelism The number of parallel execution servers associated with a single operation is known as the degree of parallelism. Note that the degree of parallelism applies directly only to intraoperation parallelism. If interoperation parallelism is possible, the total number of parallel execution servers for a statement can be twice the specified degree of parallelism. No more than two sets of parallel execution servers can execute simultaneously. Each set of parallel execution servers may process multiple operations. Only two sets of parallel execution servers need to be active to guarantee optimal interoperation parallelism. Oracle Database provides several ways to manage resource utilization in conjunction with parallel execution environments, including: The adaptive multiuser algorithm, which is enabled by default, and which reduces the degree of parallelism as the load on the system increases User resource limits and profiles, which allow you to set limits on the amount of various system resources available to each user as part of a user’s security domain The Database Resource Manager, which enables you to allocate resources to groups of users. Note: When Oracle Database uses partition granules for parallel access to a table or index, you should use a relatively large number of partitions (ideally, three times the DOP), so that the work can effectively be balanced across the query server processes.
15
Default Degree of Parallelism
The default DOP: Is used for a parallel operation that does not specify a DOP Is dynamically calculated at run time Depends on: Total number of CPUs PARALLEL_THREADS_PER_CPU May be reduced depending on the availability of parallel execution servers or by the Oracle Resource Manager Default Degree of Parallelism The default DOP is used when you want to parallelize an operation but you do not specify a DOP in a hint or within the definition of a table or index. The default DOP is appropriate for most applications. The default DOP for a SQL statement is determined by the following factors: The number of CPUs for all Oracle RAC instances in the system, and the value of the PARALLEL_THREADS_PER_CPU parameter. This parameter specifies the number of processes each CPU can handle on a particular configuration. For parallelizing by partition, the number of partitions that will be accessed, based on partition pruning For parallel DML operations with global index maintenance, the minimum number of transaction free lists among all the global index partitions to be updated. This is a requirement to prevent self-deadlock. The factors mentioned above determine the default number of parallel execution servers to use. However, the actual number of processes used is limited by their availability on the requested instances during run time.
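A hedged way to inspect the inputs to this calculation; on a single instance the default DOP is essentially CPU_COUNT multiplied by PARALLEL_THREADS_PER_CPU, summed across all RAC instances:
SELECT name, value
FROM   v$parameter
WHERE  name IN ('cpu_count', 'parallel_threads_per_cpu');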
16
Parallel Execution Plan
For the same statement, a parallel plan generally differs from the corresponding serial plan. To generate the execution plan, use the EXPLAIN PLAN command, or execute the statement. To view the execution plan:
Select directly from PLAN_TABLE
Select directly from V$SQL_PLAN
Run $ORACLE_HOME/rdbms/admin/utlxplp.sql
Use the DBMS_XPLAN.DISPLAY table function
Columns of interest: OBJECT_NODE, OTHER_TAG, DISTRIBUTION, OTHER (prior to 10g)
Parallel Execution Plan
A statement is submitted to Oracle Database and parsed. During optimization, a serial execution plan and a parallel execution plan are considered. These two plans may be different, and the following slides help you understand how to interpret a parallel execution plan. You can generate such a plan by using the EXPLAIN PLAN command, in which case the statement is not executed but its execution plan is stored in PLAN_TABLE. In this case, it is possible to view the execution plan by directly selecting from PLAN_TABLE, by using the utlxplp.sql script stored in $ORACLE_HOME/rdbms/admin, or by using the DBMS_XPLAN.DISPLAY table function. Another alternative is to execute the statement and then look at the generated execution plan in V$SQL_PLAN. Compared to a serial plan, the columns (listed in the slide) in PLAN_TABLE or V$SQL_PLAN can be used to interpret parallel execution information.
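A minimal sketch of generating and displaying a parallel plan, using the query examined in the following slides:
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL */ cust_city, SUM(amount_sold)
FROM   sales s, customers c
WHERE  s.cust_id = c.cust_id
GROUP  BY cust_city;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Or query the parallel-specific columns directly
SELECT id, operation, object_node, other_tag, distribution
FROM   plan_table
ORDER  BY id;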
17
OTHER_TAG Column

OTHER_TAG value                          Interpretation
SERIAL                                   Serial execution
SERIAL_FROM_REMOTE (S -> R)              Serial execution at a remote site
PARALLEL_FROM_SERIAL (S -> P)            Serial execution: output partitioned or broadcast to PX
PARALLEL_TO_PARALLEL (P -> P)            Parallel execution: output repartitioned to second set of PX
PARALLEL_TO_SERIAL (P -> S)              Parallel execution: output returns to coordinator
PARALLEL_COMBINED_WITH_PARENT (PCWP)     Parallel execution: output used by the same PX in the next step
PARALLEL_COMBINED_WITH_CHILD (PCWC)      Parallel execution: input from previous step used by same PX

OTHER_TAG Column
The slide explains the possible values for this column, which can be found in PLAN_TABLE or in V$SQL_PLAN. For more information about this, refer to the Oracle Database 10g Performance Tuning Guide.
Note: PX stands for parallel execution servers.
18
Serial and Parallel Execution Plans
SELECT cust_city,sum(amount_sold) FROM sales s, customers c WHERE s.cust_id=c.cust_id GROUP BY cust_city;

Serial plan:
HASH GROUP BY
  HASH JOIN
    TABLE ACCESS FULL CUSTOMERS
    PARTITION RANGE ALL
      TABLE ACCESS FULL SALES

Parallel plan:
PX COORDINATOR
  PX SEND QC                 P->S   QC(RANDOM)
    HASH GROUP BY            PCWP
      PX RECEIVE             PCWP
        PX SEND              P->P   HASH
          HASH GROUP BY      PCWP
            HASH JOIN        PCWP
              PX RECEIVE     PCWP
                PX SEND      P->P   (BROADCAST)
                  PX BLOCK ITERATOR             PCWC
                    TABLE ACCESS FULL CUSTOMERS PCWP
              PX BLOCK ITERATOR                 PCWC
                TABLE ACCESS FULL SALES         PCWP

Serial and Parallel Execution Plans
The slide shows the serial and the parallel execution plans generated for the same statement. Note that this result is obtained by using the utlxplp.sql script. One of the major differences between the two plans is that the parallel plan has an extra step (HASH GROUP BY). Also, the parallel execution plan, on its right part, shows extra information not presented in the serial plan. This information indicates whether the corresponding step is executed in parallel or serially, and the method used to distribute rows from producer servers to consumer servers in case of inter-parallelism (P->P HASH). These two extra pieces of information are visible from the OTHER_TAG and DISTRIBUTION columns, respectively. You can also see that the parallel plan has some additional PX operations that the serial plan does not have. These operations are indicators for parallel operations’ coordination between the different sets of slaves and the coordinator.
Note: As shown in the following slide, P->P stands for PARALLEL_TO_PARALLEL, PCWP stands for PARALLEL_COMBINED_WITH_PARENT, PCWC stands for PARALLEL_COMBINED_WITH_CHILD, and P->S stands for PARALLEL_TO_SERIAL.
19
Parallel Plan Interpretation
SELECT /*+ PARALLEL */ cust_city, sum(amount_sold) FROM sales s, customers c WHERE s.cust_id=c.cust_id GROUP BY cust_city;
[Diagram: the parallel execution plan for this query drawn as a tree: PX Coordinator at the top, PX Send QC and Hash Group By above a Hash Join, which receives one input from a broadcast of the full scan of CUSTOMERS and the other from a PX Block Iterator over the full scan of SALES.]
Parallel Plan Interpretation
This slide shows you the parallel execution plan corresponding to the query. Note that the tree representation is equivalent to the one shown by utlxplp.sql. Here, the query uses the PARALLEL hint to force the statement parallelization.
20
Parallel Plan Interpretation
[Diagram: the same plan tree, highlighting the first step: parallel execution server set 1 performs the PX Block Iterator and Full Scan of CUSTOMERS and uses PX Send Broadcast (P->P) to send its rows through table queue :TQ10 to set 2, which builds the hash table for the join.]
Parallel Plan Interpretation (continued)
The first step to resolve this query in parallel is to assign one set of parallel execution servers to scan the CUSTOMERS table. Here, each parallel execution server reads part of CUSTOMERS. This part is determined by the PX Block Iterator and is combined with the Full Scan operation. In addition, a second set of parallel execution servers is also started. Its role is to build in parallel the hash table that will be used later to probe the join with the SALES table. As you can see from the slide, the PX Send Broadcast operation is executed by each parallel execution server of the first set in the direction of each parallel execution server of the second set. This is called a “parallel to parallel” type of operation. The Broadcast part of the operation is an optimization used by the system in this case because the input of the Send operation is small. As a result, each server in the first set sends its complete input to each server in the second set. The PX Receive operation is done by each parallel execution server of the second set and is a parallel operation combined with its parent operation in the tree. The result is that each server in the second set builds a hash table corresponding to all of the distinct values of the CUST_ID column found in the CUSTOMERS table. The communication between the two sets of parallel execution servers is done via a table queue memory structure. In the slide, it is called :TQ10. The name of the table queue corresponding to the operation can be found in the OBJECT_NODE column of PLAN_TABLE.
21
Parallel Plan Interpretation
[Diagram: the same plan tree, now highlighting set 2: each server in set 2 performs the PX Block Iterator and Full Scan of SALES and probes the join against the hash table it built.]
Parallel Plan Interpretation (continued)
After the hash table is built, each server in the second set can start to read its part of the SALES table. Again, each part is determined by the PX Block Iterator operation combined with the Full Scan operation in the highlighted part of the graphic shown in the slide. While reading its part of the SALES table, each server can probe the join with the existing hash table. The complete join is determined by merging the results of each local join.
22
Parallel Plan Interpretation
[Diagram: the same plan tree, highlighting PX Send Hash (P->P) from set 2 through table queue :TQ11 to set 1, which performs the second Hash Group By.]
Parallel Plan Interpretation (continued)
However, as can be seen from the execution plan, the HASH JOIN and the first HASH GROUP BY steps are parallelized by using the same set of parallel execution servers: the second set in the slide. This is indicated by the OTHER_TAG column for the HASH JOIN step showing a PARALLEL_COMBINED_WITH_PARENT (PCWP) operation. This means that the second set is doing the join and the GROUP BY operations simultaneously. This is not sufficient to resolve the initial query, because each parallel execution server in the second set is grouping data only on a subset of the complete join. This subset depends on the CUST_ID values found in each SALES part. This means that for two different CUST_ID values (not part of the same hash group), the same CUST_CITY can potentially exist. Thus, the output from the first HASH GROUP BY operation is sent in parallel to the last step in the tree, again using a HASH function taking CUST_CITY as argument. This is done using a new table queue, called :TQ11 in the slide. This means that the first set of parallel execution servers is now moved to the second HASH GROUP BY operation.
Note: This is an example of how Oracle Database can parallelize aggregates.
23
Parallel Plan Interpretation
[Diagram: the same plan tree, highlighting the final step: set 1 performs the second Hash Group By and PX Send QC sends its results (P->S) through table queue :TQ12 to the PX Coordinator.]
Parallel Plan Interpretation (continued)
As you can see, the query executed by each parallel execution server in the first set groups data on CUST_CITY. This time, the result produced by each parallel execution server is guaranteed to be independent, so the result of each parallel execution server can be sent to the execution coordinator. That is why the OTHER_TAG column for the PX Send QC operation shows a PARALLEL_TO_SERIAL operation using a new table queue, called :TQ12 in the slide. Note that the DISTRIBUTION column shows a QC(RANDOM) value, which means that the execution coordinator consumes the input randomly because there is no ORDER BY clause in the original query.
24
Parallel Execution Server Pool
A pool of servers is created at instance startup. Minimum pool size is determined by PARALLEL_MIN_SERVERS. Pool size can increase based on demand. Maximum pool size is determined by PARALLEL_MAX_SERVERS. If a parallel execution server is idle for more than a threshold period of time, it is terminated. Slaves specified in the minimum set are never terminated.
Parallel Execution Server Pool
When an instance starts up, a pool of parallel execution servers is created and is available for any parallel operation. The PARALLEL_MIN_SERVERS initialization parameter specifies the number of parallel execution servers created at instance startup. When executing a parallel operation, the parallel execution coordinator obtains parallel execution servers from the pool and assigns them to the operation. If required, additional parallel execution servers can be created for the operation. However, the Oracle server never creates more parallel execution servers for an instance than the value specified by the PARALLEL_MAX_SERVERS initialization parameter. These parallel execution servers remain with the operation throughout the job execution, and then become available for other operations. After the statement has been processed completely, the parallel execution servers return to the pool. If the number of parallel operations decreases, the database terminates any parallel execution servers that have been idle for a threshold period of time. The size of the pool is not reduced below the value of PARALLEL_MIN_SERVERS, no matter how long the parallel execution servers have been idle.
Note: It is recommended that PARALLEL_MAX_SERVERS not be set manually under normal conditions.
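A sketch of how you might monitor the pool; V$PX_PROCESS lists the parallel execution servers currently running and V$PQ_SYSSTAT summarizes their activity:
-- Parallel execution servers currently in the pool
SELECT server_name, status FROM v$px_process;

-- Summary statistics, including busy and idle server counts
SELECT statistic, value
FROM   v$pq_sysstat
WHERE  statistic LIKE 'Servers%';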
25
Minimum Number of Parallel Execution Servers
In order for an operation to be parallelized, at least two parallel execution servers must be started. You can specify a minimum percentage of available parallel execution servers for one operation to be parallelized. PARALLEL_MIN_PERCENT specifies this minimum percentage. If this minimum percentage is not available, an error is reported (ORA-12827). Default value is 0. Minimum Number of Parallel Execution Servers Operations can be executed in parallel as long as at least two parallel execution servers are available. If fewer parallel execution servers are available than optimally needed, your SQL statement may execute slower than expected by using what is available. You can specify the minimum percentage of requested parallel execution servers that must be available in order for the operation to execute. This strategy ensures that your SQL statement executes with a minimum acceptable parallel performance. If the minimum percentage of requested parallel execution servers is not available, the SQL statement does not execute and returns an error: ORA-12827: insufficient parallel query slaves available. The PARALLEL_MIN_PERCENT initialization parameter specifies the desired minimum percentage of requested parallel execution servers. This parameter affects DML and DDL operations as well as queries. For example, if you specify 50 for this parameter, then at least 50 percent of the parallel execution servers requested for any parallel operation must be available for the operation to succeed. If 20 parallel execution servers are requested, then at least 10 must be available or an error is returned to the user. If PARALLEL_MIN_PERCENT is set to 0, then all parallel operations proceed as long as at least two parallel execution servers are available for processing. Otherwise, the operation is performed serially.
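For example (the value 50 is illustrative):
ALTER SESSION SET PARALLEL_MIN_PERCENT = 50;
-- If fewer than half of the requested parallel execution servers are
-- available, the statement fails with ORA-12827 instead of running
-- with reduced parallelism.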
26
Object’s PARALLEL Clause
Can be specified for tables and indexes. View the degree of parallelism in the DEGREE column of DBA_TABLES (dictionary DOP). Modified by using the corresponding ALTER command. Used to specify the DOP during the object’s DDL: CREATE INDEX, CREATE TABLE … AS SELECT, and partition maintenance commands. The clause syntax is NOPARALLEL or PARALLEL [integer].
Object’s PARALLEL Clause
With the PARALLEL clause, you can parallelize the creation of tables, indexes, and partitions. You can also use it to set the degree of parallelism for queries, DML statements, and DDL statements on the object after its creation. This degree of parallelism can be seen through the DEGREE column of DBA_TABLES and is called the dictionary DOP. You can specify the DOP within a table or index definition by using one of the following statements: CREATE TABLE, ALTER TABLE, CREATE INDEX, or ALTER INDEX. Note that in this case, the specified DOP also represents the DOP used to parallelize the statement. For example, you can use it to create an index in parallel, or create a table as select in parallel. It is also possible to create, rebuild, split, and move partitions in parallel.
Note: If you do not specify the integer for the PARALLEL clause, the default DOP is used.
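A quick way to check the dictionary DOP, assuming the tables belong to the SH sample schema used throughout these slides:
SELECT owner, table_name, degree
FROM   dba_tables
WHERE  owner = 'SH'
AND    table_name IN ('SALES', 'CUSTOMERS');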
27
PARALLEL Clause: Examples
CREATE INDEX ord_customer_ix ON orders (customer_id) NOLOGGING PARALLEL; ALTER TABLE customers PARALLEL 5; ALTER TABLE sales SPLIT PARTITION sales_q4_2000 AT ('15-NOV-2000') INTO (PARTITION sales_q4_1, PARTITION sales_q4_2) PARALLEL 2; PARALLEL Clause: Examples If the ORDERS table had been created using a fast parallel load, you might issue the CREATE INDEX statement shown in the first example in the slide to quickly create an index in parallel. (Oracle Database chooses the appropriate degree of parallelism). Note that in this case, the default degree of parallelism used to create the index is also stored as the dictionary DOP. The second example changes the dictionary DOP of the CUSTOMERS table. The last example splits the SALES_Q4_2000 partition into two new partitions. This operation is done in parallel with a DOP explicitly set to two. Note: The TO_DATE() function for the last SQL example was omitted for simplicity reasons, and one should never rely on implicit data type conversion with the DATE data type.
28
Using Parallelization Hints
The following parallelization hints are used to override existing DOPs. PARALLEL (table_name, DOP_value) NOPARALLEL (table_name) PARALLEL_INDEX (table_name, index, DOP_value) NOPARALLEL_INDEX (table_name, index) SELECT /*+PARALLEL(SALES,9)*/ * FROM SALES; SELECT /*+ PARALLEL_INDEX(c,ic,3)*/ * FROM customers c WHERE cust_city = 'CLERMONT'; Using Parallelization Hints The hints shown in the slide are used to override the calculated DOPs: The PARALLEL hint enables you to specify the desired DOP used for a parallel operation. The hint applies to the SELECT, INSERT, UPDATE, DELETE, and MERGE portions of a statement. The PARALLEL hint must use the table alias if an alias is specified in the query. The hint can then optionally take a value that specifies the DOP to use for the given table. If not specified, the default DOP is then used. The NOPARALLEL hint executes the corresponding statement serially. The PARALLEL_INDEX hint specifies the desired number of concurrent servers that can be used to parallelize index range scans for partitioned indexes. You must specify the table on which indexes are to be scanned. Optionally, you can specify the index name with the DOP to be used to scan it. If not specified, the default DOP is used for the corresponding indexes. If no index is specified, then the DOP is used for all indexes attached to the table. The NOPARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation. Note: If any parallel restrictions are violated, then the hint is ignored.
29
Parallelization Hints
SELECT /*+ FULL(s) ORDERED USE_HASH(c) PARALLEL(s) PARALLEL(c) PQ_DISTRIBUTE(c,NONE,BROADCAST) */ c.channel_desc, SUM(amount_sold) FROM sales s, channels c WHERE s.channel_id = c.channel_id GROUP BY c.channel_desc; SORT GROUP BY |P->S|QC (RANDOM) SORT GROUP BY |P->P|HASH HASH JOIN |PCWP| PARTITION RANGE ALL |PCWP| TABLE ACCESS FULL SALES |PCWP| TABLE ACCESS FULL CHANNELS |P->P|BROADCAST Parallelization Hints The PQ_DISTRIBUTE hint improves parallel join operation performance. You can do this by specifying how rows of joined tables should be distributed among producer and consumer query servers. Use the DISTRIBUTION column of PLAN_TABLE to identify the distribution chosen by the optimizer. The hint takes three arguments: Name or alias of a table to be used as the inner table of a join; C in the preceding example The distribution for the outer table The distribution for the inner table In the slide example, the distribution method used to communicate with the second set of slaves is BROADCAST. This means that all rows, from the inner table, retrieved by each producer, are sent to each consumer. This technique can be used to improve performance of hash and merge join operations in which a very large join result set is joined with a very small result set (size being measured in bytes, rather than number of rows). This is because the CHANNELS table is very small compared to the SALES table. The decision to broadcast rows cannot be made by the optimizer unless the hint is used, or the session initialization parameter PARALLEL_BROADCAST_ENABLED is set to true in the session executing the statement. Note: The initialization parameter OPTIMIZER_FEATURES_ENABLE should be set to at least ALTER SESSION SET PARALLEL_BROADCAST_ENABLED=TRUE
30
Parallelization Hints
PQ_DISTRIBUTE(<inner table>, <outer distribution>, <inner distribution>) Possible distribution combinations: HASH,HASH NONE,BROADCAST BROADCAST,NONE PARTITION,NONE NONE,PARTITION NONE,NONE Parallelization Hints (continued) In the preceding example, if PARALLEL_BROADCAST_ENABLED is set to FALSE, and the hint is not used, the optimizer is using the hash distribution method. The slide lists all possible supported combinations for the last two parameters of the PQ_DISTRIBUTE hint: HASH,HASH: Maps the rows of each table to consumer query servers using a hash function on the join keys. This hint is recommended when the tables are comparable in size and the join operation is implemented by hash join or sort merge join. NONE,BROADCAST: All rows of the inner table are broadcast to each consumer server. The outer table rows are randomly partitioned. This hint is recommended when the inner table is very small compared to the outer table. A general rule is: Use this if the size of the inner table * number of query servers < size of the outer table. You can also use the similar Broadcast,None combination to reverse inner and outer table roles.
31
Parallelism and Cost-Based Optimization
Always analyze tables used in parallel by using the DBMS_STATS package to gather statistics: the DEGREE parameter specifies the DOP to be used; if it is not specified, or is set to NULL, the table’s dictionary DOP is used; use DBMS_STATS.DEFAULT_DEGREE for the default DOP. Statistics on indexes are not gathered in parallel. It is also recommended to gather system statistics. Note the GATHER AUTO option, and that ANALYZE commands always run serially.
Parallelism and Cost-Based Optimization
Cost-based optimization is a sophisticated approach to finding the best execution plan for SQL statements. Oracle automatically uses cost-based optimization with parallel execution. You should use the DBMS_STATS package to gather current statistics for cost-based optimization. In particular, tables used in parallel should always be analyzed. Always keep your statistics current by using the DBMS_STATS package. This package contains many procedures to gather object statistics. Most of the procedures can gather statistics in parallel by using the DEGREE parameter of the corresponding procedures, or by using the dictionary DOP specified during the object’s creation. If the DEGREE parameter is set to NULL, Oracle uses the dictionary DOP of the corresponding objects. Use the constant DBMS_STATS.DEFAULT_DEGREE to specify the default DOP. You should also gather system statistics by using the GATHER_SYSTEM_STATS procedure. Note that the GATHER_SCHEMA_STATS procedure has a value for its OPTIONS parameter: 'GATHER AUTO'. The goal of this option is to simplify statistics gathering at the schema level. Oracle8i introduced the value 'GATHER STALE' in order to collect statistics only on objects that have the MONITORING flag set. 'GATHER AUTO' is equivalent to running the DBMS_STATS.GATHER_SCHEMA_STATS procedure both with OPTIONS set to 'GATHER STALE' and with OPTIONS set to 'GATHER EMPTY'.
Note: If any table in a query has a DOP greater than one, Oracle uses the cost-based optimizer for that query, even if OPTIMIZER_MODE is set to RULE or if there is a RULE hint in the query itself.
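For example, a minimal sketch using DBMS_STATS; the schema and table names are from the SH sample schema, and the explicit degree is illustrative:
BEGIN
  -- Gather statistics on one table with an explicit DOP of 4
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SH',
    tabname => 'SALES',
    degree  => 4);

  -- Gather schema statistics using the default degree of parallelism;
  -- the OPTIONS parameter could instead be set to 'GATHER AUTO'
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname => 'SH',
    degree  => DBMS_STATS.DEFAULT_DEGREE);
END;
/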
32
Summary In this lesson, you should have learned how to:
Benefit from using parallel operations
Validate system conditions under which to use parallel operations
Control the parallel execution server pool
Read a parallel execution plan
33
Types of Parallel Operations
34
Objectives After completing this lesson, you should be able to:
List the types of parallel operations
Describe parallel query, parallel DDL, and parallel DML
Determine when partitionwise join is used
Describe locking behavior for parallel DML operations
Explain parallelism and undo
Discuss parallel execution of functions
35
Parallelization Rules Revisited
A SQL statement can be parallelized if: It includes a parallel hint Parallelization is forced using the ALTER SESSION FORCE command The object operated on is/was declared with a parallel clause (dictionary DOP greater than one) Not prevented from doing so by the Resource Manager DOP is determined by looking at referenced objects: Parallel queries use the largest specified or dictionary DOP. Parallel DML sets the DOP to the number of partitions of the manipulated object. Parallel DDL sets the DOP to the one specified by the PARALLEL clause. Parallelization Rules A SQL statement can be parallelized if it includes a parallel hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement. In addition, a DDL statement can be parallelized by using the PARALLEL clause. It is also possible to force parallelization by using the ALTER SESSION FORCE PARALLEL command (see the following slide). Most of the time, the hint specification takes precedence over the ALTER SESSION command that takes precedence over parallel declaration specification. To determine the DOP, Oracle looks at the referenced objects: Parallel query looks at each table and index, in the portion of the query being parallelized, to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP. For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object that determines the DOP is the table being modified. Parallel DML also adds some limits to the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the DOP of the subquery is the same as the DML operation. The maximum DOP you can achieve is equal to the number of partitions. For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the DOP of the subquery is the same as the DDL operation. The DOP is determined by the specification in the PARALLEL clause.
36
Enabling Parallel DML/DDL/QUERY
The ALTER SESSION statement enables parallel mode: Used to override dictionary DOPs with FORCE QUERY only starting with Oracle 8i R2 ALTER SESSION ENABLE DISABLE FORCE PARALLEL n PARALLEL DML DDL QUERY Enabling Parallel DML/DDL/QUERY This clause specifies whether all subsequent DML, DDL, and query statements in the session are considered for parallel execution. This clause enables you to override the degree of parallelism of tables during the current session without changing the tables themselves. You can execute this clause for DML only between committed transactions. Uncommitted transactions must either be committed or rolled back before executing this clause for DML. You cannot specify the optional PARALLEL integer with ENABLE or DISABLE. ENABLE executes subsequent statements in the session in parallel. This is the default for DDL and query statements: DML: Executes the session’s DML statements in parallel mode if a parallel hint or a parallel clause is specified DDL: Executes the session’s DDL statements in parallel mode if a parallel clause is specified QUERY: Executes the session’s queries in parallel mode if a parallel hint or a parallel clause is specified DISABLE specifies that subsequent statements be executed serially. This is the default for DML statements: DML: Executes the session’s DML statements serially DDL: Executes the session’s DDL statements serially QUERY: Executes the session’s queries serially
37
Enabling Parallel DML/DDL/QUERY
You can use V$SESSION to look at sessions status: PDML_STATUS PDDL_STATUS PQ_STATUS Values for the columns listed above can be: ENABLED DISABLED FORCED Enabling Parallel DML/DDL/QUERY (continued) There is no initialization parameter for enabling parallel DML: FORCE forces parallel execution of subsequent statements in the session. If no parallel clause or hint is specified, then a default degree of parallelism is used. This clause overrides any PARALLEL clause specified in subsequent statements in the session, but is overridden by a parallel hint. DML: Provided that no parallel DML restrictions are violated, executes subsequent DML statements in the session with the default degree of parallelism, unless a specific degree is specified in this clause DDL: Executes subsequent DDL statements in the session with the default degree of parallelism, unless a specific degree is specified in this clause. Resulting database objects have associated with them the prevailing degree of parallelism. Using FORCE DDL automatically causes all tables created in this session to be created with a default level of parallelism. The effect is the same as if you had specified the PARALLEL clause (with default degree) with the CREATE TABLE statement. QUERY: Executes subsequent queries with the default degree of parallelism, unless a specific degree is specified in this clause PARALLEL integer: Explicitly specifies a degree of parallelism
38
Parallel Query The various query types that can be parallelized are:
Access methods: Table scans, index full scans Partitioned index range scans Various SQL operations: GROUP BY, ORDER BY, NOT IN, EXISTS, IN, SELECT DISTINCT, UNION, UNION ALL, MINUS, INTERSECT, CUBE, ROLLUP, aggregates Join methods: Nested loop, sort merge Hash, star transformation, partitionwise join Parallel Query You can parallelize queries and subqueries in SELECT statements. You can also parallelize the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE). You can also query external tables in parallel. Parallelization is also supported for queries on index-organized tables. The following scan methods can be used for index-organized tables with overflow areas and for index-organized tables that contain LOBs: Parallel fast full scan of a nonpartitioned index-organized table Parallel fast full scan of a partitioned index-organized table Parallel index range scan of a partitioned index-organized table
39
Parallel Partitioned Table Scan
Scan by ROWID for partitioned tables. SELECT /*+ PARALLEL(SALES,9)*/ * FROM SALES; PQ4 PQ5 PQ6 PQ3 PQ7 PQ2 PQ8 Parallel Partitioned Table Scan A table scan is broken into pieces delimited by high and low ROWID values. Extension of ROWID parallel table scans to partitioned tables is straightforward. No ROWID range spans a partition, and partition numbers are sent with ROWID ranges. Parse and run-time predicates on partitioning columns are evaluated to restrict ROWID ranges to relevant partitions of tables (partition pruning). This means that a parallel query that accesses a partitioned table by a table scan performs the same or less overall work than does the same query on a nonpartitioned table. The query on the partitioned table executes with equivalent parallelism. Note that you can use parallel hints to influence the processing of the database, but hints are not specifically required to induce parallelism. In general, the effective use of such hints requires an extensive knowledge of the partitioning schemes for each table and index. In practice, it is much simpler (and effective) to let the database determine the need for parallel resources. PQ1 SALES PQ9
40
Parallel Partitioned Index Scan
Scans can be performed by partition for partitioned indexes and tables. The hint must contain the table name, index name, and degree of parallelism.
SELECT /*+ PARALLEL_INDEX(c,ic,3)*/ * FROM customers c WHERE cust_city = 'MARSEILLE';
[Diagram: three parallel query slaves (PQ1, PQ2, PQ3) each scan a different partition of the nonprefixed index IC on the indexed column CUST_CITY.]
Parallel Partitioned Index Scan
Partitioned indexes are scanned in parallel by giving different slave processes different partitions of the index to scan. The number of parallel query slaves is limited by the number of partitions; one slave per partition is the maximum allowed.
41
Partitionwise Joins Partial partitionwise join Full partitionwise join
[Diagram: side-by-side illustrations of a partial partitionwise join and a full partitionwise join, showing query slaves (QS) assigned to partitions of the partitioned tables.]
Partitionwise Joins
Partitionwise joins reduce query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This reduces response time and improves the use of both CPU and memory resources. In Oracle Real Application Clusters environments, partitionwise joins limit the data traffic over the interconnect, which is the key to achieving good scalability for massive join operations. Partitionwise joins can be full or partial. The database decides which type of join to use.
Full partitionwise join: A full partitionwise join is a join performed on two tables that are equipartitioned in advance on their join keys. The joins are performed sequentially if the query is executed serially, or in parallel when each pair is joined by a separate query slave. The results of each parallel scan are joined without redistributing the data. Partitionwise joins are limited by the number of partitions.
Partial partitionwise join: Unlike full partitionwise joins, partial partitionwise joins require you to partition only one table on the join key, not both tables. The partitioned table is referred to as the reference table. The other table may or may not be partitioned. Partial partitionwise joins are more common than full partitionwise joins and do not require an equipartitioning of the tables.
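A minimal sketch of equipartitioning two tables on their join key so that a full partitionwise join becomes possible; the table definitions are illustrative, not the actual SH schema DDL:
CREATE TABLE sales_pw (
  time_id     DATE,
  cust_id     NUMBER,
  amount_sold NUMBER)
PARTITION BY HASH (time_id) PARTITIONS 8
PARALLEL 4;

CREATE TABLE times_pw (
  time_id       DATE,
  calendar_year NUMBER)
PARTITION BY HASH (time_id) PARTITIONS 8
PARALLEL 4;

-- Joining on the common partitioning key allows a full partitionwise join
SELECT *
FROM   sales_pw s, times_pw t
WHERE  s.time_id = t.time_id;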
42
Non-Partitionwise Join: Example
Consider the table: SALES to be hash partitioned on PROD_ID TIMES to be hash partitioned on CALENDAR_YEAR SELECT * FROM sales s, times t WHERE s.time_id = t.time_id; HASH JOIN P->S QC PARTITION HASH (ALL)[1 --> m] PCWP TABLE ACCESS FULL (s) P->P PARTITION HASH (ALL)[1 --> n] PCWP TABLE ACCESS FULL (t) P->P Non-Partitionwise Join: Example In the slide, both tables are partitioned but not on the join key (TIME_ID) of the SELECT statement. This situation prevents the Oracle database from using the partitionwise join technique. In this case, if the statement runs in parallel, two sets of slaves are created and each slave in the reading set can send its data to any of the joining slave from the other set. This operation implies repartitioning of rows sent to the joining set because each reading slaves can read data pertaining to each internal hash partition created for solving the join. This is illustrated in the execution plan above by the PARALLEL_TO_PARALLEL (P > P) indication on both tables. Note: m and n represent the number of partitions of the SALES and TIMES tables, respectively.
43
Partial Partitionwise Join: Example
Consider the table: SALES to be hash partitioned on TIME_ID TIMES to be hash partitioned on CALENDAR_YEAR SELECT * FROM sales s, times t WHERE s.time_id = t.time_id; HASH JOIN P->S QC PARTITION HASH (ALL)[1 --> m] PCWP TABLE ACCESS FULL (s) PCWP PARTITION HASH (ALL)[1 --> n] PCWP TABLE ACCESS FULL (t) P->P Partial Partitionwise Join: Example In this case, only the SALES table is partitioned on the join key (TIME_ID) of the query. The main difference between this example and the previous one is that the internal repartitioning is not necessary for the reader slaves of the sales table. Because this repartitioning is still necessary for rows read from times table, this technique is called partial partitionwise join. The execution plan clearly shows the difference with the previous example. Here, row access for the SALES table is a PARALLEL_COMBINED_WITH_PARENT operation, which avoids the repartitioning phase. Thus, the reader slaves of SALES are also the ones that join the rows coming from TIMES. Only the reader slaves of the TIMES table are forced to repartition their rows. This is illustrated in the plan (shown in the slide) by the PARALLEL_TO_PARALLEL operation performed on the TIMES table.
44
Full Partitionwise Join: Example
Consider the table: SALES to be partitioned on TIME_ID TIMES to be partitioned on TIME_ID Both tables are equipartitioned. SELECT * FROM sales s, times t WHERE s.time_id = t.time_id; HASH JOIN P->S QC PARTITION HASH (ALL)[1 --> m] PCWP TABLE ACCESS FULL (s) PCWP PARTITION HASH (ALL)[1 --> n] PCWP TABLE ACCESS FULL (t) PCWP Full Partitionwise Join: Example In the example shown in the slide, both tables are partitioned on the join key of the query and are equipartitioned, which means that if one row in the first partition of SALES can match a row in TIMES, then that second row must be located in the first partition of TIMES. The repartitioning phase is not necessary, and only one set of slave can be used. Each slave can read and join data for two corresponding partitions, one from each table. This is illustrated in the execution plan (shown in the slide) by the two tables access with a PARALLEL_COMBINED_WITH_PARENT operation. Note: In the generated plan, both "PARTITION HASH (ALL)" steps are not shown.
45
Full Partitionwise Join: Example
Consider the table: SALES to be range partitioned on PROD_ID and to be subpartitioned by hash on TIME_ID TIMES to be hash partitioned on TIME_ID Both tables are equipartitioned on hash dimension. SELECT * FROM sales s, times t WHERE s.time_id = t.time_id; HASH JOIN P->S QC PARTITION RANGE (ALL)[1 --> n] PCWP PARTITION HASH (ALL)[1 --> m] PCWP TABLE ACCESS FULL (s) PCWP PARTITION HASH (ALL)[1 --> m] PCWP TABLE ACCESS FULL (t) PCWP Full Partitionwise Join: Example (continued) In this case, both tables are equipartitioned on the query’s join key. The main difference between this example and the previous one is that SALES is composite partitioned, and both tables are equipartitioned on the hash dimension. This difference is shown in the execution plan by an extra "PARTITION RANGE" step. In fact, each range partition of the SALES table is equipartitioned with the entire TIMES table. Thus, each range partition can be joined with the entire TIMES table by using the full partitionwise join technique.
46
Partitionwise Join Compatibility
Number of partitions should be a multiple of DOP. Partitions should be of equal size. Use preferably: Hash partitioning or composite partitioning Number of hash partitions as a power of two Possible cases for full partitionwise join:
R/S         Range   Hash   List   Composite
Range       Range   N/A    N/A    Range
Hash        N/A     Hash   N/A    Hash
List        N/A     N/A    List   List
Composite   Range   Hash   List   Range, hash, or list
Partitionwise Join Tips The following tips are guidelines only and are not mandatory. They may help you achieve better performance in some cases; if your design does not follow them, it does not mean that it will not work correctly: To guarantee an equal work distribution, the number of partitions should always be a multiple of the degree of parallelism. If you do not correctly identify the partition bounds so that the partitions are of equal size, data skew during the execution may result, which penalizes the workload distribution among the slaves. This should not happen when both tables are equipartitioned on the hash dimension, provided the number of partitions is a power of two, which limits the risk of skewing. The table in the slide lists the possible cases in which a full partitionwise join can take place. R and S are two different equipartitioned tables. These tables can be partitioned by range, hash, list, composite range/hash, or composite range/list. The table gives the dimension on which both tables need to be equipartitioned for a full partitionwise join to be possible. N/A corresponds to cases where this is not applicable.
47
Parallel DDL The parallel DDL statements for nonpartitioned tables and indexes are: CREATE INDEX CREATE TABLE ... AS SELECT ALTER INDEX ... REBUILD The parallel DDL statements for partitioned tables and indexes are: ALTER TABLE ... MOVE PARTITION ALTER TABLE ... SPLIT PARTITION ALTER TABLE ... COALESCE PARTITION ALTER INDEX ... REBUILD PARTITION ALTER INDEX ... SPLIT PARTITION Parallel DDL You can parallelize DDL statements for tables and indexes, whether they are partitioned or nonpartitioned. The slide summarizes the operations that can be parallelized in DDL statements. All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution. Note: ALTER INDEX ... REBUILD can be parallelized only for a nonpartitioned index; for partitioned tables, local indexes are rebuilt in parallel on a per-partition basis. The last statement shown in the slide can be executed in parallel only if the (global) index partition being split is usable. Parallel DDL cannot occur on tables with object columns or LOB columns. Clustered tables cannot be created and populated in parallel.
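As a rough sketch of what such statements look like (the object names, degrees, and the partition name are assumptions, not part of the slide):

-- Parallel CREATE TABLE ... AS SELECT, without redo logging.
CREATE TABLE sales_copy PARALLEL 8 NOLOGGING
AS SELECT * FROM sales;

-- Parallel index creation and rebuild on a nonpartitioned index.
CREATE INDEX sales_time_ix ON sales_copy (time_id) PARALLEL 8 NOLOGGING;
ALTER INDEX sales_time_ix REBUILD PARALLEL 8;

-- Parallel partition maintenance (assumes a partition named SALES_Q1 exists).
ALTER TABLE sales MOVE PARTITION sales_q1 PARALLEL 8;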
48
Space Management for Parallel DDL
Creating tables or indexes in parallel could lead to: External fragmentation Internal fragmentation Each parallel execution server creates its own temporary segment using the specified storage attributes. Internal fragmentation will almost always happen. External fragmentation needs attention only for dictionary-managed tablespaces: Temporary segments might be trimmed. Use MINIMUM EXTENT at the tablespace level. Space Management for Parallel DDL When you create a table or index in parallel, each parallel execution server uses the values in the STORAGE clause of the CREATE statement to create temporary segments to store the rows. Therefore, a table created with a NEXT setting of 5 MB and a PARALLEL DEGREE of 12 consumes at least 60 megabytes (MB) of storage during table creation, because each process starts with an extent of 5 MB. When the parallel execution coordinator combines the segments, some of the segments may be trimmed, and the resulting table may be smaller than the requested 60 MB. So, when you create a table or index in parallel, it is possible to create pockets of free space. This occurs when the temporary segments used by the parallel execution servers are larger than what is needed to store the rows. Depending on the storage specifications, this extra free space can be huge. That is why, in general, Oracle trims the last extent (above its HWM) of each generated temporary segment before merging it with the others to produce the resulting table. The drawback is that this also generates unevenly sized extents, which can ultimately cause what is called external fragmentation. This problem can be avoided by using either locally managed tablespaces (with autoallocate or uniform extent management) or dictionary-managed tablespaces for which the MINIMUM EXTENT clause has been specified. In the latter case, the trimming does not happen. Nevertheless, internal fragmentation cannot be avoided; this space, which lies below the high-water mark, can be reused by future modifications. Note: Allocation of extents is the same for rebuilding indexes in parallel and for moving, splitting, or rebuilding partitions in parallel.
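The two tablespace options mentioned above might look like the following sketch; the names, file paths, and sizes are assumptions for illustration:

-- Locally managed tablespace with uniform extents: trimmed extents stay
-- reusable because every extent has the same size.
CREATE TABLESPACE ts_dw
  DATAFILE '/u01/oradata/db/ts_dw01.dbf' SIZE 2G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 5M;

-- Dictionary-managed tablespace with MINIMUM EXTENT, so the temporary
-- segments created by parallel DDL are not trimmed to odd sizes.
CREATE TABLESPACE ts_dict
  DATAFILE '/u01/oradata/db/ts_dict01.dbf' SIZE 2G
  MINIMUM EXTENT 5M
  EXTENT MANAGEMENT DICTIONARY;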
49
Fragmentation and Parallelism
[Slide graphic: four parallel execution servers PQ1-PQ4, each filling its own temporary segment up to its HWM; trimmed free space illustrates external fragmentation, and unused space below each HWM illustrates internal fragmentation.] Fragmentation and Parallelism The slide illustrates the previous discussion. The idea is to create and populate a table using a statement similar to the following: create table customers parallel 4 tablespace test as select * from sh.customers; At the top of the slide, you can see the existing data used to populate the new table (in the middle). The DOP is assumed to be 4, which implies that four parallel execution servers (shown as circles) are used to read the existing data and populate the new table. Each parallel execution server is dynamically assigned a granule from the existing table, and creates its own temporary segment into which it inserts the corresponding data. At the end of the insertion phase, each temporary segment has its own high-water mark (HWM), with unused space above it. Before those temporary segments are merged to form the new table, some of them might be trimmed above their HWM. The result is shown at the bottom of the slide. Except for the segment on the extreme left, the segments are trimmed. The resulting free space represents the external fragmentation, and the hashed zone represents the internal fragmentation. External fragmentation (free extents of different sizes) can become a problem when using dictionary-managed tablespaces. With locally managed tablespaces, although external fragmentation can appear, it is handled by the system itself, which can reuse this free space efficiently. Legend: data; unused portion below HWM; free space; unused portion above HWM.
50
Creating Indexes in Parallel
Nonpartitioned index Object Parallel execution server Index piece Data Creating Indexes in Parallel Multiple processes can work together simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, the database can create the index more quickly than if a single server process created the index sequentially. When the table and index are not partitioned, parallel index creation works the same way as a table scan with an ORDER BY clause. The table is randomly sampled and a set of table keys is found that equally divides the index into the same number of pieces as the degree of parallelism (DOP). A first set of query processes scans the table, extracts index key-row ID pairs, and sends each pair to a process in a second set of query processes based on index key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel coordinator concatenates the pieces to form the index.
51
Creating Indexes in Parallel
Global partitioned index Object Parallel execution server Index piece Data Creating Indexes in Parallel (continued) When the index is a global partitioned index, the strategy used to create the index is the same as previously, except that the DOP is limited to the number of index partitions.
52
Creating Indexes in Parallel
Local partitioned index Object Parallel execution server Index piece Data Creating Indexes in Parallel (continued) When the index is a local index, things are simpler because only one set of parallel execution servers is needed. Within the set, each parallel execution server reads one table partition and simultaneously builds the corresponding index partition. Parallel local index creation can be run with a higher DOP, because only half as many server processes are used for a given DOP.
53
Parallel DDL: Example DDL statements are parallelized by specifying a PARALLEL clause: CREATE BITMAP INDEX fk_sales_prod ON sales(prod_id) PARALLEL 16 LOCAL NOLOGGING; Parallel DDL: Example The slide shows an example of a parallel CREATE INDEX statement for the partitioned table SALES that contains 16 partitions. Note that in the example in the slide, the bitmap index must be local.
54
Parallel DML: Overview
Complement parallel query architecture by providing parallelization for: INSERT UPDATE DELETE MERGE Useful when changing big tables Parallel DML: Overview Parallel DML (parallel INSERT, UPDATE, DELETE, and MERGE) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. Parallel DML (PDML) is useful in a DSS environment where the performance and scalability of accessing large objects are important. Parallel DML complements parallel query in providing you with both querying and updating capabilities for your DSS databases. Some online transaction processing (OLTP) operations may also benefit from parallel DML. It is possible to perform parallel DML on nonpartitioned tables for DELETE and UPDATE operations as long as no bitmap indexes exist.
55
When to Use Parallel DML
Scenarios where parallel DML is used include: Refreshing large tables in a data warehouse Creating intermediate summary tables Using scoring tables Updating historical tables Running batch jobs When to Use Parallel DML Parallel DML operations are mainly used to speed up large DML operations against large database objects. Parallel DML is useful in DSS environments where performance and scalability of accessing large objects are important. Parallel DML complements parallel query in providing both querying and updating capabilities for your DSS databases. Large tables need to be refreshed (updated) periodically with new or modified data from the production system. You can do this efficiently by using parallel DML combined with updatable join views. You can also use the MERGE statement. Table maintenance operations such as online table redefinition and ALTER TABLE … MOVE can also take advantage of parallelism. In a DSS environment, many applications require complex computations that involve constructing and manipulating many large intermediate summary tables. These summary tables are often temporary and frequently do not need to be logged. Parallel DML can speed up the operations against these large intermediate tables.
56
Restrictions on Parallel DML
After a PDML statement modifies an object, it is no longer possible to query or modify this object in the same transaction (ORA-12838). Limited integrity constraint support is provided. Clustered tables are not supported. PDML is not allowed: On tables with enabled triggers On remote objects When the operation is part of a distributed transaction Restrictions on Parallel DML The following restrictions apply to parallel DML (including direct-path INSERT): A transaction can contain multiple parallel DML statements that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement (DML or query) can access the same table again in that transaction. This restriction also exists after a serial direct-path INSERT statement. If you try, the following error message is returned: ORA-12838: cannot read/modify an object after modifying it in parallel (see the sketch that follows). Parallel DML operations cannot be performed on tables with enabled triggers; thus, replication functionality is not supported for parallel DML. Parallel DML cannot occur in the presence of certain constraints: self-referential integrity (if the referenced keys are involved), delete cascade, and deferred integrity. In addition, for direct-path INSERT, there is no support for any referential integrity. PDML is not allowed on any remote object. A transaction involved in a parallel DML operation cannot be or become a distributed transaction. Clustered tables are not supported. Note: A clustered table is not the same as a table in a cluster.
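A minimal sketch of the ORA-12838 restriction in action; the degree of 4 and the 1.1 factor are purely illustrative assumptions:

ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(sales,4) */ sales
SET    amount_sold = amount_sold * 1.1;

-- Any further access to SALES inside the same transaction now fails:
SELECT COUNT(*) FROM sales;
-- ORA-12838: cannot read/modify an object after modifying it in parallel

COMMIT;   -- after the commit, SALES can be queried and modified again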
57
Performance Benefits of Parallel DML
Can dramatically speed up DML transactions that can be parallelized Serial update: … UPDATE sales SET amount_sold=amount_sold* ; … Parallel update: alter session enable parallel DML; UPDATE /*+PARALLEL(sales,12)*/ sales SET amount_sold=amount_sold* ; … Performance Benefits of Parallel DML The major advantage of parallel DML is performance: if the underlying hardware supports parallel DML, any DML statement that accesses a large number of rows benefits from it. Parallel performance is based not only on the number of processes spawned; the number of CPUs, the number of disk drives on which the data is located, and other factors all have an impact. Looking at an Oracle Database 10g plan, you can see that the data is redistributed to an index maintenance row source, which also provides a performance benefit. Note This method can be used for UPDATE, DELETE, and MERGE operations. One slave process can be assigned to more than one partition for purposes of resource sharing and use. …
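A complete version of the serial and parallel forms shown above might read as follows; the 1.2 multiplier and the COMMIT are illustrative assumptions:

-- Serial update: no PDML mode, no hint.
UPDATE sales SET amount_sold = amount_sold * 1.2;

-- Parallel update of the same rows.
ALTER SESSION ENABLE PARALLEL DML;
UPDATE /*+ PARALLEL(sales,12) */ sales
SET    amount_sold = amount_sold * 1.2;
COMMIT;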
58
Automatic Parallelization of DML
Multiple sessions are unnecessary. You do not need to know the key value or ROWID ranges to divide the level of work. Statements do not need to be manually coordinated. Session 1 UPDATE sales SET amount_sold=amount_sold* WHERE time_id > '30-JAN-2002'; Session 2 UPDATE sales SET amount_sold=amount_sold* WHERE time_id < '30-JAN-2002'; Automatic Parallelization of DML Using parallel DML operations, you can obtain a degree of parallelism in a transaction without having to establish multiple sessions that each execute a statement against part of the data set. When using parallel DML, you do not need to know key value ranges or ROWID ranges; the work is automatically parallelized across the data set affected by the DML operation. If you execute statements across multiple sessions, you lose transaction control; if a statement in one session commits, a second statement in another session may fail and roll back, even though the statements should be treated as one logical transaction. Parallel DML provides the advantage of executing one statement as a single transaction in a single session, which entirely commits or rolls back. This provides a high degree of control and atomicity over the DML transaction, in essence operating similarly to a two-phase commit operation. Also, with manual parallelization, you lack resource affinity information. You must know affinity information to issue the right DML statement at the right instance when running Oracle Real Application Clusters. You should also find out about current resource usage to balance the workload across instances.
59
Enabling Parallel DML You must enable PDML.
The ALTER SESSION statement enables parallel DML mode: Parallel queries are still parallelized, even if parallel DML is disabled. The default mode of a session is DISABLE PARALLEL DML. ALTER SESSION {ENABLE | DISABLE} PARALLEL DML ALTER SESSION FORCE PARALLEL DML [PARALLEL n] Enabling Parallel DML A DML statement can be parallelized only if you have explicitly enabled or forced parallel DML in the session. This mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements. ENABLE enables parallel execution of subsequent statements in the session. The default mode of a session is DISABLE PARALLEL DML. When parallel DML is disabled, no DML statement is executed in parallel even if the PARALLEL hint is used. When parallel DML is enabled in a session, all DML statements in this session are considered for parallel execution. However, even if parallel DML is enabled, the DML operation may still execute serially if there are no parallel hints, if no table has a parallel attribute, or if restrictions on parallel operations are violated. FORCE forces parallel execution of subsequent statements in the session. If no PARALLEL clause or hint is specified, then a default degree of parallelism is used. This clause overrides any PARALLEL clause specified in subsequent statements in the session, but is overridden by a parallel hint. Provided no parallel DML restrictions are violated, subsequent DML statements in the session are executed with the default degree of parallelism, unless a degree is specified in this clause. The PARALLEL DML mode of the session does not influence the parallelism of SELECT statements, DDL statements, or the query portions of DML statements. Thus, if this mode is not set, the DML operation itself is not parallelized, but scans or join operations within the DML statement may still be parallelized.
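For reference, a short sketch of the three session modes (the degree of 8 is an arbitrary example):

ALTER SESSION ENABLE PARALLEL DML;            -- DML honors PARALLEL hints and table attributes
ALTER SESSION FORCE PARALLEL DML PARALLEL 8;  -- default DOP of 8 unless a hint overrides it
ALTER SESSION DISABLE PARALLEL DML;           -- back to the default session mode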
60
Parallel DML: Example MERGE /*+ PARALLEL(c,3) PARALLEL(d,3) */
INTO customers c USING diff_customers d ON (d.cust_id = c.cust_id) WHEN MATCHED THEN UPDATE SET c.cust_last_name = d.cust_last_name, c.cust_city = d.cust_city WHEN NOT MATCHED THEN INSERT (c.cust_id,c.cust_last_name) VALUES (d.cust_id,d.cust_last_name); Parallel DML: Example In a data warehouse system, large tables need to be refreshed (updated) periodically with new or modified data from the production system. You can do this efficiently with parallel DML and the MERGE statement.
61
Direct-Path Insert [Slide graphic: data segments with the rows inserted by each parallel execution server placed above the high-water marks]
You can use direct-path INSERT on both partitioned and nonpartitioned tables. The following four cases can be distinguished: Serial direct-path insert into partitioned and nonpartitioned tables: The single process inserts data beyond the current high-water mark of the table segment or of each partition segment. (The high-water mark is the level above which blocks have never been formatted to receive data.) When a COMMIT executes, the high-water mark is updated to the new value, making the data visible to users. Parallel direct-path insert into partitioned tables: This case is currently an exception to the “one process per partition” rule. Each parallel execution server can insert into more than one partition. The main benefit is that load balancing is possible even when data is skewed among the partitions. Parallel direct-path insert into a nonpartitioned table: Each parallel execution server allocates a new temporary segment and inserts data into that temporary segment. When a COMMIT executes, the parallel execution coordinator merges the new temporary segments into the primary table segment, where the data becomes visible to users. For more information, refer to the discussion of external fragmentation earlier in this lesson. Note: The slide graphic describes only the “inserters” for the parallel cases. You also need the “scanners” that read the table; this operation is generally performed in parallel as well, with row redistribution to the “inserters.”
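A hedged sketch of the parallel, nonpartitioned case described above; the staging table and the degree are assumptions:

ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path insert is the default in parallel DML mode: each slave fills
-- its own temporary segment above the HWM, and the coordinator merges the
-- segments into the table segment when the transaction commits.
INSERT /*+ PARALLEL(sales,4) */ INTO sales
SELECT /*+ PARALLEL(st,4) */ * FROM sales_staging st;

COMMIT;   -- the newly inserted rows become visible only after the commit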
62
Enabling Direct-Path Insert
Supported for: INSERT…SELECT version of the INSERT command MERGE command Used by default when PDML mode is enabled In serial mode, use the APPEND hint: Right after the INSERT keyword Right after the SELECT keyword Right after the MERGE keyword Enabling Direct-Path Insert When you are inserting in parallel DML mode, direct-path INSERT is the default. When you are inserting in serial mode, you must activate direct-path INSERT by specifying the APPEND hint in each INSERT or MERGE statement, either immediately after the INSERT or MERGE keyword, or immediately after the SELECT keyword in the subquery of the INSERT statement. This mode also works for multitable inserts. Note: You can disable direct-path INSERT by specifying the NOAPPEND hint in each INSERT or MERGE statement. Doing so overrides parallel DML mode.
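A minimal sketch of the serial case; the target table and the predicate are assumptions:

-- APPEND immediately after the INSERT keyword.
INSERT /*+ APPEND */ INTO sales_history
SELECT * FROM sales WHERE time_id < DATE '2002-01-01';
COMMIT;

-- Equivalently, the hint can follow the SELECT keyword of the subquery.
INSERT INTO sales_history
SELECT /*+ APPEND */ * FROM sales WHERE time_id < DATE '2002-01-01';
COMMIT;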
63
Direct-Path Insert Logging
Whether to log redo and undo during the insert operation on a table or index Object attribute: LOGGING NOLOGGING Specified at creation or alteration time of: Object itself Tablespace in which the object resides Direct-Path Insert Logging With direct-path insert, you can choose whether to log redo and undo information during the insert operation. You can specify the logging mode for a table, partition, index, or LOB storage at create time (in a CREATE statement) or subsequently (in an ALTER statement). If you do not specify either LOGGING or NOLOGGING at these times, the logging attribute: Of a partition defaults to the logging attribute of its table Of a table or index defaults to the logging attribute of the tablespace in which it resides The logging attribute of LOB storage defaults to LOGGING if you specify CACHE for LOB storage. If you do not specify CACHE, the logging attribute defaults to that of the tablespace in which the LOB values reside. You set the logging attribute of a tablespace in a CREATE TABLESPACE or ALTER TABLESPACE statement.
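A short sketch of setting the attribute at different levels; the object and tablespace names are assumptions:

-- At the object level ...
ALTER TABLE sales_history NOLOGGING;

-- ... or at the tablespace level, which new objects then inherit.
ALTER TABLESPACE ts_dw NOLOGGING;

-- Direct-path inserts into SALES_HISTORY now generate minimal redo.
INSERT /*+ APPEND */ INTO sales_history SELECT * FROM sales;
COMMIT;

-- Re-enable logging (and take a backup) after the load.
ALTER TABLE sales_history LOGGING;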
64
PDML and Undo Segments UPDATE /*+PARALLEL(sales,4)*/ sales
SET amount_sold=amount_sold* ; Coor PQ1 PQ2 PQ3 PQ4 PDML and Undo Segments A parallel DML operation is executed by more than one independent slave transaction. This means that each slave transaction can be assigned different undo segments. It also means that two slaves can be assigned to the same undo segment, because they are treated as separate transactions. However, the atomicity of the parallel DML statement ensures that the statement is either completely committed or completely rolled back. To ensure user-level transactional atomicity, the coordinator uses a two-phase COMMIT protocol to commit the changes performed by the slave transactions. This two-phase COMMIT protocol is a simplified version that makes use of the shared disk architecture to speed up transaction status lookups, especially during transactional recovery. In-doubt transactions never become visible to users. The SET TRANSACTION USE ROLLBACK SEGMENT statement is not valid for parallel DML operations. If you are using manually created rollback segments, then you should have equally sized rollback segments, because you have no control over which slave transaction is assigned to which rollback segment. If you are using automatic undo management, you do not need to worry about undo segment sizing; simply ensure that there is enough space in your undo tablespace. Note: The coordinator also has its own coordinator transaction, which can have its own undo segment and is used for serial updates in the transaction. RBS_1 RBS_2 RBS_3
65
Recovery for PDML User-issued rollback: Process recovery:
Performed in parallel by the coordinator and the parallel execution servers Process recovery: PMON rolls back the work done by the dead process Others roll back their own work System recovery: SMON coordinates the dead transactions Fast-start fault recovery is used if enabled Fast-Start On-Demand Rollback Fast-Start Parallel Rollback FAST_START_PARALLEL_ROLLBACK=HIGH Recovery for PDML The time required to roll back a parallel DML operation can be roughly equal to the time it took to perform the forward operation. Oracle supports parallel rollback after transaction and process failures, and after system failures: Following a transaction failure due to a statement error, a user-issued rollback is performed in parallel by the parallel execution coordinator and the parallel execution servers. The rollback takes approximately the same amount of time as the forward transaction. If a parallel execution server or a parallel execution coordinator fails, PMON rolls back the work from that process, and all other processes in the transaction roll back their changes. Note that if multiple slave processes fail, PMON rolls back all of their work serially.
66
PDML Locking: Considerations
Exclusive locks held on each impacted partition Increase value of the following parameters: DML_LOCKS ENQUEUE_RESOURCES Example: Table with 600 partitions, and parallel DELETE with DOP of 100 involving all partitions: Coordinator: 1 table lock SX, 600 partition locks X Each slave: 1 table lock SX, 1 partition lock NULL and 1 partition-wait lock S per owned partition Concurrent queries behave the same as with normal DML operations, except for the session executing the PDML operation. PDML Locking: Considerations When performing PDML against a partitioned table, the query coordinator takes out exclusive locks on the partitions involved, which means that no other DML is possible against these partitions even if it does not touch the same rows. As a result, PDML requires many more locks on the partitions in which it is being performed. Note that this behavior is similar for direct-path insert operations, even on nonpartitioned tables. Therefore, because parallel DML holds many more locks, you should increase the starting values of the ENQUEUE_RESOURCES and DML_LOCKS parameters. For example, consider a table with 600 partitions running with a DOP of 100. Assume all partitions are involved in a parallel UPDATE or DELETE statement with no row migrations (in which case the numbers should be increased further). The coordinator acquires 1 table lock SX and 600 partition locks X. Each parallel execution server acquires 1 table lock SX, 1 partition lock NULL, and 1 partition-wait lock S for each owned partition. The totals for the slaves are then 100 table locks SX, 600 partition locks NULL, and 600 partition-wait locks S. Note that partition-wait DML locks appear as TM locks with ID2 set to one in the V$LOCK view.
67
Parallel Execution of Functions
A function must be declared with the PARALLEL_ENABLE keyword. If the function is declared in a package, then it must be declared with: PRAGMA RESTRICT_REFERENCES with at least WNDS, WNPS, RNPS [, RNDS]. Any variable is private to each parallel execution server. Parallel Execution of Functions SQL statements can contain user-defined functions written in PL/SQL or Java, or as external procedures in C. These functions can appear as part of the SELECT list, SET clause, or WHERE clause. When the SQL statement is parallelized, these functions are executed on a per-row basis by the parallel execution server. Any PL/SQL package variables or Java static attributes used by the function are entirely private to each individual parallel execution process and are newly initialized when each slave session is created, rather than being copied from the original session. Because of this, not all functions will generate correct results if executed in parallel. This is also true for user-written table functions appearing in the FROM clause. In order for a user function to be executed in parallel, it must have been declared with the PARALLEL_ENABLE keyword. In addition, if the function is part of a package or type, it must also have a PRAGMA RESTRICT_REFERENCES that indicates all of WNDS, RNPS, and WNPS, if the function is used for parallel queries only. If the function is used for parallel DML, or DDL, it must also indicate the RNDS option.
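A hedged sketch of both forms; the function name, the conversion factor, and the package layout are assumptions for illustration:

-- Standalone function that is safe to run in parallel execution servers.
CREATE OR REPLACE FUNCTION to_usd (p_amount NUMBER) RETURN NUMBER
  PARALLEL_ENABLE
IS
BEGIN
  RETURN p_amount * 0.92;   -- illustrative conversion rate
END;
/

-- Packaged variant: the pragma asserts purity so the function can be parallelized.
CREATE OR REPLACE PACKAGE conv_pkg IS
  FUNCTION to_usd (p_amount NUMBER) RETURN NUMBER PARALLEL_ENABLE;
  PRAGMA RESTRICT_REFERENCES (to_usd, WNDS, WNPS, RNPS, RNDS);
END conv_pkg;
/

CREATE OR REPLACE PACKAGE BODY conv_pkg IS
  FUNCTION to_usd (p_amount NUMBER) RETURN NUMBER PARALLEL_ENABLE IS
  BEGIN
    RETURN p_amount * 0.92;
  END;
END conv_pkg;
/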
68
Summary In this lesson, you should have learned how to:
Identify the types of parallel operations Explain parallel query, parallel DDL, and parallel DML Determine when partitionwise join is used Describe locking behavior for parallel DML operations Describe parallelism and undo Discuss parallel execution of functions
69
Monitoring and Tuning Parallel Operations
70
Objectives After completing this lesson, you should be able to:
Set initialization parameters related to parallel operations tuning Query dynamic performance views related to parallel operations tuning Determine your memory requirement needs Tune PDML operations
71
Tuning Parameters for Parallel Execution
The initial computed values of parallel execution parameters should be acceptable in most cases. They are based on the values of CPU_COUNT and PARALLEL_THREADS_PER_CPU at database startup. Oracle Corporation recommends that you use the default settings. Manual tuning of parallel execution is more complex than using default settings. Manual parallel execution tuning requires more attentive administration than automated tuning. Manual tuning is prone to user-load and system resource miscalculations. Tuning Parameters for Parallel Execution The initial computed values of the parallel execution parameters should be acceptable for the majority of installations. These parameters affect memory usage and the degree of parallelism used for parallel operations. Oracle Database computes defaults for these parameters on the basis of the values of CPU_COUNT and PARALLEL_THREADS_PER_CPU at database startup. You can also tune the parameters manually, increasing or decreasing their values to suit specific system configurations or performance goals; however, it is recommended that you use the default settings for parallel execution. Manual tuning of parallel execution is more complex than using default settings for two reasons: manual parallel execution tuning requires more attentive administration than automated tuning, and manual tuning is prone to user-load and system-resource miscalculations. Note: In a RAC environment, CPU_COUNT is the number of CPUs in the cluster.
72
Using Default Parameter Settings
PARALLEL_ADAPTIVE_MULTI_USER: default TRUE. Throttles DOP requests to prevent system overload.
PARALLEL_MAX_SERVERS: default CPU_COUNT x PARALLEL_THREADS_PER_CPU x (1; 2 if PGA_AGGREGATE_TARGET > 0) x 5. Maximizes the number of processes used by parallel execution.
PARALLEL_EXECUTION_MESSAGE_SIZE: default 2 KB (port specific). Increasing 4K to 8K improves parallel performance if the SGA is large enough.
Using Default Parameter Settings By default, the database automatically sets parallel execution parameters, as shown in the table (in the slide). For most systems, you do not need to make further adjustments to have an adequately tuned parallel execution environment. Note that you can set some parameters in such a way that Oracle Database is constrained. For example, if you set PROCESSES to 20, you will not be able to spawn 25 slaves.
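To review the values actually computed on a given instance, a simple dictionary query can be used (no assumptions beyond the standard V$PARAMETER view):

SELECT name, value, isdefault
FROM   v$parameter
WHERE  name IN ('parallel_adaptive_multi_user',
                'parallel_max_servers',
                'parallel_execution_message_size',
                'parallel_threads_per_cpu',
                'cpu_count')
ORDER  BY name;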
73
Balancing the Workload
To optimize performance, all parallel execution servers should have equal workloads. For parallelization by block range or parallel execution servers, the workload is dynamically divided among the parallel execution servers. By choosing an appropriate DOP, you can: Minimize workload skew Optimize performance Balancing the Workload To optimize performance, all parallel execution servers should have equal workloads. For SQL statements parallelized by block range or by parallel execution servers, the workload is dynamically divided among the parallel execution servers. This minimizes workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processes. If the workload is evenly distributed among the partitions, you can optimize performance by matching the number of parallel execution servers to the number of partitions or by choosing a DOP in which the number of partitions is a multiple of the number of processes. This applies to partitionwise joins and PDML on tables created before Oracle9i. For example, suppose a table has ten partitions, and a parallel operation divides the work evenly among them. You can use ten parallel execution servers (DOP equal to 10) to do the work in approximately one-tenth the time that one process would take. You might also use five processes to do the work in one-fifth the time, or two processes to do the work in one-half the time. However, if you use nine processes to work on ten partitions, the first process to finish its work on one partition then begins work on the tenth partition; and as the other processes finish their work, they become idle.
74
Adaptive Multiuser and DOP
The adaptive multiuser feature adjusts the DOP on the basis of user load. PARALLEL_ADAPTIVE_MULTI_USER set to: TRUE improves performance in a multiuser environment (default) FALSE is used for batch processing PARALLEL_AUTOMATIC_TUNING has been deprecated in Oracle Database 10g. Kept for backward compatibility only Adaptive Multiuser and DOP The DOP specifies the number of available processes, or threads, used in parallel operations. Each parallel thread can use one or two query processes, depending on the query’s complexity. The adaptive multiuser feature adjusts the DOP on the basis of user load. For example, you might have a table with a DOP of 5. This DOP may be acceptable with ten users. However, if ten more users enter the system and you enable the PARALLEL_ADAPTIVE_MULTI_USER feature, the DOP is reduced to spread resources more evenly according to the perceived load. After the DOP for a query is determined, the DOP does not change for the duration of the query. It is best to use the parallel adaptive multiuser feature when the probability that users will process simultaneous parallel execution operations is high. By default, PARALLEL_ADAPTIVE_MULTI_USER is set to TRUE, which optimizes the performance of systems with concurrent parallel SQL execution operations. If PARALLEL_ADAPTIVE_MULTI_USER is set to FALSE, each parallel SQL execution operation receives the requested number of parallel execution server processes regardless of the impact on the performance of the system, as long as sufficient resources have been configured. Unless you have a need to do otherwise, it is recommended that the default value of PARALLEL_ADAPTIVE_MULTI_USER remain unchanged.
75
PX Message Pool Parallel execution requires additional memory buffers used to communicate between slaves. No recommended size for those buffers Let Oracle size these buffers automatically by: Setting PARALLEL_AUTOMATIC_TUNING to TRUE Buffers allocated in Large Pool instead of Shared Pool In case of ORA-4031, raise your Large Pool. PX Message Pool Starting with Oracle8i, parallel execution message buffers are allocated from the large pool whenever the PARALLEL_AUTOMATIC_TUNING initialization parameter is set to TRUE. In earlier releases, this allocation was taken from the shared pool. If you are migrating or upgrading to Oracle9i and you choose to set PARALLEL_AUTOMATIC_TUNING to TRUE, you can avoid problems by modifying the settings of the SHARED_POOL_SIZE and LARGE_POOL_SIZE initialization parameters. Typically, you should reduce the setting of SHARED_POOL_SIZE and raise the setting of LARGE_POOL_SIZE. Alternatively, you can reduce the setting of SHARED_POOL_SIZE and let Oracle calculate the setting of LARGE_POOL_SIZE. Oracle calculates a default LARGE_POOL_SIZE only if PARALLEL_AUTOMATIC_TUNING is set to TRUE and LARGE_POOL_SIZE is unset. It is still possible to see an error message such as: ORA-04031: unable to allocate … bytes of shared memory (large pool). If this happens, you should increase your large pool. Note: With the exception of parallel update and delete, parallel operations do not generally benefit from larger buffer pool sizes. Parallel update and delete benefit from a larger buffer pool when they update indexes. This is because index updates have a random access pattern, and I/O activity can be reduced if an entire index or its interior nodes can be kept in the buffer pool.
76
PX Message Pool Total PX Message Pool should be:
Size = PARALLEL_EXECUTION_MESSAGE_SIZE Users = # parallel concurrent users running with optimal DOP Groups = # parallel execution server groups per statement Memory = 3*Size*Users*Groups*(DOP²+2*DOP) PX Message Pool (continued) ORA-4031 does not report the cumulative amount of memory needed. To resolve the problem, gradually increase the value for LARGE_POOL_SIZE. Unfortunately, this technique is not very precise. To calculate the initial amount of memory required, use the formula given in the slide, where: Size = PARALLEL_EXECUTION_MESSAGE_SIZE Users = the number of concurrent parallel execution users that you expect to have running with the optimal DOP. Groups = the number of query server process groups used per query. A simple SQL statement requires only one group. However, if your queries involve subqueries that will be processed in parallel, then Oracle uses an additional group of query server processes. Connections = (DOP² + 2 x DOP). This represents the number of connections established between two sets of parallel execution servers. Each slave of the producer set is connected to each slave in the consumer set. Additionally, each slave has a connection to the coordinator. Note: This formula does not take into account Real Application Clusters environments.
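As a worked example under assumed values (message size 4,096 bytes, 10 concurrent parallel users, 1 server group per statement, and a DOP of 8): Memory = 3 * 4096 * 10 * 1 * (8² + 2*8) = 3 * 4096 * 10 * 80 = 9,830,400 bytes, that is, roughly 9.4 MB of large pool for PX message buffers.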
77
Using V$PX_PROCESS_SYSTAT
SELECT name, SUM(bytes) FROM V$SGASTAT WHERE pool = 'large pool' GROUP BY ROLLUP (name); NAME SUM(BYTES) PX msg pool free memory SELECT * FROM V$PX_BUFFER_ADVICE WHERE RTRIM(statistic) = 'Buffers HWM'; Using V$PX_PROCESS_SYSTAT The first query in the preceding slide gives you the current allocation for the PX Message Pool that is around 38 MB in the example. The second query reports the highest number of buffers that were allocated for parallel execution servers. In the slide example, this represents 3,620 buffers. By multiplying the number of buffers by PARALLEL_EXECUTION_MESSAGE_SIZE (4,096 if PARALLEL_AUTOMATIC_TUNING is set to TRUE), you obtain approximately 15 MB. In this case, the high-water mark has reached approximately 40 percent of its capacity. STATISTIC VALUE Buffers HWM
78
Shared Pool If PX Message buffers are allocated in this pool, you should use the previous formula to size the Shared Pool accordingly. Parallel execution plans consume more space than serial plans. More cursors are used in parallel mode. Monitor recompilation hit ratio in V$SQLAREA. Shared Pool As mentioned earlier, if PARALLEL_AUTOMATIC_TUNING is FALSE, Oracle allocates parallel execution message buffers from the shared pool. In this case, tune the shared pool as described on the previous pages. You must also take into account that using parallel execution generates more cursors, and parallel execution plans generally consume more space than serial plans. Look at the statistics in the V$SQLAREA view to determine how often Oracle recompiles cursors. If the cursor hit ratio is poor, increase the size of the pool. This should happen only when you have a large number of distinct queries.
79
PGA Sizing Use automatic PGA memory management through the PGA_AGGREGATE_TARGET initialization parameter. Available only for dedicated server mode Set to at least: (Total System Memory*80%)*50% If using WORKAREA_SIZE_POLICY set to MANUAL: HASH_AREA_SIZE >= MAX(SQRT(S)/2,1M)/#PES S being the smallest table input in megabytes #PES being the number of concurrent parallel execution servers executing hash joins SORT_AREA_SIZE should range between 256 KB and 4 MB PGA Sizing For complex queries, a big portion of the run-time area is dedicated to work areas allocated in the PGA by memory-intensive operators, such as sorts and hash joins. Prior to Oracle9i, the maximum size of these work areas was controlled using the SORT_AREA_SIZE, HASH_AREA_SIZE, BITMAP_MERGE_AREA_SIZE, and CREATE_BITMAP_AREA_SIZE parameters. These parameters are difficult to tune because they are rather static. That is why it is recommended to use the automatic PGA memory management feature by setting the PGA_AGGREGATE_TARGET parameter to at least 40% of the total amount of available system memory. The idea is to reserve 20% of the total memory for the operating system and 80% for Oracle; out of that 80%, at least 50% should be used by work areas in a data warehouse environment. This is just a starting point, and DBAs can monitor the number of “multipass” operations by using V$SYSSTAT to decide whether to raise this percentage. If you are using the old manual tuning, you should set HASH_AREA_SIZE to approximately half the square root of S, divided by the number of concurrent parallel execution servers, where S is the size in megabytes of the smallest of the inputs to the join operation. In any case, the value of HASH_AREA_SIZE should not be less than 1 MB. Note: The minimum value for PGA_AGGREGATE_TARGET is 10 MB. This parameter is dynamic at the system level.
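Applying the starting-point formula to a system with 2 GB of memory (the size used in the scenario later in this lesson) gives 2 GB * 80% = 1.6 GB for Oracle, and 50% of that, about 800 MB, for work areas. A sketch of the corresponding settings:

ALTER SYSTEM SET pga_aggregate_target = 800M;
ALTER SYSTEM SET workarea_size_policy = AUTO;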
80
Resource Manager and the DOP
Resource plan: NIGHT_PLAN Resource consumer group OLTP DSS Allocation method parameters CPU = 25% Max degree of parallelism = 2 CPU = 75% Max degree of parallelism = 20 Resource Manager and the DOP The Database Resource Manager provides a mechanism for allocating the resources of a data warehouse among different sets of end users. Rather than requiring an administrator to monitor the performance, the Database Resource Manager can enforce resource utilization policies. For example, suppose that the marketing department and the sales department share a data warehouse. Using the Database Resource Manager, a warehouse administrator can specify that the marketing department receive at least 75 percent of the CPU resources of the machines, whereas the sales department receive 25 percent of the CPU resources. The database administrator could further specify the maximum degree of parallelism for any operation within a consumer group. The Database Resource Manager provides a very flexible mechanism for allocating CPU resources. The example in the slide has a very simple allocation scheme, in which the CPU resources are divided between two departments. However, an administrator can create a multilevel allocation scheme. In this case, resources are initially allocated to level one users, and then remaining resources are allocated to level two users, and so forth.
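A hedged sketch of how the NIGHT_PLAN directives in the slide might be defined with DBMS_RESOURCE_MANAGER; it assumes the plan and the OLTP and DSS consumer groups are created in the same pending area, and that a directive for OTHER_GROUPS is added as required:

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'NIGHT_PLAN',
    group_or_subplan         => 'OLTP',
    comment                  => 'OLTP users at night',
    cpu_p1                   => 25,
    parallel_degree_limit_p1 => 2);

  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan                     => 'NIGHT_PLAN',
    group_or_subplan         => 'DSS',
    comment                  => 'DSS users at night',
    cpu_p1                   => 75,
    parallel_degree_limit_p1 => 20);

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/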
81
Data Warehouse Scenario
Tables in third normal form, and star schema and summary tables Ad hoc queries and high volume processing to convert a repository into the star schema and the summary tables System characteristics: CPUs: 8 Memory: 2 GB Users: 40 Disk: 80 GB Data Warehousing Scenario In the data warehouse scenario, the workload involves some ad hoc queries and a high volume of batch operations to convert a central repository into summary tables and star schemas. The database has source tables, plus end-user tables in star schema and summary form only. In addition, the DBA can create resource consumer groups to control the degree of parallelism. The system characteristics are described in the slide.
82
Parameter Setting: Example
Manually PARALLEL_ADAPTIVE_MULTI_USER=FALSE PARALLEL_THREADS_PER_CPU=4 SHARED_POOL_SIZE=20M Automatically PROCESSES=123, SESSIONS=140 PARALLEL_MAX_SERVERS=80 LARGE_POOL_SIZE=78MB PARALLEL_EXECUTION_MESSAGE_SIZE=4096 Parameter Setting: Example The DBA disables PARALLEL_ADAPTIVE_MULTI_USER for batch operations. To determine the degree of parallelism, PARALLEL_THREADS_PER_CPU is set to 4. In addition, the DBA alters (or creates) tables with the following kind of statement: ALTER TABLE sales PARALLEL; As a result, for simple queries, Oracle can use a DOP of 32 (CPU_COUNT * PARALLEL_THREADS_PER_CPU = 8 * 4).
83
Are There Execution Problems?
Use V$PQ_TQSTAT to find out whether there is data distribution skew or bad object statistics: Contains traffic statistics between slaves at the table queue level Valid only when queried from the parallel session Check I/O and CPU bound at the operating system level and decide on the parallelism. Check for contention. Execution Problems If you use parallel execution, find out whether there is an unevenness in workload distribution. The statistics in V$PQ_TQSTAT show rows produced and consumed per parallel SQL process, and are updated after each parallel execution. This view provides a detailed report of message traffic at the level of the table queue. It is valid only when queried from a session that is executing parallel SQL statements. The view contains a row for each parallel server process that reads or writes each table queue. For example, a table queue connecting 10 consumer processes to 10 producer processes has 20 rows in the view. Find out whether the data itself is skewed or the distribution is uneven. If the data itself is skewed, this may indicate a low cardinality, that is, a low number of distinct values. If I/O problems occur, you may need to reorganize your data, spreading it over more devices. If there is no skew in workload distribution, check for the following conditions: Are there enough disk controllers to provide adequate I/O bandwidth? Check the operating system CPU to see whether a lot of time is being spent in system calls; if so, decrease the degree of parallelism. The resource may be overloaded, and too much parallelism may cause processes to compete with each other. You can also use V$PQ_TQSTAT to determine whether object statistics are precise enough.
84
Data Distribution and V$PQ_TQSTAT
SELECT /*+PARALLEL*/ cust_city, sum(amount_sold) FROM sales s, customers c WHERE s.cust_id=c.cust_id GROUP BY cust_city; ... PX SEND QC (RANDOM) :TQ P->S QC (RAND) HASH GROUP BY PCWP PX RECEIVE PCWP PX SEND HASH :TQ P->P HASH HASH GROUP BY PCWP HASH JOIN PCWP PX RECEIVE PCWP PX SEND BROADCAST :TQ P->P BROADCAST PX BLOCK ITERATOR PCWC TABLE ACCESS FULL CUSTOMERS PCWP PX BLOCK ITERATOR PCWC TABLE ACCESS FULL SALES PCWP Data Distribution and V$PQ_TQSTAT The execution plan for the query shown in the slide can be displayed using DBMS_XPLAN.DISPLAY or the utlxplp.sql script. Note that for formatting reasons, only the OPERATION, OPTIONS, OBJECT_NAME, OBJECT_NODE, OTHER_TAG, and DISTRIBUTION columns are shown. To determine the workload distribution between each parallel execution server for the first query execution, you can use the following query: SELECT dfo_number,tq_id,server_type, process,num_rows FROM V$PQ_TQSTAT ORDER BY dfo_number,tq_id,server_type,process; Note that the DFO_NUMBER column represents the data flow operator (DFO) tree number to differentiate statements. Also, the TQ_ID column is the table queue ID within the query that represents the connection between two DFO nodes in the statement execution tree.
85
Data Distribution and V$PQ_TQSTAT
DFO_NUMBER TQ_ID SERVER_TYP PROCESS NUM_ROWS Consumer P Consumer P Producer P Producer P Consumer P Consumer P Producer P Producer P Consumer QC Producer P Producer P Data Distribution and V$PQ_TQSTAT (continued) This is the result of the V$PQ_TQSTAT query initiated after the original query execution. Note that three different table queues (0, 1, and 2) were used during this execution. These table queues can be matched with the previous execution plan showing TQ0, TQ1, and TQ2, respectively. By looking at both the execution plan and the report shown in the slide, you can understand how many rows are distributed between each set in the execution tree. TQ0 corresponds to the connections between the parallel execution server set reading the SALES table and the parallel execution server set doing the hash join operation. The chosen DOP for this statement is two because there are two producers in the first set and two consumers in the second set (inter-operation parallelism). As seen, the workload distribution between the two producers for TQ0 is quite even (55310, 55690). Similarly, for the same TQ0, the workload is also evenly distributed to each consumer. This situation is almost ideal because the workload is well distributed among the parallel execution servers. All the way through this execution plan, you can see that the workload is evenly distributed. As a simple counterexample, consider a hash join between two tables, with a join on a column that has only two distinct values. At best, the hash function sends one of the values to parallel execution server one and the other to parallel execution server two. A DOP of 2 is fine, but if the DOP is 4, then at least two parallel execution servers have no work.
86
Using Other Dynamic Performance Views
General information: V$FILESTAT V$SESSTAT, V$SYSSTAT Information about parallel execution: V$PX_SESSION V$PX_PROCESS V$PX_PROCESS_SYSSTAT V$PX_SESSTAT V$PQ_SESSTAT V$PQ_SLAVE V$PX_BUFFER_ADVICE TIMED_STATISTICS should be set to TRUE. Using Other Dynamic Performance Views You can also monitor parallel operations by using the following dynamic performance views: V$FILESTAT: Sums read and write requests, the number of blocks, and service times for every data file in every tablespace V$SESSTAT: Provides parallel execution statistics for each session. The statistics include the total number of queries, DML statements, and DDL statements executed in a session, and the total number of intra- and inter-instance messages exchanged during parallel execution in the session. V$SYSSTAT provides the same statistics as V$SESSTAT, but for the entire system. V$PX_PROCESS: Contains information about the processes (such as name, OS process ID, session ID, and state) of the sessions running parallel execution V$PX_PROCESS_SYSSTAT: Contains parallel execution statistics about the processes V$PX_SESSION: Displays information, for example, the degree of parallelism, of the sessions running parallel executions V$PX_SESSTAT: Provides a join of the session information from V$PX_SESSION and the V$SESSTAT table. Thus, all session statistics available to a normal session are available for all sessions performed using parallel execution.
87
Using V$PX_SESSION SELECT qcsid, sid,
server_group "Group", server_set "Set", degree "Degree", req_degree "ReqDegree" FROM V$PX_SESSION ORDER BY 1,3,4; QCSID SID Group Set Degree ReqDegree 9 9 Using V$PX_SESSION Use V$PX_SESSION to determine the configuration of the server group executing in parallel. In this example, session 9 is the query coordinator, whereas sessions 7 and 21 are in the first group, first set. Sessions 18 and 20 are in the first group, second set. The requested and granted DOP for this query is 2.
88
Using V$PX_SESSTAT SELECT qcsid, sid,
server_group "Group", server_set "Set", name "StatName", VALUE FROM V$PX_SESSTAT A, V$STATNAME B WHERE A.statistic# = B.statistic# AND name = 'physical reads' AND value > 0 ORDER BY 1,3,4; QCSID SID Group Set StatName VALUE physical reads 3863 physical reads 2 physical reads 2 physical reads 2 physical reads 2 Using V$PX_SESSTAT The example in the slide shows the execution of a join query to determine the progress of these processes in terms of physical reads. You can use this type of query to track statistics in V$STATNAME. Repeat this query as often as required to observe the progress of the query server processes.
89
Using V$PX_PROCESS SELECT * FROM V$PX_PROCESS;
SERV STATUS PID SPID SID SERIAL P002 IN USE P003 IN USE P004 AVAILABLE P005 AVAILABLE P000 IN USE P001 IN USE Using V$PX_PROCESS This query uses V$PX_PROCESS to check the status of the parallel execution servers.
90
Using V$SYSSTAT SELECT name, value FROM V$SYSSTAT
WHERE UPPER(name) LIKE '%PARALLEL OPERATIONS%' OR UPPER(name) LIKE '%PARALLELIZED%' OR UPPER(name) LIKE '%PX%'; Using V$SYSSTAT The V$SYSSTAT and V$SESSTAT views contain several statistics for monitoring parallel execution. Use these statistics (as shown in the preceding example) to track the number of parallel queries, DMLs, DDLs, data flow operators (DFOs), and operations. Each query, DML, or DDL can have multiple parallel operations and multiple DFOs. In addition, statistics also count the number of query operations for which the DOP was reduced, or downgraded, due to either the adaptive multiuser algorithm or the depletion of available parallel execution servers. Finally, statistics in these views also count the number of messages sent on behalf of parallel execution.
91
Using V$SYSSTAT NAME VALUE
queries parallelized DML statements parallelized DDL statements parallelized DFO trees parallelized Parallel operations not downgraded Parallel operations downgraded to serial Parallel operations downgraded 75 to 99 pct 252 Parallel operations downgraded 50 to 75 pct 128 Parallel operations downgraded 25 to 50 pct 43 Parallel operations downgraded 1 to 25 pct PX local messages sent PX local messages recv'd PX remote messages sent PX remote messages recv'd Using V$SYSSTAT (continued) Parallel operations not downgraded is incremented if the number of allocated slaves is equal to DOP. Parallel operations downgraded to serial is incremented if no slaves can be allocated. Parallel operations downgraded 75 to 99 pct is incremented if the number of allocated slaves*4/DOP is >= 0 and <1. Parallel operations downgraded 50 to 75 pct is incremented if the number of allocated slaves*4/DOP is >=1 and <2. Parallel operations downgraded 25 to 50 pct is incremented if the number of allocated slaves*4/DOP is >=2 and <3. Parallel operations downgraded 1 to 25 pct is incremented if the number of allocated slaves*4/DOP >= 3.
92
Tuning PDML Using local striping for local indexes
Using global striping for global indexes Increasing INITRANS for global indexes Using NOLOGGING Using multiple archivers Using multiple DBWRs or I/O slaves Tuning PDML Parallel updates and deletes can generate a high number of random I/O requests during index maintenance. For local index maintenance, local striping is most efficient in reducing I/O contention, because each server process goes only to its own set of disks and disk controllers. For global index maintenance (partitioned or nonpartitioned), globally striping the index across many disks and controllers is the best way to distribute the I/Os. If you have global indexes, a global index segment and its index blocks are shared by the server processes of the same parallel DML statement. Even if the operations are not performed against the same rows, the server processes can share the same index blocks, and each server transaction needs one transaction entry in the index block header before it can make changes to a block. Therefore, in the CREATE INDEX or ALTER INDEX statement, you should set INITRANS, the initial number of transaction entries allocated within each data block, to a large value, such as the maximum DOP against this index. Parallel DDL and DML operations can generate a large amount of redo. A single ARCH process to archive these redo logs may not be adequate; to avoid this problem, spawn multiple archiver processes. Parallel DML operations dirty a large number of data, index, and undo blocks in the buffer cache during a short period of time, so you should consider increasing the number of DBWn processes.
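A few of these tips expressed as statements; the index name and the values chosen are assumptions:

-- Pre-allocate 16 transaction entries in each index block, matching an
-- assumed maximum DOP of 16 against this global index.
ALTER INDEX sales_cust_gix INITRANS 16;

-- Additional archiver processes to keep pace with parallel redo generation.
ALTER SYSTEM SET log_archive_max_processes = 4;

-- More database writer processes (static parameter: requires a restart).
ALTER SYSTEM SET db_writer_processes = 4 SCOPE = SPFILE;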
93
Summary In this lesson, you should have learned how to:
Set initialization parameters related to parallel operations tuning Query dynamic performance views related to parallel operations tuning Determine your memory requirement needs Tune PDML operations