OPS-8: Effective OpenEdge® Database Configuration

OPS-8: Effective OpenEdge® Database Configuration
Richard Shulman, Principal Support Engineer

Agenda: Performance, The Physical Layout, Other Considerations

Performance
What performance gains could I expect:
By just moving to OpenEdge
By going to Type I Storage Areas
By going to Type II Storage Areas
NOTE: YMMV (Your Mileage May Vary)
Just by moving to OpenEdge there are many improvements to the utilities, to the potential size of the database, and to many aspects of client performance. Type I Storage Areas let you align an area's records-per-block setting with its record sizes so that space within each database block is used efficiently. Type II Storage Areas additionally organize blocks of similar data together into clusters, improving disk throughput for large sequential read and write operations. As the slide says, YMMV.

Performance
Real Customer Experience: Manufacturing Industry
The challenge was long-running major processes:
The Customer Statement process (interactive) was taking over 25 minutes per run
The nightly MRP run was taking over 4 hours

Real World Results (run times in minutes)
In this example the customer migrated their old database to a simple database with everything in the Schema Area, yielding just over 25 minutes for the Customer Statement run. Moving the data to a separate Type I area dropped the time to about 15 minutes. Upgrading to OpenEdge dropped it to about 11 minutes. Re-architecting the database with Type II areas dropped it to about 4 minutes.

Real World Results (run times in minutes)
In this example the customer migrated their old database to a simple database with everything in the Schema Area, yielding just over 4 hours for the nightly MRP run. Moving the data to a separate Type I area dropped the time to about 3 hours. Upgrading to OpenEdge dropped it to about 140 minutes (2 hours 20 minutes). Re-architecting the database with Type II areas dropped it to about 80 minutes.

Real World Results
Why such a big difference? Is this typical of what I can expect? How fast can I get there?
Regarding the times discussed on the previous slides: the Version 9 baseline was a dump and load into a v9 Schema Area with the standard records per block, with all data initially contained in the Schema Area. The move from the Schema Area to a separate Type I area reduced the times partly by allowing a different records-per-block definition. The shift from v9 to OpenEdge brought initial improvements in some of the engine's algorithms. The shift from Type I to Type II added clusters and improved data access through contiguity of blocks.
Is this typical? The percentage improvement may not always be this large, but improvement is expected from each of these steps. The greatest improvements typically require a dump and load into new areas; that is the limiter on how fast you can get there (a sketch of the dump-and-load commands follows).
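As a rough illustration only (the database name, table name, and directory are placeholders, and the exact proutil syntax should be verified against your release), a binary dump and reload of one table into its new area might look like this:

    proutil proddb -C dump customer /dumpdir
    prostrct add proddb add_areas.st
    proutil proddb -C load /dumpdir/customer.bd
    proutil proddb -C idxbuild table customer

The table must already be assigned to its new area in the schema before the binary load, and prostrct add requires the database to be offline. For smaller tables, proutil tablemove can relocate a table and its indexes without a dump and load.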

Agenda: Performance, The Physical Layout, Other Considerations

Storage Areas?
No adverse application effects:
A physical reorganization does NOT change the application
Object location is abstracted from the language by an internal mapping layer
Different physical deployments can run with the same compiled r-code

Determining the Layout
What process is used to determine what the new layout should be?
Run a database analysis on the production database or on a recent, full copy of it. The analysis is only meaningful against the full set of data, since it is used to determine the number of records, average record size, scatter factor, and so on. Alternatively, a sample piece of code has been written to facilitate some of the calculations; the sample is not bulletproof but should work with most databases on most platforms. See the PSDN Website: http://www.psdn.com/library/entry.jspa?externalID=3637&categoryID=1800 (an example dbanalys invocation follows).
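As a minimal sketch (the database name and output file are placeholders), the analysis report itself comes from the proutil dbanalys qualifier, run against the full database or a restored copy:

    proutil proddb -C dbanalys > proddb_dbanalys.out

The output includes the record block summary that the following steps work from.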

Determining the Layout
Every layout could be different.
Not every company uses the same application in the same way or with the same options, so the analysis from one customer may give different results than the analysis from another. Customers often adapt functionality that does not exactly meet their needs, and the volume of data in some fields may be vastly different from what the developer planned for. Variations in tables, fields, and usage therefore yield different record sizes, different uses of areas, different records-per-block settings, and so on.

Determining the Layout
Things to consider:
Is your application from a Partner, or do you maintain it yourself?
If it is from a Partner: have you asked for their recommendations, and would they support you if you changed the layout?
Do you have the capacity to reorganize your database? (You do have the capability.)
Remember: if the application partner has any special code that looks at the areas, or produces definition-file changes that depend on area information, you may need to work with that partner to accommodate your changes.

Determining the Layout
To do a layout:
Step 1: Run a dbanalys report against the full database (or against a full copy of it if you have one).

Determining the Layout
Step 1: The beginning
Run a dbanalys report against the full database (or against a full copy of it).
What we look for:
Large tables by record count
Large tables by raw storage size
Unused tables (no records)
What is considered large? Sort the analysis by record count; an Excel spreadsheet works well for this. "Large" is relative and subject to much discussion, but if a table contains either 5% of the total records in your database or 5% of its total size, it is probably large in anyone's book (a command-line alternative to the spreadsheet is sketched below).
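As a rough, unofficial shortcut to the spreadsheet (it assumes the table lines of the dbanalys output begin with PUB. and that the record count is the second column, which should be verified against your own report), the largest tables by record count can be listed with:

    grep "^PUB\." proddb_dbanalys.out | sort -k2,2nr | head -20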

Determining the Layout
Step 2: Initial separation
For each of the large tables, make a separate data area for the table and a separate index area for its indices. This adds two new storage areas FOR EACH large table.
For the tables with no records, make one small storage area for the tables and a separate storage area for their indexes. This adds two new storage areas in total.
In other words: define a separate area for each of the tables with the largest record counts, and a separate index area for each such table (one area for all of that table's indices). Put all tables that have no records into one combined area, with a separate index area; though these tables hold no data now, they may be used later, and this makes it easier to keep an eye on them. (A structure-file sketch follows.)
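A minimal structure-file (.st) sketch of this step for one large table plus the empty-table areas, assuming an 8K-block database; area names, numbers, paths, and sizes are purely illustrative, and the records-per-block and cluster values are revisited in the later steps:

    d "Order_Data":10,64;512 /db/areas/order_data.d1 f 1024000
    d "Order_Data":10,64;512 /db/areas/order_data.d2
    d "Order_Idx":11,1;8 /db/areas/order_idx.d1
    d "Empty_Tables":12,32;8 /db/areas/empty_tables.d1
    d "Empty_Idx":13,1;8 /db/areas/empty_idx.d1

Each line defines an area as name:area-number,records-per-block;blocks-per-cluster, followed by an extent path and, optionally, a fixed extent size in KB; omitting the ;blocks-per-cluster value would create a Type I rather than a Type II area.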

Determining the Layout
Step 3: "Special" tables
Every application has some of these:
Control tables (e.g. country codes)
Tables with high create/destroy activity (e.g. a batch report queue)
Make a separate storage area for all control tables and a separate area for their indexes.
Make a separate storage area for all high create/destroy tables and a separate area for their indexes.
For the control tables, define one area where they will all be housed; it should not need to be large, because control tables are typically small and usually static. (See the structure-file sketch below.)
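Continuing the illustrative .st sketch (names, numbers, and sizes are again placeholders; the larger cluster on the high-churn area anticipates the cluster discussion later in this section):

    d "Control_Data":14,32;8 /db/areas/control_data.d1 f 102400
    d "Control_Idx":15,1;8 /db/areas/control_idx.d1
    d "Churn_Data":16,128;512 /db/areas/churn_data.d1 f 512000
    d "Churn_Idx":17,1;64 /db/areas/churn_idx.d1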

Determining the Layout
Step 4: The rest
Group the remaining tables by mean record size into 32, 64, 128 and 256 records-per-block groups.
Make a separate storage area for each grouping and a separate area for their indexes. (A sketch of the resulting areas follows.)
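The grouped areas from this step might look like the following illustrative .st lines (again assuming an 8K block size; names, numbers, and paths are placeholders), which can then be added to the database with prostrct while it is offline:

    d "Data_032":20,32;64 /db/areas/data_032.d1
    d "Data_064":21,64;64 /db/areas/data_064.d1
    d "Data_128":22,128;64 /db/areas/data_128.d1
    d "Data_256":23,256;64 /db/areas/data_256.d1
    d "Index_Misc":24,1;8 /db/areas/index_misc.d1

    prostrct add proddb add_areas.st

Since no fixed size is given, each extent above is created as a variable extent.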

Determining the Layout
How to select the records-per-block and cluster settings?

Determining the Layout
How to select the records per block, and why do I care?
Incorrect settings waste recid pointers and can cause internal block fragmentation (less critical in 10.1B and later because of the introduction of 64-bit recids).
You have approximately 8000 usable bytes in an 8K database block and 3900 bytes in a 4K block.
In all versions of Progress / OpenEdge prior to 10.1B, a 31-bit limit exists on the number of records that may be stored in one area; the limit is 31 bits (about 2 billion) because one bit is used by Progress to quickly signify whether a record has been deleted. 10.1B and later raise the limit to 63 bits (one bit is still used to mark deleted records).

Determining the Layout
RECORD BLOCK SUMMARY FOR AREA "Employee" : 7

    Table             Records   Size     ---Record Size (B)---   --Fragments--   Scatter
                                           Min     Max    Mean    Count  Factor    Factor
    PUB.Benefits           21   848.0B      39      41      40       21     1.0       1.0
    PUB.Department          7   211.0B      26      35      30        7     1.0       2.7
    PUB.Employee           55     6.2K      99     135     115       55     1.0       0.8
    PUB.Family             72     3.1K      38      51      44       72     1.0       1.1

So, in a 4K database block we could fit about 100 Benefits records, and in an 8K block about 200. However, neither of these is an allowed records-per-block value; WHAT DO WE DO?

Determining the Layout
Do we choose based on performance or storage requirements?
Choose the higher rpb setting for better performance.
Choose the lower rpb setting for better storage utilization. (A worked example follows.)
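A worked example using the Benefits table above (usable block sizes are approximate and the rounding mirrors the slide):

    mean record size       = 40 bytes
    4K block:  3900 / 40   = about  97 records physically fit
    8K block:  8000 / 40   = about 200 records physically fit

Records per block must be a power of two between 1 and 256, so for an 8K block the choice is between 256 (the next allowed value above 200, favouring performance) and 128 (the next value below, favouring storage); for a 4K block the choice is between 128 and 64.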

Determining the Layout
How to select the cluster setting?
Tables: 64 or 512 blocks per cluster
Indexes: 8 or 64 blocks per cluster
Typically we set 64 blocks per cluster for data tables as a moderate value. For systems that have a high record create or delete rate in short bursts, or that run large reports where sequential record reads are common, 512 blocks per cluster may be better. Similar rules apply to Type II areas that contain indices, but using the lower values of 8 and 64 respectively. (See the .st comparison below.)
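In the illustrative .st notation used earlier (names, numbers, and paths are placeholders), the cluster size is the value after the semicolon; the first line is a data area tuned for bulk create/delete or sequential reporting, the second its index area using the larger of the two index values:

    d "Hist_Data":30,64;512 /db/areas/hist_data.d1
    d "Hist_Idx":31,1;64 /db/areas/hist_idx.d1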

Agenda: Performance, The Physical Layout, Other Considerations

Other Considerations
The physical layout:
Combining databases?
RAID considerations
Separating files
Fixed or fixed/floating extents
RAID is almost always preferred to non-RAID. There are many reasons to continue the practice of separating files (if the disk layout permits): to maximize performance, recoverability, or the ability to monitor activity. The traditional recommendation of one variable (floating) extent per area is often unnecessary now that OpenEdge can add extents while the database is online. However, if there is no administrator constantly available to add extents online, it may still be advisable to keep one variable data extent per area, because the database will shut down if it runs out of space. (An extent sketch follows.)
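An illustrative extent layout for one area, with fixed extents plus a trailing variable extent as a safety net (area name, paths, and sizes are placeholders). If your release supports it, additional extents can be added while the database is running with prostrct addonline; otherwise use prostrct add with the database shut down:

    d "Cust_Data":32,64;512 /db/areas/cust_data.d1 f 1024000
    d "Cust_Data":32,64;512 /db/areas/cust_data.d2 f 1024000
    d "Cust_Data":32,64;512 /db/areas/cust_data.d3

    prostrct addonline proddb more_extents.st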

Other Considerations
The database rules of normalization and their impact on performance
Index considerations: many single-component keys, or fewer multi-component keys?
Though the rules of normalization are great for reducing duplication of data, they can make crafting, and potentially running, reports less efficient. As long as queries are written to match the index key order, it is often better to have multi-component keys. If queries are written that do not include the leading fields of the index, then multiple single-key indices may be more appropriate.

Other Considerations
Impact on startup parameters: -B is a number of database blocks
Impact on other environments: not just Production is affected
SQL: verify your SQL width values using dbtool, and don't forget to UPDATE STATISTICS
If the database blocksize is changed during any of these modifications, be aware that this changes the amount of memory the database uses at startup, because the value of -B is expressed in database blocks: increasing the blocksize from 1K to 4K means roughly four times as much buffer-pool memory for the same value of -B. Also, 10.1C changes the default client temp-table blocksize to 4K, which can make a big difference in the disk space and memory used by the client application; in some cases it may be desirable to adjust -Bt to limit the number of temp-table blocks in memory and/or set -tmpbsize back to the older default. To check whether Update Statistics has ever been run on a database, look in a recent database analysis for _Systblstat, _Sysidxstat and _Syscolstat; if these tables do not exist or contain only zero values, update statistics has not been run. Update Statistics should be rerun after any significant change to the data (greater than about 25%). (A sketch of the dbtool and UPDATE STATISTICS steps follows.)
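A minimal sketch of those two SQL housekeeping steps; the database name, port, table name, and credentials are placeholders, and the statement forms should be checked against the OpenEdge SQL documentation for your release:

    dbtool proddb
    (choose the SQL width scan / fix option from the menu)

    sqlexp -db proddb -S 5162 -user sysprogress -password xxxxx
    UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB.Order;
    COMMIT;

Run the UPDATE ... STATISTICS statement for each sizeable table (or script it across the schema) after loads or other major data changes.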

In Summary
Huge performance gains are possible
It can be done in phases
You can do this!
Huge performance gains are possible, but they depend on a number of factors: the size of the database, the types of reads and writes, and the hardware. The migration can be done in stages, giving smaller downtime windows but more of them.

Questions?

Thank You