Some More Database Performance Knobs North American PUG Challenge



Some More Database Performance Knobs North American PUG Challenge Richard Banville Software Fellow OpenEdge Development

Agenda
1. LRU (again)
2. Networking: Message Capacity
3. Networking: Resource Usage
4. Index Rebuild
5. Summary


LRU (again)
[Diagram: LRU chain running from least recent to most recent, holding RM Block T1 and IX Block I1]
- Replacement policy of the database buffer pool
- Maintains the working set of data buffers
- Just a linked list: a shared data structure
- Changes are made orderly by the LRU latch
- Replaces the buffer at the LRU end with a newly read block from disk
Just a linked list meandering through the buffer pool

LRU (again)
Pros: proficient block usage predictor
- Maintains a high buffer pool hit ratio
Cons: housekeeping costs
- Single-threads access to the buffer pool (even if only for an instant)
- High activity, relatively high nap rate
Managing LRU:
- Private read-only buffers: -Bp, -BpMax (not with -lruskips until 10.2b07)
- Alternate buffer pool: -B2
- New: -lruskips, -lru2skips

LRU (again): Find first T1.
[Diagram: block order on the LRU chain, least recent to most recent, as the read proceeds: RM Block T1, RM Block T3, IX Block I3, IX Block I1, then RM Block T3, IX Block I3, IX Block I1, RM Block T1]

LRU (again): Find first T1. (again)
[Diagram: RM Block T3 and IX Block I3 toward the least recent end; IX Block I1 and RM Block T1 toward the most recent end]
What about …
- For each T1: end.
- For each w/many tables.
- For each w/many tables, many users.
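The "Find first T1." walk-through above can be sketched in a few lines. This is only a toy model, not the OpenEdge implementation: the class name `LruChain`, the use of an ordered dictionary in place of the linked list, and the latch counter are all assumptions made for illustration.

```python
from collections import OrderedDict


class LruChain:
    """Toy buffer pool: an ordered dict stands in for the LRU linked list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.chain = OrderedDict()   # first item = LRU end, last item = MRU end
        self.latch_acquisitions = 0  # every chain change takes the simulated LRU latch

    def access(self, block):
        self.latch_acquisitions += 1
        if block in self.chain:
            self.chain.move_to_end(block)   # hit: relink at the MRU end
            return "hit"
        if len(self.chain) >= self.capacity:
            self.chain.popitem(last=False)  # miss: evict from the LRU end
        self.chain[block] = None            # newly read block enters at the MRU end
        return "miss"

    def order(self):
        """Blocks from least recent to most recent."""
        return list(self.chain)


pool = LruChain(capacity=4)
for block in ["T1", "I1", "T3", "I3"]:
    pool.access(block)

# "Find first T1." touches the index block, then the record block.
pool.access("I1")
pool.access("T1")
print(pool.order())             # ['T3', 'I3', 'I1', 'T1']
print(pool.latch_acquisitions)  # 6
```

Note that even pure hits take the latch in this model, which is the housekeeping cost the following slides set out to reduce.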

Location, location, location
With -B 1,000,000:
- What does it take to evict from the buffer pool?
- What does it take to go from MRU to LRU?
- Do we need MRU on EACH access then? I think not.

Improving Concurrency
-lruskips <n>
- LRU and LFU combined
- Small numbers make a BIG difference
- Monitor OS read I/Os and LRU latch contention
- Adjust online via the _Startup._Startup-LRU-Skips VST field
- Adjust online via promon: R&D -> 4. Administrative Functions ... -> 4. Adjust LRU force skips
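A rough way to see why small -lruskips values matter is to count simulated latch acquisitions. The sketch below assumes a per-buffer skip counter that is reset each time the buffer is relinked at the MRU end; that counter placement, the class name `SkippingLruChain`, and the helper `latches_for` are illustrative assumptions, not the product's internals. Eviction is left out to keep the example short.

```python
from collections import OrderedDict


class SkippingLruChain:
    """Toy LRU chain that relinks a buffer only after `lruskips` skipped accesses."""

    def __init__(self, lruskips=0):
        self.lruskips = lruskips
        self.chain = OrderedDict()   # first item = LRU end, last item = MRU end
        self.skipped = {}            # per-buffer accesses since the last relink
        self.latch_acquisitions = 0

    def access(self, block):
        if block not in self.chain:
            self.latch_acquisitions += 1   # reading a new block always changes the chain
            self.chain[block] = None
            self.skipped[block] = 0
            return
        if self.skipped[block] < self.lruskips:
            self.skipped[block] += 1       # skip: no latch, no relink
            return
        self.latch_acquisitions += 1       # skip budget used up: relink at the MRU end
        self.chain.move_to_end(block)
        self.skipped[block] = 0


def latches_for(lruskips, reads=1000):
    """Latch acquisitions needed to read the same block `reads` times."""
    pool = SkippingLruChain(lruskips)
    for _ in range(reads):
        pool.access("T1")
    return pool.latch_acquisitions


print(latches_for(0))    # 1000
print(latches_for(100))  # 10
```

In this model a skip value of 100 cuts latch traffic for a hot block by two orders of magnitude, while the block still returns to the MRU end often enough to stay in the pool.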

Performance – 10.2b06 & -lruskips
[Chart: performance vs. # Users; callout ~39%]
Re-iterate: read performance improvement for high volume/contention no-lock read situation. Performance starts to degrade at about 140 users.

Performance – 10.2b06 & -lruskips (250 users) Note change in LRU latch waits vs buffer latch waits

Performance – 10.2b06 & -lruskips (250 users, big db) Note focus now is on LRU and BHT (not buf)

Performance – 10.2b06 & -lruskips (big db)
[Chart: performance vs. # Users; callouts ~52%, ~15%, ~44%]
Re-iterate: read performance improvement for high volume/contention no-lock read situation. Performance starts to degrade at about 140 users.

Conclusions
- -lruskips can eliminate the LRU bottleneck
- LRU isn't the last bottleneck
- Overall improvement is relative to other contention
- Data access is limited by buffer-level contention
- Table scans over small tables have more buffer contention than those over large tables
- Application changes can improve performance too!


Networking Control
Philosophy: throughput by keeping the server busy without remote client waits!
Process-based control
- -Ma, -Mn, -Mi: control the order users are assigned to servers
- -PendConnTime
Resource-based control
- -Mm <n>: maximum size of a network message (client & server startup)
New tuning knobs: resource-based control
- Alleviate excessive system CPU usage by the network layer
- Control record data stuffed in a network message
- Applicable for "prefetch" queries
Polling is for Unix only; message stuffing is supported on both Windows and Unix.

Networking – Prefetch Query
- No-lock query with guaranteed forward motion or scrolling
- Multiple records stuffed into a single network message
- Browsed static and preselected queries are scrolling by default
- Static queries are associated with class files only

    FOR EACH customer NO-LOCK:
        ….
    end.

    DO PRESELECT EACH customer NO-LOCK:
        ….
    end.

    /* The third query MUST include the SCROLLING option. */
    define query cust-q for customer SCROLLING.
    open query cust-q FOR EACH customer NO-LOCK.
    repeat:
        get next cust-q.
        /* leave the loop once the query runs off the end */
        if not available customer then leave.
    end.

Server Network Message Processing Loop
[Diagram: server network message processing loop]
What's new: -Nmsgwait

Server Network Message Processing Loop
poll() is system CPU intensive:
- 10 milliseconds to poll(0)!
- 10 microseconds to copy 1 record
Since 1 µs = 0.001 ms (1000 µs = 1 ms), one poll(0) costs about as much as copying 1,000 records.

Server Network Message Processing Loop
What's new:
- -Nmsgwait: network message wait time
- -prefetchPriority: give prefetching of records priority over polling for new requests
Potential side effects


Process Waiting Network Message
[Flow diagram, built up one case at a time]
- Non-prefetch query request
- 1st record of a prefetch query request
- Secondary records of a prefetch query request: threshold not met (default threshold is 16 records; 4096 / 16 = 256 bytes)
- Secondary records of a prefetch query request: client waiting, threshold met, send message

Process Waiting Network Message
What's new: increase the network message fill rate
- Improve TCP throughput
- Improve overall server performance
Defaults have not changed
- Provides control for you
- Every deployment is different

Process Waiting Network Message
- Disregard 1st record request check: -prefetchDelay
- Threshold control, # recs vs. % full: -prefetchNumRecs, -prefetchFactor
  - The first limit reached causes the message to be sent
  - 0% disables it
- Potential side effects: improved TCP/system performance; choppy behavior on remote client?
NOTE: -Mm size determines the maximum; with -Mm 4096, 4096 / 16 records = 256 bytes
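The threshold rules above can be read as a small decision function. The sketch below is one plausible reading of the slides, not the server's actual logic: the names `PrefetchSettings` and `should_send`, the default of 0 for the fill percentage, and the exact order of the checks are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PrefetchSettings:
    message_size: int = 4096   # -Mm
    num_recs: int = 16         # -prefetchNumRecs (slide: default threshold is 16 records)
    factor_pct: int = 0        # -prefetchFactor; 0 disables the % full check (assumed default)
    delay_first: bool = False  # -prefetchDelay


def should_send(s, recs_in_msg, bytes_in_msg, first_message):
    """Return True when the partly filled prefetch message should go out now."""
    if recs_in_msg == 0:
        return False
    if first_message and not s.delay_first:
        return True  # the 1st record of a prefetch query is sent without waiting
    if recs_in_msg >= s.num_recs:
        return True  # record-count limit reached first
    if s.factor_pct > 0 and bytes_in_msg * 100 >= s.factor_pct * s.message_size:
        return True  # fill-percentage limit reached first
    return False


defaults = PrefetchSettings()
print(defaults.message_size // defaults.num_recs)  # 256 bytes per record, as on the slide

tuned = PrefetchSettings(num_recs=1000, factor_pct=90, delay_first=True)
print(should_send(tuned, recs_in_msg=1, bytes_in_msg=200, first_message=True))    # False
print(should_send(tuned, recs_in_msg=10, bytes_in_msg=3700, first_message=False)) # True
```

With the tuned values (taken from the promon example on the next slide), the message goes out at roughly 90% of -Mm rather than after a fixed record count, which is how the fill rate goes up.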

Altering Network Message Behavior
Promon support (_Startup VST too!)
Alter online: R&D … 4. Administrative Functions … 7. Server Options …
Server Options:
  1. Server network message wait time: 2 seconds
  2. Delay first prefetch message: Enabled
  3. Prefetch message fill percentage: 90 %
  4. Minimum records in prefetch message: 1000
  5. Suspension queue poll priority: 0
  7. Terminate a server

Performance – 10.2b06 & Networking changes
[Chart: performance vs. # Users; callouts ~212%, ~32%]
Re-iterate: read performance improvement for high volume/contention no-lock read situation. Performance starts to degrade at about 140 users.


Assumptions for best performance
- Index data is segregated from table data: indexes & tables are in different storage areas
- You have enough disk space for sorting
- You understand the impact of CPU and memory consumption
- The process is allowed to use available system resources

Index Rebuild Parameters - Overview
  -TB               sort block size (8K – 64K, note new limit)
  -datascanthreads  # threads for data scan phase
  -TMB              merge block size (default -TB)
  -TF               merge pool fraction of system memory (in %)
  -mergethreads     # threads per concurrent sort group merging
  -threadnum        # concurrent sort group merging
  -TM               # merge buffers to merge each merge pass
  -rusage           report system usage statistics
  -silent           a bit quieter than before

Phases of Index Rebuild ("non-recoverable")
1. Index Scan
   - Scan index data area start to finish
   - I/O bound with little CPU activity
   - Eliminated with area truncate
2. Data Scan / Key Build
   - Scan table data area start to finish (area at a time)
   - Read records, build keys, insert to temp sort buffer
   - Sort full temp file buffer blocks (write if > -TF)
   - I/O bound with CPU activity
3. Sort-Merge
   - Sort-merge -TF and/or temp sort file
   - CPU bound with I/O activity
   - I/O eliminated if -TF large enough
4. Index Key Insertion
   - Read -TF or temp sort file
   - Insert keys into index
   - Formats new clusters; may raise HWM
   - I/O bound with little CPU activity

Phases of Index Rebuild: Index Scan
Scan index data area start to finish; I/O bound with little CPU activity; eliminated with area truncate.
Log message:
  Area 9: Index scan (Type II) complete.
- Index area is scanned start to finish (single-threaded), a block at a time with cluster hops
- Index blocks are put on the free chain for the index
- Index object is not deleted (to fix corrupt cluster or block chains)
- Order of operation: blocks are read from disk, re-formatted in memory, then written to disk as -B is exhausted
- Causes I/O in other phases for block re-format
- Can be eliminated with a manual area truncate where possible
- There is no start message for the index scan phase, only a complete message

Phases of Index Rebuild: Data Scan / Key Build
Scan table data area start to finish (area at a time); read records, build keys, insert to temp sort buffer; sort full temp file buffer blocks (write if > -TF); I/O bound with CPU activity.
Log messages:
  Processing area 8 : (11463)
  Start 4 threads for the area. (14536)
  Area 8: Multi-threaded record scan (Type II) complete.
- Table data area is scanned start to finish (multi-threaded if -datascanthreads)
- Each thread processes the next block in the area (with cluster hops)
- Database is re-opened by each thread in R/O mode
- Ensure file handle ulimits are set high enough

Data Scan/Key Build
1. Thread reads the next data block (RM block) in the data area from the DB
2. Extract the next record from the data block and build the index key (sort order)
3. Insert the key into a sort block (-TB 8K thru 64K)
4. Sort/merge the full sort block into a merge block (-TMB: -TB thru 64K)
5. Write the merge block to -TF, overflowing to temp sort files .srt1, .srt2, … (-TMB sized I/O)

Sort Groups: -SG 3 (note 8 is minimum)
- Each index is assigned to a particular sort group (hashed index #)
- Each group has its own sort file
[Diagram: keys built from a record go to Index 1 and Index 4 in SG 1 (.srt1), Index 2 in SG 2 (.srt2), Index 3 in SG 3 (.srt3)]
Sort file location:
  1) -T /usr1/richb/temp/
  2) <dbname>.srt containing: 0 /usr1/richb/temp/
  3) <dbname>.srt containing: 10240 /usr1/richb/temp/ and 0 /usr1/richb/temp/
  4) <dbname>.srt containing: 0 /usr1/richb/temp/, 0 /usr2/richb/temp/, 0 /usr3/richb/temp/
- 1: sort files in the same directory (I/O contention)
- 4: sort files in different locations
- Ensure enough space

Phases of Index Rebuild: Sort-Merge
Sort-merge -TF and/or temp sort file; CPU bound with I/O activity; I/O eliminated if -TF large enough.
Log messages:
  Sorting index group 3
  Spawning 4 threads for merging of group 3.
  Sorting index group 3 complete.

Sort-Merge Phase
- Sort blocks in each sort group have been sorted and merged into a linked list of individual merge blocks stored in -TF and temp files.
- These merge blocks are further merged -TM at a time to form new, larger "runs" of sorted merge blocks.
- -TM of these new "runs" are then merged to form even larger "runs" of sorted merge blocks.
- When there is only one very large "run" left, all the key entries in the sort group are in sorted order. Sorted!
Note: even though the illustration does not depict merges of -TM "runs", they are always merged -TM at a time. It is just too difficult to depict given the screen real estate.
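The repeated "-TM at a time" merging described above is a standard multi-way merge sort over sorted runs. The following sketch shows the idea on in-memory lists; the function names `merge_pass` and `sort_merge` and the `tm` argument (standing in for -TM) are illustrative, and real index rebuild works on merge blocks in -TF and sort files rather than Python lists.

```python
import heapq


def merge_pass(runs, tm):
    """Merge sorted runs `tm` at a time, producing fewer, larger sorted runs."""
    return [list(heapq.merge(*runs[i:i + tm])) for i in range(0, len(runs), tm)]


def sort_merge(merge_blocks, tm):
    """Repeat merge passes until a single sorted run remains."""
    if tm < 2:
        raise ValueError("tm must be at least 2")
    runs = [sorted(block) for block in merge_blocks]  # each merge block starts out sorted
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, tm)
        passes += 1
    return (runs[0] if runs else []), passes


blocks = [[9, 2], [7, 4], [1, 8], [6, 3], [5, 0]]
keys, passes = sort_merge(blocks, tm=2)
print(keys)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(passes)  # 3

_, fewer_passes = sort_merge(blocks, tm=4)
print(fewer_passes)  # 2
```

A larger merge width needs fewer passes over the data, which is the intuition behind raising -TM in the tuning slide.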

-threadnum vs -mergethreads
[Diagram: three sort groups, each with its own -TF share and sort file (.srt1, .srt2, .srt3). Thread 1 runs the merge phase for group 1 and Thread 2 runs the merge phase for group 2.]

-threadnum vs -mergethreads
B-tree insertion occurs as soon as a sort group's merge is completed.
[Diagram: Thread 0 begins b-tree insertion for group 1 concurrently, while Thread 2 continues the merge phase for group 2 and Thread 1 moves on to the merge phase for group 3.]

-threadnum vs -mergethreads: -threadnum 2 -mergethreads 3
Merge threads merge successive "runs" of merge blocks concurrently.
[Diagram: Thread 1 runs the merge phase for group 1 with Threads 3, 4 and 5; Thread 2 runs the merge phase for group 2 with Threads 6, 7 and 8.]
Note: 8 actively running threads

-threadnum vs -mergethreads: -threadnum 2 -mergethreads 3
[Diagram: Thread 2 continues the merge phase for group 2 with Threads 6, 7 and 8; Thread 1 runs the merge phase for group 3 with Threads 3, 4 and 5.]

-threadnum vs -mergethreads: -threadnum 2 -mergethreads 3
B-tree insertion occurs as soon as a sort group's merge is completed.
[Diagram: Thread 0 begins b-tree insertion concurrently, while Threads 2, 6, 7 and 8 merge group 2 and Threads 1, 3, 4 and 5 merge group 3.]
Note: 9 actively running threads
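The thread counts quoted on these slides follow from simple arithmetic: one thread per concurrently merged sort group, plus the merge threads attached to each of them, plus Thread 0 once b-tree insertion has started. The helper below (a hypothetical name, `active_threads`) just restates that arithmetic as read from the diagrams.

```python
def active_threads(threadnum, mergethreads=0, btree_insert_running=False):
    """Count actively running threads during the sort-merge phase."""
    group_threads = threadnum                  # one per concurrently merged sort group
    merge_helpers = threadnum * mergethreads   # merge threads attached to each group
    insert_thread = 1 if btree_insert_running else 0  # Thread 0 doing b-tree insertion
    return group_threads + merge_helpers + insert_thread


print(active_threads(2, 3))                             # 8
print(active_threads(2, 3, btree_insert_running=True))  # 9
```

This is worth keeping in mind when sizing -threadnum and -mergethreads together, since the product of the two, not either value alone, determines CPU demand.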


Index Key Insertion Phase
Log messages:
  Building index 11 (cust-num) of group 3 …
  Building of indexes in group 3 completed.
  Multi-threaded index sorting and building complete.
- Key entries from sorted merge blocks are inserted into the b-tree
- Performed sequentially, an entry at a time, an index at a time
- Leaf-level insertion optimization (avoids b-tree scan), internally referred to as "cxfast" insertion
- Leaf level is written to disk as soon as it is full (since it is never revisited)
[Diagram: index b-tree with a root and leaf blocks; each leaf is written to the DB when full]

2085 Indexes were rebuilt. (11465)
Index rebuild complete. 0 error(s) encountered.

Index Rebuild - Tuning
- Truncate index-only area if possible
Parameters:
- -mergethreads: 2 or 4, and -threadnum 2 or 1
- -datascanthreads: 1.5 * # CPUs
- -B 1024
- -TF 80 (monitor physical memory paging)
- -TMB 64 -TB 64 -TM 32
- -rusage & -silent
.srt file parameters:
- -T: separate disk, RAM disk if not using -TF (no change)
Double scan (how to avoid): removed
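As a rough illustration, the tuning values above can be assembled into a single index rebuild command line. The sketch below only builds the argument list and does not run anything. The database name `mydb` and the sort directory are hypothetical, the `proutil <db> -C idxbuild all` form is assumed to be the entry point, and the thread values pick one of the combinations the slide suggests; check all of it against your own release before use.

```python
import os


def idxbuild_command(db, sort_dir, cpus=None, mergethreads=2, threadnum=2):
    """Build an argument list for an index rebuild using the slide's suggested values."""
    cpus = cpus or os.cpu_count() or 1
    return [
        "proutil", db, "-C", "idxbuild", "all",
        "-B", "1024",
        "-TF", "80",                                # monitor physical memory paging
        "-TMB", "64", "-TB", "64", "-TM", "32",
        "-mergethreads", str(mergethreads),
        "-threadnum", str(threadnum),
        "-datascanthreads", str(int(cpus * 1.5)),   # 1.5 * # CPUs
        "-T", sort_dir,                             # separate disk for sort files
        "-rusage", "-silent",
    ]


# Hypothetical database name and sort directory, for illustration only.
cmd = idxbuild_command("mydb", "/sortdisk/temp", cpus=8)
print(" ".join(cmd))
```

Keeping the values in one function makes it easy to vary a single knob between test runs and compare the -rusage output.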

Performance Numbers
Elapsed time: 12 ½ hours → 2 ½ hours. 5X improvement!
[Chart: cost of each phase (in secs)]


Summary
LRU
- Potential for a big win
- Always room for improvement: us and you!
Networking
- You now have more control
- With power comes responsibility
Index Rebuild
- Big improvements if your database is set up properly and you provide system resources to index rebuild
- Hopefully you'll never need it

Questions?

Performance – 10.2b06 & -lruskips (250 users) Latches/sec

Performance – 10.2b06 & -lruskips (250 users, big db) Latches/sec

Networking – Promon Support (VSTs too!)
Activity: Servers (Total)

                    No Prefetch   Prefetch,          Prefetch,      Prefetch,
                                  default settings   90% capacity   90% cap. & delay
Messages received   1183          98                 81             80
Messages sent       1181          96                 79             78
Bytes received      89960         7500               6208           6132
Bytes sent          561065        490709             489568         489495
Records received    0             0                  0              0
Records sent        1180          1180               1180           1180
Queries received    1181          96                 79             78
Time slices         0             1149               1148           1148
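One way to read these promon totals is as records per message and as the drop in message count relative to the no-prefetch run. The short script below does that arithmetic on the figures above; the names `RUNS`, `records_per_message`, and `message_reduction_pct` are only for illustration.

```python
# (messages sent, records sent) copied from the promon "Activity: Servers" totals above.
RUNS = {
    "No prefetch": (1181, 1180),
    "Prefetch, default settings": (96, 1180),
    "Prefetch, 90% capacity": (79, 1180),
    "Prefetch, 90% cap. & delay": (78, 1180),
}


def records_per_message(messages_sent, records_sent):
    """Average number of records carried by each message sent."""
    return records_sent / messages_sent


def message_reduction_pct(baseline_messages, messages):
    """Percentage of messages saved relative to the baseline run."""
    return 100.0 * (1 - messages / baseline_messages)


baseline = RUNS["No prefetch"][0]
for name, (msgs, recs) in RUNS.items():
    print(f"{name:28s} {records_per_message(msgs, recs):6.2f} recs/msg "
          f"{message_reduction_pct(baseline, msgs):5.1f}% fewer messages")
```

The same 1180 records travel in roughly a twelfth of the messages with default prefetch, and slightly fewer still once the fill percentage and first-message delay are enabled.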