Walking Through A Database Health Check

QAD Midwest UG – Grand Rapids, MI
October 9th, 2017
Mike Furgal, Director – DB and Pro2 Services, Progress

Introduction

Progress Services
- QAD Global Alliance Partner
- 50+ QAD Specialists
- Managed Database Services
- Much, much more

Mike Furgal
- Progress employee since 1989
- Architect of the OpenEdge database
- Director of DB Services

How Healthy Is Your QAD Environment?
- Best Practices
- Performance Metrics
- Reading Data
- Updating Data
- Memory Contention
- CPU Usage
- Disaster Recovery Plan

Best Practices
- Reviewing Log Files
- Truncating Log Files
- Database Structure
- Base Configuration Parameters

Log Files
- Log files should not be larger than 50 MB. Archive and truncate monthly.
- Common errors:
  - (5635) SYSTEM ERROR: -s exceeded
  - (-----) TCP/IP write error occurred with errno 32
  - (-----) vv_flush:I/O error 5 on fd 1
  - (49) SYSTEM ERROR: Memory violation.
  - (6072) SYSTEM ERROR: error writing, file (DBI File)
- You should monitor the log file daily or weekly.
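A regular check of the log file could be scripted along these lines. This is a minimal sketch, not part of the original deck: the function names are hypothetical, the sample lines are built from the slide's message text rather than copied from a real log, and the only inputs taken from the slide are the 50 MB guideline and the listed errors.

```python
import re
from collections import Counter

# Message numbers taken from the slide's list of common errors.
WATCHED_NUMBERS = {"5635", "49", "6072"}
# Text fragments for the slide's errors that are shown without a message number.
WATCHED_TEXT = ("TCP/IP write error", "vv_flush")
# The 50 MB size guideline from the slide.
MAX_LOG_BYTES = 50 * 1024 * 1024

MESSAGE_NUMBER = re.compile(r"\((\d+)\)")


def scan_log(lines):
    """Count occurrences of the watched errors in an iterable of log lines."""
    hits = Counter()
    for line in lines:
        for number in MESSAGE_NUMBER.findall(line):
            if number in WATCHED_NUMBERS:
                hits[number] += 1
        for text in WATCHED_TEXT:
            if text in line:
                hits[text] += 1
    return hits


def log_too_large(size_bytes):
    """True when the log file exceeds the 50 MB guideline."""
    return size_bytes > MAX_LOG_BYTES


# Illustrative lines only, assembled from the slide's wording.
sample = [
    "(5635) SYSTEM ERROR: -s exceeded",
    "(-----) TCP/IP write error occurred with errno 32",
    "(0000) some unrelated message",
]
hits = scan_log(sample)
print(dict(hits))
```

In practice the lines would come from the database `.lg` file and the size from `os.path.getsize`.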

Database Structure Guidelines
- 5 to 10 extents per Storage Area
- 15%+ of free space to grow within each Storage Area
- Each Storage Area needs a variable overflow extent, just in case
- No user data (tables or indexes) should be in the Schema Area
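The guidelines above can be turned into a simple checklist function. This is a sketch under stated assumptions: the function, its parameters, and the example areas are hypothetical, and the caller is assumed to have already gathered the extent count and free-space percentage for each area by other means.

```python
def check_area(name, extents, free_pct, has_variable_extent, holds_user_data=False):
    """Return a list of guideline warnings for one storage area.

    All inputs are assumed to be collected elsewhere; nothing here
    talks to a database.
    """
    warnings = []
    if not 5 <= extents <= 10:
        warnings.append(f"{name}: {extents} extents (guideline is 5 to 10)")
    if free_pct < 15:
        warnings.append(f"{name}: only {free_pct}% free (guideline is 15% or more)")
    if not has_variable_extent:
        warnings.append(f"{name}: no variable overflow extent")
    if name == "Schema Area" and holds_user_data:
        warnings.append(f"{name}: contains user tables or indexes")
    return warnings


# Hypothetical areas, for illustration only.
ok_area = check_area("Data Area A", extents=6, free_pct=22, has_variable_extent=True)
bad_area = check_area("Data Area B", extents=2, free_pct=8, has_variable_extent=False)
print(ok_area)
print(bad_area)
```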

After Imaging Enabled
- After Imaging is a requirement for all production systems
- After Imaging provides Point in Time recovery
  - Restored Backup + Applied AI files = Point in Time Recovery
- Easy to turn on
- Easy to maintain
- Backup and AI retention requires some decisions

Background Processes
- After Image Writer (AIW): you need one
- Before Image Writer (BIW)
- Asynchronous Page Writer (APW): one should be sufficient
- Watchdog (WDOG)

Miscellaneous
- Database Blocksize: 8K is best
- After Image and Before Image Blocksize: 16K is best
- Large Files Enabled
- After Image memory and Before Image memory should match
  - AI memory should NOT be 1.5 times BI memory. That is old information from Progress version 6.2.

Best Practices Summary

Columns: Database, Size (MB), DB Blocksize, Large Files, BI Blocksize, AI Enabled, AI Blocksize, AI Buffers, BI Buffers, Errors in Log, Data in Schema Area

mfgprd  242,826  8192  Yes  16384  100  200
tmsprd  114,442  50
qsxprd  8,534  No
cpdprd  6,534
admprd  4,825
audprd  3,488
hlpprd  181  20
qxoprd  16
qxeprd  3

Performance Check: CRUD vs DB Reads (Buffer Hit Ratio)
- 12,250 / 453 = 38
- 22 hour sample = too long

Performance Check: Buffer Hit Ratio, 10 Minute Sample
- 268,362 / 3,462 = 78
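The ratio for the 10 minute sample can be reproduced with a few lines of arithmetic. This is a sketch only: the slide does not label the two figures, so the code assumes the first is the count of record (CRUD) operations and the second the count of database reads over the same window. The function name is hypothetical.

```python
def reads_ratio(record_operations, db_reads):
    """Ratio of record operations to database reads for one sample window."""
    if db_reads == 0:
        raise ValueError("no database reads in this sample")
    return record_operations / db_reads


# Figures from the 10 minute sample on the slide.
ratio = reads_ratio(268_362, 3_462)
print(round(ratio))  # 78
```

When sampling from running counters, the inputs would be the differences between two snapshots taken 10 minutes apart rather than totals since startup, which is why the 22 hour sample is described as too long.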

[Chart: Buffer Hit Ratio, 10 minute samples]

[Chart: Database Reads Per Second, 10 minute samples]

What Affects the Buffer Hit Ratio
- Memory Usage and Allocation
- Database Scatter or Fragmentation
- Database Queries

Memory Usage
- Need to allocate enough memory to hold the "working set" of the database in memory
  - Difficult to determine exactly
- Allocating 10% of database size to memory is typically a good starting point
- Requires running 64-bit OpenEdge
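As a worked example of the 10% starting point, the sketch below converts a database size into a buffer count. It assumes the memory is allocated as database buffers counted in database blocks (the `-B` startup parameter, which the slide does not name), and it uses the mfgprd size and 8K blocksize from the Best Practices Summary slide. The function name is hypothetical.

```python
def starting_buffers(db_size_mb, block_size_bytes, fraction_percent=10):
    """Number of database blocks covering roughly fraction_percent of the database."""
    total_blocks = db_size_mb * 1024 * 1024 // block_size_bytes
    return total_blocks * fraction_percent // 100


# mfgprd: 242,826 MB with an 8192 byte blocksize (figures from the summary slide).
buffers = starting_buffers(242_826, 8192)
print(buffers)  # 3108172 blocks, roughly 23.7 GB of buffer memory
```

A buffer pool of that size is the reason the slide calls for 64-bit OpenEdge.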

Database Scatter
Comes in 3 forms:
- Record Fragmentation
- Physical Scatter
- Logical Scatter
[Diagram: a single record split into fragments 1, 2 and 3 across DB Block 1, DB Block 2 and DB Block 3]

Database Scatter Physical Scatter

Database Scatter: Logical Scatter
How ordered are the records by the most used index?
[Figure: a grid of record numbers between 1 and 480 laid out in no particular order, illustrating records stored out of index order]

Database Scatter

Columns: Area (# Tables), Table, Records, Size, Frag Factor, % Fragmented, Scatter

FIN (381)  PUB.fcInstance  35,879,568  101.7G  1.3  29.30  1.0
Audit_Data (7)  _aud-audit-data  109,006,237  21.8G  0.00
PUB.DocumentStorage  3,252,064  12.2G  1.2  15.50
MFG (867)  PUB.uusg_det  50,394,220  9.2G
PUB.PostingLine  22,331,654  4.1G
PUB.spt_det  35,894,620  2.9G
PUB.fcDaemonQueue  24,217,803  2.5G
PUB.Posting  9,806,709  2.4G
PUB.tr_hist  8,480,499  2.2G  0.40
PUB.fcSession  17,243,362  1.4G

CRUD
To debug this issue, the -tablerangesize and -indexrangesize startup parameters need to be set properly so that CRUD statistics can be collected for all tables and indexes through the _tablestat and _indexstat Virtual System Tables.

Index Utilization

Columns: Table.Index, Fields, Levels, Blocks, Size, Utilization

PUB.uusg_det.uusg_prod_date  4  428,908  3.0G  92
._Audit-time  1  3  123,369  951.9M  99
PUB.uusg_det.uusg_date  124,392  852.4M  88
PUB.spt_det.spt_sim_elem  5  175,340  777.5M  57
PUB.uusg_det.uusg_sid_user  147,398  770.2M  67
PUB.spt_det.spt_sim_part  159,144  675.9M  55
PUB.uusg_det.oid_uusg_det  79,136  533.8M  87
PUB.spt_det.oid_spt_det  52,575  403.7M
PUB.fcInstance.prim  2  58,706  358.5M  78
PUB.uusg_det.uusg_user_date  77,952  357.5M  59

Index spt_det.spt_sim_elem is 777 MB, but uses 175,340 x 8K blocks, or 1.40 GB of space on disk and, more importantly, in memory.
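The space arithmetic for that index can be checked directly. This sketch assumes the reported 777.5M is measured in units of 1024 x 1024 bytes and that utilization is simply index data divided by allocated block space; both are assumptions, although the result lines up with the 57 shown in the table.

```python
BLOCK_SIZE = 8192          # 8K database blocksize
blocks = 175_340           # blocks used by spt_det.spt_sim_elem
index_size_mib = 777.5     # reported index size, unit assumed

allocated_bytes = blocks * BLOCK_SIZE
allocated_mib = allocated_bytes / 2**20
utilization_pct = index_size_mib / allocated_mib * 100

print(allocated_bytes)          # 1436385280 bytes, about 1.4 GB
print(round(utilization_pct))   # 57
```

Every one of those blocks occupies a full buffer when it is read, so a poorly utilized index costs memory as well as disk.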

This was all about reading data. Writing data is equally important and will be covered next.

Database Updates
- Speed of the disk
- Waits on IO
- Checkpoints

Speed of the disk
Do this test at home:
# prodb x demo
# proutil x -C truncate bi -bi 16384
# time proutil x -C bigrow 2 -zextendSyncIO
- Do this with both a variable extent (as described) and a fixed extent
- Run multiple times to remove outliers (truncate in between runs)
- If the time to "bigrow" > 10 seconds, you have an IO problem
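Once several bigrow timings have been collected, they can be judged against the 10 second threshold. This is a minimal sketch: using the median to damp outliers is a choice made here, not something the slide prescribes, and the function name and sample timings are hypothetical.

```python
from statistics import median

BIGROW_LIMIT_SECONDS = 10.0  # threshold from the slide


def io_problem(run_seconds):
    """True when the median bigrow time exceeds the 10 second threshold."""
    if not run_seconds:
        raise ValueError("at least one timing is required")
    return median(run_seconds) > BIGROW_LIMIT_SECONDS


# Hypothetical timings in seconds from repeated runs.
fast_disk = io_problem([4.2, 3.9, 30.0])    # one outlier, median 4.2
slow_disk = io_problem([12.5, 11.0, 14.1])  # consistently slow
print(fast_disk, slow_disk)
```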

Waits on IO
Promon – R&D – 2 – 5 BI Log

Wait on IO
- This is time spent waiting for Before Image IO to happen
- Increase -bibufs to make this go away
- After Imaging has the same waits
- Make sure -bibufs and -aibufs match
- The memory is measured in kilobytes, so be generous

Checkpoints
- Checkpointing is the periodic synchronization of the data in memory with the data on disk
- It allows both smooth operation and predictable startup times
- Checkpoint too frequently and it's like a governor on the engine
- Checkpoint too infrequently and recovery times may be long
- Checkpoints between 1 and 5 minutes apart are desired

Promon – R&D – 3 – 4 Checkpoints
The Freq column tells you how often you are checkpointing. The value is in seconds.
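Values read from the Freq column can be sorted against the 1 to 5 minute target with a small helper. This is a sketch only; the function name and the sample values are hypothetical, and the only figures taken from the deck are the 60 and 300 second bounds.

```python
def classify_checkpoint(freq_seconds):
    """Classify one checkpoint interval against the 1 to 5 minute target."""
    if freq_seconds < 60:
        return "too frequent"
    if freq_seconds > 300:
        return "too infrequent"
    return "ok"


# Hypothetical Freq values in seconds.
samples = [12, 95, 240, 900]
results = [classify_checkpoint(s) for s in samples]
print(results)
```

Intervals that are mostly "too frequent" point toward a larger Before Image cluster size, and intervals that are mostly "too infrequent" toward a smaller one.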

Checkpoints
- You can adjust the frequency by adjusting the Before Image Cluster Size
- Common values are 8 MB to 32 MB. The default is 512 KB, which is too small.

Shared memory and CPU
Promon – R&D – 3 – 1 – Performance Indicators

Latch Timeouts
- Goal is to be less than 10 per second
- Adjust the -spin setting
- If you have more than 20% idle CPU time, increase -spin to use it
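The per-second rate comes from two readings of the latch timeout counter taken a known time apart. The sketch below assumes cumulative counters and combines the two rules on the slide into one check, which is an interpretation rather than something the deck states. The function names and sample numbers are hypothetical.

```python
def latch_timeouts_per_second(count_start, count_end, elapsed_seconds):
    """Average latch timeouts per second between two counter readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) / elapsed_seconds


def spin_increase_suggested(rate, idle_cpu_pct):
    """True when the rate misses the goal and there is idle CPU to spend."""
    return rate >= 10 and idle_cpu_pct > 20


# Hypothetical counter readings taken 10 minutes (600 seconds) apart.
rate = latch_timeouts_per_second(120_000, 129_000, 600)
print(rate)                               # 15.0
print(spin_increase_suggested(rate, 35))  # True
```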

There is much more to cover in latching
- Object Manager: set it to 10,000 (-omsize)
- LRU Chain Skips: set it to 100 (-lruskips)
- Database Hot Spots: monitor to look for Alternate Buffer Pool opportunities

Disaster Recovery Plan Terms
- Recovery Time Objective (RTO): how long it takes to recover from a disaster
- Recovery Point Objective (RPO): how much data you are willing to lose
- Are daily backups enough?

- Level 4: Disaster Recovery with OpenEdge Replication
- Level 3: Offsite Storage of Backups and AI
- Level 2: After Imaging
- Level 1: Daily Backups

Summary
- This provides you a high-level overview of the Database Health Check
- What was discussed here is typical of what we see in most QAD environments
- As always, Your Mileage May Vary
