Chapter 4 Parallel Sort and GroupBy 4.1Sorting, Duplicate Removal and Aggregate 4.2Serial External Sorting Method 4.3Algorithms for Parallel External Sort.

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Fall 2008Parallel Sorting1. Fall 2008Parallel Sorting2 Sort  Sorting operation is frequently used for database processing.  For example sorting may.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Chapter 2 Analytical Models 2.1Cost Models 2.2Cost Notations 2.3Skew Model 2.4Basic Operations in Parallel Databases 2.5Summary 2.6Bibliographical Notes.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 11 External Sorting.
Chapter 16 Parallel Data Mining 16.1From DB to DW to DM 16.2Data Mining: A Brief Overview 16.3Parallel Association Rules 16.4Parallel Sequential Patterns.
1 Chapter 10 Query Processing: The Basics. 2 External Sorting Sorting is used in implementing many relational operations Problem: –Relations are typically.
FALL 2004CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Chapter 11 Grid Concurrency Control 11.1 A Grid Database Environment 11.2 An Example 11.3 Grid Concurrency Control (GCC) 11.4 Correctness of GCC 11.5 Features.
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
Chapter 1 Introduction 1.1A Brief Overview - Parallel Databases and Grid Databases 1.2Parallel Query Processing: Motivations 1.3Parallel Query Processing:
ACS-4902 Ron McFadyen Chapter 15 Algorithms for Query Processing and Optimization.
Chapter 3 Parallel Search 3.1Search Queries 3.2Data Partitioning 3.3Search Algorithms 3.4Summary 3.5Bibliographical Notes 3.6Exercises.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
Chapter 7 Parallel Indexing 7.1Introduction 7.2Parallel Indexing Structures 7.3Index Maintenance 7.4Index Storage Analysis 7.5Parallel Search Query Algorithms.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Chapter 5 Parallel Join 5.1Join Operations 5.2Serial Join Algorithms 5.3Parallel Join Algorithms 5.4Cost Models 5.5Parallel Join Optimization 5.6Summary.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
External Sorting Problem: Sorting data sets too large to fit into main memory. –Assume data are stored on disk drive. To sort, portions of the data must.
Database Management 9. course. Execution of queries.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software architectures of query engines.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.
Sorting.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 13.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 External Sorting Chapter 13.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 12: Query Processing.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
1 External Sorting. 2 Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing gpa order.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
CS4432: Database Systems II Query Processing- Part 3 1.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 13 – Query Evaluation.
Lecture 24 Query Execution Monday, November 28, 2005.
Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
Query Processing CS 405G Introduction to Database Systems.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 External Sorting Chapters 13: 13.1—13.5.
CS 540 Database Management Systems
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
External Sorting. Why Sort? A classic problem in computer science! Data requested in sorted order –e.g., find students in increasing gpa order Sorting.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2007.
Chapter 10 The Basics of Query Processing. Copyright © 2005 Pearson Addison-Wesley. All rights reserved External Sorting Sorting is used in implementing.
CS 440 Database Management Systems
Selected Topics: External Sorting, Join Algorithms, …
Lecture 2- Query Processing (continued)
Chapter 12 Query Processing (1)
Evaluation of Relational Operations: Other Techniques
CENG 351 Data Management and File Structures
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Lecture 11: B+ Trees and Query Execution
External Sorting Dina Said
Presentation transcript:

Chapter 4 Parallel Sort and GroupBy 4.1Sorting, Duplicate Removal and Aggregate 4.2Serial External Sorting Method 4.3Algorithms for Parallel External Sort 4.4Parallel Algorithms for GroupBy Queries 4.5Cost Models for Parallel Sort 4.6Cost Models for Parallel GroupBy 4.7Summary 4.8Bibliographical Notes 4.9Exercises

4.1. Sorting, Duplicate Removal and Aggregate Sorting is expressed by the ORDER BY clause in SQL Duplicate remove is identified by the keyword DISTINCT in SQL Basic aggregate queries: Scalar aggregates - produce a single value for a given table Aggregate functions - produce a set of values D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

GroupBy Groups by specific attribute(s) and performs an aggregate function for each group 4.1. Sorting, Duplicate Removal and Aggregate (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.2. Serial External Sorting External sorting assumes that the data does not fit into main memory Most common external sorting is sort-merge Break the file up into unsorted subfiles, sort the subfiles, and then merge the subfiles into larger and larger sorted subfiles until the entire file is sorted D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Example File size to be sorted = 108 pages, number of buffer = 5 pages Number of subfiles = 108/5 = 22 subfiles (the last subfile is only 3 pages long). Read, sort and write each subfile Pass 0 (merging phase), we use B-1 buffers (4 buffers) for input and 1 buffer for output Pass 1: read 4 sorted subfiles and perform 4-way merging (apply a need k-way algorithm). Repeat the 4-way merging until all subfiles are processed. Result = 6 subfiles with 20 pages each (except the last one which has 8 pages) Pass 2: Repeat 4-way merging of the 6 subfiles like pass 1 above. Result = 2 subfiles Pass 3: Merge the last 2 subfiles Summary: 108 pages and 5 buffer pages require 4 passes 4.2. Serial External Sorting (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Example Buffer size plays an important role in external sort 4.2. Serial External Sorting (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.3. Parallel External Sort Parallel Merge-All Sort Parallel Binary-Merge Sort Parallel Redistribution Binary-Merge Sort Parallel Redistribution Merge-All Sort Parallel Partitioned Sort D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Merge-All Sort A traditional approach Two phases: local sort and final merge Load balanced in local sort Problems with merging: Heavy load on one processor Network contention 4.3. Parallel External Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Binary-Merge Sort Local sort similar to traditional method Merging in pairs only Merging work is now spread to pipeline of processors, but merging is still heavy 4.3. Parallel External Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Binary-Merge Sort Binary merging vs. k-way merging In k-way merging, the searching for the smallest value among k partitions is done at the same time In binary merging, it is pairwise, but can be time consuming if the list is long System requirements: k-way merging requires k files open simultaneously, but the pipeline process in binary merging requires extra overheads 4.3. Parallel External Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Binary-Merge Sort Parallelism at all levels in the pipeline hierarchy Step 1: local sort Step 2: redistribute the results of local sort Step 3: merge using the same pool of processors Benefit: merging becomes lighter than without redistribution Problem: height of the tree D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Merge-All Sort Reduce the height of the tree, and still maintain parallelism Like parallel merge-all sort, but with redistribution The advantage is true parallelism in merging Skew problem in the merging D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Partitioned Sort Two stages: Partitioning stage and Independent local work Partitioning (or range redistribution) may raise load skew Local search is done after the partitioning, not before No merging is necessary Main problem: Skew produced by the partitioning D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Partitioned Sort Bucket tuning: produce more buckets than the available processors Bucket tuning does not work in parallel sort, because in parallel sort, the order of processor is important Bucket tuning for load balancing will later be used in parallel join D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.4. Parallel GroupBy Traditional methods (Merge-All and Hierarchical Merging) Two-phase method Redistribution method D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Traditional Methods Step 1: local aggregate in each processor Step 2: global aggregation May use a Merge-All or Hierarchical method Need to pay a special attention to some aggregate functions (AVG) when performing a local aggregate process 4.4. Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Two-Phase Method Step 1: local aggregate in each processor. Each processor groups local records according to the groupby attribute Step 2: global aggregation where all temp results from each processor are redistributed and then final aggregate is performed in each processor 4.4. Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Redistribution Method Step 1 (Partitioning phase): redistribute raw records to all processors Step 2 (Aggregation phase): each processor performs a local aggregation 4.4. Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.5. Cost Models for Parallel Sort Additional cost notations D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Serial External Merge-Sort I/O cost components are load and save costs Load cost is the cost for loading data from disk to main memory Save cost is the cost of writing data from the main memory to the disk, which is identical to load cost equation 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Serial External Merge-Sort CPU cost components are select, sorting, merging, and generation result costs Select cost is the cost for obtaining a record from the data page Sorting cost is the internal sorting cost which has O(N x log 2 N) Merging cost is applied to pass 1 onward Generating result cost is determined by the number of records being generated or produced in each pass before they are written to disk multiplied 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Merge-All Sort Local merge sort costs are I/O costs, CPU costs, and Communication costs I/O costs consist of load and save costs CPU costs consist of select, sorting, merging and generating results costs Communication costs for sending local sorted results to the host: 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Merge-All Sort Final merging costs are communication, I/O, and CPU costs Communication cost is the receiving cost from local sorting operators I/O cost is the load and save costs CPU cost is the select, merging, and generating results costs 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Binary-Merge Sort The costs consist of local merge-sort costs, and pipeline merging costs The local merge-sort costs are exactly the same as those of parallel merge-all sort, since the local sorting phase in both methods is the same Hence, focus on pipeline merging costs In pipeline merging, we need to determine the number of levels, which is log 2 (N) In level 1, the number of processors used is up to half (N’=N/2) The skew equation is then: 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Binary-Merge Sort Costs for level 1: where R’ indicates the number of records being processed at a node in a level of pipeline merging 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Binary-Merge Sort In the subsequent levels, the number of processors is further reduced by half. The new N’ value becomes N’ = N’ / 2. This also impact the skew equation At the last level of pipeline merging, the host performs a final binary merging, where N’ = 1 The total pipeline binary merging costs are: The values of R’ I and |R’ I | are not constant throughout the pipeline, but increase from level to level as the number of processors N’ is reduced by half when progressing from one level to another 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Binary-Merge Sort Local merge-sort costs, and pipeline merging costs Local sort operation is similar to the previous two parallel sorts, but the temp results are being redistributed, which incurs additional overhead The compute destination cost is: where R i may involve data skew Pipeline merging costs are also similar to the those without redistribution Differences: number of processors involved in each level, where all processors participate. Hence we use R i and |R i |, and not R’ i and |R’ i |; and the compute destination cost are applicable to all levels in the pipeline 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Binary-Merge Sort The pipeline merging costs are: 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Merge-All Sort Local merge-sort costs and merging costs Local merge-sort costs are the same as those of parallel redistribution binary-merge sort with compute destination costs Merging costs are similar to those of parallel merge-all sort, except one main difference. Here we use R i and |R i |, not R and |R| The merging costs are then: 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Partitioned Sort Scanning/partitioning costs, and local merge-sort costs Scanning and partitioning costs involve I/O, CPU, and communication costs I/O costs consist of load cost: CPU costs consist of select costs: Communication costs consist of data transfer costs: 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Partitioned Sort The local merge-sort costs are similar to other local merge-sort costs, except communication costs are associated with data received from the first phase Communication cost for receiving data: I/O costs which are load and save costs: CPU costs are: 4.5. Cost Models for Parallel Sort (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.6. Cost Models for Parallel GroupBy Additional cost notations D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Two-Phase Method Phase 1: Local aggregation Scan cost: Select cost: Local aggregation cost: Reading/Writing of overflow buckets: Generating result records cost: Determining the destination cost: Data transfer cost: 4.6. Cost Models for Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Two-Phase Method Phase 2: Consolidation (Merging) The number of records arriving at a processor: The first term is the number of selected records from the 1st phase The second term is the table size of the selected records Receiving records cost: Computing final aggregation value cost: Generating final result cost: Disk cost for storing the final result: 4.6. Cost Models for Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Method Phase 1: Distribution/Partitioning The scan and selection costs are the same as for those in the two- phase method Scan cost: Select cost: Apart from these two costs, the finding destination cost and the data transfer cost are added to this model Finding destination cost: Data transfer cost: If the number of groups is less than the number of processors, then R i = R / (Number of groups), instead of R i = R / N 4.6. Cost Models for Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Parallel Redistribution Method Phase 2: Aggregation Receiving records cost: Only selected attributes are involved (  ) Computing aggregation cost: It does not include , because we take into account the number of records, not the record size Reading/Writing of overflow buckets cost: where s is the overall GroupBy selectivity ratio (  =  p x  g ) Generating final result cost: Disk cost: 4.6. Cost Models for Parallel GroupBy (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.7. Summary Sorting and duplicate removal are expressed in ORDER BY and DISTINCT in SQL Parallel algorithms for database sorting Parallel merge-all sort, parallel binary-merge sort, parallel redistribution binary-merge sort, parallel redistribution merge-all sort, and parallel partitioned sort Cost models for each parallel sort algorithm Buffer size Parallel redistribution algorithm is prone to processing skew If processing skew degree is high, then use parallel redistribution merge-all sort. If both data skew and processing skew degrees are high or no skew, then use parallel partitioned sort D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

4.7. Summary (cont’d) D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008 Parallel groupby algorithms Traditional methods (merge-all and hierarchical methods) Two-phase method Redistribution method Two-phase and Redistribution methods perform better than the traditional and hierarchical merging methods Two-phase method works well when the number of groups is small, whereas the Redistribution method works well when the number of groups is large

Continue to Chapter 5…