VII-1 Stratification Case study to illustrate alternative methods to stratify a sampling frame Dr. Will Yancey, CPA This material is the property of the.

Presentation transcript:

VII-1 Stratification. Case study to illustrate alternative methods to stratify a sampling frame. Dr. Will Yancey, CPA. This material is the property of the presenter and cannot be reproduced or used without the express written consent of the presenter.

VII-2 Outline A. Why stratify? B. Coefficient of Variation (CV) C. High and Low Thresholds D. Number of strata E. Strata Boundary Determination Case study data for this presentation: 185,083 rows of purchase invoice line items.

VII-3 A. Why stratify? Parable of the Footballs and the Fish. You are asked to determine the weight of 1,000 footballs. You know they are identical in weight. You can weigh only one football at a time. How many must you weigh? You are asked to determine the weight of 1,000 different fish taken from a lake. They are highly variable in weight. You can weigh only one fish at a time. How many must you weigh?

VII-4 Parable continued. How could we organize the fish so we could get a reasonable estimate without weighing them all? What feature would we use to organize the fish? What features would probably not be useful for estimating total weight? How many piles should we have?

VII-5 Effective Stratification. Effective stratification: if possible, what we are measuring is similar within each stratum and different between strata. Stratifying (grouping, categorizing, segmenting, etc.) can mean grouping by account, type, division, or other attribute, and then stratifying by dollar amount within each group. A sales and use tax audit goal is to estimate total tax dollar error. Correlation of invoice line amounts with taxability or errors: if an error occurs, it is proportional to the invoice line amount; the relative frequency of error occurrence might or might not be correlated with invoice amount.

VII-6 Accounts Payable Case Study Data. 185,083 rows of invoice line items. Range: $0.01 to $26,763,476. Total population base: $493 million. 4% of items have an amount ≥ $10,000.

VII-7 A/P Case Study: Distribution of $ > $10K. The 4% of items with amount ≥ $10,000 contain $376 million of the $493 million in population base (76%).

VII-8 B. Coefficient of Variation (CV). CV is a relative measure of the dispersion around the mean. Dollar stratification results in a lower CV within each stratum than in the combined unstratified sampling frame. Caution: when the mean is close to zero, CV is very sensitive to small changes in the mean.
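As a minimal sketch of the definition CV = σ / μ, the snippet below computes CV with Python's standard library. The football and fish weights are hypothetical illustration values, and the use of the population standard deviation (`pstdev`) is an assumption, since the slides do not say which form was used. The last line reuses the σ and μ reported for the unstratified case study frame.

```python
from statistics import mean, pstdev

def coefficient_of_variation(amounts):
    """Return CV = sigma / mu, using the population standard deviation."""
    mu = mean(amounts)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(amounts) / mu

# Hypothetical weights: identical footballs versus highly variable fish.
footballs = [15.0, 15.0, 15.0, 15.0]
fish = [0.5, 1.0, 4.0, 20.0]

print(coefficient_of_variation(footballs))  # no dispersion at all
print(coefficient_of_variation(fish))       # dispersion larger than the mean

# Sigma and mu reported for the unstratified case study frame.
unstratified_cv = 86_331 / 2_663
print(round(unstratified_cv, 1))
```

A CV far above 1 for the unstratified frame is the numerical counterpart of the fish in the parable.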

VII-9 CV, stratification, and precision. Reducing CV usually improves precision. (Remember the Parable of the Footballs and the Fish.) For each stratum, compute the CV of the items' invoice line amounts. For a specific total sample size and stratified random sampling, the best precision usually occurs when the CVs are relatively constant across the strata. Consider adjusting strata boundaries or adding more strata to adjust CV across the strata.
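One way to see why a lower CV helps is the textbook simple-random-sampling approximation n ≈ (z × CV / e)², where e is the desired relative precision and z is the normal critical value. This formula is not from the slides; it ignores the finite population correction and assumes the normal approximation holds, so treat the sketch below as a rough illustration only. The function name and the 5% precision target are hypothetical.

```python
import math

def approx_sample_size(cv, rel_precision, z=1.96):
    """Rough sample size to estimate a mean within +/- rel_precision (as a fraction).

    Assumes simple random sampling, a normal approximation, and no finite
    population correction. z = 1.96 corresponds to about 95% confidence.
    """
    return max(1, math.ceil((z * cv / rel_precision) ** 2))

# CVs of 20% and 12% are the values shown for two strata in the case study.
print(approx_sample_size(0.20, 0.05))
print(approx_sample_size(0.12, 0.05))
# Identical items (CV = 0, like the footballs) need only one measurement.
print(approx_sample_size(0.0, 0.05))
```

Because the required size grows with the square of CV, halving the CV within a stratum cuts the approximate sample size to about a quarter.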

VII-10 Case Study: Coefficient of Variation (CV = σ / μ)

| Lower Bound ≥ | Upper Bound < | Size (count of items) | Standard Deviation σ | Mean μ | CV |
| --- | --- | --- | --- | --- | --- |
| 0 | $27 million (unstratified) | 185,083 | 86,331 | 2,663 | … |
| 0 | 1,000 | 146,… | … | … | … |
| 1,000 | 2,000 | 19,… | … | 1,281 | 20% |
| 2,000 | 3,000 | 3,… | … | 2,435 | 12% |
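A table of this kind can be produced by grouping the amounts into strata and summarizing each group. The sketch below is one possible way to do that; the function name and the eight sample amounts are hypothetical, and the strata follow the table's convention of an inclusive lower bound and an exclusive upper bound.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def cv_by_stratum(amounts, boundaries):
    """Summarize strata [boundaries[i], boundaries[i+1]) as (count, mean, sd, cv).

    boundaries must be sorted in ascending order.
    """
    strata = [[] for _ in range(len(boundaries) - 1)]
    for x in amounts:
        i = bisect_right(boundaries, x) - 1  # lower bound inclusive, upper exclusive
        if 0 <= i < len(strata):
            strata[i].append(x)
    rows = []
    for items in strata:
        if not items:
            rows.append((0, None, None, None))
            continue
        mu, sd = mean(items), pstdev(items)
        rows.append((len(items), mu, sd, sd / mu if mu else None))
    return rows

# Hypothetical invoice line amounts, stratified at 1,000 and 2,000.
amounts = [50, 200, 800, 1200, 1500, 1900, 2100, 2900]
rows = cv_by_stratum(amounts, [0, 1000, 2000, 3000])
for count, mu, sd, cv in rows:
    print(count, mu, sd, cv)
```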

VII-11 C. High and Low Thresholds. All items with a dollar amount greater than the High Threshold (H) will be detailed (actual basis exam) rather than sampled. “This removal of the extremes from the main body of the population reduces the skewness and improves the normal approximation.” Cochran, Sampling Techniques, 3rd Edition, p. 44.

VII-12 Setting High Threshold (H), also known as ceiling or detail threshold. Possible criteria: approximately the top 0.1% to 0.2% of items (or some other %); greater than 3 standard deviations from the unstratified population mean. As H decreases, the number of items in the detail stratum increases. Items above H are from relatively few major vendors or major projects.

VII-13 Case Study: High Threshold

| If H is: | Count ≥ H | % Population Size ≥ H | Base $ ≥ H | % Base $ ≥ H |
| --- | --- | --- | --- | --- |
| 1,000,000 | … | … | 128,545,014 | 26% |
| 500,000 | … | … | 197,766,292 | 40% |
| 250,000 | … | … | 223,256,301 | 45% |
| 100,000 | … | … | 242,946,614 | 49% |
| 50,000 | … | … | 261,323,166 | 53% |

Population Size = 185,083. Population Base = $492,953,742. Exhibits in this presentation: H = $100,000.
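The sketch below shows how a high-threshold comparison of this shape could be tabulated for several candidate values of H. The function name and the eight sample amounts are hypothetical. The final lines apply the "3 standard deviations" criterion to the σ and μ reported for the unstratified case study frame, reading "from the mean" as "above the mean", which is an assumption.

```python
def high_threshold_table(amounts, candidates):
    """For each candidate H, return (H, count >= H, share of items, $ >= H, share of $)."""
    n = len(amounts)
    total = sum(amounts)
    rows = []
    for h in candidates:
        detail = [x for x in amounts if x >= h]
        rows.append((h, len(detail), len(detail) / n, sum(detail), sum(detail) / total))
    return rows

# Hypothetical, heavily skewed invoice line amounts.
amounts = [10, 50, 200, 1_000, 5_000, 60_000, 150_000, 1_200_000]
table = high_threshold_table(amounts, [1_000_000, 100_000, 50_000])
for h, count, pct_items, dollars, pct_dollars in table:
    print(f"{h:>9,} {count:>3} {pct_items:6.1%} {dollars:>11,} {pct_dollars:6.1%}")

# Three-sigma criterion with the unstratified mean and standard deviation.
h_three_sigma = 2_663 + 3 * 86_331
print(h_three_sigma)
```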

VII-14 Low Threshold (L), also known as floor or basement. Accounting transaction data files have many small-dollar items, particularly for purchases with invoice line items: delivery charges, processing fees, etc. Some sampling plans set a Low Threshold (L) such that every item below L is: a. excluded (no change), or b. sampled at a minimum sample size, or c. covered by projecting results from the other sampled strata onto the stratum below L.

VII-15 Low Threshold (L): criteria. The policy for setting L depends on what will be done with items below L. Possible criteria for setting a value for L: a. Less than 1% or 2% of population dollars are below L (or some other %). b. Greater than 3 standard deviations below the unstratified population mean. c. Divide H by 1,000.
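Criteria (a) and (c) are easy to check mechanically. The following sketch, with hypothetical function names and sample amounts, picks the largest candidate L that keeps the dollar share below L under a chosen limit, and then applies the H / 1,000 rule to the H = $100,000 used in this presentation's exhibits.

```python
def dollar_share_below(amounts, low):
    """Fraction of total dollars held by items with amount < low."""
    return sum(x for x in amounts if x < low) / sum(amounts)

def largest_low_threshold(amounts, candidates, max_share=0.01):
    """Largest candidate L whose below-L dollar share stays under max_share."""
    ok = [c for c in sorted(candidates) if dollar_share_below(amounts, c) < max_share]
    return ok[-1] if ok else None

# Hypothetical invoice line amounts and candidate thresholds.
amounts = [5, 8, 20, 40, 90, 400, 2_000, 9_000, 50_000]
candidates = [10, 25, 50, 100, 250]

print(largest_low_threshold(amounts, candidates, max_share=0.01))
print(largest_low_threshold(amounts, candidates, max_share=0.002))

# Criterion (c): divide H by 1,000.
low_from_high = 100_000 / 1_000
print(low_from_high)
```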

VII-16 Case Study: Low Threshold

| If L is: | Count < L | % Population Count < L | Base $ < L | % Base $ < L |
| --- | --- | --- | --- | --- |
| 10 | 7,320 | 4% | 40,… | … |
| 25 | 19,472 | 11% | 248,… | … |
| 50 | 37,231 | 20% | 887,… | … |
| … | 65,128 | 35% | 2,792,… | … |
| … | 90,275 | 49% | 6,441,… | … |

Exhibits in this presentation: L = $100.

VII-17 D. Number of Sampled Strata. Adding more strata reduces CV within each stratum, but the minimum sample size per stratum may result in a total sample that exceeds budget. More than 6 strata probably does not improve precision [Neter and Loebbecke, Behavior of Major Statistical Estimators in Sampling Accounting Populations, (AICPA, 1975)]. Pragmatic approach: start with 3 strata and then add or delete strata as needed to achieve the desired precision, CV, or other criteria.

VII-18 E. Strata Boundary Determination. Precision is a function of strata boundaries combined with other attributes in the population and the sampling plan. Unless otherwise stated, the following case study shows: five strata = 3 sampled strata + Low + High; Low Threshold (L) = 100; High Threshold (H) = 100,000.

VII-19 Equal Population Size. Nearly equal population size in sampled strata 2, 3, and 4.

| Stratum | Lower Bound ≥ | Upper Bound < | % Pop. Size | % Pop. Base $ | CV |
| --- | --- | --- | --- | --- | --- |
| 1 | 0 | L = 100 | … | 0.6% | 61.6% |
| 2 | … | … | … | 1.5% | 29.6% |
| 3 | … | … | … | 4.7% | 37.6% |
| 4 | … | H = 100,000 | … | 44.0% | 152.7% |
| 5 | H = 100,000 | 27,000,000 | 0.2% | 49.3% | 276.6% |

Observe: CV varies greatly across strata 2, 3, and 4.
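Equal-size boundaries amount to cutting the sorted amounts between L and H at equal item counts. The sketch below is one simple way to do that; the function name and sample amounts are hypothetical, and tied amounts at a cut point could make the strata slightly unequal.

```python
def equal_count_boundaries(amounts, low, high, n_strata):
    """Boundaries that give the sampled strata between low and high nearly equal item counts."""
    body = sorted(x for x in amounts if low <= x < high)
    # The k-th cut is the first amount belonging to stratum k + 1.
    cuts = [body[len(body) * k // n_strata] for k in range(1, n_strata)]
    return [low] + cuts + [high]

# Hypothetical amounts between L = 100 and H = 100,000.
amounts = [100, 150, 200, 400, 900, 2_000, 5_000, 20_000, 90_000]
bounds = equal_count_boundaries(amounts, 100, 100_000, 3)
print(bounds)
```

With skewed dollar data, equal counts put a wide dollar range into the top sampled stratum, which is consistent with the large CV the table shows for stratum 4.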

VII-20 Equal Population Base $. Nearly equal population base $ in sampled strata 2, 3, and 4.

| Stratum | Lower Bound ≥ | Upper Bound < | % Pop. Size | % Pop. Base $ | CV |
| --- | --- | --- | --- | --- | --- |
| 1 | 0 | L = 100 | … | 0.6% | 61.6% |
| 2 | … | 5,… | … | 16.7% | 115.1% |
| 3 | … | 15,… | … | 16.7% | 27.0% |
| 4 | … | H = 100,000 | 1.6% | 16.7% | 56.5% |
| 5 | … | 27,000,000 | 0.2% | 49.3% | 276.6% |

Observe: CV varies greatly across strata 2, 3, and 4.
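Equal-dollar boundaries can be found by walking the sorted amounts and starting a new stratum each time the running dollar total reaches the next equal share. This is a minimal sketch under that assumption; the function name and sample amounts are hypothetical, and a few very large items can prevent an exact split.

```python
def equal_dollar_boundaries(amounts, low, high, n_strata):
    """Boundaries that give the sampled strata between low and high nearly equal dollar totals."""
    body = sorted(x for x in amounts if low <= x < high)
    total = sum(body)
    bounds = [low]
    running = 0.0
    target = 1
    for x in body:
        # Open the next stratum at x once the running total has reached the next share.
        if target < n_strata and running >= total * target / n_strata:
            bounds.append(x)
            target += 1
        running += x
    return bounds + [high]

# Hypothetical amounts between L = 100 and H = 100,000 (total 6,000).
amounts = [100, 200, 300, 400, 500, 600, 900, 1_000, 2_000]
bounds = equal_dollar_boundaries(amounts, 100, 100_000, 3)
print(bounds)
```

In this example the three strata hold 2,100, 1,900, and 2,000 dollars, while their item counts are 6, 2, and 1.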

VII-21 Cumulative Square Root (CSR) Method. Developed by Tore Dalenius, a Swedish statistician, in the 1950s, with the warning that it will not do well with all distributions. See the numerical example in the New York State CAA Manual, Publication 132 (pdf), pages …. The cumulative square root method can be distorted when it begins from zero and there are many small-dollar items (such as under $10). Mitigate by setting the L threshold greater than zero.
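The slides do not show the calculation, so the sketch below follows one common textbook form of the cumulative square root of frequency rule: bin the amounts into equal-width classes, accumulate the square root of each class frequency, and cut where the running total crosses equal shares of the final total. The function name, the number of bins, and the sample amounts are hypothetical, and taking the first class that reaches each target (rather than the nearest class) is a simplifying choice.

```python
import math

def cum_sqrt_f_boundaries(amounts, low, high, n_strata, n_bins=100):
    """Stratum boundaries from the cumulative square root of class frequencies."""
    width = (high - low) / n_bins
    freq = [0] * n_bins
    for x in amounts:
        if low <= x < high:
            freq[min(int((x - low) / width), n_bins - 1)] += 1
    # Running total of sqrt(frequency) across the equal-width classes.
    cum, running = [], 0.0
    for f in freq:
        running += math.sqrt(f)
        cum.append(running)
    step = cum[-1] / n_strata
    bounds = [low]
    for k in range(1, n_strata):
        # First class whose cumulative sqrt(f) reaches k equal shares.
        i = next(j for j, c in enumerate(cum) if c >= k * step)
        bounds.append(low + (i + 1) * width)
    return bounds + [high]

# Hypothetical data: four small items in the first class, then one item per class.
amounts = [1, 2, 3, 4, 15, 25, 35, 45]
bounds = cum_sqrt_f_boundaries(amounts, 0, 100, n_strata=3, n_bins=10)
print(bounds)
```

The cluster of small items pulls the first boundary down to the top of the first class, which illustrates the distortion from many small-dollar items when the method starts at zero.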

VII-22 Cumulative Square Root with Zero Low Threshold. L = zero. 4 sampled strata. 1 detail stratum.

| Stratum | Lower Bound ≥ | Upper Bound < | % Pop. Size | % Pop. Base $ | CV |
| --- | --- | --- | --- | --- | --- |
| 1 | L = 0 | … | … | 5.6% | 105.6% |
| 2 | … | 5,… | … | 11.1% | 57.6% |
| 3 | … | 16,… | … | 17.8% | 29.7% |
| 4 | … | H = 100,000 | 1.6% | 16.2% | 56.0% |
| 5 | H = 100,000 | 27,000,000 | 0.2% | 49.3% | 276.6% |

Observe: CV varies greatly across strata 1, 2, 3, and 4.

VII-23 Cumulative Square Root with $100 Low Threshold. L = 100. Between L and H there are 3 sampled strata.

| Stratum | Lower Bound ≥ | Upper Bound < | % Pop. Size | % Pop. Base $ | CV |
| --- | --- | --- | --- | --- | --- |
| 1 | 0 | L = 100 | … | 0.6% | 61.6% |
| 2 | … | 1,… | … | 11.4% | 78.2% |
| 3 | … | 13,… | … | 17.2% | 56.6% |
| 4 | … | H = 100,000 | 2.5% | 21.5% | 60.7% |
| 5 | H = 100,000 | 27,000,000 | 0.2% | 49.3% | 276.6% |

Observe: CV is closer across strata 2, 3, and 4. Setting an appropriate L has improved the stratification.

VII-24 Geometric Ratio Method. Developed by Will Yancey with co-authors Jane Horgan and Patricia Gunning at Dublin City University in Ireland in …. Assumes the population distribution declines at a relatively constant rate. Requires setting thresholds L and H. R = H / L = 100,000 / 100 = 1,000. For J = 3 strata: r = R ^ (1/J) = 1,000 ^ (1/3) = 10.0. For J = 4 strata: r = R ^ (1/J) = 1,000 ^ (1/4) = 5.623.
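The arithmetic above extends directly to the boundaries themselves: each boundary is the previous one multiplied by r, starting from L. A minimal sketch, with a hypothetical function name, using the presentation's L = 100 and H = 100,000:

```python
def geometric_boundaries(low, high, n_strata):
    """Return (r, boundaries) where each boundary is r times the previous one."""
    r = (high / low) ** (1 / n_strata)
    return r, [low * r ** j for j in range(n_strata + 1)]

r3, bounds3 = geometric_boundaries(100, 100_000, 3)
r4, bounds4 = geometric_boundaries(100, 100_000, 4)

print(round(r3, 3), [round(b) for b in bounds3])
print(round(r4, 3), [round(b) for b in bounds4])
```

The method requires L greater than zero, since the ratio H / L is undefined otherwise.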

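VII-25 Geometric Ratio with 3 sampled strata. The ratio of upper to lower boundary is r = 10 in strata 2, 3, and 4.

| Stratum | Lower Bound ≥ | Upper Bound < | % Pop. Size | % Pop. Base $ | CV |
| --- | --- | --- | --- | --- | --- |
| 1 | 0 | L = 100 | … | 0.6% | 61.6% |
| 2 | 100 | 1,000 | … | 6.4% | 67.6% |
| 3 | 1,000 | 10,000 | … | 16.7% | 87.2% |
| 4 | 10,000 | H = 100,000 | … | 27.1% | 65.5% |
| 5 | H = 100,000 | 27,000,000 | 0.2% | 49.3% | 276.6% |

Observe: CV is relatively similar across strata 2, 3, and 4.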

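VII-26 Geometric Ratio with 4 sampled strata. The ratio of upper to lower boundary is r = 5.623 in strata 2, 3, 4, and 5.

| Stratum | Lower Bound ≥ | Upper Bound < | % Pop. Size | % Pop. Base $ | CV |
| --- | --- | --- | --- | --- | --- |
| 1 | 0 | L = 100 | … | 0.6% | 61.6% |
| 2 | 100 | 562 | … | 3.3% | 48.2% |
| 3 | 562 | 3,162 | … | 10.4% | 46.0% |
| 4 | 3,162 | 17,783 | 6.7% | 22.5% | 45.5% |
| 5 | 17,783 | H = 100,000 | … | 14.1% | 53.4% |
| 6 | H = 100,000 | 27,000,000 | 0.2% | 49.3% | 276.6% |

Observe: Adding more strata lowers the CV.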

VII-27 Summary of Stratification Procedures 1. Set a High Threshold (H). 2. Set a Low Threshold (L). 3. Choose number of strata. 4. Set boundaries with a method. 5. Compute CV in each stratum. 6. Adjust by changing L, H, boundaries, adding or deleting strata.