Estimating the Numbers of End Users and End User Programmers Christopher Scaffidi Brad Myers Mary Shaw Carnegie Mellon University EUSES Consortium VL/HCC.

Presentation transcript:

Estimating the Numbers of End Users and End User Programmers Christopher Scaffidi Brad Myers Mary Shaw Carnegie Mellon University EUSES Consortium VL/HCC ’05, Sep 23, 2005

The Old 55M Estimate
The number of end-user programmers in the U.S. alone is expected to reach 55 million by 2005, as compared to only 2.75 million professional programmers.

Our New 90M Estimate
The number of users in U.S. businesses is expected to exceed 90 million by 2012, including over 55 million users of spreadsheets and/or databases, as compared to under 3 million professional programmers.

Outline
1. The Basic 55M Estimation Method
–55M End User Programmers in 2005
2. Extending the Method
–90M Users in 2012
–55M Spreadsheet and/or Database Users in 2012
3. Conclusions

History and Purpose of the 55M Estimate
First appeared in COCOMO (circa 1995)
–COCOMO is Boehm’s model for estimating the cost of developing software applications
How many people would benefit from COCOMO?
–To answer this, Boehm projected…
  # of professional programmers (2.75M in 2005)
  # of end user programmers (55M in 2005)

Step #1: Project Worker Counts for 2005
Steps to generate the estimate
1. Get the Bureau of Labor Statistics (BLS) occupation projections for 2005

Occupational Category               Projected # workers (2005)
Managerial and Professional         million
Technical, Sales, Administration
Service
And so forth…

Step #2: Estimate what Fraction of Workers Use the Computer
Steps to generate the estimate
1. Get the Bureau of Labor Statistics (BLS) occupation projections for 2005
2. Get the BLS computer usage rates by occupation for 1989

Occupational Category               How many used computers at work (1989)
Managerial and Professional         56.2%
Technical, Sales, Administration    55.1%
Service                             10.2%
And so forth…

Step #3: Multiply and Sum Up
Steps to generate the estimate
1. Get the Bureau of Labor Statistics (BLS) occupation projections for 2005
2. Get the BLS computer usage rates by occupation for 1989
3. Multiply worker projections by computer usage rates

Occupational Category               2005 Proj    1989 Rate    # Users
Managerial and Professional         M            56.2%        M
Technical, Sales, Administration
Service
And so forth…
Sum turns out to be -----> 55 M
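The multiply-and-sum step can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the usage rates are the 1989 figures quoted on the slides, but the worker counts below are hypothetical placeholders (the transcript does not preserve the BLS projections), so the printed total is not the 55M figure.

```python
def estimate_users(projected_workers, usage_rates):
    """Multiply each category's projected worker count by its
    computer usage rate, then sum across categories."""
    per_category = {
        category: projected_workers[category] * usage_rates[category]
        for category in projected_workers
    }
    return per_category, sum(per_category.values())


# 1989 BLS usage rates as quoted on the slides.
usage_rates_1989 = {
    "Managerial and Professional": 0.562,
    "Technical, Sales, Administration": 0.551,
    "Service": 0.102,
}

# HYPOTHETICAL worker counts in millions (placeholders, not BLS data).
workers_placeholder = {
    "Managerial and Professional": 10.0,
    "Technical, Sales, Administration": 20.0,
    "Service": 5.0,
}

per_category, total = estimate_users(workers_placeholder, usage_rates_1989)
for category, users in per_category.items():
    print(f"{category}: {users:.2f}M users")
print(f"Total: {total:.2f}M users")
```

Substituting the real BLS projections for every occupational category would reproduce the sum described on the slide.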

Step #4: Apply Adjustments
Steps to generate the estimate
1. Get the Bureau of Labor Statistics (BLS) occupation projections for 2005
2. Get the BLS computer usage rates by occupation for 1989
3. Multiply worker projections by computer usage rates
4. Finally, adjust upward to account for rising usage rates, and adjust downward because not all users are programmers.
Boehm originally relied on judgment to provide adjustments.
–The two adjustments actually ended up canceling out!

Our Paper Provides Better Adjustments
Adjustment #1: Rising Usage Rates
–Use innovation diffusion to model rising usage rates.
–We also extend the estimates to 2012.
Adjustment #2: Not Everybody Programs
–Be precise about what aspect of “programming” to address.
–We can focus on spreadsheet/database users.
–We can focus on users who self-reportedly “do programming.”
–Each of these groups vastly outnumbers professionals.

Adjustment #1: Rising Usage Rates
We incorporated additional BLS data
–1984
–1989 (the only year used in old 55M estimate)
–1993
–1997

Adjustment #1: Rising Usage Rates

Innovation diffusion theory to the rescue
–Innovations diffuse through populations like diseases.
–Researchers studied various functional forms for modeling this.
–The simplest form (and most generally applicable) is S-shaped.

Adjustment #1: Rising Usage Rates
Projecting the computer usage rates
–The S-shaped functional form has 3 free parameters (K, m, b)
–We have 4 measurements from BLS (1984, 1989, 1993, 1997)
–So we can fit the functional form for each occupation category
–(Note that with so few points, “goodness of fit” means little.)
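A minimal sketch of such a fit, under stated assumptions: the slides do not give the exact functional form, so the code assumes a standard logistic curve, rate(t) = K / (1 + exp(-(m*t + b))), with t measured in years since 1984. The fitting procedure (scan candidate ceilings K, then solve m and b by linear least squares on the logit) is one simple choice, not necessarily the authors' method, and the four data points are synthetic values generated from known parameters, not BLS measurements.

```python
import math


def logistic(t, K, m, b):
    """Assumed S-shaped curve; approaches the ceiling K as t grows (m > 0)."""
    return K / (1.0 + math.exp(-(m * t + b)))


def fit_logistic(ts, ys, k_candidates):
    """Scan candidate ceilings K. For each K the transform
    log(y / (K - y)) = m*t + b is linear in t, so m and b follow
    from ordinary least squares. Keep the K with the smallest
    squared error on the original scale."""
    best = None
    n = len(ts)
    t_mean = sum(ts) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    for K in k_candidates:
        if K <= max(ys):
            continue  # transform is undefined unless every y is below K
        zs = [math.log(y / (K - y)) for y in ys]
        z_mean = sum(zs) / n
        sxz = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs))
        m = sxz / sxx
        b = z_mean - m * t_mean
        sse = sum((logistic(t, K, m, b) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, m, b)
    _, K, m, b = best
    return K, m, b


# Years since 1984 for the four BLS survey years (1984, 1989, 1993, 1997).
years = [0, 5, 9, 13]
# SYNTHETIC rates generated from known parameters, for illustration only.
rates = [logistic(t, 0.8, 0.15, -0.5) for t in years]

K, m, b = fit_logistic(years, rates, [i / 100 for i in range(50, 101)])
forecast_2012 = logistic(2012 - 1984, K, m, b)
print(f"K={K:.2f}, m={m:.3f}, b={b:.3f}, forecast for 2012: {forecast_2012:.3f}")
```

With four points and three free parameters there is a single residual degree of freedom, which is why the slide cautions that goodness of fit carries little weight here.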

Adjustment #1: Rising Usage Rates
A somewhat better estimate
–Get the BLS’s latest occupation projection (which happens to be for the year 2012)
–Plug in t=2012 to forecast future computer usage rates
–Multiply and sum as Boehm did
–Result: 90M users in 2012

Validation
Does it match 2001 BLS count of workplace users?
–BLS modified their questions slightly in 2001
–Our fit predicts 71.9M users; actual = 72.3M
–Incorporating this 2001 BLS data into our fit raises our estimate for 2012 from 90M users to 96M users
Does it match 2003 Forrester count?
–They found 129M users (work or home) age
–Our fit predicts 80M workplace users for 2003
–Use BLS 2001 to adjust for age, add in home (non-work) users
–Our result for comparison: a little over 123M (to their 129M)
Excellent match.
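To put the two comparisons on a common scale, the relative gap between prediction and observation can be computed directly. This sketch uses the figures quoted on the slide; the 123.0 value is an approximation of "a little over 123M", and the helper name is hypothetical.

```python
def relative_gap(predicted, observed):
    """Signed gap between prediction and observation, as a fraction
    of the observed value."""
    return (predicted - observed) / observed


# Figures in millions of users, as quoted on the slide.
bls_2001 = relative_gap(71.9, 72.3)
# 123.0 approximates "a little over 123M".
forrester_2003 = relative_gap(123.0, 129.0)

print(f"BLS 2001: {bls_2001:.1%}")              # -0.6%
print(f"Forrester 2003: {forrester_2003:.1%}")  # -4.7%
```

Both predictions fall slightly below the observed counts, by under 1% and under 5% respectively.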

Examining Assumptions
We replace one assumption with another.
–Old assumption: based on judgment
–New assumption: applicability of innovation diffusion
Implication of using our assumption
–Questionable assumption! Ongoing improvements in computers will probably drive adoption still higher.
–Therefore, 90M is probably a lower bound.

Adjustment #2: Not All Users Program
One big count (of all users) isn’t too helpful.
–It can only be used to argue, “This sure is big.”
Relative usefulness of a collection of numbers
–Not all users have the same needs, strengths, and goals!
–How can we break down the estimate into smaller groups, to guide research and development?

Adjustment #2: Not All Users Program
One approach: Group users by application usage.
In 2001, BLS asked how workers use computers.
–Total of 72M people used computers at work.
–Over 60% of total (45M) used spreadsheets or databases.
–About 15% of total (11M) said they “do programming.”

Adjustment #2: Not All Users Program
Carrying this forward to yield 2012 lower bounds...
–Total of 90M people will use computers at work.
–Over 60% of total (55M) will use spreadsheets or databases.
–About 15% of total (13M) will say they “do programming.”
–BLS projects only 3M professional programmers.
Our Extended Method
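The carry-forward step amounts to holding each group's 2001 share of workplace users constant and applying it to the 2012 total. The sketch below uses the rounded figures from the slides (72M, 45M, 11M, 90M), so its results only approximate the 55M and 13M quoted above, which were presumably derived from unrounded data; the function name is hypothetical.

```python
def carry_forward(group_2001, total_2001, total_2012):
    """Hold the group's 2001 share of workplace users constant
    and apply it to the projected 2012 total."""
    share = group_2001 / total_2001
    return share, share * total_2012


# Rounded figures from the slides, in millions of workplace users.
TOTAL_2001 = 72.0
TOTAL_2012 = 90.0

ss_share, ss_2012 = carry_forward(45.0, TOTAL_2001, TOTAL_2012)
prog_share, prog_2012 = carry_forward(11.0, TOTAL_2001, TOTAL_2012)

print(f"Spreadsheet/database: {ss_share:.1%} of users, about {ss_2012:.1f}M in 2012")
print(f"'Do programming': {prog_share:.1%} of users, about {prog_2012:.1f}M in 2012")
```

Treating the 2001 shares as fixed is itself an assumption; if these shares keep rising, the 2012 figures would be lower bounds, consistent with how the slides present them.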

Conclusions
New estimates for American workplaces in 2012:
–At least 90M users
–At least 55M spreadsheet and/or database users
–About 13M users will say they “do programming”
–Fewer than 3M professional programmers
Our estimates are based on improved adjustments:
–Model adoption rates using innovation diffusion theory
–Group users according to how they use computers

Thank You
To VL/HCC for the opportunity to present
To NSF, Sloan, and NASA for funding
To Barry Boehm for discussions of his 55M estimate