The “55M End-User Programmers” Estimate Revisited Christopher Scaffidi.


Table of Contents
1. Introduction to a Popular Estimate
2. The Estimation Method
   – 55M End-User Programmers in 2005
3. Extending the Method
   – 90M End-Users in 2012
   – A Survey of End-User Abstraction Practices
4. Conclusion

Introduction to a Popular Estimate
55 Million End-User Programmers by 2005
“End-User” =
– “The ultimate consumer of a product, especially the one for whom the product has been designed.” (Dictionary)
– “People who are not employed as programmers” (citation on next slide)
“Programmers” =
– People who act “to create an application that serves some function” (Nardi, A Small Matter of Programming)
– Researchers often use the term to include creators of spreadsheets.

Example #1 of Estimate’s Usage
Context:
– The authors of this conference paper added more abstraction capabilities to Excel to boost Excel’s utility.
Usage:
– “The number of end-user programmers in the U.S. alone is expected to reach 55 million by 2005, as compared to only 2.75 million professional programmers”
Appeared in:
– S. Jones, A. Blackwell, and M. Burnett. A User-Centered Approach to Functions in Excel. Proceedings of the 8th ACM SIGPLAN International Conference on Functional Programming, ACM Press, 2003, pp. …

Example #2 of Estimate’s Usage
Context:
– The magazine author discusses a grant awarded by NSF for research on improving the reliability of spreadsheets.
Usage:
– “Experts estimate the number of so-called ‘end-user programmers’ to reach 55 million by 2005,” said NSF spokesperson David Hart… “Nearly half of the programs created by these end-users have nontrivial bugs.”
Appeared in:
– Mike Martin. New Program Exterminates End-User Bugs. CIO Today, NewsFactor Network, June 9, …

Introduction to a Popular Estimate
Used in many places
– Journal articles
– Conference papers
– Workshop papers
– Grant applications?
– Trade magazines
– Web sites
Used to make an important point
– There are a lot of end-user programmers (in fact, many more than professional programmers).
– Therefore they are a significant group of programmers.
– Therefore we should not neglect their needs.

The Estimation Method
First appeared in COCOMO 2.0
– COCOMO is a cost estimation model from Boehm et al.
– Extended into COCOMO 2.0 (late 1990s) to reflect modern practices
COCOMO 2.0 is for professionals (not end-users)
– How many people would/wouldn’t benefit from COCOMO 2.0?
– To answer this, Boehm projected:
    – # of professional programmers (2.75M by 2005)
    – # of end-user programmers (55M by 2005)
– B. Boehm et al. Cost Models for Future Software Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering Special Volume on Software Process and Product Measurement (J. Arthur and S. Henry, eds.), J.C. Baltzer AG, Science Publishers, Amsterdam, The Netherlands, …
– Also widely disseminated through a book by Boehm in 2000, as well as in IEEE Software.

The Estimation Method
Steps to generate the estimate
– Get the Bureau of Labor Statistics (BLS) occupation projections for 2005
– Get the BLS computer usage rates by occupation for 1989 (which were actual data from a survey, not a projection)
– Multiply the occupation projections by the computer usage rates and total up
– Bottom line = 55M end-user programmers in 2005

Occupational Category | 2005 Projection | 1989 Usage Rate | # Users
Managerial and Professional | … M | 56.2% | … M
Technical, Sales, Administration | … | 55.1% | …
Service | … | 10.2% | …
And so forth…
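The multiply-and-sum step can be sketched in a few lines of Python. This is a minimal illustration, not Boehm’s code: the function name `estimate_end_users` is hypothetical, the usage rates are the three 1989 BLS figures from the slide, and the 2005 worker counts are placeholder values invented for the example, not the BLS projections.

```python
from typing import Mapping


def estimate_end_users(projected_workers: Mapping[str, float],
                       usage_rates: Mapping[str, float]) -> float:
    """Boehm-style estimate (sketch): sum over occupational categories of
    (projected number of workers) x (fraction using a computer at work)."""
    missing = set(projected_workers) - set(usage_rates)
    if missing:
        raise KeyError(f"no usage rate for: {sorted(missing)}")
    return sum(count * usage_rates[cat]
               for cat, count in projected_workers.items())


if __name__ == "__main__":
    # 1989 BLS usage rates quoted on the slide, as fractions.
    rates_1989 = {
        "Managerial and Professional": 0.562,
        "Technical, Sales, Administration": 0.551,
        "Service": 0.102,
    }
    # HYPOTHETICAL 2005 worker counts (millions), for illustration only;
    # these are NOT the BLS projections Boehm used.
    projected_2005 = {
        "Managerial and Professional": 40.0,
        "Technical, Sales, Administration": 30.0,
        "Service": 20.0,
    }
    total = estimate_end_users(projected_2005, rates_1989)
    print(f"Estimated end users: {total:.2f}M")
```

Note that this counts every worker who uses a computer at work as a programmer, which is the second approximation examined under Extending the Method.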


Extending the Method
Main inherent approximations
– Computer usage rates by occupation will remain constant from 1989 through 2005
– All end-users are programmers
Address these by
– Using additional data to estimate how usage rates have grown
– Developing a classification of end-users to capture their continuum of programming-like activities

Approximation #1: Constant Usage Rates
New computer usage rate data became available
– Boehm based his estimate on usage rates measured in 1989
– BLS also measured those rates in 1984, 1993, and 1997
A valid approximation?
– Not very
– Usage rates have grown substantially for each of the occupational categories studied by BLS
– In fact, in 1997, there were already around 64M end-users

Approximation #1: Constant Usage Rates
Interesting curve shape
– Most of these curves (especially the lower ones) seem to have an S-shape trending to a horizontal asymptote

Approximation #1: Constant Usage Rates
Innovation diffusion theory to the rescue
– Researchers have realized that innovations diffuse through populations like diseases.
– They have studied various functional forms for describing this.
– The simplest (and most generally applicable) form is S-shaped.
– J. Teng, V. Grover, and W. Güttler. Information Technology Innovations: General Diffusion Patterns and Its Relationships to Innovation Characteristics. IEEE Transactions on Engineering Management, Vol. 49, No. 1, February 2002, pp. …
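One common S-shaped form is the logistic curve. It is assumed here for illustration; the slides name only the three free parameters (K, m, b), not the exact equation. K is the saturation level (the horizontal asymptote), and for a fixed K the curve becomes linear in t, which is what makes it easy to fit:

```latex
f(t) = \frac{K}{1 + m\,e^{-b t}}
\quad\Longrightarrow\quad
\frac{K}{f(t)} - 1 = m\,e^{-b t}
\quad\Longrightarrow\quad
\ln\!\left(\frac{K}{f(t)} - 1\right) = \ln m - b\,t
```

Under this form the curve starts at f(0) = K/(1 + m) and passes its inflection point f = K/2 at t = (ln m)/b.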

Approximation #1: Constant Usage Rates
Projecting the computer usage rates
– The S-shaped functional form has 3 free parameters (K, m, b)
– We have 4 measurements from BLS (1984, 1989, 1993, 1997)
– So we can fit the functional form for each occupational category
– (Note that with so few points, “goodness of fit” means little.)
A somewhat better estimate
– Get the BLS’s latest occupation projection (which happens to be for the year 2012)
– Plug in t = 2012 to forecast future computer usage rates
– Multiply and sum as Boehm did
– Result: 90M end-users in 2012
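A minimal sketch of the fitting and forecasting step, using only the Python standard library. It assumes the logistic form f(t) = K / (1 + m·e^(−bt)) for the three-parameter curve (the slides do not give the exact equation) and fits it by profiling over K: for each candidate K the curve is linearized as ln(K/f − 1) = ln m − b·t and solved by ordinary least squares, keeping the K with the smallest squared error. The function names are hypothetical, and the example rates are synthetic values generated from a known curve, not BLS data.

```python
import math
from typing import Sequence, Tuple


def logistic(t: float, K: float, m: float, b: float) -> float:
    """Assumed S-shaped diffusion curve f(t) = K / (1 + m * exp(-b * t))."""
    return K / (1.0 + m * math.exp(-b * t))


def _line_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = c + s * x; returns (c, s)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s = sxy / sxx
    return my - s * mx, s


def fit_logistic(ts: Sequence[float], ys: Sequence[float],
                 k_steps: int = 1000) -> Tuple[float, float, float]:
    """Fit (K, m, b) by a grid search over K in (max(ys), 1]; for each K,
    regress ln(K/y - 1) on t, then score the curve by squared error in y."""
    if any(not 0.0 < y < 1.0 for y in ys):
        raise ValueError("usage rates must lie strictly between 0 and 1")
    best = None
    for i in range(1, k_steps + 1):
        K = i / k_steps
        if K <= max(ys):
            continue  # the asymptote must lie above every observed rate
        zs = [math.log(K / y - 1.0) for y in ys]
        c, slope = _line_fit(ts, zs)
        m, b = math.exp(c), -slope
        sse = sum((logistic(t, K, m, b) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, m, b)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    years = [1984, 1989, 1993, 1997]  # BLS measurement years named on the slide
    ts = [y - 1984 for y in years]    # measure time from the first survey
    # SYNTHETIC rates for one hypothetical category (not BLS data),
    # generated from K=0.8, m=3, b=0.2 so the fit can be checked.
    rates = [logistic(t, 0.8, 3.0, 0.2) for t in ts]
    K, m, b = fit_logistic(ts, rates)
    print(f"K={K:.3f} m={m:.3f} b={b:.3f}")
    print(f"forecast 2012 rate: {logistic(2012 - 1984, K, m, b):.3f}")
```

Profiling over K reduces a three-parameter nonlinear fit to a one-dimensional search plus a linear regression; with SciPy available, `scipy.optimize.curve_fit` would be the usual alternative. Capping K at 1 reflects that a usage rate cannot exceed 100%. With four points and three parameters only one degree of freedom remains, so a small residual says little about forecast quality.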


90M End-Users in 2012
This uses a different approximation than Boehm’s
– He assumed 2005 usage rates would equal 1989 usage rates.
– We assume 2012 usage rates are predictable using a simple fit to the innovation diffusion function.
Implication of using our assumption
– Fairly questionable assumption! Ongoing improvements in computers will probably drive adoption still higher.
– Therefore, 90M is probably something of a lower bound.

What Does “Programmer” Mean?
“You keep using that word. I do not think it means what you think it means.” --Inigo Montoya, The Princess Bride


Approximation #2: All End-Users Program
Usefulness of a big scalar number
– 55M or 90M is a number with no structure
– Thus, it can only be used to argue, “This sure is big.”
Usefulness of a collection of numbers
– Can we break down the estimate into smaller groups?
– Doing this right could help guide research and development.
Possible categorizations
– By industry (e.g.: shipping, manufacturing, transportation, …)
– By occupation (e.g.: secretary, accountant, manager, …)
– By education (e.g.: K-12, college, professional, …)
– By technology skills (e.g.: Java, Oracle, HTML forms, …)
– By enduring programming skills (e.g.: abstraction mastery, …)

Approximation #2: All End-Users Program
In building tools, researchers focus on abstractions
– Helping end-users represent abstractions as functions: S. Jones, A. Blackwell, and M. Burnett. A User-Centered Approach to Functions in Excel. Proceedings of the 8th ACM SIGPLAN International Conference on Functional Programming, ACM Press, 2003, pp. …
– Helping end-users map domain models to web app models: K. Kim, J. Carroll, M. Rosson. An Empirical Study of Web Personalization Assistants Supporting End-Users in Web Information Systems. IEEE 2002 Symposia on Human Centric Computing Languages and Environments, September 2002, pp. …
– Helping end-users identify abstractions from examples: M. Balaban, E. Barzilay, M. Elhadad. Abstraction as a Means for End-User Computing in Creative Applications. IEEE Transactions on Systems, Man and Cybernetics, Part A, Vol. 32, No. 6, November 2002, pp. …
– Helping end-users model abstractions in general: F. Paternò. From Model-based to Natural Development. Proceedings HCI International 2003, Universal Access in HCI, pp. …


Approximation #2: All End-Users Program
What abstraction issues are important?
– We now have an improved estimate of how many end-users there are.
– Actually, we also have surveys of what software they use.
– We don’t have any survey of what abstractions they are using.
– So what abstractions are important for new tools to address?
Study users’ needs and practices before building
– That’s part of what I argued (in a business context) during my practicum talk last fall.
– Why not apply it to research, too?

Anticipated Work for 2005
Phase 1: Informal survey of abstraction practices
– About to go live (~Feb 7)
– Online aspects handled by partner, Information Week
– Ask about usage of abstraction-oriented programming features
    – Referencing data vs. making copies (e.g.: using variables)
    – Encapsulating reusable algorithms (e.g.: using functions)
    – Representing common structures (e.g.: using data structures)
– Ask about usage of other good programming practices
    – Documentation
    – Back-ups
    – Testing
– Ask about usage of the web
    – Source/destination of documentation
    – Source/destination of data
    – Source/destination of other artifacts
– Ask about background (for use as explanatory variables)

Anticipated Work for 2005
Phase 2: Direct the survey at a controlled sample
– We have IRB approval through June 1
– Tentative sample is marketing professionals
    – They program with numbers, text, and rich text… very diverse.
    – They likely program more than most end-users (→ upper bound).
    – Other options suggested by researchers: accounting & operations.
– We’ll tweak the survey based on Information Week feedback.
Phase 3: Target subgroups with interviews
– Tentative dates: Fall
– Just because people use a programming feature doesn’t mean that they actually understand the abstraction behind it.
– Just because people don’t use a feature doesn’t mean they wouldn’t value it if it were implemented better.
– Interviews let us “get under the hatch” into these issues.

Conclusion
“55M End-User Programmers” is a popular estimate
– It makes the point that end-user programming is an important area of research!
The estimate embodies two main approximations
– Constant computer usage rates
– All end-users are programmers
We can begin to remove these approximations
– Model adoption rates using innovation diffusion theory
    – New estimate: 90M end-users in 2012
– Study end-users according to a classification scheme
    – Use surveys and interviews to get guidance on research

“The most powerful productivity strategy is to equip line workers with generalized programs and then to turn them loose. The same strategy, with generalized mathematical, statistical and programming capabilities will work for scientists.” --Paraphrased from “No Silver Bullet: Essence and Accidents of Software Engineering”, Frederick Brooks, Computer Magazine, April 1987
Any Questions?

Example #3 of Estimate’s Usage
Context:
– The author of this workshop paper describes why existing model-driven development approaches do not work well for end-user programmers.
Usage:
– “Studies report that by 2005 there will be 55 million end-users, compared to 2.75 million professional users”
Appeared in:
– F. Paternò. From Model-based to Natural Development. Proceedings HCI International 2003, Universal Access in HCI, pp. …

The Estimation Method
– Screenshot taken from B. Boehm et al. Cost Models for Future Software Life Cycle Processes: COCOMO 2.0. Annals of Software Engineering Special Volume on Software Process and Product Measurement (J. Arthur and S. Henry, eds.), J.C. Baltzer AG, Science Publishers, Amsterdam, The Netherlands, …