How Are Spreadsheet Templates Used in Practice: A Case Study on Enron

Slides:



Advertisements
Similar presentations
Validating and Improving Test- Case Effectiveness Yuri Chernak Presented by Michelle Straughan.
Advertisements

8. Introduction to Spreadsheet CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson Department of Computer and Information Science,
A spreadsheet is like a big table. It contains rows and columns which work together. Left-click to go to the next slide.
Lesson 14 Creating Formulas and Charting Data
MS Excel: Interface CSE1520 Erich Leung Fall 2009 The functionality of MS Excel 2003 and Excel 2007 are similar as far as the lab assignments are concerned.
John Porter Why this presentation? The forms data take for analysis are often different than the forms data take for archival storage Spreadsheets are.
Adaptability of learning objects by appropriate knowledge representation Anastas Misev Institute of Informatics Faculty of Natural Science and Mathematics.
Tutorial 7: Using Advanced Functions and Conditional Formatting
July 11 th, 2005 Software Engineering with Reusable Components RiSE’s Seminars Sametinger’s book :: Chapters 16, 17 and 18 Fred Durão.
RIT Software Engineering
SE 450 Software Processes & Product Metrics 1 Defect Removal.
Pour plus de modèles : Modèles Powerpoint PPT gratuitsModèles Powerpoint PPT gratuits Page 1 INTRODUCTION TO EXCEL LANDMARK UNIVERSITY COLLEGE.
Software Quality Assurance
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Industrial Application.
Spreadsheets and Business. Why the popularity of spreadsheets? “End user Computing” Long delays for IS department to do analysis and reports Ease-to-use,
Predicting Missing Provenance Using Semantic Associations in Reservoir Engineering Jing Zhao University of Southern California Sep 19 th,
28/08/2015SJF L31 F21SF Software Engineering Foundations ASSUMPTIONS AND TESTING Monica Farrow EM G30 Material available on Vision.
Lecture 1 What is Modeling? What is Modeling? Creating a simplified version of reality Working with this version to understand or control some.
IMSS005 Computer Science Seminar
Microsoft Excel 2003 Spreadsheets. What is a spreadsheet Excel is a spreadsheet program that allows you to organise, analyse and attractively present.
Verification and Validation Overview References: Shach, Object Oriented and Classical Software Engineering Pressman, Software Engineering: a Practitioner’s.
Word Processing Notes: Mail Merge Understand business documents.2 Mail Merge Example Letter shows Merge Fields (placeholders) Letter is Personalized.
1 Performing Spreadsheet What-If Analysis Applications of Spreadsheets.
Definitions. Cell: Cell: Space in the intersection of a column (vertical division) and a row (horizontal division). Row: Row: A row runs horizontally.
Unit 5 Evidence Name: Corey jones. 1.What software did you use? When doing my unit 5 spreadsheet I used Microsoft excel. 2. Why did you use this rather.
C O L L E G E S U C C E S S ™ What Denton High Students Need to Know about the PSAT/NMSQT Critical Reading & Writing Skills COMING SOON! OCTOBER 17 th.
Metadata Models in Survey Computing Some Results of MetaNet – WG 2 METIS 2004, Geneva W. Grossmann University of Vienna.
Chapter 6.  If a cell style will be used over and over again it can be modified in the cell styles gallery  Home ⇒ Cell Styles ⇒ right-click a style.
1 Microsoft Excel An Introduction to Spreadsheets Lecture 18.
Plotting in Microsoft Excel. 1) Enter your data into the Excel spreadsheet in table format. Your data should have column headers, row headers and data.
SOFTWARE ENGINEERING MCS-2 LECTURE # 4. PROTOTYPING PROCESS MODEL  A prototype is an early sample, model or release of a product built to test a concept.
Software Architectural Assumptions in Software Architecting Chen Yang a,b, Peng Liang a, Paris Avgeriou b a State Key Lab of Software Engineering, Wuhan.
Microsoft Excel P.6 Computer Studies Chapter 1 – Introduction of Microsoft Excel What is Microsoft Excel? Microsoft Excel is a software for.
QM Spring 2002 Statistics for Decision Making Excel for Statistics: An Overview.
Spreadsheet Engineering Builders use blueprints or plans – Without plans structures will fail to be effective Advanced planning in any sort of design can.
Chapter 3: Referencing and Names Spreadsheet-Based Decision Support Systems Prof. Name Position (123) University Name.
Unit 5 Evidence Name: Corey jones. 1.What software did you use? When doing my unit 5 spreadsheet I used Microsoft excel. 2. Why did you use this rather.
Analysis Introduction Data files, SPSS, and Survey Statistics.
Excel 2007 Part (3) Dr. Susan Al Naqshbandi
Fast Reproducing Web Application Errors Jie Wang, Wensheng Dou, Chushu Gao, Jun Wei Institute of Software Chinese Academy of Sciences
CHAPTER 17 INTRODUCTION TO SPREADSHEETS. SPREADSHEETS Application Software designed to aid users in entering, moving,copying, labeling, displaying and.
1 Planted-model evaluation of algorithms for identifying differences between spreadsheets Anna Harutyunyan, Glencora Borradaile, Christopher Chambers,
UNIT 7: Using Excel in the Law Office. This Week’s Assignment Seminar Test 2 discussion questions.
TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis.
Is Spreadsheet Ambiguity Harmful? Detecting and Repairing Spreadsheet Smells due to Ambiguous Computation Wensheng Dou 1, Shing-Chi Cheung 2, Jun Wei 1.
Chapter 3: Referencing and Names Spreadsheet-Based Decision Support Systems Prof. Name Position (123) University Name.
1 - 1 The Art of Modeling With Spreadsheets Stephen G. Powell Kenneth R. Baker © John Wiley and Sons, Inc. PowerPoint Slides Prepared By: Tava Olsen Washington.
Data The fact and figures that can be recorded in system and that have some special meaning assigned to it. Eg- Data of a customer like name, telephone.
Web Page Creation Standard Grade Computing. WWW n The World Wide Web is a collection of information held in multimedia form on the Internet. n This information.
CTS130 Spreadsheet Lesson 6 Working with Math & Trig, Statistical, and Date & Time Functions.
Metadata requirements for archiving structured data Alice Born Statistics Canada Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (9-11 April.
CACheck: Detecting and Repairing Cell Arrays in Spreadsheets
Exploring Excel Chapter 8 The Expert User:
Mail Merge for Lotus Notes and Excel User Guide
VEnron A Versioned Spreadsheet Corpus and Related Evolution Analysis
Mail Merge for Lotus Notes and Excel User Guide
Detecting Table Clones and Smells in Spreadsheets
Verification and Validation Overview
Causal Analysis Report Project : <xyz>
Shuai Wang, Wensheng Dou, Chushu Gao, Jun Wei, Tao Huang
: Clone Refactoring Davood Mazinanian Nikolaos Tsantalis Raphael Stein
Microsoft Office Illustrated Introductory, Windows XP Edition
A Comprehensive Study on Real World Concurrency Bugs in Node.js
Detecting Faulty Empty Cells in Spreadsheets
Expandable Group Identification in Spreadsheets
iSRD Spam Review Detection with Imbalanced Data Distributions
Data Tables and Arrays.
Lesson 2: Cell References
Actively Learning Ontology Matching via User Interaction
Correct document structure Easy for authors and accessible to readers
Presentation transcript:

How Are Spreadsheet Templates Used in Practice: A Case Study on Enron ESEC/FSE 2018 New Ideas and Emerging Results How Are Spreadsheet Templates Used in Practice: A Case Study on Enron Liang Xu, Wensheng Dou, Jiaxin Zhu, Chushu Gao, Jun Wei, Tao Huang Institute of Software, Chinese Academy of Sciences University of Chinese Academy of Sciences

Motivating example Store trade data on October 4th 10-4 Alice 10-5 Store trade data on October 5th

Motivating example In October, Alice needs to create 31 similar worksheets Alice I designed a template

Predesigned template Contain necessary data layout and formulas Data placeholders instead of actual data Placeholder Data layout Computational logic Enter new data

Instance template Reuse an existing worksheet rather than designing a new template Actual data 10-4 Clean original data

Benefits of using spreadsheet templates Reduce users’ efforts Reuse data layout Reuse computational logic Avoid errors in writing formulas Users can focus on entering necessary data

The usage of spreadsheet templates can cause issues Lack of understanding how templates are used in practice Errors in templates can be propagated to their instances without users’ noticing Formula error Template propagate This cell is omitted Instance

Research questions RQ1: Are the predesigned templates well-designed? RQ2: How are errors introduced during template usage? RQ3: What is the difference between usages of two kinds of templates?

   Methodology overview Study subject: Enron corpus [1] Predesigned template identification  Instance template identification  Template and instance comparison Study subject: Enron corpus [1] Real-life spreadsheets used in Enron Corporation [1] F. Hermans and E. Murphy-Hill, “Enron’s Spreadsheets and Related Emails: A Dataset and Analysis,” in Proceedings of the 37th IEEE International Conference on Software Engineering (ICSE), 2015, pp. 7–16.

   Methodology overview Identify predesigned templates by keywords Predesigned template identification Instance template identification  Template and instance comparison  Identify predesigned templates by keywords “Template” “Blank” Identify corresponding instances in the same spreadsheet

   Methodology overview Assumption Randomly sample 100 spreadsheets Predesigned template identification Template and instance comparison  Instance template identification  Randomly sample 100 spreadsheets Identify similar worksheets Recover their creation order Chronological order e.g., “Jan 2001” and “Feb 2001” Assumption Two adjacent worksheets Template: created earlier Instance: created later Instance Template

   Methodology overview Predesigned template identification Instance template identification  Template and instance comparison  Template Instance Predesigned template 47 490 Instance template 168

   Methodology overview Predesigned template identification Instance template identification  Template and instance comparison  Template Errors Analysis Template design issues compare Root cause Changes Instance

RQ1: Are the predesigned templates well-designed? Incomplete table layout E.g., miss some necessary headers Insert the same headers at the same location Template Instances

RQ1: Are the predesigned templates well-designed? Common changes in creation of instances E.g., miss a data row Insert a data row at the same location Template Instances

RQ1: Are the predesigned templates well-designed? Missing formulas or formula errors Template Missing formulas Instance

RQ1: Are the predesigned templates well-designed? Implication: Template designers should carefully check the completeness and correctness of predesigned templates Not well designed Uncomplete table layout 8% Common change 19% Missing formulas 30% Formula errors 6% Total 40% Finding: 40% of predesigned templates have design issues

RQ2: How are errors introduced? Improper use of template E.g., cell type changes Unnecessary formula caused by cell type change Template Instances Change data cells to label cells

RQ2: How are errors introduced? Implication: A template-based error detection approach is promising Error propagation 560 missing formula errors and 38 formula errors are propagated from templates Finding: Errors in templates can be propagated to their instances

RQ3: What is difference between two kinds of template usage? Implication: An effective data cleaning approach can be helpful Data cleaning is common during usage of instance templates In 52% of cases, more than 50% of data cells were cleaned 101 missing formulas were caused by data cleaning Formulas usually look like data Missing formulas caused by data cleaning Instance template Instances Finding: Data cleaning can cause errors in the usage of instance templates

Summary We perform the first empirical study on the design and usage of spreadsheet templates used in practice 47 predesigned templates (490 instances) 168 template and instance pairs 40% of predesigned templates have design issues Improper use of templates and error propagation are main reasons for error introduction Data cleaning can cause errors in the usage of instance templates TmplEnron http://www.tcse.cn/~wsdou/project/TmplEnron

Thank you!