Detecting Table Clones and Smells in Spreadsheets


Detecting Table Clones and Smells in Spreadsheets. Foundations of Software Engineering (FSE 2016), Seattle. Wensheng Dou, Shing-Chi Cheung, Chushu Gao, Chang Xu, Liang Xu, Jun Wei

Cloning in Spreadsheet Development. How? Search for a similar report, copy & paste it, enter new data, fix formulas, and obtain a new report.

Table: a rectangular block of numerical cells. Figure labels: Sheet Q1; Table; Not parts of a table. … real example extracted from EUSES spreadsheet corpus

Table Clone: two tables that have the same computational semantics. Sheet Q1 and Sheet Q2: same semantics!

Clone-Related Smell: Inconsistencies among table clones can indicate potential smells. Sheet Q2: total responses are $B$7. Sheet Q3 (inconsistency): total responses must be 30, and never change!

Semantic Smell: Clone-related smells can introduce errors when their input values change. Sheet Q3: if total responses change to 31, all cells give wrong values!
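A minimal sketch of why the hard-coded divisor is harmful. The monthly counts below are hypothetical (the slide only gives the totals 30 and 31), and the function names are illustrative, not part of TableCheck.

```python
def share_hardcoded(monthly, total):
    """Smelly variant, like B4/30: the total argument is ignored."""
    return monthly / 30


def share_referenced(monthly, total):
    """Consistent variant, like B4/$B$7: divides by the current total."""
    return monthly / total


def shares(formula, monthly_counts):
    """Apply one formula to every monthly count, using their sum as the total."""
    total = sum(monthly_counts)
    return [formula(count, total) for count in monthly_counts]


# Hypothetical counts; only the totals 30 and 31 come from the slide.
counts_30 = [10, 12, 8]
counts_31 = [10, 12, 9]

for counts in (counts_30, counts_31):
    print(sum(counts), shares(share_hardcoded, counts), shares(share_referenced, counts))
```

While the total is 30 both variants agree; once it becomes 31, every share computed with the hard-coded divisor differs from the referenced one.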

Existing Smell Detectors (1): No warnings are issued by Excel. Syntactic smell detectors [1][2] (e.g., multiple operations) cannot detect clone-related smells. Sheet Q3: no syntactic smells! [1] F. Hermans et al., “Detecting and Visualizing Inter-worksheet Smells in Spreadsheets”, ICSE 2012. [2] F. Hermans et al., “Detecting Code Smells in Spreadsheet Formulas”, ICSM 2012.

Existing Smell Detectors (2): CACheck [1] and CUSTODES [2] aggregate cells into clusters according to formula similarity. Sheet Q2 and Sheet Q3 each form a cell cluster with the same formula pattern: two correct clusters, no smells! [1] W. Dou et al., “CACheck: Detecting and Repairing Cell Arrays”, TSE 2016. [2] S.C. Cheung et al., “CUSTODES: Automatic Spreadsheet Cell Clustering and Smell Detection Using Strong and Weak Features”, ICSE 2016.

Our Goal: (1) Find tables with the same computational semantics. (2) Detect clone-related smells among table clones. Figure labels: table1, table2, table3.

Our Goal - Challenges: (1) Find tables with the same computational semantics. Challenge: no records indicate copy & paste. (2) Detect clone-related smells among table clones. Challenge: not all inconsistencies indicate smells. Figure labels: table1, table2, table3.

Our Key Insight: Cell headers represent cells’ computational semantics. Example: Monthly : % Responses.

Our Key Insight: Tables with the same headers are likely to be clones. Sheet Q1 and Sheet Q2: same headers.

Which Headers can be Used? Not all levels of headers are created equal. Only first-level headers are used to detect clones. Figure: Sheet Q1 and Sheet Q2, each with higher-level headers and first-level headers, marked Same or Diff.

How to Find Table Clones? Two tables are likely a table clone if all their corresponding cells have the same headers. Example: Weekly : Responses. Table clone.
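The header comparison can be sketched as follows. This is an illustration under stated assumptions, not TableCheck's actual implementation: each cell is described only by its first-level row and column header, header normalization (case and whitespace) is an added assumption, and the "Yearly" header is hypothetical.

```python
def normalize(header):
    """Assumed normalization: ignore case and extra whitespace."""
    return " ".join(header.split()).casefold()


def cell_headers(row_headers, column_headers):
    """Map each cell position to its (row header, column header) pair."""
    return {
        (r, c): (normalize(rh), normalize(ch))
        for r, rh in enumerate(row_headers)
        for c, ch in enumerate(column_headers)
    }


def is_likely_clone(table_a, table_b):
    """True if the tables have the same shape and all corresponding cells share headers."""
    if table_a.keys() != table_b.keys():
        return False
    return all(table_a[pos] == table_b[pos] for pos in table_a)


q1 = cell_headers(["Weekly", "Monthly"], ["Responses", "% Responses"])
q2 = cell_headers(["weekly", "Monthly "], ["Responses", "% Responses"])
other = cell_headers(["Weekly", "Yearly"], ["Responses", "% Responses"])  # hypothetical

print(is_likely_clone(q1, q2))     # True
print(is_likely_clone(q1, other))  # False
```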

Inconsistency among Table Clones: Not all inconsistencies indicate smells. Which cells are smelly? Monthly responses / Total (C4/$C$7) versus Monthly responses / 30 (B4/30).

Detect Smells as Outliers: Because smelly cells normally occur in the minority, they can be detected as outliers. Majority: Monthly responses / Total (C4/$C$7 or B4/$B$7). Outlier: Monthly responses / 30 (B4/30).
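A minimal sketch of the outlier idea, assuming the formulas of corresponding cells have already been abstracted into patterns such as "Monthly responses / Total". The strict-majority vote below is a simplification chosen for illustration; the slides do not specify the actual outlier criterion, and `find_outliers` is a hypothetical name.

```python
from collections import Counter


def find_outliers(patterns_by_sheet):
    """Return the sheets whose pattern differs from a strict-majority pattern."""
    if not patterns_by_sheet:
        return []
    majority, count = Counter(patterns_by_sheet.values()).most_common(1)[0]
    if 2 * count <= len(patterns_by_sheet):
        return []  # no strict majority, so nothing is flagged
    return sorted(s for s, p in patterns_by_sheet.items() if p != majority)


patterns = {
    "Q1": "Monthly responses / Total",  # C4/$C$7
    "Q2": "Monthly responses / Total",  # B4/$B$7
    "Q3": "Monthly responses / 30",     # B4/30
}
print(find_outliers(patterns))  # ['Q3']
```

Returning nothing when there is no strict majority reflects the slide's caution that not all inconsistencies indicate smells.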

TableCheck Implementation: One color for each clone group. Smells are marked with comments listing their referenced cells. Figure labels: Sheet Q1, Clone, Referenced Cells, Sheet Q3.

Evaluation Subject: All EUSES spreadsheets with formulas [1], 1,617 spreadsheets. Manually validate all detected table clones and smells: Do they have the same headers? Do they have the same computational semantics? Can smells be fixed by inspecting their referenced cells? [1] M. Fisher et al., “The EUSES spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms,” SIGSOFT Softw Eng Notes, 2005.

How Common are Table Clones? (RQ1) 21.8% of spreadsheets contain confirmed table clones.
Category | Spreadsheets | Has Clone | Confirmed | Confirmed/Spreadsheets
cs101 | 8 | 2 | 2 | 25.0%
database | 200 | 58 | 54 | 27.0%
filby | 1 | 0 | 0 | 0.0%
financial | 358 | 100 | 96 | 26.8%
forms3 | 18 | 3 | 3 | 16.7%
grades | 282 | 57 | 52 | 18.4%
homework | 277 | 56 | 53 | 19.1%
inventory | 278 | 72 | 68 | 24.5%
jackson | n.a.
modeling | 190 | 25 | 21 | 11.1%
personal | 5 | 4 | 3 | 60.0%
Total | 1,617 | 377 | 352 | 21.8%

How Common are Smells? (RQ2) 5.6% of spreadsheets contain clone-related smells. 14.6% of table clones contain smells. 33.6% of smelly cells contain wrong values (harmful).
Columns: Category; Spreadsheets (All, Smelly); Table Clones (All, Smelly); Smells (All, Error)
cs101 8 2
database 200 16 205 46 1,441 767
filby 1
financial 358 24 383 59 780 66
forms3 18 5
grades 282 11 183 17 267 19
homework 277 10 124 13 45 33
inventory 278 21 231 305 67
jackson
modeling 190 77 6
personal 4 7
Total 1,617 90 (5.6%) 1,214 177 (14.6%) 2,892 971 (33.6%)

Is TableCheck Precise? (RQ3) The precision for table clone detection is 92.2%. The precision for smell detection is 85.5%.
Columns: Category; Table clones (Detected, True, Precision); Smells (Detected, True, Precision)
cs101 2 100.0%
database 217 205 94.5% 1,524 1,441 94.6%
filby -
financial 396 383 96.7% 821 780 95.0%
forms3 5
grades 202 183 90.6% 289 267 92.4%
homework 145 124 85.5% 56 45 80.4%
inventory 253 231 91.3% 637 305 47.9%
jackson
modeling 92 77 83.7% 46 97.8%
personal 4 80.0% 7
Total 1,317 1,214 92.2% 3,382 2,892
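The overall precision figures follow from the Total row (true results divided by detected results). A short check, with `precision` as an illustrative helper name:

```python
def precision(true_count, detected_count):
    """Precision as a percentage: confirmed results over all reported results."""
    return 100.0 * true_count / detected_count


# Totals taken from the Total row of the RQ3 table.
clone_precision = precision(1214, 1317)
smell_precision = precision(2892, 3382)

print(f"table clones: {clone_precision:.1f}%")  # table clones: 92.2%
print(f"smells: {smell_precision:.1f}%")        # smells: 85.5%
```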

Compare with Others (RQ4): Existing approaches can detect at most 35.6% of the smells that TableCheck detects.

Experimental Results: Table clones in spreadsheets are common: 21.8% of spreadsheets contain table clones. Clone-related smells are common and harmful: 14.6% of table clones contain smells, and 33.6% of smelly cells contain wrong values. TableCheck detects table clones and smells precisely: 92.2% and 85.5%, respectively. TableCheck can detect smells that existing approaches fail to detect: only 35.6% of smells can be detected by existing approaches.

Summary: Table clones are common in spreadsheets. Users may not modify table clones consistently. TableCheck automatically detects table clones and the smells caused by inconsistencies among them. Results: TableCheck is precise. Smells among table clones are harmful. http://www.tcse.cn/~wsdou/project/clone/

Thank you!