Fast, Accurate Creation of Data Validation Formats by End-User Developers Christopher Scaffidi Brad Myers, Mary Shaw Carnegie Mellon University.


2 Contextual inquiry: What challenges do end users face? Observed 3 administrative assistants, 4 managers, and 3 webmasters/graphic designers (1-3 hrs, each) Background  Toped  Evaluation  New Opportunities

3 One person’s task: validate web forms, but he didn’t know JavaScript / regexps
Is the input valid? “EDSH 225”
Is the input questionable? “GATE 225”
Or is it obviously invalid? “ ”

4 Hurricane Katrina “Person Locator” site: Many inputs unvalidated

5 Spreadsheets contain lots of typos: inconsistent formatting & invalid strings
Above: part of an actual spreadsheet on our university’s web site
Plenty of invalid strings in users’ spreadsheets during contextual inquiry
For thousands of other examples: EUSES Spreadsheet Corpus

6 Needed: a usable mechanism for implementing validation

7 Coming Up…
Background
  – Formative pilot study
  – Related work
Toped
Evaluations
  – Usability
  – Expressiveness
New opportunities

8 Formative pilot study
Motivation: Exploring the “gulf of execution” for data
  – User has to figure out how to map intentions to the features provided by a computer system
  – Poor “closeness of mapping” impedes system use
=> Before designing system, probe the concepts and terminology familiar to users
Asked 4 administrative assistants to verbally describe two kinds of data
  – American mailing addresses
  – University project numbers

9 Formative pilot study
Participants identified and named the parts of data
  E.g.: street address, city, state, zip code
  – They hierarchically refined parts until sub-parts became small enough that they lacked names
At that point, they described parts with constraints
  – Constraints were sometimes “soft”: not always true
  – They used adverbs of frequency to indicate softness
    E.g.: “usually” or “sometimes”
Implications
  – Users describe data in terms of constrained parts
  – Valid data sometimes violate certain constraints

10 Alternate approaches: limited support for expressing constraints on structured strings
Grammars based on sequences of characters
  – Context-free grammars (CFGs)
    Grammex
    Apple data detectors (CFGs + regexps)
  – Regular expressions (regexps)
    SWYN regexp editor
Lapis patterns: constrained structured strings
  – Intentionally designed to support outlier finding
    Number equal to /\d\d\d/ then "-" then Number equal to /\d\d\d\d/ ignoring nothing
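For comparison, the Lapis pattern quoted on this slide (three digits, a hyphen, four digits) corresponds roughly to a short regular expression. The sketch below is a Python approximation, not Lapis syntax or semantics; the names `DIGITS_HYPHEN_DIGITS` and `matches_pattern` are made up for illustration.

```python
import re

# Approximation of: Number equal to /\d\d\d/ then "-" then Number equal to /\d\d\d\d/
# Assumption: only ASCII digits count, and the whole string must match.
DIGITS_HYPHEN_DIGITS = re.compile(r"[0-9]{3}-[0-9]{4}")


def matches_pattern(text: str) -> bool:
    """Return True only if the entire string is 3 digits, '-', 4 digits."""
    return DIGITS_HYPHEN_DIGITS.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["555-1234", "5551234", "555-12345"]:
        print(candidate, matches_pattern(candidate))
```

A regexp like this gives only a yes/no answer, which is the limitation the slide points to: it has no way to say that a string is merely questionable.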

11 Toped: A form fill-in UI to mediate between users and grammars
1. Name
2. Describe
3. Test
4. Save

12 The system generates an augmented CFG from format description
A part that almost always has 1-8 lowercase letters:
    #WORD : #CHLIST : COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
    #CHLIST : #CH | #CH #CHLIST
    #CH : a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
More compact than a pure CFG
More expressive than a pure CFG
  – Some constraints are impossible to represent as CFG
  – Some constraints need to be soft
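The idea of a grammar production annotated with a soft constraint can be sketched in a few lines of Python. This is an illustration only, not the grammar engine behind Toped: `SoftConstraint`, `WORD_CONSTRAINTS`, and `check_word` are hypothetical names, and a plain regular expression stands in for the `#CHLIST` / `#CH` productions.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SoftConstraint:
    description: str
    strength: int                   # 0-100, like the {90} annotation on the slide
    holds: Callable[[str], bool]    # True when the text satisfies the constraint


# Stand-in for "#CHLIST : #CH | #CH #CHLIST" with "#CH : a|b|...|z".
LOWERCASE_RUN = re.compile(r"[a-z]+")

# Stand-in for "COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}".
WORD_CONSTRAINTS = [
    SoftConstraint("has 1-8 letters", 90, lambda text: 1 <= len(text) <= 8),
]


def check_word(text: str) -> Optional[List[SoftConstraint]]:
    """Return None if the text does not parse at all; otherwise return
    the (possibly empty) list of soft constraints that the text violates."""
    if LOWERCASE_RUN.fullmatch(text) is None:
        return None
    return [c for c in WORD_CONSTRAINTS if not c.holds(text)]


if __name__ == "__main__":
    for word in ["hello", "abcdefghij", "Hello"]:
        print(word, check_word(word))
```

The three outcomes (no violations, some soft violations, no parse) line up with the valid / questionable / obviously invalid distinction raised earlier in the talk.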

13 Testing strings against grammars
Downgrade a parse if it violates constraints
  – Penalty = 1 - (strength of constraint)/100
  – Multiply penalties
  – Propagate penalties up parse tree
  – Choose best parse (i.e., parse with least penalties)
Show error messages
  – Track violated constraints, concatenate into message
    If parse fails completely, show portions of format description that were used to generate unsatisfied CFG productions.
  – End-user development tools may offer user option of overriding some errors, depending on penalties.
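The scoring steps on this slide can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Toped's implementation: `ParseNode`, `penalty`, `score`, and `best_parse` are hypothetical names. It also assumes one reading of "least penalties": the product of penalty factors is treated as a score in which 1.0 means no violations, so the best parse is the one with the highest product.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class ParseNode:
    label: str
    # Strengths (0-100) of the soft constraints violated at this node.
    violated_strengths: List[int] = field(default_factory=list)
    children: List["ParseNode"] = field(default_factory=list)


def penalty(strength: int) -> float:
    """Penalty factor from the slide: 1 - (strength of constraint)/100."""
    return 1 - strength / 100


def score(node: ParseNode) -> float:
    """Multiply this node's penalties with its children's scores, so
    violations deep in the tree propagate up to the root."""
    own = prod(penalty(s) for s in node.violated_strengths)
    return own * prod(score(child) for child in node.children)


def best_parse(candidates: List[ParseNode]) -> ParseNode:
    """Assumption: the least-penalized parse has the highest score."""
    return max(candidates, key=score)


if __name__ == "__main__":
    clean = ParseNode("address", children=[ParseNode("city")])
    flawed = ParseNode("address", children=[ParseNode("city", [90])])
    print(score(clean), score(flawed), best_parse([flawed, clean]).label)
```

Under this reading, a violated constraint of strength 100 drives the score to zero, while weaker constraints only lower it. That graded score is what would let a tool offer an override for some errors.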

14 Showing error messages after testing strings against the generated CFGs

15 Usability: Does Toped help users to implement string validation?
Between-subjects lab experiment
  – Direct comparison system: Lapis
  – (We also compare results to those of SWYN study – see paper)
Recruited 17 participants (9 Toped, 8 Lapis)
  – Approx. half were administrative assistants, approx. half were master’s students (mostly information systems), distributed roughly equally across tools
  – 1 participant misinterpreted instructions (=> 8 & 8)

16 Usability: Does Toped help users to implement string validation?
Study structure
  – Background questionnaire
  – Tutorial (30 min)
  – 3 tasks (20 min)
  – User satisfaction questionnaire
Detail of a task:
  – Validate 1 kind of data
    phone numbers, mailing addresses, company names
  – User goal: For each kind, find typos in 25 strings
    Randomly drawn from EUSES spreadsheet corpus
    And we also retained 25 strings for further accuracy tests

17 Usability: Users were nearly 2 times as fast and found 3 times as many typos
Measure | Toped | Lapis | Relative improvement | Significant? (Mann-Whitney)
Tasks completed | | | % | p<0.01
Typos identified: on 75 visible strings | | | % | p<0.01
Typos identified: on all 150 strings | | | % | p<0.01
F1 accuracy measure: on 75 visible strings | | | % | No
F1 accuracy measure: on all 150 strings | | | % | No
User satisfaction | | | % | p=0.02
Toped also compares favorably to SWYN regexp editor – see paper
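The slide does not spell out how the F1 accuracy measure was computed. Assuming the standard definition (the harmonic mean of precision and recall, with "positive" meaning a string flagged as a typo), it could be calculated as in the sketch below. The function name `f1_score` and the sample data are hypothetical and are not results from the study.

```python
from typing import Set


def f1_score(flagged: Set[str], actual_typos: Set[str]) -> float:
    """Standard F1: 2 * precision * recall / (precision + recall).

    flagged      -- strings the user's format marked as typos
    actual_typos -- strings that really are typos
    """
    true_positives = len(flagged & actual_typos)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(flagged)
    recall = true_positives / len(actual_typos)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up example strings, for illustration only.
    flagged = {"412-55-1234", "N/A", "800-555-0000"}
    actual = {"412-55-1234", "N/A", "4125551234x", "unknown"}
    print(round(f1_score(flagged, actual), 3))
```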

18 Expressiveness: Does Toped provide adequate primitives for validating real data?
Logged data typed by 4 users into browser (3 weeks)
  – For each text string, we recorded:
    A label for the text field (e.g.: “Phone”)
    A regexp summarizing the string (e.g.: \d\d\d-\d\d\d-\d\d\d\d)
Examined data, wrote scripts to cluster strings
  – 94% of the 5897 strings were in 19 clusters
  – Each cluster had 1-2 formats
Used Toped to create formats
  – Omitted 5 clusters that were for “general text”, usernames or passwords (so we could post format descriptions online)
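The logging and clustering scripts are not shown in the talk. One plausible way to produce a summary like \d\d\d-\d\d\d-\d\d\d\d and to group strings by field label and summary is sketched below. `summarize` and `cluster` are hypothetical names, and mapping letters to `[A-Za-z]` is an assumption; the slide only shows how digits and hyphens are summarized.

```python
import string
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Characters that need a backslash to be literal in a regexp.
REGEX_METACHARS = set(".^$*+?{}[]\\|()")


def summarize(text: str) -> str:
    """Replace each character by a regexp fragment describing its class."""
    parts = []
    for ch in text:
        if ch in string.digits:
            parts.append(r"\d")
        elif ch in string.ascii_letters:
            parts.append("[A-Za-z]")       # assumption: not shown on the slide
        elif ch in REGEX_METACHARS:
            parts.append("\\" + ch)
        else:
            parts.append(ch)               # e.g. '-' or ' ' stays literal
    return "".join(parts)


def cluster(entries: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group (field label, text) pairs by label and summary regexp."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for label, text in entries:
        groups[(label, summarize(text))].append(text)
    return dict(groups)


if __name__ == "__main__":
    # Made-up log entries, for illustration only.
    log = [("Phone", "412-555-1234"), ("Phone", "800-555-0000"), ("Zip", "15213")]
    for key, texts in cluster(log).items():
        print(key, texts)
```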

19 Expressiveness: Does Toped provide adequate primitives for validating real data?
Overall, successful
  – We were able to create formats for each kind of data
  – The formats identified many probable typos
Ideas for improvements
  – Ways to reuse constraints from format to format
  – Primitives for kinds of parts: Numeric, word-like, …

20 Data Description Editor Toped+: an improved editor

21 Contributions and New Opportunities
Toped – UI to mediate between users & grammars
  – Enables users to work faster & more effectively
  – Adequately expressive for validating many kinds of data
  – Provided a start for new line of similar editor tools
New Opportunities (aka “Future Work”)
  – Extending Toped+ to automatically reformat data [IUI’09]
  – Providing a repository for sharing formats (in progress)
  – Developing new ways to make use of ability to identify strings that violate soft constraints

22 Thank You…
To Margaret Burnett, Brad Myers, Valentina Grigoreanu, Mary Beth Rosson, Mary Shaw and others in the EUSES Consortium for feedback over the years
To NSF for funding
To ISEUD 2009 for this opportunity to present

23 Toped+: key improvements vs Toped in terms of Cognitive Dimensions
Better closeness of mapping
  – Constraints “belong” to parts in all formats
Higher juxtaposability
  – Easy to view & compare multiple formats
Lower error-proneness
  – Helps prevent senseless combinations of constraints
Lower viscosity
  – Drag-and-drop / copy-and-paste speeds up edits
Improved progressive evaluation
  – User can test each part individually