A Lightweight Model for End Users’ Data: Progress and Future Work Christopher Scaffidi Carnegie Mellon University.

Presentation transcript:

A Lightweight Model for End Users’ Data: Progress and Future Work Christopher Scaffidi Carnegie Mellon University

2 Target users In 2012, we project that there will be 90 million computer end users (“EUs”) in American workplaces. Of these, at least half will create spreadsheets, databases, and/or web applications. These are called end-user programmers (“EUPs”). [5] Both EUs and EUPs will benefit from this research, though the research is mainly aimed at EUPs (including EUs who become EUPs because of the research). introduction ● topes ● prototype ● future work ● evaluation

3 Contextual inquiry: What are the problems of EUs and EUPs?
Observed 3 administrative assistants, 4 managers, and 3 webmasters/graphic designers (1-3 hrs each) [3][9]

4 How can EUPs validate web forms if they do not know JavaScript or regexps?
Is the input valid? “EDSH 225”
Is the input nearly valid? “EDXH 225”
Does it just need reformatting? “Smith 225”
Or is it obviously invalid? “ ”

5 Other tasks, other data, other problems
When building a staff roster by merging data sources into a single spreadsheet, one of the EUs:
– Had to scrutinize data to identify questionable values that deserved double-checking (e.g.: a first name with 15 characters might be right)
– Had to manually transform data to a consistent format (e.g.: put person names in Lastname, Firstname format)
Contextual inquiries, interviews, and surveys identified other data validation and reuse tasks that are poorly supported by existing tools. [3][4][7][9]

6 Underlying problem: abstraction mismatch
Tools support strings, integers, floats, sometimes dates.
Problem domain involves higher-level categories of data:
– University names: “Carnegie Mellon”, “CMU”
– Person names: “Scaffidi, Christopher”, “Chris Scaffidi”
– CMU phone numbers: “ ”, “x8-1234”
– CMU room numbers: “WeH 4623”, “Wean 4623”
These data categories are:
– Human-readable
– Short (~ 1 input field)
– Multi-format
– Sometimes ambiguous / fuzzy (non-binary scale of validity)
– Often particular to certain groups of people

7 Related Work
Regexps / grammars / data detectors recognize data but do not specify how to transform multi-format data.
Types:
– A value is or is not a valid instance of a type (non-fuzzy).
– Typed languages are difficult for EUPs.
Research on units (e.g.: Slate) and constraint systems (e.g.: Cues) typically applies only to numeric data in certain applications (e.g.: spreadsheets).
Tools for integrating heterogeneous databases typically require a professional DBA and are specific to database data.

8 Approach: Create a new abstraction for each category of data
Like software “libraries,” implementations of these abstractions could be reused in many programs.
Abstractions would need to include functions for:
– Recognizing instances of the category (“isa”), for automating data validation
– Transforming instances among various formats (“trf”), for automating data reformatting
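To make the two kinds of functions concrete, here is a minimal Python sketch for the person-name category mentioned earlier. It is only an illustration under stated assumptions: the function names, the two regular expressions, and the 30-character threshold are hypothetical and are not taken from the prototype.

```python
import re

# Hypothetical formats for the "person name" category (not the prototype's own).
FIRST_LAST = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")    # "Chris Scaffidi"
LAST_FIRST = re.compile(r"^[A-Z][a-z]+, [A-Z][a-z]+$")   # "Scaffidi, Chris"


def isa_person_name(value: str) -> float:
    """Return a validity score in [0, 1] instead of a yes/no answer."""
    if FIRST_LAST.match(value) or LAST_FIRST.match(value):
        # Assumed soft rule: a very long name may be right but is questionable.
        return 0.5 if len(value) > 30 else 1.0
    return 0.0


def trf_to_last_first(value: str) -> str:
    """Reformat "Firstname Lastname" as "Lastname, Firstname"."""
    if LAST_FIRST.match(value):
        return value
    if FIRST_LAST.match(value):
        first, last = value.split(" ")
        return f"{last}, {first}"
    raise ValueError(f"not a recognized person name: {value!r}")
```

A validator would call `isa_person_name` on each input, and a reformatting tool would call `trf_to_last_first` on values that score above zero.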

9 Topes
Tope = an abstraction for a data category
– Greek word for “place,” because each tope corresponds to a data category with a natural place in the problem domain
Topes in practice:
1. EUPs create new topes by using the basic tope editor (or another language, e.g.: if they happen to know JavaScript).
2. EUPs publish topes on repositories.
3. Other EUs & EUPs download topes to their local cache.
4. Tool plug-ins let EUs & EUPs browse their local cache and associate topes with variables and input fields.
5. Plug-ins get topes from local cache and use them at runtime to validate and transform data.

10 Example in our prototype format editor: CMU Campus Phone Number
Features [1][6]:
– Format inference
– Format/part names
– Soft constraints
– “isa” generation
– Testing features
– Format reusability
– EUP tool integration
(Similar UI style for implementing trfs)

11 Validation by associating a tope with a textbox
Invalid inputs cause a targeted message to appear.
Inputs that violate an always or never constraint cannot be submitted to the server.
Inputs that violate an often constraint cause a warning, which the application user can override.
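The always / never / often behavior described on this slide can be sketched in Python as follows. This is an assumed illustration, not the prototype's code: the `Constraint` class, the `check` function, and the status strings "ok", "warn", and "reject" are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Constraint:
    strength: str                  # "always", "never", or "often"
    test: Callable[[str], bool]    # the condition the constraint talks about
    message: str                   # targeted message shown on violation


def check(value: str, constraints: List[Constraint]) -> Tuple[str, List[str]]:
    """Return (status, messages); status is "ok", "warn", or "reject"."""
    status, messages = "ok", []
    for c in constraints:
        holds = c.test(value)
        # "never" is violated when the condition holds; the others when it fails.
        violated = holds if c.strength == "never" else not holds
        if not violated:
            continue
        messages.append(c.message)
        if c.strength in ("always", "never"):
            status = "reject"      # hard constraint: block submission
        elif status == "ok":
            status = "warn"        # soft constraint: user may override
    return status, messages


# Hypothetical constraints for a campus phone extension such as "x8-1234".
phone = [
    Constraint("always", lambda v: re.fullmatch(r"x\d-\d{4}", v) is not None,
               "Use the form x8-1234."),
    Constraint("often", lambda v: v.startswith("x8"),
               "Most campus extensions start with x8."),
]
```

With these assumed constraints, `check("x2-1234", phone)` produces a warning that the user could override, while a value that breaks the "always" pattern is rejected.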

12 Evaluations to date
Usability:
– A controlled experiment shows that our format editor enables EUPs to validate data more quickly and accurately than with Lapis patterns or with regexps.
Expressiveness:
– We have implemented formats for dozens of kinds of data drawn from (1) the EUSES spreadsheet corpus and (2) logs of EUPs’ web browsing.
Usefulness:
– We have integrated topes with tools for creating web applications, databases, spreadsheets, and web macros.

13 Future work
Implement enhancements to the basic editor
– UI improvements; behind the scenes: new meta-data fields
Implement repository system
– Plug-ins will have a list of “known” repository servers
– EUPs will be able to publish topes into repository servers
– Repositories will provide various search features:
  Search by example (based on [1])
  Search by contextual keywords (based on [2])
  Search by collaborative filtering (similar to Amazon)
  Search by tope reliability (see [8])
  And of course, search by (non-unique) name

14 Evaluation: Can EUPs create topes?
Claim #1: By representing formats as a series of constrained parts, the basic editor enables EUPs to implement topes for common categories of data.
Evaluation: controlled experiment
– Sample: information workers
– Tasks: create topes for data revealed by previous studies
– Comparison: have users verbally describe the data
– Measures: success, time, match to users’ expectations
(Our usability evaluation only covered isa, not trf.)

15 Evaluation: Do topes help EUPs?
Claim #2: Extending existing tools with topes enables EUPs to more quickly and correctly validate and reuse data than is possible through currently practiced methods.
Evaluation: controlled experiment
– Sample: information workers
– Tasks: use topes to do work revealed by previous studies
– Measures: time, accuracy, satisfaction
– Comparison: Lapis and manual performance
(Our usability evaluation covered data validation, not reuse.)

16 Evaluation: Can EUPs share/reuse topes?
Claim #3: Given suitable tools operating on tope meta-information, EUPs can share topes with one another.
Evaluation: field test
– Sample: CMU staff and students
– Tasks: install our tools and use them for several weeks
– Measures: logs of usage, satisfaction surveys
– Comparison: normal way of doing work

17 Related papers
Conference papers
[1] C. Scaffidi. Unsupervised Inference of Data Formats in Human-Readable Notation. Proceedings of 9th International Conference on Enterprise Integration Systems (ICEIS'07), 2007, to appear.
[2] C. Scaffidi, K. Bierhoff, E. Chang, M. Felker, H. Ng, C. Jin. Red Opal: Product-Feature Scoring from Reviews. Proceedings of 8th ACM Conference on Electronic Commerce (ACMEC'07), 2007, to appear.
[3] C. Scaffidi, A. Cypher, S. Elbaum, A. Koesnandar, and B. Myers. Scenario-Based Requirements for Web Macro Tools. Submitted for publication,
[4] C. Scaffidi, A. Ko, B. Myers, M. Shaw. Dimensions Characterizing Programming Feature Usage by Information Workers. VL/HCC'06: Proceedings of the 2006 IEEE Symposium on Visual Languages and Human-Centric Computing, pp ,
[5] C. Scaffidi, M. Shaw, and B. Myers. Estimating the Numbers of End Users and End User Programmers. VL/HCC'05: Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing, pp ,
Other papers
[6] C. Scaffidi, B. Myers, M. Shaw. The Topes Format Editor and Parser, Technical Report CMU-ISRI , School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May
[7] C. Scaffidi, B. Myers, and M. Shaw. Trial By Water: Creating Hurricane Katrina "Person Locator" Web Sites. In Leadership at a Distance: Research in Technologically-Supported Work (S. Weisband, ed), Lawrence Erlbaum, pp ,
[8] C. Scaffidi, M. Shaw. Toward a Calculus of Confidence. First International Workshop on the Economics of Software and Computation, co-located with ICSE'07, 2007, to appear.
[9] C. Scaffidi, M. Shaw, B. Myers. Games Programs Play: Obstacles to Data Reuse, 2nd Workshop on End User Software Engineering (WEUSE), 2006.

18 Thank You…
…to the symposium committee/panel for the opportunity to present
…to NSF and EUSES for funding (ITR and CCF )
…to many people for helpful suggestions:
Marwan Abi-Antoun, Robin Abraham, Matt Bass, Nels Beckman, Kevin Bierhoff, Alan Blackwell, Barry Boehm, Margaret Burnett, Owen Cheng, Ciera Christopher, Michael Coblenz, Allen Cypher, Uri Dekel, Sebastian Elbaum, Martin Erwig, George Fairbanks, Thomas Green, Josh Gross, Greg Hartman, Jim Herbsleb, John Hosking, Andy Ko, Thomas LaToza, Alon Lavie, Henry Lieberman, Larry Maccherone, Brad Myers, John Pane, Mary Beth Rosson, Mary Shaw, Jeff Stylos, Dean Sutherland, Steve Tanimoto, Susan Wiedenbeck

19 This slide intentionally left blank.

20 Interviews of web site creators: Confirmation of specific problems
Interviewed 6 people involved in creating “person locator” web sites after Hurricane Katrina [7][9]
Many omitted data validation on web forms
– Hard to detect that “12 Years old” is an invalid street address (what would the regexp look like?)
“Aggregator” sites were built to scrape and consolidate data from numerous person locator sites.
– Hard to transform data into a single consistent format
– Hard to identify probable duplicates in the merged data set
Extra slides

21 Survey of EUPs: Better data-manipulation features needed
Asked 831 information workers about use of 23 features in 5 tools (e.g.: creating spreadsheet macros, database stored procedures, and web forms) [4][9]
The most widely used features were related to manipulating linked structures of data (e.g.: database tables) rather than imperative or macro programming.
Yet respondents complained about these features:
– “Not always easy to move sturctured [sic] data or text”
– “Not always integrated a lot of data manipulation redundant”
– “Information entered inconsistently into database fields by different people leaves a lot of database cleaning”

22 Proposed data model
1 tope implementation contains executable functions:
– 1 isa: string → [0,1] function per format, for recognizing instances of the format
– 0 or more trf: string → string functions linking formats, for transforming values from one format to another
A lightweight data model…
– Only contains 2 kinds of functions (isa/trf)
– These correspond to the operations that people had to keep performing manually in our studies.

23 Example tope: Notional representation
An example tope for CMU room numbers
– 3 isa functions, 4 trf functions
– A tope’s trf functions can be omitted if desired
Formats:
– Formal building name & room number: Elliot Dunlap Smith Hall 225
– Building abbreviation & room number: EDSH 225
– Colloquial building name & room number: Smith 225
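The room-number tope can be sketched as a small Python data structure: one isa function per format, plus trf functions keyed by (source, target) format. Everything here is an assumption made for illustration: the `Tope` class, the format keys, the one-entry building table, and the choice of which four format pairs get a trf are hypothetical, since the slide does not say which pairs are linked. The isa functions in this sketch return only 0.0 or 1.0; a fuller version would give partial scores to near-valid values such as “EDXH 225”.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# (formal, abbreviation, colloquial) names; a one-entry illustrative table.
BUILDINGS = [("Elliot Dunlap Smith Hall", "EDSH", "Smith")]
INDEX = {"formal": 0, "abbrev": 1, "colloquial": 2}


def split_room(value: str) -> Tuple[str, str]:
    """Split "EDSH 225" into ("EDSH", "225") at the last space."""
    building, _, room = value.rpartition(" ")
    return building, room


def make_isa(fmt: str) -> Callable[[str], float]:
    names = {b[INDEX[fmt]] for b in BUILDINGS}

    def isa(value: str) -> float:
        building, room = split_room(value)
        return 1.0 if building in names and room.isdigit() else 0.0
    return isa


def make_trf(src: str, dst: str) -> Callable[[str], str]:
    table = {b[INDEX[src]]: b[INDEX[dst]] for b in BUILDINGS}

    def trf(value: str) -> str:
        building, room = split_room(value)
        return f"{table[building]} {room}"
    return trf


@dataclass
class Tope:
    isa: Dict[str, Callable[[str], float]] = field(default_factory=dict)
    trf: Dict[Tuple[str, str], Callable[[str], str]] = field(default_factory=dict)

    def best_format(self, value: str) -> Tuple[str, float]:
        """Return (format name, score) for the highest-scoring isa."""
        return max(((n, f(value)) for n, f in self.isa.items()),
                   key=lambda pair: pair[1])

    def convert(self, value: str, target: str) -> str:
        source, score = self.best_format(value)
        if score == 0.0:
            raise ValueError(f"no format recognizes {value!r}")
        if source == target:
            return value
        return self.trf[(source, target)](value)  # direct links only


# Three isa functions and four (assumed) trf links, matching the slide's counts.
room_tope = Tope(
    isa={fmt: make_isa(fmt) for fmt in INDEX},
    trf={pair: make_trf(*pair) for pair in [
        ("formal", "abbrev"), ("abbrev", "formal"),
        ("abbrev", "colloquial"), ("colloquial", "abbrev"),
    ]},
)
```

In this sketch `convert` follows only direct trf links, so converting between two formats with no link raises a `KeyError`; chaining through an intermediate format is left out for brevity.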

24 Prototype implementation
[Figure: system block diagram. Labeled components: Spreadsheet, Microsoft Excel Plug-in, Microsoft Visual Studio.NET Plug-in, Format editor, Parser, Web application, Validator]

25 Proposed development environment
[Figure: functional decomposition diagram. Labeled components: Basic Topes Editor, Repository Software, Publishing Tools, Search Tools, Development Environment Plug-Ins]
EUPs implement topes in basic topes editor (or JavaScript), then publish in repositories.
Other EUs and EUPs search for topes, download them, then use them through plug-ins.

26 Sample task: web form validation. The painful old way
Drag widgets and validator onto page, select a regexp, customize if desired.

27 Sample task: web form validation. Results of the painful old way
Invalid inputs cause a hard-coded message to appear.
– Oops, forgot to enter a message at design-time.
For valid inputs, no error message appears.
– Hm, didn’t realize the area code was optional.
– What if I want to allow campus phone numbers?

28 Sample task: validating person names. Customizing constraints in our prototype
User can add/edit constraints

29 Expressiveness evaluation
Four administrative assistants’ use of a web browser was logged for three weeks, resulting in nearly 6000 sample data values that they typed into web forms.
Not logged verbatim: characters were generalized
– E.g.:
We manually grouped values into 19 semantic families (e.g.: address) based on each widget’s HTML name and the words visually near the widget.
Created and tested formats for 14 families (4250 values)
– Omitted: username/passwords and long blocks of “text”
– Inference & testing features were not used during format creation

30 Expressiveness evaluation results
9 families needed 1 format each; 5 needed 2 formats each
The only error attributable to editor expressiveness:
– 1 of the 4250 test values had a trailing period on a street type (in an address line)
– This particular version of the editor had no way to say that a part could contain a period but only at the end
We have recently submitted conference papers discussing a fuller expressiveness evaluation as well as a small usability study. [6]

31 Future work: Share/reuse via repositories
Clients will have a list of “known” repository servers
– Generally pre-configured to include a global server at CMU
– Organizations will configure clients to include the organizational server
– EUs and EUPs will be able to add new servers to their list
To support publishing/searching, the repository will house meta-information about topes, including…
– a human-visible non-unique name & description
– an internally-used globally unique id (guid) based on the tope’s URL in the repository

32 Future work: Searching for relevant topes
Search by keyword:
– Search tope name and description
– And match based on words that are visually near to topes
Search by groups of people:
– Within an organization, or by author’s domain
– Within spaces that are “group-private”
Search by groups of topes:
– “If you liked this tope, you may also like XYZ”
– Similar to Amazon.com’s product recommendations
Search by example:
– “Find me a tope that recognizes ”
– For efficiency, filter based on “signature” (\d{3}-\d{3}-\d{4})
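The search-by-example idea with a signature pre-filter might look like the Python sketch below. This is a guess at one workable scheme, not the planned repository design: the way a signature is computed, the `search_by_example` function, and the dictionary layout of a repository entry are all hypothetical.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple


def char_class(ch: str) -> str:
    """Generalize one character: digits and letters collapse to a class."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    return ch  # punctuation and spaces are kept literally


def signature(example: str) -> str:
    """Collapse runs of the same class, e.g. three digits become \\d{3}."""
    parts = []
    for cls, run in groupby(example, key=char_class):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def search_by_example(example: str, repository: List[Dict]) -> List[Tuple[str, float]]:
    """Cheap signature filter first, then run isa only on the candidates."""
    sig = signature(example)
    candidates = [t for t in repository if sig in t["signatures"]]
    scored = [(t["name"], t["isa"](example)) for t in candidates]
    return sorted((p for p in scored if p[1] > 0), key=lambda p: -p[1])
```

Under this scheme each published tope would carry a precomputed set of signatures, so the repository can discard most topes with a set lookup before running any isa function.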

33 Future work: Searching for reliable topes
Evidence [8] on which EUs and EUPs may trust topes, grouped by the search feature that would expose it:
Search by tope author
– Explicit formal roles: topes created by their organization’s system administrators.
– Prior performance: topes from people who have previously supplied good topes.
– Model of motivation: topes from vendors that care about brand image.
– Group membership: topes from people who are known to have a similar background.
Search by tope ratings (either anonymous or not)
– Reputation: topes that earned anonymous votes of confidence.
– References: topes that present a list of high-profile people who like the topes.
– Certification: topes that are inspected and certified by a third party.
Search by tope publication date and execution platform
– Social context: topes that are actively maintained (that is, for which improved versions are regularly available); topes that are implemented in a familiar language/platform.

34 Future work: Enhancing plug-ins
Target tools
– Microsoft Excel
– Microsoft Visual Studio.NET
– Robofox
Operations supported
– Assertions: run isa on selected cells
– Transformation: run trf on selected cells
– De-duplication: run trf on selected cells
Each will support basic editor topes & JavaScript topes
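One plausible reading of “de-duplication: run trf on selected cells” is to transform every cell to a single canonical format and then group cells whose canonical forms are equal. The Python sketch below illustrates that reading only; the function names and the simple person-name trf are hypothetical stand-ins, not the plug-in's design.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def to_last_first(name: str) -> str:
    """Stand-in trf: put a two-word person name in "Lastname, Firstname" form."""
    if ", " in name:
        return name
    parts = name.split(" ")
    if len(parts) != 2:
        raise ValueError(f"cannot reformat {name!r}")
    return f"{parts[1]}, {parts[0]}"


def find_duplicates(cells: List[str],
                    to_canonical: Callable[[str], str]) -> Dict[str, List[int]]:
    """Map each canonical value to the row indexes that share it (2 or more)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for row, value in enumerate(cells):
        try:
            key = to_canonical(value)
        except ValueError:
            continue  # unrecognized values are left for manual review
        groups[key].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}
```

For example, the cells “Chris Scaffidi” and “Scaffidi, Chris” end up in the same group because both transform to the same canonical string.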

35 Future work: Recognizing exceptions in plug-ins
Tope creators might overlook values. From the standpoint of a tope format, these “normal” values are exceptional cases that need to be tolerated.
Simple approach: Record a whitelist of exceptions
More sophisticated: For each format, record exceptions, infer a format (new isa function), and average this function’s score with the raw function’s score
Exceptional values can be incorporated into the tope in the local cache and/or, at EUP’s discretion, propagated to the repository of the tope’s master copy
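Both approaches to exceptions can be sketched as wrappers around an existing isa function. The Python below is a hedged illustration: `with_whitelist` and `with_inferred_format` are hypothetical names, and the inferred isa function is passed in as a parameter because the format-inference step itself (see [1]) is not reproduced here.

```python
from typing import Callable, Set

Isa = Callable[[str], float]


def with_whitelist(isa: Isa, exceptions: Set[str]) -> Isa:
    """Simple approach: recorded exceptions are always accepted."""
    def wrapped(value: str) -> float:
        return 1.0 if value in exceptions else isa(value)
    return wrapped


def with_inferred_format(isa: Isa, inferred_isa: Isa) -> Isa:
    """More sophisticated approach: average the raw score with the score
    of an isa function inferred from the recorded exceptions."""
    def wrapped(value: str) -> float:
        return (isa(value) + inferred_isa(value)) / 2
    return wrapped


# Illustrative stand-ins: the raw format accepts digits only, and the
# "inferred" format accepts two uppercase letters.
def raw(value: str) -> float:
    return 1.0 if value.isdigit() else 0.0


def inferred(value: str) -> float:
    return 1.0 if len(value) == 2 and value.isalpha() and value.isupper() else 0.0
```

With averaging, a value that only the inferred format recognizes scores 0.5 rather than 1.0, which fits the non-binary scale of validity described earlier in the talk.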