Published by Buddy Jenkins. Modified over 8 years ago.
1
Efficiently Incorporating User Feedback into Information Extraction and Integration Programs
Xiaoyong Chai, Ba-Quy Vuong, AnHai Doan, Jeffrey F. Naughton
University of Wisconsin-Madison
2
The Need for Incorporating User Feedback
[Figure: screenshot; recoverable labels: Panels, Chair]
3
Current Approach
[Figure: diagram; recoverable labels: Code, Data]
4
This Is Not Just For DBLife
A growing number of applications use information extraction (IE) and information integration (II)
–Avatar@IBM Almaden
–AliBaba@Humboldt Univ. of Berlin
–YAGO@MPI
–Kylin@Univ. of Washington
–…
A systematic user-feedback solution could significantly benefit them
5
What User Feedback To Incorporate?
[Figure: types of user feedback; recoverable labels: Flagging an Error, Fixing an Error, Editing Data, Editing Code, Input, Intermediate Results, Output]
6
Challenges
How to expose program data for user feedback?
How to incorporate user feedback?
How to efficiently execute a program?
7
Exposing Program Data for User Feedback
[Figure: the workflow for extracting conference services. Tables dataSources, roles, and services; operators crawl, extractConf, extractNames, and findRoles; views over the tables are exposed through Form, Spreadsheet, and Wiki user interfaces. Sample tuples: dataSources(date = 09/01/2008, url = http://.../cidr09/); services(name = Joe Hellerstein, conf = CIDR 2009, role = PC Chair).]
8
Writing User-Feedback Rules to Expose Program Data
Write extraction program, e.g., in xlog [Shen et al, 07]
R1: pages(page) :- dataSources(url, date), crawl(url, page)
R2: conferences(conf, page) :- pages(page), extractConf(page, conf)
R3: names(name, page) :- pages(page), extractNames(page, name)
R4: roles(name, role, page) :- names(name, page), findRoles(name, page, role)
R5: services(name, conf, role) :- conferences(conf, page), roles(name, role, page)
Write user-feedback rules to specify views and user interfaces
R6: dataSourcesForUserFeedback(url)#form-UI :- dataSources(url, date), date >= "01/01/2009"
R7: rolesForUserFeedback(pos, page#no-edit)#spreadsheet-UI :- roles(role, page)
R8: servicesForUserFeedback(name, conf, role)#wiki-UI :- services(name, conf, role)
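To make the data flow of the extraction rules concrete, here is a minimal Python sketch that evaluates the five extraction rules bottom-up over in-memory sets. It is an illustration only, not xlog and not the DBLife code: the operator bodies, the canned page text, and the example.org URL are all hypothetical stand-ins for the blackbox operators named on the slide.

```python
# Hypothetical canned page; a real crawl operator would fetch pages from the web.
PAGES = {"http://example.org/cidr09/": "CIDR 2009 | PC Chair: Joe Hellerstein"}

def crawl(url):
    """Stub operator: return the pages found at url."""
    return [PAGES[url]] if url in PAGES else []

def extract_conf(page):
    """Stub operator: the conference name is the text before the first '|'."""
    return [page.split("|")[0].strip()]

def extract_names(page):
    """Stub operator: each later field looks like 'role: name'."""
    return [part.split(":")[1].strip() for part in page.split("|")[1:]]

def find_roles(name, page):
    """Stub operator: return the roles that the page assigns to name."""
    return [part.split(":")[0].strip()
            for part in page.split("|")[1:]
            if part.split(":")[1].strip() == name]

def run(data_sources):
    # pages(page) :- dataSources(url, date), crawl(url, page)
    pages = {p for url, _date in data_sources for p in crawl(url)}
    # conferences(conf, page) :- pages(page), extractConf(page, conf)
    conferences = {(c, p) for p in pages for c in extract_conf(p)}
    # names(name, page) :- pages(page), extractNames(page, name)
    names = {(n, p) for p in pages for n in extract_names(p)}
    # roles(name, role, page) :- names(name, page), findRoles(name, page, role)
    roles = {(n, r, p) for n, p in names for r in find_roles(n, p)}
    # services(name, conf, role) :- conferences(conf, page), roles(name, role, page)
    return {(n, c, r) for c, p1 in conferences for n, r, p2 in roles if p1 == p2}

services = run({("http://example.org/cidr09/", "09/01/2008")})
print(services)
```

Each intermediate set (pages, names, roles) corresponds to a table that a user-feedback rule could expose as a view.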
9
Program Semantics
[Figure: the same workflow (tables dataSources, roles, services; operators crawl, extractConf, extractNames, findRoles), with views over the tables feeding the Form, Spreadsheet, and Wiki user interfaces. Sample tuples: dataSources(date = 09/01/2008, url = http://.../cidr09/); services(name = Joe Hellerstein, conf = CIDR 2009, role = PC Chair).]
10
Incorporating Previous User Feedback
Interpretation: for operator p, if t is in the output, change t into t’
[Figure: operator p maps input I to output O; feedback that changes t into t’ turns O into O’]
Example: extractNames
–Input page p1 (“… D. Smith, A. Jones,...”) yields names A. Smith and A. Jones
–Input page p2 (“Dr. A. Smith is...”) yields name A. Smith
–User feedback: change “A. Smith” to “D. Smith”
11
Interpreting User Feedback Based On Tuple Provenance
Provenance of output tuple t:
–the set of input tuples that operator p used to produce t
Change “A. Smith” to “D. Smith”
If the operator produces {“A. Smith”, “A. Jones”} from {p1}, then replace {“A. Smith”, “A. Jones”} with {“D. Smith”, “A. Jones”}
[Figure: extractNames over pages p1 and p2; output names with their provenance: A. Smith (p1), A. Jones (p1), A. Smith (p2)]
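The provenance-based interpretation can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the canned extract_names operator and the feedback dictionary are hypothetical, and feedback is keyed by the pair (provenance, original output) so that the correction applies only to names produced from p1.

```python
def extract_names(page_id):
    """Hypothetical blackbox operator with canned outputs matching the slide."""
    canned = {"p1": ["A. Smith", "A. Jones"], "p2": ["A. Smith"]}
    return canned[page_id]

# Recorded feedback: (provenance, output produced from it) -> corrected output.
feedback = {
    (frozenset({"p1"}), frozenset({"A. Smith", "A. Jones"})):
        frozenset({"D. Smith", "A. Jones"}),
}

def run_with_feedback(page_ids):
    """Re-run the operator and re-apply feedback whose provenance and output match."""
    result = []
    for pid in page_ids:
        out = frozenset(extract_names(pid))
        corrected = feedback.get((frozenset({pid}), out), out)
        result.extend((name, pid) for name in sorted(corrected))
    return result

names = run_with_feedback(["p1", "p2"])
print(names)
```

The “A. Smith” extracted from p2 is left untouched because its provenance is {p2}, not {p1}.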
12
Challenges
How to expose program data for user feedback?
How to incorporate user feedback?
How to efficiently execute a program?
–Incremental execution
–Improved concurrency control
13
Incrementally Executing the Program
[Figure: extractNames applied to pages p1 and p2, and then to pages p1, p2, and a new page p3; the names produced for the new input are marked “?”]
Similar problem in incremental view maintenance
Incremental-update properties
–Closed-form insertion
–Closed-form deletion
–Input partitionability
–Partition correlation
–Attribute independence
extractNames(I + ΔI) = extractNames(I) + extractNames(ΔI)
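A minimal sketch of incremental execution for an operator that satisfies the insertion identity on this slide: when new pages arrive, only those pages are processed and the result is unioned with the stored output. The comma-splitting extractor, the page texts, and the name “B. Lee” are hypothetical and chosen only to keep the example self-contained.

```python
calls = []  # records which pages the operator actually processed

def extract_names(pages):
    """Hypothetical per-page extractor: names are comma-separated on each page."""
    out = set()
    for pid, text in pages.items():
        calls.append(pid)
        out |= {(w.strip(), pid) for w in text.split(",") if w.strip()}
    return out

old_input = {"p1": "D. Smith, A. Jones", "p2": "A. Smith"}
old_output = extract_names(old_input)       # computed once and stored

delta = {"p3": "B. Lee"}                    # newly inserted input tuples
calls.clear()
new_output = old_output | extract_names(delta)   # process only the delta
incremental_calls = list(calls)

# Sanity check against a full recomputation over the combined input.
full_output = extract_names({**old_input, **delta})
print(incremental_calls, new_output == full_output)
```

The shortcut is valid only because this extractor handles each page independently; an operator whose output for one page depends on other pages would not satisfy the identity.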
14
Concurrently Executing Transactions
[Figure: transactions T1 and T2 running over the workflow (tables dataSources, roles, services; operators crawl, extractConf, extractNames, findRoles)]
Table-Locking
–Locks only the input and output tables of the crawl operator
Operator-Skipping
–Skips executing the join operator after updating the roles table
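The effect of table-locking on concurrency can be illustrated by comparing lock sets. The sketch below rests on several assumptions that are not stated on the slide: the operator-to-table mapping is read off the workflow's rules, all locks are treated as exclusive, and the coarse baseline is taken to lock every table in the workflow graph. The function names are hypothetical.

```python
# Operator -> (input tables, output tables); an assumed reading of the workflow.
WORKFLOW = {
    "crawl": ({"dataSources"}, {"pages"}),
    "extractConf": ({"pages"}, {"conferences"}),
    "extractNames": ({"pages"}, {"names"}),
    "findRoles": ({"names"}, {"roles"}),
    "join": ({"conferences", "roles"}, {"services"}),
}
ALL_TABLES = set().union(*(ins | outs for ins, outs in WORKFLOW.values()))

def graph_lock_set(operator):
    """Assumed coarse baseline: lock every table in the workflow."""
    return set(ALL_TABLES)

def table_lock_set(operator):
    """Table-locking: lock only the operator's input and output tables."""
    ins, outs = WORKFLOW[operator]
    return ins | outs

def conflict(lock_set_a, lock_set_b):
    """Two transactions conflict if they need a common (exclusive) lock."""
    return bool(lock_set_a & lock_set_b)

# T1 executes crawl while T2 executes findRoles.
print(conflict(graph_lock_set("crawl"), graph_lock_set("findRoles")))
print(conflict(table_lock_set("crawl"), table_lock_set("findRoles")))
```

Under the coarse baseline T1 and T2 must run one after the other; with per-operator lock sets they touch disjoint tables and can proceed concurrently.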
15
Experiment Setup
Testbed
–A 5-stage DBLife workflow
–13 blackbox operators: 6 IE operators and 3 II operators
Wrote xlog program and user-feedback rules in < 1 hr
Simulated user-feedback transactions
–On each stage of the workflow
–Each transaction randomly deletes, inserts, or modifies 1/10 of the tuples in a table
16
Incremental-Update Properties are Broadly Applicable
[Table: DBLife operators (rows) against incremental-update properties (columns ci, cd, ip, ai, pc); the per-cell marks are not preserved]
DBLife Operators
–Get Data Pages
–Get People Variations
–Get Publication Variations
–Get Organization Variations
–Find People Variations
–Find Publication Variations
–Find Organization Variations
–Find People Entities
–Find Publication Entities
–Find Organization Entities
–Find Related People
–Find Authorship
–Find Related Organizations
17
Incremental Update Reduces Execution Time
18
Table-Locking and Operator-Skipping Improve Concurrency Degree
Increase transaction throughput by 50% and 500%, respectively
Reduce transaction response time by 43% and 98%, respectively

                    Min    Max      Average
Graph-locking       ~0s    7,584s   3,203s
Table-locking       1s     5,485s   1,841s   (-43%)
Operator-skipping   ~0s    457s     43s      (-98%)
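As a quick check of the response-time figures, the reductions can be recomputed from the average column, assuming (the slide does not say so explicitly) that both percentages are measured against the graph-locking average.

```python
# Average response times in seconds, taken from the table on this slide.
avg = {"graph-locking": 3203, "table-locking": 1841, "operator-skipping": 43}

def reduction(baseline, value):
    """Percentage reduction of value relative to baseline."""
    return 100.0 * (baseline - value) / baseline

table_red = reduction(avg["graph-locking"], avg["table-locking"])
skip_red = reduction(avg["graph-locking"], avg["operator-skipping"])
print(f"table-locking: -{table_red:.1f}%, operator-skipping: -{skip_red:.1f}%")
```

This gives about 42.5% and 98.7%, close to the 43% and 98% reported on the slide.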
19
Related Work
User feedback in IE and II
–[Doan et al, 01], [Chiticariu et al, 08], [Jeffery et al, 08]
–Leveraging user feedback to improve results of individual operations
Provenance
–[Woodruff & Stonebraker, 97], [Cui & Widom, 01], [Buneman et al, 01], [Bohannon et al, 08], [Huang et al, 08]
Incremental execution
–View maintenance [Blakeley et al, 86], [Griffin & Libkin, 95], [Gupta & Mumick, 95]
–Schema matching [Bernstein et al, 06], IE [Chen et al, 07]
20
Conclusions and Future Work
Incorporating user feedback into IE and II programs is important
Identify key issues and provide initial solutions:
–Write user-feedback rules to expose program data to UIs
–Model and incorporate user feedback
–Efficiently execute program to process user feedback
Future work:
–Handle unreliable user feedback
–Propagate user feedback down in the workflow
–Conduct user study