Intelligent Access to Text: Integrating Information Extraction Technology into Text Browsers

Robert Gaizauskas (1), Patrick Herring (1), Michael Oakes (1), Micheline Beaulieu (2), Peter Willett (2), Helene Fowkes (2), and Anna Jonsson (2)
(1) Department of Computer Science, (2) Department of Information Studies, University of Sheffield

March, 2001 HLT01, San Diego

Outline of Talk
- Is Information Extraction Technology Useful?
- Barriers to Deployment
- Information Seeking in Large Enterprises
- The TRESTLE System
  - System Overview
  - NEAT: Named Entity Access to Text
  - SCAT: Scenario Access to Text
- Preliminary User Evaluation
  - Evaluation Methodology
  - Access Strategies
  - User Perceptions
- Conclusions and Discussion

Is Information Extraction Technology Useful?
- Information Extraction (IE) technology has led to impressive new abilities to extract structured information from texts:
  - named entity recognition
  - Template Element/Relation filling
  - Scenario Template filling
- IE complements traditional Information Retrieval (IR) capabilities
- However, unlike IR, IE has not found its way into widely used end-user systems, such as:
  - Web search engines
  - document indexing systems
- Why not?

Barriers to Deployment
- Porting cost
  - Moving to new domains requires considerable time and expertise to create or modify domain-specific resources and rule bases, or to annotate texts for supervised machine learning approaches
- Sensitivity to inaccuracies in extracted data
  - MUC-7 results: F-measure scores of 50-92%, depending on task
  - Thus IE is only appropriate for applications where some error is tolerable or readily detectable by end users
  - Note: formal IR evaluation results are comparable, but IR application contexts make error less significant
- Complexity of integration into end-user systems
  - IE systems' outputs must be incorporated into larger application systems if end users are to benefit from them
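The F-measure scores quoted above combine precision and recall into a single figure. As a minimal sketch, assuming the standard balanced F-measure (beta = 1) and simple counts of correct, system-produced and answer-key items, the calculation looks like this; the official MUC scorer applies its own, more detailed counting rules, which this sketch does not reproduce, and the counts below are invented.

```python
def precision_recall(n_correct: int, n_response: int, n_key: int) -> tuple[float, float]:
    """Precision = correct / system responses; recall = correct / answer-key items."""
    precision = n_correct / n_response if n_response else 0.0
    recall = n_correct / n_key if n_key else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Van Rijsbergen's F-measure; beta = 1 gives the harmonic mean of P and R."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1.0) * precision * recall / (b2 * precision + recall)


# Invented counts, for illustration only (not MUC-7 data).
p, r = precision_recall(n_correct=45, n_response=50, n_key=60)
print(f"P={p:.2f} R={r:.2f} F={f_measure(p, r):.2f}")
# prints: P=0.90 R=0.75 F=0.82
```

Because the harmonic mean is pulled towards the lower of the two values, a system cannot reach a high F-measure by maximising only precision or only recall.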

IE and Information Seeking in Large Enterprises
- To investigate the utility of IE in a real setting, we have developed an advanced text access facility to support information workers at GlaxoSmithKline (GSK):
  TRESTLE (Text Retrieval Extraction and Summarisation Technology for Large Enterprises)
- Aim: increase the effectiveness of employees in the “industry watch” function, i.e. current awareness and tracking of:
  - people
  - companies
  - products, particularly the progress of new drugs through the clinical trial and regulatory approval process
- Approach: provide enhanced access to Scrip, the largest-circulation pharmaceutical industry newsletter

IE and Information Seeking in Large Enterprises
- A user requirements study at GSK (questionnaire, observation, interviews) revealed two key types of information seeking:
  1. Current awareness
     - general updating (what's happened in the industry today/this week)
     - entity- or event-based tracking (e.g. what's happened concerning a specific drug, or what regulatory decisions have been made)
  2. Retrospective search
     - historical tracking of entities or events of interest (e.g. where has a specific person been reported before; what is the clinical trial history of a particular drug)
     - search for a specific event, or for a remembered context in which a specific entity played a role
- Note: both activities require identification of entities/events in the news, which is exactly what IE systems do

TRESTLE System Overview
- The system consists of two components
- Off-line component:
  - LaSIE IE system
    - Input: Scrip texts delivered daily via the Internet
    - Output: IE results
      - named entities: MUC-7 categories + drugs + diseases
      - scenario templates: Person Tracking; Clinical Trials; Regulatory Announcements
  - Summary Writer
    - Input: scenario templates
    - Output: single-sentence natural language (NL) summaries of the templates
  - Entity/Scenario Indexer
    - Input: NE-annotated texts; scenario templates
    - Output: indices keyed by NE + date, with pointers to source texts
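The Summary Writer turns each filled scenario template into a one-sentence summary. The slides do not describe its internals, so the following is only a sketch assuming a simple slot-filling sentence pattern for a Person Tracking template; the PersonTrackingTemplate class, its slot names, the summarise function and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonTrackingTemplate:
    """Hypothetical slots for a filled Person Tracking scenario template."""
    person: str
    new_org: str
    new_post: Optional[str] = None
    old_org: Optional[str] = None


def summarise(t: PersonTrackingTemplate) -> str:
    """Render one filled template as a single-sentence natural language summary.

    Optional slots are mentioned only when the extractor filled them, so a
    partially filled template still yields a grammatical sentence.
    """
    if t.old_org:
        sentence = f"{t.person} has moved from {t.old_org} to {t.new_org}"
    else:
        sentence = f"{t.person} has joined {t.new_org}"
    if t.new_post:
        sentence += f" as {t.new_post}"
    return sentence + "."


# Invented slot values, for illustration only.
print(summarise(PersonTrackingTemplate(
    person="Mr Garcia", new_org="Acme Pharma", new_post="head of research", old_org="Beta Labs")))
# prints: Mr Garcia has moved from Beta Labs to Acme Pharma as head of research.
```

Generating text from optional slots in this way degrades gracefully when extraction is incomplete, which matters given the error rates IE systems exhibit.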

TRESTLE System Overview (cont)
- On-line component:
  - Browser scripts
    - Input: user requests for information
    - Output: results of requests, returned from the annotated Scrip database
  - Entity/Scenario Index Search + Dynamic Page Generator
    - Input: user information requests forwarded from the Web server + entity/scenario indices + NE-annotated texts/summaries
    - Output: relevant HTML pages with dynamically generated link information
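The indices built off-line and searched on-line are keyed by named entity and date, with pointers back to source texts. A minimal in-memory sketch of such an index, assuming postings of (issue date, document ID) and a date-range filter applied at query time, might look like this; EntityIndex, its methods, the drug name and the document IDs are hypothetical, and the slides do not describe TRESTLE's actual storage format.

```python
from collections import defaultdict
from datetime import date


class EntityIndex:
    """Hypothetical NE index: (category, entity string) -> {(issue date, doc ID)}."""

    def __init__(self) -> None:
        self._postings: dict[tuple[str, str], set[tuple[date, str]]] = defaultdict(set)

    def add(self, category: str, entity: str, issue_date: date, doc_id: str) -> None:
        """Record that `entity` (of NE `category`) occurs in document `doc_id`."""
        self._postings[(category, entity)].add((issue_date, doc_id))

    def lookup(self, category: str, entity: str, start: date, end: date) -> list[str]:
        """Return IDs of documents mentioning the entity within [start, end], newest first."""
        postings = self._postings.get((category, entity), set())
        in_range = [(d, doc) for d, doc in postings if start <= d <= end]
        return [doc for _, doc in sorted(in_range, reverse=True)]


# Invented entries, for illustration only.
idx = EntityIndex()
idx.add("DRUG", "Examplin", date(2001, 2, 20), "scrip-0412")
idx.add("DRUG", "Examplin", date(2001, 3, 14), "scrip-0587")
idx.add("PERSON", "Mr Garcia", date(2001, 3, 14), "scrip-0590")
print(idx.lookup("DRUG", "Examplin", date(2001, 3, 1), date(2001, 3, 15)))
# prints: ['scrip-0587']
```

Keeping the date in each posting lets the same index serve both current-awareness queries (a short recent range) and retrospective search (the full archive).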

TRESTLE System Architecture
[Figure: architecture diagram. Components shown: User (Info Seeking), Web Browser, Internet, Web Server, Index Search + Dynamic Page Creator, Scrip, Off-Line System (LaSIE System, Summary Writer, Indexer); data stores: NE Tagged Texts, Scenario Templates, Scenario Summaries, Entity/Scenario Indices]

TRESTLE Interface Overview
- The TRESTLE browser-based interface allows 4 routes to access texts:
  - by headline
  - by named entity (NEAT: Named Entity Access to Text)
  - by scenario summary (SCAT: Scenario Access to Text)
  - by free text search
- For the first 3 routes, the date range of accessed articles may be set to:
  - current day
  - previous day
  - last week
  - last four weeks
  - full archive
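The date-range options above have to be turned into concrete date intervals at query time. The slides do not define the boundaries, so this sketch assumes that "last week" and "last four weeks" mean the 7 and 28 days ending today, inclusive; the date_range function, its archive_start parameter and the example dates are hypothetical.

```python
from datetime import date, timedelta


def date_range(option: str, today: date, archive_start: date) -> tuple[date, date]:
    """Map a date-range option to an inclusive (start, end) interval.

    Assumption: "last week" and "last four weeks" are the 7 and 28 days
    ending today, inclusive; the slides do not define the boundaries.
    """
    if option == "current day":
        return today, today
    if option == "previous day":
        yesterday = today - timedelta(days=1)
        return yesterday, yesterday
    if option == "last week":
        return today - timedelta(days=6), today
    if option == "last four weeks":
        return today - timedelta(days=27), today
    if option == "full archive":
        return archive_start, today
    raise ValueError(f"unknown date-range option: {option!r}")


# archive_start is an invented value, for illustration only.
print(date_range("last week", today=date(2001, 3, 15), archive_start=date(1990, 1, 1)))
# prints: (datetime.date(2001, 3, 9), datetime.date(2001, 3, 15))
```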

TRESTLE Interface: Underlying Design
- The interface is divided into four frames:
  - Head Frame: user state; date range selection
  - Access Frame: choice of access mode (NE / scenario / free text search)
  - Index Frame: headline list, or NE + headline list, or summary list
  - Text Frame: full text of the source article, with embedded NE hyperlinks
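The Text Frame shows the full article with embedded, colour-coded NE hyperlinks. One way a dynamic page generator could produce such markup is sketched below, assuming the NE tagger supplies non-overlapping character offsets with a category for each entity; the /neat URL, its query parameter names, the colour table and the link_entities function are hypothetical, not TRESTLE's actual scheme.

```python
import html
from urllib.parse import urlencode

# Hypothetical colour scheme: one colour per NE category.
CATEGORY_COLOURS = {"PERSON": "#1f77b4", "ORGANIZATION": "#2ca02c", "DRUG": "#d62728"}


def link_entities(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each tagged entity span in a colour-coded hyperlink to an NE look-up page.

    `spans` holds (start, end, category) character offsets, assumed non-overlapping.
    All text, including entity strings, is HTML-escaped.
    """
    parts: list[str] = []
    pos = 0
    for start, end, category in sorted(spans):
        parts.append(html.escape(text[pos:start]))         # plain text before the entity
        surface = text[start:end]
        href = "/neat?" + urlencode({"cat": category, "entity": surface})
        colour = CATEGORY_COLOURS.get(category, "#000000")  # default: black
        parts.append(f'<a href="{html.escape(href)}" style="color:{colour}">'
                     f"{html.escape(surface)}</a>")
        pos = end
    parts.append(html.escape(text[pos:]))                   # trailing plain text
    return "".join(parts)


print(link_entities("Mr Garcia joins Acme.", [(0, 9, "PERSON")]))
```

Each link carries the entity and its category, so a click can be answered by the same index look-up used for the access frame's query box.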

NEAT: Named Entity Access to Text

SCAT: Scenario Access to Text

Preliminary User Evaluation: Methodology
- Prelude to a full end-user study: a preliminary study with 8 Information Studies postgraduate students
- Aim: to gain insight into
  - ease of use and learnability of the system
  - preferred strategies for accessing text
  - problems in interpreting the interface
- Instruments: usability questionnaire, verbal protocols, observational notes
- Procedure:
  - brief verbal introduction to the evaluation and the system
  - undirected exploration of the system, asking questions/providing comments
  - simulated tasks of real end users, e.g.:
    "You've heard that one of your colleagues, Mr Garcia, has recently accepted an appointment at another pharmaceutical company. You want to find out which company he will be moving to and what post he has taken up."

Preliminary User Evaluation: Access Strategies
- NEAT: access to named entities was made available in three ways:
  1. by clicking directly on a list of NE categories in the access frame
  2. through the NE index look-up query box in the access frame
  3. through highlighted entries in a full article displayed in the text frame
- Observation: users preferred 2 over 1 or 3, regardless of task
  - perhaps because users knew what they were looking for
  - perhaps because look-up was more familiar than browsing NEs
  - perhaps because of the prominence of the NE look-up box in the interface
- SCAT
  - Observation: for tasks where SCAT was appropriate, users opted for NE index look-up instead
  - perhaps because of the novelty of scenario tracking
  - perhaps because SCAT functionality was not clear from the interface

Preliminary User Evaluation: User Perceptions
- Colour coding + hyperlinking of NEs
  - highly noticeable; some objections to colour choice
  - disagreement about utility: distracting when reading full texts, but highly useful in leading to related previous Scrip articles
  - integration of current awareness + retrospective searching via NEs highly appreciated
- NE index look-up
  - found very useful by all but one participant
  - some confusion over scope: differences with respect to free-text search; only 5 searchable NE categories
  - exact string matching limiting (limitation now removed)
- Scenario tracking
  - function misunderstood from its labelling in the access frame
  - confusion between SCAT summaries and headlines
  - flag icons for summaries in headline lists not well understood
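Participants found exact string matching in NE index look-up limiting, and the slides report that this limitation has since been removed without saying how. Purely as an illustration, the sketch below contrasts exact matching with one possible relaxation, case-insensitive substring matching over lowercased word tokens; the normalise and lookup functions are hypothetical and are not TRESTLE's actual matcher.

```python
import re


def normalise(s: str) -> str:
    """Lowercase and keep only word tokens, joined by single spaces."""
    return " ".join(re.findall(r"\w+", s.lower()))


def lookup(query: str, entries: list[str], exact: bool = False) -> list[str]:
    """Return index entries matching `query`.

    exact=True:  the entry must equal the query character for character.
    exact=False: the normalised query must occur inside the normalised entry.
    """
    if exact:
        return [e for e in entries if e == query]
    q = normalise(query)
    return [e for e in entries if q and q in normalise(e)]


entries = ["GlaxoSmithKline", "SmithKline Beecham", "Glaxo Wellcome"]
print(lookup("smithkline", entries, exact=True))  # prints: []
print(lookup("smithkline", entries))              # prints: ['GlaxoSmithKline', 'SmithKline Beecham']
```

Substring matching raises recall at some cost in precision (a short query can match many entries), a trade-off that suits an interactive look-up where users scan the returned list.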

Conclusions (I)
- To date, IE has largely been a “technology push” activity
- For IE technology to become usable and shaped by end-user requirements (“user pull”), end-user prototypes must be built which:
  - exploit the significant achievements of the technology to date
  - acknowledge its limitations
- TRESTLE attempts to do this by exploiting NE and scenario template IE technology to offer users novel ways to access textual information via a familiar text browsing interface

Conclusions (II)
- Preliminary user evaluation has revealed:
  - search options initially selected from the access frame were not always optimal for the set tasks
  - on the whole, colour-coded textual/iconic cues in the headline index + full text enabled users to exploit the different functions seamlessly
  - the interface supported interaction at the procedural level, but there was some misunderstanding at the conceptual level, especially of scenario access
    - other studies report similar issues in introducing more complex interactive search functions
    - further investigation + modifications (e.g. to labelling) are underway
- A full evaluation in a real end-user environment is now being organised
  - to answer the question: can professional information workers use IE-based searching and awareness approaches effectively?

The End