Information Extraction from Single and Multiple Sentences Mark Stevenson Department of Computer Science University of Sheffield, UK.

Introduction

Information Extraction (IE) is often viewed as the process of identifying events described in text. It is generally accepted that an event may be described across more than one sentence:

“Pace American Group Inc. said it notified two top executives it intends to dismiss them because an internal investigation found evidence of “self-dealing” and “undisclosed financial relationships”.”

“The executives are Don H. Pace, cofounder, president and chief executive officer; and Greg S. Kaplan, senior vice president and chief financial officer.”

Sentence-limited Approaches

Some approaches have treated each sentence in isolation, extracting only the events described within it:
– Zelenko et al. (2003) – SVM
– Soderland (1999) – rule generalisation
– Chieu and Ng (2002) – maximum entropy
– Yangarber et al. (2000) – pattern learning

This restriction often makes IE more practical for machine learning. But how can results be compared against systems which extract all events? And how much can be achieved by analysing within sentences alone?

Experiment

Compare two alternative annotations of the same corpus:
– The complete annotation identifies all events described in a document.
– The within sentence annotation marks only events described within a sentence.

The corpus used is the MUC-6 evaluation texts, whose documents describe management succession events. The complete annotation was produced as part of the formal evaluation; the within sentence annotation is due to Soderland (1999).

Event Definition

The two annotations of this corpus have different definitions of what constitutes an event. Events in both annotations are therefore transformed into a common representation scheme which:
– contains the information encoded by both schemes,
– allows comparison,
– provides a method for defining what constitutes an event.

Each event is stored as a database entry consisting of four fields:
– type: person_in or person_out
– person, post, organisation

A minimal event description contains at least two of these elements.
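As a sketch, the common representation above can be modelled as a small Python structure. The class and method names are illustrative assumptions, not the authors' code; each field holds a set of alternative fillers (aliases):

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """Common-representation event: type plus up to three filler sets.

    Each field holds a set of alternative fillers (aliases), e.g.
    person = {"John J. Dooner Jr", "John Dooner", "Dooner"}.
    """
    type: str                              # "person_in" or "person_out"
    person: set = field(default_factory=set)
    post: set = field(default_factory=set)
    org: set = field(default_factory=set)

    def fields(self):
        """Names of the filled fields; type always counts as one element."""
        names = ["type"]
        for name in ("person", "post", "org"):
            if getattr(self, name):
                names.append(name)
        return names

    def is_minimal(self):
        # A minimal event description contains at least two elements.
        return len(self.fields()) >= 2
```

A within-sentence event with only a type would fail `is_minimal()`, matching the "at least two elements" criterion.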

MUC Annotation

Annotations are stored in a complex nested template structure. The core SUCCESSION_EVENT refers to a specific position and contains IN_AND_OUT events, each recording the movement of a single executive relative to that position. Aliases list alternative ways of referring to event objects. The representation does not directly link event objects to the text, so it is difficult to compute the proportion of events described within a sentence directly.

MUC Annotation: Example

:=
  SUCCESSION_ORG:
  POST: "chairman"
  IN_AND_OUT:
:=
  IO_PERSON:
  NEW_STATUS: IN
  OTHER_ORG:
:=
  ORG_NAME: "McCann-Erickson"
  ORG_ALIAS: "McCann"
:=
  PER_NAME: "John J. Dooner Jr."
  PER_ALIAS: "John Dooner", "Dooner"

In the common representation:

type(person_in)
post(chairman)
org('McCann-Erickson' | 'McCann')
person('John J. Dooner Jr' | 'John Dooner' | 'Dooner')

Within Sentence Annotation

Soderland (1999) produced an alternative annotation of the same corpus. The annotation is linked directly to the source sentence, so only events described within a single sentence are included. Annotations use a flat structure inspired by case frames.

Within Sentence Annotation: Example

“Daniel Glass was named president and chief executive officer of EMI Record Group”

Succession {PersonIn DANIEL GLASS} {Post PRESIDENT AND CHIEF EXECUTIVE OFFICER} {Org EMI RECORD GROUP}

In the common representation this frame yields two events:

event 1:
  type(person_in)
  person('Daniel Glass')
  org('EMI Record Group')
  post('president')

event 2:
  type(person_in)
  person('Daniel Glass')
  org('EMI Record Group')
  post('chief executive officer')
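The expansion above, one event per post when the Post filler is a conjunction, can be sketched as follows. The dict-based frame format, the helper name `frame_to_events`, and the naive split on "AND" are illustrative assumptions, not the original implementation:

```python
import re

def frame_to_events(frame):
    """Expand a flat case frame into one event per post.

    `frame` is a dict such as:
      {"PersonIn": "DANIEL GLASS",
       "Post": "PRESIDENT AND CHIEF EXECUTIVE OFFICER",
       "Org": "EMI RECORD GROUP"}
    """
    etype = "person_in" if "PersonIn" in frame else "person_out"
    person = frame.get("PersonIn") or frame.get("PersonOut")
    # A conjoined Post filler describes one event per post (naive split;
    # a post title containing "AND" would be split incorrectly).
    posts = re.split(r"\s+AND\s+", frame.get("Post", ""))
    return [
        {"type": etype, "person": person,
         "post": post, "org": frame.get("Org")}
        for post in posts if post
    ]
```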

Matching

Two levels of match are allowed between events in the two sets:
– Full match: the events contain the same fields and each field shares at least one filler.
– Partial match: the events share some fields and each of those fields shares at least one filler.

The matching process compares each of the within sentence events with each event in the MUC event set. Only one-to-one mappings are allowed for full matches, but many within sentence events can partially match onto a single MUC event.

Fully matching events:

MUC event:
  type(person_in)
  person('R. Wayne Diesel' | 'Diesel')
  org('Mechanical Technology Inc.' | 'Mechanical Technology')
  post('chief executive officer')
Within sentence event:
  type(person_in)
  person('R. Wayne Diesel')
  org('Mechanical Technology')
  post('chief executive officer')

Partially matching events:

MUC event:
  type(person_in)
  person('R. Wayne Diesel' | 'Diesel')
  org('Mechanical Technology Inc.' | 'Mechanical Technology')
  post('chief executive officer')
Within sentence event:
  type(person_in)
  person('R. Wayne Diesel')
  org('Mechanical Technology')
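The two matching criteria can be sketched as a small comparison function, with events modelled as dicts mapping field names to sets of alternative fillers. This is an illustrative reconstruction of the comparison, not the authors' code:

```python
def match(event_a, event_b):
    """Return 'full', 'partial', or None for a pair of events.

    Each event maps field names ('type', 'person', 'post', 'org')
    to sets of alternative fillers (aliases)."""
    fields_a, fields_b = set(event_a), set(event_b)
    shared = fields_a & fields_b
    # Every shared field must have at least one filler in common.
    if not shared or any(not (event_a[f] & event_b[f]) for f in shared):
        return None
    # Same fields throughout -> full match; otherwise only partial.
    return "full" if fields_a == fields_b else "partial"
```

Applied to the example pairs above, the first pair agrees on all four fields (full match), while the second shares only three fields (partial match).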

Event Analysis

(event counts in parentheses)

           All events     Within sentence events
Full       40.6% (112)    45.2% (112)
Partial    39.1% (108)    47.6% (118)
No match   20.3% (56)     7.3% (18)

Mismatching Events

Of the within sentence events with no match:
– Spurious events in the limited annotation set, not matched to any event in the MUC corpus: 9
– Events mentioned in the limited annotation and the text but not in the MUC data (strict guidelines): 8
– Event mentioned in the limited annotation and the text but not in the MUC data: 1

Event Field Analysis

         Full match   Partial match   No match   TOTAL
Type     112/112      100/108         0/56       76.8%
Person   112/112      100/108         0/56       76.8%
Org      112/112      6/108           0/53       43.2%
Post     111/111      74/108          0/50       68.8%
Total    447/447      280/432         0/215      66.5%

Text Style

The variation between event fields can be explained by the structure of the documents in this corpus. A succession event is often introduced at the start of a document, where it is generally complete:
– “Washington Post Co. said Katherine Graham stepped down as chairman and will be succeeded by her son, Donald E. Graham, the company’s chief executive.”

Further events may not be described fully:
– “Alan G. Spoon, 42, will succeed Mr. Graham as chief executive of the company.”
– “Mr. Jones is succeeded by Mr. Green.”
– “Mr. Smith assumed the role of CEO.”

Conclusion

Analysis of a commonly used IE evaluation corpus showed that only 40.6% of events are fully described within a single sentence. A larger proportion of the events are at least partially described, but with wide variation between the event fields, due to document style. These results should be borne in mind during the design and evaluation of IE systems. There are additional implications for summarisation systems which select sentences, and for question answering systems.