University of Sheffield, NLP Module 6: ANNIC Kalina Bontcheva © The University of Sheffield, 1995-2014 This work is licensed under the Creative Commons.


Module 6: ANNIC
Kalina Bontcheva
© The University of Sheffield, 1995-2014. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs Licence.

The art and craft of JAPE rules
□ You know by now how to write some not-so-simple JAPE rules
□ The question is: how do you design them?
□ How do you find patterns which are frequent in your test corpus?
□ Given a dataset of tweets, how can you be sure that the JAPE LHS pattern you are about to implement doesn’t do more harm than good?

ANNIC: Annotations in Context
□ Motivation
○ Need for a corpus analysis tool
○ Useful for authoring IE patterns for rules
□ … is an IR engine that can search over:
○ Document content
○ Meta-data (annotation types, features and values), for example: Person.gender=="male"

ANNIC
□ … is based on Apache Lucene technology
□ … can index any document supported by GATE
□ … is integrated in GATE as a Searchable Serial Datastore (SSD)
□ … has an advanced GUI that provides:
○ A view of annotation mark-up over the matched patterns
○ An interactive way of developing new patterns
○ Annotation statistics

How does it work?
□ Integrated in GATE as a Searchable Serial Datastore (SSD)
○ Initialization
□ Where to store the index
□ What to index and what to exclude
□ Context boundary (e.g. restricted to sentence or paragraph boundaries)
○ Index actions linked with datastore actions
□ When a document is saved, index it, or re-index it if already indexed
□ When a document is deleted, delete it from the index
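The link between datastore actions and index actions can be pictured with a small sketch. This is a toy model only, not GATE or Lucene code: the class `ToySearchableStore`, its methods and the whitespace tokenisation are all hypothetical names and simplifications chosen for illustration.

```python
from collections import defaultdict


class ToySearchableStore:
    """Toy stand-in for a searchable datastore (hypothetical, not the GATE API)."""

    def __init__(self):
        self.docs = {}                 # doc_id -> stored text
        self.index = defaultdict(set)  # term -> ids of documents containing it

    def _unindex(self, doc_id):
        # Remove every index entry that points at this document.
        for term in self.docs.get(doc_id, "").lower().split():
            self.index[term].discard(doc_id)

    def save(self, doc_id, text):
        # Saving indexes the document, or re-indexes it if already indexed.
        self._unindex(doc_id)
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def delete(self, doc_id):
        # Deleting the document also deletes it from the index.
        self._unindex(doc_id)
        self.docs.pop(doc_id, None)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


store = ToySearchableStore()
store.save("d1", "GATE training in Sheffield")
store.save("d2", "Sheffield tweets")
after_first_save = store.search("sheffield")

store.save("d1", "GATE training")   # re-save: the old entries for d1 are replaced
after_resave = store.search("sheffield")

store.delete("d2")                  # delete: d2 disappears from the index
after_delete = store.search("sheffield")
```

The point of the sketch is only that the index never has to be maintained by hand: every save or delete on the store keeps it in step with the stored documents.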

Creating a Datastore
□ In GATE, right-click on Datastores, then choose Create Datastore
□ Specify a new, empty directory for the index
□ By default, the annotation sets to be indexed are the default set ( ) and the Key set (where, by convention, we put gold-standard annotations)
□ We want to index only the PreProcess annotation set
□ This must be specified at index creation time; it cannot be changed later

Create Lucene Datastore (2)
□ Click on the pencil button opposite Annotation Sets
□ In the list box, delete the default values, type PreProcess and press the Add button
□ Uncheck “Create Tokens Automatically”
□ Leave everything else at its default value
□ Click OK; the new datastore is now ready to use

ANNIC: The Query Language
□ JAPE-like LHS pattern syntax (AT = AnnotationType)
○ String, with or without quotes, e.g. "ubuntu"
○ {AnnotationType}, e.g. {Person}
○ {AnnotationType == string}, e.g. {Organization == "University of Sheffield"}
○ {AT.featureName == value}, e.g. {Person.gender == male}
○ {AT.feature == value, AT.feature == value}, e.g. {Token.orth == "upperInitial", Token.length == "3"}

ANNIC: The Query Language (2)
□ Kleene operators + and *, which must be given an upper bound
○ {Person}{Token}*3{Organization}: find a Person followed by an Organization with up to 3 tokens in between
□ Logical OR operator |
○ {A}({B} | {C})
□ The order of query terms is very important
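To see why the bound and the term order matter, here is a minimal sketch of how a query shaped like {Person}{Token}*3{Organization} could be evaluated. It is not how ANNIC is implemented: it assumes a flat list in which each item is a single annotation (real GATE annotations overlap), and `satisfies` and `find_with_gap` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Annotation = Dict[str, str]  # e.g. {"type": "Person"}


def satisfies(ann: Annotation, constraint: Annotation) -> bool:
    """True if every key/value pair of the constraint is present in ann."""
    return all(ann.get(key) == value for key, value in constraint.items())


def find_with_gap(seq: List[Annotation], first: Annotation, gap: Annotation,
                  max_gap: int, last: Annotation) -> List[Tuple[int, int]]:
    """Find (start, end) positions where `first` is followed by at most
    `max_gap` items matching `gap`, and then by `last`."""
    hits = []
    for i, ann in enumerate(seq):
        if not satisfies(ann, first):
            continue
        j, skipped = i + 1, 0
        while j < len(seq):
            if satisfies(seq[j], last):
                hits.append((i, j))
            # Stop once the bound is used up or the gap pattern breaks.
            if skipped == max_gap or not satisfies(seq[j], gap):
                break
            j += 1
            skipped += 1
    return hits


person, token, org = {"type": "Person"}, {"type": "Token"}, {"type": "Organization"}

# Invented toy sequence: Person, 2 Tokens, Organization, Token,
# Person, 4 Tokens, Organization.
seq = [person, token, token, org, token,
       person, token, token, token, token, org]

forward = find_with_gap(seq, person, token, 3, org)   # Person ... Organization
backward = find_with_gap(seq, org, token, 3, person)  # Organization ... Person
```

On this toy sequence the second Person is four tokens away from its Organization, so the bound of 3 excludes it, and swapping the first and last terms finds a different span altogether.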

Initiating ANNIC Pattern Searches
□ Populate a corpus from the annic-documents directory
□ Save the corpus to the newly created Lucene Datastore
□ Double-click on the datastore
□ Click on the “Lucene Datastore Searcher” tab at the bottom; this opens the ANNIC GUI
□ Choose which annotation set to search over (top right). By default you search over all sets, which is confusing, especially if you have many sets
□ Enter a test ANNIC query (e.g. {Lookup} or {Hashtag}) in the big search field, then press Search

Example: Building a Date pattern
□ Let us start by checking the {Lookup} annotations in the PreProcess set and the context in which they appear

Seeing More Context
□ Click the Configure button
□ In the dialog box, keep adding rows for the annotation types (and optionally features) that you’d like displayed in the viewer
□ A good set for our example is this:

Seeing More Context (2)

Building Up A Date Pattern
□ Let’s look for dates which contain a day of the week
□ We start the query by typing {Lookup.minorType=="day"}
□ 22 results are returned, and we can see from inspection that the subsequent word is typically a Lookup with minorType month
□ Expand the query: {Lookup.minorType=="day"}{Lookup.minorType=="month"}
□ This still returns 22 results, which means we haven’t lost anything or introduced noise
□ From inspection, we notice that what follows next is a number. These can be recognised with Token.kind == "number"
□ Final Date LHS pattern: {Lookup.minorType=="day"}{Lookup.minorType=="month"}{Token.kind=="number"}
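The refinement loop above (extend the pattern, re-run the search, compare the result counts) can be sketched in a few lines. This is an illustration under loud assumptions: the token list is invented and is not the tutorial corpus, each token is flattened into one dictionary whose keys imitate the query syntax, and `find_sequences` is a hypothetical helper rather than anything ANNIC provides.

```python
from typing import Dict, List

Token = Dict[str, str]


def satisfies(token: Token, constraint: Token) -> bool:
    """True if the token carries every feature value the constraint asks for."""
    return all(token.get(key) == value for key, value in constraint.items())


def find_sequences(tokens: List[Token], pattern: List[Token]) -> List[int]:
    """Return the start offsets where the constraints match consecutive tokens."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(satisfies(t, c) for t, c in zip(window, pattern)):
            hits.append(start)
    return hits


# Invented toy data standing in for "Mon Jan 12 see you Fri Mar 3".
toy = [
    {"string": "Mon", "Lookup.minorType": "day", "Token.kind": "word"},
    {"string": "Jan", "Lookup.minorType": "month", "Token.kind": "word"},
    {"string": "12", "Token.kind": "number"},
    {"string": "see", "Token.kind": "word"},
    {"string": "you", "Token.kind": "word"},
    {"string": "Fri", "Lookup.minorType": "day", "Token.kind": "word"},
    {"string": "Mar", "Lookup.minorType": "month", "Token.kind": "word"},
    {"string": "3", "Token.kind": "number"},
]

day = {"Lookup.minorType": "day"}
month = {"Lookup.minorType": "month"}
number = {"Token.kind": "number"}

# Grow the pattern one term at a time and record how many matches survive.
steps = [[day], [day, month], [day, month, number]]
counts = [len(find_sequences(toy, pattern)) for pattern in steps]
```

If the count stays the same after each extension, as it does on this toy data, the longer pattern has not dropped any of the earlier matches; a falling count is the cue to inspect the matches that were lost.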

Example Results

Hands-on: Expand to include the time
□ Double-click on the datastore and open the ANNIC GUI
□ In the ANNIC GUI:
○ Expand the pattern to include the time expressions

Converting the Pattern to a JAPE Rule
□ You might wish to create several different annotations from this JAPE LHS, e.g. Date, Time, and Offset
□ Use different named blocks in the pattern to achieve this
□ We leave this as homework, especially if you wish to link the year (which appears at the end) with the rest of the date
□ A relevant PR (processing resource) here is the DateNormalizer: –