GATE Mímir and cloud services
- Multi-paradigm indexing and search tool
- Pay-as-you-go large-scale annotation

University of Sheffield NLP

GATE Mímir
GATE Mímir is an indexing system for GATE documents. Mímir can index:
- Text: the original document content is indexed (based on Token annotations)
- Annotations: annotations and features
- Semantics: annotations can be linked to external ontologies which can be used at search time
Mímir queries allow for any combination of these

Why Mímir?
Standard search covers the text. GATE documents also have annotations, which give access to the document's:
- Structure (sections, titles, etc.)
- Linguistic features (nouns, verbs, etc.)
- Semantics
- Etc.
Examples:
- BBC News demo

Mímir query language
Simplest queries are free text
- text
- "quoted string"
Searching against the document text

Plain text queries

Token features
Free text queries are actually searching the string features of the GATE Token annotations
Can also search other token features, e.g. root (morphology) or category (POS)

Morphology

Gaps
Default combinator is sequence – terms must be adjacent
- Different from typical search engines
Can allow gaps with [n..m]
Arbitrary gap with AND
- x AND y finds the shortest span that covers both
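The three ways of combining terms on the Gaps slide can be sketched as small string-building helpers. This is only an illustration: the function names are hypothetical, and writing `[n..m]` between the two terms is an assumption based on the notation shown on the slide.

```python
def sequence(*terms: str) -> str:
    """Adjacent terms: the default combinator is a plain sequence."""
    return " ".join(terms)


def with_gap(left: str, right: str, min_gap: int, max_gap: int) -> str:
    """Allow a bounded gap between two terms, written as [n..m]."""
    if min_gap < 0 or max_gap < min_gap:
        raise ValueError("gap bounds must satisfy 0 <= min_gap <= max_gap")
    return f"{left} [{min_gap}..{max_gap}] {right}"


def any_gap(left: str, right: str) -> str:
    """Arbitrary gap: AND finds the shortest span covering both terms."""
    return f"{left} AND {right}"


print(sequence("prime", "minister"))
print(with_gap("prime", "minister", 0, 2))
print(any_gap("prime", "minister"))
```

Keeping each combinator in its own function makes it harder to forget that a bare space means "adjacent", which is the main difference from a typical search engine.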

Gaps

Annotations
So far nothing a standard search engine couldn’t do…
But Mímir also indexes annotations
Syntax:
- {AnnotationType feat1=val1 feat2=val2}
- Feature comparisons can be =, <, <=, >=, >
- UI offers pop-up with available types/features
Let’s generalise: any person, not just Harriet
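The annotation syntax on the slide above can be generated programmatically. The following is a minimal sketch: the helper name `annotation` and the `gender` feature are hypothetical, and wrapping values in double quotes is an assumption modelled on the quoted `sparql` value used later in the deck.

```python
def annotation(ann_type: str, **features: str) -> str:
    """Build an annotation query such as {Person} or {Person feat="value"}."""
    parts = [ann_type]
    for name, value in features.items():
        if '"' in value:
            # Escaping rules are not covered by the slides, so refuse
            # rather than guess how embedded quotes should be written.
            raise ValueError("feature values containing quotes are not handled")
        parts.append(f'{name}="{value}"')
    return "{" + " ".join(parts) + "}"


print(annotation("Person"))                   # any person
print(annotation("Person", gender="female"))  # hypothetical feature
```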

Annotations

Other operators
Containment (use parentheses to group)
- Query1 IN Query2
- Query1 OVER Query2

Other operators
Alternative
- Query1 OR Query2
- Query1 | Query2

Other operators
Set difference
- Query1 MINUS Query2
- Returns all spans that match Query1 but are not also matches of Query2
- E.g. sentences that don’t mention a location:
  {Sentence} MINUS ( {Sentence} OVER {Location} )
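To make the operators above concrete, here is a toy model that treats each query result as a list of (start, end) spans. It is not how Mímir is implemented. In particular, reading IN as "Query1 matches contained in a Query2 match" and OVER as "Query1 matches containing a Query2 match" is an assumption, since the slides only say "containment"; the offsets are made up.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def contains(outer: Span, inner: Span) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def op_in(q1: List[Span], q2: List[Span]) -> List[Span]:
    """Spans of q1 that sit inside some span of q2."""
    return [s for s in q1 if any(contains(o, s) for o in q2)]


def op_over(q1: List[Span], q2: List[Span]) -> List[Span]:
    """Spans of q1 that cover some span of q2."""
    return [s for s in q1 if any(contains(s, i) for i in q2)]


def op_or(q1: List[Span], q2: List[Span]) -> List[Span]:
    """Spans matching either query."""
    return sorted(set(q1) | set(q2))


def op_minus(q1: List[Span], q2: List[Span]) -> List[Span]:
    """Spans of q1 that are not also matches of q2."""
    excluded = set(q2)
    return [s for s in q1 if s not in excluded]


# {Sentence} MINUS ( {Sentence} OVER {Location} )
sentences = [(0, 40), (41, 90), (91, 120)]
locations = [(50, 56)]
no_location = op_minus(sentences, op_over(sentences, locations))
print(no_location)  # [(0, 40), (91, 120)]
```

Only the middle sentence covers the location span, so it is the one removed by MINUS.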

Try it!
- BBC News demo
Find:
- Document titles
- Date expressions
- Amounts of money being paid
- Amounts of money being received
Hint: if you get too much noise, try
- restricting to matches within a sentence
- using IN {Content} to ignore boilerplate

Semantics
Annotations may be linked to a knowledge base, e.g. DBpedia
- Annotation refers to an instance
- KB knows that this instance belongs to the class of politicians (and people, …)
- SPARQL query language can retrieve instances that match constraints
More tomorrow…

Semantics
In news demo, Person, Location and Organization have class and inst features

Semantic search
Can use SPARQL to query the KB at search time
E.g. to find all politicians:
{Person sparql="SELECT DISTINCT ?inst WHERE { ?inst a :Politician }"}
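The idea behind the query above can be illustrated without a real triple store. In this toy sketch a Python dictionary stands in for the knowledge base, and a set lookup stands in for the SPARQL query `?inst a :Politician`. The instance identifiers, names and helper functions are all made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical knowledge base: instance identifier -> classes it belongs to.
KB: Dict[str, Set[str]] = {
    "ex:Alice": {"Politician", "Person"},
    "ex:Bob": {"Person"},
}

# Hypothetical Person annotations, each carrying an inst feature.
ANNOTATIONS: List[dict] = [
    {"type": "Person", "inst": "ex:Alice", "text": "Alice"},
    {"type": "Person", "inst": "ex:Bob", "text": "Bob"},
]


def instances_of(cls: str) -> Set[str]:
    """Stand-in for the SPARQL query: every instance of the given class."""
    return {inst for inst, classes in KB.items() if cls in classes}


def semantic_filter(anns: List[dict], ann_type: str, allowed: Set[str]) -> List[dict]:
    """Keep annotations of ann_type whose inst feature is in the allowed set."""
    return [a for a in anns if a["type"] == ann_type and a.get("inst") in allowed]


politicians = semantic_filter(ANNOTATIONS, "Person", instances_of("Politician"))
print([a["text"] for a in politicians])  # ['Alice']
```

The point is the division of labour: the index stores only the link from annotation to instance, while class membership is resolved in the knowledge base at search time.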

Semantic search

Custom UI
Not the sort of query you want to construct by hand…
Mímir provides an XML-over-HTTP query API to allow programmatic querying
Can build custom UIs that hide the query language from users
- Try out the query builder, look at the underlying query
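A custom UI calling the HTTP API has to URL-encode the query, because the braces and spaces of the query language are not safe in a URL. The sketch below only builds the request URL and sends nothing. The endpoint path `search/postQuery`, the parameter name `queryString` and the example index URL are assumptions, not taken from the slides, and should be checked against the Mímir documentation.

```python
from urllib.parse import urlencode


def build_query_url(index_url: str, query: str) -> str:
    """Return a URL that would submit `query` to the given index.

    The path and parameter name below are placeholders for this sketch.
    """
    base = index_url.rstrip("/")
    return f"{base}/search/postQuery?{urlencode({'queryString': query})}"


# Hypothetical index URL
url = build_query_url("http://localhost:8080/mimir/my-index", "{Person} IN {Sentence}")
print(url)
```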

Building an index - you need:
- Some annotated GATE documents
- A description of which annotations and features you want to index
  - “Index template”
  - Only the features you specify in the template will be available for searching
- A running instance of the Mímir webapp
  - You can download and build your own (requires Grails)
- A way to push the documents to the server

Pushing documents
- GATE PR
  - “Mímir indexing PR” available as part of the Mímir source distribution
- GCP – the “GATE Cloud Parallelizer”
  - Tool to deploy a saved GATE application multi-threaded on your own machine
  - Includes various “output handlers” to save annotations to disk, or push them into Mímir
- or let us do it for you…

GATECloud.net
A cloud based service from the GATE team
Usual cloud benefits:
- Pay-as-you-go, no upfront hardware costs
- No sysadmin work
- Web-based management tools
- Always latest version, maintained by us
Not-so-usual cloud benefits
- Based on open-source software
- Bring your own pipeline

Features
- On demand document processing (a.k.a. Annotation Jobs)
  - Parallel processing on Amazon EC2
  - On-line job definition tool
  - Many output formats, including Mímir
- On demand servers, including Mímir
- Top up your account with vouchers from the University online shop

Architecture

Dedicated servers
- Rent a dedicated Mímir server for your private use
- Start and stop it as required
- Pay only for the hours it is running
- Data (i.e. indexes) persistent across reboots
- Backup and restore facility available

The shop

Reserving a server
The usual e-commerce experience
- Sign up for an account
- Buy a top-up voucher
- Add item(s) to your basket
- Checkout to complete the order
Server appears in your dashboard
Behind the scenes, creates a persistent data volume for your data

Dashboard

Reservation control panel

Controlling the server
- Start and stop instance
  - Startup/shutdown takes a few minutes; the system will notify you when the server is ready
  - You pay the hourly price whenever the instance is running
- Backup and restore
  - Save the state of your data volume so you can roll back later
- Destroy reservation
  - If you no longer need the server, destroy it to discard the data volume and all backups
  - This cannot be undone

Annotation jobs
Parallel and distributed annotation of documents with a GATE application
- Upload your own documents
  - zip, tar, arc/warc archives
- Upload your own pipeline
  - “Export for GATECloud.net”
- … or use a standard one
- Output annotations in various formats, or send documents directly to Mímir
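Since jobs accept zip archives as input, a corpus can be packaged with the Python standard library before upload. This is a minimal sketch: the directory layout, file names and the `package_documents` helper are hypothetical, and any structure the service expects inside the archive is not covered by the slides.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def package_documents(doc_dir: Path, archive: Path, pattern: str = "*.html") -> List[str]:
    """Zip every file under doc_dir matching pattern; return the entry names."""
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(doc_dir.rglob(pattern)):
            # Store paths relative to the corpus root, with forward slashes.
            arcname = path.relative_to(doc_dir).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


# Demonstration on two throwaway documents.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    (docs / "a.html").write_text("<html><body>First</body></html>", encoding="utf-8")
    (docs / "b.html").write_text("<html><body>Second</body></html>", encoding="utf-8")
    demo_names = package_documents(docs, Path(tmp) / "corpus.zip")
    print(demo_names)
```

Writing the files as UTF-8 HTML matters later, because the MIME type and encoding are declared once for the whole input.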

Job lifecycle

Execution environment
- Amazon EC2
- Ubuntu LTS, 64-bit
- Oracle Java 7
- ~2GB/thread RAM on average
- GCP 2.4, based on GATE Embedded 8.0

Reserving a job
Same process as servers
- Choose the job you want from the shop
- Add to basket
- Checkout
- Job appears in dashboard

Managing a job - application

Managing a job - input

Outputting to Mímir
- Start up the Mímir server we reserved earlier
- Create an appropriate index template
- Create an index

Create a template

Create an index

Index details – note URL

Managing a job - output

Start the job

When job completes…

“Sync” the index

“Sync” the index
- Mímir accumulates documents in RAM
- Documents are saved to disk after (by default) one hour, or when a memory threshold is reached
- Documents become searchable once saved to disk
- The “Sync” button forces an immediate save, for when you know no more documents are due
  - You can continue to send more documents, but only synced ones are available for search
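The buffering behaviour described above can be modelled in a few lines. This is a toy illustration, not Mímir's implementation: the class and its parameters are hypothetical, the memory threshold is approximated by a document count, and the age check runs only when a document is added rather than on a background timer.

```python
import time
from typing import Callable, List


class ToyIndex:
    """Documents are buffered in memory and become searchable once saved."""

    def __init__(self, max_buffered: int = 1000, max_age_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_buffered = max_buffered
        self.max_age_seconds = max_age_seconds
        self._clock = clock
        self._buffer: List[str] = []      # received, not yet searchable
        self._searchable: List[str] = []  # "saved to disk"
        self._last_save = clock()

    def add(self, doc: str) -> None:
        self._buffer.append(doc)
        too_many = len(self._buffer) >= self.max_buffered
        too_old = self._clock() - self._last_save >= self.max_age_seconds
        if too_many or too_old:
            self.sync()

    def sync(self) -> None:
        """Force an immediate save, like pressing the Sync button."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()
        self._last_save = self._clock()

    def search(self, word: str) -> List[str]:
        return [d for d in self._searchable if word in d]


index = ToyIndex(max_buffered=3)
index.add("shares went up")
before = index.search("shares")  # nothing yet: document is still buffered
index.sync()
after = index.search("shares")
print(before, after)
```

This is why a search run immediately after a job finishes can come back empty until the index has been synced.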

Search your new index

Try it!
- Sign up for an account
- Use your voucher code
- Reserve a Mímir 5.0 server
- Start it up, log in
- Create a new index template using the contents of index-template.groovy
- Create a new local index using this template
- Visit index admin page and note the URL

Try it!
- Reserve a “custom annotation job”
- Application zip file is annie-with-morph.zip
- Input is news-corpus-large.zip
  - Mime type: text/html, Encoding: UTF-8
- Set one output to MIMIR, using the Index URL you noted above
  - Make sure not to include any spaces in the index URL
- Run the job, and when finished sync the index
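Stray whitespace is easy to pick up when copying the index URL from the admin page. A small check along these lines, with a hypothetical helper name and example URLs, trims the ends of the pasted value and rejects anything with whitespace left inside it:

```python
def clean_index_url(url: str) -> str:
    """Strip surrounding whitespace; reject whitespace inside the URL."""
    cleaned = url.strip()
    if not cleaned:
        raise ValueError("index URL is empty")
    if any(ch.isspace() for ch in cleaned):
        raise ValueError("index URL must not contain spaces")
    return cleaned


print(clean_index_url("  http://example.org/mimir/my-index \n"))
```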

Try it!
- Try some searches on your new index
  - E.g. stock price movements: {Organization} (up | down) ({Money} | {Percent})
- When finished, make sure you stop the Mímir server and destroy the reservation

AnnoMarket.com
New development based on GATECloud.net, adding
- A much wider range of pre-packaged pipelines
- Simplified UI, plus REST APIs for job management and processing single documents
- “Test this pipeline” function to quickly try different pipelines on your text
- Access to crawled web data

AnnoMarket.com

Test this pipeline

For developers
- REST APIs for integration with other systems
- Sell your own pipelines on the platform and take a cut of the proceeds
  - Talk to us for more details
- Vouchers available if you want to try it!

Questions?
More info