NXT meets the ICSI Corpus
Jean Carletta and Jonathan Kilgour
University of Edinburgh, HCRC Language Technology Group

ICSI Meeting Corpus
75 natural meetings from research groups
–close-talking and far-field microphones
orthographic transcription
"speech quality" tags (e.g., emphasis)
dialogue acts using MRDA
hot spots

The NITE XML Toolkit
library support for data handling and search, using a data model that can express both timing and complex structure
multiple file stand-off XML data storage
some standard GUIs, data utilities
library support for writing tailored GUIs

Stand-off XML
[Figure: extract from Bdb001.A.words.xml and extract from Bdb001.A.speech-quality.xml, shown with a timeline]
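The stand-off idea can be sketched in a few lines of Python: words sit in one document with their timings, and speech quality tags sit in a second document that only points at word ids. This is a minimal illustration under stated assumptions, not the actual ICSI or NXT file format: the element names, the `from`/`to` pointer attributes, and all of the data below are invented for the example.

```python
import xml.etree.ElementTree as ET

# Toy "words" document: each word carries an id and timing (invented data).
WORDS_XML = """
<words>
  <w id="w1" start="0.00" end="0.21">so</w>
  <w id="w2" start="0.21" end="0.48">that</w>
  <w id="w3" start="0.48" end="0.90">works</w>
  <w id="w4" start="0.90" end="1.10">then</w>
</words>
"""

# Toy stand-off document: tags hold no text, only pointers to word ids.
# The from/to attributes are a hypothetical pointer scheme for this sketch.
TAGS_XML = """
<tags>
  <speechquality id="sq1" type="emphasis" from="w2" to="w3"/>
</tags>
"""


def resolve(words_xml, tags_xml):
    """Follow each tag's pointers and inherit text and timing from its words."""
    words = list(ET.fromstring(words_xml).iter("w"))
    position = {w.get("id"): i for i, w in enumerate(words)}
    resolved = []
    for tag in ET.fromstring(tags_xml).iter("speechquality"):
        first = position[tag.get("from")]
        last = position[tag.get("to")]
        span = words[first:last + 1]
        resolved.append({
            "id": tag.get("id"),
            "type": tag.get("type"),
            "words": [w.text for w in span],
            "start": float(span[0].get("start")),
            "end": float(span[-1].get("end")),
        })
    return resolved


RESOLVED = resolve(WORDS_XML, TAGS_XML)
print(RESOLVED)
```

Because the tag file never copies the word text, several annotation layers can point at the same words without the files interfering with one another.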

Tasks
pre-NXT: up-translation and tokenization
hand annotation (topic segmentation, dialogue acts, extractive summaries, ...)
automatic annotation/indexing by query match

Queries in NXT
($a w):(TEXT($a) ~ /th.*/):: ($s speechquality):($s ^ $a) &&
Find instances of words starting with "th"
For each, find instances of speech quality tags of type "emphasis" that dominate the word
Discard words that are not dominated by at least one such tag
Use queries to understand data, verify quality, index.
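The three steps of the query can be mimicked in plain Python over in-memory data. This is only a sketch of the query's logic, not NXT's query engine or API: dominance is modelled as a tag listing the ids of the words it covers, the pattern is assumed to match the whole word, and the words, the tags, and the non-emphasis tag type "other" are invented for illustration.

```python
import re

# Toy data (invented): words, and tags that list the word ids they dominate.
WORDS = [
    {"id": "w1", "text": "so"},
    {"id": "w2", "text": "that"},
    {"id": "w3", "text": "works"},
    {"id": "w4", "text": "then"},
]
TAGS = [
    {"id": "sq1", "type": "emphasis", "children": ["w2", "w3"]},
    {"id": "sq2", "type": "other", "children": ["w4"]},
]


def emphasised_th_words(words, tags):
    """Return (word, dominating emphasis tags) pairs for words starting with 'th'."""
    pattern = re.compile(r"th.*")
    results = []
    for word in words:
        # Step 1: keep only words whose text matches the pattern.
        if not pattern.fullmatch(word["text"]):
            continue
        # Step 2: collect the emphasis tags that dominate this word.
        dominating = [
            tag for tag in tags
            if tag["type"] == "emphasis" and word["id"] in tag["children"]
        ]
        # Step 3: discard words with no dominating emphasis tag.
        if dominating:
            results.append((word, dominating))
    return results


MATCHES = emphasised_th_words(WORDS, TAGS)
for word, tags in MATCHES:
    print(word["text"], [tag["id"] for tag in tags])
```

In this toy data "then" also starts with "th", but it is dropped because the only tag covering it is not of type "emphasis".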

NXT as Meeting Browser
Browser = display + signal indexing + search
NXT data displays:
–synchronize with signal
–highlight search results

Issues
Already can't load all the ICSI data at once on some machines
NXT supports display of one meeting at a time, but browsing may be over several meetings
Really complicated queries are often too slow for browser response times
Key: pre-indexing of query results, tailored data builds
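One way to read "pre-indexing of query results" is to run the expensive query offline, once per meeting, and let the browser do nothing but a lookup. The sketch below assumes that reading; the JSON index file, the function names, the stand-in query, and the meeting data are all hypothetical and are not part of NXT.

```python
import json
import os
import tempfile


def slow_query(words):
    """Stand-in for an expensive query: ids of words starting with 'th'."""
    return [w["id"] for w in words if w["text"].startswith("th")]


def build_index(meetings, query, path):
    """Offline step: run the query once per meeting and store the results."""
    index = {name: query(words) for name, words in meetings.items()}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)


def lookup(path, meeting):
    """Browser-time step: read stored results without loading any meeting data."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle).get(meeting, [])


# Toy data (invented): two meetings, each a list of words.
MEETINGS = {
    "meeting_a": [{"id": "w1", "text": "so"}, {"id": "w2", "text": "that"}],
    "meeting_b": [{"id": "w1", "text": "then"}, {"id": "w2", "text": "this"}],
}

INDEX_PATH = os.path.join(tempfile.mkdtemp(), "th_words.json")
build_index(MEETINGS, slow_query, INDEX_PATH)
print(lookup(INDEX_PATH, "meeting_a"))
print(lookup(INDEX_PATH, "meeting_b"))
```

The index spans several meetings even though only one meeting's data needs to be displayed at a time, which is the trade-off the slide points at.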

Conclusions
NXT available, free, open source, useful in a surprising number of ways