EARS STT Workshop at ICASSP, March 2005

Christopher Cieri, Mohamed Maamouri, Shudong Huang, James Fiumara, Stephanie Strassel, David Graff, Kevin Walker, Mark Liberman
{ccieri,maamouri,shudong,jfiumara,

What Happens Next?
Collect feedback here
Check feasibility of new ideas
  – e.g. availability of BN (tran)scripts
Estimate cost, timeline for wish list
Sponsors allocate funds
EARS Board revise priorities
Re-estimate cost, timeline for task list
Communicate final plan
“Start”

What Happened Next?
Feedback was generally favorable
Next day learned of 3 month projects
Received 25% funding
Preparation of utility thresholds
Learned of TIDES/EARS end
Learned that GALE <> TIDES+EARS
Completed existing commitments
  – STT Test Sets (MT Test Set)
  – CTS Collections
Adjusted focus to GALE preparation

Broadcast News
Continue 2004 collection
  – >2000h English: VOA, NBC/MSNBC, CNN, ABC, PBS, PRI, WB17
  – >1000h Chinese: VOA, CCTV, Radio Free Asia (RFA), NTDTV, Tai Yuan
  – >1000h Arabic: VOA, Al Hurra, Al Jazeera, Dubai, Jordan TV, LBC, Nile
Select 2005 evaluation set then distribute 2004 data (February 2005)
  – delivery made after eval set picked
2005 Collection same sources, volumes
  – add semi-automatic language, source, program ID to QC process
  – harvest (tran)scripts where possible
  – 100 hours of transcribed Chinese BN (commercial, QTr)
  – 100 hours of transcribed Arabic BN (commercial, QTr)
  – collect broadcast conversations: audio and (tran)scripts
Continue IPR negotiations
Contribute to Experiments
  – Utility of Careful vs. Commercial vs. QTr. vs. CC. vs. Roverized ASR
Update pronouncing Lexicons with vocab from English, Chinese, Arabic
Continue collection with sources adjusted for GALE
  – Greater focus on broadcast conversation
  – Total: 62.5 hrs/week of Arabic, 60 hrs/week of Chinese, 75 hrs/week of English
  – BC: 2.5 hrs/week Arabic, 15 hrs/week Chinese, 25 hrs/week English
  – Acquired IPR for several new programs: 100% of English; 50% of Arabic, Chinese
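The weekly rates on this slide can be turned into a broadcast-conversation (BC) share per language. The sketch below is only illustrative arithmetic: it assumes the BC hours are a subset of the weekly totals rather than additional hours, and the names `WEEKLY_HOURS` and `bc_share` are made up for the example.

```python
# Weekly collection rates from the slide: (total hours, broadcast-conversation hours).
# Assumption: the BC figures are part of the totals, not extra hours on top of them.
WEEKLY_HOURS = {
    "Arabic": (62.5, 2.5),
    "Chinese": (60.0, 15.0),
    "English": (75.0, 25.0),
}


def bc_share(total_hours, bc_hours):
    """Return the fraction of weekly collection that is broadcast conversation."""
    return bc_hours / total_hours


for language, (total, bc) in WEEKLY_HOURS.items():
    print(f"{language}: {bc_share(total, bc):.1%} of {total} hrs/week is BC")
```

Under that assumption the Arabic BC share is much smaller than the Chinese or English share, which is consistent with the later note that Arabic scouting was lagging.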

English CTS
Volume: complement 2003 collection to provide another 1400 hours (was 850) with subjects making minute calls
Used November 2003 Topics
BBNT/WordWave doing transcription
Complete collection of 1400 hours
Finalize evaluation set
Distribute beginning in December as transcripts are ready
1400 hours sent to BBN/WordWave for transcription
450 hours distributed to sites February 17

Chinese CTS
New Collection at HKUST
  – Target 200 hours transcribed, gender balance, regions represented
Transcription based upon RT hours in delivered to LDC so far
  – regions not balanced across delivery increments
Select 2005 evaluation & dev/test sets
  – to control demographics across train/test sets
Deliver training data once final increment has arrived and evaluation data extracted
Repeat collection in 2005
  – require gender, age, regional balance across collection epoch
  – require word segmentation?
Build portable platform?
HKUST finished Collection of 150 hours of CTS
  – ready for release once test set extracted
  – will deliver 50 more hours at end of March
  – will collect & transcribe another 50 hours through June

Arabic CTS
Fisher Protocol, platform in US
Select 2005 evaluation set from current collection
Continue collection until current pool sapped
Complete audit and transcription; deliver in December
Add ‘yellow’ tier (surface phonemic) transcription
Build portable platform?
Begin new dialect?
Demographics changed since last test sets created
  – new Dev/Test as well as Eval set required
Finished 50 hours of Levantine Arabic CTS
Released on 01/15/2005 as LDC2005S07 & LDC2005T03
50 more hours of Levantine due March 31, hours scheduled June 30, 2005 ???
Yellow layer transcription of 15h underway
RT rates improving: 8-10xRT on green, 15xRT yellow (assuming green)
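As a rough illustration of what the real-time (RT) rates imply, the sketch below converts audio hours into transcriber hours. It assumes "NxRT" means N hours of labor per hour of audio, and that the yellow-layer rate is the extra effort on top of an existing green layer. The function name `labor_hours` is hypothetical.

```python
def labor_hours(audio_hours, rt_factor):
    """Transcriber hours needed for the given audio at an N x real-time rate."""
    return audio_hours * rt_factor


# Rates quoted on the slide.
GREEN_RT_RANGE = (8, 10)  # green layer: 8-10 x real time
YELLOW_RT = 15            # yellow layer: 15 x real time, assuming green exists

# Green layer for a 50-hour Levantine increment (low and high estimate).
green_low, green_high = (labor_hours(50, factor) for factor in GREEN_RT_RANGE)

# Yellow layer for the 15 hours reported as underway.
yellow = labor_hours(15, YELLOW_RT)

print(f"Green layer, 50h audio: {green_low}-{green_high} labor hours")
print(f"Yellow layer, 15h audio: {yellow} labor hours")
```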

STT Test Sets
None

MDE
Ported English specification v6.2 to Chinese, Arabic
Created MDE v7 specification, tool for English
Created Chinese and Arabic tools
Created small pilot data set in each language
Distributed as: LDC2004E47

GALE Preparation
Created 13 new Fisher English topics designed to elicit ACE-worthy conversations
Collected 500 conversations; manually selected 25% for transcription.
ACE transcribed; are in ACE annotation pipeline
LDC Staff read DLI DLPT material in Arabic
LDC Staff read WSJ articles
In preparation for GALE, adding new source types: e-lists, blogs, chat, technical reports, GovDocs
Built general purpose speech annotation toolkit; ready April 1.

Distribution Rules
Most EARS sites are LDC members
Those who are not have data under evaluation agreement
  – Require return at end of program
  – LDC will offer extension; sites not part of GALE by June 2005 must return data then
  – Or non-members, non-GALE sites can keep data by becoming LDC members
Exception: drive arrays of BN data. These must be returned by both members and non-members not involved in GALE
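One possible reading of these rules, written out as a decision function. This is a sketch of the slide's wording, not LDC policy text: the function name and parameters are hypothetical, and it assumes that GALE participation is enough to keep any EARS data, including the BN drive arrays.

```python
def must_return_data(is_ldc_member, in_gale, is_bn_drive_array=False):
    """Return True if a site has to send the data back after June 2005.

    Assumption: sites taking part in GALE keep everything under the
    extension LDC offers.
    """
    if in_gale:
        return False
    if is_bn_drive_array:
        # Exception on the slide: members and non-members alike return
        # the BN drive arrays when they are not involved in GALE.
        return True
    # Otherwise only non-members must return; joining LDC lets a site keep data.
    return not is_ldc_member


print(must_return_data(is_ldc_member=False, in_gale=False))
print(must_return_data(is_ldc_member=True, in_gale=False, is_bn_drive_array=True))
```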

GALE-related efforts
Data scouting in English, Chinese, Arabic
  – Exploring new domains
      Broadcast conversation (roundtable, talk shows, call-ins)
      Web text (blogs, newsgroups, chat, discussion forums)
  – Defining best practices
      Identifying, Harvesting, Formatting, Licensing
  – Researching more economical sources, methods
      Transcripts, story segmentation
      Annotation efficiencies
Local infrastructure in place
  – Annotation toolkit
  – Annotation guidelines & web resources guide
  – Scouting teams for English, Chinese; Arabic lagging
Sharable version of tools, docs in progress
To date,
  – English: 270 sites identified (16 topics)
  – Chinese: 57 sites identified (10 topics)
  – Arabic: 10 sites identified (3 topics)
  – All of these now/soon in ACE annotation pipeline
  – IPR secured under “fair use”

Documentation

Process
Use search engine to find sites for each type
  – Minimum thresholds for each data type/subject
Tool tallies good/bad sites identified; logs URLs/judgments to DB
Categorize URLs as good or bad for TIDES-type annotation
  – “Bad” URLs are not revisited for a topic
The left side of the web scouting tool shows a tally of the data types found for the annotator’s topic.
The bottom pane of the tool is a window where the annotator inputs information, including data type, title, and URL, for each site that he finds.
The top pane of the tool is occupied by a web browser.
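The bookkeeping described on this slide (log each judgment to a database, tally data types per topic, never revisit a "bad" URL for the same topic, check minimum thresholds) might look roughly like the sketch below. It is not the LDC scouting tool: the class `ScoutingLog`, its methods, the table layout, and the example thresholds are all hypothetical, and an in-memory SQLite database stands in for whatever DB the tool actually used.

```python
import sqlite3


class ScoutingLog:
    """Hypothetical stand-in for the scouting tool's judgment log."""

    def __init__(self, min_per_type):
        # min_per_type: assumed minimum number of good sites per data type.
        self.min_per_type = dict(min_per_type)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE judgments ("
            "topic TEXT, url TEXT, data_type TEXT, title TEXT, good INTEGER, "
            "PRIMARY KEY (topic, url))"
        )

    def should_skip(self, topic, url):
        """True if this URL was already judged bad for this topic."""
        row = self.db.execute(
            "SELECT good FROM judgments WHERE topic = ? AND url = ?",
            (topic, url),
        ).fetchone()
        return row is not None and row[0] == 0

    def record(self, topic, url, data_type, title, good):
        """Log one judgment; refuse to revisit a URL marked bad for the topic."""
        if self.should_skip(topic, url):
            return False
        self.db.execute(
            "INSERT OR REPLACE INTO judgments VALUES (?, ?, ?, ?, ?)",
            (topic, url, data_type, title, int(good)),
        )
        return True

    def tally(self, topic):
        """Count good sites per data type for a topic."""
        rows = self.db.execute(
            "SELECT data_type, COUNT(*) FROM judgments "
            "WHERE topic = ? AND good = 1 GROUP BY data_type",
            (topic,),
        ).fetchall()
        return dict(rows)

    def unmet(self, topic):
        """Return how many more good sites each data type still needs."""
        counts = self.tally(topic)
        return {
            data_type: minimum - counts.get(data_type, 0)
            for data_type, minimum in self.min_per_type.items()
            if counts.get(data_type, 0) < minimum
        }


log = ScoutingLog({"blog": 2, "newsgroup": 1})
log.record("elections", "http://example.org/a", "blog", "Site A", good=True)
log.record("elections", "http://example.org/b", "blog", "Site B", good=False)
print(log.tally("elections"), log.unmet("elections"))
```

Keying the table on (topic, URL) means a site judged bad for one topic can still be considered for another, which matches the slide's "not revisited for a topic" wording.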

Up-to-minute updates