Tools for Automating the Captioning of Video


Joseph Polizzotto, Access Technology Specialist Instructor
High Tech Center Training Unit (HTCTU)

Goals for this Session
- Identify common challenges
- Review YouTube's captioning process
- State rationale for automation in captioning
- Demonstrate use of automation tools:
  - Transcribing a video
  - Segmenting a transcript into chunks
  - Aligning a corrected transcript with the video
11/9/2018 High Tech Center Training Unit

Captioning Challenges
- Expensive ($2-3 per minute)
- Time-consuming
  - Depends on the video, but generally 5x the length of the video
- Turn-around times
  - Standard rate varies, but generally 3-4 business days
- Learning how to use the technology
- Third-party content must be captioned

YouTube Videos
- Sign-up is free
- Videos are hosted remotely
- YouTube auto-captions the videos!
Why not just use the built-in captioning tools?
- Free
- Fast
- Easy to learn
- Community captioning!
- Captions can be turned on and off

Captioning with YouTube
- Free Google/YouTube
- Upload video to YouTube
- Edit YouTube's auto-captions
  - Add speaker IDs
  - Add non-speech information
- Save and submit the corrected captions

YouTube Captioning Tools
- Crowdsourcing
- Upload transcripts and caption files
- Download caption files (SRT, VTT)
- YouTube recognizes non-speech sounds

YouTube Drawbacks
- Auto-captions
  - They are inaccurate
  - They may not generate
  - They don't have punctuation
  - They are seamless, staying on screen during silent intervals
- Correcting auto-captions
  - You must correct inside of caption blocks
  - You must correct caption blocks for good grammar

Captioning with YouTube: A Demonstration

Assessment of YouTube
Captioning in YouTube is great, but...
- Remember quality caption standards:
  - captions must have punctuation
  - captions should be ~32 characters per line
  - caption block divisions should reflect grammar
  - captions should not appear during silent intervals
- By automating the entire captioning process, YouTube may actually lead to:
  - a slower editing process
  - poorer quality caption blocks
Captioning Key

Automation and Captioning
- Automation is still a good idea!
- Automation is best one step at a time:
  - Transcription: what words are in the video
  - Segmentation: how these words are chunked
  - Alignment: when the chunks appear during the video

A "Semi-Automated" Workflow…
1) Transcribe the video (machine)
2) Edit the transcription (humans must do!)
3) Chunk the corrected transcript (machine)
4) Align the chunks with the video (machine)

1) Automation and Transcription
- Speech to Text Services (STT), e.g.,
  - IBM Watson
  - Google Cloud Speech API
  - Speechmatics
- Word Error Rates (WER)
  - State of the art is around 96% accuracy, i.e., a WER of about 4% (our own testing)
- Even with a quality transcription, you must edit:
  - misrecognitions
  - speaker identification
  - non-speech information

Word Error Rate (WER) Methodology
- Prepare a perfect transcript
  - Eliminate all punctuation
  - Place each word on its own line
- For each STT generated transcript:
  - Put each word on its own line
  - Eliminate insignificant differences such as spelling variants and capitalization
- Use DIFF and DIFFSTAT tools to compare the two
- Divide differences by number of words in perfect transcript
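The methodology above can be sketched in a few lines of Python. This is only an illustration, not the script used for the tests: it swaps in Python's standard `difflib` module for the DIFF and DIFFSTAT tools, so its counts may not match theirs exactly, and it does not normalize spelling variants. The function names and the sample sentences are made up for the example.

```python
import difflib
import re

def words(text):
    """Lowercase the text, drop punctuation, and return one word per list item."""
    # Keep letters, digits, and apostrophes; everything else becomes a space.
    return re.sub(r"[^a-z0-9']+", " ", text.lower()).split()

def word_error_rate(reference, hypothesis):
    """Differences between the two word lists divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    differences = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Count a changed run by its longer side, so a one-word
            # substitution counts as one difference, not two.
            differences += max(i2 - i1, j2 - j1)
    return differences / len(ref)

# Hypothetical sample: a "perfect" transcript and an STT transcript of it.
reference = "The forest is quiet, and the birds are singing."
hypothesis = "the forest is quite and birds are singing"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Because punctuation and capitalization are removed before the comparison, only word-level differences (substitutions, omissions, insertions) contribute to the rate.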

Word Error Rate Comparison

Tool                   Word Error Rate
YouTube                4%
Speechmatics           7%
Pop-up Archive         7%
Trint                  8%
Google Speech          9%
Google Voice Typing    13%
Dragon (trained)       14%
Dragon (untrained)     23%
IBM Watson             26%
Microsoft Bing         29%
Tests performed: July 2017

YouTube Transcription
- Word Error Rate of 4% (best)
- Do It Yourself Captions (http://www.diycaptions.com/)
  - Edit and download captions as plain text
  - Open video in Amara (for third party content)
- Aeneas Web App (www.aeneasweb.org)
  - Can also be used to perform segmentation + alignment (steps 3 and 4 in captioning process)
- Email us for script that will download TXT file
  - Adds punctuation too!
- Also check out: http://captionsconverter.com

Dragon Transcription
- Dragon Premium or Professional
- Speaker-dependent
  - Only one speaker in audio/video file
- A profile can be created from an audio file
  - Requires ~5 minute long recording
- Best transcription occurs when profile is trained
  - Correct misrecognitions
  - Save profile
- Save as DOCX or RTF

Google Voice Typing Transcription
- Google Docs Tool
- Speaker-independent
  - Transcribe audio file with multiple speakers
- Record from sound coming from computer
  - For Mac, use Soundflower application
  - For PC, use Stereo Mix recording output
- Steps:
  - Play audio/video file
  - Activate Voice Typing

Transcribing with Google Voice Typing: A Demonstration

2) Editing the Transcript
- Inevitable in ANY captioning workflow…
  - add speaker IDs
  - add non-speech information
  - correct misrecognitions and add punctuation
- oTranscribe (http://otranscribe.com/)
  - Free
  - Use offline
  - Opens in web browser
  - Link to YouTube videos
  - Easy shortcuts for video playback

Editing with oTranscribe: A Demonstration

3) Chunking the Transcript
- Chunks = caption blocks
- Quality caption blocks will not have:
  - More than two lines of text
  - More than ~32 characters per line
  - Two sentences on same line
  - Breaking of grammatical constructions
    - Preposition + prepositional phrases
- Text segmentation tools
  - Bash Script
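The quality rules on this slide are mechanical enough to check automatically. The sketch below is a hypothetical checker, not one of the tools demonstrated in the session: the function name, the short preposition list, and the sentence-boundary test are all assumptions, and the heuristics will misfire on abbreviations such as "Mr. Smith".

```python
import re

MAX_LINES = 2
MAX_CHARS = 32
# A small illustrative list, not an exhaustive set of prepositions.
PREPOSITIONS = {"of", "in", "on", "at", "to", "for", "with", "by", "from", "into"}

def check_block(block):
    """Return a list of rule violations for one caption block (lines separated by newlines)."""
    problems = []
    lines = block.splitlines()
    if len(lines) > MAX_LINES:
        problems.append("more than two lines")
    for line in lines:
        if len(line) > MAX_CHARS:
            problems.append(f"line over {MAX_CHARS} characters: {line!r}")
        # Sentence-ending punctuation followed by more text on the same line.
        if re.search(r"[.!?]\s+\S", line):
            problems.append(f"two sentences on one line: {line!r}")
        # A line ending in a preposition separates it from its phrase.
        tokens = line.split()
        if tokens and tokens[-1].lower() in PREPOSITIONS:
            problems.append(f"line ends with a preposition: {line!r}")
    return problems

print(check_block("The birds sang in\nthe forest."))
print(check_block("We walked into the forest\nand listened to the birds."))
```

A checker like this does not replace human review, but it can flag blocks worth a second look after an automated chunking pass.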

Chunking with a Bash Script: A Demonstration

4) Aligning the Chunks
- Alignment = adding time stamps
  - Caption chunks must be in sync with video
- Aeneas (https://github.com/readbeyond/aeneas)
  - A Python/C library
  - Quickly creates caption files (e.g., SRT)
  - Use when you have a transcript for your video
  - Can be used with text files in up to 38 languages
  - Use from command line or via the Aeneas Web App
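To make "adding time stamps" concrete, the sketch below shows what an aligned caption file looks like in SRT form. It does not call Aeneas. It assumes the start and end times for each chunk are already known (in the workflow above, the alignment tool computes them), and the times, text, and function names here are invented for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT time stamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    entries = []
    for number, (start, end, text) in enumerate(cues, start=1):
        # Each entry: sequence number, time range, caption text.
        entries.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive entries.
    return "\n".join(entries)

# Hypothetical aligned chunks; real values would come from the aligner.
cues = [
    (0.0, 2.5, "The birds sing in the forest"),
    (4.0, 6.25, "every morning before sunrise."),
]
print(to_srt(cues))
```

Note the gap between 2.5 and 4.0 seconds in the sample: each caption carries its own end time, so nothing needs to stay on screen during a silent interval.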

Aeneas Web App (AWA)
- Free sign up
- Aligns a transcript with a video (step 4)
- If you have a YouTube video, the AWA will also:
  - Download YouTube's transcription (step 1)
  - Allow you to edit the transcription (step 2)
  - Chunk the transcription (step 3)
- The caption file is sent to your email in these formats: SRT, VTT, JSON, SAMI

Segmenting and Aligning with the Aeneas Web App: A Demonstration
Use Safari… Remove the See more option with Videos… https://youtu.be/kdwEnaNmCGw

Video Experiment

Video Details
- Title: "Orthodox Environmentalism"
- Speaker(s): Andrew Stephen Damick
- YouTube Link: https://youtu.be/k_HZczxGfnY
- Video Length: 3:17
- Key Words
  - Environmentalism
  - Orthodox
  - Seraphim of Sarov
- Possible Challenges
  - Bird chirping at beginning
  - Music a little overpowering at times

Experiment Goals
- How long it takes to generate a transcript:
  - Creating a transcript from scratch vs. editing an automatic transcript
- How long it takes to segment a transcript:
  - Manually creating chunks vs. using a script
- How accurately the transcript is synchronized with video:
  - YouTube synchronization vs. Aeneas synchronization

Our Manual Benchmarks
Listen and Type
- Method: video open in one window, a text editor in another window
- Length of Time: 18:34.04
- Typing Speed: 234 CPM, 47 WPM
Listen and Echo (DNS v. 15)
- Method: video open in one window, MS Word in other window
- Length of Time: 10:49.53 (first pass) + 5:31.92 (editing mistakes) = 16:21.45 (total time)
- User Profile Notes: profile had been used only a couple of times

Automatic Transcription Processing Time
- YouTube (High Speed): ~36 minutes to complete upload process + automatic captions
- Google Docs Voice Typing: 3:19.30
- Gentle: 2:37.99
- IBM Watson (High Speed): 2:25.14
- PocketSphinx: 1:54.05
- Dragon Professional (v. 15 for PC): 1:19.00
*This work is tedious
Will depend on the length and quality of the video
Video uploaded to YouTube had these specs:
General
  Complete name: /Users/jpolizzotto/Desktop/Orthodox Environmentalism-k_HZczxGfnY.mp4
  Format: MPEG-4
  Format profile: Base Media / Version 2
  Codec ID: mp42 (isom/mp42)
  File size: 30.0 MiB
  Duration: 3mn 17s
  Overall bit rate mode: Variable
  Overall bit rate: 1 277 Kbps
  Encoded date: UTC 2016-11-16 11:25:49
  Tagged date: UTC 2016-11-16 11:25:49
  gsst: 0
  gstd: 197090
Video
  ID: 1
  Format settings, ReFrames: 3 frames
  Duration: 3mn 17s
  Bit rate: 1 147 Kbps
  Width: 1 280 pixels
  Display aspect ratio: 16:9
  Frame rate: 23.976 (24000/1001) fps
  Minimum frame rate: 23.974 fps
  Maximum frame rate: 23.981 fps
  Bits/(Pixel*Frame): 0.052
  Stream size: 26.9 MiB (90%)

Editing (Automated) Transcripts
Tool (editor)                         Editing time
YouTube (YT captions editor)          8:12.11
Google Voice Typing (oTranscribe)     8:48.77
Gentle (oTranscribe)                  16:02.07
IBM Watson (oTranscribe)              12:13.84
PocketSphinx (oTranscribe)            13:51.19
Dragon Professional (oTranscribe)     9:16.11
Dragon Professional (MS Word)         9:04.20
* "Manual Entry" used as a reference point.
Open two windows for editing
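One way to compare these figures with the manual benchmark is to add each tool's processing time to its editing time. The sketch below does that arithmetic in whole hundredths of a second to avoid rounding drift. It assumes the two times simply add, which the slides do not state, and it leaves out YouTube because its processing time is only given approximately. The helper names are made up for the example.

```python
def to_centiseconds(stamp):
    """Convert an 'M:SS.cc' time into an integer number of hundredths of a second."""
    minutes, rest = stamp.split(":")
    seconds, centis = rest.split(".")
    return (int(minutes) * 60 + int(seconds)) * 100 + int(centis)

def to_stamp(centiseconds):
    """Convert hundredths of a second back into 'M:SS.cc'."""
    minutes, rest = divmod(centiseconds, 6000)
    seconds, centis = divmod(rest, 100)
    return f"{minutes}:{seconds:02d}.{centis:02d}"

MANUAL = "18:34.04"  # listen-and-type benchmark reported in the slides

# (processing time, editing time) as reported in the slides
TOOLS = {
    "Google Voice Typing + oTranscribe": ("3:19.30", "8:48.77"),
    "Gentle + oTranscribe": ("2:37.99", "16:02.07"),
    "IBM Watson + oTranscribe": ("2:25.14", "12:13.84"),
    "PocketSphinx + oTranscribe": ("1:54.05", "13:51.19"),
    "Dragon Professional + oTranscribe": ("1:19.00", "9:16.11"),
    "Dragon Professional + MS Word": ("1:19.00", "9:04.20"),
}

def totals(tools):
    """Return processing + editing time per tool, formatted as 'M:SS.cc'."""
    return {
        name: to_stamp(to_centiseconds(processing) + to_centiseconds(editing))
        for name, (processing, editing) in tools.items()
    }

for name, total in totals(TOOLS).items():
    print(f"{name}: {total} (manual typing: {MANUAL})")
```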

Key Findings
- Editing a "raw" transcript is faster than creating captions from scratch
- YouTube "auto" captions
  - May be fastest to edit, BUT more work is necessary to edit caption blocks
- YouTube "auto" captions and Google Voice Typing
  - Do NOT use the same speech-to-text algorithm
- Recommendation
  - Google Voice Typing (free) for multiple speakers
  - Dragon NaturallySpeaking (paid) for a single speaker
  - Sphinx or Gentle STT may be better when sound is minimal

Other Findings
- Noise and Hesitation Markers
  - Sphinx adds [NOISE] marker
  - IBM Watson adds %HESITATION marker
- Noise Issues
  - IBM Watson, Gentle, and Sphinx had a difficult time with noise
  - Otherwise, IBM Watson was very accurate
- Spacing Issues
  - IBM Watson breaks utterances into new lines, increasing editing time
- Punctuation Issues
  - Only DNS inserts commas and periods

Segmenting the Transcript
Manual Method
- Steps:
  - Using TextWrangler, hard wrap at 40 characters per line
  - Edit for logical grammatical chunks
  - Add a blank line between sentences
- Time: 7:10.54
Script Method
- Steps:
  - Run Perl script: sentence-boundary.pl places each sentence on its own line
  - Run Bash script:
    - caption blocks of < 40 characters
    - respect sentence breaks
    - space between caption blocks
- Time: 35.47 seconds
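The script method can be approximated in Python as a rough sketch. This is not the Perl or Bash script used in the session. It assumes that "< 40 characters" applies to each line, that a block holds at most two lines, and that a sentence ends at a period, exclamation mark, or question mark followed by whitespace. It also makes no attempt to keep grammatical units such as prepositional phrases together, so a human pass is still needed.

```python
import re
import textwrap

def split_sentences(text):
    """Put each sentence in its own list item (naive split after . ! or ?)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def chunk(text, width=39, lines_per_block=2):
    """Return caption blocks separated by blank lines; no block mixes two sentences."""
    blocks = []
    for sentence in split_sentences(text):
        # Wrap one sentence at a time so a line never holds two sentences.
        lines = textwrap.wrap(sentence, width=width)
        for i in range(0, len(lines), lines_per_block):
            blocks.append("\n".join(lines[i:i + lines_per_block]))
    return "\n\n".join(blocks)

# Hypothetical transcript text.
transcript = "The birds sing in the forest every morning before sunrise. We listen."
print(chunk(transcript))
```

Output in this shape (blocks separated by blank lines) is the kind of "segmented" transcript that can then be handed to an alignment step.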

Synching the Transcript
YouTube Synching
- Using an "unchunked" transcript, YouTube will create >42 character caption blocks
- Grammatical units are correctly joined (improvement!) as long as punctuation is added
- Accurate time stamps
- Each caption block remains on the screen until the next block
Aeneas Synching
- Accurate time stamps
- Caption blocks appear only for duration of relevant audio (remove non-speech intervals)
N.B. When uploading a "segmented" transcript to YouTube, YouTube will retain the same formatting of the caption blocks in the output subtitle file

Captioning Tips
- Share free resources / tools
- Educate about quality captions
- Encourage creation of a transcript beforehand
- For third party content, use Amara.org
  - Avoid violation of copyright
  - A link to a captioned video can be shared with students
Amara does not allow uploading a whole transcript; caption chunks will need to be done manually. Time stamps will also need to be done manually.
Next: test YouTube auto segmenting steps

Summary
- YouTube's captioning process can inhibit the creation of quality captions
- Humans are still required for editing the transcript
- Automate each step in a captioning workflow
  - Speech to text (STT) technology for transcription
  - Segmentation tools for chunking
  - Forced alignment tools for time stamping

Captioning Resources
- 3Play Media: http://www.3playmedia.com/
  - Popular captioning vendor
  - Articles and webinars on captioning laws and tools
- Captioning Key: http://www.captioningkey.org/
  - Information on quality captions
- Amara.org: https://www.amara.org/en/
  - Caption third party content from YouTube et al.