Tools for Automating the Captioning of Video


Presentation on theme: "Tools for Automating the Captioning of Video"— Presentation transcript:

1 Tools for Automating the Captioning of Video
Joseph Polizzotto, Access Technology Specialist Instructor, High Tech Center Training Unit (HTCTU)

2 Goals for this Session
- Identify common challenges
- Review YouTube's captioning process
- State rationale for automation in captioning
- Demonstrate use of automation tools:
  - Transcribing a video
  - Segmenting a transcript into chunks
  - Aligning a corrected transcript with the video
11/9/2018 High Tech Center Training Unit

3 Captioning Challenges
- Expensive ($2-3 per minute)
- Time-consuming
  - Depends on the video, but generally 5x the length of the video
- Turn-around times
  - Standard rate varies, but generally 3-4 business days
- Learning how to use the technology
- Third-party content must be captioned

4 YouTube Videos
- Sign-up is free
- Videos are hosted remotely
- YouTube auto-captions the videos!
Why not just use the built-in captioning tools?
- Free
- Fast
- Easy to learn
- Community captioning!
- Captions can be turned on and off

5 Captioning with YouTube
- Free Google/YouTube
- Upload video to YouTube
- Edit YouTube's auto-captions
  - Add speaker IDs
  - Add non-speech information
- Save and submit the corrected captions

6 YouTube Captioning Tools
- Crowdsourcing
- Upload transcripts and caption files
- Download caption files (SRT, VTT)
- YouTube recognizes non-speech sounds

7 YouTube Drawbacks
- Auto-captions
  - They are inaccurate
  - They may not generate
  - They don't have punctuation
  - They are seamless, appearing at silent intervals
- Correcting auto-captions
  - You must correct inside of caption blocks
  - You must correct caption blocks for good grammar

8 Captioning with YouTube
A Demonstration

9 Assessment of YouTube
Captioning in YouTube is great but...
- Remember quality caption standards
  - captions must have punctuation
  - captions should be ~32 characters per line
  - caption block divisions should reflect grammar
  - captions should not appear during silent intervals
- By automating the entire captioning process, YouTube may actually lead to:
  - a slower editing process
  - poorer quality caption blocks
Captioning Key

10 Automation and Captioning
- Automation is still a good idea!
- Automation is best one step at a time:
  - Transcription: what words are in the video
  - Segmentation: how these words are chunked
  - Alignment: when the chunks appear during the video

11 A "Semi-Automated" Workflow…
1) Transcribe the video (machine)
2) Edit the transcription (humans must do!)
3) Chunk the corrected transcript (machine)
4) Align the chunks with the video (machine)

12 1) Automation and Transcription
- Speech to Text Services (STT), e.g.,
  - IBM Watson
  - Google Cloud Speech API
  - Speechmatics
- Word Error Rates (WER)
  - State of the art is around 96% accuracy, i.e., a WER of about 4% (our own testing)
- Even with a quality transcription, you must edit:
  - misrecognitions
  - speaker identification
  - non-speech information

13 Word Error Rate (WER) Methodology
- Prepare a perfect transcript
  - Eliminate all punctuation
  - Place each word on its own line
- For each STT generated transcript,
  - Put each word on its own line
  - Eliminate insignificant differences such as spelling variants and capitalization
- Use DIFF and DIFFSTAT tools to compare the two
- Divide differences by number of words in perfect transcript
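The methodology above can be sketched in Python. This is only an illustration, not the presenter's script: `difflib` from the standard library stands in for the DIFF and DIFFSTAT tools, and the way differences are counted (a replaced run counts as the longer of its two sides) is an assumption, since the slides do not say how DIFFSTAT output was tallied. The function names are hypothetical.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text, drop punctuation, and return one word per list item."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level differences divided by the number of words in the reference."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Assumption: a substituted run counts as the longer of its two sides.
            errors += max(i2 - i1, j2 - j1)
        elif tag == "delete":
            errors += i2 - i1
        elif tag == "insert":
            errors += j2 - j1
    return errors / len(ref)


# One wrong word out of five in a made-up sentence gives 0.2 (20% WER).
print(word_error_rate("The quick brown fox jumps.", "the quick brown box jumps"))
```

A diff-style comparison is not guaranteed to find the smallest possible number of edits, so figures from a sketch like this may differ slightly from a strict edit-distance WER.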

14 Word Error Rate Comparison

15 Word Error Rate Percentage
Service               WER
YouTube                4%
Speechmatics           7%
Pop-up Archive         7%
Trint                  8%
Google Speech          9%
Google Voice Typing   13%
Dragon (trained)      14%
Dragon (untrained)    23%
IBM Watson            26%
Microsoft Bing        29%
Tests Performed: July 2017

16 YouTube Transcription
- Word Error Rate of 4% (best)
- Do It Yourself Captions (
  - Edit and download captions as plain text
  - Open video in Amara (for third party content)
- Aeneas Web App (
  - Can use to perform segmentation + alignment also (steps 3 and 4 in captioning process)
- us for script that will download TXT file
  - Adds punctuation too!
- Also check out:

17 Dragon Transcription
- Dragon Premium or Professional
- Speaker-dependent
  - Only one speaker in audio/video file
- A profile can be created from an audio file
  - Requires ~5 minute long recording
- Best transcription occurs when profile is trained
  - Correct misrecognitions
  - Save profile
- Save as DOCX or RTF

18 Google Voice Typing Transcription
- Google Docs Tool
- Speaker-independent
  - Transcribe audio file with multiple speakers
- Record the sound coming from the computer
  - For Mac, use Soundflower application
  - For PC, use Stereo Mix recording output
- Steps:
  - Play audio/video file
  - Activate Voice Typing

19 Transcribing with Google Voice Typing
A Demonstration

20 2) Editing the Transcript
- Inevitable in ANY captioning workflow...
  - add speaker IDs
  - add non-speech information
  - correct misrecognitions and add punctuation
- oTranscribe (
  - Free
  - Use offline
  - Opens in web browser
  - Link to YouTube videos
  - Easy shortcuts for video playback

21 Editing with oTranscribe
A Demonstration

22 3) Chunking the Transcript
- Chunks = caption blocks
- Quality caption blocks will not have:
  - More than two lines of text
  - More than ~32 characters per line
  - Two sentences on same line
  - Breaking of grammatical constructions
    - Preposition + prepositional phrases
- Text segmentation tools
  - Bash Script
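The mechanical rules above (line count, line length, one sentence per line) can be checked automatically. The following is a minimal sketch, not a tool from the presentation: `check_block` is a hypothetical name, the limits are the slide's approximate figures, and the grammatical rule (not splitting a preposition from its phrase) is left out because it needs real language analysis.

```python
import re

MAX_LINES = 2    # "more than two lines of text" is flagged
MAX_CHARS = 32   # the slide's approximate per-line limit


def check_block(block):
    """Return a list of problems found in one caption block (a multi-line string)."""
    problems = []
    lines = block.strip().split("\n")
    if len(lines) > MAX_LINES:
        problems.append("more than two lines")
    for line in lines:
        if len(line) > MAX_CHARS:
            problems.append("line longer than 32 characters: " + line)
        # Sentence-ending punctuation followed by more text on the same line.
        if re.search(r"[.!?]\s+\S", line):
            problems.append("two sentences on one line: " + line)
    return problems


print(check_block("Hello there.\nHow are you today?"))  # no problems found
print(check_block("First one. Second one."))            # flags two sentences
```

The sentence test is deliberately crude: an abbreviation such as "Dr. Smith" would be flagged, so a human still has to review the results.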

23 Chunking with a Bash Script
A Demonstration

24 4) Aligning the Chunks
- Alignment = adding time stamps
  - Caption chunks must be in sync with video
- Aeneas (
  - A Python/C library
  - Quickly creates caption files (e.g., SRT)
  - Use when you have a transcript for your video
  - Can be used with text files in up to 38 languages
  - Use from command line or via the Aeneas Web App
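For the command-line route, the sketch below builds the kind of call that the aeneas documentation describes for its `aeneas.tools.execute_task` module. It only assembles the command and prints it, so it runs without aeneas installed. The file names are hypothetical, and the configuration keys (`task_language`, `is_text_type`, `os_task_file_format`) are quoted from memory of that documentation, so check them against the installed version before relying on them.

```python
import shlex


def aeneas_command(audio_path, text_path, output_path, language="eng"):
    """Build an argument list for aeneas's execute_task command-line tool."""
    config = "|".join([
        "task_language=" + language,
        # Assumption: "subtitles" means blocks separated by blank lines.
        "is_text_type=subtitles",
        "os_task_file_format=srt",
    ])
    return ["python", "-m", "aeneas.tools.execute_task",
            audio_path, text_path, config, output_path]


# Hypothetical file names: the audio track, the chunked transcript, the output.
cmd = aeneas_command("audio.mp3", "chunks.txt", "captions.srt")
print(shlex.join(cmd))
```

To actually run the alignment, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where aeneas and its dependencies are installed.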

25 Aeneas Web App (AWA)
- Free sign up
- Aligns a transcript with a video (step 4)
- If you have a YouTube video, the AWA will also:
  - Download YouTube's transcription (step 1)
  - Allow you to edit the transcription (step 2)
  - Chunk the transcription (step 3)
- The caption file is sent to you in these formats: SRT, VTT, JSON, SAMI

26 Segmenting and Aligning with the Aeneas Web App
A Demonstration
Use Safari…
Remove the See more option with Videos…

27 Video Experiment

28 Video Details
- Title: "Orthodox Environmentalism"
- Speaker(s): Andrew Stephen Damick
- YouTube Link:
- Video Length: 3:17
- Key Words
  - Environmentalism
  - Orthodox
  - Seraphim of Sarov
- Possible Challenges
  - Bird chirping at beginning
  - Music a little overpowering at times

29 Experiment Goals
- How long it takes to generate a transcript:
  - Creating a transcript from scratch vs. editing an automatic transcript
- How long it takes to segment a transcript:
  - Manually create chunks vs. using a script
- How accurately the transcript is synchronized with video:
  - YouTube synchronization vs. Aeneas synchronization

30 Our Manual Benchmarks
Listen and Type
- Method:
  - Video open in one window
  - A text editor in another window
- Length of Time: 18:34.04
- Typing Speed: 234 CPM, 47 WPM
Listen and Echo (DNS v. 15)
- Method:
  - Video open in one window
  - MS Word in other window
- Length of Time: 10:49.53 (first pass) + 5:31.92 (editing mistakes) = 16:21.45 (total time)
- User Profile Notes: Profile had been used only a couple of times

31 Automatic Transcription Processing Time
- YouTube (High Speed): ~36 minutes to complete upload process + automatic captions
- Google Docs Voice Typing: 3:19.30
- Gentle: 2:37.99
- IBM Watson (High Speed): 2:25.14
- PocketSphinx: 1:54.05
- Dragon Professional (v. 15 for PC): 1:19.00
*This work is tedious
Will depend on the length and quality of the video
Video uploaded to YouTube had these specs:
- Complete name: /Users/jpolizzotto/Desktop/Orthodox Environmentalism-k_HZczxGfnY.mp4
- Format: MPEG-4
- Format profile: Base Media / Version 2
- Codec ID: mp42 (isom/mp42)
- File size: 30.0 MiB
- Duration: 3mn 17s
- Overall bit rate mode: Variable
- Video stream: ID 1, display aspect ratio 16:9, frame rate (24000/1001) fps, Bits/(Pixel*Frame) 0.052, ReFrames 3 frames, stream size 26.9 MiB (90%), duration 3mn 17s

32 Editing (Automated) Transcripts
- YouTube (YT captions editor): 8:12.11
- Google Voice Typing (oTranscribe): 8:48.77
- Gentle (oTranscribe): 16:02.07
- IBM Watson (oTranscribe): 12:13.84
- PocketSphinx (oTranscribe): 13:51.19
- Dragon Professional (oTranscribe): 9:16.11
- Dragon Professional (MS Word): 9:04.20
* "Manual Entry" used as a reference point.
Open two windows for editing
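The processing times on slide 31 and the editing times on this slide can be added to compare each tool with the 18:34.04 "listen and type" benchmark from slide 30. The sketch below does that arithmetic. Treating total effort as processing time plus editing time is an assumption made for illustration; YouTube is left out because its processing time is only given as roughly 36 minutes, and the Dragon row uses the oTranscribe editing time.

```python
def to_centiseconds(stamp):
    """Convert a 'M:SS.hh' time stamp to whole hundredths of a second."""
    minutes, seconds = stamp.split(":")
    whole, _, frac = seconds.partition(".")
    return (int(minutes) * 60 + int(whole)) * 100 + int(frac.ljust(2, "0")[:2])


def to_stamp(centiseconds):
    """Convert hundredths of a second back to 'M:SS.hh'."""
    minutes, rest = divmod(centiseconds, 6000)
    seconds, hundredths = divmod(rest, 100)
    return "%d:%02d.%02d" % (minutes, seconds, hundredths)


# (processing time, editing time) as reported on slides 31 and 32.
TIMES = {
    "Google Voice Typing": ("3:19.30", "8:48.77"),
    "Gentle": ("2:37.99", "16:02.07"),
    "IBM Watson": ("2:25.14", "12:13.84"),
    "PocketSphinx": ("1:54.05", "13:51.19"),
    "Dragon Professional": ("1:19.00", "9:16.11"),
}

totals = {
    tool: to_stamp(to_centiseconds(processing) + to_centiseconds(editing))
    for tool, (processing, editing) in TIMES.items()
}

for tool, total in totals.items():
    print(tool, total)
```

Working in whole hundredths of a second avoids floating-point rounding when the stamps are added.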

33 Key Findings
- Editing a "raw" transcript is faster than creating captions from scratch
- YouTube "auto" captions
  - May be fastest to edit BUT more work is necessary to edit caption blocks
- YouTube "auto" captions and Google Voice Typing
  - Do NOT use the same speech to text algorithm
- Recommendation
  - Google Voice Typing (free) for multiple speakers
  - Dragon NaturallySpeaking (paid) for a single speaker
  - Sphinx or Gentle STT may be better when sound is minimal

34 Other Findings
Noise and Hesitation Markers
- Sphinx adds [NOISE] marker
- IBM Watson adds %HESITATION marker
Noise Issues
- IBM Watson, Gentle, and Sphinx had a difficult time with noise
- Otherwise, IBM Watson was very accurate
Spacing Issues
- IBM Watson breaks utterances into new lines, increasing editing time
Punctuation Issues
- Only DNS inserts commas and periods
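Some of this cleanup can be done before the human editing pass. The sketch below strips the two markers named above and rejoins utterances that were broken onto separate lines. It is an illustration only: `clean_transcript` is a hypothetical helper, the sample text is made up, and real engine output may contain other markers not covered here.

```python
import re

# The two markers reported on the slide: Sphinx's [NOISE] and Watson's %HESITATION.
MARKERS = re.compile(r"\[NOISE\]|%HESITATION")


def clean_transcript(raw):
    """Remove noise and hesitation markers and rejoin line-broken utterances."""
    text = MARKERS.sub(" ", raw)
    # split() with no arguments also discards newlines, so the utterances
    # end up on a single line separated by single spaces.
    return " ".join(text.split())


sample = "this is a %HESITATION\nmade up [NOISE] sample"
print(clean_transcript(sample))
```

Punctuation, speaker IDs, and misrecognitions still have to be fixed by hand afterwards.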

35 Segmenting the Transcript
Manual Method
- Steps:
  - Using Text Wrangler, hard wrap at 40 characters per line
  - Edit for logical grammatical chunks
  - Add a blank line between sentences
- Time: 7:10.54
- Note: I used Text Wrangler to perform the manual method
Script Method
- Steps:
  - Run Perl script: sentence-boundary.pl places each sentence on its own line
  - Run Bash script:
    - caption blocks of < 40 characters
    - respect sentence breaks
    - space between caption blocks
- Time: 35.47
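The script method can be approximated in Python. This is a rough sketch and not the presenter's Perl and Bash scripts: it splits on sentence-ending punctuation, wraps each sentence into lines shorter than 40 characters, and groups lines into blocks of at most two lines (the two-line limit is borrowed from slide 22 and is an assumption here). The function name and the sample text are made up.

```python
import re
import textwrap


def chunk_transcript(text, width=40, lines_per_block=2):
    """Split text into caption blocks that never span two sentences."""
    # Step 1: put each sentence in its own list item.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    blocks = []
    for sentence in sentences:
        # Step 2: wrap so every line is shorter than `width` characters.
        lines = textwrap.wrap(sentence, width=width - 1)
        # Step 3: group the lines of one sentence into blocks.
        for i in range(0, len(lines), lines_per_block):
            blocks.append("\n".join(lines[i:i + lines_per_block]))
    # A blank line separates caption blocks.
    return "\n\n".join(blocks)


sample = "This is the first sentence of a sample transcript. Here is the second one."
print(chunk_transcript(sample))
```

Line breaks fall wherever the width runs out, so blocks still need a human check for grammatical breaks such as a preposition split from its phrase.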

36 Synching the Transcript
YouTube Synching
- Using an "unchunked" transcript, YouTube will create >42 character caption blocks
- Grammatical units are correctly joined (improvement!) as long as punctuation is added
- Accurate time stamps
- Each caption block remains on the screen until the next block
Aeneas Synching
- Accurate time stamps
- Caption blocks appear only for duration of relevant audio (remove non-speech intervals)
N.B. When uploading a "segmented" transcript to YouTube, YouTube will retain the same formatting of the caption blocks in the output subtitle file
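To make the difference concrete, the sketch below writes aligned blocks in SRT form, where each block carries its own start and end time. The timings and text are invented for illustration, and the function names are hypothetical. In the sample, the first block ends at 2.5 seconds and the second starts at 4.0 seconds, so nothing is shown during the silent interval between them.

```python
def srt_time(seconds):
    """Format seconds as an SRT time stamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3600000)
    minutes, ms = divmod(ms, 60000)
    secs, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, ms)


def to_srt(fragments):
    """Build SRT text from a list of (start_seconds, end_seconds, text) tuples."""
    entries = []
    for index, (start, end, text) in enumerate(fragments, start=1):
        entries.append("%d\n%s --> %s\n%s\n"
                       % (index, srt_time(start), srt_time(end), text))
    # Entries are separated by a blank line.
    return "\n".join(entries)


# Invented timings: the gap between 2.5 s and 4.0 s shows no caption.
fragments = [
    (0.0, 2.5, "First caption block"),
    (4.0, 6.25, "Second caption block"),
]
print(to_srt(fragments))
```

Setting each block's end time to the next block's start time would reproduce the behavior described for YouTube, where a caption stays on screen until the next one appears.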

37 Captioning Tips
- Share free resources / tools
- Educate about quality captions
- Encourage creation of a transcript beforehand
- For third party content, use Amara.org
  - Avoid violation of copyright
  - A link to a captioned video can be shared with students
Amara does not allow uploading a whole transcript; caption chunks will need to be done manually
Time stamps also will need to be done manually
Next: test YouTube auto segmenting steps

38 Summary
- YouTube's captioning process can inhibit the creation of quality captions
- Humans still required for editing the transcript
- Automate each step in a captioning workflow
  - Speech to text (STT) technology
  - Segmentation for chunking
  - Forced alignment tools for time stamping

39 Captioning Resources
- 3Play Media:
  - Popular captioning vendor
  - Articles and webinars on captioning laws and tools
- Captioning Key:
  - Information on quality captions
- Amara.org:
  - Caption third party content from YouTube et al.

