WP4-22. Final Evaluation of Subtitle Generator Vincent Vandeghinste, Pan Yi CCL – KULeuven.

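Example
Transcript: Het meest spectaculaire aan de daadwerkelijke start van de euro is dat er eigenlijk niets spectaculairs te melden valt. (English: The most spectacular thing about the actual start of the euro is that there is really nothing spectacular to report.)
Subtitle: Het meest spectaculaire aan de start van de euro was dat er niets spectaculairs te melden valt. (English: The most spectacular thing about the start of the euro was that there is nothing spectacular to report.)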

Flow

Availability Calculator
Pronunciation time of input sentence => estimate the nr of characters available in the subtitle
If the pronunciation time is unknown, estimate it from:
– the nr of syllables in the sentence
– the average speaking rate for Dutch

Syllable Counter
Rule-based
Evaluated on the CGN lexicon combined with FREQ-lists
Estimated nr of syllables is compared with the nr of syllables in the phonetic transcripts
The syllable count is correctly estimated for 99.63% of all words in CGN
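The slide does not list the rules the syllable counter uses. As a stand-in only, the sketch below counts maximal runs of vowel letters, a deliberately naive heuristic that should not be expected to reach the accuracy reported above. The names `VOWEL_GROUP` and `count_syllables` are hypothetical.

```python
import re

# Assumption: one syllable per maximal run of vowel letters.
# This is NOT the rule set evaluated on the slide, which is not given.
VOWEL_GROUP = re.compile(r"[aeiouyàáâäèéêëìíîïòóôöùúûü]+", re.IGNORECASE)

def count_syllables(word: str) -> int:
    """Naive estimate: number of vowel-letter runs in the word."""
    return len(VOWEL_GROUP.findall(word))

def count_sentence_syllables(sentence: str) -> int:
    """Sum the naive per-word estimates over whitespace-separated words."""
    return sum(count_syllables(word) for word in sentence.split())
```

A heuristic like this miscounts words where a diaeresis starts a new syllable inside a vowel run, which is one reason a real rule set needs more than vowel grouping.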

Average Syllable Duration (ASD)

ASD             No pauses   Pauses included
Literature      177 ms (single value reported)
All CGN files   186 ms      237 ms
One Speaker     185 ms      239 ms
Read-aloud      188 ms      256 ms

Availability Calculator
When the pronunciation time is not given: estimate it
Subtitles: 70 chars / 6 sec ≈ 11.67 chars/sec
If nr of chars > nr of available chars => compress sentence
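A minimal sketch of the calculation described on this slide, assuming the pronunciation time is estimated as syllables × ASD and that the available character count is rounded down. The default ASD of 186 ms is the "All CGN files, no pauses" value from the ASD table; the function names and the rounding choice are assumptions, not part of the original system.

```python
SUBTITLE_CHARS = 70      # from the slide: 70 characters ...
SUBTITLE_SECONDS = 6     # ... per 6 seconds
DEFAULT_ASD = 0.186      # seconds per syllable (assumed default, see lead-in)

def available_chars(pronunciation_time=None, n_syllables=None, asd=DEFAULT_ASD):
    """Number of subtitle characters available for one sentence.

    If the pronunciation time (seconds) is unknown, it is estimated
    from the syllable count and the average syllable duration.
    """
    if pronunciation_time is None:
        if n_syllables is None:
            raise ValueError("need pronunciation_time or n_syllables")
        pronunciation_time = n_syllables * asd
    # Rounding down is an assumption; the slide does not say how to round.
    return int(pronunciation_time * SUBTITLE_CHARS / SUBTITLE_SECONDS)

def needs_compression(sentence, **timing):
    """True when the sentence is longer than the available nr of chars."""
    return len(sentence) > available_chars(**timing)
```

For example, `available_chars(pronunciation_time=6.0)` gives the full 70 characters, while a 30-syllable sentence with unknown timing is estimated at 5.58 seconds of speech.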

Sentence Compressor
Parallel Corpus
Sentence Analysis
Sentence Compression
Evaluation

Parallel Corpus
Sentence aligned
Source & Target corpus:
– Tagging
– Chunking
– SSUB detection
Chunk alignment

Chunk Alignment
Every 4-gram from the source chunk is compared with every 4-gram from the target chunk
A = (m / (m + n)) · (L1 + L2) / 2
If A > 0.315, the chunks are aligned
F-value for NP/PP alignment is 95%
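The alignment score and threshold can be sketched as below. The slide does not define m, n, L1 or L2, so the code takes them as opaque inputs and does not attempt to derive them from the chunks; the function names and the example values are hypothetical.

```python
ALIGN_THRESHOLD = 0.315  # threshold given on the slide

def alignment_score(m, n, l1, l2):
    """A = (m / (m + n)) * (L1 + L2) / 2, with inputs as on the slide.

    The meaning of m, n, l1 and l2 is not defined in the slides;
    they are passed through unchanged here.
    """
    if m + n == 0:
        return 0.0  # assumption: no evidence means no alignment
    return (m / (m + n)) * (l1 + l2) / 2

def should_align(m, n, l1, l2):
    """Align the two chunks when the score exceeds the threshold."""
    return alignment_score(m, n, l1, l2) > ALIGN_THRESHOLD

# Hypothetical input values, chosen only to show both outcomes.
print(should_align(3, 1, 0.5, 0.5))  # score 0.375
print(should_align(1, 3, 0.5, 0.5))  # score 0.125
```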

Sentence Analysis
Tagging (TnT): accuracy = 96.2% (Oostdijk et al., 2002)
Chunking:

Chunk type   Precision   Recall   F-value
NP           94.36%      93.91%   94.13%
PP           94.84%      95.22%   95.03%

Sentence Analysis (2)
SSUB detection:

Type of S   Precision   Recall   F-value
OTI         71.43%      65.22%   68.18%
RELP        69.66%      68.89%   69.27%
SSUB        56.83%      60.77%   58.74%
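The F-values in the chunking and SSUB-detection tables are consistent with the usual balanced F-measure, the harmonic mean of precision and recall. The sketch below assumes that definition (the slides do not state it) and recomputes the reported figures; `f_value` and `REPORTED` are hypothetical names.

```python
def f_value(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-value) in percent, copied from the slides.
REPORTED = {
    "NP":   (94.36, 93.91, 94.13),
    "PP":   (94.84, 95.22, 95.03),
    "OTI":  (71.43, 65.22, 68.18),
    "RELP": (69.66, 68.89, 69.27),
    "SSUB": (56.83, 60.77, 58.74),
}

for label, (p, r, reported) in REPORTED.items():
    print(f"{label}: computed {f_value(p, r):.2f}, reported {reported:.2f}")
```

Small differences in the last digit are to be expected, since the precision and recall on the slides are themselves rounded.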

Sentence Compression
Use of statistics
Use of rules
Word reduction
Selection of the Compressed Sentence

Use of statistics

Use of rules
Purpose: to avoid generating ungrammatical sentences
Rules are of the type: "For every NP, never remove the head noun"
Rules are applied recursively

Word Reduction
Example: replace gevangenisstraf (prison sentence) by straf (punishment)
Counterexample: replace voetbal (football) by bal (ball)
Makes use of the Wordbuilding module (WP2)
Introduces a lot of errors: added accuracy?
Better integration with the rest of the system should be possible

Selection of the Compressed Sentence
All previous steps result in an ordered list of sentence alternatives:
– Supposedly grammatically correct
– Sentences are ordered by their probability
– The first (most probable) sentence with a length smaller than the available nr of chars is chosen
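The selection step can be sketched as follows, assuming each alternative comes with a probability and that returning nothing corresponds to the "No output" case reported in the evaluation. The function name, the tuple layout, and the example candidates are hypothetical.

```python
def select_compressed(candidates, available_chars):
    """Pick the most probable alternative that fits the subtitle.

    candidates: list of (sentence, probability) pairs.
    Returns the chosen sentence, or None when no alternative is short
    enough (assumed here to be the "No output" case).
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    for sentence, _probability in ranked:
        # "smaller than" on the slide is read as a strict comparison.
        if len(sentence) < available_chars:
            return sentence
    return None

# Hypothetical alternatives with made-up probabilities.
alternatives = [("aaaa bbbb cccc", 0.5), ("aaaa bbbb", 0.3), ("aaaa", 0.2)]
print(select_compressed(alternatives, 10))
```

The most probable alternative is skipped here because it is too long, so the second one is chosen.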

Evaluation

Condition              A            B            C
ASD                    185 ms/syl   192 ms/syl   256 ms/syl
No output              44.33%       41.67%       15.67%
Reduction rate         39.93%       37.65%       16.93%
Interrater agreement   86.2%        86.9%        91.7%
Accurate               4.8%         8.0%         28.9%
± accurate             28.1%        26.3%        22.1%
Reasonable             32.9%        34.3%        51%

Subtitle Layout Generator
Actieve of gewezen voetballers zoals Ruud Gullit of Dennis Bergkamp moeten het stellen met nauwelijks anderhalf miljard.
becomes
Actieve of gewezen voetballers zoals Ruud Gullit of Dennis Bergkamp moeten het stellen met nauwelijks anderhalf miljard.
(English: Active or former footballers such as Ruud Gullit or Dennis Bergkamp have to make do with barely one and a half billion.)
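The slides do not say how the layout generator breaks a sentence into subtitle lines. As an illustration only, the sketch below uses a greedy word wrap from Python's standard `textwrap` module; the line width of 40 characters and the name `layout_subtitle` are assumptions.

```python
import textwrap

MAX_LINE_WIDTH = 40  # hypothetical line width; not given in the slides

def layout_subtitle(sentence, width=MAX_LINE_WIDTH):
    """Split a sentence into lines of at most `width` characters.

    Greedy wrapping at word boundaries is an assumption, not the
    method used by the original layout generator.
    """
    return textwrap.wrap(sentence, width=width)

sentence = ("Actieve of gewezen voetballers zoals Ruud Gullit of Dennis "
            "Bergkamp moeten het stellen met nauwelijks anderhalf miljard.")
lines = layout_subtitle(sentence)
for line in lines:
    print(line)
```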

Conclusion
System approach works very well:
– If sentence analysis is correct
– If there are possible reductions (according to the ruleset)
A lot of No Output cases: the system cannot reduce the sentence
– Sentence cannot be reduced (even by humans)
– Rule-set is too strict / wrong sentence analysis
– Statistical info is not fine-grained enough
Bad output:
– Wrong sentence analysis (CONJ)
– Wrong word reductions

Future
Near future (within Atranos):
– Better integration of word reduction
– Combine advantages of the CNTS approach and the CCL approach into one approach
Far future (outside Atranos):
– Better sentence analysis: a full parse is needed
– More fine-grained analysis of the parallel corpus