Multimedia Synchronization Brian P. Bailey Spring 2006
MM Synchronization Applications composed of more than one medium (at least one continuous) Express desired relationships –content, spatial, temporal, and interaction –combinations of each
Content and Spatial Relationships Content –define how views relate to data sources –e.g., a graph linked to a table of data Spatial –define relative positions of media objects –subdivide the space; express relationships –e.g., pack command in Tcl, layout managers in Java, tables in HTML, etc.
Temporal Relationships Define how media are coordinated in time –audio should not drift from video by > 80ms –voice narration should accompany a slide and end when the user navigates elsewhere –display a different caption for each video scene, and update it in response to user interaction Intra-media and inter-media relationships Time-independent and time-dependent media
Lip Synchronization Left: audio after video; Right: audio before video
Lip Synchronization (figure: ranges of audio/video skew classified as not detectable, tolerable, and not tolerable)
Tele-pointer Synchronization Left: pointer before audio; Right: pointer after audio
Synchronization Guidelines Lip synchronization within 80ms –video before audio is more tolerable Other fine-grained synchronization should typically be within range of 500ms
Interaction Relationships Define how interaction affects playback –e.g., if user transitions to next slide in narrated slide show, narration should change as well Classes of interaction –navigation, participation, and control –asynchronous and synchronous
Synchronization Model Enables expression of media and synchronization relationships An effective model should: –support spatial and temporal relations (fine & coarse) –support rich interaction (beyond VCR control) –have an efficient runtime (interaction monitoring) –be usable and comprehensible
Models Timeline, Hierarchical, Petri net, Interval, Event-based Common threads –provide language to express relationships –runtime system to monitor relationships –policies to enforce relationships
Timeline Model Uses a single global timeline Actions triggered when the time marker reaches a specific point along timeline
Example Define a timed sequence of images; each image has a caption that goes with it (figure: timeline with image I1 and caption C1 at t1, I2 and C2 at t2, I3 and C3 at t3)
Example (Cont.) Rule language –At (t1), show (I1, C1) –At (t2), show (I2, C2) –At (t3), show (I3, C3) Visual environment
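A minimal sketch of how a runtime might evaluate the rule language above. The slides leave t1, t2, and t3 symbolic, so the times 0, 5, and 10 seconds are assumed, and the names `rules` and `actions_due` are hypothetical.

```python
# Hypothetical timeline rules: (trigger time in seconds, image, caption).
# The times are assumed for illustration; the slides leave t1..t3 symbolic.
rules = [(0.0, "I1", "C1"), (5.0, "I2", "C2"), (10.0, "I3", "C3")]


def actions_due(rules, previous, current):
    """Return the show() actions whose trigger time lies in (previous, current]."""
    return [(image, caption) for t, image, caption in sorted(rules)
            if previous < t <= current]


# Advance the global time marker in one-second steps and fire each rule
# when the marker reaches or passes its trigger time.
fired = []
marker = -1.0
for step in range(12):
    now = float(step)
    for image, caption in actions_due(rules, marker, now):
        fired.append((now, image, caption))
    marker = now
```

Checking the half-open window (previous, current] rather than testing for equality means a rule still fires if the marker steps over its trigger time.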
Hierarchical Model (SMIL) Based on sequential and parallel operators Apply operators to only the start/end points of each media object (figure: media objects I1, I2, I3 composed in sequence; I1 and T1 composed in parallel)
Example Narrated slide show –image, text, audio on each slide –select link to move to the next slide (figure: slide S1 with A1, T1, I1; slide S2 with A2, T2, I2; …)
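A rough sketch of how sequential and parallel operators could assign start and end points for the narrated slide show. The class names, the durations, and the policy that a parallel group ends with its longest child are assumptions for illustration; the link that moves to the next slide is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Media:
    name: str
    duration: float  # seconds; assumed values, not from the slides


@dataclass
class Seq:
    children: List[object] = field(default_factory=list)


@dataclass
class Par:
    children: List[object] = field(default_factory=list)


def schedule(node, start=0.0, out=None):
    """Assign (start, end) to every media object; return (end time, out)."""
    if out is None:
        out = {}
    if isinstance(node, Media):
        end = start + node.duration
        out[node.name] = (start, end)
    elif isinstance(node, Seq):
        end = start
        for child in node.children:  # each child starts where the previous ended
            end, _ = schedule(child, end, out)
    elif isinstance(node, Par):
        end = start
        for child in node.children:  # all children start together
            child_end, _ = schedule(child, start, out)
            end = max(end, child_end)  # assumed: group ends with its longest child
    else:
        raise TypeError(f"unknown node: {node!r}")
    return end, out


# Two slides in sequence; image, audio, and text in parallel on each slide.
show = Seq([
    Par([Media("I1", 8), Media("A1", 8), Media("T1", 8)]),
    Par([Media("I2", 6), Media("A2", 5), Media("T2", 6)]),
])
total, times = schedule(show)
```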
Timed Petri Nets tokens, places, transitions, and arcs
Example Specify audio video synchronization (figure: timed Petri net with durations of 11ms and 33ms)
Interval Model 13 relationships between two intervals A and B –Before, Meets, During, Overlaps, Starts, Ends, Equal –plus the inverses of the first six
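As an illustration, the relation between two intervals can be classified by comparing endpoints. This is a sketch, assuming intervals are (start, end) pairs with start < end; the function name `relation` and the "(inverse)" suffix are hypothetical.

```python
def relation(a, b):
    """Name the relation of interval a to interval b, each a (start, end) pair."""
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must have start < end")
    if a1 < b0:
        return "before"
    if a1 == b0:
        return "meets"
    if a0 == b0 and a1 == b1:
        return "equal"
    if a0 == b0 and a1 < b1:
        return "starts"
    if a1 == b1 and a0 > b0:
        return "ends"
    if a0 > b0 and a1 < b1:
        return "during"
    if a0 < b0 and b0 < a1 < b1:
        return "overlaps"
    # None of the seven base relations holds for (a, b), so one of the
    # first six must hold for (b, a).
    return relation(b, a) + " (inverse)"
```

For example, `relation((0, 2), (1, 3))` gives "overlaps", and swapping the arguments gives "overlaps (inverse)".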
Event Model (Nsync) Associate actions with expressions Expressions may contain scalars, clocks, variables, relations, and connectives When the expression becomes TRUE, invoke associated action –e.g., When “Time > Q.end + 5 && !Response” Answer=WRONG
Background and Time Model Each media object attached to a clock Clock implements logical time –Value = Rate * System + Offset Express temporal behavior as relationships among clocks Interactive events tied to variables
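A minimal sketch of a clock that implements the logical-time formula above. The class and method names are hypothetical, and the rule that a rate change must not cause a jump in logical time is an assumption about how such a clock would typically behave.

```python
class LogicalClock:
    """Logical time as a linear function of system time:
    value = rate * system + offset."""

    def __init__(self, rate=1.0, offset=0.0):
        self.rate = rate
        self.offset = offset

    def value(self, system):
        """Logical time at the given system time."""
        return self.rate * system + self.offset

    def set_rate(self, new_rate, system):
        """Change speed at the given system time.

        Assumption: the offset is recomputed so that logical time is
        continuous across the change.
        """
        current = self.value(system)
        self.rate = new_rate
        self.offset = current - new_rate * system
```

With this sketch, a rate of 0 pauses the attached media, a rate of 2 plays it at double speed, and a negative rate plays it in reverse.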
Example: Delayed Transition (figure: flowchart with Overview, a MoreInfo? decision with Yes and No branches, DetailedNarration, and a More Info button)
Model Specification When “Narration >= Overview && ! MoreInfo” NextSlide When “Narration >= Overview && MoreInfo” PlayDetails When “Narration >= Overview + Details” NextSlide Narration : narration’s logical timeline Overview : normal transition point Details : additional narrative details MoreInfo : records kitchen info status
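The three rules above can be exercised with a small edge-triggered evaluator. This is only a sketch: the `Rule` class, the `run` helper, and the durations (Overview = 10, Details = 5) are assumptions, and the loop polls once per second for simplicity.

```python
class Rule:
    """Fires its action once each time its expression changes from False to True."""

    def __init__(self, expression, action):
        self.expression = expression
        self.action = action
        self.was_true = False

    def update(self, state, log):
        is_true = self.expression(state)
        if is_true and not self.was_true:
            log.append(self.action)
        self.was_true = is_true


def run(more_info_pressed_at=None, overview=10, details=5):
    """Step the narration clock and return the actions fired, in order."""
    rules = [
        Rule(lambda s: s["Narration"] >= overview and not s["MoreInfo"], "NextSlide"),
        Rule(lambda s: s["Narration"] >= overview and s["MoreInfo"], "PlayDetails"),
        Rule(lambda s: s["Narration"] >= overview + details, "NextSlide"),
    ]
    log = []
    for t in range(overview + details + 1):
        pressed = more_info_pressed_at is not None and t >= more_info_pressed_at
        state = {"Narration": t, "MoreInfo": pressed}
        for rule in rules:
            rule.update(state, log)
        if "NextSlide" in log:  # the slide has changed, so stop this simulation
            break
    return log
```

Without a button press, `run()` moves on at the Overview point; with a press before that point, for example `run(more_info_pressed_at=4)`, the details play first and the transition is delayed.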
Reactive Interface
Model Specification When “Video >= 0 && Video < T1” Select Kitchen When “Video >= T1 && Video < T2” Select Deck When “Video >= T2 && Video <= T3” Select Yard
Expression Evaluation Propositional logic breaks down –returns logic value only at present time –requires polling to catch future transitions Predictive logic –returns logic value at present time along with a prediction of any future transition –eliminates need for intermittent polling/timers
Predictive Logic States WBT(t) False now, but Will Become True at future time t WBF(t) True now, but Will Become False at future time t
Prediction Example When “Video > 10” Action
Video time now = 0, Rate = 1
t = (then - now) / rate = (10 - 0) / 1 = 10, so the expression is WBT(10)
Prediction Example When “Video > 10” Action
Video time now = 3, Rate = 2
t = (then - now) / rate = (10 - 3) / 2 = 3.5, so the expression is WBT(3.5)
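The two prediction examples can be reproduced with a short function. This is a sketch under stated assumptions: the function name is hypothetical, t is treated as the delay in system time from now, and an expression that never changes is represented with an infinite time.

```python
import math


def predict_greater_than(value, rate, threshold):
    """Predict the state of `clock > threshold` from the clock's current
    logical value and rate.

    Returns (state, t): "WBT" means false now, true after delay t;
    "WBF" means true now, false after delay t. An infinite t means the
    expression never changes (an assumed convention).
    """
    if value > threshold:
        if rate < 0:
            # Running backwards: falls to the threshold after this delay.
            return ("WBF", (value - threshold) / -rate)
        return ("WBF", math.inf)  # true now and stays true
    if rate > 0:
        # t = (then - now) / rate
        return ("WBT", (threshold - value) / rate)
    return ("WBT", math.inf)  # false now and stays false
```

A runtime could use the returned delay to set a single timer for the transition instead of polling the expression.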
Evaluation Rules for “AND”
WBT(x) && WBT(y) = WBT( max(x, y) )
WBF(x) && WBF(y) = WBF( min(x, y) )
WBF(x) && WBT(y) = WBT(infinity) if (x < y); WBT(y) then WBF(x) otherwise
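The AND rules translate fairly directly into code. In this sketch a prediction is a hypothetical (state, time) pair, and the result is a list of predicted transitions in time order so that the "WBT(y) then WBF(x)" case can be represented; the extra guard for an infinite y is an added assumption, not part of the rules above.

```python
import math


def predictive_and(p, q):
    """Combine two predictions, each ("WBT", t) or ("WBF", t), under AND.

    Returns a list of predicted transitions in time order.
    """
    (sp, tp), (sq, tq) = p, q
    if sp == "WBT" and sq == "WBT":
        # Both must have become true.
        return [("WBT", max(tp, tq))]
    if sp == "WBF" and sq == "WBF":
        # Either becoming false makes the conjunction false.
        return [("WBF", min(tp, tq))]
    # Mixed case: x is the WBF time, y is the WBT time.
    x, y = (tp, tq) if sp == "WBF" else (tq, tp)
    if x < y or y == math.inf:  # the infinity guard is an added assumption
        # One operand turns false before the other turns true: never true.
        return [("WBT", math.inf)]
    return [("WBT", y), ("WBF", x)]
```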
Take Home Exercise
WBT(x) || WBT(y) = ?
WBF(x) || WBF(y) = ?
WBF(x) || WBT(y) = ?
Pros Complements current languages –adds ability to express combinations of interactive and temporal behavior –syntax can easily be translated into mark up Predictive logic useful in run-time engines –eliminates need for polling/timers –enables look-ahead pre-fetching
Cons Difficult to visualize rule propagation –makes system difficult to debug Rules are not grouped into hierarchies –hierarchies would enable a divide and conquer strategy Lack of scope –all rules are always active –actions must be guarded with complex expressions
Take Home Exercise Be able to model relationships within relatively simple applications Weigh tradeoffs between models