
Semantic representation of events in 3D animation
Minhua Eunice Ma and Paul Mc Kevitt
School of Computing and Intelligent Systems, Faculty of Informatics
University of Ulster, Northern Ireland

Seanchaí: an Intelligent MultiMedia storyteller
- Homer (story generation)
- CONFUCIUS (story interpretation & presentation)
- Data: user input (text: stories, play/movie scripts), natural language stories
- Output: multimodal presentation

CONFUCIUS: story interpretation & presentation
- Inputs: a story in natural language (from the storywriter/playwright), or a movie/drama script (entered through a tailored menu for script input)
- Outputs to the user/story listener: 3D animation, non-speech audio, speech (dialogue)

Architecture of CONFUCIUS
- Input: natural language stories
- Script writer and script parser
- Natural language processing, drawing on language knowledge (mapping lexicon, grammar, etc.) and producing semantic representations
- Visual knowledge: 3D graphic library, plus prefabricated objects (knowledge base) built with 3D authoring tools
- Animation generation
- Text to speech
- Sound effects
- Synchronizing & fusion
- Output: 3D world with audio in VRML

Semantic Representation Languages
Sentence-level semantics
- FOPC (First Order Predicate Calculus)
- Semantic networks
- Conceptual Dependency (CD) (Schank 1973): primitives and scripts
- Frame-based representations (Minsky 1975)
Verb semantics
- Event-logic truth conditions (Siskind 1995)
- x-schemas with f-structures (Bailey et al. 1997)

MultiModal semantic representation
- High level: multimodal semantic representation, XML-based/frame-based (media-independent representation)
- Intermediate level: visual media-dependent representation and audio media-dependent representation
- Modalities: language modality, visual modality, non-speech audio modality

Knowledge base of CONFUCIUS
- Language knowledge
  - Semantic knowledge: lexicons (e.g. WordNet)
  - Syntactic knowledge: grammars
  - Statistical models of language
  - Associations between words
- Visual knowledge
  - Object model (nouns)
    - Functional information
    - Internal coordinate axes (for spatial reasoning)
    - Associations between objects
  - Event model (event verbs; describes the motion of objects)
- World knowledge
- Spatial & qualitative reasoning knowledge

Categories of events  Atomic entities Change physical location such as position and orientation, e.g. “bounce”, “turn” Change intrinsic attributes such as shape, size, color, and texture, e.g. “bend”, and even visibility, e.g. “disappear”, “fade” (in/out)  Non-atomic entities Non-character events Two or more individual objects fuse together, e.g. “ melt ” (in) One object divides into two or more individual parts, e.g. “ break ” (into pieces) Change sub-components (their position, size, color), e.g. “ blossom ” Environment events (weather verbs), e.g. “ snow ”, “ rain ” Character events Action verbs  Intransitive verbs  Transitive verbs Non-action verbs (stative, emotion, possession, mental activities, cognition & perception) Idioms & metaphor verbs

Categories of action verbs  Intransitive verbs Biped kinematics, e.g. “walk”, “swim”, & other motion models like “fly” Face expressions, e.g. “laugh”, “anger” Lip movement, e.g. “speak”, “say”  Transitive verbs single object, e.g. “throw”, “push”, “kick” multiple objects direct and indirect objects, e.g. “ give ”, “ pass ”, “ show ” indirect object & the tool used to perform the action, e.g. “ cut ”, “ hammer ” involve speech modality

Basic predicate-arguments
1) move(obj, xInc, yInc, zInc)
2) moveTo(obj, loc)
3) moveToward(obj, loc, displacement)
4) rotate(obj, xAngle, yAngle, zAngle)
5) faceTo(obj1, obj2)
6) alignMiddle(obj1, obj2, axis)
7) alignMax(obj1, obj2, axis)
8) alignMin(obj1, obj2, axis)
9) alignTouch(obj1, obj2, axis)
10) touch(obj1, obj2, axis)
11) scale(obj, rate)
12) squash(obj, rate, axis)
13) group(x, [y|_], newObj)
14) ungroup(xyList, x, yList)
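
The atomic predicates in this list can be pictured as operations on a simple scene record. The following is a minimal Python sketch, not the CONFUCIUS code (the slides say that is written in Java and VRML). It assumes a hypothetical Obj record that holds a centre position, a per-axis size and Euler angles. The Python names (Obj, AXES, move_to) are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical axis-name-to-index table used by the single-axis predicates.
AXES = {"x": 0, "y": 1, "z": 2}


@dataclass
class Obj:
    """Hypothetical scene object: centre position, per-axis size, Euler rotation (degrees)."""
    name: str
    pos: list   # [x, y, z] of the centre
    size: list  # extent along x, y, z
    rot: list   # [xAngle, yAngle, zAngle]


def move(obj, x_inc, y_inc, z_inc):
    """move(obj, xInc, yInc, zInc): translate by a relative offset."""
    obj.pos = [obj.pos[0] + x_inc, obj.pos[1] + y_inc, obj.pos[2] + z_inc]


def move_to(obj, loc):
    """moveTo(obj, loc): translate to an absolute location."""
    obj.pos = list(loc)


def rotate(obj, x_angle, y_angle, z_angle):
    """rotate(obj, xAngle, yAngle, zAngle): add Euler angles (a simplification)."""
    obj.rot = [(r + a) % 360 for r, a in zip(obj.rot, (x_angle, y_angle, z_angle))]


def scale(obj, rate):
    """scale(obj, rate): uniform scaling about the centre."""
    obj.size = [s * rate for s in obj.size]


def squash(obj, rate, axis):
    """squash(obj, rate, axis): scale along one axis only."""
    obj.size[AXES[axis]] *= rate
```

For example, applying move(ball, 0, 20, 0) and then move(ball, 0, -20, 0) to a ball at the origin returns it to [0, 0, 0]. squash(ball, 0.5, "y") halves only its height.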

Hierarchical structure of predicates
- 3rd level: touch()
- 2nd level: moveToward(), alignMiddle(), alignTouch(), alignMax(), alignMin(), faceTo()
- Atomic level: move(), moveTo(), rotate(), scale(), squash()

[Figure: front view and top view of obj1 and obj2, before and after touch(); obj1 is in the front, obj2 is on the top]
    touch(obj1, obj2, x) :- alignMiddle(obj1, obj2, y), alignMiddle(obj1, obj2, z), alignTouch(obj1, obj2, x).
    touch(obj1, obj2, y) :- alignMiddle(obj1, obj2, z), alignMiddle(obj1, obj2, x), alignTouch(obj1, obj2, y).
    touch(obj1, obj2, z) :- alignMiddle(obj1, obj2, x), alignMiddle(obj1, obj2, y), alignTouch(obj1, obj2, z).
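
The three touch() rules share one pattern. They centre obj1 on obj2 along the two other axes and then make contact along the named axis. The following is a minimal Python sketch of that pattern on axis-aligned bounding boxes. The Box class and the exact behaviour of align_touch are assumptions: here obj1 keeps to the side of obj2 it started on. The alignMax/alignMin semantics (matching the upper or lower faces) are also assumed.

```python
from dataclasses import dataclass

AXES = {"x": 0, "y": 1, "z": 2}


@dataclass
class Box:
    """Hypothetical axis-aligned bounding box: centre position and per-axis size."""
    pos: list
    size: list

    def lo(self, i):
        return self.pos[i] - self.size[i] / 2

    def hi(self, i):
        return self.pos[i] + self.size[i] / 2


def align_middle(o1, o2, axis):
    """alignMiddle: centre o1 on o2 along one axis."""
    i = AXES[axis]
    o1.pos[i] = o2.pos[i]


def align_max(o1, o2, axis):
    """alignMax (assumed meaning): make o1's upper face coincide with o2's upper face."""
    i = AXES[axis]
    o1.pos[i] += o2.hi(i) - o1.hi(i)


def align_min(o1, o2, axis):
    """alignMin (assumed meaning): make o1's lower face coincide with o2's lower face."""
    i = AXES[axis]
    o1.pos[i] += o2.lo(i) - o1.lo(i)


def align_touch(o1, o2, axis):
    """alignTouch: slide o1 along the axis until its facing side meets o2."""
    i = AXES[axis]
    if o1.pos[i] <= o2.pos[i]:
        o1.pos[i] += o2.lo(i) - o1.hi(i)  # o1's upper face meets o2's lower face
    else:
        o1.pos[i] += o2.hi(i) - o1.lo(i)  # o1's lower face meets o2's upper face


def touch(o1, o2, axis):
    """touch(obj1, obj2, axis): alignMiddle on the other two axes, then alignTouch."""
    for other in "xyz":
        if other != axis:
            align_middle(o1, o2, other)
    align_touch(o1, o2, axis)
```

For example, a small box starting above and to one side of a larger one ends up centred on top of it after touch(small, big, "y"):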

Decompositional predicate-argument model: an example, “call”
First level
    call(a) :-
        type(a, Person), type(tel, Telephone),
        pickup(a, tel.receiver, a.leftEar),
        dial(a, tel.keypad),
        speak(a, tel.receiver),
        putdown(a, tel.receiver, tel.set).
Second level
    pickup(x, obj, dest) :-
        type(x, Person),
        moveToward(x.leftHand, location(obj), location(obj)-location(x)-5),
        touch(x.leftHand, obj, axis),
        group(x.leftHand, obj, xHandObj),
        moveToward(xHandObj, dest, _).
    putdown(x, obj, dest) :-
        moveTo(x.leftHand, dest),
        ungroup(x, obj, x1),
        type(x1, Person).
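
The levels of “call” can be unfolded mechanically into a flat sequence of lower-level predicates. The following is a minimal Python sketch of that expansion, not the authors' implementation. It assumes a hypothetical RULES table that keeps only predicate names (arguments and type() checks are dropped). The table uses the call, pickup and putdown rules above and the touch() rule from the previous slide. Predicates with no rule in the table, such as dial and speak, are treated as leaves.

```python
# Hypothetical rule table: each composite predicate maps to its ordered sub-goals.
# Arguments and type() checks from the slides are omitted for brevity.
RULES = {
    "call": ["pickup", "dial", "speak", "putdown"],
    "pickup": ["moveToward", "touch", "group", "moveToward"],
    "putdown": ["moveTo", "ungroup"],
    "touch": ["alignMiddle", "alignMiddle", "alignTouch"],
}


def expand(goal):
    """Depth-first expansion of a goal into predicates that have no rule of their own."""
    if goal not in RULES:
        return [goal]
    steps = []
    for sub in RULES[goal]:
        steps.extend(expand(sub))
    return steps
```

With this table, expand("call") returns ['moveToward', 'alignMiddle', 'alignMiddle', 'alignTouch', 'group', 'moveToward', 'dial', 'speak', 'moveTo', 'ungroup']. This is the order in which the sub-actions would be animated.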

Visual definition & word sense
[Figure: mapping from verb to word sense to visual definition entry, with one-to-many and many-to-many links labelled polysemy and synonymy]
- Word sense: the minimal complete unit of meaning in the language modality
- Visual definition entry: the minimal complete unit of meaning in the visual modality
Example: “close” (a door)
1. a normal door (rotation on the y axis)
2. a sliding door (moving on the x axis)
3. a rolling shutter door (a combination of rotation on the x axis and moving on the y axis)
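
The “close” example shows one word sense mapping to several visual definition entries, chosen according to the object involved. The following is a minimal Python sketch of such a lookup. VISUAL_DEFS, visual_definition and the (predicate, axis) encoding are hypothetical names and encodings used only for illustration.

```python
# Hypothetical visual lexicon: (verb, kind of object) -> visual definition entry,
# encoded as an ordered list of (predicate, axis) steps.
VISUAL_DEFS = {
    ("close", "normal door"): [("rotate", "y")],
    ("close", "sliding door"): [("move", "x")],
    ("close", "rolling shutter door"): [("rotate", "x"), ("move", "y")],
}


def visual_definition(verb, obj_kind):
    """Select the visual definition entry for a verb applied to a given kind of object."""
    try:
        return VISUAL_DEFS[(verb, obj_kind)]
    except KeyError:
        raise LookupError(f"no visual definition for {verb!r} with {obj_kind!r}") from None
```

Keying on the object as well as the verb is one simple way to resolve the one-to-many mapping. The object's type, already available from the object model, picks the entry.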

Troponyms & verbs derived from adjectives/nouns
- A troponym elaborates the manner of a base verb (Fellbaum 1998), e.g. “trot”-“walk” (fast), “gulp”-“eat” (quickly)
  - Treated as base verb + adverb: present the base verb, then modify the manner (speed, the agent’s state, duration of the activity, iteration, etc.)
- Verbs derived from adjectives or nouns change objects’ properties (size, color, shape) or the world state
  - Verbs with affixes such as -en, -ify or -ize, e.g. “lengthen”
  - Presented using the predicates scale() and squash(), or by changing the corresponding property fields of the object in VRML
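
The “base verb + adverb” treatment of troponyms can be sketched as reusing the base verb's animation with one manner parameter changed. This minimal Python sketch models only speed. The Action record, the cycle durations and the speed factors are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Action:
    """Hypothetical animation spec for a base verb."""
    verb: str
    cycle_seconds: float  # duration of one motion cycle (assumed values below)
    repeat: int = 1       # iteration count


BASE = {"walk": Action("walk", 1.0), "eat": Action("eat", 4.0)}

# Troponym -> (base verb, assumed speed factor); only the speed manner is modelled.
TROPONYMS = {"trot": ("walk", 2.0), "gulp": ("eat", 4.0)}


def realise(verb):
    """Return the base verb's action, sped up if the verb is a known troponym."""
    if verb in BASE:
        return BASE[verb]
    base, speed = TROPONYMS[verb]
    return replace(BASE[base], cycle_seconds=BASE[base].cycle_seconds / speed)
```

Here “trot” becomes the walk animation at half the cycle time. Other manner dimensions from the slide (agent's state, duration, iteration) would be further fields on the same record.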

Representing active & passive voice  active and passive voice  converse verb pairs such as “give/take”, “buy/sell”, “lend/borrow”  same activity from different point of view  use of VRML Viewpoint node

Implementation: semantics  VRML bounce(obj):- move(obj, 0, 20, 0), move(obj, 0, -20, 0). (a) visual definition of “bounce ” DEF ball Transform { translation 0 0 0 children [ DEF ball-TIMER TimeSensor { loop TRUE cycleInterval 0.5 }, DEF ball-POS-INTERP PositionInterpolator { key [0, 0.5, 1 ] keyValue [0 0 0, 0 20 0, 0 0 0 ] }, Shape { appearance Appearance { material Material {} } geometry Sphere { radius 5 } }] ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction ROUTE ball-POS-INTERP.value_changed TO ball.set_translation } (c) Output  VRML code of a bouncing ball Example: “A ball is bouncing” DEF ball Transform { translation 0 0 0 children [ Shape { appearance Appearance{ material Material{} } geometry Sphere { radius 5 } ] } (b) VRML code of a static ball

Semantic decomposition  previous decomposite methologies (e.g. Schank’s CD analysis)  basic predicates “move”, “go”, “change”  pros and cons  generative and interpretative facilities (Jackendoff, 1972)  inadequate to capture the creative aspect of meaning  comparison  aimed at presentation purposes for visual modalities  no emphasis on atomic predicates Relation to previous work

Scripts and extended predicate-argument representation
    rob(person, place) :- obtain(person, gun), go(person, place), holdUp(person, place), escape(person, place).
    call(a) :- pickup(a, tel.receiver, a.leftEar), dial(a, tel.keypad), speak(a, tel.receiver), putdown(a, tel.receiver, tel.set).
    orderFood(person) :- ATRANS(waiter, person, menu), MTRANS(menu, person), MBUILD(person, choice), MTRANS(person, waiter, choice).
    pickup(x, obj, dest) :- moveToward(x.leftHand, location(obj), location(obj)-location(x)-5), touch(x.leftHand, obj, axis), group(x.leftHand, obj, xHandObj), moveToward(xHandObj, dest, _).
Event levels, from high level to low level:
    Event level            Example verbs
    Routine events         rob, cook, interview, eatOut
    Simple action verbs    jump, lift, give, walk, push
    Primitive actions      ATRANS, PTRANS, MOVE (scripts); move, rotate (extended predicate-argument representation)

Conclusion & future work  Conclusion  formalizes meaning of action verbs  implement in Java & VRML  reusable in other systems  Future work  inadequate  vagueness problem in language visualisation (underspecification)  temporal relations between sub-activities  representing non-action verbs & adjectives  using other modalities (e.g. speech/audio) to aid event representation
