Building character animation for intelligent storytelling with the H-Anim standard
Minhua Eunice Ma and Paul Mc Kevitt
School of Computing and Intelligent Systems, Faculty of Informatics, University of Ulster
EuroGraphics Ireland 29 April 2003
Previous research
Multimodal interactive storytelling
- AesopWorld
- KidsRoom
- Larsen & Petersen’s Interactive Storytelling
- Computer games
Virtual humans & embodied agents
- Jack (University of Pennsylvania)
- Improv (Perlin & Goldberg, 1996)
- BEAT (Cassell et al., 2000)
- SimHuman
- Gandalf
Previous research
Automatic text-to-graphics systems
- WordsEye (Coyne & Sproat, 2001)
- ‘Micons’ and CD-based language animation (Narayanan et al., 1995)
- Spoken Image (Ó Nualláin & Smith, 1994) & successor SONAS (Kelleher et al., 2000)
Semantic representations
- Schank’s (1972) Conceptual Dependency (CD) theory & scripts
- Jackendoff’s (1990) Lexical Conceptual Structure (LCS)
Objectives of CONFUCIUS
- To interpret natural language story and movie (drama) script input and to extract conceptual semantics from the natural language
- To generate 3D animation and virtual worlds automatically from natural language
- To integrate 3D animation with speech and non-speech audio, forming an intelligent multimedia storytelling system for presenting multimodal stories
CONFUCIUS’ context diagram
[Diagram labels: storywriter/playwright; story in natural language; movie/drama script; tailored menu for script input; CONFUCIUS; 3D animation; speech (dialogue); non-speech audio; user/story listener]
Architecture of CONFUCIUS
[Diagram labels: natural language stories; script writer; script parser; natural language processing; prefabricated objects (knowledge base); language knowledge (lexicon, grammar, etc.); visual knowledge (3D graphic library); mapping; semantic representations; 3D authoring tools, existing 3D models & character models; animation generation; text to speech; sound effects; synchronizing & fusion; 3D world with audio in VRML]
Knowledge base of CONFUCIUS
- Language knowledge
  - Semantic knowledge: lexicons (e.g. WordNet)
  - Syntactic knowledge: grammars
  - Statistical models of language
  - Associations between words
- Visual knowledge
  - Object model (nouns)
  - Functional information
  - Internal coordinate axes (for spatial reasoning)
  - Associations between objects
  - Event model (event verbs; describes the motion of objects/humans)
- World knowledge
- Spatial & quantitative reasoning knowledge
Software & Standards
- Java: parsing the intermediate representation, changing VRML code to add/modify animation, integrating modules
- 3D graphic modelling
  - Authoring tools
    - Humanoid characters: Character Studio, Internet Character Animator (ICA)
    - Narrator: Microsoft Agent
    - Props & stage: 3D Studio Max
  - Modelling language & standard
    - VRML 97 for modelling the geometry of objects, props and environment
    - Humanoid modelling: MPEG-4 Face and Body Animation (FBA), Humanoid Animation (H-Anim) specifications
    - Main problem to solve: defining standards for high-level behaviours of virtual humans
- Natural language processing tools
  - PC-PARSE (morphological and syntactic analysis)
  - WordNet (lexicon, semantic inference)
Level 1 Of Articulation of H-Anim
[Figure: Joints and segments of LOA1]
Although CONFUCIUS adopts Level 1 Of Articulation (LOA1) for its human character animation, its animation script engine adds ROUTEs dynamically, based on the joint list of the H-Anim file and the animation keyframe list. As long as the animation keyframes conform to the joint definitions in the H-Anim file, CONFUCIUS’ animation engine works with any level of articulation.
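The dynamic ROUTE generation described above might look roughly like the following sketch. It is not CONFUCIUS’ actual engine (which the slides say is written in Java). The function name, the `anim_timer` and `*_interp` node names, the `hanim_` DEF prefix, and the small joint subset are all assumptions made for illustration. Only the `ROUTE ... TO ...` syntax and the `fraction_changed`, `set_fraction`, `value_changed` and `set_rotation` events come from VRML97.

```python
def build_routes(model_joints, keyframed_joints, timer="anim_timer"):
    """Return VRML97 ROUTE statements for each keyframed joint the model defines.

    model_joints: joint names found in the H-Anim file (any level of articulation).
    keyframed_joints: joint names that the keyframe file animates.
    """
    routes = []
    for joint in keyframed_joints:
        if joint not in model_joints:
            # Keyframes must conform to the joint definitions; skip the rest.
            continue
        interp = f"{joint}_interp"  # hypothetical DEF name of an OrientationInterpolator
        # The clock drives the interpolator ...
        routes.append(f"ROUTE {timer}.fraction_changed TO {interp}.set_fraction")
        # ... and the interpolator drives the joint rotation.
        routes.append(f"ROUTE {interp}.value_changed TO hanim_{joint}.set_rotation")
    return routes


# Illustrative subset of LOA1 joint names, not the full list.
MODEL_JOINTS = ["HumanoidRoot", "l_hip", "l_knee", "r_hip", "r_knee",
                "l_shoulder", "r_shoulder"]

# "l_thumb1" is a finger joint that this model does not define, so it is skipped.
walk_routes = build_routes(MODEL_JOINTS, ["l_hip", "r_hip", "l_thumb1"])
for line in walk_routes:
    print(line)
```

Because the routes are derived from whatever joints the file declares, the same loop serves a model with more joints without any change.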
Agents and Avatars: How much autonomy?
[Diagram: virtual humans on an autonomy & intelligence scale running from high to low; labels: autonomous characters, interface agents, avatars, characters in non-interactive storytelling]
- Autonomous characters/agents have higher requirements for sensing, memory, reasoning, planning, behaviour control, and even emotional state (a sense-control-action structure)
- Avatars are “user-controlled” and hence require fewer autonomous actions. However, basic naïve physics such as collision detection and reaction is still needed when the user steers an avatar into a wall or has it grasp an object
- A virtual character in non-interactive storytelling lies somewhere between an agent and an avatar: most of its behaviours, emotions, and responses to the changing environment are described in the story input
Semantic representations
Lexical Visual Semantic Representation
- Lexical Visual Semantic Representation (LVSR) is a necessary semantic representation between 3D model information and syntactic information, because 3D model differences, although crucial in distinguishing word meanings, are invisible to syntax
- LVSR is based on Jackendoff’s LCS and adapts it to the task of language visualization. It enhances LCS with Schank’s scripts
- Ontological categories of LVSR: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, and PROPERTY
  - OBJ for props or places (e.g. buildings)
  - HUMAN for human beings or any other articulated animated characters (e.g. animals), as long as their skeleton hierarchy is defined in the graphic library
  - EVENT for actions, movements and manners
  - STATE for static existence
  - PROPERTY for attributes of OBJ/HUMAN
PATH & PLACE predicates
We analysed 62 common English prepositions and defined 7 PATH predicates and 11 PLACE predicates for interpreting spatial movement events of OBJ/HUMANs
Examples of LVSR & animation generation
Manipulating environment & spatial relations
- Input sentence: John walked towards the house.
  LVSR: [EVENT walk ([HUMAN john],[PATH toward [OBJ house]])]
  [Output animation]
- Input sentence: Nancy ran across the field.
  LVSR: [EVENT run ([HUMAN nancy],[PATH via [PLACE on [OBJ field]]])]
  [Output animation]
Manipulating objects
- Input sentence: John lifted his hat.
  LVSR: [EVENT go ([OBJ hat],[PATH from [PLACE on [OBJ john.head]]])]
        [EVENT lift ([HUMAN john],[OBJ hat])]
  [Output animation]
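To show how the bracketed LVSR notation in these examples nests, here is a minimal sketch of a parser that turns one LVSR expression into a tree. The grammar is inferred from the three examples only, and the function names and dictionary layout are assumptions; the slides do not describe how CONFUCIUS itself reads LVSR.

```python
import re

# Brackets, parentheses and commas are single tokens; anything else is a word.
TOKEN = re.compile(r"[\[\](),]|[^\s\[\](),]+")
PUNCT = {"[", "]", "(", ")", ","}


def parse_lvsr(text):
    """Parse one LVSR expression into nested dicts (category, head, args)."""
    tokens = TOKEN.findall(text)
    node, pos = _parse_node(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the expression")
    return node


def _parse_node(tokens, pos):
    if tokens[pos] != "[":
        raise ValueError(f"expected '[' at token {pos}")
    category = tokens[pos + 1]  # e.g. EVENT, HUMAN, PATH
    pos += 2
    head = None
    if tokens[pos] not in PUNCT:  # e.g. walk, john, toward
        head = tokens[pos]
        pos += 1
    args = []
    if tokens[pos] == "(":  # parenthesised, comma-separated argument list
        pos += 1
        while True:
            child, pos = _parse_node(tokens, pos)
            args.append(child)
            if tokens[pos] == ",":
                pos += 1
            elif tokens[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"expected ',' or ')' at token {pos}")
    elif tokens[pos] == "[":  # a single nested argument, as in PATH and PLACE
        child, pos = _parse_node(tokens, pos)
        args.append(child)
    if tokens[pos] != "]":
        raise ValueError(f"expected ']' at token {pos}")
    return {"category": category, "head": head, "args": args}, pos + 1


tree = parse_lvsr("[EVENT walk ([HUMAN john],[PATH toward [OBJ house]])]")
print(tree["head"], [arg["category"] for arg in tree["args"]])
```

The sketch handles one expression at a time, so the two EVENTs of the hat example would be parsed separately.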
Graphics library
- Objects/props: simple geometry files
- Characters: geometry & joint hierarchy files (H-Anim)
- Motions: animation library (key frames)
- Instantiation
Animation generator
[Flowchart labels: syntax tree; LVSR; verb semantic analysis; match basic motions in library? (Y/N); motion instantiation; motion decomposition; apply scripts; animation controller; environment placement; VRML file of the virtual story world]
- Verb semantic analysis: use lexical entries in Lexical Visual Semantics to analyse verb semantics, replace synonyms, and perform spatial reasoning
- Match basic motions in library?: checks whether the event predicate matches basic human motions in the animation library
- Environment placement: apply spatial information and place OBJ/HUMAN into a specified environment
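The match-or-decompose decision in the flowchart can be sketched as below. The sketch assumes that the Y branch leads to motion instantiation and the N branch to motion decomposition, which the flattened flowchart suggests but does not state. The library contents, synonym table, decomposition table and function name are all hypothetical examples, not CONFUCIUS data.

```python
# Hypothetical basic motions available as keyframe files in the animation library.
MOTION_LIBRARY = {"walk", "run", "reach", "grasp", "raise_arm"}

# Hypothetical synonym table used during verb semantic analysis.
SYNONYMS = {"stroll": "walk", "sprint": "run"}

# Hypothetical decompositions of verbs that have no keyframe file of their own.
DECOMPOSITION = {"lift": ["reach", "grasp", "raise_arm"]}


def plan_motions(predicate):
    """Return the list of basic motions that realise an event predicate."""
    verb = SYNONYMS.get(predicate, predicate)  # replace synonyms first
    if verb in MOTION_LIBRARY:
        return [verb]  # matches a basic motion: instantiate it directly
    parts = DECOMPOSITION.get(verb)
    if parts is None:
        raise KeyError(f"no basic motion or decomposition for '{predicate}'")
    plan = []
    for part in parts:  # decompose recursively until only basic motions remain
        plan.extend(plan_motions(part))
    return plan


print(plan_motions("stroll"))
print(plan_motions("lift"))
```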
Collision detection
- Collision detection is a crucial issue for path planning, manoeuvring objects, reactive behaviour, and multiple characters’ activities
- VRML provides a built-in collision detection mechanism for the avatar (user), but the mechanism does not apply to intersections between other characters/objects
- Collision avoidance algorithms for humanoid bodies:
  - Coarse approximations (e.g. bounding boxes or spheres)
  - Polygon-level checks between humans and objects
  - Dynamic LOD checking according to distance to the observer, the user’s observation focus, whether the human is in a crowd, etc.
- CONFUCIUS’ animation generator uses:
  - Bounding cylinders around the human body segments for protagonists
  - A bounding cylinder around the whole human body for minor characters, characters in a crowd, and characters beyond the scope of attention
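A coarse bounding-cylinder test of the kind used for whole-body checks could be sketched as follows. It assumes upright cylinders whose axes are parallel to the Y (up) axis; cylinders around individual body segments would in general be tilted and need a more involved test. The class and function names are illustrative, not taken from CONFUCIUS.

```python
import math
from dataclasses import dataclass


@dataclass
class Cylinder:
    """Upright bounding cylinder: axis through (x, z), spanning y_min..y_max."""
    x: float
    z: float
    y_min: float
    y_max: float
    radius: float


def cylinders_collide(a, b):
    """True if two upright cylinders overlap or touch."""
    # Horizontal test: the circular footprints overlap on the ground plane.
    footprints_overlap = math.hypot(a.x - b.x, a.z - b.z) <= a.radius + b.radius
    # Vertical test: the height intervals overlap.
    heights_overlap = a.y_min <= b.y_max and b.y_min <= a.y_max
    return footprints_overlap and heights_overlap


# Example dimensions in metres, chosen only for illustration.
john = Cylinder(x=0.0, z=0.0, y_min=0.0, y_max=1.8, radius=0.3)
nancy = Cylinder(x=0.5, z=0.0, y_min=0.0, y_max=1.7, radius=0.3)
bird = Cylinder(x=0.0, z=0.0, y_min=3.0, y_max=3.2, radius=0.2)

print(cylinders_collide(john, nancy))
print(cylinders_collide(john, bird))
```

One cylinder per character keeps the check cheap for crowds, at the cost of false positives around outstretched limbs.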
Multiple characters’ synchronization & coordination
- Multiple characters’ activities
  - A character can start a task when another signals that the situation (pre-conditions) is ready
  - Characters can communicate with one another
  - Two or more characters can cooperate in a shared task
- Multiple characters’ synchronization
  - Event-driven timing mechanism (VRML provides a utility for event routing, the ROUTE statement)
  - Exact time-driven synchronization
Example story: Nancy was walking along the street. John called her. Nancy stopped and saw John. John walked towards her. They exchanged greetings.
The end of the animation john_speech (calling Nancy) triggers:
(1) stopping the animation nancy_walk
(2) starting the animation nancy_gazeWander (searching for who’s calling)
(3) starting the animation john_walk (walking towards Nancy)
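The event-driven timing in this example can be mimicked outside VRML with a small dispatcher, sketched below. In the VRML world the same wiring would be done with ROUTEs between nodes; the `AnimationScheduler` class and its method names are hypothetical and only illustrate the trigger logic, using the animation names from the slide.

```python
class AnimationScheduler:
    """Minimal event-driven scheduler: the end of one animation triggers others."""

    def __init__(self):
        self.running = set()  # names of animations currently playing
        self.on_end = {}      # animation name -> list of (action, target)
        self.log = []         # ordered record of what happened

    def when_ends(self, source, action, target):
        """Register an action ('start' or 'stop') to fire when source ends."""
        if action not in ("start", "stop"):
            raise ValueError("action must be 'start' or 'stop'")
        self.on_end.setdefault(source, []).append((action, target))

    def start(self, name):
        self.running.add(name)
        self.log.append(("start", name))

    def stop(self, name):
        self.running.discard(name)
        self.log.append(("stop", name))

    def finish(self, name):
        """Signal that an animation reached its end and fire its triggers."""
        self.stop(name)
        for action, target in self.on_end.get(name, []):
            if action == "start":
                self.start(target)
            else:
                self.stop(target)


scheduler = AnimationScheduler()
scheduler.when_ends("john_speech", "stop", "nancy_walk")
scheduler.when_ends("john_speech", "start", "nancy_gazeWander")
scheduler.when_ends("john_speech", "start", "john_walk")

scheduler.start("nancy_walk")    # Nancy was walking along the street.
scheduler.start("john_speech")   # John called her.
scheduler.finish("john_speech")  # The call ends and fires the three triggers.
print(sorted(scheduler.running))
```

No absolute times appear anywhere: each animation starts relative to the event that triggers it, which is what distinguishes this from exact time-driven synchronization.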
Relation to other work
- A general purpose humanoid character animation system
- Compared with related virtual human modelling systems, CONFUCIUS’ character animation focuses on the language-to-humanoid animation process rather than on human modelling & motion alone
- Makes full use of existing 3D OBJ/HUMAN models, tools and programs, such as the H-Anim models Nancy (by C. Ballreich, © Name3D / Yglesias, Wallock, Divekar, Inc.) and Baxter (by C. Babski, © LIG/EPFL), animation keyframe files, and a BVH to H-Anim keyframe conversion script (by M. Lewis, The Ohio State University)
- Adopts current studies in linguistics such as LCS and adapts them to the demands of language visualization
Prospective applications
- Children’s education
- Multimedia presentation
- Movie/drama production
- Computer games
- Virtual Reality
Conclusion & future work
CONFUCIUS’ humanoid character animation explores challenging problems in language visualization and automatic animation production:
- Formalizes the meaning of action verbs and spatial prepositions
- Maps language primitives to visual primitives
- Provides a reusable common-sense knowledge base for other systems
Future work
- Deformation for facial expressions
- Under-specified language input
- Action composition for simultaneous activities