Interactive Control of Avatars Animated with Human Motion Data By: Jehee Lee, Jinxiang Chai, Paul S. A. Reitsma, Jessica K. Hodgins, Nancy S. Pollard Presented by: Nathan Hoobler
Why do we use motion capture? Get realistic behavior “for free” An easy interface for generating control for high DOF models Can capture behavior far too complicated to model by hand Kung Fu, Acrobatics, other stylized motion
What is the problem with motion capture? Motion capture data is inherently complicated Usually far more degrees of freedom than can be easily controlled by hand Not trivial to synthesize new behaviors Transitions between different types of motion are hard Often there are redundant behaviors
What does this paper do? Identify distinct behaviors in the motion capture data Allow intuitive control of high DOF data with a small DOF interface Allow seamless transitions between different behaviors
System Overview Loosely-patterned data comes in A probabilistic transition matrix is built Simplified transition graph is used to determine motion
System Overview Various datasets come in
What kind of data can we use? Long, consistent motion recordings are required for good transition generation Does not handle sensor noise well
System Overview Various datasets come in Low-Level transitions are generated
Low-Level Representation At this level, the system is very similar to the Video Textures technique For each frame, find any other frames in the dataset that are similar Calculate the probability of a transition from frame i to frame j based on how closely the two frames match
Low-Level: Building the Matrix The probability of transitioning from frame i to frame j is computed as P(i → j) ∝ exp(−D(i, j−1) / σ) Where D(i, j) = d(p_i, p_j) + ν d(v_i, v_j) is the weighted “distance” from frame i to frame j And d(p_i, p_j) is the pose term: the squared difference in root position plus a joint-weighted sum of squared rotation differences, ||p_{i,0} − p_{j,0}||² + Σ_k w_k ||log(q_{j,k}⁻¹ q_{i,k})||²
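A minimal numerical sketch of this step (not the authors' code): frames are assumed to be dicts with a root position, per-joint unit quaternions, and a velocity vector, and joint_weights, nu, and sigma are illustrative parameters rather than the paper's values.

```python
import numpy as np

def quat_geodesic(q1, q2):
    """Rotation angle between two unit quaternions (magnitude of log(q2^-1 q1))."""
    dot = np.abs(np.clip(np.dot(q1, q2), -1.0, 1.0))
    return 2.0 * np.arccos(dot)

def frame_distance(frame_i, frame_j, joint_weights, nu=0.1):
    """D(i, j): weighted pose distance plus a velocity term.
    'root', 'quats', and 'vel' are assumed field names; nu is a made-up default."""
    d_pose = np.sum((frame_i["root"] - frame_j["root"]) ** 2)
    for k, w in enumerate(joint_weights):
        d_pose += w * quat_geodesic(frame_i["quats"][k], frame_j["quats"][k]) ** 2
    d_vel = np.sum((frame_i["vel"] - frame_j["vel"]) ** 2)
    return d_pose + nu * d_vel

def transition_matrix(frames, joint_weights, sigma=1.0):
    """P[i, j] proportional to exp(-D(i, j-1) / sigma), normalized over j."""
    n = len(frames)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(1, n):
            D = frame_distance(frames[i], frames[j - 1], joint_weights)
            P[i, j] = np.exp(-D / sigma)
        P[i] /= P[i].sum()
    return P
```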
So, how efficient is this? Since the matrix is just a 2D mapping from any one frame to any other, the number of transitions is O(n^2)… for datasets containing thousands of frames (!) We need to reduce the number of transitions
Low-Level: Pruning We can take advantage of a few useful features of the motion capture data Contact with the world should be similar between transitioning frames Any interesting data is going to have mostly low-probability transitions There are many frames that are very similar to others We want to avoid going down dead-end routes
Low-Level: Pruning (Contact) Criteria 1: Contact Even if frames are very similar, do not transition if the contact states are different (Strict interpretation) Only allow transitions during contact states
Low-Level: Pruning (Likelihood) Criteria 2: Likelihood Throw away transitions whose probability is less than some threshold value
Low-Level: Pruning (Similarity) Criteria 3: Similarity If a frame has many transitions to states that are all very similar to each other as well, throw away all but the best fitting transition
Low-Level: Pruning (SCC) Criteria 4: Connectedness In theory, we want to avoid transitions that don’t lead to well-connected nodes Only add transitions that remain within the largest Strongly Connected Component of the graph “A maximal subgraph of a directed graph such that for every pair of vertices u, v in the subgraph, there is a directed path from u to v and a directed path from v to u.” (Mathworld)
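A hedged sketch of criteria 2 and 4 applied to the matrix built above, using SciPy's strongly connected components routine; the threshold value and the row renormalization at the end are assumptions, and the contact and similarity criteria would be applied with the same masking pattern.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def prune_transitions(P, threshold=1e-3):
    """Drop low-probability transitions (criterion 2), then keep only edges whose
    endpoints both lie in the largest strongly connected component (criterion 4)."""
    P = np.where(P >= threshold, P, 0.0)
    graph = csr_matrix(P > 0)
    n_comp, labels = connected_components(graph, directed=True, connection="strong")
    largest = np.argmax(np.bincount(labels))
    keep = labels == largest
    P = np.where(np.outer(keep, keep), P, 0.0)
    # Renormalize rows that still have outgoing transitions.
    row_sums = P.sum(axis=1, keepdims=True)
    return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)
```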
Low-Level: Blending Need interpolation to avoid discontinuities Problem: sharp changes are allowed at contact points Solution: use a non-linear blend function centered on the contact point and a moving average
Low-Level: Blending Case 1: Follow the incoming frame Case 2: Follow the outgoing frame Case 3: Choose the side closest to the contact point Case 4: Just let the foot slide; it’ll look bad no matter what
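A simplified blend sketch, assuming a fixed transition window and a cubic ease curve; the paper additionally anchors the blend around the contact point as in the cases above, which this sketch omits.

```python
import numpy as np

def ease(t):
    """Smooth cubic weight: 0 at t=0, 1 at t=1, zero slope at both ends."""
    return 3.0 * t ** 2 - 2.0 * t ** 3

def slerp(q0, q1, w):
    """Spherical linear interpolation between unit quaternions."""
    dot = np.clip(np.dot(q0, q1), -1.0, 1.0)
    if dot < 0.0:                     # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                  # nearly parallel: lerp and renormalize
        q = (1.0 - w) * q0 + w * q1
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - w) * theta) * q0 + np.sin(w * theta) * q1) / np.sin(theta)

def blend_transition(out_clip, in_clip, window):
    """Cross-fade `window` frames from the outgoing clip into the incoming one.
    Frames are dicts with 'root' (3,) and 'quats' (K, 4)."""
    blended = []
    for t in range(window):
        w = ease((t + 1) / window)
        root = (1.0 - w) * out_clip[t]["root"] + w * in_clip[t]["root"]
        quats = np.array([slerp(q0, q1, w)
                          for q0, q1 in zip(out_clip[t]["quats"], in_clip[t]["quats"])])
        blended.append({"root": root, "quats": quats})
    return blended
```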
Low-Level: Coordinate System Fixed/Global versus Relative Each has an advantage, depending on the situation The paper uses both, depending on the example
Fixed/Global Coordinates Advantages Good for spatial data (the recording environment corresponds strongly with the simulated environment) Disadvantages Not good for synthesizing motion in new environments
Relative Coordinates Advantages Much easier to synthesize motions from anywhere in the environment into new behaviors Disadvantages Ignores orientation and position in three-space, which may be important for some actions
High-Level Representation Low-level representation is far too complicated to interact with Simplify the data by grouping like frames into clusters For each frame, find the possible clusters that can be transitioned to in the near term
High-Level Representation Various datasets come in Low-Level transitions are generated Frames are grouped into clusters
Building Clusters We want a simplified data set Weight important joints (arms, legs, pelvis, etc.) high Weight less important joints (neck, etc.) low Using weighted values, find similar frames and group them into clusters
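One way to realize the weighting, sketched as a plain weighted k-means over per-joint feature blocks; the paper's actual clustering procedure may differ, and features, joint_weights, and k are illustrative names.

```python
import numpy as np

def cluster_frames(features, joint_weights, k, iters=50, seed=0):
    """Group frames into k clusters. `features` is (n_frames, n_joints * d) with one
    block of d values per joint; the weights scale the columns so that important
    joints dominate the distance and minor joints barely matter."""
    d = features.shape[1] // len(joint_weights)
    col_scale = np.repeat(np.sqrt(np.asarray(joint_weights, dtype=float)), d)
    X = features * col_scale
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```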
High-Level Representation Various datasets come in Low-Level transitions are generated Frames are grouped into clusters A transition tree is built for each frame
Building the Cluster Forest Each frame has a tree of clusters representing its valid transitions Find the most probable transition from the current frame to another cluster If the number of frames required to reach that cluster is within a time threshold, add it to the forest Repeat
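A breadth-first sketch of the expansion step, returning a flattened view of the tree (each reachable cluster and the earliest frame that enters it); P is the pruned transition matrix from earlier, labels is the cluster assignment, and horizon is the time threshold, all of which are assumed names.

```python
import numpy as np
from collections import deque

def reachable_clusters(P, labels, start_frame, horizon):
    """Expand along surviving transitions and record every cluster that can be
    reached within `horizon` steps, together with the earliest entry frame."""
    root_cluster = labels[start_frame]
    reachable = {}                      # cluster label -> earliest entry frame
    queue = deque([(start_frame, 0)])
    visited = {start_frame}
    while queue:
        frame, steps = queue.popleft()
        if steps >= horizon:
            continue
        for nxt in np.nonzero(P[frame])[0]:
            nxt = int(nxt)
            if nxt in visited:
                continue
            visited.add(nxt)
            c = labels[nxt]
            if c != root_cluster and c not in reachable:
                reachable[c] = nxt
            queue.append((nxt, steps + 1))
    return reachable
```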
Caveats about Clustering Clustering is not always extremely useful Mostly a user interface issue Useful for directly selecting the next motion (Direct Choice) Not as useful for procedurally determining behavior (Path Sketching, Mimic)
Control Methods Several interface methods were used, depending on how well they suited the example Direct Choice Sketching Video-Capture
Direct Choice Display valid states for the avatar, and let the user choose
Path Sketching Allow the user to specify a path to follow Find motions that will put the avatar in the right place
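A greedy sketch of the idea, assuming global root positions are available per frame; the actual system searches more carefully and can work in relative coordinates, so the one-step lookahead and the parameter names here are assumptions.

```python
import numpy as np

def follow_sketched_path(P, root_positions, start_frame, path):
    """At every step, among the transitions that survived pruning, pick the frame
    whose root position lands closest to the next target point on the sketched path."""
    frame = start_frame
    chosen = [frame]
    for target in path:
        candidates = np.nonzero(P[frame])[0]
        if len(candidates) == 0:        # dead end: no outgoing transitions
            break
        errors = np.linalg.norm(root_positions[candidates] - target, axis=1)
        frame = int(candidates[np.argmin(errors)])
        chosen.append(frame)
    return chosen
```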
Video Mimic Determine limb and body orientation from video input Find closest matching frame(s), and imitate the user
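A small sketch of the matching step, assuming the vision front end produces per-joint unit quaternions and that candidates are restricted to frames currently reachable from the avatar's state; both assumptions go beyond what the slide states.

```python
import numpy as np

def match_vision_pose(est_quats, frames, available, joint_weights):
    """Among the currently reachable frames, pick the one whose joint rotations
    best match the pose estimated from video."""
    def joint_angle(q_a, q_b):
        # Geodesic angle between two unit quaternions.
        return 2.0 * np.arccos(np.abs(np.clip(np.dot(q_a, q_b), -1.0, 1.0)))

    best, best_d = None, np.inf
    for f in available:
        d = sum(w * joint_angle(est_quats[k], frames[f]["quats"][k]) ** 2
                for k, w in enumerate(joint_weights))
        if d < best_d:
            best, best_d = f, d
    return best
```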
Results Terrain: Path Sketching Step Stool: Path Sketching, Direct Choice Playground: Direct Choice
Any Questions?