Muscles
The default muscle set is established by the Speech Graphics team as part of the character setup service. However, you can edit the muscles in SGX Studio.
Speech Graphics animation is generated by "muscle-dynamic" models, which output muscle activations over time. Here a "muscle" is defined in a superficial way as an independently controllable deformation pattern – similar to the concept of "action unit" from the Facial Action Coding System. Because they are biologically based primitives, muscles have the benefit of being largely universal and analogous across characters, thus allowing a generalized behavior model.
Each muscle simulates the effect of one isolated muscle contraction. Muscles can be unidirectional or bidirectional; a bidirectional muscle has two opposite effects.
Each muscle has these properties:
A time-varying contraction degree, represented by a parameter with range [0, 1], where 0 is fully relaxed and 1 is fully contracted. For bidirectional muscles, the range is [-1, 1].
A rig pose that defines the deformational effect of the muscle contraction on the rig. Rig poses are specified using the rig’s animation parameters. For bidirectional muscles, there is a positive pose and a negative pose.
Dynamic parameters that help determine the velocity of contraction or relaxation of the muscle. There are two such parameters: contraction duration and skew.
Speech Graphics' muscle-dynamic behavior algorithms generate muscle contraction values over time. These contraction values are converted into animation on the rig via the deformational definitions of the muscles.
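The properties above can be pictured as a small data structure. The following is a minimal sketch only, not the SGX implementation: the class and field names, the rig parameter names (jaw_open, head_ry), the default values, and in particular the assumption that a pose is scaled linearly by contraction and summed across muscles are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Rig animation parameter name -> value at full contraction (hypothetical names).
RigPose = Dict[str, float]


@dataclass
class Muscle:
    name: str
    positive_pose: RigPose
    negative_pose: Optional[RigPose] = None  # present only for bidirectional muscles
    contraction_duration_ms: float = 300.0   # hypothetical default
    skew: float = 0.0

    @property
    def bidirectional(self) -> bool:
        return self.negative_pose is not None

    def clamp(self, contraction: float) -> float:
        """Keep contraction in [0, 1], or [-1, 1] for bidirectional muscles."""
        low = -1.0 if self.bidirectional else 0.0
        return max(low, min(1.0, contraction))


def rig_values(muscles: Iterable[Muscle], contractions: Dict[str, float]) -> RigPose:
    """Combine muscle poses into rig parameter values for one frame.

    ASSUMPTION: each pose is scaled linearly by its contraction and the
    results are summed. The real conversion may differ.
    """
    out: RigPose = {}
    for m in muscles:
        c = m.clamp(contractions.get(m.name, 0.0))
        # A negative value can only survive clamping on a bidirectional muscle.
        pose = m.positive_pose if c >= 0 else m.negative_pose
        for param, value in pose.items():
            out[param] = out.get(param, 0.0) + abs(c) * value
    return out


jaw = Muscle("Jaw Opening", {"jaw_open": 1.0})
yaw = Muscle("Head Yaw", {"head_ry": 30.0}, negative_pose={"head_ry": -30.0})
frame = rig_values([jaw, yaw], {"Jaw Opening": 0.5, "Head Yaw": -0.5})
```

Here the unidirectional Jaw Opening is half contracted, while the bidirectional Head Yaw is driven halfway toward its negative pose.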
Muscles are NOT visemes or expressions
Muscles are used to animate both speech and nonverbal behavior. However, note that muscles are NOT visemes or facial expressions.
A viseme is a group of several speech sounds which are considered to look similar. For example, the sounds /t, d, n, l/ may form a single viseme, likewise /p, b, m/ – because the members of these groups are highly confusable visually. In traditional facial animation, lip sync is achieved using a target pose for each viseme, with some form of interpolation between successive visemes.
Speech Graphics does not use visemes. While the muscle-dynamic system does cause the emergence of complex speech poses – corresponding to consonants and vowels – it does so through the cooperation of independent muscles to achieve articulatory objectives. For example, achieving /t/ involves contributions of tongue muscles, jaw muscles, and lip muscles operating independently, rather than a single monolithic pose affecting all of that anatomy.
Similarly, nonverbal facial expressions typically involve a number of muscles. For example, a smile might be composed of mouth corner retraction plus various cheek and eye movements – actions that are actually independently controllable on a real person’s face. If you were to make a single monolithic muscle for this or other complex expressions, your animation would look less natural, because it would move from pose to pose in a way that is not grounded in what is physically possible. See Behavior Modes for how nonverbal facial expressions are constructed from combinations of muscles.
Default muscle set
As part of the character setup process, your character control file will be delivered with a set of default muscles already defined. After delivery, you can further edit the muscles in SGX Studio (see Editing muscles).
There are currently 73 default muscles. Muscles are grouped into anatomical regions (brows, cheeks, jaw, lips, tongue, etc.), of which there are currently 10. The main significance of anatomical regions – aside from providing a convenient organizational structure for sorting muscles – is that they carry default dynamic parameters. For example, displacements of the tongue generally happen much faster than those of the head due to their relative sizes and distances traveled, and these differences are reflected in the default dynamics of the two regions. A new muscle added to a region automatically inherits the default dynamics of that region.
The default muscles are listed below by anatomical region. Note that muscles with left and right side lateral versions are marked L and R, respectively. Bidirectional muscles (having negative as well as positive poses) are marked [⇔]. Muscles shown in teal are protected muscles, discussed below.
Brows
Brow In L
Brow In R
Brow Lower L
Brow Lower R
Brow Raise L
Brow Raise R
Brows In
Inner Brow Lower L
Inner Brow Lower R
Inner Brow Raise
Outer Brow Lower L
Outer Brow Lower R
Outer Brow Raise L
Outer Brow Raise R
Scalp Slide [⇔]
Cheeks
Cheek Raise L
Cheek Raise R
Cheek Raise Outer L
Cheek Raise Outer R
Eyeballs
Microdart [⇔]
Eyeball Pitch [⇔]
Eyeball Roll [⇔]
Eyelids
Blink
Eye Close
Eye Flare L
Eye Flare R
Eye Squeeze L
Eye Squeeze R
Eye Squint L
Eye Squint R
Lower Eyelid Flex L
Lower Eyelid Flex R
Head
Head Pitch [⇔]
Head Roll [⇔]
Head Thrust [⇔]
Head Yaw [⇔]
Jaw
Jaw Opening
Jaw Clench
Jaw Shift Lateral [⇔]
Jaw Shift Longitudinal [⇔]
Lips
Adduction
Compression
Lip Flare
Lower Lip Pull
Lower Lip Push
Lower Lip Tuck
Pinching
Retraction
Rounding
Upper Lip Pull
Dimple L
Dimple R
Lip Corner Depress L
Lip Corner Depress R
Lip Tighten
Mouth Stretch
Mouth Swing [⇔]
Retraction L
Retraction R
Upper Lip Pull L
Upper Lip Pull R
Nose
Nostril Flare
Nose Wrinkle L
Nose Wrinkle R
Nostril Compress
Throat
Larynx Lower
Larynx Raise
Neck Tense L
Neck Tense R
Tongue
Tongue Advance
Tongue Body Raise
Tongue Retraction
Tongue Tip Raise
You have creative control over muscle definitions for your character. With the default muscle set as a starting point, you can remove muscles, add new muscles, and edit existing muscles in SGX Studio.
Editing muscle poses
To set the pose of a muscle in SGX Studio, edit the animation parameters of the rig to build up the pose. A single pose may use any number of different animation parameters.
There are three key principles to follow when defining muscle poses: isolation, detail and extremity.
Isolation
The first key is isolation. A muscle pose should capture the effect of one independently controllable muscle contraction, without mixing in the effects of other muscles. This differentiates muscles from more complex events that occur through cooperation between muscles, such as expressions.
Creating muscles that combine movements that are in reality independently controllable can have two negative effects:
It collapses the degrees of freedom of the character.
It yokes anatomical parts together in ways that do not respect their individual dynamic properties and can thus defy what is physically possible.
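The second effect can be illustrated with a toy calculation. This is a sketch under stated assumptions, not how SGX computes motion: the contraction durations are hypothetical values picked for the example, and a linear ramp stands in for the real dynamics.

```python
def progress(t_ms: float, contraction_duration_ms: float) -> float:
    """Fraction of full contraction reached at time t (linear ramp assumed)."""
    return min(1.0, max(0.0, t_ms / contraction_duration_ms))


# Hypothetical contraction durations, for illustration only.
DURATIONS_MS = {"Retraction": 200.0, "Cheek Raise L": 400.0}


def independent(t_ms: float) -> dict:
    """Each muscle advances according to its own contraction duration."""
    return {name: progress(t_ms, d) for name, d in DURATIONS_MS.items()}


def monolithic(t_ms: float, duration_ms: float = 400.0) -> dict:
    """One combined 'smile' muscle: every part shares a single progress value."""
    p = progress(t_ms, duration_ms)
    return {name: p for name in DURATIONS_MS}


at_100 = independent(100.0)    # lips are halfway there, cheek only a quarter
yoked_100 = monolithic(100.0)  # lips and cheek are locked to the same value
```

With separate muscles, the lips can lead the cheek as their own dynamics dictate. With a single combined muscle, both are forced along the same trajectory, and only one degree of freedom remains where there were two.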
Detail
While isolating muscle movements is important, it is also important to capture the entire effect of a muscle, which means maximizing detail in the deformation. Thus the second key to defining muscle poses is detail. Naturalness lives in the details. And since muscles are the building blocks of animation, highly detailed muscle deformations are essential to natural-looking animation.
Detailed deformations require attention to all parts of the face that move with a particular muscle contraction, including small secondary movements. For example, when a person’s upper lip moves, the cheeks and nose area are often affected too. It’s important to include these secondary movements to reconstruct the full natural deformation. This is particularly important to keep in mind with bone rigs, where highly local deformations are possible and one could be tempted to focus on only one piece of anatomy.
Extremity
The third key to defining muscle poses is extremity. The muscle pose should represent the most extreme extent to which a real human could contract the muscle. For example, when defining the pose for Nose Wrinkle L, the version on the left is good, but the one on the right is too weak.
If you pose muscles too weakly or subtly, you will not achieve a good range of movement. This is because the pose represents the maximum displacement, not some average displacement. Although in actual animation you will probably never encounter the muscle contracted to 100%, if the maximum displacement is not truly maximum, then by extension all lesser degrees of muscle contraction will be weaker looking than they should be. Furthermore, highly expressive facial animation will be difficult to achieve because the muscles cannot form intense poses.
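A quick worked example makes the scaling effect concrete. It is a sketch that assumes displacement scales linearly with contraction. The function name and the pose_strength parameter (how far the authored pose goes relative to the true anatomical extreme) are hypothetical.

```python
def displayed_fraction(contraction: float, pose_strength: float) -> float:
    """Visible displacement as a fraction of the true anatomical maximum.

    pose_strength is 1.0 when the authored pose is at the real extreme and
    smaller when the pose was authored too weakly (linear scaling assumed).
    """
    return contraction * pose_strength


# The same 50% contraction value, rendered with two different poses.
extreme_pose = displayed_fraction(0.5, 1.0)  # half of the true maximum
weak_pose = displayed_fraction(0.5, 0.6)     # about 30% of the true maximum
```

Every contraction value the system generates is scaled down by the same factor, so a pose authored at 60% of the true extreme weakens the whole range, not just the peak.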
Editing dynamic parameters
The muscle-dynamic system controls animation by varying the degree of contraction of each muscle over time. This behavior is governed by an underlying model of motor planning and biomechanics. However, different muscles have different anatomical properties, which leads to different perceptual expectations about how fast each muscle should move. The human brain is quite good at spotting unnatural velocities on the face.
For this reason we provide settable parameters for each muscle that determine its average velocities. We can distinguish between contraction and relaxation phases of a muscle’s movement, and these parameters address the velocities of both phases. The two available parameters are called contraction duration and skew.
Setting these parameters is a manual process that involves viewing the effect of estimated values in animation. SGX Studio provides a muscle preview function that animates a particular muscle going from 0 to 1 and then back to 0. This is useful for evaluating the naturalness of the estimated dynamic parameters. In each case, the question you should ask yourself is: does the movement speed look natural for this character’s face, or does it look too fast or too slow? Imagine the character is alive and performing the movement of their own volition.
Here are some best practices for viewing muscle previews:
Be sure that the muscle is going all the way to 1.0 when judging contraction duration or skew.
If viewing previews through a default video player, set the player to loop so that you can sit back and watch the movement several times.
While watching, avoid auditory distractions such as background speech and especially music, as these can impact your temporal perception and thus impede judgement about how natural the speed of a movement is.
Similarly, avoid visual distractions by removing progress bars or other visual artifacts from the video player.
Make a series of small adjustments and review, rather than big changes.
Contraction duration
The contraction duration of a muscle is the amount of time it takes to fully contract the muscle from 0 to 1. In SGX Studio, users can find the right contraction duration by generating muscle previews at different values of contraction duration. For example, below are muscle preview examples of Dimple L, with contraction duration set to three different values: 200 ms, 460 ms and 700 ms. For contraction duration, pay particular attention to the contraction phase (from 0 to 1).
As you should be able to see, the contraction duration of 460 ms looks the most natural for Dimple L on this face.
Skew
The skew of a muscle expresses the difference, if any, between the time it takes to contract a muscle and the time it takes to relax it again – i.e. to return from 1 to 0. The difference is expressed as a deviation in the ratio. A value above 0, or positive skew, means that relaxation takes longer than contraction. A value below 0, or negative skew, means that relaxation takes less time than contraction. A value of 0 means relaxation has the same duration as contraction.
For example, below are three videos of Dimple L, with skew values of -0.35, -0.11, and +0.1.
In this case, the skew value of -0.11 provides the most natural looking return to neutral.
Set contraction duration first, then skew.
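To see how the two parameters might interact in a 0 to 1 to 0 preview, consider the sketch below. It rests on two loudly stated assumptions that are not confirmed by this documentation: that skew is the deviation of the relaxation-to-contraction duration ratio from 1 (so relaxation = contraction × (1 + skew)), and that both phases are linear ramps. The function names are hypothetical.

```python
def relaxation_duration_ms(contraction_ms: float, skew: float) -> float:
    """ASSUMPTION: skew is the deviation of the relaxation/contraction ratio from 1."""
    return contraction_ms * (1.0 + skew)


def preview(t_ms: float, contraction_ms: float, skew: float) -> float:
    """Contraction value at time t for a 0 -> 1 -> 0 preview (linear ramps assumed)."""
    relax_ms = relaxation_duration_ms(contraction_ms, skew)
    if t_ms <= 0.0:
        return 0.0
    if t_ms < contraction_ms:                       # contraction phase
        return t_ms / contraction_ms
    if t_ms < contraction_ms + relax_ms:            # relaxation phase
        return 1.0 - (t_ms - contraction_ms) / relax_ms
    return 0.0                                      # back at rest


# Dimple L values taken from the examples above: 460 ms and a skew of -0.11.
dimple_relax_ms = relaxation_duration_ms(460.0, -0.11)
dimple_peak = preview(460.0, 460.0, -0.11)
```

Under this reading, a negative skew shortens the return to neutral relative to the 460 ms contraction, a skew of 0 makes both phases equal, and a positive skew lengthens the return. Because the relaxation time is derived from the contraction duration, changing the contraction duration changes both phases, which is consistent with the advice to set contraction duration first.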
Creating new muscles
The guidelines above with respect to creating poses and setting dynamic parameters apply to new muscles as well as to existing muscles: make sure new muscles are fully isolated, have good deformational detail, are posed at the extreme, and have realistic velocities.
With respect to dynamic parameters, when adding new muscles, it is important to add them to the right anatomical region, so that they inherit dynamic properties suitable for the muscle. Although some of these properties can be edited after a muscle is added, it helps to have a good base. There are also other dynamic properties that cannot be edited and so are entirely dependent on the anatomical region.
When adding new muscles, it is important to add them to the closest matching anatomical region, so that they inherit dynamic properties suitable for the muscle.
Of particular usefulness is the ability to add muscles to control non-human anatomy, such as antennae, gills, eyeballs on stalks, or other non-human appendages. Such muscles can be added to the nonverbal expressions of the character to be driven by our nonverbal system along with other muscles. If you are adding muscles for non-human anatomy, add them to the anatomical region that has most in common with the new muscle in terms of location on the body and expected velocity.
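The inheritance of region defaults can be sketched as follows. This is an illustration only: the Dynamics class, the add_muscle function, the muscle name "Antenna Raise L" and all numeric values are hypothetical, and the real region defaults are delivered with the character control file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dynamics:
    contraction_duration_ms: float
    skew: float


# Hypothetical region defaults, for illustration only.
REGION_DEFAULTS = {
    "Tongue": Dynamics(contraction_duration_ms=120.0, skew=0.0),
    "Head": Dynamics(contraction_duration_ms=600.0, skew=0.0),
}


def add_muscle(region: str, name: str, overrides: Optional[Dynamics] = None) -> dict:
    """Create a muscle record that inherits its region's default dynamics."""
    if region not in REGION_DEFAULTS:
        raise KeyError(f"unknown anatomical region: {region}")
    dynamics = overrides if overrides is not None else REGION_DEFAULTS[region]
    return {"name": name, "region": region, "dynamics": dynamics}


# A non-human appendage placed in the region it most resembles in location and speed.
antenna = add_muscle("Head", "Antenna Raise L")

# The editable parameters can still be tuned afterwards.
tuned = add_muscle("Head", "Antenna Raise R", overrides=Dynamics(450.0, -0.1))
```

Choosing the region first gives the new muscle a sensible starting point, which matters most for any dynamic properties that cannot be edited later.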
Protected muscles
The default muscle set of a character is open-ended. You may add or remove muscles and edit them as much as you wish. However, a subset of the default muscles are protected. Protected muscles cannot be deleted in SGX Studio, and editability is restricted. Generally speaking, protected muscles:
are expected to exist by our algorithms;
require a higher degree of precision and understanding of our animation system than typical muscles do; and
on account of that precision, may involve defining multiple poses of the muscle rather than just one.
For these reasons protected muscles are normally set up by trained technical artists and animation linguists on the Speech Graphics team, as part of our character setup services.
Protected muscles include the speech muscles and certain eye muscles. These are discussed below.
Speech muscles
There are 16 muscles that are used in speech animation, all of which are protected. In terms of anatomical regions, these muscles span the jaw, lips, tongue and nose. The speech muscle set has been derived scientifically based on decomposition of articulatory movement into its components. Setting up these muscles involves a number of key poses essential for optimal lip sync quality. The speech muscles are depicted and described below.
Jaw | |
Jaw Opening | |
Lips | |
Adduction | |
Compression | The lips are pursed without protrusion. The lower lip is more active than the upper lip, and is slightly bowed in the middle, forming a dip. There is dimpling at the sides of the mouth. |
Lip Flare | |
Lower Lip Pull | |
Lower Lip Push | |
Lower Lip Tuck | |
Pinching | |
Retraction | |
Rounding | |
Upper Lip Pull | |
Nose | |
Nostril Flare | Both nostrils dilate. There is no movement except in the outer nostril walls. |
Tongue | |
Tongue Advance | |
Tongue Body Raise | |
Tongue Retraction | |
Tongue Tip Raise | |
For some educational applications, we extend the core muscle set to organs that are not normally visible, such as the velum or soft palate. An example of this is shown below.
Protected eye muscles
The following muscles of the eyeballs and eyelids are protected. These muscles are expected to exist for certain algorithms to function, including the system’s safeguards against eyelid intersections.
Eyeballs | |
Microdart | The eyeball rolls left (and right) slightly, as when looking between the eyes of a collocutor. This muscle is bidirectional. |
Eyelids | |
Blink | The upper and lower eyelids make contact. Most of the movement is by the upper eyelid. The lower eyelid tenses upward and inward. |
Eye Close | The upper and lower eyelids make contact. This may be the same pose as blink. |
Eye Flare L/R | Widening of the eye, primarily by upward movement of the upper lid. Separated into left and right. |
Eye Squeeze L/R | Narrowing of the eye with peripheral squeezing. Separated into left and right. |
Eye Squint L/R | Narrowing of the eye through the tensing of the upper and lower eyelids. Separated into left and right. |
Lower Eyelid Flex L/R | Flexing upward of the lower eyelid. Separated into left and right. |
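If you manage muscle sets in your own pipeline scripts, a simple check against the protected names listed in the two tables above can catch accidental omissions early. The sketch below is hypothetical and is not an SGX Studio API. SGX Studio enforces protection itself. Only the muscle names come from this page.

```python
SPEECH_MUSCLES = {
    "Jaw Opening",
    "Adduction", "Compression", "Lip Flare", "Lower Lip Pull", "Lower Lip Push",
    "Lower Lip Tuck", "Pinching", "Retraction", "Rounding", "Upper Lip Pull",
    "Nostril Flare",
    "Tongue Advance", "Tongue Body Raise", "Tongue Retraction", "Tongue Tip Raise",
}

PROTECTED_EYE_MUSCLES = {
    "Microdart", "Blink", "Eye Close",
    "Eye Flare L", "Eye Flare R",
    "Eye Squeeze L", "Eye Squeeze R",
    "Eye Squint L", "Eye Squint R",
    "Lower Eyelid Flex L", "Lower Eyelid Flex R",
}

PROTECTED = SPEECH_MUSCLES | PROTECTED_EYE_MUSCLES


def missing_protected(muscle_names) -> list:
    """Return the protected muscles absent from a muscle set, sorted by name."""
    return sorted(PROTECTED - set(muscle_names))


def remove_muscle(muscle_names: set, name: str) -> set:
    """Return a copy of the set without `name`, refusing protected muscles."""
    if name in PROTECTED:
        raise PermissionError(f"'{name}' is a protected muscle and cannot be removed")
    return muscle_names - {name}


# A custom set: everything protected, plus two unprotected muscles.
custom_set = PROTECTED | {"Dimple L", "Dimple R"}
trimmed = remove_muscle(custom_set, "Dimple L")
```

Note that the unprotected lateral muscles Retraction L/R and Upper Lip Pull L/R are distinct from the protected speech muscles Retraction and Upper Lip Pull.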
Is this FACS?
Variability of the face is constrained by biomechanics; therefore, a natural way to parameterize motion is in terms of the set of motor functions by which the face can move. One of the first analyses of facial motion into basic motor functions was proposed in 1970 by Carl-Herman Hjortsjö and expanded upon by the psychologists Paul Ekman and Wallace Friesen, whose Facial Action Coding System (FACS) provides a comprehensive analysis of facial motion. They claimed that while there are 268 muscles in the human face, the face performs only 46 possible basic actions, which they called action units. Each action unit is the motion of one independently controllable muscle group, which gives rise to a characteristic displacement pattern on the facial surface. In principle, any facial expression, whether involved in speech, affect, or some other behavior, may be decomposed into a set of one or more action units each activated to a certain degree. While originally designed as a descriptive system for taxonomy of facial expressions, FACS is widely used today as a basis for facial rigging, with rig parameters representing degrees of activation for corresponding action units.
The Speech Graphics muscle set can be related (though not exactly matched) to FACS action units. There are a few main differences. One is that our decomposition of the speech muscles is much more tailored to an understanding of the dynamics of speech production. Another difference is that the muscles include not just facial muscles but also muscles of the internal vocal tract, including the tongue, as well as muscles of the body that may move during speech, such as the head; even arms, hands and torso can be animated. Finally, Speech Graphics muscles can be bidirectional, unlike FACS action units.