Rigging Guidelines
What is rigging?
Rigging is a crucial aspect of character animation. A rig provides a set of parametric controls to deform a 3D character and enable animations. These controls are analogous to the strings of a puppet. They are the degrees of freedom of the character and thus the building blocks of movement. If done well, rigging enables realistic and expressive animation; but if done poorly, it imposes a limit on the animation quality that can be achieved.
Rig types: Bones, blendshapes and UI controls
Character rigging typically falls into three categories: bones, blendshapes, and UI controls. A bone (aka joint) is a simple 3D transformation (translation, rotation and/or scaling) that affects a local region of the character skin. Blendshapes deform the skin via 3D interpolation (morphing) between various shapes. And UI (user-interface) controls are typically a compact set of artist-friendly handles that can be manipulated to deform the character in a digital content creation tool. Many modern character rigs combine all three methods.
Speech Graphics animation systems control muscles, and muscles control your rig. This layer of indirection means that as long as muscle poses can be defined in terms of a rig, they can work for any type of rig – whether driven by bones, blendshapes, or UI controls. We don’t prescribe any specific set of blendshapes, layout of bones, or control interface.
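For illustration only, here is a minimal sketch (not the Speech Graphics pose format) of what this indirection can look like: each muscle pose is simply a set of rig-parameter values, and it makes no difference whether those parameters are bone transforms, blendshape weights, or UI control handles. All muscle and parameter names below are hypothetical.

```python
# Minimal sketch (not the Speech Graphics format): one muscle pose expressed
# as rig-parameter values, regardless of whether those parameters are bones,
# blendshapes, or UI controls.  All names below are hypothetical.

# A muscle pose is a mapping from rig parameters to their values at full contraction.
MUSCLE_POSES = {
    "jaw_open": {
        "jaw_joint.rotateX": 25.0,            # bone rotation, degrees
        "blendShape1.mouth_stretch": 0.6,     # blendshape weight, 0..1
    },
    "lip_pucker": {
        "ctrl_mouth_pucker.translateY": 1.0,  # UI control handle
        "blendShape1.lips_forward": 0.8,
    },
}

def apply_muscle(set_rig_param, muscle, activation):
    """Drive every rig parameter of a muscle pose, scaled by activation (0..1)."""
    for param, full_value in MUSCLE_POSES[muscle].items():
        set_rig_param(param, full_value * activation)

# Example: print what would be set for a half-strength jaw open.
apply_muscle(lambda p, v: print(f"{p} = {v:.2f}"), "jaw_open", 0.5)
```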
Bones/Joints
Bones (aka joints) are ideal for simulating skeletal motion, and are thus a fitting solution for controlling the body. Somewhat counterintuitively, they are also very popular in facial animation, even though most facial muscles only deform soft tissues of the face and do not move the skeleton. Bones are useful in facial animation due to their relatively low computational cost and high degree of local control.
When creating a bone rig for the face, around 100 bones is roughly the minimum needed to give the face a full range of motion (fewer bones can be used in resource-constrained rendering environments such as browsers). It is important to have enough bones in the mouth area so that even high-deformation poses, such as lip pursing, can be achieved. As a rule of thumb, a minimum of 12 to 16 bones in the lips is needed to achieve the necessary mouth poses. For the rest of the face, the bones should be laid out to provide the most flexibility where deformation is greatest, such as the upper cheeks, between the brows, and around the eyes.
The maximum skinning influence should be 4 to 8 bones per vertex. You can get away with 4 bones per vertex if the bones are not too densely placed – e.g., no more than 12 bones in the lips; with more than 12 bones in the lips, you will likely need a maximum skinning influence of 8. Expect a degree of skinning adjustment and polish to get all the bones deforming the skin cleanly during animation.
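As an illustration of why the influence cap matters, below is a small sketch of linear blend skinning for a single vertex: each of its (at most 4 to 8) influencing bones transforms the vertex, and the results are blended by the skin weights. This uses NumPy and toy data; it is not tied to any particular engine or DCC tool.

```python
import numpy as np

def skin_vertex(rest_pos, influences, bone_matrices):
    """Linear blend skinning for one vertex.

    influences: list of (bone_index, weight) pairs, capped at 4-8 entries
    per vertex; weights are assumed to sum to 1.
    bone_matrices: (n_bones, 4, 4) skinning matrices
    (current pose * inverse bind pose).
    """
    p = np.append(rest_pos, 1.0)          # homogeneous coordinate
    blended = np.zeros(4)
    for bone_index, weight in influences:
        blended += weight * (bone_matrices[bone_index] @ p)
    return blended[:3]

# Toy example: two bones, one rotated 10 degrees, vertex weighted 50/50.
bind = np.eye(4)
rotated = np.eye(4)
angle = np.radians(10.0)
rotated[:3, :3] = [[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]]
print(skin_vertex(np.array([1.0, 0.0, 0.0]),
                  [(0, 0.5), (1, 0.5)],
                  np.stack([bind, rotated])))
```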
Blendshapes and UI controls
Blendshapes are a great option for facial animation because they can achieve complex deformation patterns in which each vertex has its own trajectory. UI controls are animator-friendly handles that can drive collections of blendshapes and/or bones as a coherent movement, possibly with context-dependent correctives. Both blendshapes and UI controls tend to be based on an analysis of the face into its basic movement patterns.
One description of the basic muscle movements of the face is the Facial Action Coding System (FACS). Many practitioners use some adaptation of this analysis for blendshapes and UI controls. The Speech Graphics muscle set itself provides an analysis of the face into basic movement patterns (see Is this FACS?). However, there is no need to design a single blendshape or UI control to pose each Speech Graphics muscle; you can use combinations of rig parameters to achieve the pose.
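As a rough sketch of how such a combination can work, the example below applies additive blendshape deltas to a base mesh and shows a hypothetical UI control that fans out to several blendshape weights, including a context-dependent corrective. All shape and control names are made up for illustration.

```python
import numpy as np

def apply_blendshapes(base_mesh, deltas, weights):
    """Classic additive blendshapes: each vertex follows its own trajectory.

    base_mesh: (n_vertices, 3) neutral positions
    deltas:    dict of shape name -> (n_vertices, 3) offsets from the base
    weights:   dict of shape name -> weight (typically 0..1)
    """
    result = base_mesh.copy()
    for name, w in weights.items():
        result += w * deltas[name]
    return result

def smile_control(value):
    """Hypothetical UI control: one handle drives several shapes as a unit,
    including a corrective that only fires at high values."""
    return {
        "lip_corner_puller_L": value,
        "lip_corner_puller_R": value,
        "cheek_raiser": 0.5 * value,
        "smile_corrective": max(0.0, value - 0.7) / 0.3,  # context-dependent fix-up
    }

# Toy data: 4 vertices, small random deltas per shape.
base = np.zeros((4, 3))
rng = np.random.default_rng(0)
deltas = {name: rng.normal(size=(4, 3)) * 0.01 for name in smile_control(1.0)}
print(apply_blendshapes(base, deltas, smile_control(0.9)))
```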
The ability to do fine-tuning in the lip area is important. This usually requires additional blendshapes that are not part of the base FACS set. The Epic MetaHumans rigs are a great example of the full set of poses/blendshapes needed to achieve the complete range of facial animation. All of the poses can be found under the "CTRL_expressions" node inside Maya.
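If you have a MetaHuman rig open in Maya, one quick way to inspect those poses is to list the keyable attributes on that node from the Script Editor. The snippet below is a rough sketch using standard maya.cmds calls; the exact attribute set depends on the rig version.

```python
# Run inside Maya's Script Editor with a MetaHuman rig in the scene.
# Rough sketch: list the keyable pose attributes on the CTRL_expressions
# node and read their current values.  Attribute names vary by rig version.
from maya import cmds

node = "CTRL_expressions"
if cmds.objExists(node):
    for attr in cmds.listAttr(node, keyable=True) or []:
        value = cmds.getAttr(f"{node}.{attr}")
        print(f"{node}.{attr} = {value}")
else:
    print(f"{node} not found in the scene")
```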
Rig quality
The quality of the muscle poses is crucial to achieving good overall animation, so the main factor in judging the quality of a rig is simply whether it can achieve good muscle poses. Of particular concern are muscles of the lips, jaw, tongue and eyelids, which require extra precision. See Muscles.
Our team can evaluate your rig in terms of its suitability for speech and nonverbal behavior, and help advise on any issues that might affect quality. Normally, no changes to your rig are required. However, for optimal results, keep in mind the key characteristics of good muscle poses: isolation, detail and extremity.
Enabling isolation
A good rig should be capable of isolated movements, meaning that if you can move one part of your face without moving another, your rig should be able to do that too. As a rough guideline, to the extent that a human (or other creature) can move any of the following parts of the face independently of each other, your rig should be able to do so as well:
lips vs jaw vs tongue
lower face vs upper face
lower lip vs upper lip
lower eyelid vs upper eyelid
left side of face vs right side of face
The isolation principle is important because it allows each muscle pose to precisely capture the effect of one muscle contraction, without mixing in the effects of other muscles. Creating poses that mix muscles together collapses the degrees of freedom of the character, and yokes anatomical parts in ways that do not respect their individual dynamic properties.
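One practical way to sanity-check isolation is to activate each control on its own and confirm that vertices outside its intended region barely move. The sketch below assumes you can pose the rig and read back vertex positions through some evaluate_rig callback, and that you have boolean region masks for the face; both are placeholders for whatever your pipeline provides.

```python
import numpy as np

def check_isolation(evaluate_rig, controls, region_masks, tolerance=1e-3):
    """Flag controls that move vertices outside their intended facial region.

    evaluate_rig(control, value) -> (n_vertices, 3) posed vertex positions;
    passing control=None returns the neutral pose.
    controls: dict of control name -> name of the region it should affect
    region_masks: dict of region name -> boolean (n_vertices,) mask
    """
    neutral = evaluate_rig(None, 0.0)
    problems = []
    for control, region in controls.items():
        posed = evaluate_rig(control, 1.0)
        displacement = np.linalg.norm(posed - neutral, axis=1)
        outside = ~region_masks[region]
        leak = displacement[outside].max() if outside.any() else 0.0
        if leak > tolerance:
            problems.append((control, float(leak)))
    return problems
```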
Enabling detail
While isolating muscle movements is important, it is also important to capture the entire effect of a muscle, which means maximizing detail in the deformation. For this reason, the rig should be capable of fairly detailed deformation patterns and control over the entire facial surface. The worst thing to do is to have "dead" zones on the face – i.e., regions that cannot be moved by any rig control. More generally, density of controls allows us to achieve more complex and realistic deformations. Also keep in mind that muscle movements can have secondary non-local effects. For example, when the upper lip moves down, the cheeks and nose area may get pulled along. These subtle secondary movements add perceptual cues to the viewer's experience so that the facial animation is immersive and compelling.
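A related check is to sweep every control to its extreme and accumulate the maximum displacement of each vertex; any vertex that never moves sits in a dead zone. As before, evaluate_rig is a placeholder for however you pose and read back the mesh in your pipeline.

```python
import numpy as np

def find_dead_zones(evaluate_rig, control_names, n_vertices, threshold=1e-4):
    """Return indices of vertices that no control can move ("dead" zones).

    evaluate_rig(control, value) -> (n_vertices, 3) posed vertex positions;
    passing control=None returns the neutral pose.
    """
    neutral = evaluate_rig(None, 0.0)
    max_motion = np.zeros(n_vertices)
    for control in control_names:
        posed = evaluate_rig(control, 1.0)
        motion = np.linalg.norm(posed - neutral, axis=1)
        max_motion = np.maximum(max_motion, motion)
    return np.nonzero(max_motion < threshold)[0]
```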
As a general rule, if a particular deformation pattern can be made on a human face, it should be possible to approximate that deformation pattern on your rig. Even if the character is an animal, alien or monster, it should be capable of achieving detailed deformations appropriate to its (probably anthropomorphized) physiology.
Enabling extremity
Finally, in order to achieve highly expressive facial animation, the rig must be capable of extreme muscle poses. Each muscle pose represents the maximum extent to which the character can exercise that muscle. Therefore, if the rig cannot achieve strong poses due to built-in limitations (such as low-intensity blendshapes), the character will not be able to achieve its full range of movement. Not only will highly expressive animation be difficult to achieve, but the character's behavior will also be relatively muted at all levels of intensity, because intermediate activations are scaled down from these weakly posed maximums.