Markup
The aim of SGX is to minimize the need for direction from the user by automatically producing facial movement that accurately matches the speaker's vocal performance. However, there may be cases where you want more control over the outcome. For this, SGX provides a markup system that lets you insert markup tags into the transcripts. An SGX markup tag consists of comma-separated attribute settings surrounded by parentheses.
Markup Example
Each markup tag applies to everything that follows it, up until the next tag. Thus the position of the tag in the transcript is important. For example:
(mood=serious, mag=1.0, speed=0.9) So he ran out of the way and the next morning
it turned out that his servant had actually (mood=fearful, mag=1.2) died
in the landslide further up the hill sometime before he warned him
(mood=normal, mag=1.0) so that's my ghost story
In the above example:
(mood=serious, mag=1.0, speed=0.9) applies to "So he ran out of the way and the next morning it turned out that his servant had actually".
(mood=fearful, mag=1.2) applies to "died in the landslide further up the hill sometime before he warned him".
(mood=normal, mag=1.0) applies to "so that's my ghost story".
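The scoping rule above can be sketched in a few lines of Python. This is an illustration only, not SGX's own parser: the function name `split_segments` and the regular expression are assumptions, and the sketch simply ignores any text that appears before the first tag.

```python
import re

# Assumption: a tag is a parenthesised group containing at least one "name=value" setting.
TAG_RE = re.compile(r"\(([^()]*=[^()]*)\)")

def split_segments(transcript):
    """Return one (attributes, text) pair per tag, in transcript order."""
    matches = list(TAG_RE.finditer(transcript))
    segments = []
    for i, match in enumerate(matches):
        attrs = {}
        for setting in match.group(1).split(","):
            if "=" not in setting:
                continue  # skip stray fragments such as a trailing comma
            name, _, value = setting.partition("=")
            attrs[name.strip()] = value.strip()
        # A tag covers everything up to the next tag, or to the end of the transcript.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        text = " ".join(transcript[match.end():end].split())
        segments.append((attrs, text))
    return segments

transcript = (
    "(mood=serious, mag=1.0, speed=0.9) So he ran out of the way and the next morning\n"
    "it turned out that his servant had actually (mood=fearful, mag=1.2) died\n"
    "in the landslide further up the hill sometime before he warned him\n"
    "(mood=normal, mag=1.0) so that's my ghost story"
)
segments = split_segments(transcript)
for attrs, text in segments:
    print(attrs, "->", text)
```

Each printed pair shows a tag's settings next to the words it governs, which is a quick way to check that a tag sits where you intended.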
Markup Attributes
attribute | description | range | default value | tag examples* |
---|---|---|---|---|
mood | Select a mood or engage automatic moods | Any of the moods in the Mood Library or "auto" or "default" | "default" | mood=happy |
voice_mode | Select a voice mode or engage automatic voice mode | "effort", "auto", or "default" | "default" | voice_mode=auto |
magnitude | Increase or decrease in magnitude of movements | 0.0 to 2.0 | 1.0 | magnitude=1.3 |
speech_magnitude | Increase or decrease in magnitude of speech movements | 0.0 to 2.0 | 1.0 | speech_magnitude=0.8 |
speed | Increase or decrease in speed of movements | 0.0 to 2.0 | 1.0 | speed=1.2 |
hyperarticulation | Increase in articulatory effort | 0.0 to 1.0 | 0.0 | hyper=0.3 |
jawmax | Limit on jaw opening | 0.0 to 1.0 | 1.0 | jawmax=0.4 |
time | Time of the tag in seconds | 0.0 to the audio duration | computed from transcript position | time=12.5 |
*Note: attribute names don't need to be spelled out in full. A name may be cut off at the end as much as you want, as long as it remains distinct from the other attributes. For example, instead of magnitude=1.4, you may type magn=1.4 or even ma=1.4. However, m=1.4 wouldn't work because it is ambiguous between magnitude and mood.
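As an illustration of the abbreviation rule, the following Python sketch resolves a shortened name against a list of attribute names. It is not SGX's own implementation: the `resolve` function is hypothetical, and the full spellings in `ATTRIBUTES` (in particular "hyperarticulation") are assumed from the tag examples and surrounding text.

```python
# Assumed full attribute names, taken from the tag examples in the table above.
ATTRIBUTES = [
    "mood", "voice_mode", "magnitude", "speech_magnitude",
    "speed", "hyperarticulation", "jawmax", "time",
]

def resolve(name):
    """Return the single attribute that `name` abbreviates, or raise ValueError."""
    if name in ATTRIBUTES:
        return name  # an exact spelling always wins
    matches = [attr for attr in ATTRIBUTES if attr.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown attribute: {name!r}")
    raise ValueError(f"{name!r} is ambiguous between {', '.join(matches)}")

print(resolve("magn"))  # magnitude
print(resolve("ma"))    # magnitude
try:
    resolve("m")
except ValueError as err:
    print(err)
```

Under these assumed names, "spee" would also be rejected, because both speech_magnitude and speed begin with it, while the full spelling "speed" is accepted.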
Unspecified Values
You don't need to specify every attribute in a tag, only the ones you want to change. Attributes that are not specified in a given tag simply continue to have the same values that they had in the preceding tag if there is one, or their default values if it's the first tag in the transcript.
For example:
(mood=serious, mag=1.0, speed=0.9) So he ran out of the way and the next morning
it turned out that his servant had actually (mood=fearful, mag=1.2, hyp=0.1) died
in the landslide further up the hill sometime before he warned him
(mood=normal, mag=1.0) so that's my ghost story
The speed attribute is 0.9 for the entire transcript, since it's set in the first tag and not overridden in the second or third tag. The hyperarticulation attribute hasn't been set in the first tag so it takes its default value of 0.0. But from the second tag onward it's 0.1.
By the same logic, when no tag is present at all, all of the attributes have their default values.
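The carry-over behaviour can be sketched as a running dictionary that each tag updates. This is only a model of the rule described above, not SGX code: `effective_settings` is a hypothetical helper, the defaults are copied from the attribute table, and the second tag is assumed to set hyperarticulation to 0.1 as the explanation above describes.

```python
# Default values as listed in the Markup Attributes table (time is omitted
# because it is computed from the tag's position rather than inherited).
DEFAULTS = {
    "mood": "default",
    "voice_mode": "default",
    "magnitude": 1.0,
    "speech_magnitude": 1.0,
    "speed": 1.0,
    "hyperarticulation": 0.0,
    "jawmax": 1.0,
}

def effective_settings(tags):
    """Return the full set of attribute values in force after each tag."""
    current = dict(DEFAULTS)
    resolved = []
    for tag in tags:
        current.update(tag)            # only the attributes named in the tag change
        resolved.append(dict(current)) # snapshot the values this tag's text uses
    return resolved

tags = [
    {"mood": "serious", "magnitude": 1.0, "speed": 0.9},
    {"mood": "fearful", "magnitude": 1.2, "hyperarticulation": 0.1},
    {"mood": "normal", "magnitude": 1.0},
]
resolved = effective_settings(tags)
for settings in resolved:
    print(settings["mood"], settings["speed"], settings["hyperarticulation"])
```

The three printed rows show speed staying at 0.9 throughout, and hyperarticulation moving from its default of 0.0 to 0.1 at the second tag and remaining there.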
Markup Without Transcription
Even without transcription (remember: transcription is optional but preferred), you can still insert markup. For example:
(mood=serious, mag=1.0, speed=0.9)
is a transcript with only markup, and no actual transcription of the dialogue. Since there are no words to define the timing of the tag, the tag will apply to the entire file, unless a time value is given using the "time" attribute. For example:
(mood=serious, mag=1.0, speed=0.9, time=1.4)(mood=fearful, mag=1.2, time=2.8)
In this case, the first tag is 1.4 seconds into the file, and the second tag is at 2.8 seconds. Just as with transcript-inserted tags, the tags at these time points define the properties up to the next tag.
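To make the timing concrete, here is a small Python sketch that turns timed tags into the intervals they govern. It is a model of the description above rather than SGX behaviour: `tag_intervals` and its `duration` parameter are hypothetical, and the sketch assumes the last tag runs to the end of the audio.

```python
def tag_intervals(timed_tags, duration):
    """Return (start, end, settings) for each tag, ordered by start time."""
    ordered = sorted(timed_tags, key=lambda tag: tag["time"])
    intervals = []
    for i, tag in enumerate(ordered):
        start = tag["time"]
        # Each tag applies until the next tag begins; the last one is
        # assumed to apply until the audio ends.
        end = ordered[i + 1]["time"] if i + 1 < len(ordered) else duration
        settings = {name: value for name, value in tag.items() if name != "time"}
        intervals.append((start, end, settings))
    return intervals

timed_tags = [
    {"mood": "serious", "mag": 1.0, "speed": 0.9, "time": 1.4},
    {"mood": "fearful", "mag": 1.2, "time": 2.8},
]
intervals = tag_intervals(timed_tags, duration=5.0)
for start, end, settings in intervals:
    print(f"{start}s to {end}s: {settings}")
```

The 5.0 second duration is an arbitrary value chosen for the example.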
Batch Markup
Batch markup can be done via SGX-GUI or through the command line. You can apply a Mood, enable Voice Mode, and modify values for Magnitude, Speed, Hyperarticulation, and Jawmax. Note that if you apply specific tags in the transcript(s) they will override the batch tags.
To apply markup values via SGX-GUI, edit the values in the fields in the "Batch Markup" portion of the interface.
To apply markup values via the command line, use the -m command line option followed by the values you would like to set. Here is an example:
-m "mood=happy, mag=1.25, speed=0.5, hyp=0.1, jawmax=.75"
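If you generate batch settings from a script, a helper along the following lines can assemble the value passed to -m and catch out-of-range numbers before a run starts. This is a sketch only: `batch_markup_value` is a hypothetical function, the ranges are copied from the attribute table, and it assumes full attribute names are accepted in batch markup just as they are in transcript tags.

```python
# Numeric ranges for the batch-settable attributes, from the attribute table.
RANGES = {
    "magnitude": (0.0, 2.0),
    "speed": (0.0, 2.0),
    "hyperarticulation": (0.0, 1.0),
    "jawmax": (0.0, 1.0),
}

def batch_markup_value(**settings):
    """Build the comma-separated string that follows the -m option."""
    parts = []
    for name, value in settings.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= float(value) <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
        parts.append(f"{name}={value}")
    return ", ".join(parts)

value = batch_markup_value(mood="happy", magnitude=1.25, speed=0.5, jawmax=0.75)
print(f'-m "{value}"')
```

The helper only builds the option value. How it is passed to the SGX command line tool depends on your setup and is not shown here.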