Behavior Markup | Speech Graphics Knowledge Base

The Behavior Markup system is a way of inserting Behavior Controls into the text transcripts of audio files. This can be helpful during script preparation, giving the writer some direction over the performance.

Keep in mind that markup is fairly labor-intensive. It is usually more efficient to apply behavior controls to a whole batch of files via batch processing and then, if desired, to visualize the animation and edit metadata in the graphical editor (see SGX Interactive Processing).

Local behavior markup overrides the global behavior parameters set at processing time.

Markup Format

To use markup, add XML tags named “sgx” to your transcript. Each tag should surround the text corresponding to the portion of audio you want it to affect. For example, in the following:

<sgx mode="friendly" speech_magnitude="1.2"> This is actually a story about my grandfather who fought in the war. </sgx> Um before that he was a mountaineer. Uh he climbed various mountains. He was one of the teams that climbed K two one of the first ones.

the portion of audio corresponding to the text enclosed by the tag (“This is actually a story about my grandfather who fought in the war.”) will have the mode set to “friendly” and the Speech Magnitude modifier set to 1.2.
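When preparing scripts programmatically, tags like the one above can be generated rather than typed by hand. The sketch below uses a hypothetical helper, `sgx_wrap`, that is not part of SGX. It assumes that transcripts are parsed as XML, so it quotes attribute values and escapes characters such as `&` and `<` in the enclosed text.

```python
from xml.sax.saxutils import escape, quoteattr


def sgx_wrap(text: str, **attrs: object) -> str:
    """Wrap transcript text in an <sgx> tag with the given attributes.

    Attribute values are converted to strings and quoted; the enclosed text
    is XML-escaped so characters such as '&' or '<' cannot break the markup.
    """
    attr_str = "".join(f" {name}={quoteattr(str(value))}" for name, value in attrs.items())
    return f"<sgx{attr_str}> {escape(text)} </sgx>"


line = sgx_wrap(
    "This is actually a story about my grandfather who fought in the war.",
    mode="friendly",
    speech_magnitude=1.2,
)
print(line)
```

The printed line has the same form as the opening tag and enclosed sentence in the example above.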

Tags can also be nested:

<sgx mode="friendly" nonverbal_magnitude="1.2"> This is actually a story about my grandfather who fought in the war.<sgx mode="excited"> Um before that he was a mountaineer.</sgx> Uh he climbed various mountains. He was one of the teams that climbed K two one of the first ones.</sgx>

Attributes of inner tags override those of outer tags. Thus, in the above example, the portion of audio corresponding to the inner tag's text (“Um before that he was a mountaineer.”) will be in mode “excited” instead of “friendly”.
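The override rule can be illustrated by resolving a nested transcript into segments, each with its effective attributes. This is a minimal sketch, not SGX's own parser. The function name `resolve_segments` is hypothetical. The sketch also assumes that an inner tag inherits any attribute it does not set from its enclosing tag, so in the example the inner segment keeps `nonverbal_magnitude="1.2"`.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def resolve_segments(transcript: str) -> list[tuple[str, dict[str, str]]]:
    """Return (text, effective_attributes) pairs for a marked-up transcript.

    Assumption: an inner <sgx> tag inherits attributes it does not set from
    its enclosing tag; attributes it does set override the outer values.
    """
    # Text may appear outside any tag, so wrap everything in a synthetic
    # root element to obtain a well-formed XML document.
    root = ET.fromstring(f"<root>{transcript}</root>")
    segments: list[tuple[str, dict[str, str]]] = []

    def add(text: str | None, attrs: dict[str, str]) -> None:
        if text and text.strip():
            segments.append((text.strip(), attrs))

    def walk(elem: ET.Element, inherited: dict[str, str]) -> None:
        attrs = {**inherited, **elem.attrib}  # inner attributes win
        add(elem.text, attrs)
        for child in elem:
            walk(child, attrs)
            # Text after a child's closing tag belongs to this element again.
            add(child.tail, attrs)

    walk(root, {})
    return segments


example = (
    '<sgx mode="friendly" nonverbal_magnitude="1.2"> This is actually a story about my '
    'grandfather who fought in the war.<sgx mode="excited"> Um before that he was a '
    "mountaineer.</sgx> Uh he climbed various mountains. He was one of the teams that "
    "climbed K two one of the first ones.</sgx>"
)
for text, attrs in resolve_segments(example):
    print(attrs["mode"], "|", text)
```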

Markup Attributes for Behavior Modes

These attributes change the behavior mode and configure auto modes.

Attribute	Description
`mode`	The default behavior mode that should be active unless an auto mode is detected.
`positive_mode`	The behavior mode that the positive auto mode should trigger.
`negative_mode`	The behavior mode that the negative auto mode should trigger.
`effort_mode`	The behavior mode that the effort auto mode should trigger.
`laugh_mode`	The behavior mode that the laugh auto mode should trigger.
`auto_modes`	The auto modes to activate. Possible values are “none”, “all”, or a comma-separated list containing one or more of “positive”, “negative”, “effort”, or “laugh”.
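The accepted forms of `auto_modes` can be checked before a transcript is submitted. The sketch below uses a hypothetical validator, `parse_auto_modes`, that follows the three forms in the table. Tolerating spaces around the commas is an assumption, not documented SGX behavior.

```python
from __future__ import annotations

AUTO_MODES = ("positive", "negative", "effort", "laugh")


def parse_auto_modes(value: str) -> frozenset[str]:
    """Return the set of auto modes activated by an auto_modes value."""
    if value == "none":
        return frozenset()
    if value == "all":
        return frozenset(AUTO_MODES)
    # Assumption: spaces around commas are tolerated and stripped.
    modes = frozenset(part.strip() for part in value.split(","))
    unknown = modes - set(AUTO_MODES)
    if unknown:
        raise ValueError(f"unknown auto mode(s): {sorted(unknown)}")
    return modes


print(sorted(parse_auto_modes("positive,negative")))  # ['negative', 'positive']
```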

Here is an example:

<sgx mode="Neutral" auto_modes="positive,negative" positive_mode="Happy" negative_mode="Sad"> This is actually a story about my grandfather who fought in the war.<sgx mode="Excited" auto_modes="none"> Um before that he was a mountaineer.</sgx> Uh he climbed various mountains. He was one of the teams that climbed K two one of the first ones.</sgx>

In this example, for the outer element:

The default behavior mode is “Neutral”.
The positive and negative auto modes are active. The effort and laugh auto modes will not be detected.
The positive auto mode will trigger “Happy” and the negative auto mode will trigger “Sad”.

For the inner element:

The default behavior mode is “Excited”.
The auto modes are disabled, meaning only the default mode will occur.
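The mode selection described above can be summarized as a small function. This is a sketch of the documented rule, not SGX's implementation. `active_mode` is a hypothetical name, and `detected` stands in for whatever auto mode SGX detects in a stretch of audio. Two further points are assumptions: that a missing `auto_modes` behaves like `"none"`, and that an enabled auto mode without its `*_mode` attribute falls back to the default mode.

```python
from __future__ import annotations

AUTO_MODES = {"positive", "negative", "effort", "laugh"}


def active_mode(attrs: dict[str, str], detected: str | None) -> str:
    """Return the behavior mode in effect for one stretch of audio.

    attrs:    the effective sgx attributes for that stretch
    detected: the auto mode detected there, or None if none was detected
    """
    default = attrs["mode"]
    # Assumption: a missing auto_modes attribute behaves like "none".
    value = attrs.get("auto_modes", "none")
    if value == "all":
        enabled = AUTO_MODES
    elif value == "none":
        enabled = set()
    else:
        enabled = {part.strip() for part in value.split(",")}
    if detected in enabled:
        # e.g. a detected "positive" auto mode triggers positive_mode.
        # Assumption: fall back to the default mode if that attribute is unset.
        return attrs.get(f"{detected}_mode", default)
    return default


outer = {"mode": "Neutral", "auto_modes": "positive,negative",
         "positive_mode": "Happy", "negative_mode": "Sad"}
print(active_mode(outer, "positive"))  # Happy
print(active_mode(outer, "effort"))    # Neutral (effort is not enabled)
```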

Markup Attributes for Behavior Modifiers

These attributes set the behavior modifiers.

Attribute	Modifier
`speech_magnitude`	Speech Magnitude
`speech_speed`	Speech Speed
`hyperarticulation`	Hyperarticulation
`jaw_limit`	Jaw Limit
`nonverbal_magnitude`	Nonverbal Magnitude
`nonverbal_speed`	Nonverbal Speed
`blink_frequency`	Blink Frequency
`dart_frequency`	Dart Frequency

Here is an example:

<sgx speech_magnitude="0.9" nonverbal_magnitude="0.9"> This is actually a story about my grandfather who fought in the war.<sgx speech_magnitude="1.2" nonverbal_magnitude="1.4"> Um before that he was a mountaineer.</sgx> Uh he climbed various mountains. He was one of the teams that climbed K two one of the first ones.</sgx>

In this example, Speech Magnitude and Nonverbal Magnitude are both set to 0.9 for the outer element, while for the inner nested element, Speech Magnitude is 1.2 and Nonverbal Magnitude is 1.4.
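A misspelled attribute name, such as `nonverbal_magntitude`, is easy to miss in a long transcript. The sketch below uses a hypothetical checker, `check_attributes`, that compares a tag's attributes against the names listed in this article, including the `time` attribute covered later. It assumes that modifier and time values must parse as numbers, and it does not check value ranges because none are documented here.

```python
from __future__ import annotations

MODE_ATTRIBUTES = {"mode", "positive_mode", "negative_mode", "effort_mode",
                   "laugh_mode", "auto_modes"}
NUMERIC_ATTRIBUTES = {
    "speech_magnitude", "speech_speed", "hyperarticulation", "jaw_limit",
    "nonverbal_magnitude", "nonverbal_speed", "blink_frequency", "dart_frequency",
    "time",
}


def check_attributes(attrs: dict[str, str]) -> list[str]:
    """Return a list of problems in one tag's attributes (empty if none)."""
    problems: list[str] = []
    for name, value in attrs.items():
        if name in NUMERIC_ATTRIBUTES:
            try:
                float(value)  # assumption: numeric attributes must parse as numbers
            except ValueError:
                problems.append(f"{name}={value!r} is not a number")
        elif name not in MODE_ATTRIBUTES:
            problems.append(f"unknown attribute {name!r}")
    return problems


print(check_attributes({"speech_magnitude": "1.2", "nonverbal_magntitude": "1.4"}))
```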

Markup Time Attribute

Normally, markup tags are aligned to the audio based on the alignment of the words they enclose. For example, this markup

<sgx mode="friendly"> This is actually a story about my grandfather who fought in the war. </sgx> <sgx mode="serious"> Um before that he was a mountaineer. </sgx>

produces an alignment in which the boundary between the “friendly” and “serious” modes falls at 5,390 ms, because that is where “This is actually a story about my grandfather who fought in the war” ends and “Um before that he was a mountaineer” starts.

However, you may also set the timing of tags explicitly using the optional time attribute, which overrides position-based timing.

Attribute	Description
`time`	The start time of the tag, in milliseconds. If specified, overrides the start time deduced from the tag’s position in the transcript.

For example, this markup

<sgx mode="friendly" time="0"/> <sgx mode="serious" time="4000"/> This is actually a story about my grandfather who fought in the war. Um before that he was a mountaineer.

produces an alignment in which the “serious” mode now starts at 4,000 ms, because that is the time value given in the tag. Note that the time attribute determines only the start time of a tag. A tag ends automatically at the start of the next tag in chronological order or at the end of the audio, whichever comes first. End tags are therefore not needed.
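The end-time rule above can be sketched as a function that turns time-stamped tags into segments. The sketch uses a hypothetical function, `mode_segments`. It ignores the boundary snapping SGX may apply, and the 8,000 ms audio length in the usage line is made up for illustration.

```python
from __future__ import annotations


def mode_segments(tags: list[tuple[int, str]], audio_end_ms: int) -> list[tuple[int, int, str]]:
    """Turn (start_ms, mode) tags into (start_ms, end_ms, mode) segments.

    Each tag lasts until the next tag in chronological order starts, or until
    the end of the audio, whichever comes first.
    """
    # Sort by start time only; the stable sort keeps transcript order for ties.
    ordered = sorted(tags, key=lambda tag: tag[0])
    segments = []
    for i, (start, mode) in enumerate(ordered):
        if start >= audio_end_ms:
            break  # this tag (and all later ones) start after the audio ends
        next_start = ordered[i + 1][0] if i + 1 < len(ordered) else audio_end_ms
        segments.append((start, min(next_start, audio_end_ms), mode))
    return segments


print(mode_segments([(0, "friendly"), (4000, "serious")], audio_end_ms=8000))
# [(0, 4000, 'friendly'), (4000, 8000, 'serious')]
```

Sorting by time rather than by transcript position reflects the rule that a tag ends at the next *chronological* tag.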

You may occasionally find a slight offset between the time values you provide and the effective start times in the output. This is because, where possible, SGX snaps times to nearby boundaries in the phrasal analysis of the speech.

Markup Without Transcription

Markup tags can be used even without a transcription. For example:

<sgx mode="friendly" time="0"/> <sgx mode="serious" time="2000"/>

This transcript contains only markup and no actual transcription of the audio. There are no words to define the timing of the tags, but the time attribute can be used to set each start time explicitly. In the absence of a time attribute, a tag starts at the beginning of the audio file.
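A markup-only transcript like the one above can be read into start times and attributes, applying the rule that a tag without a time attribute starts at the beginning of the audio. This is a minimal sketch using Python's standard XML parser, not SGX's own parsing. The function name `timed_tags` is hypothetical, and treating time values as possibly fractional milliseconds is an assumption.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def timed_tags(markup: str) -> list[tuple[float, dict[str, str]]]:
    """Return (start_ms, other_attributes) for each <sgx> tag, in document order."""
    # Wrap in a synthetic root so several top-level tags form one XML document.
    root = ET.fromstring(f"<root>{markup}</root>")
    result = []
    for elem in root.iter("sgx"):
        attrs = dict(elem.attrib)
        # No time attribute -> the tag starts at the beginning of the audio.
        start_ms = float(attrs.pop("time", "0"))
        result.append((start_ms, attrs))
    return result


print(timed_tags('<sgx mode="friendly" time="0"/> <sgx mode="serious" time="2000"/>'))
# [(0.0, {'mode': 'friendly'}), (2000.0, {'mode': 'serious'})]
```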