Input Options

SGX has two input options:

Audio and Transcripts (recommended)
Audio only

For the most accurate lip sync, we recommend using both audio and transcripts as input. When you include a transcript, you must also use an SGX language module that supports pronunciation of text in the transcript's language. For a list of supported languages and dialects, see Language Support.

Audio-only processing works for any language, including fictional languages. It uses a universal model of the human vocal tract to map acoustics to articulation. Because transcripts help to disambiguate certain sounds, audio-only lip sync may be slightly less accurate than lip sync driven by audio and transcripts, but the universal model still produces natural-looking speech motion. We are continually working to improve this model.
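The choice between the two input options can be sketched as a small decision helper. This is only an illustration of the logic described above, not part of SGX: `plan_input`, `InputPlan`, and the `INSTALLED_LANGUAGE_MODULES` set are hypothetical names, and the language codes are placeholders (the actual list is in Language Support). The sketch assumes you want audio and transcripts whenever a transcript exists and a matching language module is available, and audio only otherwise.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical placeholder values; consult Language Support for the real list.
INSTALLED_LANGUAGE_MODULES = frozenset({"en-US", "fr-FR"})


@dataclass(frozen=True)
class InputPlan:
    """Describes which of the two input options to use for one audio file."""
    mode: str                          # "audio+transcript" or "audio-only"
    audio: Path
    transcript: Optional[Path] = None
    language: Optional[str] = None


def plan_input(
    audio: Path,
    transcript: Optional[Path] = None,
    language: Optional[str] = None,
    modules: frozenset = INSTALLED_LANGUAGE_MODULES,
) -> InputPlan:
    """Prefer audio + transcript (recommended); otherwise fall back to audio only.

    A transcript is only usable when a language module exists for its
    language, so both conditions must hold for the recommended mode.
    """
    if transcript is not None and language in modules:
        return InputPlan("audio+transcript", audio, transcript, language)
    # No transcript, or no module for this language: audio-only processing
    # works for any language, so it is always a valid fallback.
    return InputPlan("audio-only", audio)
```

For example, `plan_input(Path("line01.wav"), Path("line01.txt"), "en-US")` selects the recommended mode, while the same call with a language that has no module falls back to audio only.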

See Audio Guidelines and Transcription Guidelines for more information on input requirements and best practices.