How do I use emotive tags with zh-ft-model?

by CharlesNi - opened Nov 27, 2025

I tried using zh-ft-model to generate speech with emotive or paralinguistic tags, but it seems the model cannot produce events such as laughter, sighs, or other expressive cues.
For example, I used prompts like:
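prompts = [
    # "Hi, my name is Tara, and I'm a speech generation model that can mimic the human voice."
    "嗨，我叫塔拉，我是一个能够模仿人声的语音生成模型。",
    # "I was also trained to understand and generate paralinguistic expressions, such as sighing <sigh>, chuckling <chuckle>, or yawning <yawn>!"
    "我也被训练去理解并生成一些副语言表达，比如叹气<叹气>、轻笑<轻笑>，或者打哈欠<哈欠>!",
]
chosen_voice = "tara"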
However, none of these tags triggered the expected audio events.
Both generation methods I tested failed to produce the intended expressions.
Interestingly, I was able to generate expressive speech successfully using orpheus-3b-0.1-ft, but the same approach does not work on zh-ft-model.
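To narrow this down, one experiment is to feed the same sentence with different tag styles and compare the audio. The sketch below only builds the prompt variants; it does not call any model. The English tag names (`<sigh>`, `<chuckle>`, `<yawn>`) and the Chinese-to-English mapping are assumptions made for illustration, and whether zh-ft-model recognizes either form is exactly the open question.

```python
import re

# Hypothetical mapping from the Chinese tags used above to English-style tags.
# Assumption: these English names are illustrative, not confirmed for zh-ft-model.
ZH_TO_EN_TAGS = {
    "叹气": "sigh",
    "轻笑": "chuckle",
    "哈欠": "yawn",
}

# Matches any <tag> written with angle brackets.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def swap_tags(prompt, mapping):
    """Replace each <tag> whose name is in `mapping`; leave other tags untouched."""
    def _replace(match):
        name = match.group(1)
        return "<" + mapping.get(name, name) + ">"
    return TAG_PATTERN.sub(_replace, prompt)


def strip_tags(prompt):
    """Remove all <tag> markers to get a tag-free baseline prompt."""
    return TAG_PATTERN.sub("", prompt)


prompt = "比如叹气<叹气>、轻笑<轻笑>，或者打哈欠<哈欠>!"
variants = {
    "zh_tags": prompt,                             # original Chinese tags
    "en_tags": swap_tags(prompt, ZH_TO_EN_TAGS),   # English-style tags
    "no_tags": strip_tags(prompt),                 # baseline without tags
}
for name, text in variants.items():
    print(name, text)
```

If the `en_tags` variant produces audible events while `zh_tags` does not, that would suggest the fine-tune kept the English tag tokens rather than learning Chinese ones.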
So my question is: What is the correct way to prompt zh-ft-model to produce emotive or paralinguistic events?
Does this model support such tags, and if so, is there a specific syntax or configuration needed?
Thanks!

CharlesNi changed discussion status to closed Nov 27, 2025
