How do I use emotive tags with zh-ft-model?

#3
by CharlesNi - opened

I tried using zh-ft-model to generate speech with emotive or paralinguistic tags, but it seems the model cannot produce events such as laughter, sighs, or other expressive cues.
For example, I used prompts like:
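prompts = [
    # "Hi, my name is Tara, and I'm a speech generation model that can mimic human voices."
    "嗨,我叫塔拉,我是一个能够模仿人声的语音生成模型。",
    # "I've also been trained to understand and generate paralinguistic expressions,
    # such as sighing <sigh>, chuckling <chuckle>, or yawning <yawn>!"
    "我也被训练去理解并生成一些副语言表达,比如叹气<叹气>、轻笑<轻笑>,或者打哈欠<哈欠>!",
]
chosen_voice = "tara"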
However, none of these tags triggered the expected audio events.
Both generation methods I tested failed to produce the intended expressions.
Interestingly, I was able to generate expressive speech successfully using orpheus-3b-0.1-ft, but the same approach does not work on zh-ft-model.
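One way to narrow this down might be to run the same sentence with the English tag names that orpheus-3b-0.1-ft accepts (`<sigh>`, `<chuckle>`, `<yawn>`) in place of the Chinese ones. The sketch below only rewrites the prompt text. The Chinese-to-English mapping and the `swap_tags` helper are assumptions made for illustration, and nothing here confirms that zh-ft-model responds to either form.

```python
import re

# Hypothetical mapping from the Chinese tags used above to the English tag
# names that orpheus-3b-0.1-ft accepts. Whether zh-ft-model honours either
# form is exactly the open question.
ZH_TO_EN_TAGS = {"叹气": "sigh", "轻笑": "chuckle", "哈欠": "yawn"}


def swap_tags(text, mapping=ZH_TO_EN_TAGS):
    """Replace each <tag> with its mapped name; unknown tags are left unchanged."""
    def repl(match):
        name = match.group(1)
        return f"<{mapping.get(name, name)}>"

    # Match anything between angle brackets that does not itself contain brackets.
    return re.sub(r"<([^<>]+)>", repl, text)


prompt = "比如叹气<叹气>、轻笑<轻笑>,或者打哈欠<哈欠>!"
english_tag_prompt = swap_tags(prompt)
print(english_tag_prompt)
```

Feeding both `prompt` and `english_tag_prompt` through the same generation call would at least show whether the tag vocabulary is the problem.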
So my question is: What is the correct way to prompt zh-ft-model to produce emotive or paralinguistic events?
Does this model support such tags, and if so, is there a specific syntax or configuration needed?
Thanks!

CharlesNi changed discussion status to closed
