SpeechT5 TTS: GGUF (ggml F16)
GGUF / ggml conversion of microsoft/speecht5_tts for use with CrispStrobe/CrispASR.
SpeechT5 is a lightweight (~144M-parameter) encoder-decoder TTS model:
- Text encoder: 12-layer transformer (768d) with relative positional encoding
- Speech decoder: 6-layer AR decoder generating continuous mel frames (no codebook tokens)
- Postnet: 5-layer Conv1d + BatchNorm + Tanh residual stack
- HiFi-GAN vocoder: four 4x upsampling stages (rates [4,4,4,4], 256x in total) with MRF resblocks, producing 16 kHz PCM
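As a sanity check on the vocoder numbers: four 4x stages expand each mel frame into 256 output samples, which is 16 ms of audio at 16 kHz. The sketch below only does that arithmetic. It assumes the standard HiFi-GAN relation (samples per frame = product of the upsample rates), and the helper names are hypothetical, not part of CrispASR.

```python
from math import prod

UPSAMPLE_RATES = [4, 4, 4, 4]  # HiFi-GAN upsampling stages, from the list above
SAMPLE_RATE = 16_000           # output sample rate in Hz

# Assumption: each mel frame expands to prod(UPSAMPLE_RATES) samples.
HOP = prod(UPSAMPLE_RATES)     # 4 * 4 * 4 * 4 = 256 samples per frame


def frames_to_seconds(n_frames: int) -> float:
    """Duration of audio the vocoder produces from n_frames mel frames."""
    return n_frames * HOP / SAMPLE_RATE


def seconds_to_frames(seconds: float) -> int:
    """Mel frames needed to cover `seconds` of audio (rounded up)."""
    n_samples = round(seconds * SAMPLE_RATE)
    return -(-n_samples // HOP)  # ceiling division


print(HOP)                       # 256
print(HOP * 1000 / SAMPLE_RATE)  # 16.0 (ms of audio per frame)
print(frames_to_seconds(100))    # 1.6
```

So one second of speech corresponds to 63 mel frames (62.5, rounded up).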
Speaker conditioning via 512-d x-vector (e.g. from Matthijs/cmu-arctic-xvectors). Deterministic output (greedy decoding, no sampling).
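The deterministic decoding described above can be illustrated with a toy autoregressive loop. This is a simplified sketch, not CrispASR's implementation. `generate_mel` and the `step` callback are hypothetical. The 80 mel bins, the all-zero start frame and the 0.5 stop threshold are assumptions that match common SpeechT5 settings. The real decoder may also emit more than one frame per step, which this sketch ignores.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]
# Hypothetical decoder step (not a CrispASR API): given all frames so far,
# return the next mel frame and the probability that speech has ended.
StepFn = Callable[[Sequence[Frame]], Tuple[Frame, float]]


def generate_mel(step: StepFn, n_mels: int = 80, threshold: float = 0.5,
                 max_frames: int = 1000) -> List[Frame]:
    """Toy autoregressive loop over continuous mel frames.

    There is no sampling: each predicted frame is fed back unchanged, so the
    same `step` function always yields the same output. Generation stops once
    the stop probability exceeds `threshold` or `max_frames` is reached.
    """
    history: List[Frame] = [[0.0] * n_mels]  # all-zero start frame (assumption)
    mel: List[Frame] = []
    for _ in range(max_frames):
        frame, stop_prob = step(history)
        mel.append(frame)
        history.append(frame)
        if stop_prob > threshold:
            break
    return mel


def toy_step(history: Sequence[Frame]) -> Tuple[Frame, float]:
    """Stand-in for the real decoder: signals a stop on the fifth frame."""
    t = len(history)
    return [float(t)] * 80, (1.0 if t >= 5 else 0.0)


mel = generate_mel(toy_step)
print(len(mel))                       # 5
print(generate_mel(toy_step) == mel)  # True
```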
Released under the MIT license.
Files
| File | Language | Size | Notes |
|---|---|---|---|
| speecht5-tts-f16.gguf | English | 301 MB | encoder + decoder + postnet + HiFi-GAN vocoder |
| speecht5-german-f16.gguf | German | 300 MB | German fine-tune, same architecture |
| speaker.bin | n/a | 2 KB | Default 512-d x-vector for speaker conditioning |
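The format of speaker.bin is not documented here. Its ~2 KB size is consistent with 512 raw float32 values (512 x 4 = 2048 bytes), so a minimal sketch might assume a headerless little-endian float32 array. `write_xvector` and `read_xvector` are hypothetical helpers. Compare against the shipped speaker.bin before relying on this layout.

```python
import struct
from typing import List, Sequence

DIM = 512  # x-vector dimensionality
# Assumption: speaker.bin holds DIM headerless little-endian float32 values.
FMT = f"<{DIM}f"


def write_xvector(path: str, vec: Sequence[float]) -> None:
    """Write a 512-d speaker embedding as raw float32 (2048 bytes)."""
    if len(vec) != DIM:
        raise ValueError(f"expected {DIM} values, got {len(vec)}")
    with open(path, "wb") as f:
        f.write(struct.pack(FMT, *vec))


def read_xvector(path: str) -> List[float]:
    """Read a raw float32 speaker embedding, checking its size first."""
    with open(path, "rb") as f:
        data = f.read()
    expected = struct.calcsize(FMT)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return list(struct.unpack(FMT, data))
```

Under the same assumption, a row of the Matthijs/cmu-arctic-xvectors dataset (which stores each embedding in an `xvector` column) could be saved with `write_xvector("my_speaker.bin", row["xvector"])`.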
Quick start
# 1. Build CrispASR
git clone https://github.com/CrispStrobe/CrispASR
cd CrispASR
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --target crispasr-cli
# 2. Download model + speaker
huggingface-cli download cstr/speecht5-tts-GGUF speecht5-tts-f16.gguf speaker.bin --local-dir .
# 3. Synthesize
./build/bin/crispasr --backend speecht5 -m speecht5-tts-f16.gguf \
--voice speaker.bin \
--tts "Hello, how are you today?" \
--tts-output hello.wav
Or with auto-download:
./build/bin/crispasr -m speecht5 --auto-download \
--tts "The quick brown fox jumps over the lazy dog." \
--tts-output fox.wav
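To script synthesis from Python, one option is to call the CLI with the same flags as the quick start. Passing an argument list rather than a shell string avoids quoting problems with punctuation in the text. The helper names are hypothetical. `describe_wav` assumes the CLI writes integer PCM WAV, because Python's `wave` module rejects IEEE-float WAV files.

```python
import subprocess
import wave
from typing import Dict, List, Optional


def tts_command(text: str, out_wav: str,
                model: str = "speecht5-tts-f16.gguf",
                voice: Optional[str] = "speaker.bin",
                binary: str = "./build/bin/crispasr") -> List[str]:
    """Build the quick-start command line as an argument list."""
    cmd = [binary, "--backend", "speecht5", "-m", model]
    if voice is not None:
        cmd += ["--voice", voice]
    cmd += ["--tts", text, "--tts-output", out_wav]
    return cmd


def synthesize(text: str, out_wav: str, **kwargs) -> None:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(tts_command(text, out_wav, **kwargs), check=True)


def describe_wav(path: str) -> Dict[str, float]:
    """Basic properties of an integer-PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "seconds": w.getnframes() / rate,
        }
```

For example, `synthesize("Hello, how are you today?", "hello.wav")` followed by `describe_wav("hello.wav")` would be expected to report a 16 kHz sample rate, matching the vocoder's output rate.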
Python binding
from crispasr import Session
sess = Session("speecht5-tts-f16.gguf")
sess.set_voice("speaker.bin")
pcm = sess.synthesize("Hello world.")
sess.write_wav("hello.wav", pcm)
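To post-process the audio in Python instead of calling `write_wav`, a stdlib-only writer is enough. This assumes `synthesize` returns a flat sequence of float samples in [-1, 1] at 16 kHz, which is not documented here. `write_pcm16_wav` is a hypothetical helper.

```python
import array
import sys
import wave
from typing import Sequence


def write_pcm16_wav(path: str, pcm: Sequence[float],
                    sample_rate: int = 16_000) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    Assumes `pcm` is a flat sequence of floats, e.g. the value returned by
    Session.synthesize (an unverified assumption about the binding).
    """
    # Clip to [-1, 1], then scale to the signed 16-bit range.
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in pcm))
    if sys.byteorder == "big":
        ints.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(ints.tobytes())
```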
Architecture details
See docs/architecture.md#speecht5 for the full architecture breakdown.
Conversion
Converted with models/convert-speecht5-tts-to-gguf.py from the CrispASR repo. The HiFi-GAN vocoder weights are from microsoft/speecht5_hifigan and are embedded in the same GGUF file.
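A quick way to check that a converted file is a well-formed GGUF is to read its fixed-size header: the `GGUF` magic, a uint32 version, then uint64 tensor and metadata key/value counts (the little-endian layout used by GGUF v2 and later). This sketch only reads the header. To list tensor names, for instance to confirm that the HiFi-GAN weights are present, the `gguf` Python package from llama.cpp provides a `GGUFReader` class.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the fixed GGUF header (v2+ layout, little-endian).

    Layout: 4-byte magic b"GGUF", uint32 version, uint64 tensor count,
    uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, n_tensors, n_kv)
```

For example, `read_gguf_header("speecht5-tts-f16.gguf")` returns the version and the tensor and key/value counts without loading any weights.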