# Supertonic TTS Quantization for QCS6490

A step-by-step guide to quantizing the Supertonic TTS model for the Qualcomm QCS6490 using QAIRT/QNN.
## Sample Output

Audio generated on the QCS6490 board using the quantized models (10 diffusion steps, noise-reduced).
## Requirements
- QAIRT/QNN SDK v2.37
- Python 3.8+
- Target device: QCS6490
## Pipeline Architecture

```
                 text + style
                      │
          ┌───────────┴───────────┐
          │                       │
  duration_predictor         text_encoder
          │                       │
   duration (scalar)     text_emb (1,128,256)
          │                       │
  latent_mask (1,1,256)           │
          └───────────┬───────────┘
                      │
     vector_estimator (10 diffusion steps)
                      │
              denoised_latent
                      │
                   vocoder
                      │
               audio (44.1kHz)
```
The `duration_predictor` outputs a single scalar representing the total speech duration. This is post-processed into a `latent_mask` that tells the `vector_estimator` how many of the 256 fixed-size latent frames are active speech versus padding.
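The duration-to-mask post-processing can be sketched as follows. This is a minimal illustration: the frame rate (`FRAMES_PER_SECOND`) and rounding behavior are assumptions, and the actual values live in the preparation notebook.

```python
import numpy as np

# Hypothetical constants -- the real values are defined in the
# Input_Preparation notebook, not here.
MAX_FRAMES = 256          # fixed latent length expected by vector_estimator
FRAMES_PER_SECOND = 21.5  # assumed latent frame rate

def make_latent_mask(duration_s: float) -> np.ndarray:
    """Turn the scalar duration into a (1, 1, 256) binary mask:
    leading frames are active speech, trailing frames are padding."""
    n_active = min(MAX_FRAMES, int(round(duration_s * FRAMES_PER_SECOND)))
    mask = np.zeros((1, 1, MAX_FRAMES), dtype=np.float32)
    mask[:, :, :n_active] = 1.0
    return mask

mask = make_latent_mask(3.0)  # ~3 seconds of speech
```

The `vector_estimator` then only denoises the frames where the mask is 1 into meaningful speech latents; the rest remain padding.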
## Workflow

1. **Input Preparation** (`Input_Preparation.ipynb`): Prepare calibration inputs for model quantization.
2. **Step-by-Step Quantization** (`Supertonic_TTS_StepbyStep.ipynb`): Convert the ONNX models to QNN format with quantization for the HTP backend.
3. **Correlation Verification** (`Correlation_Verification.ipynb`): Verify quantized model outputs against the float reference using cosine similarity.
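The verification step boils down to a cosine-similarity comparison between each quantized model's output buffer and the float reference. A minimal sketch (the real comparison, including file loading, lives in `Correlation_Verification.ipynb`):

```python
import numpy as np

def cosine_similarity(ref: np.ndarray, quant: np.ndarray) -> float:
    """Cosine similarity between two flattened output tensors."""
    ref = ref.ravel().astype(np.float64)
    quant = quant.ravel().astype(np.float64)
    denom = np.linalg.norm(ref) * np.linalg.norm(quant)
    return float(np.dot(ref, quant) / denom) if denom else 0.0

# Synthetic example: a reference tensor plus small "quantization noise".
rng = np.random.default_rng(0)
ref = rng.standard_normal((1, 128, 256)).astype(np.float32)
quant = ref + 0.01 * rng.standard_normal(ref.shape).astype(np.float32)
score = cosine_similarity(ref, quant)  # close to 1.0 for a good quantization
```

Scores near 1.0 indicate the quantized graph tracks the float model; a threshold (e.g. > 0.99) is a reasonable, though here assumed, acceptance criterion.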
## Project Structure

```
├── Input_Preparation.ipynb          # Prepare calibration inputs
├── Supertonic_TTS_StepbyStep.ipynb  # ONNX → QNN quantization guide
├── Correlation_Verification.ipynb   # Output verification
├── assets/                          # ONNX models (git submodule)
│   └── onnx/
│       ├── text_encoder.onnx
│       ├── duration_predictor.onnx
│       ├── vector_estimator.onnx
│       └── vocoder.onnx
├── QNN_Models/                      # Quantized QNN models (.bin, .cpp)
├── QNN_Model_lib/                   # QNN runtime libraries (aarch64)
├── qnn_calibration/                 # Calibration data for verification
├── inputs/                          # Prepared input data
└── board_output/                    # Inference outputs from board
```
## Models

| Model | Description |
|---|---|
| `text_encoder` | Encodes text tokens with style embedding |
| `duration_predictor` | Predicts phoneme durations |
| `vector_estimator` | Diffusion-based latent generator (10 steps) |
| `vocoder` | Converts latent to audio waveform |
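The `vector_estimator`'s 10 diffusion steps amount to a fixed-step integration loop over the latent. The sketch below shows the control flow with a toy callable standing in for the ONNX/QNN graph; the stand-in estimator, step schedule, and Euler update rule are all assumptions for illustration, not the model's actual solver.

```python
import numpy as np

NUM_STEPS = 10  # matches the 10 diffusion steps used on the board

def toy_estimator(latent, text_emb, t):
    # Hypothetical stand-in for the vector_estimator graph:
    # a velocity field pushing the latent toward the text embedding.
    return text_emb - latent

def run_diffusion(text_emb, latent_mask, estimator=toy_estimator):
    rng = np.random.default_rng(0)
    latent = rng.standard_normal(text_emb.shape).astype(np.float32)
    dt = 1.0 / NUM_STEPS
    for step in range(NUM_STEPS):
        t = step * dt
        v = estimator(latent, text_emb, t)  # one estimator call per step
        latent = latent + dt * v            # fixed-step Euler update
    return latent * latent_mask             # zero out padding frames

text_emb = np.ones((1, 128, 256), dtype=np.float32)
mask = np.zeros((1, 1, 256), dtype=np.float32)
mask[..., :64] = 1.0
denoised = run_diffusion(text_emb, mask)
```

Because the step count is fixed at 10, the quantized estimator is simply invoked 10 times per utterance, with the same graph and I/O shapes each time.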
### ONNX Models (Source)

Located in `assets/onnx/` (git submodule from Hugging Face):

- `text_encoder.onnx`
- `duration_predictor.onnx`
- `vector_estimator.onnx`
- `vocoder.onnx`
### QNN Models (Quantized)

Located in `QNN_Models/`:

- `text_encoder_htp.bin` / `.cpp`
- `vector_estimator_htp.bin` / `.cpp`
- `vocoder_htp.bin` / `.cpp`
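The `.bin`/`.cpp` pairs are produced by the QNN SDK converter and then compiled into model libraries. A command sketch for one model (flag spellings follow the QNN SDK tool documentation; the input-list path is an assumption — verify both against your installed SDK v2.37 and the step-by-step notebook):

```shell
# 1. Convert + quantize: the calibration input list drives quantization;
#    this emits text_encoder_htp.cpp and text_encoder_htp.bin.
qnn-onnx-converter \
    --input_network assets/onnx/text_encoder.onnx \
    --input_list inputs/text_encoder_input_list.txt \
    --output_path QNN_Models/text_encoder_htp.cpp

# 2. Compile the generated sources into a deployable model library (.so)
#    for the board's aarch64 toolchain.
qnn-model-lib-generator \
    -c QNN_Models/text_encoder_htp.cpp \
    -b QNN_Models/text_encoder_htp.bin \
    -t aarch64-oe-linux-gcc11.2 \
    -o QNN_Model_lib
```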
### Compiled Libraries (Ready for Deployment)

Located in `QNN_Model_lib/aarch64-oe-linux-gcc11.2/`:

- `libtext_encoder_htp.so`
- `libvector_estimator_htp.so`
- `libvocoder_htp.so`
- `libduration_predictor_htp.so`

These `.so` files are compiled from the `.cpp` sources and are ready to be deployed (via SCP) to the board for inference.
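Deployment and a smoke-test run can be sketched as below. The board IP, directory layout, and input-list path are placeholders; `qnn-net-run` and its options come from the QNN SDK tool documentation — confirm against your SDK version:

```shell
# Copy a compiled model library and its input list to the board
scp QNN_Model_lib/aarch64-oe-linux-gcc11.2/libtext_encoder_htp.so root@BOARD_IP:/data/tts/
scp inputs/text_encoder_input_list.txt root@BOARD_IP:/data/tts/

# On the board: run the model through the HTP backend and collect raw outputs
qnn-net-run \
    --backend libQnnHtp.so \
    --model /data/tts/libtext_encoder_htp.so \
    --input_list /data/tts/text_encoder_input_list.txt \
    --output_dir /data/tts/board_output
```

The raw output buffers written to `board_output/` are what the correlation notebook compares against the float reference.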
**Note:** The `duration_predictor` is quantized and compiled but not used in the current calibration-based workflow, since `latent_mask` is precomputed. For an end-to-end pipeline with arbitrary text input, the duration predictor must run first to generate the `latent_mask` dynamically.
## Getting Started

Clone with submodules:

```
git clone --recurse-submodules https://github.com/dev-ansh-r/Supertonic-TTS-QCS6490
```

Then follow the notebooks in order:

1. `Input_Preparation.ipynb`
2. `Supertonic_TTS_StepbyStep.ipynb`
3. `Correlation_Verification.ipynb`
## Note

An inference script and sample application are not yet provided; optimization work is ongoing and will be released soon.
## License

This model inherits its licensing from Supertone/supertonic-2:

- Model: OpenRAIL-M License
- Code: MIT License

Copyright (c) 2026 Supertone Inc. (original model)
## Model Tree

`dev-ansh-r/qualcomm-Supertonic-TTS-QCS6490` is derived from the base model `Supertone/supertonic-2`.