Supertonic TTS Quantization for QCS6490

A step-by-step guide to quantizing the Supertonic TTS model for the Qualcomm QCS6490 using QAIRT/QNN.

Sample Output

Audio generated on a QCS6490 board using the quantized models (10 diffusion steps, noise-reduced).

Requirements

  • QAIRT/QNN SDK v2.37
  • Python 3.8+
  • Target device: QCS6490

Pipeline Architecture

                text + style
                     │
         ┌───────────┴───────────┐
         │                       │
  duration_predictor        text_encoder
         │                       │
    duration (scalar)       text_emb (1,128,256)
         │                       │
   latent_mask (1,1,256)         │
         └───────────┬───────────┘
                     │
              vector_estimator (10 diffusion steps)
                     │
               denoised_latent
                     │
                  vocoder
                     │
              audio (44.1kHz)

The duration_predictor outputs a single scalar representing the total speech duration. This scalar is post-processed into a latent_mask of shape (1,1,256) that tells the vector_estimator which of the 256 fixed-size latent frames hold active speech and which are padding.
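As a rough illustration of that post-processing, the sketch below turns a predicted duration into a (1, 1, 256) mask. It assumes the scalar is a duration in seconds. The `frames_per_second` parameter is hypothetical, since the conversion from duration to latent frames is not stated here. Plain Python lists keep the example dependency-free; the notebooks may well use NumPy instead.

```python
import math

MAX_FRAMES = 256  # fixed latent length, taken from the (1,1,256) mask shape


def make_latent_mask(duration_s, frames_per_second, max_frames=MAX_FRAMES):
    """Return a nested list of shape (1, 1, max_frames): 1.0 = speech, 0.0 = padding.

    frames_per_second is a hypothetical conversion factor; substitute the
    value that matches the model's actual latent frame rate.
    """
    if duration_s < 0 or frames_per_second <= 0:
        raise ValueError("duration must be >= 0 and frame rate must be > 0")
    # Round up so a final partial frame still counts as speech,
    # then clamp to the fixed latent length.
    active = min(max_frames, math.ceil(duration_s * frames_per_second))
    row = [1.0] * active + [0.0] * (max_frames - active)
    return [[row]]


mask = make_latent_mask(duration_s=2.5, frames_per_second=40.0)
print(len(mask), len(mask[0]), len(mask[0][0]))  # shape: 1 1 256
print(int(sum(mask[0][0])))                      # number of active frames
```

Durations that would exceed 256 frames are clamped, so the mask is then all ones; whether the real pipeline truncates or rejects such inputs is not specified here.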

Workflow

1. Input Preparation

Prepare calibration inputs for model quantization.

Input_Preparation.ipynb
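For orientation, the QNN converters read calibration data as headerless raw tensor files referenced from a plain-text input list. The sketch below writes such files with the standard library only. The tensor names (`text_emb`, `latent_mask`), the file names, and the `name:=path` line layout are assumptions for illustration; check the notebook and the SDK documentation for the exact layout your converter version expects.

```python
import os
import struct
import tempfile


def write_raw_float32(path, values):
    """Write a flat sequence of floats as little-endian float32 with no header."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%df" % len(values), *values))


def write_input_list(list_path, samples):
    """Write one line per calibration sample: 'name:=file name:=file ...'."""
    with open(list_path, "w") as f:
        for sample in samples:
            pairs = ("%s:=%s" % (name, path) for name, path in sample.items())
            f.write(" ".join(pairs) + "\n")


out_dir = tempfile.mkdtemp(prefix="calib_")

# Placeholder tensors with the shapes from the pipeline diagram; real
# calibration data should come from running the reference ONNX models.
text_emb = [0.0] * (1 * 128 * 256)
latent_mask = [1.0] * 100 + [0.0] * 156

sample = {
    "text_emb": os.path.join(out_dir, "text_emb_0.raw"),        # hypothetical tensor name
    "latent_mask": os.path.join(out_dir, "latent_mask_0.raw"),  # hypothetical tensor name
}
write_raw_float32(sample["text_emb"], text_emb)
write_raw_float32(sample["latent_mask"], latent_mask)

input_list_path = os.path.join(out_dir, "input_list.txt")
write_input_list(input_list_path, [sample])
```

Each float32 value occupies 4 bytes, so a file's size should equal the tensor's element count times 4, which is a quick sanity check before starting quantization.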

2. Step-by-Step Quantization

Convert the ONNX models to QNN format, quantized for the HTP backend.

Supertonic_TTS_StepbyStep.ipynb
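As a hedged outline of what this step involves, the sketch below assembles command lines for the SDK tools `qnn-onnx-converter` and `qnn-model-lib-generator` without running them. The calibration input-list locations are hypothetical, quantization-specific options are deliberately left out, and every flag should be confirmed against the QAIRT v2.37 documentation before use.

```python
MODELS = ["text_encoder", "duration_predictor", "vector_estimator", "vocoder"]
TARGET = "aarch64-oe-linux-gcc11.2"  # toolchain name used under QNN_Model_lib/


def converter_cmd(name, onnx_dir="assets/onnx", out_dir="QNN_Models",
                  calib_dir="inputs"):
    """Build the ONNX -> QNN conversion command for one model.

    Supplying an input list is what asks the converter to quantize
    using the calibration samples.
    """
    return [
        "qnn-onnx-converter",
        "--input_network", f"{onnx_dir}/{name}.onnx",
        "--output_path", f"{out_dir}/{name}_htp.cpp",
        # Hypothetical location of this model's calibration input list.
        "--input_list", f"{calib_dir}/{name}/input_list.txt",
    ]


def lib_generator_cmd(name, model_dir="QNN_Models", lib_dir="QNN_Model_lib"):
    """Build the command that compiles the generated .cpp/.bin into a .so."""
    return [
        "qnn-model-lib-generator",
        "-c", f"{model_dir}/{name}_htp.cpp",
        "-b", f"{model_dir}/{name}_htp.bin",
        "-t", TARGET,
        "-o", lib_dir,
    ]


commands = []
for model in MODELS:
    commands.append(converter_cmd(model))
    commands.append(lib_generator_cmd(model))

for cmd in commands:
    print(" ".join(cmd))
```

To execute the commands rather than print them, each list can be passed to `subprocess.run(cmd, check=True)` from a shell in which the SDK environment has already been set up.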

3. Correlation Verification

Verify the quantized model outputs against the reference outputs using cosine similarity.

Correlation_Verification.ipynb
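For reference, cosine similarity between two flattened output tensors can be computed as in the minimal sketch below. The sample values are made up for illustration, and returning 0.0 for an all-zero vector is a convention chosen here, not necessarily what the notebook does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; 0.0 is an arbitrary convention
    return dot / (norm_a * norm_b)


# Illustrative values only: a reference output and a slightly perturbed
# "quantized" output.
reference = [0.10, -0.25, 0.40, 0.05]
quantized = [0.11, -0.24, 0.38, 0.05]

score = cosine_similarity(reference, quantized)
print(f"cosine similarity: {score:.4f}")
```

A score close to 1.0 means the quantized output points in nearly the same direction as the reference. Cosine similarity ignores overall scale, so a uniform gain error would go undetected and may need a separate check.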

Project Structure

├── Input_Preparation.ipynb         # Prepare calibration inputs
├── Supertonic_TTS_StepbyStep.ipynb # ONNX → QNN quantization guide
├── Correlation_Verification.ipynb  # Output verification
├── assets/                         # ONNX models (git submodule)
│   └── onnx/
│       ├── text_encoder.onnx
│       ├── duration_predictor.onnx
│       ├── vector_estimator.onnx
│       └── vocoder.onnx
├── QNN_Models/                     # Quantized QNN models (.bin, .cpp)
├── QNN_Model_lib/                  # QNN runtime libraries (aarch64)
├── qnn_calibration/                # Calibration data for verification
├── inputs/                         # Prepared input data
└── board_output/                   # Inference outputs from board

Models

Model               Description
text_encoder        Encodes text tokens with style embedding
duration_predictor  Predicts the total speech duration (a single scalar)
vector_estimator    Diffusion-based latent generator (10 steps)
vocoder             Converts latent to audio waveform

ONNX Models (Source)

Located in assets/onnx/ (git submodule from Hugging Face):

  • text_encoder.onnx
  • duration_predictor.onnx
  • vector_estimator.onnx
  • vocoder.onnx

QNN Models (Quantized)

Located in QNN_Models/:

  • text_encoder_htp.bin / .cpp
  • vector_estimator_htp.bin / .cpp
  • vocoder_htp.bin / .cpp

Compiled Libraries (Ready for Deployment)

Located in QNN_Model_lib/aarch64-oe-linux-gcc11.2/:

  • libtext_encoder_htp.so
  • libvector_estimator_htp.so
  • libvocoder_htp.so
  • libduration_predictor_htp.so

These .so files are compiled from the .cpp sources and are ready to be deployed (via SCP) to the board for inference.

Note: The duration_predictor is quantized and compiled but not used in the current calibration-based workflow since latent_mask is precomputed. For an end-to-end pipeline with arbitrary text input, the duration predictor must run first to dynamically generate the latent_mask.

Getting Started

  1. Clone with submodules:

    git clone --recurse-submodules https://github.com/dev-ansh-r/Supertonic-TTS-QCS6490
    
  2. Follow the notebooks in order:

    • Input_Preparation.ipynb
    • Supertonic_TTS_StepbyStep.ipynb
    • Correlation_Verification.ipynb

Note

An inference script and sample application are not yet provided. Optimization work is ongoing, and both will be released soon.

License

This model inherits the licensing from Supertone/supertonic-2:

  • Model: OpenRAIL-M License
  • Code: MIT License

Copyright (c) 2026 Supertone Inc. (original model)
