Indic Speaker Embedding Model (Fine-tuned)

Fine-tuned speaker embedding model for Indian languages, based on pyannote wespeaker-voxceleb-resnet34-LM.

Model Description

This model was fine-tuned and evaluated using 112K+ audio samples (112,219 in total across the splits listed under Training Details) from:

IndicVoices: 22 Indian languages, massive speaker diversity
Kathbath: 12 Indian languages

Training Details

Base Model: pyannote/wespeaker-voxceleb-resnet34-LM
Embedding Dimension: 256
Training Samples: 84,741
Validation Samples: 17,161
Held-out for EER: 10,317
Total Speakers: 3,975 (training) + 442 (held-out)

Training Configuration

Phase 1: 5 epochs with frozen backbone (head only)
Phase 2: 15 epochs full fine-tuning
Augmentations: 13 types (noise, reverb, pitch shift, etc.)
Label smoothing: 0.1
Dropout: 0.3
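
The two-phase schedule above can be illustrated with a minimal PyTorch sketch. This is not the actual training script, which is not part of this card: the tiny `backbone` and `head` modules, the random data, the optimizer, and the learning rate are all hypothetical stand-ins. Only the epoch counts, the label smoothing value, and the dropout rate come from the configuration listed above.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy stand-ins for the real embedding backbone and speaker head.
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
head = nn.Sequential(nn.Dropout(p=0.3), nn.Linear(8, 4))
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

features = torch.randn(32, 16)         # fake input features
speakers = torch.randint(0, 4, (32,))  # fake speaker labels


def run_phase(epochs, train_backbone, lr=1e-2):
    """Train for `epochs` steps, optionally keeping the backbone frozen."""
    for p in backbone.parameters():
        p.requires_grad_(train_backbone)
    all_params = list(backbone.parameters()) + list(head.parameters())
    optimizer = torch.optim.Adam([p for p in all_params if p.requires_grad], lr=lr)
    backbone.train(train_backbone)
    head.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(head(backbone(features)), speakers)
        loss.backward()
        optimizer.step()
    return loss.item()


before = [p.detach().clone() for p in backbone.parameters()]

run_phase(epochs=5, train_backbone=False)   # Phase 1: head only
frozen_ok = all(torch.equal(a, b) for a, b in zip(before, backbone.parameters()))

run_phase(epochs=15, train_backbone=True)   # Phase 2: full fine-tuning
changed = any(not torch.equal(a, b) for a, b in zip(before, backbone.parameters()))
```

Freezing the backbone first lets a newly initialized head settle before its gradients reach the pretrained weights, which is the usual motivation for this kind of schedule.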

Results

Metric	Value
Best Validation Accuracy	91.4%
Best EER (held-out set)	4.18%
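
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is one simple way to estimate it from a list of trial scores. It is an illustration only and not the evaluation code used for the number above; the function name and the toy scores are hypothetical.

```python
def compute_eer(scores, labels):
    """Estimate (eer, threshold) from similarity scores.

    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    """
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not impostor:
        raise ValueError("need both same-speaker and different-speaker trials")

    best = None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, threshold)
    return best[1], best[2]


# Toy trials: one same-speaker score (0.4) falls below one impostor score (0.6).
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
eer, threshold = compute_eer(toy_scores, toy_labels)
```

With a finite set of trials the two error rates rarely cross exactly, so the sketch reports the average of the two at the threshold where they are closest.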

Usage

import torch
from pyannote.audio import Model

# Load the base model
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

# Load the fine-tuned checkpoint onto the CPU
checkpoint = torch.load("checkpoint.pt", map_location="cpu")
# Note: this checkpoint includes a classification head for Indian languages,
# so its keys may not match the base model one-to-one.
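
Because the checkpoint carries a classification head that the base embedding model does not have, one possible way to apply the weights is to drop the head tensors before calling `load_state_dict`. The sketch below demonstrates the idea on a toy module. The `classifier.` key prefix and the flat checkpoint layout are assumptions, since the actual key names in `checkpoint.pt` are not documented here; inspect the checkpoint keys before adapting it.

```python
import torch
from torch import nn


def load_without_head(model, state_dict, head_prefix="classifier."):
    """Load every tensor except those under `head_prefix` (a hypothetical prefix)."""
    kept = {k: v for k, v in state_dict.items() if not k.startswith(head_prefix)}
    # strict=False reports mismatched keys instead of raising on them.
    result = model.load_state_dict(kept, strict=False)
    return kept, result.missing_keys, result.unexpected_keys


# Toy demonstration with a stand-in module instead of the real embedding model.
embedder = nn.Linear(4, 2)
fake_checkpoint = {
    "weight": torch.ones(2, 4),
    "bias": torch.zeros(2),
    "classifier.weight": torch.ones(3, 2),  # head tensor the embedder lacks
    "classifier.bias": torch.zeros(3),
}
kept, missing, unexpected = load_without_head(embedder, fake_checkpoint)
```

Checking that `missing` and `unexpected` are both empty is a quick way to confirm that the remaining keys line up with the target model.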

Intended Use

Speaker diarization for Indian language audio
Speaker verification/identification
Bengali speaker diarization (DLSPRINT challenge)
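
For the verification use case, two embeddings are commonly compared with cosine similarity and the score is checked against a threshold. The following is a minimal sketch using short plain-Python vectors in place of real 256-dimensional embeddings. The function names and the threshold of 0.5 are hypothetical; a real threshold should be tuned on held-out trials.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Accept the pair when the similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for speaker embeddings.
enrolled = [1.0, 2.0, 0.0]
test_match = [2.0, 4.0, 0.0]      # same direction as `enrolled`
test_mismatch = [0.0, 0.0, 3.0]   # orthogonal to `enrolled`

accept = same_speaker(enrolled, test_match)
reject = same_speaker(enrolled, test_mismatch)
```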
