Paraformer-zh-streaming

Streaming speech recognition: real-time transcription with low latency for Chinese.

A streaming variant of Paraformer that processes audio in chunks and returns results in real time as the audio arrives.

Quick Start

from funasr import AutoModel

model = AutoModel(
    model="funasr/paraformer-zh-streaming",
    hub="hf",
    device="cuda",
)

# Streaming inference with chunk processing
chunk_size = [0, 10, 5]  # [lookback, chunk, lookahead] in 60 ms frames
result = model.generate(
    input="audio.wav",
    chunk_size=chunk_size,
    encoder_chunk_look_back=4,
    decoder_chunk_look_back=1,
)
print(result[0]["text"])
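
The call above hands the model a whole file. For live audio, the input would instead be fed one chunk at a time. The sketch below covers only the chunking arithmetic in plain Python, so it runs without FunASR installed. `chunk_stride_samples` and `iter_chunks` are hypothetical helper names, not part of FunASR. The 60 ms frame length comes from the `chunk_size` comment above, and 16 kHz is assumed as the input sample rate. The commented-out `model.generate` call (with `cache` and `is_final`) follows the pattern in FunASR's own streaming examples, but treat it as an assumption and check it against your installed version.

```python
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE = 16_000  # Hz, assumed input sample rate
FRAME_MS = 60         # frame length implied by the chunk_size comment


def chunk_stride_samples(chunk_size: Sequence[int]) -> int:
    """Number of audio samples covered by one streaming chunk."""
    samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000  # 960 samples
    return chunk_size[1] * samples_per_frame


def iter_chunks(
    samples: Sequence[float], chunk_size: Sequence[int]
) -> Iterator[Tuple[Sequence[float], bool]]:
    """Yield (chunk, is_final) pairs that cover `samples` in order."""
    stride = chunk_stride_samples(chunk_size)
    total = max(1, -(-len(samples) // stride))  # ceiling division, at least one chunk
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


if __name__ == "__main__":
    audio = [0.0] * 24_000  # 1.5 s of silence standing in for real audio
    for chunk, is_final in iter_chunks(audio, [0, 10, 5]):
        print(len(chunk), is_final)
    # With a loaded model, each chunk would be passed on roughly like this
    # (assumed usage; `cache` is created once, before the loop, and the chunk
    # would normally be a NumPy float array rather than a list):
    #   res = model.generate(input=chunk, cache=cache, is_final=is_final,
    #                        chunk_size=[0, 10, 5],
    #                        encoder_chunk_look_back=4,
    #                        decoder_chunk_look_back=1)
```

With `chunk_size = [0, 10, 5]`, each chunk spans 10 × 60 ms = 600 ms (9,600 samples at 16 kHz), so partial results can be expected roughly every 600 ms plus model compute time.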

Features

  • Real-time streaming recognition
  • Low-latency chunk-by-chunk processing
  • Chinese + English support
  • Compatible with WebSocket deployment

Model Details

Property       Value
Architecture   Paraformer-Streaming
Parameters     220M
Languages      Chinese, English
Mode           Streaming (online)
Sample Rate    16 kHz
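
Because the model is listed at a 16 kHz sample rate, it can help to check a recording's header before transcription. The following is a minimal pre-flight sketch using only Python's standard `wave` module. `check_wav` is a hypothetical helper name, and the mono requirement is an assumption rather than something this card states. FunASR may also resample input on its own, so a failed check here is a hint, not an error.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the sample rate listed for this model


def check_wav(path: str) -> dict:
    """Read a WAV header and report whether it matches the expected format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        # Mono input is an assumption, not stated on the model card.
        "ok": rate == EXPECTED_RATE and channels == 1,
    }
```

A file that fails the check can be converted with any audio tool before it is passed to `model.generate`.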

Links
