# TexTeller ONNX
ONNX export of OleehyO/TexTeller, an image-to-LaTeX model based on VisionEncoderDecoderModel.
This export is tuned for:
- transformers.js (browser / Node)
- WebGPU / ONNX Runtime Web
- KV-cache decoding (supports `decoder_with_past_model.onnx` with dynamic batch)
## Contents
Main files:
- `encoder_model.onnx`
  - Input: `pixel_values` of shape `[batch_size, 1, 448, 448]`
- `decoder_model.onnx` - Decoder without past (not required for KV-cache decoding)
- `decoder_with_past_model.onnx` - Decoder with KV cache
  - Key inputs:
    - `input_ids`: `[batch_size, decoder_sequence_length]`
    - `encoder_hidden_states`: `[batch_size, encoder_sequence_length, 768]`
    - `past_key_values.N.decoder.{key,value}`: `[batch_size, 16, past_decoder_sequence_length, 64]`
    - `past_key_values.N.encoder.{key,value}`: `[batch_size, 16, encoder_sequence_length, 64]`
Supporting files:
- `config.json` - model config
- `tokenizer.json` / `tokenizer_config.json` - tokenizer
- `preprocessor_config.json` - image preprocessing config
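For orientation, here is a minimal sketch of running the encoder on its own with ONNX Runtime Web. This is illustrative only (the transformers.js example below handles all of this for you), and the output name `last_hidden_state` is an assumption based on the usual Optimum export convention:

```js
import * as ort from "onnxruntime-web";

// Dummy all-zeros input; replace with real preprocessed pixels
// (see the Preprocessing section below).
const pixelValues = new Float32Array(1 * 1 * 448 * 448);

const session = await ort.InferenceSession.create("encoder_model.onnx");

const feeds = {
  pixel_values: new ort.Tensor("float32", pixelValues, [1, 1, 448, 448]),
};

// Output name assumed from the standard Optimum export convention.
const { last_hidden_state } = await session.run(feeds);
console.log(last_hidden_state.dims); // expected: [1, encoder_sequence_length, 768]
```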
## Preprocessing
preprocessor_config.json:
```json
{
  "do_resize": true,
  "size": { "height": 448, "width": 448 },
  "resample": 3,
  "do_normalize": true,
  "image_mean": [0.9545467],
  "image_std": [0.15394445],
  "do_convert_rgb": false,
  "num_channels": 1,
  "feature_extractor_type": "ViTFeatureExtractor"
}
```
Important:

- Input must be grayscale (`num_channels = 1`)
- Resize to 448 × 448
- Normalize (per channel): `x = (x / 255.0 - 0.9545467) / 0.15394445` (a minimal sketch follows this list)
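If you preprocess images yourself instead of relying on the processor, the normalization step amounts to the following. This is a hedged sketch: `toPixelValues` is a hypothetical helper, and it assumes the grayscale conversion and 448 × 448 resize have already been done.

```js
const IMAGE_MEAN = 0.9545467;
const IMAGE_STD = 0.15394445;

// `gray` is assumed to be a Uint8Array of 448 * 448 grayscale values (0-255)
// for a single, already-resized image.
function toPixelValues(gray) {
  // Flattened data for a [1, 1, 448, 448] tensor.
  const pixelValues = new Float32Array(gray.length);
  for (let i = 0; i < gray.length; ++i) {
    pixelValues[i] = (gray[i] / 255.0 - IMAGE_MEAN) / IMAGE_STD;
  }
  return pixelValues;
}
```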
If you use AutoProcessor / ImageProcessor in transformers / transformers.js with this repo, it will apply these settings automatically.
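For example, a sketch of the automatic path in transformers.js (the repo id is a placeholder, and the expected output shape follows the config above):

```js
import { AutoProcessor, RawImage } from "@huggingface/transformers";

const MODEL_ID = "your-username/texteller-onnx"; // placeholder repo id

const processor = await AutoProcessor.from_pretrained(MODEL_ID);
const image = await RawImage.read("path-or-url-to-image.png");

// Applies the resize and normalization settings from preprocessor_config.json.
const { pixel_values } = await processor(image);
console.log(pixel_values.dims); // expected: [1, 1, 448, 448]
```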
## transformers.js (browser / WebGPU) example
```js
import { pipeline } from "@huggingface/transformers";

// Replace with this repo id
const MODEL_ID = "your-username/texteller-onnx";

const captioner = await pipeline("image-to-text", MODEL_ID, {
  device: "webgpu", // or "wasm"
  dtype: "fp16",    // good default for WebGPU
});

// Any image source supported by transformers.js: URL, HTMLImageElement, etc.
const outputs = await captioner("path-or-url-to-image.png", {
  max_new_tokens: 128,
});

console.log(outputs[0]?.generated_text);
```
## Notes
- This ONNX export supports `decoder_with_past_model.onnx` with dynamic batch, so you can implement your own batched, KV-cached beam search on top of `model.forward` and `past_key_values` (see the sketch after this list).
- For simple use cases, `pipeline("image-to-text", ...)` as shown above is enough.