
TexTeller ONNX

ONNX export of OleehyO/TexTeller, an image-to-LaTeX model based on VisionEncoderDecoderModel.

This export is tuned for:

  • transformers.js (browser / Node)
  • WebGPU / ONNX Runtime Web
  • KV-cache decoding (supports decoder_with_past_model.onnx with dynamic batch)

Contents

Main files:

  • encoder_model.onnx

    • Input: pixel_values of shape [batch_size, 1, 448, 448]
  • decoder_model.onnx

    • Decoder without past key values; not needed if you decode with decoder_with_past_model.onnx
  • decoder_with_past_model.onnx

    • Decoder with KV cache
    • Key inputs (see the runtime sketch after this list):
      • input_ids: [batch_size, decoder_sequence_length]
      • encoder_hidden_states: [batch_size, encoder_sequence_length, 768]
      • past_key_values.N.decoder.{key,value}: [batch_size, 16, past_decoder_sequence_length, 64]
      • past_key_values.N.encoder.{key,value}: [batch_size, 16, encoder_sequence_length, 64]
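
To make these shapes concrete, here is a minimal sketch of feeding decoder_with_past_model.onnx directly with onnxruntime-web. numLayers and encLen are hypothetical placeholders (derive the real values from session.inputNames and the actual encoder output), and the zero-filled buffers only illustrate shapes, not a working decode:

import * as ort from "onnxruntime-web";

const batch = 1, numHeads = 16, headDim = 64; // from the shapes above
const numLayers = 12; // ASSUMPTION — count the past_key_values.N.* entries in session.inputNames
const encLen = 785;   // ASSUMPTION — encoder_sequence_length produced by encoder_model.onnx

const session = await ort.InferenceSession.create("decoder_with_past_model.onnx");

// In a real decode loop, encoder_hidden_states comes from encoder_model.onnx
// and the past tensors come from the previous step's present.* outputs.
const pastLen = 4; // number of tokens already decoded
const feeds = {
  input_ids: new ort.Tensor("int64", BigInt64Array.from([1n]), [batch, 1]),
  encoder_hidden_states: new ort.Tensor(
    "float32", new Float32Array(batch * encLen * 768), [batch, encLen, 768]),
};
for (let n = 0; n < numLayers; n++) {
  for (const kv of ["key", "value"]) {
    feeds[`past_key_values.${n}.decoder.${kv}`] = new ort.Tensor(
      "float32", new Float32Array(batch * numHeads * pastLen * headDim),
      [batch, numHeads, pastLen, headDim]);
    feeds[`past_key_values.${n}.encoder.${kv}`] = new ort.Tensor(
      "float32", new Float32Array(batch * numHeads * encLen * headDim),
      [batch, numHeads, encLen, headDim]);
  }
}

const out = await session.run(feeds);
// out.logits: [batch, 1, vocab_size]; feed out["present.N.decoder.*"] back
// as the next step's past_key_values.N.decoder.* inputs.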

Supporting files:

  • config.json – model config
  • tokenizer.json / tokenizer_config.json – tokenizer
  • preprocessor_config.json – image preprocessing config

Preprocessing

preprocessor_config.json:

{
  "do_resize": true,
  "size": { "height": 448, "width": 448 },
  "resample": 3,
  "do_normalize": true,
  "image_mean": [0.9545467],
  "image_std": [0.15394445],
  "do_convert_rgb": false,
  "num_channels": 1,
  "feature_extractor_type": "ViTFeatureExtractor"
}

Important:

  • Input must be grayscale (num_channels = 1)
  • Resize to 448 × 448 (resample 3 = PIL bicubic)
  • Normalize (per channel):

    x = (x / 255.0 - 0.9545467) / 0.15394445

If you load this repo with AutoProcessor / ImageProcessor in transformers or transformers.js, these settings are applied automatically.
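
If you preprocess by hand (e.g. for a custom onnxruntime-web pipeline), a browser-side sketch follows. Two assumptions: canvas drawImage resizing (not the bicubic resample from the config) and a standard luminance grayscale conversion, so the result is close to, but not bit-exact with, the reference processor:

function preprocess(img) {
  const size = 448;
  const canvas = document.createElement("canvas");
  canvas.width = canvas.height = size;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0, size, size);
  const { data } = ctx.getImageData(0, 0, size, size); // RGBA bytes

  const mean = 0.9545467, std = 0.15394445;
  const out = new Float32Array(size * size);
  for (let i = 0; i < size * size; i++) {
    // Luminance grayscale (an assumption; the reference processor's exact
    // conversion may differ slightly), then the normalization from above.
    const gray =
      (0.299 * data[4 * i] + 0.587 * data[4 * i + 1] + 0.114 * data[4 * i + 2]) / 255;
    out[i] = (gray - mean) / std;
  }
  return out; // wrap as a float32 tensor of shape [1, 1, 448, 448]
}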


transformers.js (browser / WebGPU) example

import { pipeline } from "@huggingface/transformers";

// This repo's id
const MODEL_ID = "Ji-Ha/TexTeller3-ONNX-dynamic";

const captioner = await pipeline("image-to-text", MODEL_ID, {
  device: "webgpu", // or "wasm"
  dtype: "fp16",    // good default for WebGPU
});

// Any image source supported by transformers.js: URL, HTMLImageElement, etc.
const outputs = await captioner("path-or-url-to-image.png", {
  max_new_tokens: 128,
});

console.log(outputs[0]?.generated_text);

Notes

  • This ONNX export supports decoder_with_past_model.onnx with dynamic batch, so you can implement your own batched, KV-cached beam search on top of model.forward and past_key_values; a lower-level starting point is sketched below.
  • For simple use cases, using pipeline("image-to-text", ...) as shown above is enough.
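
If you need more control than the pipeline (custom decoding loops, batching, logit processing), you can drive the processor, tokenizer, and model classes directly. A minimal sketch, assuming the standard transformers.js v3 vision-encoder-decoder API (verify class and method names against your installed version):

import {
  AutoProcessor,
  AutoTokenizer,
  VisionEncoderDecoderModel,
  RawImage,
} from "@huggingface/transformers";

const MODEL_ID = "Ji-Ha/TexTeller3-ONNX-dynamic";

const processor = await AutoProcessor.from_pretrained(MODEL_ID);
const tokenizer = await AutoTokenizer.from_pretrained(MODEL_ID);
const model = await VisionEncoderDecoderModel.from_pretrained(MODEL_ID, {
  device: "webgpu",
  dtype: "fp16",
});

// RawImage handles URLs, paths (Node), and blobs.
const image = await RawImage.read("path-or-url-to-image.png");
const { pixel_values } = await processor(image);

// generate() runs the encoder once, then decodes with the KV cache.
const output_ids = await model.generate({ pixel_values, max_new_tokens: 256 });
const [latex] = tokenizer.batch_decode(output_ids, { skip_special_tokens: true });
console.log(latex);

generate() handles the encoder pass and KV-cached decoding internally; for a custom beam search you would instead call model.forward step by step and manage past_key_values yourself.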