trl internal testing

company


AI & ML interests

Internal testing artifact management for the TRL library

Recent Activity


qgallouedec

in trl-internal-testing/tiny-Qwen3VLForConditionalGeneration 3 days ago

Upload Qwen3VLForConditionalGeneration

#3 opened 3 days ago by qgallouedec

Upload Qwen3VLForConditionalGeneration

#4 opened 3 days ago by qgallouedec

in trl-internal-testing/tiny-LlavaNextForConditionalGeneration 3 days ago

Upload LlavaNextForConditionalGeneration

#7 opened 3 days ago by qgallouedec

in trl-internal-testing/tiny-LlavaForConditionalGeneration 3 days ago

Upload LlavaForConditionalGeneration

#4 opened 3 days ago by qgallouedec

Upload LlavaForConditionalGeneration

#3 opened 3 days ago by qgallouedec

in trl-internal-testing/tiny-LlavaNextForConditionalGeneration 3 days ago

Upload LlavaNextForConditionalGeneration

#6 opened 3 days ago by qgallouedec

albertvillanova

in trl-internal-testing/tiny-Gemma4ForConditionalGeneration 4 days ago

Upload Gemma4ForConditionalGeneration

#6 opened 5 days ago by albertvillanova

qgallouedec

in trl-internal-testing/tiny-Cohere2ForCausalLM 4 days ago

Upload Cohere2ForCausalLM

#2 opened 13 days ago by qgallouedec

Upload tiny Cohere2ForCausalLM

#1 opened 24 days ago by qgallouedec

in trl-internal-testing/tiny-CohereForCausalLM 4 days ago

Upload tiny CohereForCausalLM

#1 opened 24 days ago by qgallouedec

Upload CohereForCausalLM

#2 opened 13 days ago by qgallouedec

in trl-internal-testing/tiny-Glm4MoeForCausalLM 4 days ago

Upload Glm4MoeForCausalLM

#1 opened 24 days ago by qgallouedec

albertvillanova

in trl-internal-testing/tiny-Gemma4ForConditionalGeneration 6 days ago

Upload Gemma4ForConditionalGeneration

#5 opened 6 days ago by albertvillanova

Upload Gemma4ForConditionalGeneration

#4 opened 6 days ago by albertvillanova

Upload Gemma4ForConditionalGeneration

#3 opened 6 days ago by albertvillanova

qgallouedec

posted an update 8 days ago


Shipped hf-sandbox! 🥡

🧪 Running an eval that executes model-generated C on a few thousand prompts? You probably don't want any of that on your laptop.
Just shipped hf-sandbox, a Modal-style sandbox API on top of Hugging Face Jobs. Spin up an isolated, ephemeral container, run untrusted code, get the result back. No Docker on your laptop, no infra to manage.

Just pip install hf-sandbox.

Early days (v0.1); feedback and issues very welcome:
👉 https://github.com/huggingface/hf-sandbox


qgallouedec

in trl-internal-testing/tiny-Qwen2_5_VLForConditionalGeneration 10 days ago

Upload Qwen2_5_VLForConditionalGeneration

#11 opened 10 days ago by qgallouedec

Upload Qwen2_5_VLForConditionalGeneration

#10 opened 10 days ago by qgallouedec

Upload Qwen2_5_VLForConditionalGeneration

#9 opened 10 days ago by qgallouedec

posted an update 10 days ago


**TRL v1.4 is out 🚀** Chunked NLL loss for SFT and a first-class **OpenReward** integration.

**Chunked NLL loss for SFT — drops peak VRAM by up to 14×**

Standard SFT materializes a full [batch × seq × vocab] logits tensor before computing cross-entropy, which dominates peak memory at long context lengths. The new loss_type="chunked_nll" path drops ignored-label tokens before the lm_head matmul and computes cross-entropy in checkpointed chunks of 256.
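The chunked path can be sketched in plain Python as below. This is a toy, forward-only illustration, not TRL's implementation: `chunked_nll`, `full_nll`, and the helpers are hypothetical names, -100 is assumed as the ignored-label value (the usual Hugging Face convention), and the gradient checkpointing that lets the real path avoid storing each chunk's logits for the backward pass is omitted because there is no autograd here. It shows the two ideas from the paragraph: ignored positions are dropped before the lm_head projection, and only one chunk of logits exists at a time, while the mean loss matches the full computation.

```python
import math

IGNORE_INDEX = -100  # assumption: the usual Hugging Face "ignore this label" value


def project(lm_head, hidden):
    """Logits for one position: lm_head is [vocab][dim], hidden is [dim]."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in lm_head]


def token_nll(logits, target):
    """-log softmax(logits)[target], using the max-shift for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def full_nll(hidden_states, labels, lm_head):
    """Reference path: materialize logits for every position, then average."""
    logits = [project(lm_head, h) for h in hidden_states]  # [seq][vocab], all at once
    losses = [token_nll(lg, y) for lg, y in zip(logits, labels) if y != IGNORE_INDEX]
    return sum(losses) / len(losses)


def chunked_nll(hidden_states, labels, lm_head, chunk_size=256):
    """Chunked path; assumes at least one label is not IGNORE_INDEX."""
    # Drop ignored positions *before* projecting, so their logits are never built.
    kept = [(h, y) for h, y in zip(hidden_states, labels) if y != IGNORE_INDEX]
    total = 0.0
    for start in range(0, len(kept), chunk_size):
        chunk = kept[start:start + chunk_size]
        # Only [len(chunk)][vocab] logits exist at any one time.
        chunk_logits = [project(lm_head, h) for h, _ in chunk]
        total += sum(token_nll(lg, y) for lg, (_, y) in zip(chunk_logits, chunk))
    return total / len(kept)  # mean over non-ignored tokens, like the full path


# Tiny example: 4 positions, hidden size 3, vocabulary of 4; position 1 is masked.
hidden = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.1], [0.2, 0.2, 0.2], [-0.3, 0.4, 0.1]]
labels = [2, IGNORE_INDEX, 0, 1]
lm_head = [[0.3, -0.1, 0.2], [0.0, 0.5, -0.4], [-0.2, 0.1, 0.6], [0.4, 0.4, 0.0]]
print(full_nll(hidden, labels, lm_head), chunked_nll(hidden, labels, lm_head, chunk_size=2))
```

Both calls print the same loss; the difference is only in how many logits rows are alive at once.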

Peak GPU memory, AdamW fp32:
- Qwen3-14B, 8×H100 FSDP2, 16k seq: 58.9 GB → 38.9 GB
- Qwen3-4B, 1×H100 80GB, 16k seq: OOM → 63.8 GB
- Qwen3-32B, 8×H100 FSDP2, 8k seq: OOM → 71.2 GB
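Some back-of-the-envelope arithmetic suggests why the full logits tensor dominates at these lengths. The sketch assumes a per-GPU micro-batch of one sequence, fp32 logits, and a vocabulary of 151,936 tokens (the `vocab_size` in the Qwen3 model configs, not a number from the post). It counts only the forward logits, not their gradients or other loss buffers, so it is a rough lower bound, not a reconstruction of the measurements above.

```python
def logits_bytes(batch, seq_len, vocab_size, bytes_per_elem=4):
    """Bytes for a dense [batch, seq_len, vocab_size] tensor (fp32 = 4 bytes per element)."""
    return batch * seq_len * vocab_size * bytes_per_elem


VOCAB = 151_936  # assumption: vocab_size from the Qwen3 model configs

full = logits_bytes(1, 16_384, VOCAB)  # one 16k-token sequence, all logits at once
chunk = logits_bytes(1, 256, VOCAB)    # a single chunk of 256 positions
print(f"full logits:  {full / 1e9:.2f} GB")   # full logits:  9.96 GB
print(f"one chunk:    {chunk / 1e9:.3f} GB")  # one chunk:    0.156 GB
```

Under these assumptions, a single chunk of 256 positions is 64× smaller than the full 16k-token tensor.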

End to end, it is consistently as fast as or faster than the standard nll loss, and it unlocks sequence lengths that don't fit at all under the standard path.

```python
from trl import SFTConfig

training_args = SFTConfig(loss_type="chunked_nll")
```

Works with PEFT and VLMs out of the box.

**Open Reward Standard environment adapter**

The new trl.experimental.openreward adapter plugs any environment speaking the [Open Reward Standard](https://openrewardstandard.io) protocol into any TRL trainer that takes an environment_factory. A single string (a catalog name or a URL) fills the train_dataset, environment_factory, and reward_funcs slots; tools are bound dynamically from their JSON Schema, so no per-environment wrapper code is needed:

```python
from trl import GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec

spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)

trainer = GRPOTrainer(
    ...,
    train_dataset=spec.train_dataset,
    environment_factory=spec.environment_factory,
    reward_funcs=spec.reward_funcs,
)
```
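The post does not spell out how tools are "bound dynamically from JSON Schema". As a rough mental model only, the sketch below turns one JSON Schema tool description into a Python callable that checks its arguments and forwards them to the environment. It is not the trl.experimental.openreward code: `bind_tool`, the `call(name, args)` transport, and the `run_shell` tool are hypothetical, and validation covers only required keys and top-level, single-name primitive types.

```python
from typing import Any, Callable, Dict

# Shallow mapping from JSON Schema primitive type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def bind_tool(schema: Dict[str, Any], call: Callable[[str, Dict[str, Any]], Any]) -> Callable[..., Any]:
    """Build a keyword-only Python callable from a tool's JSON Schema description.

    `schema` is assumed to look like {"name", "description", "parameters"}, where
    "parameters" is a JSON Schema object; `call(name, args)` is a hypothetical
    transport that executes the tool inside the environment.
    """
    name = schema["name"]
    params = schema.get("parameters", {})
    props = params.get("properties", {})
    required = set(params.get("required", []))

    def tool(**kwargs: Any) -> Any:
        missing = required - set(kwargs)
        if missing:
            raise TypeError(f"{name}: missing required argument(s) {sorted(missing)}")
        for key, value in kwargs.items():
            if key not in props:
                raise TypeError(f"{name}: unexpected argument {key!r}")
            type_name = props[key].get("type")
            # Only single type names are checked; lists of types are skipped in this sketch.
            expected = JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
            if expected is None:
                continue
            # bool is a subclass of int in Python, so only accept it for "boolean".
            wrong_bool = isinstance(value, bool) and type_name != "boolean"
            if wrong_bool or not isinstance(value, expected):
                raise TypeError(f"{name}: {key!r} should be of JSON type {type_name!r}")
        return call(name, kwargs)

    tool.__name__ = name
    tool.__doc__ = schema.get("description", "")
    return tool


# Hypothetical tool description, shaped like a typical function-calling schema.
schema = {
    "name": "run_shell",
    "description": "Run a shell command in the task container.",
    "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}, "timeout": {"type": "integer"}},
        "required": ["command"],
    },
}
calls = []
run_shell = bind_tool(schema, lambda n, args: calls.append((n, args)) or "ok")
print(run_shell(command="ls", timeout=5))
```

Because the callable is generated from the schema alone, adding a new environment needs no hand-written wrapper, which is the property the post highlights.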

v1.4 also brings MFU helpers for dense and MoE models, GRPO support for Liger 0.8.0 (delta clipping, VESPO, and KL bias correction), Tülu 3's length-normalized DPO loss, four more training chat templates (Cohere, Cohere2, Gemma 3, Qwen3-2507), and a fix for a 5+ GB CUDA memory leak in activation offloading.
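Two of these items reduce to short formulas, sketched here in plain Python; TRL's own implementations may differ. The length-normalized DPO loss follows the objective as written in the Tülu 3 report, where each policy/reference log-ratio is divided by its completion length before the sigmoid. MFU uses the common estimate of about 6 FLOPs per parameter per trained token, with the active parameter count standing in for N on MoE models; this ignores attention FLOPs. The function names are hypothetical and the throughput numbers are made up.

```python
import math


def softplus(x):
    """log(1 + exp(x)) without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def length_normalized_dpo_loss(policy_chosen_logp, ref_chosen_logp, chosen_len,
                               policy_rejected_logp, ref_rejected_logp, rejected_len,
                               beta=0.1):
    """Per-example loss; each *_logp is a log-prob summed over the completion's tokens."""
    # Divide each log-ratio by its completion length before comparing the two.
    chosen = beta * (policy_chosen_logp - ref_chosen_logp) / chosen_len
    rejected = beta * (policy_rejected_logp - ref_rejected_logp) / rejected_len
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-(chosen - rejected))


def training_mfu(tokens_per_second, params_per_token, peak_flops_per_second):
    """MFU from the rough ~6 FLOPs per parameter per trained token estimate.

    For an MoE model, pass the parameters *active* per token, not the total count.
    """
    achieved_flops_per_second = 6 * params_per_token * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Same per-token log-ratio on both sides: lengths differ, but the margin is zero.
print(length_normalized_dpo_loss(-10.0, -12.0, 10, -20.0, -24.0, 20))  # ≈ 0.693 (log 2)
print(training_mfu(tokens_per_second=1_000, params_per_token=1e9,
                   peak_flops_per_second=1.2e13))  # 0.5
```

The first example shows the point of length normalization: a rejected answer twice as long no longer wins or loses just because its summed log-ratio is larger in magnitude.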

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.4.0


Team members 8