rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-leap-iter1 Text Generation • 3B • Updated Feb 12, 2025 • 5
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50 Updated Jan 16, 2025 • 4
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1 Text Generation • 3B • Updated Jan 10, 2025 • 5