Video-Text-to-Text
Transformers
Safetensors
qwen2_5_vl
image-text-to-text
multimodal
agent
reinforcement-learning
text-generation-inference
Instructions to use Agents-X/PyVision-Video-7B-RL with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Agents-X/PyVision-Video-7B-RL with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("Agents-X/PyVision-Video-7B-RL") model = AutoModelForImageTextToText.from_pretrained("Agents-X/PyVision-Video-7B-RL") - Notebooks
- Google Colab
- Kaggle
Improve model card: add metadata and project links
#1
by nielsr HF Staff - opened
Hi! I'm Niels from the Hugging Face community science team. I've updated the model card for PyVision-Video-7B-RL to improve its discoverability and provide more context for researchers.
This PR:
- Adds
pipeline_tag: video-text-to-textandlibrary_name: transformersto the metadata. - Adds links to the GitHub repository and project page.
- Provides a brief overview of the PyVision-RL framework and its application to video reasoning.
Feel free to merge this to help users better understand and use the model!
stzhao changed pull request status to merged