DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning
DriveLMM-o1 is a fine-tuned large multimodal model designed for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it leverages stitched multiview images to produce step-by-step reasoning. This structured approach enhances both final decision accuracy and interpretability in complex driving tasks like perception, prediction, and planning.
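The card does not say exactly how the multiview images are stitched. Here is a minimal NumPy sketch. It assumes six equally sized camera views (a nuScenes-style surround rig) tiled into a 2 x 3 grid, with the front cameras on the top row. The helper name `stitch_views`, the camera order, and the grid layout are all assumptions for illustration. This is not the repository's preprocessing.

```python
import numpy as np

# ASSUMED camera order: front row left-to-right, then back row left-to-right.
CAMERAS = ["CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT",
           "CAM_BACK_LEFT", "CAM_BACK", "CAM_BACK_RIGHT"]


def stitch_views(views, rows, cols):
    """Tile equally sized H x W x C images into a single rows x cols mosaic.

    Hypothetical helper; the grid layout is an illustrative assumption.
    """
    if len(views) != rows * cols:
        raise ValueError(f"expected {rows * cols} views, got {len(views)}")
    shape = views[0].shape
    if any(v.shape != shape for v in views):
        raise ValueError("all views must share the same shape")
    # Join each row of views side by side, then stack the rows vertically.
    grid_rows = [np.concatenate(views[r * cols:(r + 1) * cols], axis=1)
                 for r in range(rows)]
    return np.concatenate(grid_rows, axis=0)


if __name__ == "__main__":
    # Six dummy 900 x 1600 RGB frames, each filled with its camera index.
    views = [np.full((900, 1600, 3), i, dtype=np.uint8) for i in range(len(CAMERAS))]
    mosaic = stitch_views(views, rows=2, cols=3)
    print(mosaic.shape)
```

With six 1600 x 900 views, this layout yields a single 4800 x 1800 image that the model can consume as one input.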
Key Features:
- Multimodal Integration: Combines multiview images for comprehensive scene understanding.
- Step-by-Step Reasoning: Produces detailed intermediate reasoning steps to explain decisions.
- Efficient Adaptation: Uses dynamic image patching to handle high-resolution inputs and LoRA fine-tuning to adapt the base model with minimal extra parameters.
- Performance Gains: Outperforms every open-source baseline in the comparison below on all metrics. Relative to its InternVL2.5-8B base model, it raises final answer accuracy from 54.87 to 62.36 and the overall reasoning score from 71.62 to 75.24.
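Dynamic image patching splits a high-resolution input into fixed-size tiles. The tile grid is chosen to best match the image's aspect ratio. The pure-Python sketch below only plans the tiling and performs no pixel operations. It is modeled on the InternVL2.5 preprocessing as we understand it. The 448-pixel tile size, the 12-tile cap, and the extra thumbnail tile are InternVL defaults assumed here, and DriveLMM-o1's actual settings may differ. The function names are hypothetical.

```python
def candidate_grids(min_num=1, max_num=12):
    """All (cols, rows) tile grids with min_num <= cols * rows <= max_num."""
    grids = {(i, j)
             for n in range(min_num, max_num + 1)
             for i in range(1, n + 1)
             for j in range(1, n + 1)
             if min_num <= i * j <= max_num}
    # Sort by tile count (then by grid) so iteration order is deterministic.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def closest_grid(width, height, image_size=448, min_num=1, max_num=12):
    """Pick the grid whose cols/rows ratio is closest to the image aspect ratio."""
    aspect = width / height
    area = width * height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(min_num, max_num):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and area > 0.5 * image_size * image_size * cols * rows:
            # Tie: move to the larger grid only if the image has enough pixels for it.
            best = (cols, rows)
    return best


def tile_plan(width, height, image_size=448, max_num=12, use_thumbnail=True):
    """Return ((resized_width, resized_height), number_of_tiles) for one image.

    ASSUMED defaults (448-pixel tiles, at most 12 tiles, thumbnail on).
    """
    cols, rows = closest_grid(width, height, image_size, 1, max_num)
    num_tiles = cols * rows
    # A downscaled thumbnail of the whole image is added when the image was split.
    if use_thumbnail and num_tiles > 1:
        num_tiles += 1
    return (cols * image_size, rows * image_size), num_tiles


if __name__ == "__main__":
    for w, h in [(448, 448), (1600, 900), (4800, 1800)]:
        print((w, h), "->", tile_plan(w, h))
```

Take a 4800 x 1800 mosaic of six 1600 x 900 views. Its aspect ratio is about 2.67, and the closest allowed grid is 5 x 2. The sketch therefore plans a 2240 x 896 resize, giving 10 tiles plus one thumbnail. A single 448 x 448 image stays as one tile.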
Performance Comparison:
| Model | Risk Assessment Accuracy | Traffic Rule Adherence | Scene Awareness & Object Understanding | Relevance | Missing Details | Overall Reasoning Score | Final Answer Accuracy |
|---|---|---|---|---|---|---|---|
| GPT-4o (Closed) | 71.32 | 80.72 | 72.96 | 76.65 | 71.43 | 72.52 | 57.84 |
| Qwen-2.5-VL-7B | 46.44 | 60.45 | 51.02 | 50.15 | 52.19 | 51.77 | 37.81 |
| Ovis1.5-Gemma2-9B | 51.34 | 66.36 | 54.74 | 55.72 | 55.74 | 55.62 | 48.85 |
| Mulberry-7B | 51.89 | 63.66 | 56.68 | 57.27 | 57.45 | 57.65 | 52.86 |
| LLaVA-CoT | 57.62 | 69.01 | 60.84 | 62.72 | 60.67 | 61.41 | 49.27 |
| LlamaV-o1 | 60.20 | 73.52 | 62.67 | 64.66 | 63.41 | 63.13 | 50.02 |
| InternVL2.5-8B | 69.02 | 78.43 | 71.52 | 75.80 | 70.54 | 71.62 | 54.87 |
| DriveLMM-o1 (Ours) | 73.01 | 81.56 | 75.39 | 79.42 | 74.49 | 75.24 | 62.36 |
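The gains over the InternVL2.5-8B base model can be read directly off the table. This short snippet recomputes them from the reported scores, which are copied verbatim from the table above. It is plain arithmetic, not an evaluation script.

```python
# Scores copied from the Performance Comparison table above.
METRICS = ["Risk Assessment Accuracy", "Traffic Rule Adherence",
           "Scene Awareness & Object Understanding", "Relevance",
           "Missing Details", "Overall Reasoning Score", "Final Answer Accuracy"]
BASE = [69.02, 78.43, 71.52, 75.80, 70.54, 71.62, 54.87]   # InternVL2.5-8B
DRIVE = [73.01, 81.56, 75.39, 79.42, 74.49, 75.24, 62.36]  # DriveLMM-o1


def gains(new, old):
    """Per-metric absolute improvement in points, rounded to two decimals."""
    return [round(n - o, 2) for n, o in zip(new, old)]


if __name__ == "__main__":
    for name, g in zip(METRICS, gains(DRIVE, BASE)):
        print(f"{name}: +{g:.2f}")
```

The largest gain is in final answer accuracy (+7.49 points). The other six metrics each improve by roughly 3 to 4 points.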
Usage:
Load the model using the following code snippet:
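```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'ayeshaishaq/DriveLMMo1'

# trust_remote_code is required: the InternVL2.5 architecture is defined
# in custom code shipped with the model repository.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,  # FlashAttention, if the flash-attn package is installed
    trust_remote_code=True
).eval().cuda()  # inference mode on a CUDA GPU

# Load the slow (non-fast) tokenizer, as in the InternVL2.5 examples.
tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)
```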
For detailed usage instructions and additional configurations, please refer to the OpenGVLab/InternVL2_5-8B repository.
Code: https://github.com/ayesha-ishaq/DriveLMM-o1
Limitations: DriveLMM-o1 performs strongly on autonomous driving tasks, but it is fine-tuned for domain-specific reasoning. Users may need to fine-tune or otherwise adapt it further for different driving environments.
Base model: OpenGVLab/InternVL2_5-8B