DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning

DriveLMM-o1 is a fine-tuned large multimodal model designed for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it leverages stitched multiview images to produce step-by-step reasoning. This structured approach enhances both final decision accuracy and interpretability in complex driving tasks like perception, prediction, and planning.
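As a rough illustration of what "stitched multiview images" can mean in practice, the sketch below tiles several equally sized camera views into one grid image. The grid shape (3 columns), the function names `grid_layout` and `stitch_views`, and the use of Pillow are assumptions made for this example; the exact stitching layout used to train DriveLMM-o1 is not described here.

```python
def grid_layout(n_views, view_w, view_h, cols=3):
    """Return the canvas size and the top-left paste offset of each view.

    Hypothetical helper: views are placed row by row in a grid with `cols` columns.
    """
    rows = -(-n_views // cols)  # ceiling division
    canvas_size = (cols * view_w, rows * view_h)
    offsets = [((i % cols) * view_w, (i // cols) * view_h) for i in range(n_views)]
    return canvas_size, offsets


def stitch_views(images, cols=3):
    """Paste a list of PIL images into a single grid image (assumed layout)."""
    from PIL import Image  # Pillow, imported lazily so grid_layout has no dependencies

    view_w, view_h = images[0].size
    canvas_size, offsets = grid_layout(len(images), view_w, view_h, cols)
    canvas = Image.new("RGB", canvas_size)
    for img, offset in zip(images, offsets):
        # Resize defensively so every view fills exactly one grid cell.
        canvas.paste(img.resize((view_w, view_h)), offset)
    return canvas
```

For six 1600x900 views, `grid_layout(6, 1600, 900)` gives a 4800x1800 canvas with three views per row; `stitch_views` then produces the single image that would be passed to the model.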

Key Features:

Multimodal Integration: Combines multiview images for comprehensive scene understanding.
Step-by-Step Reasoning: Produces detailed intermediate reasoning steps to explain decisions.
Efficient Adaptation: Utilizes dynamic image patching and LoRA finetuning for high-resolution inputs with minimal extra parameters.
Performance Gains: Raises the overall reasoning score to 75.24 and final answer accuracy to 62.36, up from 71.62 and 54.87 for the InternVL2.5-8B base model, the strongest open-source baseline in the comparison below.
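The dynamic image patching mentioned above can be sketched as follows: pick the tile grid whose aspect ratio is closest to the input image, then cut the resized image into fixed-size tiles. This is a simplified sketch loosely modelled on the preprocessing published with InternVL; the 448-pixel tile size, the 12-tile cap, and the function names are assumptions for illustration, not a description of the exact DriveLMM-o1 pipeline.

```python
def choose_tile_grid(width, height, tile=448, max_tiles=12):
    """Pick the (cols, rows) grid whose aspect ratio best matches the image."""
    aspect = width / height
    # All grids with at most `max_tiles` tiles, smallest grids first.
    candidates = sorted(
        ((c, r)
         for c in range(1, max_tiles + 1)
         for r in range(1, max_tiles + 1)
         if c * r <= max_tiles),
        key=lambda grid: grid[0] * grid[1],
    )
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidates:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * tile * tile * cols * rows:
            # On a tie, move to the larger grid only if the image has enough pixels.
            best = (cols, rows)
    return best


def tile_boxes(cols, rows, tile=448):
    """Crop boxes (left, top, right, bottom) after resizing to cols*tile x rows*tile."""
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]
```

A wide stitched image therefore gets a wide grid: `choose_tile_grid(1344, 448, max_tiles=6)` selects three columns and one row, and `tile_boxes(3, 1)` lists the three crops that would each be encoded as a separate tile.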

Performance Comparison:

| Model | Risk Assessment Accuracy | Traffic Rule Adherence | Scene Awareness & Object Understanding | Relevance | Missing Details | Overall Reasoning Score | Final Answer Accuracy |
|---|---|---|---|---|---|---|---|
| GPT-4o (Closed) | 71.32 | 80.72 | 72.96 | 76.65 | 71.43 | 72.52 | 57.84 |
| Qwen-2.5-VL-7B | 46.44 | 60.45 | 51.02 | 50.15 | 52.19 | 51.77 | 37.81 |
| Ovis1.5-Gemma2-9B | 51.34 | 66.36 | 54.74 | 55.72 | 55.74 | 55.62 | 48.85 |
| Mulberry-7B | 51.89 | 63.66 | 56.68 | 57.27 | 57.45 | 57.65 | 52.86 |
| LLaVA-CoT | 57.62 | 69.01 | 60.84 | 62.72 | 60.67 | 61.41 | 49.27 |
| LlamaV-o1 | 60.20 | 73.52 | 62.67 | 64.66 | 63.41 | 63.13 | 50.02 |
| InternVL2.5-8B | 69.02 | 78.43 | 71.52 | 75.80 | 70.54 | 71.62 | 54.87 |
| DriveLMM-o1 (Ours) | 73.01 | 81.56 | 75.39 | 79.42 | 74.49 | 75.24 | 62.36 |
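To make the size of the gains concrete, the short script below recomputes the differences from the table. The numbers are copied from the Overall Reasoning Score and Final Answer Accuracy columns above; the helper name `gain_over` is introduced only for this example.

```python
# (overall reasoning score, final answer accuracy), copied from the comparison table
scores = {
    "GPT-4o (Closed)": (72.52, 57.84),
    "InternVL2.5-8B": (71.62, 54.87),
    "DriveLMM-o1 (Ours)": (75.24, 62.36),
}


def gain_over(baseline, model="DriveLMM-o1 (Ours)"):
    """Return the point differences (reasoning, accuracy) of `model` over `baseline`."""
    model_reasoning, model_accuracy = scores[model]
    base_reasoning, base_accuracy = scores[baseline]
    return round(model_reasoning - base_reasoning, 2), round(model_accuracy - base_accuracy, 2)


for baseline in ("InternVL2.5-8B", "GPT-4o (Closed)"):
    reasoning, accuracy = gain_over(baseline)
    print(f"vs {baseline}: +{reasoning:.2f} reasoning, +{accuracy:.2f} final answer accuracy")
```

Relative to the InternVL2.5-8B base model this works out to +3.62 points in overall reasoning and +7.49 points in final answer accuracy; relative to GPT-4o, +2.72 and +4.52.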

Usage:

Load the model using the following code snippet:

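from transformers import AutoModel, AutoTokenizer
import torch

path = 'ayeshaishaq/DriveLMMo1'

# Load the weights in bfloat16, switch to inference mode, and move the model to the GPU.
# trust_remote_code=True lets transformers run the custom model code in the repository;
# use_flash_attn=True requests FlashAttention.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True
).eval().cuda()

# Load the matching tokenizer (the slow, Python-based implementation).
tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)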

For detailed usage instructions and additional configurations, please refer to the OpenGVLab/InternVL2_5-8B repository.
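Once the model and tokenizer are loaded, a question can be asked roughly as sketched below. This assumes the fine-tuned checkpoint exposes the same `chat` method documented for InternVL2.5 (`model.chat(tokenizer, pixel_values, question, generation_config)`) and that `pixel_values` has already been produced by the InternVL image preprocessing; the wrapper name `ask_driving_question` and the generation settings are illustrative choices, not part of the released code.

```python
def ask_driving_question(model, tokenizer, pixel_values, question, max_new_tokens=1024):
    """Send one image-grounded question to the model and return its answer.

    Assumes an InternVL-style `chat` method; `pixel_values` is the preprocessed
    image tensor (for example, the tiles of a stitched multiview image).
    """
    # Greedy decoding keeps the step-by-step reasoning deterministic.
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    # InternVL-style prompts mark the image position with an <image> placeholder.
    prompt = "<image>\n" + question
    return model.chat(tokenizer, pixel_values, prompt, generation_config)
```

A call such as `ask_driving_question(model, tokenizer, pixel_values, "Is it safe to change lanes?")` would then return the generated reasoning and final answer as a string.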

Code: https://github.com/ayesha-ishaq/DriveLMM-o1

Limitations: While DriveLMM-o1 demonstrates strong performance in autonomous driving tasks, it is fine-tuned for domain-specific reasoning. Users may need to further fine-tune or adapt the model for different driving environments.

Model size: 8B params (Safetensors)
Tensor type: BF16
Base model: OpenGVLab/InternVL2_5-8B (this model is a fine-tune)

Paper: DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding (2503.10621, published Mar 13, 2025)