QLoRA merging

#61

by vknyazev - opened Aug 16, 2024

Discussion

vknyazev

Aug 16, 2024

How can I merge the weights of Phi-3 Vision after fine-tuning it with QLoRA? The .merge_and_unload() method does not seem to work.

2U1

Aug 19, 2024

https://github.com/2U1/Phi3-Vision-Finetune/blob/main/src/merge_lora_weights.py

You can use this script to merge the weights.

vknyazev

Aug 19, 2024

Thank you. However, I fine-tuned with Microsoft's fine-tuning script, and unfortunately your merging script fails on the lora_config.json from my fine-tuning output directory.

2U1

Aug 19, 2024

Oh, I think the loading script needs to be fixed for that case. Can I see the list of files produced by the training script? I'll try to fix it and post the code here.

vknyazev

Aug 19, 2024

I have these files after fine-tuning:

adapter_config.json
image_embedding_phi3_v.py
special_tokens_map.json
adapter_model.safetensors
image_processing_phi3_v.py
tokenizer_config.json
configuration_phi3_v.py
modeling_phi3_v.py
tokenizer.json
eval_after.json
preprocessor_config.json
training_args.bin
eval_before.json
processing_phi3_v.py
generation_config.json
processor_config.json

Thanks in advance.
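
For reference, PEFT loads an adapter from a directory that contains adapter_config.json plus an adapter weights file, both of which appear in the list above. A minimal sketch of checking such a directory before merging is shown here; the function name check_adapter_dir and the returned dictionary layout are hypothetical and are not part of PEFT or of either fine-tuning script.

```python
import json
from pathlib import Path

# File names PEFT writes when saving an adapter.
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def check_adapter_dir(adapter_dir):
    """Summarize a PEFT adapter directory; raise if the required files are missing.

    Hypothetical helper for illustration only.
    """
    root = Path(adapter_dir)
    config_path = root / "adapter_config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"missing {config_path}")

    # Either weights format is acceptable; report whichever is present.
    weights = [name for name in WEIGHT_FILES if (root / name).is_file()]
    if not weights:
        raise FileNotFoundError(f"no adapter weights found in {root}")

    config = json.loads(config_path.read_text())
    return {
        "weights": weights,
        "base_model": config.get("base_model_name_or_path"),
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
    }
```

Calling check_adapter_dir on the fine-tuning output directory returns the base model name and the LoRA rank and alpha recorded at training time, which helps confirm that the adapter is being merged into the right base model.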

2U1

Aug 20, 2024

In your case you can just use the AutoModelForCausalLM class.

import torch
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from accelerate import Accelerator

# Placeholder paths (not in the original post); set them for your own setup.
model_path = 'path/to/finetune/output'  # directory with adapter_config.json and adapter_model.safetensors
save_model_path = 'path/to/merged/model'  # directory the merged model is written to

# Load the base model in float16, without 4-bit quantization.
model = AutoModelForCausalLM.from_pretrained('microsoft/Phi-3-vision-128k-instruct', low_cpu_mem_usage=True, trust_remote_code=True, torch_dtype=torch.float16)
processor = AutoProcessor.from_pretrained('microsoft/Phi-3-vision-128k-instruct', trust_remote_code=True)

print('Loading LoRA weights...')
model = PeftModel.from_pretrained(model, model_path)

print('Merging LoRA weights...')
model = model.merge_and_unload()

print('Model Loaded!!!')

accel = Accelerator()
# You can set the shard size to whatever you want
accel.save_model(model, save_model_path, max_shard_size='5GB')
model.config.save_pretrained(save_model_path)
processor.save_pretrained(save_model_path)

You can use it like this. I changed it a little bit, tested it in my own code, and it works. However, I tested it with my own directory structure (my repo), so you will probably need to adjust a few things, such as the arguments for loading the model or model_path.
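
For readers wondering what the merge step does numerically: the standard LoRA update folds the low-rank pair into each adapted weight as W' = W + (lora_alpha / r) * B A. The sketch below illustrates that arithmetic in plain Python on small lists. It is an illustration of the formula only, not PEFT's implementation, and the function names matmul and merge_lora are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return weight + (lora_alpha / r) * (lora_b @ lora_a).

    Shapes: weight is (out, in), lora_a is (r, in), lora_b is (out, r).
    """
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # (out, in), same shape as weight
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Tiny example: a 2x2 identity weight with a rank-1 adapter.
merged = merge_lora(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 2.0]],
    lora_b=[[1.0], [3.0]],
    lora_alpha=2,
    r=1,
)
print(merged)  # [[3.0, 4.0], [6.0, 13.0]]
```

After merging, the adapter matrices are no longer needed, which is why the merged model can be saved and loaded as an ordinary checkpoint.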

vknyazev

Aug 20, 2024

I just added some missing .py files from the base model and it's working!
Thank you very much.
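
The post does not say which .py files were missing or how they were copied. One possible way to do it, assuming a local copy of the base model repository is available, is sketched below. The function name copy_missing_py_files and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def copy_missing_py_files(base_dir, merged_dir):
    """Copy top-level *.py files from base_dir into merged_dir if not already there.

    Returns the names of the files that were copied. Hypothetical helper.
    """
    src, dst = Path(base_dir), Path(merged_dir)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for py_file in sorted(src.glob("*.py")):
        target = dst / py_file.name
        if not target.exists():  # never overwrite files the merge step already saved
            shutil.copy2(py_file, target)
            copied.append(py_file.name)
    return copied
```

For this model the custom code files are the ones named in the file list earlier in the thread, such as modeling_phi3_v.py and configuration_phi3_v.py; loading the merged directory with trust_remote_code=True needs them to be present.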

vknyazev changed discussion status to closed Aug 20, 2024
