microsoft/Phi-3-vision-128k-instruct
clean_sample_code (#36)
- remove useless sample code (5bbaa42f12e926c3ce90d4ce4113e4566d8c16b5)
Co-authored-by: Haiping Wu <haipingwu@users.noreply.huggingface.co>
- sample_inference.py +0 -21
sample_inference.py
CHANGED
@@ -18,7 +18,6 @@ assistant_prompt = '<|assistant|>\n'
 prompt_suffix = "<|end|>\n"
 
 #################################################### text-only ####################################################
-# single-image prompt
 prompt = f"{user_prompt}what is the answer for 1+1? Explain it.{prompt_suffix}{assistant_prompt}"
 print(f">>> Prompt\n{prompt}")
 inputs = processor(prompt, images=None, return_tensors="pt").to("cuda:0")
@@ -33,7 +32,6 @@ response = processor.batch_decode(generate_ids,
 print(f'>>> Response\n{response}')
 
 #################################################### text-only 2 ####################################################
-# single-image prompt
 prompt = f"{user_prompt}Give me the code for sloving two-sum problem.{prompt_suffix}{assistant_prompt}"
 print(f">>> Prompt\n{prompt}")
 inputs = processor(prompt, images=None, return_tensors="pt").to("cuda:0")
@@ -66,25 +64,6 @@ response = processor.batch_decode(generate_ids,
 print(f'>>> Response\n{response}')
 
 #################################################### EXAMPLE 2 ####################################################
-# multiple image prompt
-# Note: image tokens must start from <|image_1|>
-prompt = f"{user_prompt}<|image_1|>\n<|image_2|>\n What is shown in this two images?{prompt_suffix}{assistant_prompt}"
-print(f">>> Prompt\n{prompt}")
-url = "https://www.ilankelman.org/stopsigns/australia.jpg"
-image_1 = Image.open(requests.get(url, stream=True).raw)
-url = "https://img.freepik.com/free-photo/painting-mountain-lake-with-mountain-background_188544-9126.jpg?w=2000"
-image_2 = Image.open(requests.get(url, stream=True).raw)
-images = [image_1, image_2]
-inputs = processor(prompt, images, return_tensors="pt").to("cuda:0")
-generate_ids = model.generate(**inputs,
-                              max_new_tokens=1000,
-                              eos_token_id=processor.tokenizer.eos_token_id,
-                              )
-generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
-response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
-print(f'>>> Response\n{response}')
-
-#################################################### EXAMPLE 3 ####################################################
 # chat template
 chat = [
     {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"},
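The prompts in sample_inference.py are assembled by hand from a user prefix, optional image placeholders, the `<|end|>` suffix, and the assistant prefix. The sketch below shows that assembly as a small helper. The values of `PROMPT_SUFFIX` and `ASSISTANT_PROMPT` are taken from the diff; the value of `USER_PROMPT` is not visible in this diff and is an assumption, and `build_prompt` is a hypothetical helper that is not part of the repository. The removed example noted that image tokens must start from `<|image_1|>`, so placeholders are numbered from 1.

```python
# Hypothetical helper; a sketch of the prompt layout used in sample_inference.py.
USER_PROMPT = "<|user|>\n"            # assumption: this value is not shown in the diff
ASSISTANT_PROMPT = "<|assistant|>\n"  # as shown in the diff context
PROMPT_SUFFIX = "<|end|>\n"           # as shown in the diff context


def build_prompt(question: str, num_images: int = 0) -> str:
    """Return a single-turn prompt with `num_images` image placeholders.

    Placeholders are numbered from 1 (<|image_1|>, <|image_2|>, ...),
    each on its own line, ahead of the question text.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    image_tags = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return f"{USER_PROMPT}{image_tags}{question}{PROMPT_SUFFIX}{ASSISTANT_PROMPT}"


if __name__ == "__main__":
    # Text-only prompt, as in the "text-only" section of the sample.
    print(build_prompt("what is the answer for 1+1? Explain it."))
    # Two-image prompt, similar in shape to the removed multi-image example.
    print(build_prompt("What is shown in these two images?", num_images=2))
```

The returned string would then be passed to the processor together with the matching list of images (or `images=None` for text-only prompts), as the sample script does.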