Can this model be used with vLLM?

by xiabo0816 - opened Sep 28, 2025

Discussion

xiabo0816

Sep 28, 2025

Newcomer question: how should I go about deploying this model with vLLM?

YanshekWoo

Lychee Team org Sep 29, 2025

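You can start with the official vLLM documentation: https://docs.vllm.ai/en/v0.7.1/getting_started/examples/embedding.html
We recommend comparing both output quality and performance against sentence-transformers, to rule out bugs in the vLLM framework itself.
Note: because this model modifies attention to be bidirectional, the modeling.py in the repository has to be loaded, so add trust_remote_code=True when loading the model.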
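One way to carry out the comparison suggested above is to embed the same texts with both backends and check that each pair of vectors points in nearly the same direction. The following is only a minimal sketch: the helper names `cosine` and `compare_backends` and the 0.99 threshold are illustrative assumptions, not part of vLLM or sentence-transformers, and the inputs are assumed to be plain lists of floats (for example, converted with `.tolist()`).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compare_backends(emb_a, emb_b, min_cosine=0.99):
    """Return (row index, cosine) for every row whose cosine is below min_cosine.

    emb_a and emb_b are the embeddings of the same texts, in the same order,
    produced by two different backends. The threshold is an arbitrary
    assumption; float16 inference will not match float32 exactly.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both backends must embed the same number of texts")
    mismatches = []
    for i, (u, v) in enumerate(zip(emb_a, emb_b)):
        c = cosine(u, v)
        if c < min_cosine:
            mismatches.append((i, c))
    return mismatches
```

An empty result means every text received a closely matching embedding from both backends; any reported row is worth inspecting before trusting the vLLM deployment.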

Yuki131

Lychee Team org Sep 30, 2025

edited Sep 30, 2025

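Newcomer question: how should I go about deploying this model with vLLM?

Hi, the KaLM-Embedding model has been adapted for vLLM. You can use the following code as a reference implementation: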

import torch
from vllm import LLM


def get_detailed_instruct(task_description: str, query: str) -> str:
    # Prefix each query with its task instruction; documents are embedded as-is.
    return f'Instruct: {task_description}\nQuery:{query}'

task = 'Given a query, retrieve documents that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

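# Replace {MODEL_NAME_OR_PATH} with the model ID or a local path.
# trust_remote_code=True lets the repository's custom modeling.py be loaded.
model = LLM(model="{MODEL_NAME_OR_PATH}", task="embed", trust_remote_code=True, dtype="float16")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
# Rows correspond to the two queries, columns to the two documents.
scores = embeddings[:2] @ embeddings[2:].T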
print(scores.tolist())
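
The dot product above equals cosine similarity only when the embeddings are L2-normalized. Whether a given vLLM version and model configuration returns normalized vectors is an assumption worth checking; if it does not, the vectors can be normalized explicitly before scoring. A minimal pure-Python sketch follows, where `l2_normalize` and `cosine_scores` are hypothetical helper names and not part of vLLM:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(query_embs, doc_embs):
    """Return a queries-by-documents matrix of cosine similarities."""
    queries = [l2_normalize(v) for v in query_embs]
    docs = [l2_normalize(v) for v in doc_embs]
    return [[sum(a * b for a, b in zip(q, d)) for d in docs] for q in queries]
```

With the variables from the snippet above, this could be called as `cosine_scores(embeddings[:2].tolist(), embeddings[2:].tolist())`.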

Yuki131 changed discussion status to closed Oct 14, 2025
