How can I use int4 at model creation time?

#73

by shamankk - opened Jul 31, 2023

I run out of memory when creating the int4 version.

The only int4 creation method I could find is the one for ChatGLM:
model = AutoModel.from_pretrained('./chatglm', trust_remote_code=True)
model = model.half().quantize(4).cuda()

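Is this still the way to do it for ChatGLM2?
On a machine with little RAM the model cannot be created this way. Is it possible to use int4 at creation time, instead of converting to int4 only after the full weights have been loaded?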

Would something like this work?
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, cache_dir=cache_dir, device_map="auto", load_in_4bit=True).cuda()
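For a rough sense of why the load-then-quantize path runs out of memory, the weight sizes can be estimated with simple arithmetic. This is only a back-of-the-envelope sketch: the parameter count of 6.2e9 is an assumption for a 6B-class model, and the estimate ignores quantization scales, activations, and any temporary copies made while loading.

```python
# Assumption: roughly 6.2e9 parameters for a 6B-class model; the exact count may differ.
PARAMS = 6.2e9


def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Loading in fp16 and quantizing afterwards means the fp16 weights
# must fit in memory first, even though the final model is much smaller.
fp16 = weight_gib(PARAMS, 16)
int4 = weight_gib(PARAMS, 4)

print(f"fp16 weights: ~{fp16:.1f} GiB")
print(f"int4 weights: ~{int4:.1f} GiB")
```

Under these assumptions the fp16 weights are four times the size of the int4 weights, so the peak memory is set by the loading step rather than by the quantized model.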

Pre-quantized models are available for download from the community.
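A minimal sketch of loading such a pre-quantized checkpoint is shown here. The repository name `THUDM/chatglm2-6b-int4` and the helper `load_int4` are assumptions for illustration, not something confirmed in this thread; check the Hub for the checkpoint you actually want. Because the stored weights are already int4, the full fp16 weights never need to be held in memory.

```python
from typing import Any

# Assumed repository name for a pre-quantized int4 checkpoint; verify it on the Hub.
INT4_REPO = "THUDM/chatglm2-6b-int4"


def load_int4(repo: str = INT4_REPO, use_cuda: bool = True) -> Any:
    """Hypothetical helper: load an already-quantized checkpoint."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModel

    model = AutoModel.from_pretrained(repo, trust_remote_code=True)
    # Move to the GPU if requested; otherwise run on the CPU in float32.
    return model.cuda() if use_cuda else model.float()
```

Calling `load_int4()` downloads the checkpoint on first use, so the call itself is left out of the sketch.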
