UPDATE:

Due to HF storage limits, the GGUF files have been moved to ModelScope:

quantzor/GLM-4.6-REAP-266B-A32B-Q4_K

This is a Q4_K_M GGUF quant of AesSedai/GLM-4.6-REAP-266B-A32B.

What Is This?

AesSedai/GLM-4.6-REAP-266B-A32B was created using REAP (Router-weighted Expert Activation Pruning), a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over the remaining experts.

See the GLM-4.5-Air version by Cerebras for more details: cerebras/GLM-4.5-Air-REAP-82B-A12B
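The core idea can be illustrated with a small NumPy sketch. This is not the Cerebras implementation, just a hypothetical toy version of the scoring step: each expert gets a saliency score combining the router's gate weight with the magnitude of that expert's output, averaged over tokens, and the lowest-scoring fraction (here 25%, matching the 0.25 in the pruning command below) is removed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins for real activations: router gate probabilities and
# per-token expert output norms for a 16-expert MoE layer.
n_tokens, n_experts = 512, 16
gate_probs = rng.random((n_tokens, n_experts))
gate_probs /= gate_probs.sum(axis=1, keepdims=True)        # router softmax
expert_out_norms = rng.random((n_tokens, n_experts)) + 0.5  # ||expert output||

# REAP-style saliency: router weight times expert activation magnitude,
# averaged over the calibration tokens.
saliency = (gate_probs * expert_out_norms).mean(axis=0)

# Prune the lowest-saliency 25% of experts; the rest keep their
# router entries untouched.
prune_ratio = 0.25
n_prune = int(n_experts * prune_ratio)
pruned = np.argsort(saliency)[:n_prune]
kept = np.setdiff1d(np.arange(n_experts), pruned)
print(f"pruning {n_prune} of {n_experts} experts: {sorted(pruned.tolist())}")
```

In the real method the scores come from calibration data (here, evol-codealpaca-v1) rather than random arrays, and pruning rewrites the expert weights and router dimensions in the checkpoint.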

The MTP tensors were not included in this quant (llama.cpp hasn't implemented MTP support anyway).

**Imatrix**

GLM-4.6-REAP-266B-A32B-imatrix.dat
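For anyone wanting to re-quantize from a higher-precision GGUF, a rough sketch of how an imatrix file like this is produced and applied with llama.cpp's `llama-imatrix` and `llama-quantize` tools (file names and the calibration text are placeholders, not files shipped with this repo):

```shell
# Generate an importance matrix from a calibration text file
llama-imatrix -m GLM-4.6-REAP-266B-A32B-f16.gguf \
    -f calibration.txt \
    -o GLM-4.6-REAP-266B-A32B-imatrix.dat

# Quantize to Q4_K_M using the imatrix to guide weight selection
llama-quantize --imatrix GLM-4.6-REAP-266B-A32B-imatrix.dat \
    GLM-4.6-REAP-266B-A32B-f16.gguf \
    GLM-4.6-REAP-266B-A32B-Q4_K_M.gguf Q4_K_M
```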

Original Model Card for GLM-4.6-REAP

Note: currently non-functional because of the missing mtp.safetensors file and its entry in model.safetensors.index.json.

Forked from https://github.com/CerebrasResearch/reap to https://github.com/AesSedai/reap to hack in GLM-4.6 support.

Produced with:

```
bash experiments/pruning-cli.sh 0,1,2,3,4,5,6,7 zai-org/GLM-4.6 reap 42 0.25 theblackcat102/evol-codealpaca-v1 true true true false false
```