Angelino Santiago
A wild idea / suggestion...
I currently have a fully running 13B (GLM 4.7 Flash), which is very strong, and experimental 21Bs of Qwen 3.5.
These are trained and in testing; access is limited as of this writing.
As for MoEs:
This is a little more complicated, as scripting must be written for Mergekit to "MoE together" 0.8B, 2B, 4B, 9B models, etc.
A draft (by me) has been completed to do this, but it is not tested/debugged yet. No timeline here; too many variables.
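For context, here is a minimal sketch of the standard Mergekit MoE workflow (building a `mergekit-moe` config from Python and invoking the CLI), assuming all experts share the same architecture and hidden size. The model IDs and prompts below are placeholders, not real repositories; combining checkpoints of different sizes (0.8B/2B/4B/9B) as described above is not something stock `mergekit-moe` handles, which is presumably why custom scripting is needed.

```python
"""Sketch: generate a mergekit-moe config and run the merge.

Assumes same-architecture experts; all model IDs are hypothetical.
"""
import subprocess
import yaml  # pip install pyyaml

config = {
    "base_model": "example-org/base-9b",      # hypothetical base checkpoint
    "gate_mode": "hidden",                    # route by hidden-state similarity to the prompts below
    "dtype": "bfloat16",
    "experts": [
        {
            "source_model": "example-org/expert-code-9b",   # hypothetical expert
            "positive_prompts": ["write a python function", "debug this code"],
        },
        {
            "source_model": "example-org/expert-story-9b",  # hypothetical expert
            "positive_prompts": ["write a short story", "describe a scene"],
        },
    ],
}

# Write the YAML config that mergekit-moe expects.
with open("moe-config.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Standard mergekit-moe invocation: reads the config and writes the merged MoE model.
subprocess.run(["mergekit-moe", "moe-config.yml", "./merged-moe"], check=True)
```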
Re: 35B MoEs, it is possible to address this in a different way, but I have not tried it yet. This is a different approach from REAP.
I believe I saw that 13B model repository earlier, but I cannot see it anymore. Was it an upscaled dense model of Qwen 3.5 9B with further training? That could be pretty interesting. Did you remove it or hide it? I was really looking forward to trying that model, or finetunes based on it. Hopefully, there is still a chance for it to reach the public. 🙏
Good luck with these projects! 👍