Is anyone running via CPU + GPU + RPC GPU?
It's running very slow for me when I do it like this lol, any help?
Heya gopi! I'd have to see your full command to help out better and understand your setup, especially given you are using RPC.
RPC is not the most supported feature, and not likely to give best performance as you probably know.
Your best bet is trying to run on a single system with CPU+GPU(s).
Maybe a smaller quant would fit on a single rig? Otherwise, you'll have to play some games and do some research to figure out the best ordering of devices when using RPC. Also, last time I tried, RPC couldn't take advantage of the new -sm parallel option, but things move so quickly, who knows today haha..
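If you do go the RPC route, here's a rough sketch of the usual two-step setup (this is a hedged example, not your exact config: the worker address 192.168.1.2, the port, and model.gguf are placeholders you'd swap for your own, and both binaries need a build with the RPC backend enabled):

```shell
# On the remote machine that contributes its GPU: start the RPC worker
# (llama.cpp built with -DGGML_RPC=ON). It serves tensor ops over TCP.
./bin/rpc-server -p 50052

# On the main machine: point llama-server at the worker with --rpc.
# The RPC device is added to the device list, so layers get split
# across the local GPU(s) and the remote GPU per the usual -ngl logic.
./bin/llama-server \
    --model model.gguf \
    -ngl 99 \
    --rpc 192.168.1.2:50052
```

Multiple workers can be listed comma-separated after --rpc, which is where the device-ordering experiments come in.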
Happy new year!
Happy new year!
ubergarm, I have 256 GB of RAM, dual CPUs, a 12 GB RTX card, and an externally connected 3080 GPU with 16 GB VRAM.
Looks like I need to add another GPU to my server for a boost.
CUDA_VISIBLE_DEVICES="" ./bin/llama-server \
    --model "/home/gopi/deepresearch-ui/model/MiMo-V2-Flash-Q4_K_M-00001-of-00004.gguf" \
    --ctx-size 30000 \
    --threads 40 \
    --threads-batch 40 \
    --host 0.0.0.0 \
    --jinja \
    --port 8080 \
    --mlock \
    --no-mmap
CUDA_VISIBLE_DEVICES="0" ./bin/llama-server \
    --model "/home/gopi/deepresearch-ui/model/MiniMax-M2.1-MXFP4_MOE-00001-of-00007.gguf" \
    --ctx-size 20000 \
    -ngl 99 \
    --n-cpu-moe 63 \
    --threads 28 \
    --threads-batch 28 \
    --host 0.0.0.0 \
    --mlock \
    --no-mmap \
    --jinja \
    --port 8080
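Since that second command pins CUDA_VISIBLE_DEVICES to a single device, a variant that uses both cards could look like this. This is only a sketch under assumptions: that the 3080 enumerates as device 1, and that a 12:16 --tensor-split ratio roughly matches the two cards' VRAM; the right split needs tuning on the actual machine.

```shell
# Expose both GPUs and split layers across them by rough VRAM ratio
# (12 GB card : 16 GB card). --tensor-split (-ts) controls the split.
CUDA_VISIBLE_DEVICES="0,1" ./bin/llama-server \
    --model "/home/gopi/deepresearch-ui/model/MiniMax-M2.1-MXFP4_MOE-00001-of-00007.gguf" \
    --ctx-size 20000 \
    -ngl 99 \
    --n-cpu-moe 63 \
    --tensor-split 12,16 \
    --threads 28 \
    --threads-batch 28 \
    --host 0.0.0.0 \
    --mlock \
    --no-mmap \
    --jinja \
    --port 8080
```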
I am currently exploring these two models. I am also thinking of creating a website for people to share their model tricks and system specifications, so it would be helpful to everyone. What are your thoughts on this?