RichardForests's Collections: Transformers & MoE
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • arXiv:2312.07987 • Published • 41 upvotes
Interfacing Foundation Models' Embeddings
Paper • arXiv:2312.07532 • Published • 12 upvotes
Point Transformer V3: Simpler, Faster, Stronger
Paper • arXiv:2312.10035 • Published • 22 upvotes
TheBloke/quantum-v0.01-GPTQ
Text Generation • 7B • Updated • 3 • 2
Text Generation • 36B • Updated • 3 • 1
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ
Text Generation • Updated • 14 • 38
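As a minimal sketch (not part of the collection itself), a quantized checkpoint such as the GPTQ model listed above can be loaded with the Hugging Face transformers library; the model ID is taken from the entry above, while the installed backends (accelerate, optimum, auto-gptq), device placement, and prompt are assumptions.

# Minimal sketch, assuming transformers, accelerate, optimum, and auto-gptq
# are installed and a GPU is available; prompt and generation settings are
# illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/quantum-v0.01-GPTQ"  # model ID from the entry above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # let accelerate place the quantized weights
)

prompt = "Mixture-of-experts attention speeds up transformers by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))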
Denoising Vision Transformers
Paper • arXiv:2401.02957 • Published • 31 upvotes
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • arXiv:2401.06066 • Published • 59 upvotes
Buffer Overflow in Mixture of Experts
Paper • arXiv:2402.05526 • Published • 9 upvotes
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • arXiv:2405.08707 • Published • 34 upvotes