# MangoMAS-MoE-7M
A ~7 million parameter Mixture-of-Experts (MoE) neural routing model for multi-agent task orchestration.
## Model Architecture
```
Input (64-dim feature vector from featurize64())
                 │
            ┌────┴────┐
            │  GATE   │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
            └────┬────┘
                 │
┌─────────────────────────────────────────────────────┐
│           16 Expert Towers (parallel)               │
│   Each: Linear(64→512) → ReLU → Linear(512→512)     │
│                        → ReLU → Linear(512→256)     │
└─────────────────────────────────────────────────────┘
                 │
   Weighted Sum (gate_weights × expert_outputs)
                 │
     Classifier Head: Linear(256→N_classes)
                 │
           Output Logits
```
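The architecture above can be sketched directly in PyTorch. The class and argument names below are illustrative only, not the shipped `MixtureOfExperts7M` implementation:

```python
import torch
import torch.nn as nn

class MoESketch(nn.Module):
    """Minimal sketch of the routing model described above (names assumed)."""

    def __init__(self, in_dim=64, hidden=512, expert_out=256,
                 num_experts=16, num_classes=10):
        super().__init__()
        # Gate: Linear(64→512) → ReLU → Linear(512→16) → Softmax
        self.gate = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_experts), nn.Softmax(dim=-1),
        )
        # 16 parallel expert towers, each ending in a 256-dim output
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, expert_out),
            )
            for _ in range(num_experts)
        ])
        self.classifier = nn.Linear(expert_out, num_classes)

    def forward(self, x):
        gate_weights = self.gate(x)                                      # (B, 16)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, 16, 256)
        mixed = (gate_weights.unsqueeze(-1) * expert_outs).sum(dim=1)    # (B, 256)
        return self.classifier(mixed), gate_weights
```

Note this is a dense MoE: every expert runs on every input and the gate only weights their outputs, which keeps the forward pass simple at this parameter scale.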
### Parameter Count
| Component | Parameters |
|---|---|
| Gate Network | 64×512 + 512 + 512×16 + 16 = ~41K |
| 16 Expert Towers | 16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = ~6.84M |
| Classifier Head | 256×10 + 10 = ~2.6K |
| **Total** | **~6.88M** |
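The totals in the table can be checked with a few lines of arithmetic (each fully connected layer contributes weights plus biases):

```python
def linear_params(n_in: int, n_out: int) -> int:
    # Weight matrix plus bias vector of a fully connected layer
    return n_in * n_out + n_out

gate = linear_params(64, 512) + linear_params(512, 16)            # 41,488
expert = (linear_params(64, 512) + linear_params(512, 512)
          + linear_params(512, 256))                              # 427,264 per tower
experts = 16 * expert                                             # 6,836,224
head = linear_params(256, 10)                                     # 2,570

total = gate + experts + head
print(total)  # 6880282, i.e. ~6.88M
```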
## Input: 64-Dimensional Feature Vector
The model consumes a 64-dimensional feature vector produced by featurize64():
- Dims 0-31: Hash-based sinusoidal encoding (content fingerprint)
- Dims 32-47: Domain tag detection (code, security, architecture, etc.)
- Dims 48-55: Structural signals (length, punctuation, questions)
- Dims 56-59: Sentiment polarity estimates
- Dims 60-63: Novelty/complexity scores
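The layout above can be illustrated with a toy featurizer. This is a sketch under assumed details; the real `featurize64()` ships with the model, and the hash scheme, tag list, and signal formulas here are invented for illustration:

```python
import hashlib
import math

def featurize64_sketch(text: str) -> list[float]:
    """Toy 64-dim featurizer following the documented layout (details assumed)."""
    vec = [0.0] * 64
    # Dims 0-31: hash-based sinusoidal content fingerprint
    digest = int(hashlib.sha256(text.encode()).hexdigest(), 16)
    for i in range(32):
        vec[i] = math.sin((digest % 10_000) / (i + 1))
    # Dims 32-47: domain tag detection (hypothetical keyword list)
    tags = ["code", "security", "architecture", "api", "test", "data",
            "ui", "deploy", "auth", "db", "network", "ml", "doc",
            "perf", "bug", "design"]
    lowered = text.lower()
    for i, tag in enumerate(tags):
        vec[32 + i] = 1.0 if tag in lowered else 0.0
    # Dims 48-55: structural signals (length, punctuation, questions)
    vec[48] = min(len(text) / 512, 1.0)             # normalized length
    vec[49] = text.count("?") / max(len(text), 1)   # question density
    # Dims 56-59: sentiment; dims 60-63: novelty/complexity (left at 0 here)
    return vec
```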
## Training
- Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
- Updates: Online learning from routing feedback
- Minimum reward threshold: 0.1
- Device: CPU / MPS / CUDA (auto-detected)
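The online-feedback loop above might look like the following sketch. The update rule (a policy-gradient-style push toward rewarded experts) and the `reward_update` helper are assumptions for illustration; the shipped trainer may differ:

```python
import torch

def reward_update(model, optimizer, features, chosen_expert, reward,
                  min_reward=0.1):
    """Sketch of one online update from routing feedback (mechanism assumed)."""
    if reward < min_reward:
        return  # discard low-signal feedback, per the documented threshold
    logits, gate_weights = model(features)
    # Nudge the gate toward the expert that earned the reward
    loss = -reward * torch.log(gate_weights[0, chosen_expert] + 1e-8)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With the documented AdamW settings, the optimizer would be constructed as `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)`.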
## Usage
```python
import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create the model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")
```
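Beyond the single top expert, it can be useful to inspect the few highest-weighted experts. `top_k_experts` below is a hypothetical helper, not part of the model's API:

```python
import torch

def top_k_experts(gate_weights: torch.Tensor, k: int = 3):
    """Return (expert_id, weight) pairs for the k highest-weighted experts."""
    vals, idx = torch.topk(gate_weights, k=k, dim=-1)
    return list(zip(idx[0].tolist(), vals[0].tolist()))

# Stand-in gate output; in practice pass the gate_weights from model(x)
gw = torch.softmax(torch.randn(1, 16), dim=-1)
for expert_id, weight in top_k_experts(gw):
    print(f"expert {expert_id}: {weight:.3f}")
```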
## Intended Use
This model is part of the MangoMAS multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.
Primary use cases:
- Multi-agent task routing
- Expert selection for cognitive cell orchestration
- Research demonstration of MoE architectures
## Interactive Demo
Try the model live on the MangoMAS HuggingFace Space.
## Citation
```bibtex
@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Shanker, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}
```
## Author

Built by Ian Shanker · MangoMAS Engineering