MangoMAS-MoE-7M

A ~7 million parameter Mixture-of-Experts (MoE) neural routing model for multi-agent task orchestration.

Model Architecture

Input (64-dim feature vector from featurize64())
          │
    ┌─────┴─────┐
    │   GATE    │  Linear(64→512) → ReLU → Linear(512→16) → Softmax
    └─────┬─────┘
          │
    ╔═════════════════════════════════════════════════╗
    ║  16 Expert Towers (parallel)                    ║
    ║  Each: Linear(64→512) → ReLU → Linear(512→512)  ║
    ║        → ReLU → Linear(512→256)                 ║
    ╚═════════════════════════════════════════════════╝
          │
    Weighted Sum (gate_weights × expert_outputs)
          │
    Classifier Head: Linear(256→N_classes)
          │
       Output Logits
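The diagram above can be sketched directly in PyTorch. This is an illustrative re-implementation, not the actual MixtureOfExperts7M class from moe_model.py; the class and default argument names here are assumptions based on the card.

```python
import torch
import torch.nn as nn

class MoESketch(nn.Module):
    """Minimal sketch of the gated mixture-of-experts shown in the diagram."""

    def __init__(self, num_classes=10, num_experts=16,
                 dim=64, hidden=512, expert_out=256):
        super().__init__()
        # Gate: Linear(64->512) -> ReLU -> Linear(512->16) -> Softmax
        self.gate = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_experts), nn.Softmax(dim=-1),
        )
        # 16 parallel expert towers, each ending in a 256-dim output
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, expert_out),
            )
            for _ in range(num_experts)
        )
        self.head = nn.Linear(expert_out, num_classes)

    def forward(self, x):
        w = self.gate(x)                                      # (B, E)
        outs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, 256)
        mixed = (w.unsqueeze(-1) * outs).sum(dim=1)           # (B, 256)
        return self.head(mixed), w
```

Note that this dense formulation runs every expert on every input and mixes by the softmax gate weights; it does not use sparse top-k routing.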

Parameter Count

Component         Parameters
Gate Network      64×512 + 512 + 512×16 + 16 = 41,488 (~41K)
16 Expert Towers  16 × (64×512 + 512 + 512×512 + 512 + 512×256 + 256) = 6,836,224 (~6.84M)
Classifier Head   256×10 + 10 = 2,570 (~2.6K)
Total             6,880,282 (~6.88M)
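The arithmetic in the table (each dense layer contributing weights plus biases) can be reproduced in a few lines:

```python
# Parameter count for each Linear layer: weight matrix + bias vector.
def linear_params(fan_in: int, fan_out: int) -> int:
    return fan_in * fan_out + fan_out

gate = linear_params(64, 512) + linear_params(512, 16)      # 41,488
expert = (linear_params(64, 512)
          + linear_params(512, 512)
          + linear_params(512, 256))                        # 427,264 per tower
experts = 16 * expert                                       # 6,836,224
head = linear_params(256, 10)                               # 2,570

total = gate + experts + head
print(total)  # 6,880,282 (~6.88M)
```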

Input: 64-Dimensional Feature Vector

The model consumes a 64-dimensional feature vector produced by featurize64():

  • Dims 0-31: Hash-based sinusoidal encoding (content fingerprint)
  • Dims 32-47: Domain tag detection (code, security, architecture, etc.)
  • Dims 48-55: Structural signals (length, punctuation, questions)
  • Dims 56-59: Sentiment polarity estimates
  • Dims 60-63: Novelty/complexity scores
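The real featurize64() implementation is not shown on this card. As an illustration of the dimension layout only, a toy version might look like the following; the tag list, hash scheme, and normalization constants are all hypothetical.

```python
import hashlib
import math

# Hypothetical tag list; the actual detector in moe_model.py is not shown here.
DOMAIN_TAGS = ["code", "security", "architecture", "api"]

def featurize64_sketch(text: str) -> list[float]:
    """Toy 64-dim featurizer following the band layout described above."""
    vec = [0.0] * 64
    # Dims 0-31: hash-based sinusoidal content fingerprint
    h = int(hashlib.sha256(text.encode()).hexdigest(), 16)
    for i in range(32):
        vec[i] = math.sin((h % 10_000) / (i + 1))
    # Dims 32-47: domain tag detection (1.0 if the keyword appears)
    for i, tag in enumerate(DOMAIN_TAGS):
        vec[32 + i] = 1.0 if tag in text.lower() else 0.0
    # Dims 48-55: structural signals (length, punctuation, questions)
    vec[48] = min(len(text) / 512.0, 1.0)           # normalized length
    vec[49] = text.count("?") / max(len(text), 1)   # question density
    # Dims 56-59 (sentiment) and 60-63 (novelty/complexity) left at 0.0
    return vec
```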

Training

  • Optimizer: AdamW (lr=1e-4, weight_decay=0.01)
  • Updates: Online learning from routing feedback
  • Minimum reward threshold: 0.1
  • Device: CPU / MPS / CUDA (auto-detected)
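The online feedback loop described above could be sketched as follows. This is an assumption-laden illustration, not the card's actual training code: feedback_update and the reward-weighted cross-entropy loss are hypothetical, while the AdamW settings and the 0.1 reward floor come from the bullets above.

```python
import torch
import torch.nn.functional as F

MIN_REWARD = 0.1  # feedback below this floor is discarded

def feedback_update(model, optimizer, x, chosen_class, reward):
    """Apply one online update from routing feedback.

    Assumes model(x) returns (logits, gate_weights) as in the Usage example.
    Returns the loss value, or None when the reward is below the threshold.
    """
    if reward < MIN_REWARD:
        return None
    logits, _ = model(x)
    target = torch.tensor([chosen_class])
    # Reward-weighted cross-entropy: stronger feedback moves the model more.
    loss = reward * F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the documented optimizer settings, the loop would be driven by torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01) on whichever device auto-detection selects.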

Usage

import torch
from moe_model import MixtureOfExperts7M, featurize64

# Create model
model = MixtureOfExperts7M(num_classes=10, num_experts=16)

# Extract features
features = featurize64("Design a secure REST API with authentication")
x = torch.tensor([features], dtype=torch.float32)

# Forward pass
logits, gate_weights = model(x)
print(f"Expert weights: {gate_weights}")
print(f"Top expert: {gate_weights.argmax().item()}")

Intended Use

This model is part of the MangoMAS multi-agent orchestration platform. It routes incoming tasks to the most appropriate expert agents based on the task's semantic content.

Primary use cases:

  • Multi-agent task routing
  • Expert selection for cognitive cell orchestration
  • Research demonstration of MoE architectures

Interactive Demo

Try the model live on the MangoMAS HuggingFace Space.

Citation

@software{mangomas2026,
  title={MangoMAS: Multi-Agent Cognitive Architecture},
  author={Shanker, Ian},
  year={2026},
  url={https://github.com/ianshank/MangoMAS}
}

Author

Built by Ian Shanker — MangoMAS Engineering
