Research
Frontier reasoning doesn't require trillion-parameter models. It requires getting more reasoning out of every active parameter, then compressing the result to run anywhere.
Industry validation
A 30B model that activates just 3B parameters per token beats NVIDIA's own 120B model on code and math while running on a single RTX 4090.
Its post-training recipe, Cascade RL plus on-policy distillation, belongs to the same family of techniques that powers our pipeline. As frontier labs publish these techniques, we adopt them and compress further, targeting hardware where the cloud isn't an option.
AIME 2025
92.4
Math reasoning
LiveCodeBench v6
87.2
Code generation
IMO 2025
Gold
35 points · competition math
Activation ratio
3B / 30B
10% active per token
Directions
Transfer behavior from 70B+ teacher models into sub-3B students via on-policy distillation.
INT8/INT4 quantization-aware training for lower memory use and faster edge compute while preserving accuracy.
Remove non-critical connections to enforce sparsity and accelerate matrix multiplications.
Reward modeling and policy optimization to teach spatial and persona reasoning.
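
To make the quantization and sparsity directions above concrete, here is a minimal sketch in plain Python. It is illustrative only, not our pipeline. It assumes symmetric per-tensor quantization and magnitude-based pruning, which is one common way to decide which connections are "non-critical"; the function names `fake_quantize` and `magnitude_prune` and the sample weights are hypothetical. Only the forward-pass rounding step of quantization-aware training is shown; the gradient handling used during training is omitted.

```python
def fake_quantize(weights, bits=8):
    """Round weights onto a signed integer grid, then map them back to floats.

    Assumption: symmetric per-tensor scaling. Real QAT applies this in the
    forward pass and lets gradients flow through the rounding step.
    """
    qmax = 2 ** (bits - 1) - 1              # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                # nothing to scale
    scale = max_abs / qmax                  # float value of one integer step
    out = []
    for w in weights:
        q = round(w / scale)                # nearest integer level
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        out.append(q * scale)               # back to float for comparison
    return out


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Assumption: magnitude is used as the importance criterion.
    """
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -0.05, 0.31, -0.67, 0.02, 0.44, -0.09, 0.15]

w_int8 = fake_quantize(weights, bits=8)
w_int4 = fake_quantize(weights, bits=4)
w_sparse = magnitude_prune(weights, sparsity=0.5)

err8 = max(abs(a - b) for a, b in zip(weights, w_int8))
err4 = max(abs(a - b) for a, b in zip(weights, w_int4))
print(f"max INT8 rounding error: {err8:.4f}")
print(f"max INT4 rounding error: {err4:.4f}")
print("after 50% pruning:", w_sparse)
```

The rounding error grows as the bit width shrinks, which is why training with quantization in the loop, rather than quantizing afterwards, is the direction listed above.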