GRPO: Group Relative Policy Optimization, an RL algorithm that takes each sample's baseline from the average reward of a group of completions sampled for the same prompt rather than from a separate value network, reducing memory usage
toploc: A locality-sensitive hashing scheme used to verify that inference was actually performed by the specified model without re-running the full computation
shardcast: A custom library for distributing large files (model weights) via a tree-topology network to minimize bandwidth bottlenecks
vLLM: A high-throughput library for LLM inference and serving
FSDP: Fully Sharded Data Parallel, a technique that shards model parameters, gradients, and optimizer states across GPUs to reduce per-GPU memory
rollout: The process of generating a sequence of actions (tokens) from a policy (model) in an environment
locality-sensitive hashing: A hashing method where similar inputs produce similar hashes with high probability; used here to verify activation states
permissionless compute: Computing resources contributed by anyone (e.g., the public) without centralized authorization or pre-vetting
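
Example (GRPO baseline): the group-average baseline described in the GRPO entry can be sketched in a few lines. This is a minimal illustration, not a reference implementation. The function name `group_relative_advantages`, the `eps` parameter, and the sample rewards are hypothetical, and dividing by the group standard deviation is a common variant assumed here rather than something the definitions above specify.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Return one advantage per completion in a group sampled for the same prompt.

    The baseline is the group's mean reward, so no value network is needed.
    Scaling by the group's standard deviation is an assumed, optional variant.
    """
    baseline = mean(rewards)
    scale = pstdev(rewards) + eps  # eps avoids division by zero when all rewards match
    return [(r - baseline) / scale for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so the group itself plays the role a learned value estimate would otherwise play.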