Cold-Start: A scenario where an agent must perform a task without prior training data or expert demonstrations
Ascend C: A domain-specific language (DSL) for programming Huawei Ascend NPUs, similar to CUDA but with different memory hierarchies and APIs
Pass@k: A metric measuring the probability that at least one out of k generated code samples is correct
Monte-Carlo update: An RL update method that uses the total accumulated reward from a complete episode to update value estimates
Q-value: An estimate of the expected cumulative future reward of taking a specific action (here, retrieving a specific memory item) in a given state
M-MDP: Memory-based Markov Decision Process—an extension of MDPs where the state includes a dynamic external memory bank
EvoKernel: The proposed framework: a self-evolving agent that drafts and refines kernels using value-driven memory
KernelBench: A benchmark suite for evaluating LLMs on GPU/NPU kernel generation tasks
mHC kernels: Kernels implementing Manifold-Constrained Hyper-Connections (mHC), specialized operators used in DeepSeek architectures
epsilon-greedy: An exploration strategy where the agent chooses the best known action most of the time but selects a random action with probability epsilon to explore
PopArt: A normalization technique in RL (Preserving Outputs Precisely, while Adaptively Rescaling Targets) that rescales learning targets adaptively to handle varying reward scales
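
Three of the terms above (Pass@k, epsilon-greedy, Monte-Carlo update) are easier to pin down with a few lines of code. The sketch below is illustrative only: the function names, the dictionary of per-item Q-values, and the constant step size `lr` are assumptions made for this example, not details taken from EvoKernel. The Pass@k function uses the commonly cited unbiased estimator computed from n samples with c correct; whether the source uses this exact estimator is also an assumption.

```python
import math
import random


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    # 1 minus the probability that a random size-k subset is all incorrect.
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def epsilon_greedy(q_values: dict, epsilon: float, rng: random.Random) -> str:
    """Return a random key with probability epsilon, else the highest-Q key."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))  # explore
    return max(q_values, key=q_values.get)   # exploit


def monte_carlo_update(q_values: dict, item: str,
                       episode_return: float, lr: float = 0.1) -> None:
    """Move Q(item) toward the total reward observed over a complete episode."""
    q_values[item] += lr * (episode_return - q_values[item])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical memory items with their current value estimates.
    q = {"tiling_note": 0.2, "api_example": 0.6}
    chosen = epsilon_greedy(q, epsilon=0.1, rng=rng)
    monte_carlo_update(q, chosen, episode_return=1.0)
    print(chosen, q)
    print(pass_at_k(n=10, c=3, k=1))
```

In this sketch the "action" is the choice of a memory item to retrieve, which matches the Q-value entry above; a full agent would also condition the estimate on the current state.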