S-expression: A nested logical form structure used to represent queries (e.g., in GrailQA), convertible to SPARQL
MCTS: Monte Carlo Tree Search—a heuristic search algorithm that builds a search tree by repeatedly simulating outcomes to find optimal decisions
UCT: Upper Confidence Bound applied to Trees—a formula used in MCTS to balance exploring less-visited nodes and exploiting high-scoring nodes
ReAct: Reasoning and Acting, a paradigm in which LLMs interleave reasoning traces ('Thoughts') with executable actions ('Actions')
SimCSE: A contrastive learning framework for sentence embeddings, used here to match generated relation names with actual KB relations
Incremental fine-tuning: Iteratively updating the policy and reward models on data the system generated itself (self-training), so that both improve over successive rounds
GrailQA: A large-scale KBQA (knowledge base question answering) dataset, known for testing compositional generalization and for its difficulty
SFT: Supervised Fine-Tuning—training a pre-trained model on labeled examples
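
To make the UCT entry concrete, the following is a minimal sketch of the standard UCB1-style score and the child-selection step it drives in MCTS. It assumes the textbook formula (mean value plus an exploration bonus), which may differ from the exact variant used in the work this glossary belongs to. The function names, the `(name, total_value, visits)` tuple layout, and the default exploration constant `c` are all hypothetical choices made for illustration.

```python
import math


def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1-style UCT score (assumed form, not taken from the source).

    total_value:   sum of rewards backed up through this child
    visits:        number of times this child was visited
    parent_visits: number of times the parent was visited
    c:             exploration constant (hypothetical default)
    """
    if visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    exploit = total_value / visits                              # mean reward so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)   # bonus for rarely visited nodes
    return exploit + explore


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the name of the child with the highest UCT score.

    children: list of (name, total_value, visits) tuples (illustrative layout).
    """
    best = max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits, c))
    return best[0]
```

With `c = 0` the selection is purely greedy on mean reward; larger values of `c` shift the choice toward less-visited children, which is the exploration/exploitation balance described in the UCT entry.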