LLM: Large Language Model—a neural network trained on vast text data to generate human-like text.
Few-shot prompting: Providing a model with a few examples of a task (input-output pairs) in the prompt to guide its generation.
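For illustration, a minimal sketch of how such a prompt is assembled; the task, examples, and wording are hypothetical, not taken from the paper:

```python
# Build a few-shot prompt: worked input-output pairs, then the new query
# the model should complete in the same format.
examples = [
    ("cheese", "fromage"),
    ("apple", "pomme"),
]
query = "book"

prompt = "Translate English to French.\n\n"
for en, fr in examples:
    prompt += f"English: {en}\nFrench: {fr}\n\n"
prompt += f"English: {query}\nFrench:"  # model continues from here

print(prompt)
```

The trailing `French:` cue invites the model to continue the established pattern, which is what the examples are guiding.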
In-context learning: The ability of a model to learn a task from examples provided in the prompt without updating its weights.
Greedy decoding: A generation strategy where the model always chooses the most probable next token at each step.
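The strategy above can be sketched as an argmax loop; the toy scoring function below stands in for a real model's logits and is purely illustrative:

```python
import numpy as np

def greedy_decode(logits_fn, start_tokens, max_len=10, eos_id=0):
    """At each step, append the single most probable next token
    (argmax over the model's vocabulary scores)."""
    tokens = list(start_tokens)
    for _ in range(max_len):
        logits = logits_fn(tokens)        # scores over the vocabulary
        next_id = int(np.argmax(logits))  # always pick the top token
        tokens.append(next_id)
        if next_id == eos_id:             # stop at end-of-sequence
            break
    return tokens

# Toy "model" over a 5-token vocabulary: always prefers (last + 1) mod 5.
toy = lambda toks: np.eye(5)[(toks[-1] + 1) % 5]
print(greedy_decode(toy, [1], max_len=4))  # → [1, 2, 3, 4, 0]
```

Because it never explores alternatives, greedy decoding is deterministic but can miss higher-probability sequences that sampling or beam search would find.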
RLHF: Reinforcement Learning from Human Feedback—training a model using rewards derived from human preferences.
Vicuna-13B: An open-source chatbot model fine-tuned from LLaMA on user-shared conversations collected via ShareGPT.
CODEX: A version of GPT-3 fine-tuned on code (code-davinci-002).
GPT-3.5: Refers specifically to text-davinci-003 or gpt-3.5-turbo in this paper.
GSM8K: A benchmark dataset of grade school math word problems.
CodeNet: A large-scale dataset of code submissions to programming problems, used here for the code readability task.