PET: Pattern-Exploiting Training—a semi-supervised approach that reformulates classification as a cloze-style (fill-in-the-blank) language modeling task
PETAL: PET with Automatic Labels—a variant of PET that automatically finds the best verbalizer (token mapping) for labels, reducing manual engineering
Further pretraining: Continuing to train an already-pretrained language model on domain-specific unlabeled text (e.g., clinical letters) before fine-tuning it on the target task
Verbalizer: A mapping function in prompting that converts a class label (e.g., 'Positive') into a token in the model's vocabulary (e.g., 'good')
Shapley values: A game-theoretic method to attribute the contribution of each input feature (token) to the final model prediction
Cloze question: A test where a participant is asked to supply a missing word, used here as the prompt format (e.g., '... This is [MASK].')
Sequence Classifier (SC): A standard BERT-based classification approach adding a linear layer on top of the [CLS] token, used here as the baseline
gbert: A German-language BERT model pretrained on general domain text
medbertde: A German-language BERT model pretrained on medical and clinical text
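Several of the terms above (cloze question, verbalizer, PET-style classification) fit together in a single pipeline: wrap the input in a cloze pattern, query a masked language model for the token at the [MASK] position, and score each label by the probability of its verbalizer token. The sketch below illustrates that flow with hypothetical names and a stubbed probability distribution; a real implementation would query an actual MLM such as gbert or medbertde instead.

```python
# Minimal sketch of PET-style cloze classification.
# PATTERN, VERBALIZER, and the probabilities below are illustrative
# assumptions, not taken from the paper.

PATTERN = "{text} This is [MASK]."                     # cloze pattern
VERBALIZER = {"Positive": "good", "Negative": "bad"}   # label -> token

def mask_token_probs(prompt):
    # Stand-in for a masked LM's distribution over the [MASK] slot;
    # a real setup would run the prompt through an MLM here.
    return {"good": 0.7, "bad": 0.2, "fine": 0.1}

def classify(text):
    probs = mask_token_probs(PATTERN.format(text=text))
    # Score each label by the probability its verbalizer token
    # receives at the [MASK] position, then take the argmax.
    scores = {label: probs.get(token, 0.0)
              for label, token in VERBALIZER.items()}
    return max(scores, key=scores.get)

print(classify("The patient responded well to treatment."))  # Positive
```

The verbalizer is the only label-specific component: PETAL's contribution is to search for these label-to-token mappings automatically rather than choosing them by hand.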