FIZLE: Framework for Instructed Zero-shot Counterfactual Generation with LanguagE Models, the authors' proposed pipeline
Label Flip Score: The percentage of generated counterfactuals that successfully change the black-box classifier's prediction
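The metric above reduces to a simple comparison of predictions before and after perturbation. A minimal sketch (the function name and label lists are illustrative, not from the paper):

```python
def label_flip_score(orig_labels, cf_labels):
    """Percentage of counterfactuals whose predicted label differs from
    the black-box classifier's prediction on the original text.

    `orig_labels` / `cf_labels` are hypothetical lists of predictions on
    each original text and its generated counterfactual, respectively.
    """
    flips = sum(o != c for o, c in zip(orig_labels, cf_labels))
    return 100.0 * flips / len(orig_labels)

# Toy usage: 2 of 4 counterfactuals flip the prediction.
print(label_flip_score(["pos", "neg", "pos", "neg"],
                       ["neg", "neg", "neg", "neg"]))  # -> 50.0
```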
Levenshtein distance: A metric measuring the minimum number of single-character edits required to change one string into another
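The edit distance can be computed with the standard dynamic-programming recurrence; a compact sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))  # distances from a[:0] to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # -> 3
```

Lower values indicate that the counterfactual stays closer to the original text.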
Universal Sentence Encoder (USE): A model used to compute semantic similarity between the original text and the generated counterfactual in a latent embedding space
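In practice this similarity is the cosine between the two USE embeddings (real USE embeddings are 512-dimensional vectors obtained from the pretrained model on TensorFlow Hub; the short placeholder vectors below only stand in for them):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors; a score near 1.0
    indicates the counterfactual preserves the original meaning."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Placeholder vectors standing in for USE embeddings of an original
# text and its generated counterfactual.
orig_emb = [0.1, 0.8, 0.3]
cf_emb = [0.1, 0.7, 0.4]
print(cosine_similarity(orig_emb, cf_emb))
```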
Polyjuice: A baseline counterfactual generation method that uses a language model fine-tuned to condition on control codes specifying the type of perturbation
BAE: A baseline adversarial attack method that uses BERT-based masked language modeling to replace or insert tokens in the input text
CheckList: A baseline testing methodology using templates and masked language models for behavioral testing
Hard-prompting: Using fixed, discrete textual templates as prompts (as opposed to learnable soft prompts)
DistilBERT: A smaller, faster, cheaper, and lighter version of BERT used as the black-box classifier to be explained in the experiments