CrowS-Pairs: Crowdsourced Stereotype Pairs—a benchmark dataset of sentence pairs (one stereotypical, one anti-stereotypical) used to measure social bias in language models
WinoQueer: A benchmark dataset derived from the Winograd Schema format, specifically designed to test for anti-LGBTQ+ bias in language models
PLM: Pretrained Language Model; a model such as BERT or GPT that is trained on large text corpora before being adapted to downstream tasks
siya: The third-person singular pronoun in Filipino, which is inherently gender-neutral (it covers both "he" and "she")
bakla/bading: Filipino terms for male individuals with female identities/expressions; often encompasses gay, queer, nonbinary, or trans categories in Western parlance
tomboy: Filipino term often referring to non-heterosexual women, trans men, or butch lesbians
masked language model: A model, such as BERT, that is trained to predict missing (masked) words in a sentence using the context on both sides of the gap
causal language model: A model, such as GPT, that is trained to predict the next token in a sequence given only the tokens that precede it
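
The difference between the masked and causal objectives defined above can be illustrated with a small sketch. It is a toy illustration only, not how BERT or GPT are actually implemented: the function names are hypothetical, the sentence is split on whitespace instead of using a real tokenizer, and every position is masked in turn, whereas real masked-language-model training typically masks a random subset of tokens.

```python
# Toy illustration (hypothetical helper names; no real tokenizer or model involved).

def masked_lm_examples(tokens, mask_token="[MASK]"):
    """Build (input, target) pairs for a masked objective.

    One token is hidden at a time; the tokens on BOTH sides stay visible.
    """
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((masked, target))
    return examples


def causal_lm_examples(tokens):
    """Build (input, target) pairs for a causal objective.

    The input is only the prefix to the LEFT of the token being predicted.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# "Siya ay isang guro" ("He/She is a teacher"), split on whitespace for simplicity.
tokens = "siya ay isang guro".split()

for inp, target in masked_lm_examples(tokens):
    print("masked:", " ".join(inp), "->", target)

for inp, target in causal_lm_examples(tokens):
    print("causal:", " ".join(inp), "->", target)
```

In the masked case the model sees the rest of the sentence around the gap, while in the causal case it sees only the preceding tokens, so the two model families generally need different procedures for scoring a sentence pair.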