Context attributes: Long text fields describing general product info (e.g., Description, Title).
Feature attributes: Short text fields describing specific properties (e.g., Color, Size, Flavor).
Hierarchical attention: A mechanism that first attends locally within cells and then aggregates information across cells, avoiding full N^2 attention over the whole row.
Dilated attention: An attention pattern with gaps (strides) allowing a token to attend to distant tokens without processing every intermediate one, effectively increasing the receptive field.
[COL] token: A special token prepended to each attribute value to serve as the aggregate representation for that specific cell/attribute.
PR AUC: Area Under the Precision-Recall Curve, a metric suitable for imbalanced classification tasks like error detection where errors are rare.