EfficientViT: A Vision Transformer (ViT) variant that replaces quadratic-complexity Softmax attention with linear-complexity Softmax-free attention and uses BatchNorm and Hardswish for hardware efficiency.
Softmax-free Attention: An attention mechanism that replaces Softmax with ReLU and exploits the associativity of matrix multiplication, reducing complexity in the number of tokens from quadratic to linear.
MBConv: Mobile Inverted Bottleneck Convolution, a building block that combines pointwise (1×1) convolutions with a depthwise convolution.
PTQ: Post-Training Quantization, the conversion of a trained model to low-precision integers without full retraining.
Channel-wise Migration: A technique to shift the scaling burden from quantization-sensitive activation channels to weight channels in depthwise convolutions.
Log2 Quantization: A non-uniform quantization method that maps values to powers of two; used for the divisors in linear attention, which span a wide dynamic range and are most sensitive to error at small values.
DSP: Digital Signal Processor, here referring to the specialized hardware blocks on FPGAs used for high-speed arithmetic such as multiplication.
FPS: Frames Per Second, a throughput metric giving the number of input frames processed per second.
GELU: Gaussian Error Linear Unit, a non-linear activation function common in standard ViTs, often replaced by Hardswish in efficient models.
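
To make the Softmax-free Attention entry concrete, the following is a minimal pure-Python sketch of ReLU-based attention computed in two orders. It is an illustration under stated assumptions, not the model's actual implementation: the function names, the `eps` stabilizer, and the single-head list-of-lists layout are all hypothetical choices made here. The point it shows is that grouping the product as Q(KᵀV) instead of (QKᵀ)V gives the same result while never forming the N×N token-to-token matrix.

```python
def relu(x):
    """Element-wise ReLU on a matrix stored as a list of rows."""
    return [[max(0.0, v) for v in row] for row in x]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu_attention_quadratic(q, k, v, eps=1e-6):
    """Reference order: build the N x N similarity matrix first (quadratic in N)."""
    q, k = relu(q), relu(k)
    scores = matmul(q, transpose(k))   # N x N
    out = matmul(scores, v)            # N x d
    # Normalize each row by the sum of its similarities (replaces Softmax).
    return [[o / (sum(s_row) + eps) for o in o_row]
            for o_row, s_row in zip(out, scores)]


def relu_attention_linear(q, k, v, eps=1e-6):
    """Reordered: compute K^T V (d x d) first, so cost grows linearly in N."""
    q, k = relu(q), relu(k)
    kv = matmul(transpose(k), v)               # d x d, independent of N
    k_sum = [sum(col) for col in zip(*k)]      # column sums of K, length d
    out = matmul(q, kv)                        # N x d
    # Per-token divisor: q_i dotted with the summed keys.
    divisors = [sum(qi * ki for qi, ki in zip(row, k_sum)) + eps for row in q]
    return [[o / d for o in o_row] for o_row, d in zip(out, divisors)]


# Three tokens with two channels each (illustrative values only).
q = [[1.0, -2.0], [0.5, 3.0], [-1.0, 0.25]]
k = [[0.2, 1.0], [-0.3, 0.7], [2.0, -1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

a = relu_attention_quadratic(q, k, v)
b = relu_attention_linear(q, k, v)
print(all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

The `divisors` list in the linear version corresponds to the divisors mentioned under Log2 Quantization: each is a sum of non-negative terms with no bounded range, which is why a non-uniform quantizer is applied to them.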