SmolLM2-135M-Instruct
Regarding latency, the 4-bit AWQ config delivers the fastest inference at 62.2ms sync latency, a 2× speedup over the base model (129.6ms).
Regarding memory savings, 2-bit and 4-bit GPTQ configs reduce inference memory from 772MB to just 94MB, an 8.2× reduction, while staying close to baseline quality.
In terms of emissions, the AWQ 4-bit config also leads with the lowest CO₂ footprint at 0.000043, cutting emissions by ~55% compared to the base (0.000095).
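The headline numbers above are simple ratios of the quoted figures; a quick sanity check, using only the values stated in this post:

```python
# Figures quoted in the benchmark summary above
base_latency_ms = 129.6   # base model sync latency
awq4_latency_ms = 62.2    # 4-bit AWQ sync latency
base_mem_mb = 772         # base inference memory
gptq_mem_mb = 94          # 2-/4-bit GPTQ inference memory
base_co2 = 0.000095       # base CO2 footprint
awq4_co2 = 0.000043       # 4-bit AWQ CO2 footprint

speedup = base_latency_ms / awq4_latency_ms       # ~2.08x
mem_reduction = base_mem_mb / gptq_mem_mb         # ~8.2x
co2_cut = 1 - awq4_co2 / base_co2                 # ~0.55, i.e. ~55%

print(f"{speedup:.2f}x speedup, {mem_reduction:.1f}x memory reduction, "
      f"{co2_cut:.0%} CO2 cut")
# → 2.08x speedup, 8.2x memory reduction, 55% CO2 cut
```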

Try it on your setup:

```python
from pruna import SmashConfig  # requires the pruna package

smash_config = SmashConfig()
smash_config["quantizer"] = "quanto"
smash_config["device"] = "cpu"
```
Read the complete benchmark: https://www.pruna.ai/blog/smollm2-smaller-faster