Example: best combination for LLMs
Disclaimer: the "best" config always depends on your use case.

Currently, one of the recommended base `SmashConfig` setups for LLMs is:
```python
from pruna import SmashConfig

smash_config = SmashConfig(cache_dir_prefix="/efs/smash_cache")
smash_config.add_tokenizer(model_name)  # model_name: the Hugging Face ID of the model to smash

# 4-bit HQQ weight quantization, computing in bfloat16
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4
smash_config["hqq_compute_dtype"] = "torch.bfloat16"

# torch.compile with full-graph capture and dynamic shapes
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_fullgraph"] = True
smash_config["torch_compile_dynamic"] = True

# Skip saving-related preparation (private flag)
smash_config._prepare_saving = False
```
This config is currently used for Qwen3-32B on Replicate: https://replicate.com/prunaai/qwen3-32b
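For context, here is a minimal sketch of how such a config is applied end to end, assuming `pruna` and `transformers` are installed and `smash_config` is defined as above (the model ID mirrors the Replicate deployment; loading a 32B model requires substantial GPU memory):

```python
from pruna import smash
from transformers import AutoModelForCausalLM

# Load the base model (Qwen3-32B, as deployed on Replicate)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B")

# Apply the compression config defined above
smashed_model = smash(model=model, smash_config=smash_config)

# The smashed model keeps the usual transformers interface,
# so generation works as with the original model.
```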
Do not hesitate to contact us (support@pruna.ai) for assistance deploying this configuration in your environment.