Example: best combination for LLMs
Disclaimer: the "best" config always depends on your use case.

Currently, one of the recommended base `SmashConfig` setups for LLMs is:
```python
from pruna import SmashConfig

smash_config = SmashConfig(cache_dir_prefix="/efs/smash_cache")
smash_config.add_tokenizer(model_name)  # model_name: the Hugging Face ID of the model to smash

# 4-bit HQQ weight quantization, computing in bfloat16
smash_config["quantizer"] = "hqq"
smash_config["hqq_weight_bits"] = 4
smash_config["hqq_compute_dtype"] = "torch.bfloat16"

# torch.compile with full-graph capture and dynamic shapes
smash_config["compiler"] = "torch_compile"
smash_config["torch_compile_fullgraph"] = True
smash_config["torch_compile_dynamic"] = True

# Skip saving-related preparation (private flag)
smash_config._prepare_saving = False
```
This config is currently used for Qwen3-32B on Replicate: https://replicate.com/prunaai/qwen3-32b
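For context, here is a minimal sketch of how such a config is applied end to end, assuming `pruna` and `transformers` are installed and `smash_config` is defined as above (the model ID mirrors the Replicate deployment; loading a 32B model requires substantial GPU memory):

```python
from pruna import smash
from transformers import AutoModelForCausalLM

# Load the base model (Qwen3-32B, as deployed on Replicate)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B")

# Apply the compression config defined above
smashed_model = smash(model=model, smash_config=smash_config)

# The smashed model keeps the usual transformers interface,
# so generation works as with the original model.
```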
Do not hesitate to contact us (support@pruna.ai) for assistance deploying this configuration in your environment.