Llama 3.1-8B-Instruct
Llama-Juiced: The Fastest Llama 3.1-8B Endpoint (1.89× Faster than vLLM and Available on AWS)

In terms of cost efficiency, Llama 3.1-8B (meta-llama/Llama-3.1-8B-Instruct) deployed with Pruna + vLLM is up to 3.31× more cost-efficient than the vanilla base model, and up to 2.52× more cost-efficient than the same model deployed with vLLM alone.
In terms of inference speed, Llama 3.1-8B deployed with Pruna + vLLM is up to 2.6× faster than the vanilla base model, and up to 1.89× faster than the same model deployed with vLLM alone.
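To give a sense of how such speedup ratios are typically computed, here is a minimal sketch that measures generation throughput (output tokens per second) with vLLM and compares a baseline against an optimized deployment. The prompt set, `max_tokens`, and the `path/to/pruna-optimized-llama` checkpoint path are placeholders, not the actual benchmark setup, which is described in the linked post.

```python
# Minimal throughput-comparison sketch (assumptions: prompt set, max_tokens,
# and the optimized-model path are placeholders, not the benchmark's config).
import time

from vllm import LLM, SamplingParams


def tokens_per_second(model_name: str, prompts: list[str]) -> float:
    """Generate from each prompt and return total output tokens per second."""
    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=256)

    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start

    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    return generated / elapsed


prompts = ["Explain KV caching in one paragraph."] * 32

# In practice, run each measurement in its own process to avoid holding
# two models in GPU memory at once.
baseline_tps = tokens_per_second("meta-llama/Llama-3.1-8B-Instruct", prompts)
optimized_tps = tokens_per_second("path/to/pruna-optimized-llama", prompts)  # placeholder path

print(f"Speedup: {optimized_tps / baseline_tps:.2f}x")
```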
Read the full benchmark: https://www.pruna.ai/blog/llama-juiced