
Llama 3.1-8B-Instruct

Llama-Juiced: The Fastest Llama 3.1-8B Endpoint (1.89× Faster than vLLM and Available on AWS)

  • In terms of cost efficiency, Llama 3.1-8B (meta-llama/Llama-3.1-8B-Instruct) deployed with Pruna + vLLM is up to 3.31× more cost-efficient than the vanilla base model and up to 2.52× more cost-efficient than the same model deployed with vLLM alone.

  • In terms of inference speed, Llama 3.1-8B (meta-llama/Llama-3.1-8B-Instruct) deployed with Pruna + vLLM is up to 2.6× faster than the vanilla base model and up to 1.89× faster than the same model deployed with vLLM alone (a minimal sketch of the two setups follows below).

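For context, the comparison above contrasts two configurations: the unmodified checkpoint served with vLLM alone, and the same model first compressed ("smashed") with Pruna and then served with vLLM. The sketch below illustrates this under stated assumptions: the vLLM calls follow vLLM's public Python API, the Pruna step uses the pruna package's SmashConfig/smash interface, and the quantizer setting shown is an illustrative assumption rather than the configuration used in the benchmark (the linked post documents the actual settings).

    from transformers import AutoModelForCausalLM
    from vllm import LLM, SamplingParams
    from pruna import SmashConfig, smash

    # Baseline: the unmodified checkpoint served with vLLM alone.
    baseline = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=128)
    print(baseline.generate(["Summarize Llama 3.1 in one sentence."], params)[0].outputs[0].text)

    # Pruna step: compress ("smash") the model before serving.
    # The quantizer choice below is an assumed example, not the benchmark's setting.
    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
    smash_config = SmashConfig()
    smash_config["quantizer"] = "hqq"  # assumed example setting
    smashed_model = smash(model=base_model, smash_config=smash_config)

    # The compressed model is then deployed behind a vLLM endpoint, which is the
    # "Pruna + vLLM" configuration benchmarked against vLLM alone above.

The efficiency and speed figures above compare these two endpoints against each other and against the vanilla base model; the full methodology is described in the benchmark post linked below.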
Read the full benchmark: https://www.pruna.ai/blog/llama-juiced