TensorRT-LLM evaluation of the new H200 GPU achieves 11,819 tokens/s on Llama2-13B

github.com

Posted by noneabove1182@sh.itjust.works to LocalLLaMA@sh.itjust.works · English · 2 years ago

H200 is up to 1.9x faster than H100. This performance is enabled by H200’s larger, faster HBM3e memory.

https://nvidianews.nvidia.com/news/nvidia-supercharges-hopper-the-worlds-leading-ai-computing-platform
