Wednesday, June 12, 2024

MLPerf Training Results Showcase Unprecedented Performance, Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks.

NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale (more than triple the 3,584 H100 GPUs submitted a year ago) and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60 per million tokens, with an HGX H200 server throughput of 24,000 tokens/second.
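The arithmetic behind that return can be sketched in a few lines. The throughput and price are the figures stated above; the implied per-server cost is simply derived from the article's $1-to-$7 claim and is a back-of-the-envelope estimate assuming full, sustained utilization at a constant token price:

```python
# Back-of-the-envelope sketch of the throughput-to-revenue arithmetic.
# Throughput (24,000 tokens/s) and price ($0.60 per million tokens) are the
# figures stated in the article; the 4-year horizon and the 7x return are the
# article's claim, from which we derive the implied total cost per server.

TOKENS_PER_SECOND = 24_000            # HGX H200 serving Llama 3 70B (stated)
PRICE_PER_TOKEN = 0.60 / 1_000_000    # $0.60 per million tokens (stated)
YEARS = 4
SECONDS = YEARS * 365 * 24 * 3600     # ignores leap days for simplicity

revenue = TOKENS_PER_SECOND * PRICE_PER_TOKEN * SECONDS
implied_cost = revenue / 7            # cost consistent with a $1 -> $7 return

print(f"4-year token revenue per server: ${revenue:,.0f}")
print(f"Implied total cost per server for a 7x return: ${implied_cost:,.0f}")
```

Under these assumptions a single server earns roughly $1.8 million in token revenue over four years; the implied cost figure is illustrative only, since real deployments see variable utilization and pricing.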

NVIDIA H200 GPU Supercharges Generative AI and HPC

The NVIDIA H200 Tensor Core GPU builds upon the strength of the Hopper architecture, with 141GB of HBM3e memory and over 40% more memory bandwidth compared to the H100 GPU. Pushing the boundaries of what's possible in AI training, the NVIDIA H200 Tensor Core GPU extended the H100's performance by up to 47% in its MLPerf Training debut.

NVIDIA Software Drives Unmatched Performance Gains

In addition, our submissions using a 512 H100 GPU configuration are now up to 27% faster compared to just one year ago, thanks to numerous optimizations to the NVIDIA software stack. This improvement highlights how continuous software enhancements can significantly boost performance, even with the same hardware.

This work also delivered nearly perfect scaling. As the number of GPUs increased by 3.2x, going from 3,584 H100 GPUs last year to 11,616 H100 GPUs with this submission, so did the delivered performance.
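Scaling efficiency here is just delivered speedup divided by the ideal (linear) speedup. The GPU counts below are from the article; the 3.2x performance ratio is a hypothetical value standing in for the article's "more than tripled" claim:

```python
# Scaling-efficiency sketch for the GPT-3 175B submissions described above.
# GPU counts are from the article; the performance ratio is hypothetical,
# illustrating the "nearly perfect scaling" claim.

def scaling_efficiency(gpus_before: int, gpus_after: int, perf_ratio: float) -> float:
    """Delivered speedup divided by the ideal (linear) speedup."""
    ideal_speedup = gpus_after / gpus_before
    return perf_ratio / ideal_speedup

gpu_ratio = 11_616 / 3_584   # ~3.24x more GPUs than last year
print(f"GPU scale-up: {gpu_ratio:.2f}x")

# If delivered performance also grew ~3.2x (hypothetical), efficiency is
# close to 1.0, i.e. near-linear scaling.
print(f"Efficiency at a 3.2x speedup: {scaling_efficiency(3_584, 11_616, 3.2):.2f}")
```

An efficiency near 1.0 means each added GPU contributed almost its full share of throughput, which is what "nearly perfect scaling" describes.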

Learn more about these optimizations on the NVIDIA Technical Blog.

Excelling at LLM Fine-Tuning

As enterprises seek to customize pretrained large language models, LLM fine-tuning is becoming a key industry workload. MLPerf introduced a new LLM fine-tuning benchmark this round, based on the popular low-rank adaptation (LoRA) technique applied to Meta Llama 2 70B.
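The core idea of LoRA is small enough to sketch: instead of updating a full pretrained weight matrix, training touches only two small matrices whose product forms a low-rank update. The layer size and rank below are illustrative, not the benchmark's configuration:

```python
# Minimal sketch of the low-rank adaptation (LoRA) idea behind the new
# fine-tuning benchmark. The pretrained weight W stays frozen; only the
# small matrices A and B are trained, and their product B @ A acts as a
# low-rank additive update. Shapes and rank are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 1024, 1024, 8  # hypothetical layer size and LoRA rank

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Adapted layer: y = W x + B (A x); gradients flow only to A and B.
    return W @ x + B @ (A @ x)

# Trainable parameters shrink from d_out*d_in to rank*(d_in + d_out).
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"Full fine-tune params: {full_params:,}; LoRA params: {lora_params:,}")
```

With these example shapes, LoRA trains about 16 thousand parameters instead of about a million for this one layer, which is why fine-tuning a 70B-parameter model this way is tractable at benchmark scale.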

The NVIDIA platform excelled at this task, scaling from eight to 1,024 GPUs, with the largest-scale NVIDIA submission completing the benchmark in a record 1.5 minutes.

Accelerating Stable Diffusion and GNN Training

NVIDIA also accelerated Stable Diffusion v2 training performance by up to 80% at the same system scales submitted last round. These advances reflect numerous enhancements to the NVIDIA software stack, showcasing how software and hardware improvements go hand-in-hand to deliver top-tier performance.

On the new graph neural network (GNN) test based on R-GAT, the NVIDIA platform with H100 GPUs excelled at both small and large scales. The H200 delivered a 47% boost on single-node GNN training compared to the H100. This showcases the powerful performance and high efficiency of NVIDIA GPUs, which make them ideal for a wide range of AI applications.

Broad Ecosystem Support

Reflecting the breadth of the NVIDIA AI ecosystem, 10 NVIDIA partners submitted results, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo, Oracle, Quanta Cloud Technology, Supermicro and Sustainable Metal Cloud. This broad participation, and their own impressive benchmark results, underscores the widespread adoption of and trust in NVIDIA's AI platform across the industry.

MLCommons' ongoing work to bring benchmarking best practices to AI computing is vital. By enabling peer-reviewed comparisons of AI and HPC platforms, and keeping pace with the rapid changes that characterize AI computing, MLCommons provides companies everywhere with crucial data that can help guide important purchasing decisions.

And with the NVIDIA Blackwell platform, next-level AI performance on trillion-parameter generative AI models for both training and inference is coming soon.
