Microsoft is deploying Nvidia’s new A100 Ampere GPUs across its data centers, and giving customers a massive AI processing boost in the process. Get it? In the– (a sharply hooked cane enters, stage right)
Ahem. As I was saying. The ND A100 v4 VM family starts with a single VM and eight A100 GPUs, but it can scale up to thousands of GPUs with 1.6Tb/s of interconnect bandwidth per VM. Each GPU gets its own dedicated 200Gb/s HDR InfiniBand link (eight GPUs x 200Gb/s = 1.6Tb/s per VM), and Microsoft claims to offer dedicated GPU bandwidth 16x higher than the next cloud competitor. The emphasis on bandwidth matters because total available bandwidth often constrains AI model size and complexity.
Nvidia isn’t the only company with a new feather in its cap. Microsoft also notes that it built the new platform on AMD Epyc (Rome) CPUs, with PCIe 4.0 support and third-generation NVLink. According to Microsoft, these advances should deliver an immediate 2x to 3x improvement in AI performance with no engineering work or model tuning. Customers who leverage new Ampere features like sparsity acceleration and Multi-Instance GPU (MIG) can improve performance by as much as 20x. According to Nvidia’s Ampere whitepaper, MIG improves GPU utilization in VM environments by partitioning a single A100 into as many as seven isolated GPU instances at no additional cost.
MIG is primarily aimed at cloud service providers, so it’s not clear how much of that benefit Microsoft’s customers would see directly. But Nvidia does write that its sparsity feature “can accelerate FP32 input/output data in DL frameworks in HPC, running 10x faster than V100 [Volta] FP32 FMA operations or 20x faster with sparsity.” Nvidia also states that a number of specific operations run 2x to 5x faster than on Volta under certain conditions, and the company has called the A100 the greatest generational leap in its history.
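Ampere’s sparsity acceleration works on a fixed 2:4 pattern: in every group of four consecutive weights, two must be zero, which lets the sparse tensor cores skip half the math. As a rough illustration of what that pruning step looks like (this is our own sketch, not Nvidia’s tooling, and the helper name is hypothetical):

```python
def prune_2_to_4(weights):
    """Zero the 2 smallest-magnitude values in each group of 4
    consecutive weights -- the 2:4 structured-sparsity pattern
    Ampere's sparse tensor cores accelerate."""
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Keep the two largest-magnitude entries; zero the rest.
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]),
                      reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0
                      for j, v in enumerate(group))
    return pruned

print(prune_2_to_4([0.9, -0.1, 0.05, -1.2, 0.3, 0.02, -0.7, 0.4]))
# -> [0.9, 0.0, 0.0, -1.2, 0.0, 0.0, -0.7, 0.4]
```

In practice a network pruned this way is typically fine-tuned afterward to recover accuracy; the hardware win is that the surviving half of the weights can be stored compactly and multiplied at roughly double throughput.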
Microsoft states that the ND A100 v4 series is now in preview, and that it is expected to become a standard offering in the Azure portfolio.
Ampere’s generational improvements over Volta are important to the overall effort to scale up AI networks. AI processing is not cheap, and the tremendous scale Microsoft describes requires correspondingly tremendous amounts of power. The question of how to improve AI power efficiency is a… hot topic.
Later this year, AMD will launch the first GPU in its CDNA family. CDNA is the compute-tuned counterpart to RDNA, and if AMD is going to challenge Ampere in the AI, machine learning, or HPC markets, we’d expect the upcoming architecture to lead that effort. For now, Nvidia’s Ampere continues to own the vast majority of GPU deployments in the AI/ML space.