NVLink and NVSwitch

                          The Building Blocks of Advanced Multi-GPU Communication

                          How NVLink and NVSwitch Work Together

                          NVLink

                          Tesla V100 with NVLink GPU-to-GPU connections

                          NVSwitch

                          All-to-all communication between 16 GPUs

                          NVLink Maximizes System Throughput

                          NVIDIA NVLink technology addresses interconnect issues by providing higher bandwidth, more links, and improved scalability for multi-GPU system configurations. A single NVIDIA Tesla V100 GPU supports up to six NVLink connections for a total bandwidth of 300 gigabytes per second (GB/sec)—10X the bandwidth of PCIe Gen 3. Servers like the NVIDIA DGX-1 and DGX-2 take advantage of this technology to give you greater scalability for ultrafast deep learning training.
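
                          As a concrete illustration of how applications tap this GPU-to-GPU bandwidth, the following minimal CUDA sketch checks whether two GPUs can access each other's memory, enables peer-to-peer access in both directions, and copies a buffer directly between them. On NVLink-connected GPUs such as those in a DGX-1, these peer copies travel over NVLink rather than PCIe. The device indices and buffer size are illustrative assumptions, not details from this page.

                          // p2p_copy.cu -- minimal sketch of a direct GPU-to-GPU copy (illustrative only).
                          #include <cstdio>
                          #include <cuda_runtime.h>

                          int main() {
                              const int dev0 = 0, dev1 = 1;       // hypothetical device indices
                              const size_t bytes = 256 << 20;     // 256 MiB test buffer (arbitrary size)

                              // Check that each GPU can address the other's memory directly.
                              int canAccess01 = 0, canAccess10 = 0;
                              cudaDeviceCanAccessPeer(&canAccess01, dev0, dev1);
                              cudaDeviceCanAccessPeer(&canAccess10, dev1, dev0);
                              if (!canAccess01 || !canAccess10) {
                                  printf("Peer access not available between GPU %d and GPU %d\n", dev0, dev1);
                                  return 0;
                              }

                              // Enable peer access in both directions.
                              cudaSetDevice(dev0);
                              cudaDeviceEnablePeerAccess(dev1, 0);
                              cudaSetDevice(dev1);
                              cudaDeviceEnablePeerAccess(dev0, 0);

                              // Allocate a buffer on each GPU.
                              void *buf0 = nullptr, *buf1 = nullptr;
                              cudaSetDevice(dev0);
                              cudaMalloc(&buf0, bytes);
                              cudaSetDevice(dev1);
                              cudaMalloc(&buf1, bytes);

                              // Copy directly GPU-to-GPU; on NVLink-connected pairs this uses NVLink.
                              cudaMemcpyPeer(buf1, dev1, buf0, dev0, bytes);
                              cudaDeviceSynchronize();

                              cudaFree(buf1);
                              cudaSetDevice(dev0);
                              cudaFree(buf0);
                              printf("Copied %zu bytes from GPU %d to GPU %d\n", bytes, dev0, dev1);
                              return 0;
                          }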

                          NVIDIA NVLink Performance Since 2014

                          Highest Levels of GPU-to-GPU Acceleration

                          First introduced with the NVIDIA Pascal architecture, NVLink on Tesla V100 increases the signaling rate from 20 to 25 GB/s per link in each direction. This direct communication link between two GPUs improves both accuracy and convergence for high-performance computing (HPC) and AI workloads, and achieves speeds over an order of magnitude faster than PCIe.
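
                          The two bandwidth figures above are consistent; combining the per-link rate with the six links available on a single Tesla V100 gives the quoted total:

                          \[ 6 \text{ links} \times 25 \text{ GB/s per direction} \times 2 \text{ directions} = 300 \text{ GB/s} \]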

                          NVLink Connecting Eight Tesla V100 Accelerators in a Hybrid Cube Mesh Topology as Used in the DGX-1V Server

                          New Levels of Performance

                          NVLink can bring up to 70 percent more performance to an otherwise identically configured server. Its dramatically higher bandwidth and reduced latency enable even larger deep learning workloads to scale in performance as they grow.

                          NVLink Delivers Up To 70% Speedup vs PCIe

                          NVLink: GPU servers: Dual Xeon Gold 6140@2.30GHz or E5-2698 v4@3.6GHz for PyTorch, with 8X V100 PCIe vs. 8X V100 NVLink. SW benchmarks: MILC (APEX medium), HOOMD-Blue (microsphere), LAMMPS (LJ 2.5).

                          NVSwitch

                          NVSwitch: The Fully Connected NVLink

                          The rapid adoption of deep learning has driven the need for a faster, more scalable interconnect, as PCIe bandwidth often creates a bottleneck at the multi-GPU system level.

                          NVIDIA NVSwitch builds on the advanced communication capability of NVLink to solve this problem. It takes deep learning performance to the next level with a GPU fabric that enables more GPUs in a single server and full-bandwidth connectivity between them.
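
                          To make the idea of a full-bandwidth GPU fabric concrete, here is a minimal sketch using NCCL, NVIDIA's collective communication library, to run an all-reduce across every GPU in a single node; NCCL routes the traffic over NVLink and NVSwitch when they are present. The single-process layout and buffer size are illustrative assumptions rather than details from this page.

                          // allreduce_sketch.cc -- single-process all-reduce across all visible GPUs (illustrative).
                          #include <cstdio>
                          #include <vector>
                          #include <cuda_runtime.h>
                          #include <nccl.h>

                          int main() {
                              int nDev = 0;
                              cudaGetDeviceCount(&nDev);              // e.g. 8 on DGX-1, 16 on DGX-2

                              const size_t count = 64 << 20;          // 64M floats per GPU (arbitrary size)
                              std::vector<int> devs(nDev);
                              std::vector<float*> sendbuf(nDev), recvbuf(nDev);
                              std::vector<cudaStream_t> streams(nDev);
                              std::vector<ncclComm_t> comms(nDev);

                              // Allocate one send/recv buffer and one stream per GPU.
                              for (int i = 0; i < nDev; ++i) {
                                  devs[i] = i;
                                  cudaSetDevice(i);
                                  cudaMalloc(reinterpret_cast<void**>(&sendbuf[i]), count * sizeof(float));
                                  cudaMalloc(reinterpret_cast<void**>(&recvbuf[i]), count * sizeof(float));
                                  cudaMemset(sendbuf[i], 1, count * sizeof(float));  // fill with a byte pattern
                                  cudaStreamCreate(&streams[i]);
                              }

                              // One NCCL communicator per GPU, all owned by this process.
                              ncclCommInitAll(comms.data(), nDev, devs.data());

                              // Sum-reduce the buffers across every GPU; NCCL picks NVLink/NVSwitch paths when available.
                              ncclGroupStart();
                              for (int i = 0; i < nDev; ++i) {
                                  ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                                                comms[i], streams[i]);
                              }
                              ncclGroupEnd();

                              for (int i = 0; i < nDev; ++i) {
                                  cudaSetDevice(i);
                                  cudaStreamSynchronize(streams[i]);
                              }

                              // Clean up communicators, buffers, and streams.
                              for (int i = 0; i < nDev; ++i) {
                                  ncclCommDestroy(comms[i]);
                                  cudaSetDevice(i);
                                  cudaFree(sendbuf[i]);
                                  cudaFree(recvbuf[i]);
                                  cudaStreamDestroy(streams[i]);
                              }
                              printf("All-reduce of %zu floats completed across %d GPUs\n", count, nDev);
                              return 0;
                          }

                          Built with nvcc and linked against NCCL, this pattern runs unchanged on any multi-GPU node; across nodes the same collective is expressed with ncclCommInitRank, one rank per GPU.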

                          Full Connection for Unparalleled Performance

                          NVSwitch is the first on-node switch architecture to support 16 fully connected GPUs in a single server node and drive simultaneous communication between all eight GPU pairs at an incredible 300 GB/s each. These 16 GPUs can be used as a single large-scale accelerator with 0.5 terabytes of unified memory space and 2 petaFLOPS of deep learning compute power. A single HGX-2 or DGX-2 system with NVSwitch delivers up to 2.7X more application performance than two HGX-1 or DGX-1 systems connected with InfiniBand.
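
                          Those headline figures follow from the per-GPU specifications, assuming the 32 GB Tesla V100 and its 125 teraFLOPS Tensor Core peak used in these systems:

                          \[ 16 \text{ GPUs} \times 32 \text{ GB per GPU} = 512 \text{ GB} = 0.5 \text{ TB of unified memory} \]
                          \[ 16 \text{ GPUs} \times 125 \text{ teraFLOPS per GPU} = 2{,}000 \text{ teraFLOPS} = 2 \text{ petaFLOPS} \]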

                          NVSwitch Delivers a >2X Speedup for Deep Learning and HPC

                          2X HGX-1V servers, each with dual-socket Xeon E5-2698 v4 processors and 8X V100 GPUs, connected via 4X 100 Gb InfiniBand ports (run on DGX-1) | HGX-2 server with dual-socket Xeon Platinum 8168 processors, 16X V100 GPUs, and NVSwitch (run on DGX-2).

                          NVIDIA HGX-2

                          Explore the world’s most powerful accelerated server platform for deep learning, machine learning, and HPC.