Global technology portfolio managers Alison Porter, Graeme Clark and Richard Clode discuss the recent launch of Ampere, NVIDIA’s latest graphics accelerator, which looks set to be a game changer for the development of cloud computing, artificial intelligence and gaming graphics.

  Key takeaways

  • Ampere plays a key role in NVIDIA’s vision to move computing activities from the server to the data centre level, significantly enhancing efficiency.
  • NVIDIA sees Ampere as a universal cloud workload accelerator that can efficiently perform both AI training and inferencing in one system.
  • Ampere will enable huge improvements in ray tracing (life-like graphics), including the use of an AI engine to enhance performance.

NVIDIA’s CEO was forced to deliver a ‘kitchen keynote’ to formally launch Ampere, the company’s new graphics processing unit (GPU), yet another first in the ‘new normal’ of the COVID-19 world. That did not detract from what is one of the most important semiconductor launches in recent years, one with significant implications for artificial intelligence (AI), cloud computing, gaming and Moore’s Law.

Ampere comes more than two years after NVIDIA launched Volta in late 2017. Volta’s performance leap and optimisation for AI were a game changer for the adoption of graphics processors to accelerate workloads in the cloud and for AI training. NVIDIA became the de facto standard for AI training thanks to its superior hardware combined with a decade’s worth of investment in its proprietary software stack, CUDA. The launch of Volta led to a key inflection in AI: since then, NVIDIA’s data centre revenues have doubled in two years to more than US$1bn a quarter.

Compute (computing activities) and AI are inextricably linked: each drives demand for the other. The inflection in AI we have seen in recent years was enabled by compute performance reaching a level and a cost that made neural networks and deep learning viable. However, newer neural networks, such as the BERT models used for natural language processing that enable Amazon Alexa and Google Home, are significantly larger and more complex than prior models. Meeting those requirements and unlocking the next wave of AI innovation calls for a next generation processor.

[Image: semiconductor chip. Source: Getty Images.]

What’s inside Ampere

Ampere is the giant stride forward that could potentially drive yet another inflection in AI. Utilising Moore’s Law to increase the density of transistors, Ampere packs 54 billion transistors onto a chip roughly the same size as Volta, which had only 21 billion. However, indicative of the challenges of Moore’s Law, NVIDIA is utilising Taiwan Semiconductor Manufacturing Co’s (TSMC) 7nm manufacturing process and not the leading edge 5nm process that Apple is using this year for the new iPhone. Moore’s Stress is forcing semiconductor companies to turn to architectural improvements to keep delivering the performance upgrades customers demand. For Ampere, NVIDIA is utilising TSMC’s CoWoS packaging technology to better integrate next generation high bandwidth memory, as well as InfiniBand fabric from recently acquired Mellanox. In both cases these higher speed interfaces reduce the bottlenecks of moving large data sets between processors or from memory to the processors.
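
To put the transistor counts in context, the short sketch below compares the quoted figures with a simple Moore’s Law projection. The 2.5-year gap between the two chips and the two-year doubling period are assumptions made for illustration, and the function name is hypothetical; only the 21 billion and 54 billion transistor counts come from the text above.

```python
def moores_law_projection(start_count: float, years: float,
                          doubling_period: float = 2.0) -> float:
    """Project a transistor count forward, assuming it doubles every `doubling_period` years."""
    return start_count * 2 ** (years / doubling_period)


VOLTA_TRANSISTORS = 21e9   # from the article
AMPERE_TRANSISTORS = 54e9  # from the article
YEARS_BETWEEN = 2.5        # assumption: approximate gap between the two launches

actual_ratio = AMPERE_TRANSISTORS / VOLTA_TRANSISTORS
projected = moores_law_projection(VOLTA_TRANSISTORS, YEARS_BETWEEN)

print(f"Actual increase: {actual_ratio:.2f}x")
print(f"Moore's Law projection: {projected / 1e9:.1f} billion transistors")
```

Changing YEARS_BETWEEN or the doubling period shows how sensitive the comparison is to those assumptions.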

Moving compute from server to data centre

The major new feature of Ampere, and one with significant implications, is the ability of this GPU to be virtualised into up to seven separate instances. This is a key enabler of NVIDIA’s vision to move compute from the server to the data centre level. In the same way that VMware virtualised servers with its software, NVIDIA envisages a world where virtualised hardware and software enable a hyperscaler (a company such as Google, Facebook or Amazon that can achieve huge scale in computing, typically for big data or cloud computing) to run any workload anywhere in its data centre to maximise efficiency.

Customising servers for a specific workload will be a thing of the past. NVIDIA sees Ampere as a universal cloud workload accelerator. As part of that, Ampere performs both AI training and inferencing in one system. In the past, the heavy performance requirements of AI training with incredibly complex neural networks called for a different processor from the one used for the more lightweight task of AI inferencing, where the output of that AI training model is applied to the real world. Ampere can perform both functions efficiently for the first time because the inferencing is virtualised, bringing the equivalent performance of a Volta chip to up to 56 users, all within one Ampere system.
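
The ‘56 users’ figure can be reproduced with simple arithmetic if one assumes a system that houses eight Ampere GPUs, each split seven ways. The eight-GPU system size is an assumption not stated above, and the class and function names below are hypothetical; this is a toy capacity model, not NVIDIA’s software interface.

```python
from dataclasses import dataclass

MAX_INSTANCES_PER_GPU = 7  # from the article: up to seven per GPU


@dataclass
class Gpu:
    """A toy model of one GPU that is either whole (training) or split (inference)."""
    instances: int = 1  # 1 means the GPU is used as a single, undivided device

    def split(self, instances: int) -> None:
        if not 1 <= instances <= MAX_INSTANCES_PER_GPU:
            raise ValueError(f"instances must be between 1 and {MAX_INSTANCES_PER_GPU}")
        self.instances = instances


def inference_slots(gpus: list[Gpu]) -> int:
    """Count the independent slots available across the given GPUs."""
    return sum(gpu.instances for gpu in gpus)


# Assumption: eight GPUs per system, all split seven ways for inference.
system = [Gpu() for _ in range(8)]
for gpu in system:
    gpu.split(MAX_INSTANCES_PER_GPU)
print(inference_slots(system))  # 8 * 7 = 56

# The same hardware can be regrouped: four whole GPUs for training,
# while the other four stay split for inference.
for gpu in system[:4]:
    gpu.split(1)
print(inference_slots(system[4:]))  # 4 * 7 = 28
```

The second half of the sketch reflects the point that one pool of hardware can be shared between training and inferencing rather than being customised for a single workload.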

Slashing the cost of compute

Moore’s Law has been a key driver of technology share gains for decades. The ability to provide twice the compute for the same cost is the exponential curve at the heart of all technological innovation. NVIDIA’s Ampere is the next major iteration of that principle, and delivers much more than that. Ampere offers up to 20 times superior performance for workloads, as Moore’s Law is married with wider hardware innovation and with software innovation, both in virtualisation and within CUDA, its proprietary programming language used by most AI researchers. NVIDIA introduced CUDA sparsity support with Ampere, whereby complex AI models can run much faster by hiding less important model weights (which determine how much influence an input has on the output) during the iterative process, reducing the amount of calculation required.

The outcome of bringing the full might of NVIDIA’s broad array of technology to bear is very impressive. Multiple examples were offered of Ampere-powered A100 systems doing the same work as Volta-based V100 systems for 1/10th of the cost and 1/20th of the power consumption. Visually, that is taking a room full of server racks and replacing them with just one server rack. As NVIDIA’s CEO put it, ‘the more you buy, the more you save’, and the company expects Ampere to be a major driver of reducing the cost of cloud compute and of AI development and deployment. Consequently, orders for Ampere have already been taken from major hyperscalers including Amazon Web Services, Microsoft Azure, Google Cloud, Alibaba Cloud and Tencent Cloud.
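
The idea of hiding less important weights can be illustrated with a small pruning routine. The sketch assumes a ‘2 out of 4’ pattern, in which the two smallest-magnitude weights in every group of four are set to zero; that pattern, the function name and the sample values are illustrative assumptions rather than details given above, and the real feature runs in GPU hardware rather than in Python.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude weights in each consecutive group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical sample weights, chosen only for illustration.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.0]
sparse = prune_2_of_4(weights)
print(sparse)  # [0.9, 0.0, 0.0, -0.7, 0.2, 0.3, 0.0, 0.0]

# Multiplications by zero can be skipped, so only the remaining weights cost compute.
nonzero = sum(1 for w in sparse if w != 0.0)
print(f"{nonzero} of {len(sparse)} weights remain")
```

With half of the weights zeroed, roughly half of the multiplications in each matrix operation can be skipped, which is where the speed-up in this simplified picture comes from.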

Taking gaming graphics to the next level

NVIDIA did not announce a gaming product based on Ampere, but one is expected later this year. In the keynote address, NVIDIA’s CEO referenced the huge improvements Ampere can make to ray tracing. Ray tracing is the ability to create incredibly life-like graphics integrating full light refraction. This requires vast amounts of processing power and was previously used mainly in computer-generated imagery (CGI) for blockbuster Hollywood films. Turing, the architecture that followed Volta in 2018, enabled ray tracing to be brought to PC games for the first time, as the compute performance could be housed on a standard gaming card at a reasonable cost. Ampere will take this to the next level. NVIDIA has also used an AI engine to enhance ray tracing performance: it learns what a much higher resolution image looks like, as well as the motion vectors of live game video graphics, to anticipate which pixels need to be displayed in a future image. After two years of ray tracing penetration, it is now supported by all the key game development engines, by hit games like Minecraft, Call of Duty: Modern Warfare and Battlefield V, and by the new PlayStation 5 console launching later this year. Future Ampere gaming cards may take ray tracing in PC games further still, while virtualised Amperes are seen as having a potentially key role to play in the future of cloud gaming services.

Conclusion

Whether you are a gamer, use online services like Amazon, Netflix or Spotify that are built in the cloud or interact with AI services like Amazon Alexa, many industry experts see the launch of NVIDIA’s Ampere as a major step forward that has the capability to make these services better and cheaper.


Glossary:

GPU: performs complex mathematical and geometric calculations that are necessary for graphics rendering.

Moore’s Law: predicts that the number of transistors that can fit onto a microchip will roughly double every two years, therefore decreasing the relative cost and increasing performance.

Moore’s Stress: refers to the idea that the long-held notion that the processing power of computers increases exponentially every couple of years has hit its limit. As the scale of chip components gets increasingly close to that of individual atoms, it is now more expensive and more technically difficult to double the number of transistors, and as a result the processing power, for a given chip every two years.

Natural language processing: a branch of AI that helps computers understand, interpret and manipulate human language. NLP draws from many disciplines, including computer science and computational linguistics, in its pursuit to fill the gap between human communication and computer understanding.

Virtualisation: refers to a simulated, or virtual, computing environment rather than a physical environment. Virtualisation often includes computer-generated versions of hardware, operating systems, storage devices and more. This allows organisations to partition a single physical computer or server into several virtual machines. Each virtual machine can then interact independently and run different operating systems or applications while sharing the resources of a single host machine.

Workload: the amount of processing that a computer has been given to do at a given time.

AI training: also known as machine learning, a subset of AI that allows computer systems to learn and improve automatically, without being explicitly programmed by a human.

Deep learning/neural network: a subset of machine learning, it is a series of algorithms that aims to recognise underlying relationships in a set of data through a process that imitates the way the human brain operates.

AI inference: refers to artificial intelligence processing. Whereas machine learning and deep learning refer to training neural networks, AI inference applies knowledge from a trained neural network model and uses it to infer a result.

Sparsity: used to accelerate AI by reducing the large volume of matrix multiplication that deep learning requires, shortening the time to good results.