Technical Article

Powering High Performance Computing

November 16, 2021 by Paul Yeaman

Factorized Power Architecture enables next-generation processors to achieve their full potential. High-performance processors require higher steady-state and peak currents with dramatically increasing slew rates, while operating at lower voltages with an increasing number of high-speed I/Os. This trend is accelerating and continually challenging power system designers to ensure delivery of adequate power to the processor core with low loss in the Power Delivery Network (PDN).

Conventional approaches utilizing multiphase buck regulators are becoming significantly challenged, rendering a new approach necessary to keep pace.

Vicor’s Factorized Power Architecture (FPA™) is a departure from the common multiphase methods and uniquely addresses each of the challenges facing VR developments for new processor technologies. FPA also enables Lateral Power Delivery (LPD) and Vertical Power Delivery (VPD) PCB deployment options. The VPD solution reduces losses by up to 95% and eliminates bottlenecks by freeing up 100% of the processor perimeter.

Power Demands Doubling

The rapid advancements in artificial intelligence (AI) are being enabled by advanced GPUs and specialized AI processors built on the most advanced silicon process nodes at 7nm and 5nm, and rapidly on their way to 3nm. Nominal core operating voltages at these process nodes are currently between 0.75V and 0.85V. To meet the performance workloads that AI demands, nominal current consumption has increased, with currents exceeding 600A steady-state and 1000A peak. The trend is a 2x increase from just two years ago and is continuing to rise at a similar rate.

The decrease in voltage and increase in current present two problems. First, the increase in current exacerbates the copper losses in motherboard PCBs, as the copper planes and vias connecting the VR to the processor have a constrained resistance. Losses rise with the square of the processor current, lowering efficiency and contributing significant additional heat to the processor thermal management system. Second, the voltage drop across the PDN is proportional to the current. As core voltage decreases, this voltage drop has an outsized impact on processor performance.
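The square-law relationship can be made concrete with a short sketch. The lumped resistance `R_PDN_OHM` below is a hypothetical value chosen only for illustration, not a measured figure; the current levels are the steady-state and peak figures quoted above, plus half the steady-state value for comparison.

```python
def pdn_loss_watts(current_a: float, resistance_ohm: float) -> float:
    """Resistive loss in the power delivery path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


# Hypothetical lumped PDN resistance (assumption, for illustration only).
R_PDN_OHM = 100e-6

for current_a in (300.0, 600.0, 1000.0):
    loss_w = pdn_loss_watts(current_a, R_PDN_OHM)
    print(f"{current_a:6.0f} A -> {loss_w:6.1f} W lost in the PDN")
```

Whatever resistance is assumed, doubling the current quadruples the loss, which is why a 2x rise in processor current is so costly for a fixed PDN.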

For example, a 400W core load fed through a 100µΩ trace sees a voltage drop of 4% on a 1V rail. On a 0.75V rail the same load draws more current, and the drop becomes about 7% of the rail, nearly 2x greater. The physical constraints of the power delivery network leave few options for reducing that resistance. Adding copper layers or increasing copper thickness on the motherboard will lower the resistance, but to achieve the same 4% voltage drop at 0.75V, the trace resistance would have to decrease by almost half. Doubling the amount of copper for carrying high current is typically not possible for reasons of cost or physical limitation.
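The arithmetic behind this example can be checked with a minimal sketch. The function names are illustrative, and the model assumes a purely resistive trace carrying a constant-power load, which is a simplification of a real PDN.

```python
def droop_fraction(power_w: float, rail_v: float, trace_ohm: float) -> float:
    """Voltage drop across the trace as a fraction of the rail voltage."""
    current_a = power_w / rail_v           # constant-power load: I = P / V
    return current_a * trace_ohm / rail_v  # (I * R) / V


def trace_for_droop(power_w: float, rail_v: float, target_fraction: float) -> float:
    """Trace resistance that would hold the drop to target_fraction of the rail."""
    current_a = power_w / rail_v
    return target_fraction * rail_v / current_a


POWER_W = 400.0
TRACE_OHM = 100e-6

for rail_v in (1.0, 0.75):
    pct = 100.0 * droop_fraction(POWER_W, rail_v, TRACE_OHM)
    print(f"{rail_v:.2f} V rail: {pct:.1f}% drop")

needed_ohm = trace_for_droop(POWER_W, 0.75, 0.04)
print(f"Resistance for a 4% drop at 0.75 V: {needed_ohm * 1e6:.2f} micro-ohm")
```

Because the current rises as the voltage falls, the fractional drop scales with 1/V² for a fixed power and resistance. Under these assumptions, holding the drop to 4% at 0.75V calls for roughly 56µΩ, a reduction of a little under half.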

In short, the best solution is to position the VR closer to the processor.

While it sounds simple, it is complex to implement. First, there are signal integrity challenges. Moving the hard-switching multiphase VR closer to the processor brings whatever inherent noise the VR has with it. The problem is further compounded by the number of discrete phases needed.

A second challenge is the footprint of the VR. A typical processor package is 60 x 60mm. While that seems large, it is important to note that most of that area is dedicated to I/O. All of the heat is generated in the core, and all of the high currents must find a way to it.

Figure 1. A typical processor package, shown in red, is 60 x 60mm. All of the current is consumed by the core at the center. PCB resistive losses and parasitic capacitance/inductance in the path to the core are what is called the “last inch” and are the limiting factor to ensuring maximum processor performance. Image courtesy of Bodo's Power Systems magazine.

This means that even if the VR is positioned adjacent to the edge of the package, there is still a significant distance that the high current must travel to get to the core. In the typical VR approach, higher current requires more phases. Since most multiphase VRs are built from discrete devices, the inductor and switching stage of each phase must be laid out individually and, in most cases, cooled individually as well. Therefore, more phases mean a larger VR, which increases the challenge of placing it close to the processor.

Factorized Power Architecture Unlocks New Levels of Power Efficiency

Factorized Power Architecture (FPA™) is based on the fundamental principle of dividing a power converter into two primary functions, optimizing each separately and then implementing those functions as a system. The two functions are regulation and current multiplication.

Regulation

The efficiency of a regulator is inversely proportional to the work performed – the more work, the lower the efficiency. The closer the input and output voltages of a regulator are to each other, the less work is performed and the higher the efficiency becomes. By virtue of its position in the system, FPA minimizes the regulator’s input to output voltage differential. The PRM™ regulator is implemented using a Zero-Voltage Switching (ZVS) Buck-Boost topology, which features high efficiency where the input and output voltage difference is small. ZVS greatly reduces switching losses, enabling high-frequency operation and greatly reducing converter size. The PRM typically regulates an input between 40 and 60V to an output voltage between 30 and 50V.

Current Multiplication

The PRM is followed by a second stage performing a voltage step-down and current step-up function. This is implemented using the Sine Amplitude Converter topology in a device called a VTM™ Current Multiplier. The VTM’s behavior can be modeled as an ideal transformer, where the input and output voltages are related by a fixed ratio and the device impedance remains low (hundreds of µΩ) beyond 1MHz.
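The ideal-transformer view can be sketched in a few lines. The class name, the ratio `k = 1/64`, and the 48V factorized bus voltage are hypothetical values picked for illustration (48V lies within the PRM output range given above); they are not taken from any datasheet, and the model ignores the device's output impedance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdealCurrentMultiplier:
    """Fixed-ratio stage modeled as an ideal transformer (no losses)."""

    k: float  # fixed ratio V_out / V_in (hypothetical value used below)

    def v_out(self, v_in: float) -> float:
        """Output voltage is the input voltage scaled down by k."""
        return self.k * v_in

    def i_in(self, i_out: float) -> float:
        """Input current is the output current scaled down by k."""
        return self.k * i_out


vtm = IdealCurrentMultiplier(k=1 / 64)  # assumed ratio, for illustration only
bus_v = 48.0                            # assumed factorized bus voltage
core_a = 600.0                          # steady-state core current from the text

print(f"Core voltage: {vtm.v_out(bus_v):.2f} V")
print(f"Bus current:  {vtm.i_in(core_a):.3f} A")
```

In this model the high current exists only on the output side of the multiplier, so the board upstream of it carries a small fraction of the core current, and any resistance there dissipates k² times what it would at the full core current.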

Since there is no energy storage in the VTM, it can provide large amounts of power if it is kept sufficiently cool. This allows for matching the power capability of the VTM with the thermal capability of the processor. Together, the PRM and VTM form the building blocks of FPA.

Figure 2. PRM™ and VTM™ are the building blocks of FPA. PRMs are selected based on the system input voltage range and power requirements; VTMs are selected based on the output voltage range and current requirements. The PRM can be mounted anywhere in the system where convenient; the VTM should be mounted as close to the processor core as possible. Image courtesy of Bodo's Power Systems magazine.

One device is dedicated to regulation, followed by another dedicated to transformation.

SM-ChiP Package Reduces Noise and Improves Thermals

While the topology and architecture used to implement a high-performance regulator are important, of equal importance is the packaging technology. Vicor’s SM-ChiP package integrates everything – passives, magnetics, FETs, and control – into a single device. Moreover, this package is engineered to enable the most efficient extraction of current at the lowest thermal impedance to facilitate cooling.

Many SM-ChiPs also include grounded metal shielding over a significant surface of the device. This serves not only to facilitate cooling but also to localize high-frequency parasitic currents to keep them from propagating outside the device.

Figure 3. Typical PRM/VTM Factorized Power Architecture solution supporting main rail power on an AI accelerator card. Placing VTMs on opposite sides of the socket divides current flowing through the power delivery network to the processor in half and reduces losses by 50%. Image courtesy of Bodo's Power Systems magazine.

Lateral Power Delivery Cuts PDN Losses by 50%

To support a wide variety of application implementations, Vicor has developed PRMs and VTMs with power-level granularity that gives designers flexibility in reducing PDN loss. For example, implementing a single higher-current VTM as two smaller, lower-current VTMs allows for placement on opposite sides of the processor socket. Dividing the current in half and adding a separate path to the core area reduces power delivery network losses by 50%.
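The 50% figure follows directly from the square-law loss, as the sketch below shows. It assumes that each added path has the same resistance as the original single path and that the current shares equally between paths; the resistance value is hypothetical and cancels out of the ratio.

```python
def split_path_loss_watts(total_current_a: float,
                          path_resistance_ohm: float,
                          n_paths: int) -> float:
    """Total I^2 * R loss when current shares equally across identical paths."""
    per_path_a = total_current_a / n_paths
    return n_paths * per_path_a ** 2 * path_resistance_ohm


TOTAL_A = 600.0     # steady-state core current from the text
PATH_OHM = 100e-6   # hypothetical per-path resistance (assumption)

single_w = split_path_loss_watts(TOTAL_A, PATH_OHM, n_paths=1)
dual_w = split_path_loss_watts(TOTAL_A, PATH_OHM, n_paths=2)

print(f"One path:  {single_w:.1f} W")
print(f"Two paths: {dual_w:.1f} W ({100 * dual_w / single_w:.0f}% of single-path loss)")
```

Each of the two paths carries half the current and so dissipates a quarter of the original loss; two such paths together dissipate half.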

At 2.8mm, the VTM is thinner than many of the mechanical processor support elements such as package sockets, stiffeners, and heat sink attachment hardware. Locating the VTM under these elements couples it to the processor thermal management system and eliminates the need for a dedicated VTM heat sink, while at the same time reducing power delivery network losses by locating the current multiplier closer to the core.

Figure 4. In a typical multiphase VR, the phase inductor height typically limits its proximity to the processor and separate cooling is required for the DRMOS stage. The low profile of the VTM allows it to move under the processor heat sink and associated hardware, while the PRM can be placed farther away without loss in performance. Image courtesy of Bodo's Power Systems magazine.

These are examples of lateral power delivery (LPD). In LPD, the current multiplier is located on the processor side of the motherboard and the current flows laterally from the VR to the processor. This presents an inherent loss no matter how close the current multiplier is to the processor core.

The solution to this inherent loss is vertical power delivery (VPD). In VPD, the current multiplier is located on the opposite side of the motherboard from the processor, directly underneath it, significantly reducing PDN losses by shortening the distance the current travels through the motherboard.

Figure 5. Vertical power delivery (VPD) with GTM Geared Current Multiplier placed underneath processor maximizing power delivery performance. The VPD solution also relieves the processor top-side periphery for options including higher I/O routing, onboard memory, or tighter processor clustering. Image courtesy of Bodo's Power Systems magazine.

VPD needs two key features to achieve this function.

First, the area directly under the processor contains high-frequency capacitors which are necessary to decouple very high-frequency currents (>10MHz) from the rest of the system. Second, for maximum efficiency, the physical location and pattern of the current exiting the VPD solution must exactly mirror the location and pattern of the processor core power inputs. This enables the high-current flow to achieve a true “vertical” profile.

To achieve these features, the Vicor VPD solution consists of VTM Current Multipliers implemented with a gearbox to form a GTM™ Geared Current Multiplier. The gearbox performs two functions: it incorporates the high-frequency decoupling capacitance and redistributes the current from the VTM into a pattern mirroring the processor above it. The VTM array in a GTM is sized based on the processor current requirement, and the gearbox BGA pattern is based on the processor. In this way, the GTM represents the combination of both a standard (VTM) and a custom (gearbox) solution.

A Better Way for High-performance Computing Power

The implementation of Factorized Power Architecture LPD and VPD solutions using SM-ChiP packaging enables sweeping reductions in power delivery network losses for low-voltage, high-current processors. As processor current requirements continue to climb to 1000A and beyond, the Vicor Factorized Power Architecture will be able to deliver lower core voltages and higher core currents while providing lower power delivery network losses and higher system efficiency.

This article originally appeared in Bodo’s Power Systems magazine.