New-Tech Europe Magazine | Q1 2023

AI/ML models. Such training almost universally employs floating-point data formats with high dynamic range to maximize model accuracy by allowing tiny incremental adjustments to model weights. Floating-point computations consume more power and therefore require additional cooling. In addition, CPUs and GPUs expend considerable amounts of power moving large training data sets between memory and their internal computing elements. Most edge inference chips cannot afford the silicon or the power consumption to perform all calculations using full-precision floating-point data formats. Many make compromises to attain high peak TFLOPS and TOPS metrics, often by employing data types with less precision to represent AI/ML weights, activations, and data. Vendors of edge AI/ML chips provide software tools that reduce the precision of the trained model weights, converting models to smaller number formats such as FP8, scaled integers, or even binary data formats. Each of these smaller data formats delivers advantages for edge inference workloads, but all of them lose some amount of model accuracy. Retraining AI/ML models at reduced precision can often reclaim some of that accuracy. Now imagine a scalable device architecture that can be deployed in small, embedded edge devices and in larger devices capable of aggregating workloads in the data center. The same optimizations that improve power consumption and cost efficiency at the edge also make compute in the data center denser and more cost-efficient, which lowers the facility's capital and operating expenses for both inference and training. Scalable AI/ML accelerator architectures that support both full- and reduced-precision floating-point formats break down the artificial boundary between training and inference and enable the deployment of the same standard, familiar software tools for a unified architecture.
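To make the scaled-integer conversion concrete, the following is a minimal sketch in Python with NumPy; the function names, tensor size, and clipping range are illustrative assumptions rather than any vendor's actual tool. It maps FP32 weights onto signed 8-bit integers plus a per-tensor scale and measures the accuracy lost in the round trip:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map FP32 weights to a per-tensor scale plus signed 8-bit integers."""
    scale = float(np.max(np.abs(weights))) / 127.0   # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 values to compare against the original weights."""
    return q.astype(np.float32) * scale

# Example: measure the accuracy cost of the round trip on a random weight tensor.
w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int8(w)
error = np.mean(np.abs(w - dequantize_int8(q, s)))
print(f"mean absolute quantization error: {error:.6f}")
```

The accuracy reclaimed by retraining, mentioned above, comes from fine-tuning with this kind of quantization in the loop, so the weights adapt to the coarser number grid.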

architecture. These efficient edge AI accelerators employ architectural innovations such as dataflow and on-chip broadcast networks that permit data fetched from external memory to be reused many times once brought on-chip. There are real application examples where a unified, scalable dataflow architecture for machine learning breaks down the wall between the distinct phases of training and inference. Federated learning is one such example, and it unlocks new types of AI/ML workloads. For many connected applications, federated learning can supplant the one-way-street approach of reduced-precision AI/ML inference models derived through one-time offline training, and it can unlock performance that might otherwise be difficult to achieve because representative centralized offline training sets are unavailable. Federated learning exploits an important characteristic of inference at the edge: devices are exposed to many diverse inputs that range far beyond the original model training sets. If properly designed, these edge devices can learn from these additional inputs and further improve their model accuracy during deployment. There can be hundreds, thousands, or millions of edge devices all improving the same AI/ML models to provide better local answers or decisions. For example, consider CT or MRI scanners made by the same vendor and distributed in hospitals around the world. These imaging devices are often tasked with finding cancerous tumors and other problems, and they can increasingly use AI/ML models to help radiologists identify suspect tissues. As each machine in the field improves its model, the original trained model used to initialize new imaging equipment can benefit from the same improvements if federated learning is employed to update and improve the original model. Such updates can be performed in a way that ensures that only the insights gained through the additional edge-based training are shared, and not an individual's private data.
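As an illustration of the mechanics, here is a minimal federated-averaging sketch in Python with NumPy; the function names, learning rate, gradients, and dataset sizes are assumptions for the example, not a specific product's API. Each device refines the shared model on data that never leaves the device, and only the resulting weights are sent back for aggregation:

```python
import numpy as np

def local_update(global_weights: np.ndarray, local_gradient: np.ndarray,
                 lr: float = 0.01) -> np.ndarray:
    """One device's training step: refine the shared model using only local data."""
    return global_weights - lr * local_gradient

def federated_average(device_weights: list[np.ndarray],
                      device_samples: list[int]) -> np.ndarray:
    """Combine per-device models, weighted by how much local data each device saw."""
    total = sum(device_samples)
    return sum(w * (n / total) for w, n in zip(device_weights, device_samples))

# One aggregation round: three scanners improve the same model on their own data.
global_w = np.zeros(8, dtype=np.float32)
device_updates = [local_update(global_w, np.random.randn(8).astype(np.float32))
                  for _ in range(3)]          # random stand-ins for real local gradients
local_dataset_sizes = [1200, 450, 3000]       # illustrative per-device sample counts
global_w = federated_average(device_updates, local_dataset_sizes)
```

Only the updated weights and sample counts cross the network; the raw scans stay on each machine.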

All fielded machines can benefit from this additional training without compromising privacy. Federated learning has wide applicability in privacy-preserving device personalization, where the performance of vision and speech algorithms can be tailored to specific users. It also has applications in network security, where the collective learning of network ingress nodes can be used to discover proactive security rules without sharing sensitive private network traffic. The benefit of a unified cloud and edge compute architecture is that a model can be logically split to run partly in the cloud and partly on the edge using identical software binaries. The unified architecture ensures that compatible data formats are used and that optimizations such as sparsity representations don't break between cloud and edge. A scalable, unified architecture with continual learning throughout the lifetime of a deployed application departs from today's conventional training and inference practice, which relies on CPUs and GPUs in the data center and specialized devices at the edge. Yet this unified approach seems the most logical path if the industry wants to make large gains in performance, accuracy, and power efficiency as AI/ML becomes pervasive.
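To illustrate the logical cloud/edge split described above, here is a minimal sketch in Python with NumPy; the layer sizes, split point, and helper names are assumptions for the example. The same routine executes a contiguous segment of layers whether it runs on the device or in the data center, and only the intermediate activation crosses the boundary:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny sequential model: four dense layers stored as (weight, bias) pairs.
layers = [(rng.standard_normal((16, 16)).astype(np.float32),
           np.zeros(16, dtype=np.float32)) for _ in range(4)]

def run_segment(x: np.ndarray, segment) -> np.ndarray:
    """Run a contiguous slice of the model; identical code serves edge and cloud."""
    for w, b in segment:
        x = np.maximum(x @ w + b, 0.0)   # dense layer followed by ReLU
    return x

split = 2                                 # layers [0, split) run on the edge device
x = rng.standard_normal(16).astype(np.float32)

edge_activation = run_segment(x, layers[:split])              # computed on the device
cloud_output = run_segment(edge_activation, layers[split:])   # computed in the data center
```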

Ivo Bolsens, Senior VP, AMD
