New-Tech Europe | February 2019

differently for each layer of the neural network to deliver the required performance with the maximum possible efficiency.

Memory Architecture

As well as improving compute efficiency by varying inferencing precision, configuring both the bandwidth and structure of programmable on-chip memories can further enhance the performance and efficiency of embedded AI. A customized MPSoC can have more than four times the on-chip memory, and six times the memory-interface bandwidth, of a conventional compute platform running the same inference engine. This configurability allows users to reduce bottlenecks and optimize utilization of the chip's resources. In addition, a typical subsystem has only limited cache integrated on-chip and must interact frequently with off-chip storage, which adds to latency and power consumption. In an MPSoC, most memory exchanges can occur on-chip, which is not only faster but also saves over 99% of the power consumed by off-chip memory interactions.

Silicon Area

Solution size is also becoming an increasingly important consideration, especially for mobile AI on board drones, robots, or autonomous/self-driving vehicles. The inference engine implemented in the FPGA fabric of an MPSoC can occupy as little as one-eighth of the silicon area of a conventional SoC, allowing developers to build more powerful engines within smaller devices. Moreover, MPSoC device families offer designers a variety of choices for implementing the inference engine in the most power-, cost-, and size-efficient option capable of meeting system performance requirements. There are also automotive-qualified parts with hardware functional-safety features certified according to the industry-standard ISO 26262 ASIL-C safety specification, which is very important for autonomous-driving applications. An example is Xilinx's Automotive XA Zynq® UltraScale+™ family, which combines a 64-bit quad-core ARM® Cortex™-A53 and dual-core ARM Cortex-R5 processing system with scalable programmable logic fabric, giving the opportunity to consolidate control processing, machine-learning algorithms, and safety circuits with fault tolerance in a single chip.

Today, an embedded inference engine can be implemented in a single MPSoC device and consume as little as 2 W, a power budget suitable for applications such as mobile robotics or autonomous driving. Conventional compute platforms cannot run real-time CNN applications at these power levels even now, and are unlikely to satisfy the increasingly stringent demands for faster response and more sophisticated functionality within tighter power constraints in the future. Platforms based on programmable MPSoCs can also provide greater compute performance, increased efficiency, and size/weight advantages at power levels above 15 W.

The advantages of such a configurable, multi-parallel compute architecture would be of academic interest only if developers were unable to apply them easily in their own projects. Success depends on suitable tools to help developers optimize the implementation of their target inference engine. To meet this need, Xilinx continues to extend its ecosystem of development tools and machine-learning software stacks, and is working with specialist partners to simplify and accelerate the implementation of applications such as computer vision and video surveillance.

Flexibility for the Future

Leveraging the MPSoC's configurability to create an optimal platform for the application at hand also gives AI developers the flexibility to keep pace with the rapid evolution of neural network architectures. The potential for the industry to migrate to new types of neural networks represents a significant risk for platform developers. A reconfigurable MPSoC gives developers the flexibility to respond to changes in the way neural networks are architected, by reconfiguring the device to build the most efficient processing engine for any contemporary state-of-the-art strategy. More and more, AI is being embedded in equipment such as industrial controls, medical devices, security systems, robotics, and autonomous vehicles. Adaptive acceleration leveraging the programmable logic fabric of MPSoC devices holds the key to delivering the responsive and advanced functionality required to remain competitive.
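As a rough illustration of the per-layer precision tuning mentioned earlier, the following Python sketch picks the smallest integer bit width per layer that keeps quantization error within a tolerance. All names, bit-width candidates, and the error criterion are hypothetical choices for illustration, not a Xilinx API or the tool flow described in this article.

```python
# Hypothetical sketch: choose the smallest per-layer bit width whose
# uniform-quantization error stays below a relative tolerance.

def quantize(weights, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit signed
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [round(w / scale) * scale for w in weights]

def pick_bit_width(weights, tolerance=0.01, candidates=(4, 6, 8, 16)):
    """Return the smallest candidate bit width meeting the error tolerance."""
    peak = max(abs(w) for w in weights)
    for bits in candidates:
        q = quantize(weights, bits)
        err = max(abs(w - v) for w, v in zip(weights, q))
        if err <= tolerance * peak:
            return bits
    return candidates[-1]

# Each layer gets its own precision, trading accuracy for efficiency;
# layers with a wide dynamic range tend to need more bits.
layers = {
    "conv1": [0.8, -0.31, 0.05, 0.44],
    "fc":    [0.002, -0.0011, 0.0007, 0.0019],
}
plan = {name: pick_bit_width(w) for name, w in layers.items()}
```

On an FPGA fabric, such a per-layer plan would map each layer onto arithmetic of exactly the chosen width, rather than padding everything to a fixed data type as a conventional compute platform must.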

