[Figure: Vision C5 DSP block diagram]

The Vision C5 DSP neural network processor is:
• A complete, standalone DSP that runs all the layers of a CNN (convolution, fully connected, normalization, pooling, and so on)
• A DSP for the fast-changing neural network field: programmable and future-proof
• Performance of 1 TMAC/s (one trillion multiply-accumulates per second); a worked example follows this list
• 1024 8-bit MACs or 512 16-bit MACs for exceptional performance at both resolutions
• 128-way 8-bit SIMD or 64-way 16-bit SIMD VLIW architecture
• Not a hardware accelerator to pair with a vision DSP, but a dedicated, neural-network-optimized processor
• Architected for multi-processor designs, scaling to multi-TMAC/s solutions
• The same proven software tool set as the Vision P5 and P6 DSPs
• <1 mm² in 16 nm
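
As a rough sanity check on those numbers, 1 TMAC/s follows directly from the per-cycle MAC count multiplied by the clock rate. The short Python sketch below assumes a clock of roughly 1 GHz, which the article does not state, purely to show the arithmetic.

# Back-of-envelope throughput check (the clock frequency is an assumption,
# not a figure from the article)
macs_per_cycle_8bit = 1024
macs_per_cycle_16bit = 512
clock_hz = 1.0e9  # assumed ~1 GHz

print(f"8-bit:  {macs_per_cycle_8bit * clock_hz / 1e12:.2f} TMAC/s")   # ~1.02
print(f"16-bit: {macs_per_cycle_16bit * clock_hz / 1e12:.2f} TMAC/s")  # ~0.51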
[Figure: Vision C5 software flow]

Wonderful hardware is not much use if it is too difficult to program. Standard open-source CNN frameworks such as Caffe and TensorFlow are the most common way to develop in this space, and these flow cleanly into the CNN mapper and then all the way down to the Vision C5 DSP.
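
To make the starting point of that flow concrete, the sketch below shows the kind of network description a framework such as TensorFlow produces; the layer shapes are arbitrary placeholders, and the Cadence CNN mapper and Vision C5 tool chain steps themselves are not shown.

import tensorflow as tf

# A small illustrative CNN; the trained model, not this framework-side code,
# is what would be handed to the CNN mapper.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()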
Summary
The Vision C5 is targeted at high-
performance CNN applications that
require TMAC/s operation. For lower
performance, such as the neural
nets that are occasionally required
in mobile, the Vision P6 DSP is more
appropriate, with a performance of
up to 200 GMAC/s. For the most
demanding applications, multicore
versions of the Vision C5 DSP fit the
bill.
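
One way to read those figures is as a simple sizing exercise: multiply the per-frame MAC count of the network by the frame rate and compare the result with the throughput of each DSP. The workload numbers in the sketch below are illustrative assumptions; only the 200 GMAC/s and 1 TMAC/s figures come from the article.

import math

macs_per_frame = 2e10   # assumed cost of one inference pass (illustrative)
frames_per_s = 30       # assumed frame rate (illustrative)
required = macs_per_frame * frames_per_s   # MAC/s needed

vision_p6 = 200e9   # 200 GMAC/s (from the article)
vision_c5 = 1e12    # 1 TMAC/s per core (from the article)

print(f"required: {required / 1e9:.0f} GMAC/s")
if required <= vision_p6:
    print("Vision P6 is sufficient")
else:
    print(f"needs {math.ceil(required / vision_c5)} Vision C5 core(s)")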
The Vision C5 DSP is targeted at vision, lidar, voice, and radar applications in the mobile, surveillance, automotive, drone, and wearable markets. It has a computational capacity of 1 TMAC/s (one trillion multiply-accumulate operations per second).
It is not an accelerator; it is a standalone, self-contained neural network DSP. This matters because accelerators handle only part of the problem, leaving whatever other processor is in use to spend considerable processing power on the rest. For example, an accelerator may handle only the convolutional (first) stage of a CNN, which not only offloads just part of the computation but also consumes a lot of bandwidth shifting data back and forth. The Vision C5 DSP completely offloads the processing and minimizes data movement, which is where much of the power is actually consumed.
Typically, neural network applications are divided into two phases: training and inference. Training is normally done in the cloud and involves processing large data sets, requiring 10¹⁶ to 10²² MACs per dataset. Inference usually runs closer to the edge of the network, in the drone or car for example, and each image requires 10⁸ to 10¹² MACs. The biggest issue, though, is power. It is this inference phase of neural networks on which the Vision C5 DSP is focused.
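
To see why a single image lands in that 10^8 to 10^12 range, note that the MAC count of one convolution layer is output height × output width × output channels × (kernel × kernel × input channels). The layer shapes below are illustrative, not a specific published network.

def conv_macs(out_h, out_w, out_ch, k, in_ch):
    # MACs for one convolution layer
    return out_h * out_w * out_ch * k * k * in_ch

# Illustrative layers on a 224x224 RGB image (hypothetical shapes)
layers = [
    conv_macs(224, 224, 64, 3, 3),
    conv_macs(112, 112, 128, 3, 64),
    conv_macs(56, 56, 256, 3, 128),
]
print(f"~{sum(layers):.2e} MACs per image")   # on the order of 10^9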