[Figure: Vision C5 DSP block diagram]

The Vision C5 DSP neural network processor is:
• A complete, standalone DSP that runs all the layers of a CNN (convolution, fully connected, normalization, pooling, and so on)
• A DSP for the fast-changing neural network field: programmable and future-proof
• Performance of 1 TMAC/s (one trillion multiply-accumulates per second); a worked example follows this list
• 1024 8-bit MACs or 512 16-bit MACs for exceptional performance at both resolutions
• 128-way 8-bit SIMD or 64-way 16-bit SIMD VLIW architecture
• Not a hardware accelerator to pair with a vision DSP, but a dedicated, neural-network-optimized processor
• Architected for multi-processor designs, scaling to multi-TMAC/s solutions
• The same proven software tool set as the Vision P5 and P6 DSPs
• <1 mm² in 16 nm
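
As a rough sanity check on those numbers, 1 TMAC/s follows directly from the per-cycle MAC count multiplied by the clock rate. The short Python sketch below assumes a clock of roughly 1 GHz, which the article does not state, purely to show the arithmetic.

# Back-of-envelope throughput check (the clock frequency is an assumption,
# not a figure from the article)
macs_per_cycle_8bit = 1024
macs_per_cycle_16bit = 512
clock_hz = 1.0e9  # assumed ~1 GHz

print(f"8-bit:  {macs_per_cycle_8bit * clock_hz / 1e12:.2f} TMAC/s")   # ~1.02
print(f"16-bit: {macs_per_cycle_16bit * clock_hz / 1e12:.2f} TMAC/s")  # ~0.51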
[Figure: Vision C5 software flow]

Wonderful hardware is not much use if it is too difficult to program. Standard open-source CNN frameworks such as Caffe and TensorFlow are the most common way to develop in this space, and these flow cleanly into the CNN mapper and then all the way down to the Vision C5 DSP.
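
To make the starting point of that flow concrete, the sketch below shows the kind of network description a framework such as TensorFlow produces; the layer shapes are arbitrary placeholders, and the Cadence CNN mapper and Vision C5 tool chain steps themselves are not shown.

import tensorflow as tf

# A small illustrative CNN; the trained model, not this framework-side code,
# is what would be handed to the CNN mapper.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()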
Summary
The Vision C5 is targeted at high-
performance CNN applications that
require TMAC/s operation. For lower
performance, such as the neural
nets that are occasionally required
in mobile, the Vision P6 DSP is more
appropriate, with a performance of
up to 200 GMAC/s. For the most
demanding applications, multicore
versions of the Vision C5 DSP fit the
bill.
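
One way to read those figures is as a simple sizing exercise: multiply the per-frame MAC count of the network by the frame rate and compare the result with the throughput of each DSP. The workload numbers in the sketch below are illustrative assumptions; only the 200 GMAC/s and 1 TMAC/s figures come from the article.

import math

macs_per_frame = 2e10   # assumed cost of one inference pass (illustrative)
frames_per_s = 30       # assumed frame rate (illustrative)
required = macs_per_frame * frames_per_s   # MAC/s needed

vision_p6 = 200e9   # 200 GMAC/s (from the article)
vision_c5 = 1e12    # 1 TMAC/s per core (from the article)

print(f"required: {required / 1e9:.0f} GMAC/s")
if required <= vision_p6:
    print("Vision P6 is sufficient")
else:
    print(f"needs {math.ceil(required / vision_c5)} Vision C5 core(s)")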
The Vision C5 DSP is targeted at vision, lidar, voice, and radar applications in the mobile, surveillance, automotive, drone, and wearable markets. It has a computational capacity of 1 TMAC/s (one trillion multiply-accumulate operations per second).
It is not an accelerator; it is a standalone, self-contained neural network DSP. This matters because accelerators handle only part of the problem, leaving whatever other processor is in use to spend considerable processing power on the rest. For example, an accelerator may handle only the convolutional (first) stage of a CNN, which not only offloads just part of the computation but also consumes a lot of bandwidth shifting data back and forth. The Vision C5 DSP completely offloads the processing and minimizes data movement, which is where much of the power is actually consumed.
Typically, neural network applications are divided into two phases: training and inference. Training is normally done in the cloud and involves processing large data sets, requiring 10¹⁶ to 10²² MACs per dataset. Inference usually runs closer to the edge of the network, in the drone or car for example, and each image requires 10⁸ to 10¹² MACs. The biggest issue, though, is power. It is this inference phase of neural networks on which the Vision C5 DSP is focused.
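
To see why a single image lands in that 10^8 to 10^12 range, note that the MAC count of one convolution layer is output height × output width × output channels × (kernel × kernel × input channels). The layer shapes below are illustrative, not a specific published network.

def conv_macs(out_h, out_w, out_ch, k, in_ch):
    # MACs for one convolution layer
    return out_h * out_w * out_ch * k * k * in_ch

# Illustrative layers on a 224x224 RGB image (hypothetical shapes)
layers = [
    conv_macs(224, 224, 64, 3, 3),
    conv_macs(112, 112, 128, 3, 64),
    conv_macs(56, 56, 256, 3, 128),
]
print(f"~{sum(layers):.2e} MACs per image")   # on the order of 10^9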