on-demand webinar

DUTh: Architectural Improvements for Low-Power and Functional Safety of Dataflow CNN Accelerators Using HLS

Estimated Watching Time: 64 minutes

DUTh demonstrates how to use HLS to design CNN accelerators with online checking capabilities, improve power efficiency through optimized data handling in spatial variants of convolution, and implement customized FP operators effectively.

Deep Convolutional Neural Networks (CNNs) are dominant in modern Machine Learning (ML) applications. Accelerating them directly in hardware calls for a multifaceted approach that combines high performance, energy efficiency, and functional safety. These requirements hold for every CNN architecture, including both systolic and spatial dataflow alternatives. In this talk, we focus on the latter case, in which a CNN is implemented as a series of dedicated convolution engines and the data are streamed from one layer to the next through custom memory buffers.

In this context, we will first present a High-Level Synthesis (HLS) implementation for dataflow CNNs that builds on Catapult HLS and supports a wide variety of architectures. In particular, we will focus on the energy-efficient implementation of non-traditional forms of spatial convolution, such as strided or dilated convolutions, which leverage the decomposition of convolution to eliminate redundant data movements.
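To make the idea of decomposing a convolution concrete, here is a minimal sketch of the standard polyphase view of a 1-D dilated convolution. With dilation d, the output samples split into d interleaved groups, and each group is an ordinary dense convolution over every d-th input sample, so each input phase can be streamed through its own dense engine without being re-read. This is an illustrative assumption about what such a decomposition can look like, not the Catapult HLS implementation presented in the talk; all function names are hypothetical.

```python
def dilated_conv1d(x, w, d):
    """Direct dilated convolution (valid mode): y[n] = sum_k w[k] * x[n + d*k]."""
    span = d * (len(w) - 1)
    return [sum(w[k] * x[n + d * k] for k in range(len(w)))
            for n in range(len(x) - span)]


def dense_conv1d(x, w):
    """Ordinary dense convolution, i.e. the dilation-1 special case."""
    return dilated_conv1d(x, w, 1)


def dilated_via_phases(x, w, d):
    """Same result, computed as d independent dense convolutions.

    Phase p holds the samples x[p], x[p+d], x[p+2d], ... and produces
    the outputs y[p], y[p+d], y[p+2d], ...
    """
    n_out = max(len(x) - d * (len(w) - 1), 0)
    y = [0] * n_out
    for p in range(d):
        phase = x[p::d]                 # subsampled input for this phase
        y_p = dense_conv1d(phase, w)    # dense convolution on the phase
        for m, value in enumerate(y_p):
            y[m * d + p] = value        # interleave back into the output
    return y


if __name__ == "__main__":
    x = list(range(1, 13))
    w = [1, -2, 3]
    print(dilated_conv1d(x, w, 2))
    print(dilated_via_phases(x, w, 2))
```

Both calls print the same list. In a streaming accelerator, the same regrouping would let each phase be consumed once by a dense engine, which is one plausible reading of how redundant data movement can be avoided.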

Next, we will present an algorithmic approach for online fault detection in CNNs. Here, the checksum of the actual result is compared against a predicted checksum computed in parallel by a hardware checker. Based on a newly introduced invariance condition of convolution, the proposed checker predicts the output checksum implicitly, using only data elements at the border of the input features. In this way, the power required for accumulating the input features is reduced, and no large buffers are needed to hold intermediate checksum results.
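The general principle of checksum-based checking can be sketched in a few lines. The example below uses the classic identity for a 1-D valid convolution: the sum of all outputs equals the weighted sum of sliding-window input sums, and each window sum is obtained from the previous one by dropping one border element and adding another. This is only a baseline illustration under that assumption; it is not the invariance condition introduced in the talk, whose details are not given in this abstract. All names are hypothetical, and integer data is used so that the comparison is exact.

```python
def conv1d(x, w):
    """Valid 1-D convolution: y[n] = sum_k w[k] * x[n + k]."""
    k_len = len(w)
    return [sum(w[k] * x[n + k] for k in range(k_len))
            for n in range(len(x) - k_len + 1)]


def predicted_checksum(x, w):
    """Predict sum(y) from the inputs and weights without computing y.

    S_k = sum of x[k .. k + n_out - 1] is the input window seen by tap k.
    Sliding from S_{k-1} to S_k only touches two border elements.
    """
    n_out = len(x) - len(w) + 1
    window = sum(x[:n_out])             # S_0
    total = w[0] * window
    for k in range(1, len(w)):
        window += x[n_out + k - 1] - x[k - 1]   # add right border, drop left
        total += w[k] * window
    return total


def outputs_pass_check(y, x, w):
    """Compare the actual output checksum with the predicted one."""
    return sum(y) == predicted_checksum(x, w)


if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9, 2, -6]
    w = [2, 0, -1]
    y = conv1d(x, w)
    print(outputs_pass_check(y, x, w))      # fault-free result
    y[2] ^= 1 << 3                          # inject a single bit flip
    print(outputs_pass_check(y, x, w))      # corrupted result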

Finally, we study customized floating-point HLS operators that support fused dot products with single- or double-width output (e.g., inputs in FP8 and the result in FP16) and thereby eliminate redundant rounding and type-conversion steps. We will also discuss floating-point datatypes with an adjustable bias for tuning the dynamic range.
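The effect of removing intermediate rounding can be illustrated with a small numerical model. The sketch below rounds exact rational values to a given number of significand bits (ties to even) and compares a dot product that rounds after every multiply and add against a fused one that rounds once at the end. It is a simplified assumption for illustration only: exponent range, subnormals, saturation, and the adjustable bias are not modeled, the 4-bit significand merely mimics a small FP8-like format, and the function names are hypothetical rather than taken from any HLS library.

```python
from fractions import Fraction


def round_to_precision(value, bits):
    """Round an exact Fraction to `bits` significand bits, ties to even."""
    if value == 0:
        return Fraction(0)
    sign = -1 if value < 0 else 1
    mag = abs(value)
    # Find e such that 2**(bits-1) <= mag / 2**e < 2**bits.
    e = mag.numerator.bit_length() - mag.denominator.bit_length() - bits
    while mag / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while mag / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    mantissa = round(mag / Fraction(2) ** e)    # Fraction rounds half to even
    return sign * mantissa * Fraction(2) ** e


def unfused_dot(a, b, bits):
    """Round after every multiplication and every addition."""
    acc = Fraction(0)
    for ai, bi in zip(a, b):
        prod = round_to_precision(Fraction(ai) * Fraction(bi), bits)
        acc = round_to_precision(acc + prod, bits)
    return acc


def fused_dot(a, b, bits):
    """Accumulate exactly and round a single time at the end."""
    exact = sum(Fraction(ai) * Fraction(bi) for ai, bi in zip(a, b))
    return round_to_precision(exact, bits)


if __name__ == "__main__":
    a = [1.25, -1.5]
    b = [1.25, 1.0]
    print(float(unfused_dot(a, b, 4)))   # 0.0
    print(float(fused_dot(a, b, 4)))     # 0.0625
```

In this example the exact result is 1.5625 - 1.5 = 0.0625. The unfused version rounds the first product to 1.5 and loses the result entirely through cancellation, while the fused version keeps it.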

What you will learn

Use HLS to design CNN accelerators with online checking capabilities.
Improve power efficiency through optimized data handling in spatial variants of convolution.
Use HLS effectively to implement customized FP operators.

Who should attend

Hardware designers and researchers interested in High-Level Synthesis designs for ML accelerators and floating-point arithmetic.

Meet the speaker

Democritus University of Thrace (DUTh)

Dionysios Filippas

Ph.D. student, Democritus

Dionysios Filippas is a Ph.D. student in Electrical and Computer Engineering at Democritus University of Thrace (DUTh). He received his Diploma and M.Sc. degrees from the same department in 2019 and 2021, respectively. His experience includes the design of networks-on-chip and customized hardware accelerators for data clustering algorithms. His current research focuses on the design of power-efficient CNN accelerators and of customized floating-point arithmetic units using HLS.
