on-demand webinar

HLS for Vision and Deep Learning Hardware Accelerators Seminar

Complete Seminar Recording, Slides, and Q&A

Estimated Watching Time: 284 minutes



Deep learning applications are exploding, especially those that use images for computer vision. We see these applications in everything from self-driving cars to significant advances in healthcare. However, AI-infused applications that must be deployed at the edge as inference solutions are challenged by the power and performance needed to execute them. Leading companies like NVIDIA, Google, Bosch, Qualcomm, and many others have turned to High-Level Synthesis (HLS) to address this. Using HLS in their design and verification flow enables them to go from idea to silicon with PPA (power, performance, and area) equivalent to hand-coded RTL, but in a fraction of the time.

This 8-part technical webinar series is designed both for those who know very little about HLS and for those who are familiar with its concepts but want real examples of applying it to computer vision and deep learning implementations.

HLS — What, How and Why Now?

This webinar is a high-level introduction to HLS and its trends, and explains why it is essential for computer vision and deep learning.

Using High-Level Synthesis to Accelerate Computer/Machine Vision Applications

High-Level Synthesis (HLS) has been used at multiple companies, in projects and designs targeting vision processing for autonomous cars. HLS is the fastest way to turn complex algorithms into efficient hardware implementations, and it also supports a methodology that lets design teams react rapidly to changes in the algorithm or functional specification and still meet demanding schedules. This session steps through the basics of how HLS works and why it is such a good fit for image processing and vision applications, using a practical example vision algorithm: Histogram of Oriented Gradients (HOG).
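To give a feel for the kind of untimed C++ that an HLS flow starts from, here is a minimal sketch of the core HOG step: per-pixel gradients voted into an orientation histogram for a single cell. It is not the code presented in the webinar. The function name `hog_cell_histogram`, the 9-bin unsigned (0 to 180 degree) layout, the central-difference gradient, and the hard bin assignment without interpolation are all assumptions chosen for brevity, and the sketch uses standard C++ types rather than any HLS-specific library.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout: 9 unsigned orientation bins of 20 degrees each.
constexpr int kBins = 9;

// Hypothetical helper: orientation histogram for one cell of a grayscale
// image stored row-major in `img`. Border pixels are skipped so that the
// central-difference gradient never reads outside the image.
std::array<float, kBins> hog_cell_histogram(const std::vector<float>& img,
                                            int width, int height) {
    assert(width >= 3 && height >= 3);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    std::array<float, kBins> hist{};  // zero-initialized
    const float kPi = 3.14159265358979f;

    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            // Central differences in x and y.
            const float gx = img[y * width + (x + 1)] - img[y * width + (x - 1)];
            const float gy = img[(y + 1) * width + x] - img[(y - 1) * width + x];

            const float mag = std::sqrt(gx * gx + gy * gy);

            // Fold the signed angle into the unsigned range [0, 180).
            float angle = std::atan2(gy, gx) * 180.0f / kPi;
            if (angle < 0.0f) angle += 180.0f;
            if (angle >= 180.0f) angle -= 180.0f;

            // Hard assignment: the whole magnitude goes to one bin.
            int bin = static_cast<int>(angle / 20.0f);
            if (bin >= kBins) bin = kBins - 1;
            hist[bin] += mag;
        }
    }
    return hist;
}
```

In a hardware-oriented version, the floating-point `sqrt` and `atan2` calls would typically be replaced by fixed-point approximations, which is one of the architectural trade-offs an HLS flow makes quick to explore.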

Implementing Machine Learning Hardware Using High-Level Synthesis

Neural networks are typically developed and trained in a high-performance 32-bit floating-point compute environment. But in many cases, a custom hardware solution is needed for the inference engine to meet power and real-time requirements. Each neural network and end application may have different performance requirements that dictate the optimal hardware architecture. This makes custom-tailored solutions impractical when using hand-coded RTL creation flows. HLS has the unique ability to quickly go from complex algorithms written in C to generated RTL, enabling accurate power and performance profiles for an algorithm's implementation without having to write the RTL by hand. This webinar steps through a CNN (Convolutional Neural Network) inference engine implementation, highlights specific architectural choices, and shows how to integrate and test the design inside TensorFlow. This demonstrates how an HLS flow is used to rapidly design custom CNN accelerators.
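One common step when moving from a 32-bit floating-point model to custom inference hardware is reducing numeric precision. The sketch below illustrates the idea with a dot product that uses 8-bit operands and a 32-bit accumulator. It is only an illustration under stated assumptions, not the implementation shown in the webinar: the helper names `quantize` and `quantized_dot`, the symmetric per-tensor scale, and the 8-bit width are all hypothetical choices. An HLS design would usually use bit-accurate fixed-point types rather than `std::int8_t`, so that the width itself can be explored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: map a float to a signed 8-bit code so that
// real_value is approximately code * scale. Rounds to nearest and saturates.
inline std::int8_t quantize(float x, float scale) {
    assert(scale > 0.0f);
    long q = std::lround(x / scale);
    if (q > 127) q = 127;
    if (q < -128) q = -128;
    return static_cast<std::int8_t>(q);
}

// Hypothetical helper: dot product of weights and activations computed with
// 8-bit operands and a 32-bit integer accumulator, then rescaled to float.
inline float quantized_dot(const std::vector<float>& weights,
                           const std::vector<float>& inputs,
                           float w_scale, float x_scale) {
    assert(weights.size() == inputs.size());
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        const std::int32_t w = quantize(weights[i], w_scale);
        const std::int32_t x = quantize(inputs[i], x_scale);
        acc += w * x;  // integer multiply-accumulate, as a hardware MAC would do
    }
    // The product of the two scales converts the integer sum back to real units.
    return static_cast<float>(acc) * (w_scale * x_scale);
}
```

Comparing `quantized_dot` against a plain floating-point dot product on representative data gives a quick estimate of the accuracy lost at a given bit width, before committing to a hardware architecture.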

From HLS Component to a Working Design

Complex algorithms do not exist in a vacuum. After HLS is used to create an RTL component, that component must be integrated into a larger system to be useful. This means connecting it to other components, a processor, and even software. Once integrated, the system needs to be verified. Verifying the complete environment means checking not just functional correctness but also performance and, in some cases, power. This webinar details approaches to integrating accelerator blocks into processor-based sub-systems, interfacing to software, and verifying the accelerator in the context of the larger system. It also covers deploying the system onto an FPGA prototyping board.

Leveraging HLS IP to Accelerate Design and Verification

To accelerate and ease the adoption of HLS, Catapult provides both building-block HLS IP and various application reference designs, written in C++ or SystemC, that are designed to help deliver optimal quality of results (QoR). This webinar describes the available IP, including the Math and DSP blocks available as open source, and several reference designs, including 2-D convolution for image enhancement and two CNN (tinyYOLO) implementations for real-time object classification.

Siemens: Application-Specific Neural Network Inference

A wide range of solutions for implementing the inference stage of convolutional neural networks is available on the market. Almost all of them follow a generic accelerator approach, which introduces overhead and implementation penalties for a specific network configuration. High-Level Synthesis leverages application- and network-specific optimizations to further improve PPA for specific neural networks or classes of networks. This webinar introduces the design flow, starting from AI/ML frameworks like TensorFlow down to FPGA/ASIC, along with relevant optimization techniques.

SeeCubic®: Catapult HLS for Ultra-D Display Processing

Ultra-D technology provides a solution for glasses-free autostereoscopic displays that can be used in any display application. Real-time conversion of 2D or 3D (left/right) signals to the Ultra-D format is a key component of the Ultra-D implementation. This conversion is based on innovative depth estimation using our patented proprietary algorithms. When our business development team requested the development of an IP block for this function, we faced a Catch-22 situation. How do we design an IP block that is suitable for IC integration without information on the semiconductor technology? Furthermore, how do we design it for multiple technologies and enable integration in multiple products with different on-chip infrastructures? In this webinar, we show how the Catapult® High-Level Synthesis (HLS) development methodology has enabled this IP block development. We will also illustrate how we executed the project with a relatively small team, resulting in a complete FPGA-based validation platform. Finally, we will share some unexpected lessons and reflect on key organizational success factors.

Xperi®: A Designer’s Life with HLS

This webinar discusses two aspects of Xperi's experience going from RTL to HLS. The first is using HLS for algorithms such as Face Detection, which the team knows well and has RTL implementations of for comparison. The second is using HLS to develop new Neural Network accelerators, and how HLS helped the team get from algorithm to critical FPGA demonstrators in a timeframe that would not be possible with a traditional RTL flow.

Who Should View

  • RTL Designers or Project Managers interested in moving up to HLS to improve design and verification productivity.
  • Architects or hardware-aware algorithm developers in the fields of image processing, computer vision, machine learning, and deep learning who are interested in rapid and accurate exploration of power/performance metrics.
  • New project teams with only a few hardware designers and multiple software experts who want to rapidly create high-performance FPGA or ASIC IP for computer vision or deep learning markets.

What You Will Learn

  • How HLS is used to implement a computer vision algorithm in either an FPGA or ASIC technology and the trade-offs for power and performance.
  • How HLS is employed to analyze unique architectures for a very energy-efficient inference solution such as a CNN (Convolutional Neural Network) from a pre-trained network.
  • How to integrate the design created in HLS into a larger system, including peripherals, processor, and software.
  • How to verify the design in the context of the larger system and how to deploy it into an FPGA prototype board.
  • Customers' experiences using HLS for image processing and AI
  • Using HLS for algorithms such as Face Detection with RTL for comparison
  • How to use HLS to develop new Neural Network accelerators
  • How HLS can help get from algorithm to critical FPGA demonstrators faster than with a traditional RTL flow
  • How the Catapult HLS development methodology has enabled IP block development

Meet the speakers

Siemens EDA

Ellie Burns

Former Director of Marketing

Ms. Burns has over 30 years of experience in the chip design and EDA industries in various engineering, applications engineering, technical marketing, and product management roles. She was formerly the Director of Marketing for the Calypto Systems Division at Siemens EDA, responsible for low-power RTL solutions with PowerPro and HLS solutions with Catapult. Prior to Siemens and Mentor, Ms. Burns held engineering and marketing positions at CoWare, Cadence, Synopsys, Viewlogic, Computervision, and Intel. She holds a BSCpE from Oregon State University.

Siemens EDA

Michael Fingeroff

HLS Technologist

Michael Fingeroff has worked as an HLS Technologist for the Catapult High-Level Synthesis Platform at Siemens Digital Industries Software since 2002. His areas of interest include Machine Learning, DSP, and high-performance video hardware. Prior to working for Siemens Digital Industries Software, he worked as a hardware design engineer developing real-time broadband video systems. Mike Fingeroff received his bachelor's and master's degrees in electrical engineering from Temple University in 1990 and 1995, respectively.

Siemens EDA

Russell Klein

HLS Program Director

Russell Klein is a Program Director at Siemens EDA’s (formerly Mentor Graphics) High-Level Synthesis Division, focused on processor platforms. He is currently working on algorithm acceleration by offloading complex algorithms that run as software on embedded CPUs into hardware accelerators using High-Level Synthesis. He has been with Mentor for over 25 years, holding a variety of engineering, marketing, and management positions, primarily focused on the boundary between hardware and software. He holds six patents in the area of hardware/software verification and optimization. Prior to joining Mentor, he worked for Synopsys, Logic Modeling, and Fairchild Semiconductor.

Siemens EDA

David Burnette

Director of Engineering

David Burnette is currently Director of Engineering for the Catapult High-Level Synthesis product of Siemens EDA. He has contributed to the HLS program over the last 26 years, starting first with behavioral synthesis from VHDL followed by C++/SystemC. Much of his recent work has centered around High-Level Verification (designing infrastructure for comparing the untimed C++ against the timed RTL) and the development of class-based C++ HLS IP for math, DSP/Image Processing and Machine Learning. He received his BSEE and MSEE from Virginia Tech and holds 4 patents in the area of HLS methodologies.

Siemens EDA

Herbert Taucher

Head of Research Group

Herbert is responsible for industrial research in electronics at Siemens. His team works on computing architectures and design flows for secure and safe real-time-capable industrial Edge Computing, with a special focus on AI/ML as a compute workload and on leveraging AI/ML in the design flow. Herbert has a 20+ year history in SoC/ASIC/FPGA design.


Bram Riemens


Bram Riemens co-founded SeeCubic B.V. in 2011. This startup develops the Ultra-D technology for 3D viewing without glasses. As System Architect, Bram drives research and development of the Ultra-D signal processing technology. Before SeeCubic, Bram worked for 25 years at Philips and NXP Research, where he contributed to innovative video processing algorithms and systems. His research focused in particular on the interaction between algorithmic optimization and various means of realization (such as general-purpose processors, domain-specific processors, configurable hardware, or function-specific hardware). His work resulted in more than 35 patents and patent applications, contributions to several conference papers, and contributions to commercial ICs. In recent years at SeeCubic, Bram has headed the implementation of Depth Estimation algorithms in a Real-Time Conversion IP block. In this system, the pixel number-crunching is implemented in hardware using Siemens' Catapult technology.


Alexandru Radoi

VLSI engineer

Alexandru Radoi has 7 years of experience as a VLSI engineer at FotoNation (currently XPERI FotoNation).
