on-demand webinar

Customers Discuss their Real-World Use of High-Level Synthesis

Estimated Watching Time: 356 minutes


Leading experts from key companies present how they have successfully deployed HLS in production design flows. Companies presenting are: Google, NASA-JPL, NVIDIA, NVIDIA Research, NXP Semiconductors, STMicroelectronics, and Viosoft.

The focus of this seminar is to have real-world customers present their successes using Catapult High-Level Synthesis (HLS) in markets such as Automotive, 5G/Communications, Video/Imaging, AI/ML, and MEMS Sensors. The presenting companies are:

  • Google (Video/Imaging)
  • NASA-JPL (Video/Imaging)
  • NVIDIA (Video/Imaging)
  • NVIDIA Research (AI/ML)
  • NXP Semiconductors (Automotive)
  • STMicroelectronics (MEMS Sensors)
  • Viosoft (5G/Communications)

In this seminar, leading experts will present how they have successfully deployed HLS in production design flows.

Seminar Sessions:

Seminar Introduction and Market Overview
This session will provide an overview of the seminar and touch on HLS' growth in the markets that will be covered today.
Speaker: Stuart Clubb

HLS 101
HLS enables designers to rapidly go from a high-level description in C++/SystemC to optimized RTL. This introduction will show the basics around how High-Level Synthesis and Catapult HLS can be used to synthesize to optimal RTL for a production design flow.
Speaker: Michael Fingeroff

NXP: Way to HLS - Our Transition Towards HLS
This presentation shows why we decided to move our methodology toward High-Level Synthesis with Catapult. We outline our chosen design and verification flow, together with the power estimation and optimization steps we use. We show examples from a real-life design and share our experiences. The presentation concludes with our next steps.
Speaker: NXP - Reinhold Schmidt

Google's Video Coding Unit (VCU) Accelerator
Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and - with the slowing of Moore’s law - specialized hardware accelerators to deliver more computing at higher efficiencies. This session describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our approach to using High-Level Synthesis to design our hardware for deeper architecture evaluation and verification, including a new accelerator building block - the video coding unit (VCU) - and discuss key design trade-offs for balanced systems at data center scale and co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks and improved failure management, and new workload capabilities not otherwise possible with prior systems.
Speaker: Google - Aki Kuusela

Intro to MatchLib
MatchLib is an open-source library written in SystemC and C++, originally created by NVIDIA Research, that enables much faster design and verification of SoCs using HLS. One of the primary objectives of MatchLib is easier performance-accurate modeling of SoCs, which enables designers to find system-level performance bottlenecks far sooner in their design cycle. This session will introduce MatchLib and show how it enables designers to identify and resolve issues such as bus and memory contention, arbitration strategies, and optimal interconnect structure at a much higher level of abstraction than RTL.
Speaker: Michael Fingeroff

NVIDIA Research: Automated Physically-Aware Interconnect Modeling and Generation using IPA
Interconnect design is a critical part of many highly complex SoCs, yet HLS has not historically been used for chip-level interconnect. One major limiter is that interconnect architecture and physical floorplan are tightly coupled and can be difficult to estimate early in the design process.
We demonstrate IPA (Interconnect Prototyping Assistant) to help address this gap. IPA is an open-source framework (available at https://github.com/NVlabs/IPA) for interconnect prototyping and implementation in HLS-based SoC flows, written in SystemC and Python. IPA is used during early architectural prototyping by abstracting specifics of interconnect implementation. IPA then generates interconnect models, including interfaces, for cycle-accurate SystemC simulations. If the design requires long wires between communication units, IPA automatically inserts retiming stages to meet clock frequency targets. IPA’s SystemC code is fully HLS-compatible for RTL creation, and thus can be used within a full-chip HLS flow for pushbutton interconnect generation once a design point is selected.
Speaker: NVIDIA Research - Nathaniel Pinckney

Day 2 Intro
Speaker: Stuart Clubb

NVIDIA: Applying Catapult HLS Design and Verification to NVIDIA’s Large-Scale Video Codecs: Benefits and Challenges
Switching the whole IP design to an HLS methodology brought the design team significant benefits in coding effort and simulation runtime. However, it also posed some challenges, such as how to handle such a big design efficiently and achieve the same design quality as handwritten RTL. This presentation will cover how we met these challenges and magnified the benefits of HLS for our designs.
Speaker: NVIDIA - Hai Lin

STMicroelectronics: High-Level Synthesis in Analog Native Products Upward Trend
STMicroelectronics has been using Catapult High-Level Synthesis (HLS) for the last 10 years. It was first used in large digital-on-top ASIC implementations, with the aim of designing complex arithmetic structures quickly and efficiently. In this presentation we show that Catapult also delivers great benefits in AMS applications with growing digital content. STMicroelectronics is a leader in adding intelligence to its sensor and actuator products, but the challenge is now moving to analog-native products, where adding processing to analog paves the way to new applications and allows system optimization from a computation and power perspective. This presentation shows two examples of blocks where Catapult can be a booster for AMS products: a contactless thermometer formula and a general-purpose ASK demodulator. The use of HLS enabled last-minute functional changes without impacting the timeline. The digital block can be integrated on silicon easily, quickly, and effectively.
Speaker: STMicroelectronics - Sandro Dalle Feste

NASA-JPL: Pros and Cons of a C++ Flow vs. a SystemC Flow for the Harris Corner Detector
To understand which Catapult HLS flow is better suited to our needs at JPL, we implemented a Harris Corner Detector image processing core in both untimed C++ and timed SystemC. Untimed C++ was easy to get started with. After modifying the algorithmic model to synthesize with Catapult HLS, we plugged the design back into algorithmic regression and verified that we had not introduced bugs. However, with this flow we encountered unexpected behaviors in the synthesized RTL. The C++ design masked these problems due to its untimed nature. Wanting a higher level of control over the synthesized hardware, we implemented the same design in SystemC. The SystemC design synthesized to our expectations, but it required more effort to design and verify. In this presentation I will present the lessons learned from each flow.
Speaker: NASA-JPL - Ashot Hambardzumyan

Viosoft: Functional Exploration and Offloading of 5G Physical Layer Protocol Stack
The 3GPP Radio Layer 1 Protocol Stack, also known as RAN1, encodes and modulates signals and performs MIMO and beamforming, along with other compute-intensive functions such as error correction, rate matching, mapping, and RF processing. L1 functions can be hosted on general-purpose compute, FPGA, or ASIC, with respective trade-offs in power, performance, density, and cost. Implementing L1 functions in a common high-level language lets designers explore trade-offs that can lead to the optimal deployment for their network capacity, technical, and market constraints.
Speaker: Viosoft - Hieu Tran

Closing Session
Summary, what we learned and where to get started
Speaker: Stuart Clubb

Meet the speakers

Siemens EDA

Stuart Clubb

Technical Product Management Director

Siemens EDA

Michael Fingeroff

HLS Technologist

Michael Fingeroff has worked as an HLS Technologist for the Catapult High-Level Synthesis Platform at Siemens Digital Industries Software since 2002. His areas of interest include Machine Learning, DSP, and high-performance video hardware. Prior to working for Siemens Digital Industries Software, he worked as a hardware design engineer developing real-time broadband video systems. Mike Fingeroff received both his bachelor's and master's degrees in electrical engineering from Temple University in 1990 and 1995 respectively.

NXP Semiconductors

Reinhold Schmidt

Digital Designer

My main focus is digital signal processing, where I work on digital baseband processing systems for RF communication devices.

I used to work on decimation chains, narrowband interference cancellation, and analog mismatch compensation. I now focus more on the subsystem level, meaning the complete baseband design of ultra-wideband radios.


Google

Aki Kuusela

Senior Engineering Manager, Consumer Hardware

Aki Kuusela is an Engineering Manager at Google's Devices & Services unit. He has an M.S. in electrical engineering from University of Oulu, Finland, and has worked on video compression and various hardware accelerators for more than 20 years. He joined Google in 2010 and has participated in the development of the open video formats VP9, AV1 and AV2 within the Alliance for Open Media. He is a strong supporter of high-level synthesis design flows, having taped out his first HLS design in 2014. At Google he has worked on both data center and consumer chips, including the VCU ASIC and the Tensor SoC.

NVIDIA Research

Nathaniel Pinckney

Senior Research Scientist

Nate Pinckney received his B.S. degree in engineering from Harvey Mudd College in 2008, and his Ph.D. in electrical engineering from the University of Michigan in 2015. He has authored or coauthored over 40 publications in the areas of high-level synthesis methodologies, low-power VLSI design, and cryptographic accelerators. In 2015, he joined NVIDIA in Austin, TX.


NVIDIA

Hai Lin

Senior ASIC Design Engineer

Hai Lin graduated with a master's degree in electrical engineering from Shanghai Jiao Tong University in 2008 and joined the NVIDIA Shanghai R&D site as ASIC verification lead for the GPU NoC unit. He has worked on the Shanghai video design team since 2016, leading development of the video JPEG engine. Since 2018 he has developed the HLS design flow and led the HLS design methodology team at the NVIDIA Shanghai R&D site.


STMicroelectronics

Marco Castellano

Digital Design Manager

Marco Castellano earned his Laurea degree from the University of Pavia, Italy, in 2005. He continued his studies at the same university, completing a Ph.D. in electrical engineering in 2009. His doctoral research was a collaborative effort between the University of Pavia and STMicroelectronics, focusing on the creation of high-speed, low-power arithmetic circuits for Digital Signal Processing. In 2016, Marco advanced to a leadership role within an R&D team at STMicroelectronics, where he was instrumental in driving the innovation of embedded processing in analog products and sensors. His contributions to the field are documented in a series of published papers, presentations at scientific conferences, and a portfolio of patents, all centered around the integration of algorithms.


NASA-JPL

Ashot Hambardzumyan

FPGA Engineer

Ashot received his B.S. in Computer Engineering from California Polytechnic University of Pomona and his M.S. in Computer Science from Georgia Institute of Technology. He is the principal investigator for High-Level Synthesis in Jet Propulsion Laboratory's Autonomous Systems division. His experience includes serving as FPGA verification lead for the Mars Ingenuity Helicopter and as FPGA designer on the Entry, Descent, and Landing vision accelerator. Currently, he is in charge of computer vision FPGA accelerators for JPL's Mars Sample Return project.


Viosoft

Hieu Tran


At Viosoft and SoC.one, Hieu founded and led projects and initiatives in diverse verticals ranging from edge compute, IoT/5G, and digital transformation of telco to blockchain for communication and ad-tech.

An advocate of concepts that bring consumers more choices, better pricing, and active participation, Hieu has collaborated with talented, multinational teams of engineers to conceptualize, develop, and deploy disaggregated services and infrastructures. Hieu has had the privilege of tackling many of the leading challenges in broad areas of tech, and enjoys the clarity of learning from customers and collaborating technically with team members. He is a believer in using these experiences to align team resources and competencies with customers' needs to reach achievable milestones.

A UCLA engineering graduate, Hieu attended a Stanford graduate program while serving as a member of the technical staff at Tandem Computers (HPE), where he benefited from the guidance and leadership of tech pioneers in transaction processing, databases, distributed computing, and virtualization. Prior to Viosoft, Hieu led efforts at Integrated Systems (Wind River) in areas covering real-time software systems and tools for networking and automotive design.
