On-Demand Webinar

Part 2: Adapting software algorithms to hardware architectures for high performance and low-power

Estimated time: 50 minutes

GPUs and DSPs offer very high parallelism and impressive memory bandwidth within the scope of a fully programmable platform. However, they must fetch and decode every instruction, and they have a relatively fixed architecture, both of which waste energy. The Single-Instruction-Multiple-Data (SIMD) architecture of most high-performance GPUs also reduces performance and energy efficiency when threads take different execution paths (the so-called "divergence" problem).

FPGAs, on the other hand, provide a fully customizable architecture. For example, the precision of each computation can be tailored specifically to the application at hand. Moreover, control is fully application-specific and hardwired. Finally, their memory architecture can be specialized as much as needed, going well beyond the DRAM/SRAM/register hierarchy that DSPs and GPUs provide.

Design costs stemming from low-level RTL design, together with the difficulty of reusing a significant portion of past designs, have been a significant adoption hurdle for FPGAs in rapidly evolving application domains. This has recently changed thanks to the advent of high-level synthesis, which allows a design team to quickly explore one or several highly optimized architectures from essentially the same software model, written e.g. in CUDA or OpenCL, that has been used to implement the same algorithm on a CPU or GPU. A very broad set of highly optimized low-level libraries written in these languages (e.g. cuBLAS, cuDNN) is available to ease the task of accelerating machine learning, computer vision, image recognition, database search, and other applications on FPGAs.

Who should attend:

  • Programmers interested in learning how to efficiently implement highly
    parallel applications on FPGAs

What you will learn:

  • Code optimization strategies to efficiently map OpenCL and C++ code
    on FPGAs
  • Difference between memory architectures on GPUs and FPGAs
  • How to select the best platform for the application at hand
  • Migration path to ASICs when the algorithms are consolidated and
    manufacturing costs must be reduced

About the Presenter

Politecnico di Torino

Luciano Lavagno

Professor

Luciano Lavagno received his Ph.D. in EECS from UC Berkeley in 1992. He has co-authored four books and over 200 scientific papers. He was the architect of the POLIS HW/SW co-design tool and one of the architects of the Cadence CtoSilicon high-level synthesis tool. He is a professor at Politecnico di Torino, Italy, and a consultant for the Catapult High-Level Synthesis group of Siemens EDA. His research interests include high-level synthesis, HW/SW co-design, and design tools for wireless sensor networks.

Related Resources

Rapid Algorithm to HW: Using HLS for Computer Vision and Deep Learning Seminar
Webinar

How HLS helps project teams rapidly and accurately explore the power and performance of algorithms, quickly get to FPGA implementations to create demonstrators and prototypes, and use the same source RTL IP for ASIC implementation.

Machine Learning at the Edge: Power and Performance Optimization with HLS
White Paper

Moving machine learning to the edge imposes critical power and performance requirements. Using off-the-shelf commodity solutions is not practical.

The AI Accelerator Ecosystem: An Overview
White Paper

The Catapult HLS platform offers an AI accelerator ecosystem that gives AI designers a running start on their projects. This ecosystem provides working engineers with resources ranging from IP libraries to complete toolkits.