ICS 2021 tutorial

Bambu: High-level synthesis for parallel programming

ICS21 – International Conference on Supercomputing
June 14 – 17, 2021. Virtual event

Abstract

Applications operating on very large datasets exhibit behaviors, such as fine-grained, unpredictable memory accesses and highly unbalanced task-level parallelism, that make existing high-performance general-purpose processors and accelerators (e.g., GPUs) suboptimal. To address these issues, research and industry are developing a variety of custom accelerator designs for this application area, including solutions based on reconfigurable devices (Field Programmable Gate Arrays, FPGAs). These new approaches often employ High-Level Synthesis (HLS) to speed up the development of the accelerators. This tutorial will discuss the impact of FPGAs on High-Performance Computing, focusing on applications in the areas of data analytics and machine learning. The tutorial will dive into approaches for the HLS of parallel applications, highlighting key methodologies, trends, and benefits, as well as gaps that still need to be closed. The tutorial will provide hands-on experience with Bambu, one of the most advanced HLS tools available. Able to support the majority of C constructs, Bambu integrates with many logic synthesis and simulation tools and, starting from parallel code annotated with OpenMP, generates accelerators for FPGAs from a variety of vendors. It also optimizes the memory architectures of the generated accelerators.

Tutorial topics covered

The tutorial will first provide an overview of the current state of tools and platforms for FPGA acceleration, discussing in particular the relevant applications and workloads for HPC. In the introductory part, we will make the case for the use of FPGAs, in particular for memory-bound and memory-intensive applications from the data analytics and machine learning areas, and also highlight where they can be relevant for more conventional scientific simulation workloads. We will then discuss where Bambu is positioned in the spectrum of available tools, introducing its unique characteristics and features. Focusing on Bambu, we will present the synthesis approaches and architectural templates used to extract and manage task-level parallelism, and the techniques to support complex memory patterns and parallel memory subsystems. The hands-on part of the tutorial will teach the audience how to install the tool and use it with a set of relevant application kernels (in particular, graph kernels). The audience will learn how to generate and optimize accelerators starting from parallel specifications, how to optimize the memory architecture, how to verify the generated accelerators and the designs obtained by integrating third-party IP, and how to configure the flow to target different brands of FPGAs, different boards, and different simulation infrastructures. The last part of the tutorial will provide an overview of current trends and future research directions for reconfigurable hardware in High-Performance Computing.
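As an illustration of the kind of input described above (a graph kernel with irregular memory accesses, written as parallel C code annotated with OpenMP), consider the following minimal sketch. It is not taken from the tutorial exercises: the function name `neighbor_sum`, its parameters, and the CSR graph layout are assumptions made for this example, and whether an HLS tool accepts this exact pragma form depends on the tool version and its options.

```c
/* Hypothetical example kernel, not part of the tutorial material.
 * The graph is stored in CSR form: the neighbors of vertex v are
 * col_idx[row_ptr[v]] .. col_idx[row_ptr[v + 1] - 1].
 * For every vertex, the kernel sums the values of its neighbors. */
void neighbor_sum(const int *row_ptr, const int *col_idx,
                  const int *value, int *out, int num_vertices)
{
    int v;
    /* Each iteration writes only out[v], so iterations are independent.
     * A compiler without OpenMP support ignores the pragma and runs
     * the loop serially, producing the same result. */
#pragma omp parallel for
    for (v = 0; v < num_vertices; v++) {
        int sum = 0;
        int e;
        /* The number of neighbors varies per vertex (unbalanced work),
         * and value[col_idx[e]] is a data-dependent, irregular access. */
        for (e = row_ptr[v]; e < row_ptr[v + 1]; e++) {
            sum += value[col_idx[e]];
        }
        out[v] = sum;
    }
}
```

The inner loop shows the two properties mentioned in the abstract: the trip count differs from vertex to vertex, and the address of each load is known only at run time.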

Tutorial exercises are available at: https://github.com/ferrandi/PandA-bambu/tree/tutorial_2021/documentation/tutorial_ics_2021

Tutorial slides:

Organizers and Short Bios

Fabrizio Ferrandi, Associate Professor, Politecnico di Milano, Italy
Fabrizio Ferrandi (Member, IEEE) received the Laurea (cum laude) degree in electronic engineering and the Ph.D. degree in information and automation engineering (computer engineering) from the Politecnico di Milano, Milan, Italy, in 1992 and 1997, respectively. He was an assistant professor with the Politecnico di Milano until 2002. Currently, he is an associate professor with the Dipartimento di Elettronica, Informazione e Bioingegneria of the Politecnico di Milano. His research interests include synthesis, verification, simulation, and testing of digital circuits and systems. He has been a member of the IEEE Computer Society since 1995, and is a member of the Test Technology Technical Committee and the European Design and Automation Association.

Serena Curzel, Ph.D. student, Politecnico di Milano, Italy
Serena Curzel received the B.S. degree in Electronics and Telecommunication Engineering from Università degli studi di Trento, Italy, in 2016 and the M.S. degree in Electronics Engineering from Politecnico di Milano, Italy, in 2019, where she is currently pursuing the Ph.D. degree in Information Technology. Her main research interests are FPGA acceleration of domain-specific applications (including Deep Neural Networks) and High-Level Synthesis. Since 2019, she has also been in charge of software development for the HERMES-SP CubeSat project.

Michele Fiorito, Research Assistant, Politecnico di Milano, Italy
Michele Fiorito received the M.S. degree in Computer Science Engineering from Politecnico di Milano, Italy, in 2020, where he is currently working as a research assistant to support software development for the HERMES-SP CubeSat project. His main research interests are the design of High-Level Synthesis tools and approximate computing.

Vito Giovanni Castellana, Senior Research Scientist, Pacific Northwest National Laboratory, United States of America
Dr. Vito Giovanni Castellana received the M.S. degree in Informatic Engineering in 2010 and the Ph.D. degree in Computer Engineering in 2014, both from Politecnico di Milano in Italy. Since February 2014, he has been a research scientist in PNNL’s High-Performance Computing group. He joined PNNL in 2012 as a post-master research associate. His research interests are embedded systems and computer architectures, design automation, and HPC.

Marco Minutoli, Research Scientist, Pacific Northwest National Laboratory, United States of America
Marco Minutoli received the M.S. degree in Informatic Engineering in 2014 from Politecnico di Milano in Italy. Since February 2016, he has been a research scientist in PNNL’s High-Performance Computing group. He joined PNNL in 2014 as a post-master research associate. Since 2016, he has been a Ph.D. candidate in Computer Science at Washington State University. His research interests are focused on the design and analysis of data structures and graph algorithms for high-performance and big data applications.

Antonino Tumeo, Senior Research Scientist, Pacific Northwest National Laboratory, United States of America
Dr. Antonino Tumeo received the M.S. degree in Informatic Engineering in 2005 and the Ph.D. degree in Computer Engineering in 2009, both from Politecnico di Milano in Italy. Since February 2011, he has been a research scientist in PNNL’s High-Performance Computing group. He joined PNNL in 2009 as a post-doctoral research associate. Previously, he was a post-doctoral researcher at Politecnico di Milano. His research interests are modeling and simulation of high-performance architectures, hardware-software codesign, FPGA prototyping, and GPGPU computing.
