All posts by Fabrizio Ferrandi

Fabrizio Ferrandi received his Laurea (cum laude) in Electronic Engineering in 1992 and the Ph.D. degree in Information and Automation Engineering (Computer Engineering) from the Politecnico di Milano, Italy, in 1997. He joined the faculty of Politecnico di Milano in 1999 as "Ricercatore" and later in 2002 as Associate Professor with the Dipartimento di Elettronica, Informazione e Bioingegneria. He has published over 100 papers. His research interests include synthesis, verification, simulation and testing of digital circuits and systems. Fabrizio Ferrandi is a Member of IEEE, of the IEEE Computer Society, of the Test Technology Technical Committee and of European Design and Automation Association – EDAA. Some references to my publications: DBLP Google Scholar

PandA 0.9.6 released

New features introduced:

  • Added support to BRAVE FPGAs. Both NG-Medium and NG-Large are supported.
  • Added support to TASTE model-based design flow. A new option has been added to activate the customized HLS flow: –experimental-setup=BAMBU-TASTE.
  • Added support for Hardware Discrepancy Analysis. Reference paper: Pietro Fezzardi, Marco Lattuada, Fabrizio Ferrandi, Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis. ACM Trans. Embedded Comput. Syst. 16(5): 149:1-149:19 (2017).
  • Added support for OpenMP for. Reference papers: M. Minutoli, V. G. Castellana, A. Tumeo, M. Lattuada, and F. Ferrandi, “Efficient Synthesis of Graph Methods: A Dynamically Scheduled Architecture,” in Proceedings of the 35th International Conference on Computer-Aided Design, New York, NY, USA, 2016, p. 128:1–128:8. V. G. Castellana, M. Minutoli, A. Morari, A. Tumeo, M. Lattuada, and F. Ferrandi, “High level synthesis of RDF queries for graph analytics,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015, pp. 323-330.
  • Customized Dynamic HLS flow updated. Reference paper: M. Lattuada and F. Ferrandi, “A Design Flow Engine for the Support of Customized Dynamic High Level Synthesis Flows”, ACM Trans. Reconfigurable Technol. Syst. 12, 4, Article 19 (October 2019), 26 pages.
  • Added initial support to C++/fortran high-level synthesis. Now, Ubuntu distributions require the installation of g++-multilib package.
  • Added support to CLANG/LLVM compiler version 4, 5, 6 and 7. On multiple files, LTO is exploited.
    Added support to high-level synthesis of spec in LLVM format (i.e., file with .ll extension).
  • Added to CLANG/LLVM based analysis an interprocedural value range analysis based on the following paper: Fernando Magno Quintao Pereira, Raphael Ernani Rodrigues, and Victor Hugo Sperle Campos. “A fast and low-overhead technique to secure programs against integer overflows”, In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (CGO ’13). Washington, DC, USA, 1-11. The implementation started from code available at this link:https://code.google.com/archive/p/range-analysis/. The original code went through a deep revision and it has been changed to be compatible with a recent version of LLVM. Some extensions have been also added:
    • Added anti range support.
    • Redesigned many Range operations to take into account wrapping and to improve the reductions performed.
    • Integrated the LLVM lazy value range analysis.
    • Added support to range value propagation of load from constant arrays.
    • Added support to range value propagation of load and store from generic arrays.
  • Added to CLANG/LLVM based analysis an Andersen based pointer analysis. The specific version used is described in: “The Ant and the Grasshopper: Fast and Accurate Pointer Analysis for Millions of Lines of Code”, by Ben Hardekopf & Calvin Lin, in PLDI 2007. The BDD library used is the Buddy BDD package By Jørn Lind-Nielsen. Users working on a Debian/Ubuntu distribution can install the libbdd-dev package.
  • Added support to GCC compiler version 8.
  • Added support to mingw-w64. A Windows-7 64bit binary distribution is now available. This minimal msys2 binary distribution includes bambu, GCC8 with plugin and multilib enabled and a customized version of clang7.
  • Added support to Mac OSX exploiting Mac Ports project (https://www.macports.org/)
  • Improved the vagrant based virtual machine generation scripts. Such scripts now create an Ubuntu 16.04 32bit VirtualBox image and an Ubuntu 18.04 64bit VirtualBox image.
  • Added Vagrant scripts for Ubuntu precise, trusty,xenial and bionic, Fedora 29, CentOs7, MacOSX MacPorts.
    Improved timing constraints and timing reports.
  • Added –registered-inputs=top option.
  • Improved the support for discrepancy when Verilator is used as simulator and a better check of basic floating-point operations.
  • Added an example using C++14 constexpr declaration and a simple GCD example written in C++.
  • The building system is now based on a single configure.ac.
    Regressions are now exploiting a Jenkins based infrastructure.
  • Some performance, style, and c++ improvements have been done following the suggestions coming from cppcheck, clang static analyzer, and Codacy.
  • A single-precision floating-point faithfully rounded powf function has been added. It follows the method published in: Florent De Dinechin, Pedro Echeverria, Marisa Lopez-Vallejo, Bogdan Pasca. Floating-Point Exponentiation Units for Reconfigurable Computing. ACM Transactions on Reconfigurable Technology and Systems (TRETS), ACM, 2013, 6 (1), pp.4:1–4:15. The powf function currently does not supports subnormals.
  • Ported some parts of the code to C++11 standard by applying clang-tidy modernize-deprecated-headers, modernize-pass-by-value, modernize-use-auto, modernize-use-bool-literals, modernize-use-equals-default, modernize-use-equals-delete, modernize-loop-convert and modernize-use-override.
  • Added a SRT4 implementation. Added –hls-fpdiv to select which floating-point division will be used. Current options: SRT4 for Sweeney, Robertson, Tocher floating-point division with radix 4 and G for Goldschmidt floating point division.
  • Added support to ac_types and ac_math library from Mentor Graphics. Concerning the original library, the bambu/PandA library does support both LLVM/Clang and GCC compiler and it requires c++14 standard for the compilation. Initial support to ap_* objects used by VIVADO HLS has been added as well.
  • Added support to top design interfaces: none (ap_none), acknowledge (ap_ack), valid (ap_vld), ovalid (ap_ovld), handshake (ap_hs), fifo (ap_fifo) and array (ap_memory). These interfaces are activated by pragmas added to the source code and when the option –generate-interface=INFER is passed to bambu. Examples of pragma use can be found in panda_regressions/hls/bambu_specific_test4.
  • Added some examples taken from https://github.com/Xilinx/HLx_Examples
  • Added option –clock-name, –reset-name, –start-name and –done-name to specify the top component controlling signal names.
  • Renamed top component and removed the suffix _minimal_interface. Now, the top component has the same name of the top function.
  • Improved and fixed the synthesis of empty functions.
  • Added option –VHDL-library=libraryname to specify the library in which the synthesized function has to be compiled.
  • Extended the VHDL support to other bambu library components.
  • If it is supported, the compiler used to compile the PandA framework uses the c++17 standard (-std=c++17), otherwise, it uses c++11 standard.
  • Integrated Mockturtle library from EPFL (https://github.com/lsils/mockturtle) to simplify LUT-based expressions.
  • Integrated Abseil C++ libraries (https://abseil.io/)
  • Added examples of parallel_queries from ICCAD15 and ICCAD16 papers.
  • Added FPT tutorial material.
  • Added PNNL19 tutorial material.
  • Added PACT19 tutorial material.
  • Improved makefile parallelization.
  • Improved bambu determinism: two runs with the same specs produce the same HDL output.
  • Fixed COND_EXPR_RESTRUCTURING step
  • Fixed omp tests
  • Fixed libm tests
  • Fixed make check and cpp examples
  • Fixed c++ struct initialization
  • Fixed floating point division
  • Fixed make dist command under Github repository

PandA 0.9.5 released

New features introduced:
– Added support to GCC 6 and GCC 7 (GCC 4.9 is still the preferred GCC compiler).
– Added support to bitfields.
– Added support for pointers and memory operations to the Discrepancy Analysis. Reference paper: Pietro Fezzardi and Fabrizio Ferrandi, “Automated bug detection for pointers and memory accesses in High-Level Synthesis compilers”, in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
– Added new option: –discrepancy-only=comma,separated,list,of,function,names
Restricts the discrepancy analysis only to the functions whose name is in the list passed as argument.
– Added new option: –discrepancy-permissive-ptrs
Do not trigger hard errors on pointer variables.
– added preliminary support to TASTE integration. Reference paper: M. Lattuada, F. Ferrandi, and M. Perrotin, “Computer Assisted Design and Integration of FPGA Accelerators in Aerospace Systems,” in Proceedings of the IEEE Aerospace Conference, 2016, pp. 1-11.
– Added support to OpenMP SIMD. Reference paper: M. Lattuada and F. Ferrandi, “Exploiting Vectorization in High Level Synthesis of Nested Irregular Loops,” Journal of Systems Architecture, vol. 75, pp. 1-14, 2017.
– Default golden reference is now input C code without any modification.
– The options –synthesize and –objectives have been removed. Now the same values passed with –objectives can
be directly passed through the option –evaluation.
– Improved the precision and the effectiveness of the Bit Value analysis and optimizations.
– Improved detection of irreducible loops.
– Improved CSE.
– Added a frontend transformations that merge some operations into FPGA LUTs.
– Now frontend explicitly introduces function calls to softfloat functions.
– Added support to block RAM with latency = 3 (–high-latency=4).
– Added bambu option –fsm-encoding=[auto,one-hot,binary].
– Added a new option: –disable-reg-init-value
Used to remove the INIT value from registers (useful for ASIC designs)
– Improved mapping of multiplications on DSPs.
– Added a GCC plugin to apply the whole program optimization starting from the topfname function instead of main function (currently only GCC 4.9 is supported).
– Added further integer division algorithms:
– non-restoring division with unrolling factor equal to 1 (–hls-div=nr1) which becomes the default division algorithm.
– non-restoring division with unrolling factor equal to 2 (–hls-div=nr2)
– align divisor shift dividend method (–hls-div=as)
– Added a specialization of the integer division working with 64bits dividend and 32bits divisor.
– Single precision floating point faithfully rounded expf and logf functions implemented following the HOTBM method published by
– Jeremie Detrey and Florent de Dinechin, “Parameterized floating-point logarithm and exponential functions for FPGAs”, Microprocessors and Microsystems, vol.31,n.8, 2007, pp.537-545.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sin, cos, sincos and tan functions implemented following the HOTBM method published by
– Jeremie Detrey and Florent de Dinechin, “Floating-point Trigonometric Functions for FPGAs” FPL 2007.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sqrt function implemented following the method published by
– Florent de Dinechin, Mioara Joldes, Bogdan Pasca, Guillaume Revy: Multiplicative Square Root Algorithms for FPGAs. FPL 2010: 574-577
The code has been exhaustively tested and it supports subnormals.
– Implemented the port swapping algorithm as described in the following paper:
– Hao Cong, Song Chen and T. Yoshimura, “Port assignment for interconnect reduction in high-level synthesis,” Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, Hsinchu, 2012, pp. 1-4.
– Improved support to structs passed by copy.
– Improved ROM identification.
– Added a new option: –rom-duplication
Assume that read-only memories can be duplicated in case timing requires.
– Improved memory initialization.
– Added some transformations that lowered some memcpy and memset call to simple instructions.
– Improved softfloat functions for basic single and double precisions operations: sum, sub, mul and division.
Now addition and subtraction operations correctly manage operand equal to +0 and -0.
– Added three options to control which softfloat and libm libraries are used: –softfloat-subnormal, –libm-std-rounding and –soft-fp.
– Fixed builtin isnanf.
– Added double precision implementation of libm round function.
– Added __builtin_lrint, __builtin_llrint, __builtin_nearbyint to libm library.
– Fixed and improved tgamma and tgammaf function.
– Added support to parallel compilation of bambu libraries.
– Added support to the automatic configuration of newer releases of Quartus for IntelFPGAs.
– Improved verilator detection.
– Improved libicu detection.
– Improved boost filesystem macro.
– Fixed problems due to -m32 under arch linux.
– Fixed compilation problems with glpk and ubuntu 14.04.
– Fixed a problem with long double. They now have the same size of double.
– Added support to Mentor Visualizer.
– Improved components characterization and timing models.
– Extended support to VHDL.
– Now VHDL modelsim simulation uses 2008 standard.
– Extended set of synthesis scripts and synthesis results.
– Improved area reporting for Virtex4 devices.
– Improved characterization of asynchronous RAMs.
– Fixed extraction of slack delay from ISE trce and Lattice reports.
– Fixed yosys backend wr.r.t the newer Vivado releases.
– Added SLICES to the set of data collected by characterization.
– Extended set of regression tests.

Quality of results of this release on different target FPGAs could be found at:
CHStone QoR.
libm QoR.
Basic FP operations QoR.

For any information or bug report, please write to panda-info@polimi.it or visit the google group page.

PandAxICT5

2 minutes for a pitch at H2020 Info Day http://panda.dei.polimi.it/wp-content/uploads/ICT05-PandA.pdf https://ec.europa.eu/digital-single-market/en/news/h2020-info-day-factories-future-12-ict-5-and-ict-31-ict-innovation-manufacturing-smes-i4ms #ICT5 #DSMeu #UE #PandA4Design

PandA 0.9.3 released

New features introduced:
– general improvement of performances of generated circuits
– added full support to GCC 4.9 family which is now the default
– improved retrieving of GCC alias analysis information
– added first version of VHDL backend
– added support to CycloneV
– added support to Artix7
– extended support to Virtex7 boards family
– added option –top-rtldesign-name that controls which is the function to be synthesized by the RTL backed
– it is now possible to write the testbench in C instead of using the xml file
– added a first experimental backend to yosys (yosys link )
– added examples/crc_yosys which tests yosys backend and C based testbenches
– improved Verilog testbench generation: it is now fully compliant with cycle based simulators (e.g., VERILATOR)
– added option –backend-script-extensions to pass further constraints to the RTL synthesis (e.g., pin assignment)
– added examples/VGA showing how to integrate existing HDL based IPs in a real FPGA design
– added scripts and results for CHStone synthesis of Lattice based designs
– improved support of complex numbers
– single precision soft-float functions redesigned: now –soft-float is the default and –flopoco becomes optional
– single precision floating point division implemented exploiting Goldshmidt algorithm
– improved synthesis of libm functions
– improved libm regression test
– improved architectural timing model
– improved graphviz representation of FSMs: timing information has been added
– added option –post-rescheduling to further improve the resource usage
– parameter registering is now performed and it can be controlled by using option –registered-inputs
– added a full implementation of Bit Value analysis and coupled with Value Range analysis performed by GCC
– added option –experimental-setup to control bambu defaults:
* BAMBU-PERFORMANCE-MP – multi-port performance oriented setup
* BAMBU-PERFORMANCE – single port performance oriented setup
* BAMBU-AREA-MP – multi-port area oriented setup
* BAMBU-AREA – single-port area oriented setup
* BAMBU – no specific optimizations enabled
– improved code speculation
– improved memory localization
– added option –do-not-expose-globals making possible localization of globals, as it is similarly done by some commercial tools
– added support of high latency memories and of distributed memories: zero, one and two delays memories are supported
– added option –aligned-access to drive the memory allocation towards more simple block RAM models: it can be used under some restricted assumptions (e.g., no vectorization and no structs used)
– ported the GCC algorithm which rewrites a division by a constant in adds and shifts
– added option –hls-div that maps integer divisions and modulus on a C based implementation of the Newton-Raphson algorithm
– improved technology libraries management:
* technology libraries and contraints are now managed in a independent way
* multiple technology libraries can be provided to the tool at the same time
– improved and parallelized PandA test regression infrastructure
– added support to Centos7, fedora 21, Ubuntu 14.04 and Ubuntu 14.10 distributions
– complete refactoring of output messages

Problems fixed:
– fixed problem related to Bison 2.7
– fixed reinstallation of PandA in a different folder
– fixed installation problems on systems where boost and gcc are not installed in default locations
– removed some implicit conversions from generated verilog circuits

For any information or bug report, please write to panda-info@elet.polimi.it or to

PandA 0.9.2 released

PandA 0.9.2 new features introduced:
– added an initial support to GCC 4.9.0,
– stable support to GCC versions: v4.5, v4.6, v4.7 (default) and v4.8,
– added an experimental support to Verilator simulator,
– new dataflow dependency analysis for LOADs and STOREs; we now use GCC alias analysis to see if a LOAD and STORE pair or a STORE and STORE pair may conflict,
– added a frontend step that optimizes PHI nodes,
– added a frontend step that performs conditionally if conversions,
– added a frontend step that performs simple code motions,
– added a frontend step that converts if chains in a single multi-if control construct,
– added a frontend step that simplifies short circuits based control constructs,
– added a proxy-based approach to the LOADs/STOREs of statically resolved pointers,
– improved EBR inference for Lattice based devices,
– now, memory models are different for Lattice, Altera, Virtex5 and Virtex6-7 based devices,
– updated FloPoCo to a more recent version,
– now, register allocation maps storage values on registers without write enable when possible,
– added support to CentOS/Scientific Linux distributions,
– added support to ArchLinux distribution,
– added support to Ubuntu 13.10 distribution,
– now, testbenches accept a user defined error for float based computations; the error is specified in ULPs units; a Unit in the Last Place is the spacing between floating-point numbers,
– improved architectural timing model,
– added a very simple symbolic estimator of number of cycles taken by a function, it mainly covers function without loops and without unbounded operations,
– general refactoring of automatic HLS testbench generation,
– added support to libm function lceil and lceilf,
– added skip-pipe-parameter option to bambu; it is is used to select a faster pipelined unit (xilinx devices have the default equal to 1 while lattice and altera devices have the default equal to 0),
– improved memory allocation when byte-enabled write memories are not needed,
– added support to variable lenght arrays,
– added support to memalign builtin,
– added EXT_PIPELINED_BRAM memory allocation policy, bambu with this option assumes that a block ram memory is allocated outside the core (LOAD with II=1 and latency=2 and STORE with latency=1),
– added 2D matrix multiplication examples for integers and single precision floating point numbers,
– added some synthesis scripts checking bambu quality of results w.r.t. 72 single precision libm functions (e.g., sqrtf, sinf, expf, tanhf, etc.),
– added spider tool to automatically generate latex tables from bambu synthesis results,
– moved all the dot generated files into directory HLS_output/dot/. Such files (scheduling, FSM, CFG, DFG, etc) are now generated when –print-dot option is passed,
– VIVADO is now the default backend flow for Zynq based devices.

Problems fixed:
– fixed all the Bison related compilation problems,
– fixed some problems with testbench generation of 2D arrays,
– fixed configuration scripts for manually installed Boost libraries; now, we need at least Boost 1.48.0,
– fixed some problems with C pretty-printing of the internal IR,
– fixed some ISE/Vivado synthesis problems when System Verilog based model are used,
– fixed some problems with –soft-float based synthesis,
– fixed RTL-backend configuration scripts looking for tools (e.g., ISE, Vivado, Quartus and Diamond) already installed,
– fixed some problems with real-to-int and int-to-real conversions, added some explicit tests to the panda regressions.

For any information or bug report, please write to panda-info@elet.polimi.it