All posts by Fabrizio Ferrandi

Fabrizio Ferrandi received his Laurea (cum laude) in Electronic Engineering in 1992 and the Ph.D. degree in Information and Automation Engineering (Computer Engineering) from the Politecnico di Milano, Italy, in 1997. He joined the faculty of Politecnico di Milano in 1999 as "Ricercatore" and later in 2002 as Associate Professor with the Dipartimento di Elettronica, Informazione e Bioingegneria. He has published over 100 papers. His research interests include synthesis, verification, simulation and testing of digital circuits and systems. Fabrizio Ferrandi is a Member of IEEE, of the IEEE Computer Society, of the Test Technology Technical Committee and of the European Design and Automation Association (EDAA). Some references to my publications: DBLP, Google Scholar

PandA 0.9.5 released

New features introduced:
– Added support to GCC 6 and GCC 7 (GCC 4.9 is still the preferred GCC compiler).
– Added support to bitfields.
– Added support for pointers and memory operations to the Discrepancy Analysis. Reference paper: Pietro Fezzardi and Fabrizio Ferrandi, “Automated bug detection for pointers and memory accesses in High-Level Synthesis compilers”, in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
– Added new option: --discrepancy-only=comma,separated,list,of,function,names
Restricts the discrepancy analysis only to the functions whose name is in the list passed as argument.
– Added new option: --discrepancy-permissive-ptrs
Do not trigger hard errors on pointer variables.
– Added preliminary support to TASTE integration. Reference paper: M. Lattuada, F. Ferrandi, and M. Perrotin, “Computer Assisted Design and Integration of FPGA Accelerators in Aerospace Systems,” in Proceedings of the IEEE Aerospace Conference, 2016, pp. 1-11.
– Added support to OpenMP SIMD. Reference paper: M. Lattuada and F. Ferrandi, “Exploiting Vectorization in High Level Synthesis of Nested Irregular Loops,” Journal of Systems Architecture, vol. 75, pp. 1-14, 2017.
– Default golden reference is now input C code without any modification.
– The options --synthesize and --objectives have been removed. The values previously passed with --objectives can now be passed directly through the option --evaluation.
– Improved the precision and the effectiveness of the Bit Value analysis and optimizations.
– Improved detection of irreducible loops.
– Improved CSE.
– Added a frontend transformation that merges some operations into FPGA LUTs.
– Now frontend explicitly introduces function calls to softfloat functions.
– Added support to block RAM with latency = 3 (--high-latency=4).
– Added bambu option --fsm-encoding=[auto,one-hot,binary].
– Added a new option: --disable-reg-init-value
Used to remove the INIT value from registers (useful for ASIC designs)
– Improved mapping of multiplications on DSPs.
– Added a GCC plugin to apply the whole program optimization starting from the topfname function instead of main function (currently only GCC 4.9 is supported).
– Added further integer division algorithms:
– non-restoring division with unrolling factor equal to 1 (--hls-div=nr1), which becomes the default division algorithm.
– non-restoring division with unrolling factor equal to 2 (--hls-div=nr2)
– align divisor shift dividend method (--hls-div=as)
– Added a specialization of the integer division working with a 64-bit dividend and a 32-bit divisor.
– Single precision floating point faithfully rounded expf and logf functions implemented following the HOTBM method published by Jeremie Detrey and Florent de Dinechin, “Parameterized floating-point logarithm and exponential functions for FPGAs”, Microprocessors and Microsystems, vol. 31, n. 8, 2007, pp. 537-545.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sin, cos, sincos and tan functions implemented following the HOTBM method published by Jeremie Detrey and Florent de Dinechin, “Floating-point Trigonometric Functions for FPGAs”, FPL 2007.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sqrt function implemented following the method published by Florent de Dinechin, Mioara Joldes, Bogdan Pasca and Guillaume Revy, “Multiplicative Square Root Algorithms for FPGAs”, FPL 2010, pp. 574-577.
The code has been exhaustively tested and it supports subnormals.
– Implemented the port swapping algorithm as described in the following paper: Hao Cong, Song Chen and T. Yoshimura, “Port assignment for interconnect reduction in high-level synthesis,” Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, Hsinchu, 2012, pp. 1-4.
– Improved support to structs passed by copy.
– Improved ROM identification.
– Added a new option: --rom-duplication
Assume that read-only memories can be duplicated in case timing requires.
– Improved memory initialization.
– Added some transformations that lower some memcpy and memset calls to simple instructions.
– Improved softfloat functions for basic single and double precision operations: addition, subtraction, multiplication and division.
Now addition and subtraction operations correctly manage operands equal to +0 and -0.
– Added three options to control which softfloat and libm libraries are used: --softfloat-subnormal, --libm-std-rounding and --soft-fp.
– Fixed builtin isnanf.
– Added double precision implementation of libm round function.
– Added __builtin_lrint, __builtin_llrint, __builtin_nearbyint to libm library.
– Fixed and improved tgamma and tgammaf function.
– Added support to parallel compilation of bambu libraries.
– Added support to the automatic configuration of newer releases of Quartus for IntelFPGAs.
– Improved verilator detection.
– Improved libicu detection.
– Improved boost filesystem macro.
– Fixed problems due to -m32 under arch linux.
– Fixed compilation problems with glpk and ubuntu 14.04.
– Fixed a problem with long double: it now has the same size as double.
– Added support to Mentor Visualizer.
– Improved components characterization and timing models.
– Extended support to VHDL.
– Now VHDL modelsim simulation uses 2008 standard.
– Extended set of synthesis scripts and synthesis results.
– Improved area reporting for Virtex4 devices.
– Improved characterization of asynchronous RAMs.
– Fixed extraction of slack delay from ISE trce and Lattice reports.
– Fixed yosys backend w.r.t. the newer Vivado releases.
– Added SLICES to the set of data collected by characterization.
– Extended set of regression tests.
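
To give an idea of the non-restoring division with unrolling factor 1 listed above, here is a minimal C sketch for unsigned 32-bit operands. It is only an illustration of the textbook scheme, not the code shipped with bambu; the function name udiv_nr and its interface are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned non-restoring division, one quotient bit per iteration
 * (i.e. unrolling factor 1). Hypothetical helper, not bambu's library code.
 * The partial remainder is allowed to become negative: instead of restoring
 * it, the next iteration adds the divisor rather than subtracting it. */
static uint32_t udiv_nr(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0u);                 /* division by zero is left undefined */

    int64_t r = 0;                   /* signed partial remainder */
    uint32_t q = 0;                  /* quotient, built MSB first */

    for (int i = 31; i >= 0; --i) {
        uint32_t bit = (n >> i) & 1u;    /* next dividend bit */
        if (r >= 0)
            r = 2 * r + bit - d;         /* shift in the bit, subtract */
        else
            r = 2 * r + bit + d;         /* shift in the bit, add back */
        q = (q << 1) | (r >= 0 ? 1u : 0u);
    }

    if (r < 0)
        r += d;                      /* single final correction step */

    *rem = (uint32_t)r;
    return q;
}
```

Each iteration performs exactly one addition or subtraction, which is what makes the scheme attractive in hardware; an unrolling factor of 2 would compute two quotient bits per iteration.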

Quality of results of this release on different target FPGAs can be found at:
CHStone QoR.
libm QoR.
Basic FP operations QoR.

For any information or bug report, please write to or visit the google group page.


2 minutes for a pitch at H2020 Info Day #ICT5 #DSMeu #UE #PandA4Design

PandA 0.9.3 released

New features introduced:
– general improvement of performances of generated circuits
– added full support to GCC 4.9 family which is now the default
– improved retrieving of GCC alias analysis information
– added first version of VHDL backend
– added support to CycloneV
– added support to Artix7
– extended support to Virtex7 boards family
– added option --top-rtldesign-name that controls which is the function to be synthesized by the RTL backend
– it is now possible to write the testbench in C instead of using the xml file
– added a first experimental backend to yosys
– added examples/crc_yosys which tests yosys backend and C based testbenches
– improved Verilog testbench generation: it is now fully compliant with cycle based simulators (e.g., VERILATOR)
– added option --backend-script-extensions to pass further constraints to the RTL synthesis (e.g., pin assignment)
– added examples/VGA showing how to integrate existing HDL based IPs in a real FPGA design
– added scripts and results for CHStone synthesis of Lattice based designs
– improved support of complex numbers
– single precision soft-float functions redesigned: now --soft-float is the default and --flopoco becomes optional
– single precision floating point division implemented exploiting the Goldschmidt algorithm
– improved synthesis of libm functions
– improved libm regression test
– improved architectural timing model
– improved graphviz representation of FSMs: timing information has been added
– added option --post-rescheduling to further improve the resource usage
– parameter registering is now performed and it can be controlled by using option --registered-inputs
– added a full implementation of Bit Value analysis, coupled with the Value Range analysis performed by GCC
– added option --experimental-setup to control bambu defaults:
* BAMBU-PERFORMANCE-MP – multi-port performance oriented setup
* BAMBU-PERFORMANCE – single port performance oriented setup
* BAMBU-AREA-MP – multi-port area oriented setup
* BAMBU-AREA – single-port area oriented setup
* BAMBU – no specific optimizations enabled
– improved code speculation
– improved memory localization
– added option --do-not-expose-globals making possible localization of globals, as it is similarly done by some commercial tools
– added support of high latency memories and of distributed memories: zero, one and two delays memories are supported
– added option --aligned-access to drive the memory allocation towards more simple block RAM models: it can be used under some restricted assumptions (e.g., no vectorization and no structs used)
– ported the GCC algorithm which rewrites a division by a constant into adds and shifts
– added option --hls-div that maps integer divisions and modulus on a C based implementation of the Newton-Raphson algorithm
– improved technology libraries management:
* technology libraries and constraints are now managed in an independent way
* multiple technology libraries can be provided to the tool at the same time
– improved and parallelized PandA test regression infrastructure
– added support to Centos7, fedora 21, Ubuntu 14.04 and Ubuntu 14.10 distributions
– complete refactoring of output messages
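
As an illustration of what rewriting a division by a constant means, the following C sketch replaces unsigned division by 3 and by 10 with a multiplication by a precomputed reciprocal followed by a shift; the constant multiplication can in turn be lowered to shifts and adds. This is the general, well-known technique and not necessarily the exact sequence produced by the ported GCC algorithm; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32-bit division by 3 without a divider.
 * 0xAAAAAAAB is ceil(2^33 / 3): the 64-bit product shifted right by 33
 * equals x / 3 for every 32-bit x. Hypothetical helper name. */
static uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Same idea for division by 10: 0xCCCCCCCD is ceil(2^35 / 10). */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Compares both routines with the native operator on a few corner cases. */
static void check_div_by_constant(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 3u, 9u, 10u, 29u, 30u,
                                 0x7FFFFFFFu, 0xFFFFFFFEu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(div3(samples[i]) == samples[i] / 3u);
        assert(div10(samples[i]) == samples[i] / 10u);
    }
}
```

In hardware the benefit is that a generic divider is replaced by a fixed multiplier (or a small adder tree) and wiring for the shift.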

Problems fixed:
– fixed problem related to Bison 2.7
– fixed reinstallation of PandA in a different folder
– fixed installation problems on systems where boost and gcc are not installed in default locations
– removed some implicit conversions from generated verilog circuits

For any information or bug report, please write to or to

PandA 0.9.2 released

PandA 0.9.2 new features introduced:
– added an initial support to GCC 4.9.0,
– stable support to GCC versions: v4.5, v4.6, v4.7 (default) and v4.8,
– added an experimental support to Verilator simulator,
– new dataflow dependency analysis for LOADs and STOREs; we now use GCC alias analysis to see if a LOAD and STORE pair or a STORE and STORE pair may conflict,
– added a frontend step that optimizes PHI nodes,
– added a frontend step that conditionally performs if-conversion,
– added a frontend step that performs simple code motions,
– added a frontend step that converts if chains into a single multi-if control construct,
– added a frontend step that simplifies short circuits based control constructs,
– added a proxy-based approach to the LOADs/STOREs of statically resolved pointers,
– improved EBR inference for Lattice based devices,
– now, memory models are different for Lattice, Altera, Virtex5 and Virtex6-7 based devices,
– updated FloPoCo to a more recent version,
– now, register allocation maps storage values on registers without write enable when possible,
– added support to CentOS/Scientific Linux distributions,
– added support to ArchLinux distribution,
– added support to Ubuntu 13.10 distribution,
– now, testbenches accept a user defined error for float based computations; the error is specified in ULPs; a Unit in the Last Place (ULP) is the spacing between adjacent floating-point numbers,
– improved architectural timing model,
– added a very simple symbolic estimator of the number of cycles taken by a function; it mainly covers functions without loops and without unbounded operations,
– general refactoring of automatic HLS testbench generation,
– added support to libm function lceil and lceilf,
– added skip-pipe-parameter option to bambu; it is used to select a faster pipelined unit (xilinx devices have the default equal to 1 while lattice and altera devices have the default equal to 0),
– improved memory allocation when byte-enabled write memories are not needed,
– added support to variable length arrays,
– added support to memalign builtin,
– added EXT_PIPELINED_BRAM memory allocation policy, bambu with this option assumes that a block ram memory is allocated outside the core (LOAD with II=1 and latency=2 and STORE with latency=1),
– added 2D matrix multiplication examples for integers and single precision floating point numbers,
– added some synthesis scripts checking bambu quality of results w.r.t. 72 single precision libm functions (e.g., sqrtf, sinf, expf, tanhf, etc.),
– added spider tool to automatically generate latex tables from bambu synthesis results,
– moved all the dot generated files into directory HLS_output/dot/. Such files (scheduling, FSM, CFG, DFG, etc) are now generated when --print-dot option is passed,
– VIVADO is now the default backend flow for Zynq based devices.
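
Since the testbench tolerance for float results is expressed in ULPs, a small C sketch may help to see what such a comparison looks like for single precision values. This is a generic way of measuring a ULP distance, assuming IEEE 754 binary32 floats; it is not the code of the generated testbenches, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a float to an integer that grows monotonically with the float value,
 * so that adjacent floats map to adjacent integers (+0 and -0 both map to 0).
 * Assumes IEEE 754 binary32; hypothetical helper. */
static int64_t float_to_ordered(float f)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof bits);          /* well-defined type punning */
    if (bits & 0x80000000u)
        return -(int64_t)(bits & 0x7FFFFFFFu);
    return (int64_t)bits;
}

/* Number of representable floats between a and b (0 when they are equal). */
static uint64_t ulp_distance(float a, float b)
{
    int64_t ia = float_to_ordered(a);
    int64_t ib = float_to_ordered(b);
    return (uint64_t)(ia > ib ? ia - ib : ib - ia);
}

static int is_nan(float f)
{
    return f != f;                           /* only NaN differs from itself */
}

/* Accepts the result when it is within max_ulps of the expected value.
 * Design choice of this sketch: NaN matches only NaN. */
static int within_ulps(float expected, float actual, uint64_t max_ulps)
{
    if (is_nan(expected) || is_nan(actual))
        return is_nan(expected) && is_nan(actual);
    return ulp_distance(expected, actual) <= max_ulps;
}
```

With a tolerance of 1 ULP, for instance, a result is accepted if it is the expected float or one of its two immediate neighbours.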

Problems fixed:
– fixed all the Bison related compilation problems,
– fixed some problems with testbench generation of 2D arrays,
– fixed configuration scripts for manually installed Boost libraries; now, we need at least Boost 1.48.0,
– fixed some problems with C pretty-printing of the internal IR,
– fixed some ISE/Vivado synthesis problems when System Verilog based models are used,
– fixed some problems with --soft-float based synthesis,
– fixed RTL-backend configuration scripts looking for tools (e.g., ISE, Vivado, Quartus and Diamond) already installed,
– fixed some problems with real-to-int and int-to-real conversions, added some explicit tests to the panda regressions.

For any information or bug report, please write to

PandA 0.9.1 released

PandA 0.9.1 new features introduced:
– complete support of CHStone benchmarks synthesis and verification,
– better support of multi-ported memories,
– local memory exploitation,
– read-only-memory exploitation,
– support of multi-bus for parallel memory accesses,
– support of unaligned memory accesses,
– better support of pipelined resources,
– improved module binding algorithms (e.g., slack-based module binding),
– support of resource constraints through user xml file,
– support of libc primitives: memcpy, memcmp, memset and memmove,
– better support of printf primitive for RTL debugging purposes,
– support of dynamic memory allocation,
– synthesis of libm builtin primitives such as sin, cos, acosh, etc,
– better integration with FloPoCo library,
– soft-float based HW synthesis,
– support of Vivado Xilinx backend,
– support of Diamond Lattice backend,
– support of XSIM Xilinx simulator,
– synthesis and testbench generation of WISHBONE B4 Compliant Accelerators (see the WISHBONE specification for details),
– synthesis of AXI4LITE Compliant Accelerators (experimental),
– inclusion of GCC regression tests to test HLS synthesis (tested HLS synthesis and RTL simulation),
– inclusion of libm regression tests to test HLS synthesis of libm (tested HLS synthesis and RTL simulation),
– support of multiple versions of GCC compiler: v4.5, v4.6 and v4.7,
– support of GCC vectorizing capability (experimental).

For any information or bug report, please write to