
PandA 0.9.5 released

New features introduced:
– Added support for GCC 6 and GCC 7 (GCC 4.9 is still the preferred GCC compiler).
– Added support for bitfields.
– Added support for pointers and memory operations to the Discrepancy Analysis. Reference paper: Pietro Fezzardi and Fabrizio Ferrandi, “Automated bug detection for pointers and memory accesses in High-Level Synthesis compilers”, in 2016 26th International Conference on Field Programmable Logic and Applications (FPL), 2016.
– Added new option: --discrepancy-only=comma,separated,list,of,function,names
Restricts the discrepancy analysis to the functions whose names appear in the list passed as argument.
– Added new option: --discrepancy-permissive-ptrs
Does not trigger hard errors on pointer variables.
– Added preliminary support for TASTE integration. Reference paper: M. Lattuada, F. Ferrandi, and M. Perrotin, “Computer Assisted Design and Integration of FPGA Accelerators in Aerospace Systems,” in Proceedings of the IEEE Aerospace Conference, 2016, pp. 1-11.
– Added support for OpenMP SIMD. Reference paper: M. Lattuada and F. Ferrandi, “Exploiting Vectorization in High Level Synthesis of Nested Irregular Loops,” Journal of Systems Architecture, vol. 75, pp. 1-14, 2017.
– Default golden reference is now input C code without any modification.
– The options --synthesize and --objectives have been removed. The values previously passed with --objectives can now be passed directly through the option --evaluation.
– Improved the precision and the effectiveness of the Bit Value analysis and optimizations.
– Improved detection of irreducible loops.
– Improved CSE.
– Added a frontend transformation that merges some operations into FPGA LUTs.
– The frontend now explicitly introduces function calls to softfloat functions.
– Added support for block RAM with latency = 3 (--high-latency=4).
– Added bambu option --fsm-encoding=[auto,one-hot,binary].
– Added a new option: --disable-reg-init-value
Used to remove the INIT value from registers (useful for ASIC designs).
– Improved mapping of multiplications on DSPs.
– Added a GCC plugin to apply the whole program optimization starting from the topfname function instead of the main function (currently only GCC 4.9 is supported).
– Added further integer division algorithms:
* non-restoring division with unrolling factor equal to 1 (--hls-div=nr1), which becomes the default division algorithm
* non-restoring division with unrolling factor equal to 2 (--hls-div=nr2)
* align divisor shift dividend method (--hls-div=as)
– Added a specialization of the integer division working with a 64-bit dividend and a 32-bit divisor.
– Single precision floating point faithfully rounded expf and logf functions implemented following the HOTBM method published by Jeremie Detrey and Florent de Dinechin, “Parameterized floating-point logarithm and exponential functions for FPGAs”, Microprocessors and Microsystems, vol. 31, n. 8, 2007, pp. 537-545.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sin, cos, sincos and tan functions implemented following the HOTBM method published by Jeremie Detrey and Florent de Dinechin, “Floating-point Trigonometric Functions for FPGAs”, FPL 2007.
The code has been exhaustively tested and it supports subnormals.
– Single precision floating point faithfully rounded sqrt function implemented following the method published by Florent de Dinechin, Mioara Joldes, Bogdan Pasca and Guillaume Revy, “Multiplicative Square Root Algorithms for FPGAs”, FPL 2010, pp. 574-577.
The code has been exhaustively tested and it supports subnormals.
– Implemented the port swapping algorithm as described in the following paper: Hao Cong, Song Chen and T. Yoshimura, “Port assignment for interconnect reduction in high-level synthesis,” Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, Hsinchu, 2012, pp. 1-4.
– Improved support for structs passed by copy.
– Improved ROM identification.
– Added a new option: --rom-duplication
Assumes that read-only memories can be duplicated when timing requires it.
– Improved memory initialization.
– Added some transformations that lower some memcpy and memset calls to simple instructions.
– Improved softfloat functions for basic single and double precision operations: addition, subtraction, multiplication and division.
Addition and subtraction now correctly handle operands equal to +0 and -0.
– Added three options to control which softfloat and libm libraries are used: --softfloat-subnormal, --libm-std-rounding and --soft-fp.
– Fixed builtin isnanf.
– Added double precision implementation of libm round function.
– Added __builtin_lrint, __builtin_llrint, __builtin_nearbyint to libm library.
– Fixed and improved tgamma and tgammaf function.
– Added support for parallel compilation of bambu libraries.
– Added support for the automatic configuration of newer releases of Quartus for Intel FPGAs.
– Improved verilator detection.
– Improved libicu detection.
– Improved boost filesystem macro.
– Fixed problems due to -m32 under Arch Linux.
– Fixed compilation problems with glpk on Ubuntu 14.04.
– Fixed a problem with long double, which now has the same size as double.
– Added support for Mentor Visualizer.
– Improved components characterization and timing models.
– Extended support for VHDL.
– VHDL simulation with ModelSim now uses the VHDL-2008 standard.
– Extended set of synthesis scripts and synthesis results.
– Improved area reporting for Virtex4 devices.
– Improved characterization of asynchronous RAMs.
– Fixed extraction of slack delay from ISE trce and Lattice reports.
– Fixed yosys backend w.r.t. the newer Vivado releases.
– Added SLICES to the set of data collected by characterization.
– Extended set of regression tests.
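
As an illustration of the non-restoring scheme behind --hls-div=nr1, here is a minimal C sketch for unsigned 32-bit operands. It is not bambu's implementation: the function name nr_divide_u32, the 64-bit partial remainder and the loop producing one quotient bit per iteration are assumptions chosen for readability, and the divisor must be non-zero.

```c
#include <stdint.h>

/* Hypothetical helper, not part of bambu: unsigned non-restoring division.
 * Produces one quotient bit per iteration; d must be non-zero. */
uint32_t nr_divide_u32(uint32_t n, uint32_t d, uint32_t *rem)
{
    int64_t r = 0;   /* partial remainder, allowed to become negative */
    uint32_t q = 0;

    for (int i = 31; i >= 0; --i) {
        int64_t bit = (n >> i) & 1u;
        /* Shift in the next dividend bit, then subtract or add the divisor
         * depending on the sign of the previous partial remainder. */
        if (r >= 0)
            r = 2 * r + bit - (int64_t)d;
        else
            r = 2 * r + bit + (int64_t)d;
        /* The quotient bit is 1 when the new partial remainder is non-negative. */
        q = (q << 1) | (r >= 0 ? 1u : 0u);
    }

    /* A single final correction replaces the per-step restore. */
    if (r < 0)
        r += (int64_t)d;

    *rem = (uint32_t)r;
    return q;
}
```

Because the remainder is never restored inside the loop, each step needs only one addition or subtraction, which is what makes this family of algorithms attractive in hardware.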

Quality of results of this release on different target FPGAs can be found at:
CHStone QoR.
libm QoR.
Basic FP operations QoR.

For any information or bug report, please write to or visit the google group page.


2 minutes for a pitch at H2020 Info Day #ICT5 #DSMeu #UE #PandA4Design

PandA 0.9.3 released

New features introduced:
– general improvement of performances of generated circuits
– added full support for the GCC 4.9 family, which is now the default
– improved retrieving of GCC alias analysis information
– added first version of VHDL backend
– added support for CycloneV
– added support for Artix7
– extended support for the Virtex7 boards family
– added option --top-rtldesign-name, which selects the function to be synthesized by the RTL backend
– it is now possible to write the testbench in C instead of using the xml file
– added a first experimental backend for yosys
– added examples/crc_yosys which tests yosys backend and C based testbenches
– improved Verilog testbench generation: it is now fully compliant with cycle based simulators (e.g., VERILATOR)
– added option --backend-script-extensions to pass further constraints to the RTL synthesis (e.g., pin assignment)
– added examples/VGA showing how to integrate existing HDL based IPs in a real FPGA design
– added scripts and results for CHStone synthesis of Lattice based designs
– improved support of complex numbers
– single precision soft-float functions redesigned: now --soft-float is the default and --flopoco becomes optional
– single precision floating point division implemented exploiting the Goldschmidt algorithm
– improved synthesis of libm functions
– improved libm regression test
– improved architectural timing model
– improved graphviz representation of FSMs: timing information has been added
– added option --post-rescheduling to further improve the resource usage
– parameter registering is now performed and it can be controlled by using option --registered-inputs
– added a full implementation of Bit Value analysis and coupled it with the Value Range analysis performed by GCC
– added option --experimental-setup to control bambu defaults:
* BAMBU-PERFORMANCE-MP – multi-port performance oriented setup
* BAMBU-PERFORMANCE – single port performance oriented setup
* BAMBU-AREA-MP – multi-port area oriented setup
* BAMBU-AREA – single-port area oriented setup
* BAMBU – no specific optimizations enabled
– improved code speculation
– improved memory localization
– added option --do-not-expose-globals, which makes the localization of globals possible, as is similarly done by some commercial tools
– added support for high latency memories and for distributed memories: memories with zero, one and two delays are supported
– added option --aligned-access to drive the memory allocation towards simpler block RAM models: it can be used under some restricted assumptions (e.g., no vectorization and no structs used)
– ported the GCC algorithm which rewrites a division by a constant into adds and shifts
– added option --hls-div that maps integer divisions and modulus on a C based implementation of the Newton-Raphson algorithm
– improved technology libraries management:
* technology libraries and constraints are now managed in an independent way
* multiple technology libraries can be provided to the tool at the same time
– improved and parallelized PandA test regression infrastructure
– added support for the CentOS 7, Fedora 21, Ubuntu 14.04 and Ubuntu 14.10 distributions
– complete refactoring of output messages
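
To give a feel for what rewriting a division by a constant into adds and shifts looks like, the sketch below divides an unsigned 32-bit value by 10 using a well-known shift-and-add sequence. It is an illustration only: the function name divu10 is hypothetical, and the code that GCC or bambu actually emits for a given constant may differ.

```c
#include <stdint.h>

/* Hypothetical example, not taken from bambu or GCC: n / 10 computed
 * with shifts and additions only (no divider, no multiplier). */
uint32_t divu10(uint32_t n)
{
    /* Approximate n * 0.8 from below: 0.5 + 0.25 = 0.75, then refine
     * with the repeating binary pattern of 0.8. */
    uint32_t q = (n >> 1) + (n >> 2);
    q = q + (q >> 4);
    q = q + (q >> 8);
    q = q + (q >> 16);
    /* n * 0.8 / 8 = n / 10; the estimate may be one too small. */
    q = q >> 3;
    /* r = n - q * 10, with q * 10 built as ((q << 2) + q) << 1. */
    uint32_t r = n - (((q << 2) + q) << 1);
    /* Correct the estimate when the remainder is still 10 or more. */
    return q + (r > 9 ? 1u : 0u);
}
```

In hardware each shift is just wiring, so the whole division reduces to a handful of adders and one comparison.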

Problems fixed:
– fixed problem related to Bison 2.7
– fixed reinstallation of PandA in a different folder
– fixed installation problems on systems where boost and gcc are not installed in default locations
– removed some implicit conversions from generated verilog circuits

For any information or bug report, please write to or to

PandA 0.9.2 released

PandA 0.9.2 new features introduced:
– added initial support for GCC 4.9.0,
– stable support for GCC versions: v4.5, v4.6, v4.7 (default) and v4.8,
– added experimental support for the Verilator simulator,
– new dataflow dependency analysis for LOADs and STOREs; we now use GCC alias analysis to see if a LOAD and STORE pair or a STORE and STORE pair may conflict,
– added a frontend step that optimizes PHI nodes,
– added a frontend step that conditionally performs if-conversion,
– added a frontend step that performs simple code motions,
– added a frontend step that converts if chains into a single multi-if control construct,
– added a frontend step that simplifies control constructs based on short circuits,
– added a proxy-based approach to the LOADs/STOREs of statically resolved pointers,
– improved EBR inference for Lattice based devices,
– now, memory models are different for Lattice, Altera, Virtex5 and Virtex6-7 based devices,
– updated FloPoCo to a more recent version,
– now, register allocation maps storage values on registers without write enable when possible,
– added support for the CentOS/Scientific Linux distributions,
– added support for the Arch Linux distribution,
– added support for the Ubuntu 13.10 distribution,
– now, testbenches accept a user defined error for float based computations; the error is specified in ULPs; a Unit in the Last Place is the spacing between floating-point numbers,
– improved architectural timing model,
– added a very simple symbolic estimator of the number of cycles taken by a function; it mainly covers functions without loops and without unbounded operations,
– general refactoring of automatic HLS testbench generation,
– added support for the libm functions lceil and lceilf,
– added skip-pipe-parameter option to bambu; it is used to select a faster pipelined unit (Xilinx devices have the default equal to 1 while Lattice and Altera devices have the default equal to 0),
– improved memory allocation when byte-enabled write memories are not needed,
– added support for variable length arrays,
– added support for the memalign builtin,
– added EXT_PIPELINED_BRAM memory allocation policy; with this option bambu assumes that a block RAM memory is allocated outside the core (LOAD with II=1 and latency=2 and STORE with latency=1),
– added 2D matrix multiplication examples for integers and single precision floating point numbers,
– added some synthesis scripts checking bambu quality of results w.r.t. 72 single precision libm functions (e.g., sqrtf, sinf, expf, tanhf, etc.),
– added spider tool to automatically generate latex tables from bambu synthesis results,
– moved all the dot generated files into directory HLS_output/dot/. Such files (scheduling, FSM, CFG, DFG, etc.) are now generated when the --print-dot option is passed,
– VIVADO is now the default backend flow for Zynq based devices.
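
The ULP-based tolerance accepted by the testbenches can be pictured with a small C sketch that measures the distance between two single precision values in units in the last place. The helper names (ordered_bits, ulp_distance, within_ulps) are hypothetical and this is not the comparison code that bambu generates; it assumes IEEE 754 binary32 floats and finite, non-NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a float's bit pattern to an integer that is
 * monotonic in the float's value, so that +0 and -0 both map to 0. */
static int32_t ordered_bits(float x)
{
    int32_t i;
    memcpy(&i, &x, sizeof i);          /* well-defined type punning */
    return i < 0 ? INT32_MIN - i : i;  /* flip the negative half */
}

/* Number of representable floats between a and b (0 when equal). */
uint32_t ulp_distance(float a, float b)
{
    int64_t d = (int64_t)ordered_bits(a) - (int64_t)ordered_bits(b);
    return (uint32_t)(d < 0 ? -d : d);
}

/* Returns 1 when actual is within max_ulps of expected, 0 otherwise. */
int within_ulps(float expected, float actual, uint32_t max_ulps)
{
    return ulp_distance(expected, actual) <= max_ulps;
}
```

A check such as within_ulps(golden, result, 1) would accept a faithfully rounded result that differs from the reference by at most one representable value.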

Problems fixed:
– fixed all the Bison related compilation problems,
– fixed some problems with testbench generation of 2D arrays,
– fixed configuration scripts for manually installed Boost libraries; now, we need at least Boost 1.48.0,
– fixed some problems with C pretty-printing of the internal IR,
– fixed some ISE/Vivado synthesis problems when System Verilog based models are used,
– fixed some problems with --soft-float based synthesis,
– fixed RTL-backend configuration scripts looking for tools (e.g., ISE, Vivado, Quartus and Diamond) already installed,
– fixed some problems with real-to-int and int-to-real conversions, added some explicit tests to the panda regressions.

For any information or bug report, please write to

PandA 0.9.1 released

PandA 0.9.1 new features introduced:
– complete support of CHStone benchmarks synthesis and verification,
– better support of multi-ported memories,
– local memory exploitation,
– read-only-memory exploitation,
– support of multi-bus for parallel memory accesses,
– support of unaligned memory accesses,
– better support of pipelined resources,
– improved module binding algorithms (e.g., slack-based module binding),
– support of resource constraints through user xml file,
– support of libc primitives: memcpy, memcmp, memset and memmove,
– better support of printf primitive for RTL debugging purposes,
– support of dynamic memory allocation,
– synthesis of libm builtin primitives such as sin, cos, acosh, etc,
– better integration with FloPoCo library,
– soft-float based HW synthesis,
– support of Vivado Xilinx backend,
– support of Diamond Lattice backend,
– support of XSIM Xilinx simulator,
– synthesis and testbench generation of WISHBONE B4 Compliant Accelerators (see for details on the WISHBONE specification),
– synthesis of AXI4LITE Compliant Accelerators (experimental),
– inclusion of GCC regression tests to test HLS synthesis (tested HLS synthesis and RTL simulation),
– inclusion of libm regression tests to test HLS synthesis of libm (tested HLS synthesis and RTL simulation),
– support of multiple versions of GCC compiler: v4.5, v4.6 and v4.7,
– support of GCC vectorizing capability (experimental).

For any information or bug report, please write to