PandA Download

Latest stable release:

April 20th, 2016 – Release 0.9.4 – panda-0.9.4.tar.gz.

New features introduced:

  • added support to GCC 5 (GCC 4.9 is still the default and the preferred GCC version)
  • improved support to complex builtin data types
  • added an initial support to “Extended Asm – Assembler Instructions with C Expression Operands”. In particular, the asm instruction can be used to inline Verilog/VHDL code in a C source description (done by extending the multiple assembler dialects feature in asm templates).
  • timing for combinational accelerators (accelerators that take a single cycle to complete) is now correctly estimated by the backend synthesis scripts.
  • added option --serialize-memory-accesses to remove any memory access parallelization. It is mainly useful for debugging purposes.
  • added option --distram-threshold to explicitly control the inference of DISTRIBUTED/ASYNCHRONOUS RAMs.
  • refactored simulation/evaluation options
  • added support to STRATIX V and STRATIX IV
  • added support to Virtex4
  • added support to C files for cosimulation
  • added support to out of context synthesis on Altera boards
  • Lattice ECP3 is now fully supported. In particular, the byte enabling feature required by some of the memories instantiated by bambu is implemented by exploiting the Lattice PMI (Parameterized Module Instantiation) library.
  • improved and extended the integration of existing IPs written in Verilog/VHDL.
  • added an example showing how asm could be inlined in the C source code: simple_asm
  • added an example, named file_simulate, showing how open, read, write and close could be used to verify a complex design with datasets coming from a file
  • added an example showing how python could be used to verify the correctness of the HLS process: python-bindings
  • added MachSuite (“MachSuite: Benchmarks for Accelerator Design and Customized Architectures.” – 2014 IEEE International Symposium on Workload Characterization.) to examples
  • added benchmarks of “A Survey and Evaluation of FPGA High-Level Synthesis Tools” – IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems to examples
  • added two examples showing how external IPs could be integrated in the HLS flow: IP_integration and led_example
  • improved the VGA example for DE1 and ported it to the Nexys4 Xilinx prototyping board
  • added two examples showing how it is possible to run two arcade games such as pong and breakout on a Nexys 4 prototyping board without using any processor. The two examples smoothly connect a low level controller for the VGA port plus some GPIO controllers with plain C code describing the game behavior.
  • added a tutorial describing how to use bambu in designing a simple example: led_example
  • refactoring of scripts for technology libraries characterization
  • improved regression scripts: now panda regression consists of about 250K tests
  • assertion checks now have to be explicitly disabled, even in release builds
  • done port is now registered whenever it is possible
  • added support to the synthesis of function pointers and to the inter-procedural resource sharing of functions (reference paper is: Marco Minutoli, Vito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi: Inter-procedural resource sharing in High Level Synthesis through function proxies. FPL 2015: 1-8)
  • added support to speculative SDC code scheduling (controlled through option --speculative-sdc-scheduling | reference paper is: Code Transformations Based on Speculative SDC Scheduling. ICCAD 2015: 71-77)
  • added two new experimental setups (BAMBU-BALANCED-MP and BAMBU-BALANCED) oriented to a trade-off between area and performance; BAMBU-BALANCED-MP is the new default experimental setup
  • added a discrepancy analysis to verify the correctness of the generated code (controlled through option --discrepancy | reference paper is: Pietro Fezzardi, Michele Castellana, Fabrizio Ferrandi: Trace-based automated logical debugging for high-level synthesis generated circuits. ICCD 2015: 251-258)
  • added common subexpression elimination step
  • reset can now be active high or active low (controlled through option --reset-level)
  • added support to file IO libc functions: open, read, write and close.
  • added support to assert function.
  • added support to libc functions: stpcpy stpncpy strcasecmp strcasestr strcat strchr
    strchrnul strcmp strcpy strcspn strdup strlen strncasecmp
    strncat strncmp strncpy strndup strnlen strpbrk strrchr
    strsep strspn strstr strtok bzero bcopy mempcpy
    memchr memrchr rawmemchr index rindex
  • improved double precision soft-float library
  • added support to single and double precision complex division operations: __divsc3 __divdc3
  • added preliminary support to irreducible loops
  • changed the PandA hardware description license from GPL to LGPL
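
The file I/O support listed above (open, read, write and close) can be illustrated with a small C sketch. This is not the file_simulate example shipped with PandA: the function name sum_bytes_from_file and the 64-byte buffer size are assumptions chosen for illustration, and only the standard POSIX calls are used.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads a dataset file in 64-byte chunks and
 * returns the sum of all its bytes, or -1 if the file cannot be
 * opened or a read fails. */
long sum_bytes_from_file(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[64];
    long sum = 0;
    ssize_t n;

    /* read() returns 0 at end of file and a negative value on error */
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; ++i)
            sum += buf[i];
    }

    close(fd);
    return n < 0 ? -1 : sum;
}
```

A testbench written along these lines could compare the value computed by the synthesized function against the same value computed in software from the same dataset file.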

Previous releases:

March 24th, 2015 – Release 0.9.3 – panda-0.9.3.tar.gz.

PandA now requires GCC 4.6 or greater to be compiled.

New features introduced:

  • general improvement of performances of generated circuits
  • added full support to GCC 4.9 family which is now the default
  • improved retrieving of GCC alias analysis information
  • added first version of VHDL backend
  • added support to CycloneV
  • added support to Artix7
  • extended support to Virtex7 boards family
  • added option --top-rtldesign-name that controls which function is synthesized by the RTL backend
  • it is now possible to write the testbench in C instead of using the xml file
  • added a first experimental backend to yosys
  • added examples/crc_yosys which tests yosys backend and C based testbenches
  • improved Verilog testbench generation: it is now fully compliant with cycle based simulators (e.g., VERILATOR)
  • added option --backend-script-extensions to pass further constraints to the RTL synthesis (e.g., pin assignment)
  • added examples/VGA showing how to integrate existing HDL based IPs in a real FPGA design
  • added scripts and results for CHStone synthesis of Lattice based designs
  • improved support of complex numbers
  • single precision soft-float functions redesigned: now --soft-float is the default and --flopoco becomes optional
  • single precision floating point division implemented exploiting the Goldschmidt algorithm
  • improved synthesis of libm functions
  • improved libm regression test
  • improved architectural timing model
  • improved graphviz representation of FSMs: timing information has been added
  • added option --post-rescheduling to further improve the resource usage
  • parameter registering is now performed and it can be controlled by using option --registered-inputs
  • added a full implementation of Bit Value analysis and coupled it with the Value Range analysis performed by GCC
  • added option --experimental-setup to control bambu defaults:
    • BAMBU-PERFORMANCE-MP – multi-port performance oriented setup
    • BAMBU-PERFORMANCE – single port performance oriented setup
    • BAMBU-AREA-MP – multi-port area oriented setup
    • BAMBU-AREA – single-port area oriented setup
    • BAMBU – no specific optimizations enabled
  • improved code speculation
  • improved memory localization
  • added option --do-not-expose-globals, making localization of globals possible, as similarly done by some commercial tools
  • added support of high latency memories and of distributed memories: memories with zero, one and two delays are supported
  • added option --aligned-access to drive the memory allocation towards simpler block RAM models: it can be used under some restrictive assumptions (e.g., no vectorization and no structs used)
  • ported the GCC algorithm which rewrites a division by a constant into adds and shifts
  • added option --hls-div that maps integer divisions and modulus operations onto a C based implementation of the Newton-Raphson algorithm
  • improved technology libraries management:
    • technology libraries and constraints are now managed in an independent way
    • multiple technology libraries can be provided to the tool at the same time
  • improved and parallelized PandA test regression infrastructure
  • added support to CentOS 7, Fedora 21, Ubuntu 14.04 and Ubuntu 14.10 distributions
  • complete refactoring of output messages
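
The division-by-constant rewriting mentioned in the list above can be illustrated as follows. This is a minimal sketch, assuming a 32-bit unsigned dividend and the constant 10: the division is replaced by a multiplication with a precomputed reciprocal followed by a shift, and a hardware backend can in turn expand the constant multiplication into adds and shifts. The function names div10 and mod10 are hypothetical, and the code bambu actually generates may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: x / 10 without a divider.
 * 0xCCCCCCCD is ceil(2^35 / 10), so (x * 0xCCCCCCCD) >> 35 equals
 * floor(x / 10) for every 32-bit unsigned x. */
uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Remainder derived from the quotient: x - 10 * q, with 10 * q
 * written as shifts and adds (8q + 2q). */
uint32_t mod10(uint32_t x)
{
    uint32_t q = div10(x);
    uint32_t r = x - ((q << 3) + (q << 1));
    assert(r < 10u);
    return r;
}
```

The 64-bit intermediate product keeps the sketch exact over the whole 32-bit input range; a narrower dividend would allow a narrower multiplier.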

Problems fixed:

  • fixed problem related to Bison 2.7
  • fixed reinstallation of PandA in a different folder
  • fixed installation problems on systems where boost and gcc are not installed in default locations
  • removed some implicit conversions from generated verilog circuits

February 12th, 2014 – Release 0.9.2 – panda-0.9.2.tar.gz.

New features introduced:

  • added an initial support to GCC 4.9.0,
  • stable support to GCC versions: v4.5, v4.6, v4.7 (default) and v4.8,
  • added an experimental support to Verilator simulator,
  • new dataflow dependency analysis for LOADs and STOREs; we now use GCC alias analysis to see if a LOAD and STORE pair or a STORE and STORE pair may conflict,
  • added a frontend step that optimizes PHI nodes,
  • added a frontend step that conditionally performs if-conversions,
  • added a frontend step that performs simple code motions,
  • added a frontend step that converts if chains into a single multi-if control construct,
  • added a frontend step that simplifies short circuits based control constructs,
  • added a proxy-based approach to the LOADs/STOREs of statically resolved pointers,
  • improved EBR inference for Lattice based devices,
  • now, memory models are different for Lattice, Altera, Virtex5 and Virtex6-7 based devices,
  • updated FloPoCo to a more recent version,
  • now, register allocation maps storage values on registers without write enable when possible,
  • added support to CentOS/Scientific Linux distributions,
  • added support to ArchLinux distribution,
  • added support to Ubuntu 13.10 distribution,
  • now, testbenches accept a user defined error for float based computations; the error is specified in ULPs; a ULP (Unit in the Last Place) is the spacing between adjacent floating-point numbers,
  • improved architectural timing model,
  • added a very simple symbolic estimator of the number of cycles taken by a function; it mainly covers functions without loops and without unbounded operations,
  • general refactoring of automatic HLS testbench generation,
  • added support to libm function lceil and lceilf,
  • added skip-pipe-parameter option to bambu; it is used to select a faster pipelined unit (Xilinx devices have the default equal to 1 while Lattice and Altera devices have the default equal to 0),
  • improved memory allocation when byte-enabled write memories are not needed,
  • added support to variable length arrays,
  • added support to memalign builtin,
  • added EXT_PIPELINED_BRAM memory allocation policy, bambu with this option assumes that a block ram memory is allocated outside the core (LOAD with II=1 and latency=2 and STORE with latency=1),
  • added 2D matrix multiplication examples for integers and single precision floating point numbers,
  • added some synthesis scripts checking bambu quality of results w.r.t. 72 single precision libm functions (e.g., sqrtf, sinf, expf, tanhf, etc.),
  • added spider tool to automatically generate latex tables from bambu synthesis results,
  • moved all the dot generated files into directory HLS_output/dot/. Such files (scheduling, FSM, CFG, DFG, etc.) are now generated when the --print-dot option is passed,
  • VIVADO is now the default backend flow for Zynq based devices.
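
The ULP-based tolerance for float computations mentioned in the list above can be sketched in C. This is an illustration of the general technique, not the comparison code emitted by bambu's testbench generator: the names ulp_distance and within_ulps are hypothetical, and the sketch assumes IEEE 754 single precision floats and inputs that are not NaN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Maps a float to an integer that grows monotonically with the float
 * value, so that adjacent floats map to adjacent integers.
 * Both +0.0 and -0.0 map to 0. */
static int64_t float_to_ordered(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret the bit pattern */
    if (bits & 0x80000000u)                    /* negative: mirror below zero */
        return -(int64_t)(bits & 0x7FFFFFFFu);
    return (int64_t)bits;
}

/* Number of representable floats between a and b (0 if they are equal). */
uint32_t ulp_distance(float a, float b)
{
    assert(a == a && b == b);                  /* NaN is not handled here */
    int64_t d = float_to_ordered(a) - float_to_ordered(b);
    return (uint32_t)(d < 0 ? -d : d);
}

/* Returns 1 if actual is within max_ulps of expected, 0 otherwise. */
int within_ulps(float expected, float actual, uint32_t max_ulps)
{
    return ulp_distance(expected, actual) <= max_ulps;
}
```

With such a check, a testbench would accept a hardware result that differs from the software reference by at most the user defined number of ULPs.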

Problems fixed:

  • fixed all the Bison related compilation problems,
  • fixed some problems with testbench generation of 2D arrays,
  • fixed configuration scripts for manually installed Boost libraries; now, we need at least Boost 1.48.0,
  • fixed some problems with C pretty-printing of the internal IR,
  • fixed some ISE/Vivado synthesis problems when System Verilog based models are used,
  • fixed some problems with --soft-float based synthesis,
  • fixed RTL-backend configuration scripts looking for tools (e.g., ISE, Vivado, Quartus and Diamond) already installed,
  • fixed some problems with real-to-int and int-to-real conversions, added some explicit tests to the panda regressions.

September 17th, 2013 – Release 0.9.1 – panda-0.9.1.tar.gz.

New features introduced:

  • complete support of CHStone benchmarks synthesis and verification,
  • better support of multi-ported memories,
  • local memory exploitation,
  • read-only-memory exploitation,
  • support of multi-bus for parallel memory accesses,
  • support of unaligned memory accesses,
  • better support of pipelined resources,
  • improved module binding algorithms (e.g., slack-based module binding),
  • support of resource constraints through user xml file,
  • support of libc primitives: memcpy, memcmp, memset and memmove,
  • better support of printf primitive for RTL debugging purposes,
  • support of dynamic memory allocation,
  • synthesis of libm builtin primitives such as sin, cos, acosh, etc,
  • better integration with FloPoCo library,
  • soft-float based HW synthesis,
  • support of Vivado Xilinx backend,
  • support of Diamond Lattice backend,
  • support of XSIM Xilinx simulator,
  • synthesis and testbench generation of WISHBONE B4 Compliant Accelerators (see WB4 specs for details on the WISHBONE specification),
  • synthesis of AXI4LITE Compliant Accelerators (experimental),
  • inclusion of GCC regression tests to test HLS synthesis (tested HLS synthesis and RTL simulation),
  • inclusion of libm regression tests to test HLS synthesis of libm (tested HLS synthesis and RTL simulation),
  • support of multiple versions of GCC compiler: v4.5, v4.6 and v4.7.
  • support of GCC vectorizing capability (experimental).

March 21st, 2012 – Release 0.9.0 – Build 10722 – panda-0.9.0.tar.gz.

For any information or bug report, please write to

A framework for Hardware-Software Co-Design of Embedded Systems