Bambu: options

********************************************************************************
                    ____                  _
                   | __ )  __ _ _ __ ___ | |_   _   _
                   |  _ \ / _` | '_ ` _ \| '_ \| | | |
                   | |_) | (_| | | | | | | |_) | |_| |
                   |____/ \__,_|_| |_| |_|_.__/ \__,_|

********************************************************************************
                         High-Level Synthesis Tool

                         Politecnico di Milano - DEIB
                          System Architectures Group
********************************************************************************
                Copyright (C) 2004-2020 Politecnico di Milano
    Version: PandA 0.9.6 - Revision 5e5e306b86383a7d85274d64977a3d71fdcff4fe

Usage:
       bambu [Options] <source_file> [<constraints_file>] [<technology_file>]
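
    As a rough illustration of the usage line above, the following Python
    sketch assembles an argument list in the documented order: options first,
    then the source file, then the optional constraints and technology files.
    The helper name bambu_argv and the file name adder.c are hypothetical;
    only the ordering comes from this help text.

```python
def bambu_argv(source_file, options=(), constraints_file=None, technology_file=None):
    """Assemble an argument list following the documented usage line.

    Hypothetical helper: it only orders the pieces, it does not run bambu.
    """
    argv = ["bambu", *options, source_file]
    if constraints_file is not None:
        argv.append(constraints_file)
    if technology_file is not None:
        argv.append(technology_file)
    return argv


# adder.c and the top function name "adder" are placeholder names.
example = bambu_argv("adder.c", options=["--top-fname=adder", "-O2"])
print(" ".join(example))
```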

Options:

  General options:

    --help, -h
        Display this usage information.

    --version, -V
        Display the version of the program.


  Output options:

    --verbosity, -v <level>
        Set the output verbosity level.
        Possible values for <level>:
            0 - NONE
            1 - MINIMUM
            2 - VERBOSE
            3 - PEDANTIC
            4 - VERY PEDANTIC
        (default = 1)

    --no-clean
        Do not remove temporary files.

    --benchmark-name=<name>
        Set the name of the current benchmark for data collection.
        Mainly useful for data collection from extensive regression tests.

    --configuration-name=<name>
        Set the name of the current tool configuration for data collection.
        Mainly useful for data collection from extensive regression tests.

    --benchmark-fake-parameters
        Set the parameters string for data collection. The parameters in the
        string are not used by the tool itself; they are only recorded for
        data collection in extensive regression tests.

    --output-temporary-directory=<path>
        Set the directory where temporary files are saved.
        Default is 'panda-temp'

    --print-dot
        Dump several of the graphs used in the tool's IR to file.
        The graphs are saved as .dot files, in Graphviz format.

    --pretty-print=<file>
        C-based pretty print of the internal IR.

    --writer,-w<language>
        Output RTL language:
            V - Verilog (default)
            H - VHDL

    --no-mixed-design
        Avoid mixed design.

    --generate-tb=<file>
        Generate testbench for the input values defined in the specified XML
        file.

    --top-fname=<fun_name>
        Define the top function to be synthesized.

    --top-rtldesign-name=<top_name>
        Define the top module name for the RTL backend.

    --file-input-data=<file_list>
        A comma-separated list of input files used by the C specification.

    --C-no-parse=<file>
        Specify a comma-separated list of C files used only during the
        co-simulation phase.


  GCC options:

    --compiler=<compiler_version>
        Specify which compiler is used.
        Possible values for <compiler_version> are:
            I386_GCC48
            I386_GCC49
            I386_GCC5
            I386_GCC6
            I386_GCC7
            I386_GCC8
            I386_CLANG4
            I386_CLANG5
            I386_CLANG6
            I386_CLANG7

    -O<level>
        Enable a specific optimization level. Possible values are the usual
        optimization flags accepted by compilers, plus some others:
        -O0,-O1,-O2,-O3,-Os,-O4,-O5.

    -f<option>
        Enable or disable a GCC optimization option. All the -f or -fno options
        are supported. In particular, -ftree-vectorize option triggers the
        high-level synthesis of vectorized operations.

    -I<path>
        Specify a path where headers are searched for.

    -W<warning>
        Specify a warning option passed to GCC. All the -W options available in
        GCC are supported.

    -E
        Enable preprocessing mode of GCC.

    --std=<standard>
        Assume that the input sources are for <standard>. All
        the --std options available in GCC are supported.

    -D<name>
        Predefine <name> as a macro, with definition 1.

    -D<name=definition>
        Tokenize <definition> and process as if it appeared as a #define directive.

    -U<name>
        Remove existing definition for macro <name>.

    --param <name>=<value>
        Set the GCC parameter <name> to <value>. Such parameters are used to
        tune some optimizations.

    -l<library>
        Search the library named <library> when linking.

    -L<dir>
        Add directory <dir> to the list of directories to be searched for -l.

    --use-raw
        Specify that input file is already a raw file and not a source file.

    -m<machine-option>
        Specify machine dependent options (currently not used).

    --Include-sysdir
        Return the system include directory used by the wrapped GCC compiler.

    --gcc-config
        Return the GCC configuration.

    --extra-gcc-options
        Specify custom extra options to the compiler.


  Target:

    --target-file=<file>, -b<file>
        Specify an XML description of the target device.

    --generate-interface=<type>
        Wrap the top level module with an external interface.
        Possible values for <type> and related interfaces:
            MINIMAL  -  (minimal interface - default)
            INFER    -  (top function is built with a hardware interface
                        inferred from the pragmas or from the top function
                        signature)
            WB4      -  (WishBone 4 interface)


  Scheduling:

    --parametric-list-based[=<type>]
        Perform priority list-based scheduling. This is the default scheduling algorithm
        in bambu. The optional <type> argument can be used to set options for
        list-based scheduling as follows:
            0 - Dynamic mobility (default)
            1 - Static mobility
            2 - Priority-fixed mobility

    --post-rescheduling
        Perform post rescheduling to better distribute resources.

    --speculative-sdc-scheduling,-s
        Perform scheduling by using speculative SDC.

    --pipelining,-p
        Perform functional pipelining starting from the top function.

    --fixed-scheduling=<file>
        Provide scheduling as an XML file.

    --no-chaining
        Disable chaining optimization.


  Binding:

    --register-allocation=<type>
        Set the algorithm used for register allocation. Possible values for the
        <type> argument are the following:
            WEIGHTED_TS        - solve the weighted clique covering problem by
                                 exploiting the Tseng&Siewiorek heuristics
                                 (default)
            WEIGHTED_COLORING   - use weighted coloring algorithm
            COLORING            - use simple coloring algorithm
            CHORDAL_COLORING    - use chordal coloring algorithm
            BIPARTITE_MATCHING  - use bipartite matching algorithm
            TTT_CLIQUE_COVERING - use a weighted clique covering algorithm
            UNIQUE_BINDING      - unique binding algorithm

    --module-binding=<type>
        Set the algorithm used for module binding. Possible values for the
        <type> argument are one of the following:
            WEIGHTED_TS        - solve the weighted clique covering problem by
                                 exploiting the Tseng&Siewiorek heuristics
                                 (default)
            WEIGHTED_COLORING  - solve the weighted clique covering problem
                                 performing a coloring on the conflict graph
            COLORING           - solve the unweighted clique covering problem
                                 performing a coloring on the conflict graph
            TTT_FAST           - use Tomita, A. Tanaka, H. Takahashi maximal
                                 weighted cliques heuristic to solve the clique
                                 covering problem
            TTT_FAST2          - use Tomita, A. Tanaka, H. Takahashi maximal
                                 weighted cliques heuristic to incrementally
                                 solve the clique covering problem
            TTT_FULL           - use Tomita, A. Tanaka, H. Takahashi maximal
                                 weighted cliques algorithm to solve the clique
                                 covering problem
            TTT_FULL2          - use Tomita, A. Tanaka, H. Takahashi maximal
                                 weighted cliques algorithm to incrementally
                                 solve the clique covering problem
            TS                 - solve the unweighted clique covering problem
                                 by exploiting the Tseng&Siewiorek heuristic
            BIPARTITE_MATCHING - solve the weighted clique covering problem
                                 exploiting the bipartite matching approach
            UNIQUE             - use a 1-to-1 binding algorithm


  Memory allocation:

    --memory-allocation=<type>
        Set the algorithm used for memory allocation. Possible values for the
        <type> argument are the following:
            DOMINATOR          - all local variables, static variables and
                                 strings are allocated on BRAMs (default)
            XML_SPECIFICATION  - import the memory allocation from an XML
                                 specification

    --xml-memory-allocation=<xml_file_name>
        Specify the file where the XML configuration has been defined.

    --memory-allocation-policy=<type>
        Set the policy for memory allocation. Possible values for the <type>
        argument are the following:
            ALL_BRAM           - all objects that need to be stored in memory
                                 are allocated on BRAMs (default)
            LSS                - all local variables, static variables and
                                 strings are allocated on BRAMs
            GSS                - all global variables, static variables and
                                 strings are allocated on BRAMs
            NO_BRAM            - all objects that need to be stored in memory
                                 are allocated on an external memory
            EXT_PIPELINED_BRAM - all objects that need to be stored in memory
                                 are allocated on an external pipelined memory

    --base-address=address
        Define the starting address for objects allocated externally to the top
        module.

    --initial-internal-address=address
        Define the starting address for the objects allocated internally to the
        top module.

    --channels-type=<type>
        Set the type of memory connections.
        Possible values for <type> are:
            MEM_ACC_11 - the accesses to the memory have a single direct
                         connection or a single indirect connection (default)
            MEM_ACC_N1 - the accesses to the memory have n parallel direct
                         connections or a single indirect connection
            MEM_ACC_NN - the accesses to the memory have n parallel direct
                         connections or n parallel indirect connections

    --channels-number=<n>
        Define the number of parallel direct or indirect accesses.

    --memory-ctrl-type=type
        Define which type of memory controller is used. Possible values for the
        <type> argument are the following:
            D00 - no extra delay (default)
            D10 - 1 clock cycle of extra delay for LOAD, 0 for STORE
            D11 - 1 clock cycle of extra delay for LOAD, 1 for STORE
            D21 - 2 clock cycles of extra delay for LOAD, 1 for STORE
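
        The controller codes listed above can be read as a small lookup
        table. The Python sketch below only transcribes that list; the names
        EXTRA_DELAY and extra_delay are hypothetical and are not part of
        bambu.

```python
# (extra LOAD cycles, extra STORE cycles) for each --memory-ctrl-type value,
# transcribed from the list in this help text.
EXTRA_DELAY = {
    "D00": (0, 0),  # no extra delay (default)
    "D10": (1, 0),
    "D11": (1, 1),
    "D21": (2, 1),
}


def extra_delay(ctrl_type):
    """Return (load_cycles, store_cycles) of extra delay for a listed code."""
    try:
        return EXTRA_DELAY[ctrl_type]
    except KeyError:
        raise ValueError(f"unknown memory controller type: {ctrl_type!r}") from None


load_cycles, store_cycles = extra_delay("D21")
print(f"D21: LOAD +{load_cycles}, STORE +{store_cycles}")
```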

    --memory-banks-number=<n>
        Define the number of memory banks.

    --sparse-memory[=on/off]
        Control how the memory allocation happens.
            on - allocate the data in addresses which reduce the decoding logic (default)
           off - allocate the data in contiguous addresses.

    --do-not-use-asynchronous-memories
        Do not add asynchronous memories to the possible set of memories used
        by bambu during the memory allocation step.

    --distram-threshold=value
        Define the threshold in bitsize used to infer DISTRIBUTED/ASYNCHRONOUS RAMs (default 256).

    --serialize-memory-accesses
        Serialize the memory accesses using the GCC virtual use-def chains
        without taking into account any alias analysis information.

    --unaligned-access
        Use only memories supporting unaligned accesses.

    --aligned-access
        Assume that all accesses are aligned, so only memories supporting
        aligned accesses are used.

    --do-not-chain-memories
        When enabled, LOADs and STOREs will not be chained with other
        operations.

    --rom-duplication
        Assume that read-only memories can be duplicated if timing requires it.

    --bram-high-latency=[3,4]
        Assume a 'high latency BRAM'-'faster clock frequency' block RAM
        memory based architecture:
        3 => LOAD(II=1,L=3) STORE(1).
        4 => LOAD(II=1,L=4) STORE(II=1,L=2).

    --mem-delay-read=value
        Define the external memory latency when LOADs are performed (default 2).

    --mem-delay-write=value
        Define the external memory latency when STOREs are performed (default 1).

    --do-not-expose-globals
        All global variables are considered local to the compilation units.

    --data-bus-bitsize=<bitsize>
        Set the bitsize of the external data bus.

    --addr-bus-bitsize=<bitsize>
        Set the bitsize of the external address bus.


  Evaluation of HLS results:

    --simulate
        Simulate the RTL implementation.

    --mentor-visualizer
        Simulate the RTL implementation and then open Mentor Visualizer.

    --simulator=<type>
        Specify the simulator used in generated simulation scripts:
            MODELSIM - Mentor Modelsim
            XSIM - Xilinx XSim
            ISIM - Xilinx iSim
            ICARUS - Verilog Icarus simulator
            VERILATOR - Verilator simulator

    --max-sim-cycles=<cycles>
        Specify the maximum number of cycles an HDL simulation may run.
        (default 20000000).

    --accept-nonzero-return
        Do not assume that application main must return 0.

    --generate-vcd
        Enable .vcd output file generation for waveform visualization (requires
        testbench generation).

    --evaluation[=type]
        Perform evaluation of the results.
        The value of 'type' selects the objectives to be evaluated.
        If nothing is specified, all of the following are evaluated.
        The 'type' argument can be a string containing any of the following
        strings, separated with commas, without spaces:
            AREA            - Area usage
            AREAxTIME       - Area x Latency product
            TIME            - Latency for the average computation
            TOTAL_TIME      - Latency for the whole computation
            CYCLES          - n. of cycles for the average computation
            TOTAL_CYCLES    - n. of cycles for the whole computation
            BRAMS           - number of BRAMs
            CLOCK_SLACK     - Slack between actual and required clock period
            DSPS            - number of DSPs
            FREQUENCY       - Maximum target frequency
            PERIOD          - Actual clock period
            REGISTERS       - number of registers
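
        The format of the 'type' argument described above (objective names
        joined by commas, without spaces) can be sketched as follows. This is
        only an illustration: the helper name evaluation_option is
        hypothetical, and the set of accepted names is copied from the list
        in this help text.

```python
# Objective names as listed in this help text.
OBJECTIVES = (
    "AREA", "AREAxTIME", "TIME", "TOTAL_TIME", "CYCLES", "TOTAL_CYCLES",
    "BRAMS", "CLOCK_SLACK", "DSPS", "FREQUENCY", "PERIOD", "REGISTERS",
)


def evaluation_option(objectives=()):
    """Build an --evaluation option string from a sequence of objective names.

    With no objectives, the bare option is returned, which according to the
    help text evaluates all objectives.
    """
    unknown = [name for name in objectives if name not in OBJECTIVES]
    if unknown:
        raise ValueError(f"unknown objectives: {unknown}")
    if not objectives:
        return "--evaluation"
    # Comma-separated, no spaces, as the help text requires.
    return "--evaluation=" + ",".join(objectives)


print(evaluation_option(["AREA", "CYCLES"]))
```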


  RTL synthesis:

    --clock-name=id
        Specify the clock signal name of the top interface (default = clock).

    --reset-name=id
        Specify the reset signal name of the top interface (default = reset).

    --start-name=id
        Specify the start signal name of the top interface (default = start_port).

    --done-name=id
        Specify the done signal name of the top interface (default = done_port).

    --clock-period=value
        Specify the period of the clock signal (default = 10ns).
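
        For orientation, a clock period in nanoseconds converts to a
        frequency in MHz as 1000 / period, so the default of 10ns corresponds
        to 100 MHz. A minimal Python sketch of that arithmetic (the function
        name period_ns_to_mhz is hypothetical):

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    if period_ns <= 0:
        raise ValueError("the clock period must be positive")
    # 1 / (period_ns * 1e-9 s) Hz == 1000 / period_ns MHz
    return 1000.0 / period_ns


print(period_ns_to_mhz(10))  # default clock period from this help text
```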

    --backend-script-extensions=file
        Specify a file that will be included in the backend specific synthesis
        scripts.

    --backend-sdc-extensions=file
        Specify a file that will be included in the Synopsys Design Constraints
        file (SDC).

    --VHDL-library=libraryname
        Specify the library in which the VHDL generated files are compiled.

    --device-name=value
        Specify the name of the device. Three different cases are foreseen:
            - Xilinx:  a comma separated string specifying device, speed grade
                       and package (e.g., "xc7z020,-1,clg484,VVD")
            - Altera:  a string defining the device string (e.g. EP2C70F896C6)
            - Lattice: a string defining the device string (e.g.
                       LFE335EA8FN484C)
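
        The Xilinx form is a comma-separated string. The Python sketch below
        splits it into the three fields named above (device, speed grade,
        package). The example string in this help text carries a fourth
        field, which is not described here, so the sketch keeps any further
        fields unlabeled. The function name split_xilinx_device_name is
        hypothetical.

```python
def split_xilinx_device_name(value):
    """Split a Xilinx --device-name string into its comma-separated fields."""
    fields = [field.strip() for field in value.split(",")]
    if len(fields) < 3:
        raise ValueError("expected at least device, speed grade and package")
    device, speed_grade, package = fields[:3]
    return {
        "device": device,
        "speed_grade": speed_grade,
        "package": package,
        "extra": fields[3:],  # fields beyond the three named in the help text
    }


print(split_xilinx_device_name("xc7z020,-1,clg484,VVD"))
```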

    --power-optimization
        Enable Xilinx power based optimization (default no).

    --no-iob
        Disconnect primary ports from the IOB (the default is to connect
        primary input and output ports to IOBs).

    --soft-float
        Enable the soft-based implementation of floating-point operations.
        By default, Bambu uses a faithfully rounded version of softfloat with
        the rounding mode set to round to nearest even.
        This is the default for bambu.

    --flopoco
        Enable the flopoco-based implementation of floating-point operations.

    --softfloat-subnormal
        Enable the soft-based implementation of floating-point operations with subnormals support.

    --libm-std-rounding
        Enable the use of classical libm. This library combines a customized
        version of the glibc, newlib and musl libm implementations into a
        single libm library synthesizable with bambu.
        Without this option, Bambu uses a faithfully rounded version of libm
        by default.

    --soft-fp
        Enable the use of the soft_fp GCC library instead of Bambu's customized
        version of John R. Hauser's softfloat library.

    --max-ulp
        Define the maximal ULP (unit in the last place, i.e., the spacing
        between floating-point numbers) accepted.

    --hls-div=<method>
        Perform the high-level synthesis of integer division and modulo
        operations starting from a C library based implementation or an HDL component:
             none  - use an HDL based pipelined restoring division
             nr1   - use a C-based non-restoring division with unrolling factor equal to 1 (default)
             nr2   - use a C-based non-restoring division with unrolling factor equal to 2
             NR    - use a C-based Newton-Raphson division
             as    - use a C-based align divisor shift dividend method

    --hls-fpdiv=<method>
        Perform the high-level synthesis of floating point division 
        operations starting from a C library based implementation:
             SRT4 - use a C-based Sweeney, Robertson, Tocher floating point division with radix 4 (default)
             G    - use a C-based Goldschmidt floating point division.
             SF   - use a C-based floating point division as described in the soft-fp library (it requires --soft-fp).

    --skip-pipe-parameter=<value>
        Used during the allocation of pipelined units. <value> specifies how
        many pipelined units, compliant with the clock period, will be skipped.
        (default=0).

    --reset-type=value
        Specify the type of reset:
             no    - use registers without reset (default)
             async - use registers with asynchronous reset
             sync  - use registers with synchronous reset

    --reset-level=value
        Specify if the reset is active high or low:
             low   - use registers with active low reset (default)
             high  - use registers with active high reset

    --disable-reg-init-value
        Used to remove the INIT value from registers (useful for ASIC designs).

    --registered-inputs=value
        Specify if inputs are registered or not:
             auto  - inputs are registered only for proxy functions (default)
             top   - inputs and return are registered only for top and proxy functions
             yes   - all inputs are registered
             no    - none of the inputs is registered

    --fsm-encoding=value
        Specify the FSM state encoding:
             auto    - it depends on the target technology. VVD prefers
                       one-hot encoding, while the others are fine with the
                       standard binary encoding (default)
             one-hot - one-hot encoding
             binary  - binary encoding

    --cprf=value
        Clock Period Resource Fraction (default = 1.0).

    --DSP-allocation-coefficient=value
        During the allocation step the timing of the DSP-based modules is
        multiplied by value (default = 1.0).

    --DSP-margin-combinational=value
        Timing of combinational DSP-based modules is multiplied by value.
        (default = 1.0).

    --DSP-margin-pipelined=value
        Timing of pipelined DSP-based modules is multiplied by value.
        (default = 1.0).

    --mux-margins=n
        Scheduling reserves a margin corresponding to the delay of n 32-bit
        multiplexers.

    --timing-model=value
        Specify the timing model used by HLS:
             EC     - estimate timing overhead of glue logic and connections
                      between resources (default)
             SIMPLE - just consider the resource delay

    --experimental-setup=<setup>
        Specify the experimental setup. This is a shorthand to set multiple
        options with a single command.
        Available values for <setup> are the following:
             BAMBU-AREA           - this setup implies:
                                    -Os  -D'printf(fmt, ...)='
                                    --memory-allocation-policy=ALL_BRAM
                                    --DSP-allocation-coefficient=1.75
                                    --distram-threshold=256
             BAMBU-AREA-MP        - this setup implies:
                                    -Os  -D'printf(fmt, ...)='
                                    --channels-type=MEM_ACC_NN
                                    --memory-allocation-policy=ALL_BRAM
                                    --DSP-allocation-coefficient=1.75
                                    --distram-threshold=256
             BAMBU-BALANCED       - this setup implies:
                                    -O2  -D'printf(fmt, ...)='
                                    --channels-type=MEM_ACC_11
                                    --memory-allocation-policy=ALL_BRAM
                                    -fgcse-after-reload  -fipa-cp-clone
                                    -ftree-partial-pre  -funswitch-loops
                                    -finline-functions  -fdisable-tree-bswap
                                    --param max-inline-insns-auto=25
                                    -fno-tree-loop-ivcanon
                                    --distram-threshold=256
             BAMBU-BALANCED-MP    - (default) this setup implies:
                                    -O2  -D'printf(fmt, ...)='
                                    --channels-type=MEM_ACC_NN
                                    --memory-allocation-policy=ALL_BRAM
                                    -fgcse-after-reload  -fipa-cp-clone
                                    -ftree-partial-pre  -funswitch-loops
                                    -finline-functions  -fdisable-tree-bswap
                                    --param max-inline-insns-auto=25
                                    -fno-tree-loop-ivcanon
                                    --distram-threshold=256
             BAMBU-TASTE          - this setup concatenates the input files and
                                    passes these options to the compiler:
                                    -O2  -D'printf(fmt, ...)='
                                    --channels-type=MEM_ACC_NN
                                    --memory-allocation-policy=ALL_BRAM
                                    -fgcse-after-reload  -fipa-cp-clone
                                    -ftree-partial-pre  -funswitch-loops
                                    -finline-functions  -fdisable-tree-bswap
                                    --param max-inline-insns-auto=25
                                    -fno-tree-loop-ivcanon
                                    --distram-threshold=256
             BAMBU-PERFORMANCE    - this setup implies:
                                    -O3  -D'printf(fmt, ...)='
                                    --memory-allocation-policy=ALL_BRAM
                                    --distram-threshold=512
             BAMBU-PERFORMANCE-MP - this setup implies:
                                    -O3  -D'printf(fmt, ...)='
                                    --channels-type=MEM_ACC_NN
                                    --memory-allocation-policy=ALL_BRAM
                                    --distram-threshold=512
             BAMBU                - this setup implies:
                                    -O0 --channels-type=MEM_ACC_11
                                    --memory-allocation-policy=LSS
                                    --distram-threshold=256
             BAMBU092             - this setup implies:
                                    -O3  -D'printf(fmt, ...)='
                                    --timing-model=SIMPLE
                                    --DSP-margin-combinational=1.3
                                    --cprf=0.9  --skip-pipe-parameter=1
                                    --channels-type=MEM_ACC_11
                                    --memory-allocation-policy=LSS
                                    --distram-threshold=256
             VVD                  - this setup implies:
                                    -O3  -D'printf(fmt, ...)='
                                    --channels-type=MEM_ACC_NN
                                    --memory-allocation-policy=ALL_BRAM
                                    --distram-threshold=256
                                    --DSP-allocation-coefficient=1.75
                                    --do-not-expose-globals --cprf=0.875
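
        To compare setups, the implied options listed above can be
        transcribed into a table. The Python sketch below copies four of the
        entries verbatim from this help text (option strings are kept as
        printed, including the shell-style quoting of the printf macro). The
        names SETUPS and implied_options are hypothetical; bambu expands the
        setups internally and this is only a reading aid.

```python
PRINTF_MACRO = "-D'printf(fmt, ...)='"  # as printed in this help text

# Implied options per --experimental-setup value, copied from the list above.
SETUPS = {
    "BAMBU-AREA": [
        "-Os", PRINTF_MACRO,
        "--memory-allocation-policy=ALL_BRAM",
        "--DSP-allocation-coefficient=1.75",
        "--distram-threshold=256",
    ],
    "BAMBU-PERFORMANCE": [
        "-O3", PRINTF_MACRO,
        "--memory-allocation-policy=ALL_BRAM",
        "--distram-threshold=512",
    ],
    "BAMBU-PERFORMANCE-MP": [
        "-O3", PRINTF_MACRO,
        "--channels-type=MEM_ACC_NN",
        "--memory-allocation-policy=ALL_BRAM",
        "--distram-threshold=512",
    ],
    "BAMBU": [
        "-O0", "--channels-type=MEM_ACC_11",
        "--memory-allocation-policy=LSS",
        "--distram-threshold=256",
    ],
}


def implied_options(setup):
    """Return a copy of the options implied by one of the transcribed setups."""
    if setup not in SETUPS:
        raise ValueError(f"setup not transcribed in this sketch: {setup!r}")
    return list(SETUPS[setup])


# Options that the -MP variant adds on top of BAMBU-PERFORMANCE.
added = set(implied_options("BAMBU-PERFORMANCE-MP")) - set(implied_options("BAMBU-PERFORMANCE"))
print(sorted(added))
```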


  Other options:

    --pragma-parse
        Perform source code parsing to extract information about pragmas.
        (default=no).

    --num-accelerators
        Set the number of physical accelerators instantiated in parallel
        sections. It must be a power of two (default=4).
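
        The power-of-two requirement can be checked before invoking the tool.
        A minimal Python sketch, assuming the value is passed as a positive
        integer (the function name is_power_of_two is hypothetical):

```python
def is_power_of_two(n):
    """Return True if n is a positive integer power of two."""
    # A power of two has a single set bit, so clearing the lowest set bit
    # with n & (n - 1) leaves zero.
    return n > 0 and (n & (n - 1)) == 0


for candidate in (1, 4, 6, 8):
    print(candidate, is_power_of_two(candidate))
```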

    --time, -t <time>
        Set maximum execution time (in seconds) for ILP solvers (default = infinite).

    --host-profiling
        Perform host-profiling.

    --disable-bitvalue-ipa
        Disable inter-procedural bitvalue analysis.


  Debug options:

    --discrepancy
           Perform automated discrepancy analysis between the execution
           of the original source code and the generated HDL (currently
           supports only Verilog). If a mismatch is detected, report
           useful information to the user.
           Uninitialized variables in C are legal, but if they are used
           before initialization in HDL it is possible to obtain X values
           in simulation. This is not necessarily wrong, so these errors
           are not reported by default to avoid reporting false positives.
           If you can guarantee that in your C code there are no
           uninitialized variables and you want the X values in HDL to be
           reported use the option --discrepancy-force-uninitialized.
           Note that the discrepancy of pointers relies on ASAN to properly
           allocate objects in memory. Unfortunately, there is a well-known
           bug in ASAN (https://github.com/google/sanitizers/issues/914)
           when -fsanitize=address is passed to GCC or CLANG.
           On some compiler versions this issue has been fixed, but since the
           fix has not been upstreamed the bambu option --discrepancy may not
           work. To circumvent the issue, the user may perform the discrepancy
           analysis by adding these two options: --discrepancy --discrepancy-permissive-ptrs.

    --discrepancy-force-uninitialized
           Report errors due to uninitialized values in HDL.
           See the option --discrepancy for details.

    --discrepancy-no-load-pointers
           Assume that the data loaded from memories in HDL are never used
           to represent addresses, unless they are explicitly assigned to
           pointer variables.
           The discrepancy analysis is able to compare pointers in software
           execution and addresses in hardware. By default all the values
           loaded from memory are treated as if they could contain addresses,
           even if they are integer variables. This is due to the fact that
           C code doing these tricks is valid and actually used in embedded
           systems, but it can lead to imprecise bug reports, because only
           pointers pointing to actual data are checked by the discrepancy
           analysis.
           If you can guarantee that your code always manipulates addresses
           using pointers and never using plain int, then you can use this
           option to get more precise bug reports.

    --discrepancy-only=comma,separated,list,of,function,names
           Restrict the discrepancy analysis to the functions whose names
           are in the list passed as argument.

    --discrepancy-permissive-ptrs
           Do not trigger hard errors on pointer variables.

    --discrepancy-hw
           Hardware Discrepancy Analysis.

    --assert-debug
        Enable assertion debugging performed by Modelsim.

A framework for Hardware-Software Co-Design of Embedded Systems