FLOOR PLANNING THE eFPGA YOU NEED
The Floor Planner lets a designer quickly try out EFLX arrays built from a specific IP core (EFLX4K shown here), in different sizes and with different combinations of Logic and DSP cores.
There are two types of EFLX cores: all-Logic (called "LM" in the floor planner) and DSP, where ~1/4 of the logic is replaced with strips of MACs with 22x22 multipliers, a 48-bit pre-adder and a 48-bit accumulator. The MACs are pipelined in strips of 10: the pipelining runs directly between MACs, without using the interconnect network, for even higher performance and density.
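As a rough illustration of why a 48-bit accumulator pairs well with a 22x22 multiplier, here is a minimal behavioral sketch in Python. It assumes unsigned operands and wrap-around on overflow, neither of which is stated above; the function and constant names are hypothetical and are not part of any EFLX tool.

```python
# Behavioral sketch of a multiply-accumulate step (NOT the EFLX hardware model).
# Assumptions: unsigned operands, accumulator wraps modulo 2**48.
MULT_BITS = 22   # multiplier operand width, from the text
ACC_BITS = 48    # accumulator width, from the text

def mac(acc, a, b):
    """Return (acc + a * b) truncated to ACC_BITS bits."""
    limit = 1 << MULT_BITS
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in 22 unsigned bits")
    return (acc + a * b) & ((1 << ACC_BITS) - 1)

max_operand = (1 << MULT_BITS) - 1
max_product = max_operand * max_operand

# A 22x22 unsigned product needs at most 44 bits, leaving 4 guard bits.
guard_bits = ACC_BITS - max_product.bit_length()

# Number of worst-case products that can be summed before the accumulator wraps.
safe_accumulations = ((1 << ACC_BITS) - 1) // max_product

# Accumulate across a strip of 10 MACs with worst-case operands.
acc = 0
for _ in range(10):
    acc = mac(acc, max_operand, max_operand)

print(guard_bits, safe_accumulations, acc == 10 * max_product)
```

Under these assumptions, a full strip of 10 worst-case products fits in the accumulator without wrapping.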
In the floor planner, the user first moves the arrow in the upper right corner to set the array dimension. The grid shown is 8x8; we have already fabricated a 7x7 array (see our TSMC16FF+/FFC EFLX4K page HERE). Arrays can be square (1x1, 2x2, 3x3, ...) or rectangular (1x2, 1x3, 1x4, 2x3, 3x2, 4x2, etc.), as required.
Once the array size is set, the user selects the core type for each block in the array.
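The two floor-planning steps above (set the array dimension, then choose a core type per block) can be pictured as filling in a small grid. The sketch below is only an illustration in Python; `make_floor_plan`, `core_counts` and the DSP column chosen in the example are hypothetical and do not reflect the Floor Planner's own data format or the layout of any fabricated array.

```python
from collections import Counter

# Core-type labels: "LM" is the floor planner's name for all-Logic cores.
LOGIC, DSP = "LM", "DSP"

def make_floor_plan(rows, cols, dsp_positions=()):
    """Return a rows x cols grid of core types; cores are Logic unless listed as DSP."""
    if rows < 1 or cols < 1:
        raise ValueError("array dimensions must be at least 1x1")
    dsp = set(dsp_positions)
    for r, c in dsp:
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"DSP position {(r, c)} is outside the {rows}x{cols} array")
    return [[DSP if (r, c) in dsp else LOGIC for c in range(cols)]
            for r in range(rows)]

def core_counts(plan):
    """Count how many cores of each type the plan contains."""
    return Counter(core for row in plan for core in row)

# Example: a 7x7 array with one (arbitrarily chosen) column of DSP cores.
plan = make_floor_plan(7, 7, dsp_positions=[(r, 3) for r in range(7)])
counts = core_counts(plan)

for row in plan:
    print(" ".join(f"{core:>3}" for core in row))
print(counts[LOGIC], "Logic cores,", counts[DSP], "DSP cores")
```

Rectangular arrays work the same way, for example `make_floor_plan(2, 3)` for a 2x3 all-Logic array.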
Once this is done the user can move on to loading RTL and checking area and performance. The user can quickly and interactively try different array sizes and placements of DSP/Logic blocks to determine which gives the best density and speed for their requirements.
Once the user is happy with the array size and feature configuration, a Tcl script automatically generates the GDS of the desired array from the floor planner. A .LEF and a .LIB file, with all interface timing including the clock network and its connection to the rest of the SoC, are also generated for the specific array instance. All of this takes a few hours to a few days, depending on array size and configuration.
Since we can quickly implement different array sizes and configurations, we encourage users to have multiple, different arrays in a single design if that gives them the best result. And if the user changes their mind late in the process, we can easily give them larger or smaller arrays as needed.
Here is an example of a 7x7 floor plan, identical to the one used in our TSMC16FFC EFLX200K validation chip:
TIMING-AWARE PLACEMENT VIEWER
Once an array is defined, RTL/Verilog can be synthesized and mapped to the array. The Placement Viewer shows the physical design by IP core and by RBB block within the core (color coding: green is MAC, magenta is RBB-M, gray is RBB-L; a pale color is an empty logic block).
There are multiple screens available for examining how blocks connect. Here are two examples.
The screen below examines the input and output connections of a given block in the design.
The next screen shows the block-by-block route of a specific timing path from start to finish (a timing path runs from the output of one flop to the input of another flop, passing through multiple logic stages).
The designer can easily switch between the various timing corners supported in the EFLX Compiler: for example, in 16nm we support 7 corners.
Our new timing analyzer module lets you see a histogram of all timing nets, list the nets in each histogram bar, and then drill down into each net to see its stage-by-stage timing. This level of timing information helps you decide how to optimize your RTL to improve worst-case critical path performance.
Timing is computed based on output files from Tempus/PrimeTime which describe every timing path through the EFLX core/array. Timing is available for each process node and for multiple corners for each process node (varying process conditions and voltages, not just worst case conditions).
Contact us for a demo and for a software evaluation license to try on your RTL: email@example.com.
Below are some details.
The first screen shows the 7 corners available for the TSMC 16FFC process. An EDIF netlist can be selected, and a corner can be selected for optimizing place & route. Timing corners are available for all of the nominal voltages that TSMC supports: currently the 0.8V Tj nominal corners are populated (+/- 10%), along with 1V corners for closing hold times. In the example below, an 8K LUT design will be placed and routed with timing optimized for SS, 0.72V and 125C.
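The 0.72V figure in this example is simply the 0.8V nominal supply minus 10%. A minimal Python sketch of that arithmetic follows; `voltage_corners` is a hypothetical helper name, not an EFLX Compiler function.

```python
def voltage_corners(nominal_v, tolerance=0.10):
    """Return the (low, high) supply voltages for a +/- tolerance around nominal."""
    low = round(nominal_v * (1 - tolerance), 3)
    high = round(nominal_v * (1 + tolerance), 3)
    return low, high

low_v, high_v = voltage_corners(0.8)
print(low_v, high_v)  # 0.72 0.88
```

The low-voltage value is the one paired with the slow (SS) process corner in the example above.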
After place and route, a timing histogram is generated showing the number of critical paths at each speed. The worst case performance for this example is 510.5 MHz, which corresponds to a path delay of 1959 ps. In the GUI, the rightmost histogram bar (1900-2000 ps) was selected with the cursor: the pop-up window shows there are two paths in this histogram bar.
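The relationship between the worst path delay and the reported frequency, and the grouping of paths into 100 ps histogram bars, can be sketched in a few lines of Python. Only the 1959 ps value comes from the example above; every other delay in the list is made up for illustration, and the function names are hypothetical rather than part of the EFLX Compiler.

```python
from collections import Counter

def period_ps_to_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    # A 1 MHz clock has a period of 1,000,000 ps.
    return 1e6 / period_ps

def delay_histogram(delays_ps, bin_width_ps=100):
    """Group path delays into fixed-width bins keyed by (low, high) in ps."""
    bins = Counter()
    for delay in delays_ps:
        low = (delay // bin_width_ps) * bin_width_ps
        bins[(low, low + bin_width_ps)] += 1
    return dict(sorted(bins.items()))

# Hypothetical path delays in ps; only 1959 is taken from the text.
delays = [1959, 1912, 1874, 1820, 1733, 1702, 1655]

hist = delay_histogram(delays)
worst_ps = max(delays)
fmax_mhz = period_ps_to_mhz(worst_ps)

for (low, high), count in hist.items():
    print(f"{low}-{high} ps: {count} path(s)")
print(f"worst path {worst_ps} ps -> {fmax_mhz:.1f} MHz")  # 1959 ps -> 510.5 MHz
```

The slowest path sets the maximum clock frequency, which is why the rightmost bar of the histogram is the natural place to start looking for RTL improvements.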
Then, in this example, the 1959ps path is selected in the first pop-up window, which generates a 2nd pop-up window (see below) showing the 5% slowest paths in the logic cone of this path. Using this, a designer can see if one particular path is much longer and consider options to improve it.
Then, drilling down further, the designer can look at any of the paths in the logic cone (in the example below, the 1946ps path is selected in the middle pop-up box). Once a path is selected, the designer can see every stage, starting from the output of one flip-flop and continuing through the various logic and net delays that make up the total path delay.
These data are based on silicon sign-off data from Cadence Tempus, using TSMC cell libraries (CCS) and wire load models (QRC), in the TSMC sign-off corners (e.g. SSGNP 0.72V, -40C RCworst-Cworst-T, AOCV), following TSMC timing sign-off guidelines. The database of timing reports and SDF timing annotation is then parsed by the EFLX Compiler to perform timing analysis on your design in each corner. This rigorous ASIC timing sign-off method ensures that your RTL running on the EFLX array will meet the EFLX Compiler timing, the same way you designed your ASIC to meet timing under worst-case conditions. Unlike with other FPGA companies, no timing margins or derates need to be added to our timing analysis reports, because we use the same methodology you do for the rest of your chip.
These timing tools give the designer information that can help optimize the RTL to improve performance. In a future phase of the GUI, the physical graph of the path through the array will also be observable.
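To make the stage-by-stage view concrete, the Python sketch below adds up cell and net delays along one path and reports the slowest stage. The individual stage names and delays are invented so that they sum to the 1946 ps path mentioned above; they are not taken from a real timing report, and the `Stage` class is a hypothetical structure, not the EFLX Compiler's report format.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    kind: str       # "cell" or "net"
    delay_ps: int

def cumulative_arrival(stages):
    """Return (stage name, arrival time in ps) after each stage of the path."""
    total = 0
    arrivals = []
    for stage in stages:
        total += stage.delay_ps
        arrivals.append((stage.name, total))
    return arrivals

def slowest_stage(stages):
    """Return the single stage contributing the largest delay."""
    return max(stages, key=lambda stage: stage.delay_ps)

# Hypothetical stage breakdown, chosen to total 1946 ps.
path = [
    Stage("flop_a clk->q", "cell", 120),
    Stage("net_1", "net", 310),
    Stage("lut_1", "cell", 180),
    Stage("net_2", "net", 420),
    Stage("lut_2", "cell", 175),
    Stage("net_3", "net", 390),
    Stage("lut_3", "cell", 170),
    Stage("net_4 + flop_b setup", "net", 181),
]

arrivals = cumulative_arrival(path)
total_ps = arrivals[-1][1]
worst = slowest_stage(path)

for name, arrival in arrivals:
    print(f"{name:<22} {arrival:>5} ps")
print(f"total {total_ps} ps; slowest stage: {worst.name} ({worst.delay_ps} ps)")
```

A breakdown like this shows whether a long path is dominated by a few large net delays or by the number of logic levels, which suggests different RTL changes.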
GUI PHASE 3 & 4
Additional features will be added in the near future for Interface Pin Mapping, Power Estimator and Logixizer/Debugger.
Synplify: this widely used Synopsys tool takes your netlist/RTL and breaks it down into primitives in an EDIF format, which feeds into the EFLX Compiler.
- Input your RTL to see the resources required: # LUTs/Cores, DSP blocks and RAM.
- Configure your EFLX array: select the number and type of EFLX cores, the clocks, the I/O configuration connecting the array to the SoC, and the type and amount of Block RAM.
- Input your RTL with your configured array to determine the worst case path and frequency for your target process.
- Generate the bit file (bit stream) that programs the EFLX array in the SoC to execute your RTL.
The EFLX Compiler is now in use at customers for designs and evaluation.
We can demonstrate our tools by Web-ex and run RTL for a customer, if they wish.
Here is a video demonstration of the key steps in compiling an RTL design for EFLX embedded FPGA to determine performance and LUT count. (The timing files vary by process node.) NOTE: this is for the command line compiler; in April an updated demo of the GUI version will be available.
For qualified customers we can provide free evaluation licenses. Contact us at firstname.lastname@example.org.
[Synopsys is a Registered Trademark of Synopsys, Inc.]