Tiling is Critical for eFPGA Users: ArrayLinx™ Delivers
FPGA chips come in multiple sizes: modular blocks of programmable logic, DSP MACs and RAM are intermixed in different sizes and ratios, then stitched together with top-level interconnect, clocking, etc., and surrounded by a ring of I/Os such as GPIO, SerDes and USB. Each distinct FPGA array and chip requires extensive engineering and top-level physical design.
eFPGA is different: customers want to use eFPGA arrays that have been silicon proven, where the GDS has not been touched. When we started Flex Logix, that seemed like an insurmountable problem: we'd have to do validation chips for multiple different array sizes and mixes of logic/DSP at extensive cost and time, and even then we couldn't cover all the sizes and variations possible.
Then inspiration struck: Cheng came up with a building block that could be assembled into larger arrays without any GDS changes. We could build a 2x2 array to prove out all the connections, then deliver arrays of various sizes and mixes using silicon-proven blocks without GDS change. US Patent 9,906,225, which issued recently, covers much of this innovation: "tiling," or what we now call ArrayLinx™.
Step 1 - a building block called EFLX4K Logic with 4K LUT4 equivalents (2,520 LUT6) and >1,000 I/O pins (620 inputs and 620 outputs). A key point is that in an eFPGA the interface or I/O is just CMOS pins with optional flip-flops on the input and output: so even though there are >1,000 interface pins, they take very little area, a few % of the core. This is a complete eFPGA.
Step 2 - build an almost identical second block called EFLX4K DSP with the same size, footprint and external interfaces, but with about 1/4 of the programmable logic replaced by 40 MACs (22x22 multiplier, 48-bit pre-adder and 48-bit accumulator, arranged in strips of 10 with pipelining between the MACs). This is also a complete eFPGA.
You can see the actual implementation of the EFLX4K DSP in TSMC28HPM/HPC (proven in silicon) here. The I/O pins are in the very small rectangles along the top and bottom edges, taking a few % of the total area. 3/4 of the eFPGA is logic and 1/4 is four strips of 10 MACs each. In the EFLX4K Logic core, the DSP strips are instead populated with programmable logic.
The reason for two blocks is that some customers (e.g. networking) need only programmable logic. Others (e.g. aerospace, base stations) want intense DSP such as FFTs and FIR filters. And some want a moderate amount of DSP. So customers would like a "slider bar" to get the exact ratio of logic to MACs that is best for their application.
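A minimal sketch of how that "slider bar" works out, in Python. The per-core figures come from the descriptions above; the DSP core's LUT count (~3/4 of the Logic core's 4K, since about 1/4 of the logic is replaced by MACs) is an approximation for illustration, not an official datasheet number.

```python
# Per-core resource figures based on the text; DSP-core LUT count is an
# approximation (~3/4 of the Logic core's 4K LUT4s), not an official figure.
LOGIC_CORE = {"lut4": 4000, "macs": 0}
DSP_CORE   = {"lut4": 3000, "macs": 40}   # ~1/4 of logic replaced by 40 MACs

def array_resources(n_logic, n_dsp):
    """Total resources for an array mixing Logic and DSP cores."""
    return {
        "cores": n_logic + n_dsp,
        "lut4": n_logic * LOGIC_CORE["lut4"] + n_dsp * DSP_CORE["lut4"],
        "macs": n_dsp * DSP_CORE["macs"],
    }

# "Slider bar": every mix of a 4-core (2x2) array, from 100% Logic to 100% DSP.
for n_dsp in range(5):
    print(array_resources(4 - n_dsp, n_dsp))
```

Sliding from all-Logic to all-DSP trades roughly 1K LUT4s for 40 MACs per core swapped, which is the knob customers turn to match their application.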
Step 3 - put a top-layer interconnect in the EFLX4K IP cores. It is not needed for standalone operation of a single EFLX4K, but it allows two or more EFLX4K IP cores to be abutted to create larger arrays with full connectivity of the top-layer interconnect, clocking, configuration and other controls, without any GDS change to the EFLX4K cores. The interconnect is a mesh with thousands of wires on each of the four edges: North, South, East and West. The ArrayLinx wires are used only for extending the interconnect across the EFLX eFPGA array, not for user connections to external blocks. The ArrayLinx interconnect is, of course, identical in the EFLX4K Logic and DSP cores.
Step 4 - abut cores to form larger arrays. When two cores are physically abutted, the ArrayLinx mesh connections are automatically made at the edges of the two cores, forming a single logical eFPGA array. The user interface pins between the two cores are unused. Note that mixing the two types of cores is fine.
There are also clock, configuration, reset, and DFT connections that are formed automatically across the array at the same time. The details are available under NDA.
We can also abut cores North/South as well as East/West. Again, the user interface pins between abutted cores are unused, but they represent only a few % of the total area - a small sacrifice in return for using untouched, silicon-proven building blocks to construct larger arrays. Note that this 2x2 configuration is the minimum array size in our validation chips because it checks out both flavors of core in all four edge directions.
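Why 2x2 is the minimum validation array can be sketched with a small Python model. The diagonal Logic/DSP placement here is illustrative (the text does not specify the layout used on the validation chips); enumerating the interior abutted edges shows that both core types get exercised in both the East/West and North/South directions:

```python
from itertools import product

# Illustrative 2x2 placement: Logic and DSP cores on alternating diagonals.
grid = {(0, 0): "Logic", (0, 1): "DSP",
        (1, 0): "DSP",   (1, 1): "Logic"}

def interior_edges(grid):
    """Yield (core_type_a, core_type_b, direction) for each abutted pair."""
    for (r, c), kind in grid.items():
        if (r, c + 1) in grid:                 # East/West abutment
            yield kind, grid[(r, c + 1)], "E/W"
        if (r + 1, c) in grid:                 # North/South abutment
            yield kind, grid[(r + 1, c)], "N/S"

edges = list(interior_edges(grid))
# Each core type appears on both an E/W and an N/S abutted edge.
for kind, direction in product(("Logic", "DSP"), ("E/W", "N/S")):
    assert any(kind in (a, b) and d == direction for a, b, d in edges)
print(edges)
```

A 1x2 or 2x1 array would exercise only one direction, so 2x2 is the smallest array that validates all four ArrayLinx edges for both core flavors.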
Here is another representation: at the inner edges of the array, the ArrayLinx mesh connections are formed and the user interface pins are disabled; at the outer edges of the array, the ArrayLinx mesh connections are unused and the user interface pins are enabled for connection to the rest of the SoC. Note that the ArrayLinx mesh connections are at the top metal level of the EFLX eFPGA (metal 7 for 16nm EFLX4K, 6 for 28nm, 5 for 40nm), so when RAMs are integrated between arrays using RAMLinx™, the ArrayLinx mesh connections run over the RAMs.
The exact number of ArrayLinx wires implemented is a function of the desired maximum array size achievable with good utilization. EFLX4K arrays scale to at least 7x7, or ~200K LUT4 capacity (already fabricated and proven in silicon in TSMC16FFC). Larger arrays are physically possible, but we have not yet studied the "roll-off" point where congestion climbs and utilization drops; it may be 8x8, 9x9 or 10x10, TBD. At 6x6 one customer achieved 97.7% utilization, but our best estimate is that utilization will typically be about 90%. Check out our EFLX Compiler page to see the Floor Planner that allows customers to implement the exact array size and combination of Logic/DSP they want. The EFLX Compiler directly generates the GDS for the desired array; our engineers then generate the interface timing files, especially for the clock network and the connection to the SoC, for the given array size/combination. This takes a few days across all of the process, voltage and temperature corners.
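The capacity arithmetic behind those array sizes can be sketched as follows; the ~90% utilization figure is the typical estimate quoted above, not a guarantee:

```python
LUT4_PER_CORE = 4000          # each EFLX4K core holds 4K LUT4 equivalents
TYPICAL_UTILIZATION = 0.90    # typical estimate; one customer hit 97.7% at 6x6

def array_capacity(rows, cols, utilization=TYPICAL_UTILIZATION):
    """Raw and typically-usable LUT4 capacity of a rows x cols EFLX4K array."""
    raw = rows * cols * LUT4_PER_CORE
    return raw, int(raw * utilization)

raw, usable = array_capacity(7, 7)
print(f"7x7 array: {raw} LUT4s raw (~200K), ~{usable} typically usable")
```

This is how 7x7 works out to roughly 200K LUT4s (49 cores x 4K = 196K raw).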
With ArrayLinx we can mix the EFLX4K Logic and DSP cores interchangeably, any way the customer wants: 100% Logic, 100% DSP, or any mix in between, so the customer gets the exact ratio of Logic to DSP that is optimal for their design, with no GDS risk.
We also have a smaller EFLX150 core to implement smaller arrays from 150 to ~4K LUT4s.
In the future, we could do larger EFLX cores for eFPGA arrays to 1M+ LUT4s.
Our validation chips always include at least a 2x2 array with a mix of Logic and DSP so that we can silicon-prove both cores AND the North/South/East/West ArrayLinx interfaces.
EFLX eFPGA is Compatible with Most Metal Stacks
FPGA chips typically use the maximum number of metal layers because of the traditional mesh interconnect.
Flex Logix uses a new, patented interconnect which is almost twice as dense as the traditional mesh AND uses many fewer metal layers.
EFLX eFPGA uses 6-7 metal layers for our 16nm IP, 6 metal layers for our 28nm IP and 5 metal layers for our 40nm IP. This means we are compatible with almost all metal stacks, whereas IP from our competitors uses many more metal layers, making it compatible with few metal stacks.
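A toy compatibility check makes the point concrete. The per-node layer counts are taken from the paragraph above (using 7 as the conservative 16nm figure); the function and its name are illustrative, not part of any Flex Logix tool:

```python
# Metal layers EFLX eFPGA needs per node, from the figures above.
# 16nm uses 6-7 layers; 7 is the conservative assumption here.
EFLX_METAL_LAYERS = {"16nm": 7, "28nm": 6, "40nm": 5}

def stack_compatible(node, customer_metal_layers):
    """True if the customer's metal stack has enough layers for EFLX at this node."""
    return customer_metal_layers >= EFLX_METAL_LAYERS[node]

print(stack_compatible("28nm", 8))   # typical 8-layer stack: compatible
print(stack_compatible("40nm", 4))   # 4-layer stack: too few layers
```

The fewer layers the eFPGA needs, the more customer metal stacks pass this check, which is the compatibility claim in a nutshell.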
Dense, Portable, Scalable Silicon-Proven eFPGA GDS
Get the details here: White Paper on eFPGA IP Density, Portability and Scalability.
Customers want eFPGA IP GDS that is
- dense (more logic in less area than other eFPGA)
- works on the specific foundry/process node/variation that they have chosen
- works with the metal stack they have chosen
- the size they want (and with the options they want)
- silicon proven so they know it will work in their chip
Only Flex Logix can do this, currently across 7 process node/variation combinations and dozens of metal stacks for each, in sizes from 100 LUTs to 200K LUTs with options for DSP and RAM, all using a Silicon Proven EFLX IP core GDS which is UNCHANGED from the one proven on our validation chip.
Other eFPGA vendors cannot.
Vendors who generate eFPGA derived from their FPGA chips have to do full-custom design changes to move from one process variant to another (e.g. 16X to 16X+) and even to support most metal stacks. These changes require multiple months and mean the GDS they deliver is different from what is in their FPGA chip. Same for different array sizes.
Vendors who generate eFPGA from soft IP have multiple array sizes: validating one doesn't prove the others. And they use standard cells with traditional FPGA interconnect, so their density is 1/2 to 1/3 of ours.
The reasons Flex Logix can deliver silicon proven, high density, scalable arrays over incremental process variations? Multiple inventions and innovations:
- a patented interconnect, XFLX™, which is twice as dense as traditional FPGA mesh interconnect AND which requires many fewer metal layers, enabling compatibility with most metal stacks
- a tiling approach where a single EFLX core is a complete eFPGA, but when cores are abutted, a top layer of interconnect, ArrayLinx™, is formed without touching the GDS, extending the eFPGA interconnect across arrays of sizes up to 7x7
- 6-input LUTs, which deliver higher logic density and higher performance
- standard cells which enable rapid implementation and are GDS-compatible across incremental process variations (e.g. X/X+/X++): our patented interconnect makes up for the density that standard cells lose, so we still match the density of eFPGA from FPGA chips
- Validation chips in every process node of at least a 2x2 array proving out the top-level interconnects that enable arrays up to 7x7. Validation chips are designed with on-chip high speed RAM, PLL and PVT monitors so performance can be verified over -40C to +125C and over the full voltage range.