You do not need to know about FPGAs to integrate reconfigurable RTL into your SoC: our software maps your RTL into our EFLX array for you.  But if you are curious, read on.

FPGAs are Field Programmable Gate Arrays.  They offer a different kind of programmability from processors.  Processors are sequential while FPGAs enable massive parallelism.  A processor has one adder, one multiplier -- an FPGA can have dozens.  And embedded FPGAs offer huge I/O count.

The logic blocks in FPGAs offer Look Up Tables (LUTs) that can implement any Boolean function of their inputs: typically 4, 5 or 6 inputs with one or two outputs.  LUTs often feed into carry chain/shift circuitry for implementing adders or comparators.  The LUTs also feed into Flip-Flops, which can optionally be bypassed.  The basic concepts haven't changed much in decades.
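To make the LUT idea concrete, here is a minimal Python sketch of a k-input LUT modeled as a table of 2**k stored bits. It is a generic illustration of the concept, not a model of the EFLX circuitry; the class name `LUT` and its methods are hypothetical names chosen for this example.

```python
class LUT:
    """A k-input look-up table: 2**k stored bits, one per input combination."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need exactly 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = [bit & 1 for bit in config_bits]

    @classmethod
    def from_function(cls, num_inputs, func):
        """Fill the table by evaluating func on every input combination."""
        bits = []
        for address in range(2 ** num_inputs):
            inputs = [(address >> i) & 1 for i in range(num_inputs)]
            bits.append(func(*inputs) & 1)
        return cls(num_inputs, bits)

    def evaluate(self, *inputs):
        """The inputs form an address; the output is the bit stored there."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.config_bits[address]


# One bit of a full adder mapped onto two 4-input LUTs (fourth input unused).
sum_lut = LUT.from_function(4, lambda a, b, cin, _unused: a ^ b ^ cin)
carry_lut = LUT.from_function(
    4, lambda a, b, cin, _unused: (a & b) | (cin & (a ^ b))
)
```

Changing the stored bits changes the function without changing the hardware, which is the sense in which a LUT can implement any Boolean function of its inputs.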

The LUTs in EFLX-100/TSMC 40 & EFLX-2.5K/TSMC 28 are dual 4-input LUTs with two bypassable Flip-Flops on the outputs.  Four dual-4-input LUTs are packed in a Reconfigurable Building Block (RBB) along with carry circuitry and 8 Flip-Flops.  

The LUTs in EFLX-100/-2.5K for TSMC 16FF+/16FFC are 6-input LUTs, which can also be configured as dual 5-input LUTs with two bypassable Flip-Flops on the outputs.  

For DSP applications, Multiplier-Accumulators (MACs) are useful for high performance and high density.  In EFLX arrays the MAC has a 22-bit pre-adder, a 22x22 multiplier and a 48-bit post-adder/accumulator.  MACs can be combined or cascaded to form fast DSP functions.  
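The MAC datapath described above (pre-adder, multiplier, accumulator) can be sketched in Python as follows. The bit widths come from the text; everything else is an assumption made for illustration: signed two's-complement operands, wraparound on overflow, and the hypothetical helper names `wrap_signed` and `mac_step`. The actual EFLX MAC may differ in signedness, saturation and rounding behavior.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):      # sign bit set: interpret as negative
        value -= 1 << bits
    return value


def mac_step(acc, a, b, c):
    """One MAC operation: acc + (a + b) * c.

    Assumed widths: 22-bit pre-adder result, 22x22 multiply,
    48-bit accumulator (wraparound on overflow is an assumption).
    """
    pre = wrap_signed(a + b, 22)          # 22-bit pre-adder
    product = pre * wrap_signed(c, 22)    # 22x22 product fits in 44 bits
    return wrap_signed(acc + product, 48)  # 48-bit accumulate


# Usage: a 4-tap symmetric FIR output, y = h0*(x0 + x3) + h1*(x1 + x2).
# The pre-adder folds the two samples that share a coefficient.
x = [10, 20, 30, 40]
h = [2, 3]
acc = 0
acc = mac_step(acc, x[0], x[3], h[0])
acc = mac_step(acc, x[1], x[2], h[1])
```

A pre-adder is useful for symmetric filters like this one because it halves the number of multiplies per output sample.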

The magic in FPGAs is the interconnect network that allows any logic block to connect to any other - this is also controlled by programming bits.  Traditional FPGAs use 2D-mesh architectures that can require 10+ metal layers and take up much more area than the logic blocks themselves. But Flex Logix uses a new, patented architecture (the subject of the Outstanding Paper award at ISSCC 2014) which uses about half the area of the traditional interconnect and uses only 5-6 metal routing layers, but with very high utilization.  The interconnect network has been further improved in our "Gen 2" Architecture, first implemented in TSMC 16nm.  Here is the comparison for 28nm:

FPGA chips today typically offer a lot of high-performance I/O using SERDES.  This is to give bandwidth sufficient to utilize the FPGA's high-performance capability.  But the FPGA chip I/O can take up 25%+ of the chip area, uses a lot of power and has high latency.  Embedded FPGA uses on-chip signaling, which is very fast and very small, resulting in much more I/O and bandwidth.  The I/O in EFLX surrounds the array with separate inputs and outputs; each has an optional Flip-Flop.  The I/O is built from standard CMOS cells, so it can run very fast.

The programmable logic blocks above are combined into a single EFLX array: LUTs/RBBs (and optionally some MACs) form the center of the array, in an enveloping mesh of programmable interconnect, surrounded by a thin ring of I/O (hundreds to thousands).  

All of it is programmable: the programming is done by Configuration Bits which set the values of the LUTs, the MACs and the interconnect so that the FPGA implements the exact RTL function the customer wishes.  The Configuration Bits typically are stored in the same Flash Memory as the code bits for the on-chip processor.

Software is critical for an FPGA.  The embedded FPGA is programmed using RTL or a netlist in Verilog or VHDL.  This is mapped into the FPGA architecture by an industry-standard synthesis tool, followed by the EFLX Compiler, which packs, places and routes the design, generates timing, and generates the Configuration Bit Stream to be loaded into the EFLX array to implement the RTL function.  [Synopsys is a Registered Trademark of Synopsys, Inc.]

Here is a brief demo of our software for mapping your RTL to EFLX to determine LUT count and performance (timing files vary by process node):

Customers typically want IP proven in silicon AND every customer wants a different array size.  This cannot be economically achieved by designing custom embedded FPGA sizes.

Flex Logix uses a building block approach.  Each EFLX embedded FPGA IP core is a stand-alone FPGA, but incorporates additional top-level interconnect that connects automatically to adjacent IP cores, turning them into larger EFLX arrays.  This strategy allows us to provide ~75 different array sizes, from 100 LUTs to 100K LUTs, in ~6 months from when we receive the PDK and standard cell library and have a committed customer.  That customer works with us to ensure we optimize the circuit design for the right power/performance tradeoff for that market (the digital architecture remains the same).

Flex Logix implements two IP core sizes: the EFLX-100 with about 100 LUTs (and a DSP version where some LUTs are replaced with 2 MACs); and the EFLX-2.5K with about 2,500 LUTs (and a DSP version where some LUTs are replaced with 40 MACs).  The EFLX-100 has ~200 inputs and ~200 outputs (depending on the process) and the EFLX-2.5K has ~600 inputs and ~600 outputs.

When we port the EFLX IP cores to a new process, we implement a validation chip with at least 2x2 arrays of the core types to validate the inter-core interconnects; we use on-chip PLL and RAM to test the blocks at full performance so we are not limited by GPIO; we use PVT monitors so we know that we are validating at precisely the worst case temperature and voltage specs.

The EFLX-100 IP core can be tiled/arrayed from 1x1 to 5x5.  The EFLX-100 IP core actually has 120 LUTs (TSMC40ULP/LP) so an EFLX array can be from 120 to 3,000 LUTs using the EFLX-100 and there are about 25 different array sizes.

The EFLX-2.5K IP core can be tiled/arrayed from 1x1 to 7x7.  The EFLX-2.5K IP core has ~2500 LUTs, so an EFLX array using EFLX-2.5K can be from 2.5K to 122.5K LUTs, and there are about 50 different array sizes.
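The array-size arithmetic for both cores can be checked with a short Python sketch. It assumes that any rows-by-columns shape up to the stated maximum is allowed and counts each shape separately; the function name `array_sizes` is hypothetical, and 2,500 LUTs per EFLX-2.5K core is the approximate figure quoted above.

```python
def array_sizes(luts_per_core, max_dim):
    """Map each (rows, cols) tiling up to max_dim x max_dim to its LUT count."""
    sizes = {}
    for rows in range(1, max_dim + 1):
        for cols in range(1, max_dim + 1):
            sizes[(rows, cols)] = rows * cols * luts_per_core
    return sizes


eflx_100 = array_sizes(luts_per_core=120, max_dim=5)     # TSMC40ULP/LP figure
eflx_2p5k = array_sizes(luts_per_core=2500, max_dim=7)   # approximate figure

for name, sizes in (("EFLX-100", eflx_100), ("EFLX-2.5K", eflx_2p5k)):
    print(name, len(sizes), "shapes,",
          min(sizes.values()), "to", max(sizes.values()), "LUTs")
```

Under that assumption there are 25 shapes spanning 120 to 3,000 LUTs for the EFLX-100 and 49 shapes spanning 2,500 to 122,500 LUTs for the EFLX-2.5K, in line with the "about 25" and "about 50" figures quoted here and the ~75 total mentioned earlier.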

For any given array size, the application may require no DSP acceleration, a lot of DSP acceleration, or some DSP acceleration.  In any EFLX NxN array, the EFLX Logic and the EFLX DSP IP cores are interchangeable so you can get exactly the amount of DSP acceleration you need.

FPGAs are flexible but less area-efficient than inflexible, hard-wired logic.  The same is true of embedded FPGA.  There is no fixed ratio of FPGA size to hard-wired size: it depends on the function being implemented and on how well the RTL is optimized for an FPGA architecture.  In any case, the comparison may not be the right one: if you need flexibility, hard-wired logic won't provide it.

Embedded FPGAs can enable architectures not possible with FPGA chips.  Look at some examples:

1. Software reconfigurable I/O pin multiplexing,
2. Flexible I/O for MCU and IoT and even SoCs,
3. Extending battery life for MCU and IoT: EFLX can do DSP at lower energy than ARM,
4. Fast control logic for Reconfigurable Cloud Data Centers,
5. DSP Acceleration.

Here is a presentation, "Embedded FPGA for Architects and Physical Designers."

Here is a demonstration of the validation chip for the EFLX-100 TSMC40ULP IP core in multiple array sizes and VT combinations, running at nominal voltage and 50MHz (actual performance is typically much higher):