Field Programmable Gate Arrays(FPGA)
Field Programmable Gate Arrays are two dimensional arrays of logic blocks and flip-flops with a electrically programmable interconnections between logic blocks.
The interconnections consist of electrically programmable switches which is why FPGA differs from Custom ICs, as Custom IC is programmed using integrated circuit fabrication technology to form metal interconnections between logic blocks.
In an FPGA logic blocks are implemented using mutliple level low fanin gates, which gives it a more compact design compared to an implementation with two-level AND-OR logic. FPGA provides its user a way to configure:
1. The intersection between the logic blocks and
2. The function of each logic block.
Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:
1. Transistor pairs
2. combinational gates like basic NAND gates or XOR gates
3. n-input Lookup tables
5. Wide fan in And-OR structure.
Figure 1: Simplefied version of FPGA internal architecture.
Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. Density of logic block used in an FPGA depends on length and number of wire segments used for routing. Number of segments used for interconnection typically is a tradeoff between density of logic blocks used and amount of area used up for routing.
The ability to reconfigure functionality to be implemented on a chip gives a unique advantage to designer who designs his system on an FPGA It reduces the time to market and significantly reduces the cost of production.
Why do we need FPGAs ?
By the early 1980’s Large scale integrated circuits (LSI) formed the back bone of most of the logic circuits in major systems. Microprocessors, bus/IO controllers, system timers etc were implemented using integrated circuit fabrication technology. Random “glue logic” or interconnects were still required to help connect the large integrated circuits in order to :
1. generate global control signals (for resets etc.)
2. data signals from one subsystem to another sub system.
Systems typically consisted of few large scale integrated components and large number of SSI (small scale integrated circuit) and MSI (medium scale integrated circuit) components.
Intial attempt to solve this problem led to development of Custom ICs which were to replace the large amount of interconnect. This reduced system complexity and manufacturing cost, and improved performance.However, custom ICs have their own disadvantages. They are relatively very expensive to develop, and delay introduced for product to market (time to market) because of increased design time. There are two kinds of costs involved in development of Custom ICs:
1. cost of development and design
2. cost of manufacture
( A tradeoff usually exists between the two costs)
Therefore the custom IC approach was only viable for products with very high volume, and which were not time to market sensitive.
FPGAs were introduced as an alternative to custom ICs for implementing entire system on one chip and to provide flexibility of reporogramability to the user. Introduction of FPGAs resulted in improvement of density relative to discrete SSI/MSI components (within around 10x of custom ICs). Another advantage of FPGAs over CustomICs is that with the help of computer aided design (CAD) tools circuits could be implemented in a short amount of time (no physical layout process, no mask making, no IC manufacturing)
Figure 2: FPGA comparative analysis.
| || |
Logic block in an FPGA can be implemented in ways that differ in number of inputs and outputs, amount of area consumed, complexity of logic functions that it can implement, total number of transistors that it consumes. This section will describe some important implementations of logic blocks.
Crosspoint FPGA: consist of two types of logic blocks. One is transistor pair tiles in which transistor pairs run in parallel lines as shown in figure below:
second type of logic blocks are RAM logic which can be used to implement random access memory.
Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.
Both Crosspoint and Plessey are fine grain logic blocks. Fine grain logic blocks have an advantage in high percentage usage of logic blocks but they require large number of wire segments and programmable switches which occupy lot of area.
Actel Logic Block: If inputs of a multiplexer are connected to a constant or to a signal, it can be used to implement different logic functions. for example a 2-input multiplexer with inputs a and b, select , will implement function ac + bc´. If b=0 then it will implement ac, and if a=0 it will implement bc´.
Typically an Actel logic block consists of multiple number of multiplexers and logic gates.
Xilinx Logic block:
In Xilinx logic block Look up table is used to implement any number of different functionality. The input lines go into the input and enable of lookup table. The output of the lookup table gives the result of the logic function that it implements. Lookup table is implemented using SRAM. A k-input logic function is implemented using 2^k * 1 size SRAM. Number of different possible functions for k input LUT is 2^2^k. Advantage of such an architecture is that it supports implementation of so many logic functions, however the disadvantage is unusually large number of memory cells required to implement such a logic block in case number of inputs is large. Figure below shows 5-input LUT based implementation of logic block.Xilinx - LUT based
LUT based design provides for better logic block utilization. A k-input LUT based logic block can be implemented in number of different ways with tradeoff between performance and logic density.
An n-lut can be shown as a direct implementation of a function truth-table. Each of the latch holds the value of the function corresponding to one input combination. For Example: 2-lut shown in figure below implements 2 input AND and OR functions.
Altera Logic Block
Altera's logic block has evolved from earlier PLDs. It consists of wide fan in (up to 100 input) AND gates feeding into an OR gate with 3-8 inputs. If floating gate transistor based programmable switch is provide any vertical wire passing near AND gate can be used as input to the AND gate. IF each input signal is present both original and complemented form functional capability of block increases significantly. The advantage of large fan in AND gate based implementation is that few logic blocks can implement the entire functionality thereby reducing the amount of area required by interconnects. On the other hand disadvantage is the low density usage of logic blocks in a design that requires fewer input logic.
Another disadvantage is the use of pull up devices (AND gates) that consume static power. To improve power manufacturers provide low power consuming logic blocks at the expense of delay. Such logic blocks have gates with high Threshold as a result they consume less power. Such logic blocks can be used in non-critical paths.
Altera, Xilinx are coarse grain architecture.
Tradeoff - Size of Logic block Vs Performance
Size of logic block plays an important role in deciding density of logic blocks and area utilization in an FPGA. It also effects the performance of the FPGA.
- A large size logic block implements more logic and hence requires less number of logic blocks to implement a functionality on the FPGA. On the other hand a large logic block will consume more space on the FPGA. So optimal size of logic block is one that optimally uses lesser number of logic blocks for functionality implementation while consuming as little space as possible.
- Active logic area is generally less than total logic area due to presence of programmable connections. Total logic area is sum of active logic area and area consumed by programmable connections.
- Routing area in an FPGA is typically more than the active area. It is 70 to 90 percent of total area in an FPGA.
- In case of Lookup table based FPGA, a 4-input lookup table gives best results in terms of logic synthesised and area consumed.
- Granularity of logic block has influence on performance of an FPGA. Typically higher granularity level results in lesser delay between input and output. As the granularity of logic block increases, number of levels of logic in critical path decreases, and hence delay in critical path decreases. On the flip side with increase in granularity level average fan out increases and number of switches also increases as each block has more pins. Also the length of wires increases with increase in size of logic block.
FPGA Routing Techniques
| || |
Routing architecture comprises of programmable switches and wires. Routing provides connection between I/O blocks and logic blocks, and between one logic block and another logic block.
The type of routing architecture decides area consumed by routing and density of logic blocks.
Routing technique used in an FPGA largely decides the amount of area used by wire segments and programmable switches as compared to area consumed by logic blocks.
A wire segment can be described as two end points of an interconnect with no programmable switch between them. A sequence of one or more wire segments in an FPGA can be termed as a track.
Typically an FPGA has logic blocks, interconnects and Input/Output blocks. Input Output blocks lie in the periphery of logic blocks and interconnect. Wire segments connect I/O blocks to wire segments through connection blocks. Connection blocks are connected to logic blocks, depending on the design requirement one logic block is connected to another and so on.
Xilinx Routing architecture
In Xilinx routing, connections are made from logic block into the channel through a connection block. As SRAM technology is used to implement Lookup Tables, connection sites are large. A logic block is surrounded by connection blocks on all four sides. They connect logic block pins to wire segments. Pass transistors are used to implement connection for output pins, while use of multiplexers for input pins saves the number of SRAM cells required per pin. The logic block pins connecting to connection blocks can then be connected to any number of wire segments through switching blocks.
there are four types of wire segments available :
Actel routing methodology
Actel's design has more wire segments in horizontal direction than in vertical direction. The input pins connect to all tracks of the channel that is on the same side as the pin. The output pins extend across two channels above the logic block and two channels below it. Output pin can be connected to all 4 channels that it crosses. The switch blocks are distributed throughout the horizontal channels. All vertical tracks can make a connection with every incidental horizontal track. This allows for the flexibility that a horizontal track can switch into a vertical track, thus allowing for horizontal and vertical routing of same wire. The drawback is more switches are required which add up to more capacitive load.
Altera routing methodology
Altera routing architecture has two level hierarchy. At the first level of the hierarchy, 16 or 32 of the logic
blocks are grouped into a Logic Array Block, structure of the LAB is very similar to a traditional PLD. the connection is formed using EPROM- like floating-gate transistors. The channel here is set of wires that run vertically along the length of the FPGA. Tracks are used for four types of connections :
- connections from output of all logic blocks in LAB.
- connection from logic expanders.
- connections from output of logic blocks in other LABs
- connections to and from Input output pads
all four types of tracks connect to every logic block in the array block. The connection block makes sure that every such track can connect to every logic block pin. Any track can connect to into any input which makes this routing simple. The intra-LAB routing consists of segmented channel, where segments are as long as possible. Global interconnect structure called programmable interconnect array(PIA) is used to make connections among LABs. Its internal structure is similar to internal routing of a LAB. Advantage of this scheme is that regularity of physical design of silicon allows it to be packed tightly and efficiently. The disadvantage is the large number of switches required, which adds to capacitive load.