A Changing Technology Landscape

![Timeline: 1835, 1946, 1971, 2012](image)

You will soon be designing with “post-CMOS” devices
Signal processing content expanding

Specialized hardware for energy efficiency

Keeping up with Standards
New standard = New chip?

Media

Radio
Today: CPUs + Accelerators

**DARK** silicon: area inefficient

Accelerators for fixed standards

NVIDIA Tegra 2

---

Ways to Achieve **FLEXIBILITY**

<table>
<thead>
<tr>
<th>Feature</th>
<th>Programmable DSP</th>
<th>FPGA (Flexible DSP)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Architecture</td>
<td>Fixed</td>
<td>Reconfigurable</td>
</tr>
<tr>
<td>Operations</td>
<td>Conditional</td>
<td>Repetitive</td>
</tr>
<tr>
<td>Multi-core</td>
<td>Hard</td>
<td>Easy</td>
</tr>
<tr>
<td>Throughput</td>
<td>Low/mid</td>
<td>High</td>
</tr>
</tbody>
</table>
Efficient & Flexible Hardware?

![Average area efficiency (GOPS/mm²) vs. average energy efficiency (GOPS/mW) for Dedicated, Prog. DSP, FPGA, and μProc designs, ISSCC & VLSI 1999-2011](image)

Narrow This Gap

Efficient & Flexible Hardware?

rising COST of Dedicated Chips

<table>
<thead>
<tr>
<th>Node (nm)</th>
<th>Cost ($M)</th>
</tr>
</thead>
<tbody>
<tr>
<td>28</td>
<td>110</td>
</tr>
<tr>
<td>32</td>
<td>80</td>
</tr>
<tr>
<td>40/45</td>
<td>60</td>
</tr>
<tr>
<td>65</td>
<td>55</td>
</tr>
<tr>
<td>90</td>
<td>30</td>
</tr>
<tr>
<td>130</td>
<td>20</td>
</tr>
<tr>
<td>180</td>
<td>13</td>
</tr>
<tr>
<td>250</td>
<td>5</td>
</tr>
</tbody>
</table>

Source: Altera & Gartner (2009)
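To make the cost argument concrete, here is a rough break-even sketch. It treats the table's cost figure as a one-time design cost (NRE) for a dedicated chip, which is an assumption, and uses hypothetical per-unit prices of $50 for an FPGA and $5 for the dedicated chip. The function name and all prices are illustrative, not from the source.

```python
def break_even_units(nre_dollars: int, fpga_unit: int, asic_unit: int) -> int:
    """Smallest volume at which a dedicated chip costs no more than an FPGA.

    Hypothetical model: the dedicated chip costs nre_dollars up front plus
    asic_unit per part; the FPGA has no NRE and costs fpga_unit per part.
    """
    if fpga_unit <= asic_unit:
        raise ValueError("FPGA unit price must exceed the dedicated-chip unit price")
    margin = fpga_unit - asic_unit
    # Dedicated chip wins once n * margin >= nre: integer ceiling division.
    return -(-nre_dollars // margin)

# Cost column from the table above (in $M); the unit prices are hypothetical.
design_cost_musd = {"65 nm": 55, "28 nm": 110}
for node, cost in design_cost_musd.items():
    units = break_even_units(cost * 1_000_000, fpga_unit=50, asic_unit=5)
    print(f"{node}: dedicated chip pays off from about {units:,} units")
```

Doubling the design cost doubles the break-even volume, so every low- or mid-volume product that falls below that line is a candidate for flexible hardware instead.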
rising COST of Dedicated Chips

![Design cost by node, annotated with regions labeled Dedicated, FPGA, Flexible, and Efficient](image)

Source: Altera & Gartner (2009)

WHY are FPGAs Inefficient?
Interconnect


2D-Mesh Interconnects
2D-Mesh is NOT Scalable

- From $O(N^2)$ complexity
- Full connectivity impractical
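Counting wires makes the scaling concrete. Full point-to-point connectivity among N blocks needs N(N-1)/2 links, which is why it is impractical; a nearest-neighbor mesh needs only O(N) links, but the worst-case path between two blocks grows with the side of the array. The sketch counts links and hops only (no channel widths, switch counts, or Rent's-rule effects), so read it as an illustration, not an area or delay model. The function names are hypothetical.

```python
def full_links(n: int) -> int:
    """Point-to-point links needed to connect every pair of n blocks: O(N^2)."""
    return n * (n - 1) // 2

def mesh_links(rows: int, cols: int) -> int:
    """Links in a rows x cols nearest-neighbor 2D mesh: O(N)."""
    return rows * (cols - 1) + cols * (rows - 1)

def mesh_diameter(rows: int, cols: int) -> int:
    """Worst-case hop count between two blocks of the mesh (corner to corner)."""
    return (rows - 1) + (cols - 1)

for side in (4, 16, 64):
    n = side * side
    print(f"N={n:5d}  full={full_links(n):>10,}  "
          f"mesh={mesh_links(side, side):>6,}  worst-case hops={mesh_diameter(side, side)}")
```

Hierarchical networks such as the tree of meshes and the butterfly fat tree sit between these extremes, trading some connectivity and extra delay for fewer long hops.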

Hierarchical Networks

Tree of Meshes

Butterfly Fat Tree

Limited connectivity

Excess delay

FPGA Architecture

1. Logic-Block Architecture
2. Routing Architecture
3. Interconnect Switches

Fine-Grain Logic-Block Architecture

Few, simple logic elements in a block

+ High utilization of logic block

− Lots of interconnects & programmable switches
  → Larger chip area
  → Lower performance
Coarse-Grain Logic-Block Architecture

A few complex logic elements that can perform a variety of functions

- Most FPGAs
- Example: Actel ACT1
  - 8 inputs to logic block
  - All 2-input functions, most 3-input functions, some 4-input functions
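The coverage claim can be checked by brute force. The sketch assumes the commonly published ACT1 structure, F = (S0+S1)'·MUX(A0, A1; SA) + (S0+S1)·MUX(B0, B1; SB): two 2:1 muxes feeding a third mux selected by S0 OR S1. It ties each of the 8 inputs to 0, 1, or a circuit variable and counts the distinct Boolean functions that result. Input inversion and other tricks are left out of this model by assumption, and the helper names are hypothetical.

```python
from itertools import product

def mux(sel, when0, when1, mask):
    """2:1 multiplexer applied bitwise to truth tables stored as integers."""
    return ((~sel & when0) | (sel & when1)) & mask

def act1(a0, a1, sa, b0, b1, sb, s0, s1, mask):
    """Assumed ACT1 module: output mux selected by S0 OR S1."""
    f1 = mux(sa, a0, a1, mask)
    f2 = mux(sb, b0, b1, mask)
    return mux(s0 | s1, f1, f2, mask)

def variable_masks(n):
    """Truth-table bitmask for each of the n input variables."""
    rows = 1 << n
    return [sum(1 << r for r in range(rows) if (r >> i) & 1) for i in range(n)]

def realizable(n):
    """Distinct n-input functions from tying each of the 8 pins to 0, 1, or a variable."""
    mask = (1 << (1 << n)) - 1            # one bit per truth-table row
    signals = [0, mask] + variable_masks(n)
    return {act1(*pins, mask=mask) for pins in product(signals, repeat=8)}

for n in (2, 3):
    print(f"{n}-input functions: {len(realizable(n))} of {2 ** (2 ** n)}")
```

All 16 two-input functions appear because any f(a, b) can be Shannon-expanded on a: the output mux selects between f(0, b) and f(1, b), each of which is 0, 1, b, or b' and fits in one 2:1 mux.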

Routing Architecture Dominates Chip Area

>80% Interconnect
**Type 1: Row-Based Layout**

- Cells adjacent to routing channel
- Horizontal routing channel
- Estimating optimum # tracks and segments difficult
- Main tradeoff: performance vs. routability

**Fully-Segmented Channel**

- Switches needed between every cross-point
- Many switches
Non-Segmented Channel

- One track per connection
- Few switches

1-Segment Routing

- Divide segments into various lengths on tracks
- Few switches
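As an illustration of 1-segment routing (each connection must fit entirely inside one pre-cut segment), here is a hypothetical greedy router. It places the longest connections first and picks the segment that wastes the least wire; it is a heuristic sketch, not an optimal or published algorithm, and the function name and data layout are assumptions.

```python
def route_one_segment(tracks, connections):
    """Greedy 1-segment channel routing (illustrative heuristic, not optimal).

    tracks: list of tracks; each track is a list of (left, right) segments,
            given as inclusive column spans.
    connections: list of (left, right) column spans that must be wired.
    Returns {connection index: (track, segment)}, or None if unroutable.
    """
    used = set()
    result = {}
    # Longest connections first: they have the fewest candidate segments.
    order = sorted(range(len(connections)),
                   key=lambda i: connections[i][1] - connections[i][0],
                   reverse=True)
    for i in order:
        lo, hi = connections[i]
        best = None
        for t, segments in enumerate(tracks):
            for s, (left, right) in enumerate(segments):
                if (t, s) in used or not (left <= lo and hi <= right):
                    continue
                waste = (right - left) - (hi - lo)      # unused wire length
                if best is None or waste < best[0]:
                    best = (waste, t, s)
        if best is None:
            result[i] = None
        else:
            used.add(best[1:])
            result[i] = best[1:]
    return result

tracks = [[(0, 3), (4, 9)], [(0, 5), (6, 9)], [(0, 9)]]
print(route_one_segment(tracks, [(1, 2), (5, 8), (0, 7)]))
```

A single long connection can consume a whole track here, which is the routability cost that 2-segment routing relaxes by allowing two segments to be joined.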
### 2-Segment Routing

- **Programmable segments** – more flexible
- **Fewer tracks**

![Diagram of 2-Segment Routing](image)

### Type 2: Matrix-Based Layout

- **Horizontal & vertical routing channels**
- **Long interconnect lines**

![Diagram of Type 2: Matrix-Based Layout](image)
Routing Techniques (Matrix-based)

- **Connection Blocks (C-Block)**
  - Connect I/Os of logic blocks to routing channel

- **Switch Blocks (S-Block)**
  - Connect segments at intersection of routing channels
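To make the S-Block idea concrete, the sketch below lists the switches of a *disjoint* switch block, one common topology in which track i on each side can connect only to track i on the other three sides. The slides do not specify a switch-block pattern, so this topology and the function names are assumptions for illustration.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")

def disjoint_switch_block(width: int):
    """Programmable switches of a disjoint switch block (assumed topology).

    Each switch joins (side_a, track) to (side_b, track): a wire can turn or
    go straight, but it always stays on the same track index.
    """
    return [((a, t), (b, t))
            for t in range(width)
            for a, b in combinations(SIDES, 2)]

def reachable(switches, side: str, track: int):
    """Wire ends that the given wire end can reach through one switch."""
    here = (side, track)
    return {b if a == here else a for a, b in switches if here in (a, b)}

sb = disjoint_switch_block(4)
print(len(sb))                      # 4 tracks x 6 side pairs = 24 switches
print(sorted(reachable(sb, "W", 2)))
```

The switch count grows as 6W for W tracks per side; the price of that small count is that a signal can never change track index inside the block.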

Routing Techniques (Matrix-based)

- **PIP (Programmable Interconnect Point)**
  - Fewer in number: higher speed but lower routability
  - Buffering b/w switches reduces loading and thus delay
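A first-order Elmore-delay estimate shows why buffering helps. Modeling each pass-transistor switch as one RC stage with time constant τ, a chain of n unbuffered switches has delay τ·n(n+1)/2, which grows quadratically; a buffer every k switches restarts the RC ladder, so delay grows linearly in n. The τ and buffer-delay values below are hypothetical, and the model ignores driver and buffer resistance and input capacitance.

```python
def unbuffered_delay(n: int, tau_ps: int) -> int:
    """Elmore delay of n identical RC stages: tau * (1 + 2 + ... + n)."""
    return tau_ps * n * (n + 1) // 2

def buffered_delay(n: int, k: int, tau_ps: int, buf_ps: int) -> int:
    """Same chain with a buffer after every k switches (n must be a multiple of k)."""
    if n % k:
        raise ValueError("n must be a multiple of k")
    return (n // k) * (unbuffered_delay(k, tau_ps) + buf_ps)

# Hypothetical values: 10 ps per switch stage, 20 ps per buffer, buffer every 2 switches.
for n in (4, 8, 16):
    print(n, unbuffered_delay(n, 10), buffered_delay(n, 2, 10, 20))
```

For short paths the buffers add more delay than they save; the benefit appears only once the unbuffered quadratic term dominates.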
Interconnect Switches

- Antifuse
- SRAM
- EPROM

Antifuse Interconnect Switches

- High voltage (11-21V) programs the antifuse, forming a permanent conducting link
- Not reprogrammable, non-volatile
- Requires additional programming circuit
SRAM-based Interconnect Switches

- Uses pass transistors
- Gate controlled by an SRAM configuration bit
- Higher interconnect R & C than antifuse
- Reprogrammable, volatile

EPROM-based Interconnect Switches

- Uses floating gate transistor
- Turned OFF by injecting charge onto the floating gate
- Configuration retained when power is off
- Reprogrammable, non-volatile
Xilinx FPGAs

Two Main Series

- **Spartan**
  - Older tech
  - Small
  - Slow

- **Virtex**
  - Newer tech
  - Large
  - Fast
Technological Side Effects (65nm)

- **Soft errors**

- **Wear-out mechanisms**
  - Hot Carrier Injection (HCI)
  - Time Dependent Dielectric Breakdown (TDDB)
  - Negative Bias Temperature Instability (NBTI)
  - **Solution:** Lower voltage and thicker oxide (triple-oxide devices) at the expense of reduced performance
    - Thin-oxide: performance-critical paths
    - Mid-oxide: config memory, pass-gate switches
    - Thick-oxide: high-voltage I/Os

---

**Example 18.1: Virtex-5 FPGA Family**

- **6-in LUTs introduced**
  - More logic within LUT
  - Smaller transistors
  - Lower transistor-size/logic-capacity ratio

- **Overview**

- **Configurable Logic Blocks (CLBs)**

- **Inputs and Outputs**

- **Block RAM**

- **Clock Resources**

- **Power Minimization**
Virtex-5 FPGA Overview

- 65nm copper CMOS process
  - 1.0V core voltage (down from 1.2V in Virtex-4)
  - 12 metal layers
- 550MHz clock
- Up to 50K Virtex-5 slices (330K logic cells)
  - 4 LUTs and 4 FFs per slice
- Up to 1000 DSP48E slices
  - DSP48E slice: 1 25x18 Mult, 1 Add, and 1 Acc
- Up to 18Mbits of block RAM (36Kb blocks built from 9Kb RAMs)
- Up to 1,200 user I/Os
  - 1.2 to 3.3V I/O operation
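The DSP48E description (a 25x18 multiplier followed by an adder/accumulator) maps directly onto a multiply-accumulate loop, the core of FIR filtering. The sketch below assumes signed two's-complement operands of 25 and 18 bits and a 48-bit accumulator that wraps on overflow; it omits pipeline registers, cascade paths, and mode control, and the function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a `bits`-wide two's-complement range."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def check_range(value: int, bits: int) -> int:
    """Reject operands that do not fit the assumed signed port width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return value

def dsp_mac(samples, coeffs, acc_bits: int = 48) -> int:
    """Multiply-accumulate: 25-bit x 18-bit signed products into a wrapping accumulator."""
    acc = 0
    for a, b in zip(samples, coeffs):
        product = check_range(a, 25) * check_range(b, 18)   # needs at most 43 bits
        acc = to_signed(acc + product, acc_bits)
    return acc

# One output sample of a hypothetical 4-tap FIR filter.
print(dsp_mac([100, -200, 300, -400], [3, 5, 7, 9]))
```

A 43-bit product leaves 5 bits of headroom in a 48-bit accumulator, so roughly 32 worst-case products can be summed before wrap-around becomes possible.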

Four Types of Virtex-5 FPGA

- Virtex-5 LX
  - High-performance general logic
- Virtex-5 LXT
  - High-performance logic with advanced serial connectivity
- Virtex-5 SXT
  - High-performance DSP with advanced serial connectivity
- Virtex-5 FXT
  - High-performance embedded systems with advanced serial connectivity
### Virtex-5 FPGAs

![Virtex-5 FPGAs Graph](image)

### Device Specifications

<table>
<thead>
<tr>
<th>Device</th>
<th>CLB Array (Row x Col)</th>
<th>Virtex-5 Slices</th>
<th>Max Distributed RAM (Kb)</th>
<th>DSP48E Slices</th>
<th>Block RAM Blocks (18Kb)</th>
<th>Block RAM Blocks (36Kb)</th>
<th>Max Block RAM (Kb)</th>
<th>CMTs</th>
<th>PowerPC Processor Blocks</th>
</tr>
</thead>
<tbody>
<tr>
<td>XC5VLX30</td>
<td>80 x 30</td>
<td>4,800</td>
<td>320</td>
<td>32</td>
<td>64</td>
<td>32</td>
<td>1,152</td>
<td>2</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX50</td>
<td>120 x 30</td>
<td>7,200</td>
<td>480</td>
<td>48</td>
<td>96</td>
<td>48</td>
<td>1,728</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX85</td>
<td>120 x 54</td>
<td>12,960</td>
<td>840</td>
<td>48</td>
<td>192</td>
<td>96</td>
<td>3,456</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX110</td>
<td>160 x 54</td>
<td>17,280</td>
<td>1,120</td>
<td>64</td>
<td>256</td>
<td>128</td>
<td>4,608</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX155</td>
<td>160 x 76</td>
<td>24,320</td>
<td>1,540</td>
<td>128</td>
<td>384</td>
<td>192</td>
<td>6,912</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX220</td>
<td>160 x 108</td>
<td>34,560</td>
<td>2,280</td>
<td>128</td>
<td>384</td>
<td>192</td>
<td>6,912</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX330</td>
<td>240 x 108</td>
<td>51,840</td>
<td>3,420</td>
<td>192</td>
<td>576</td>
<td>288</td>
<td>10,368</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX30T</td>
<td>60 x 26</td>
<td>3,120</td>
<td>210</td>
<td>24</td>
<td>52</td>
<td>26</td>
<td>936</td>
<td>1</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX90T</td>
<td>80 x 30</td>
<td>4,800</td>
<td>320</td>
<td>32</td>
<td>72</td>
<td>36</td>
<td>1,296</td>
<td>2</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX50T</td>
<td>120 x 30</td>
<td>7,200</td>
<td>480</td>
<td>48</td>
<td>120</td>
<td>60</td>
<td>2,160</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX85T</td>
<td>120 x 54</td>
<td>12,960</td>
<td>840</td>
<td>48</td>
<td>216</td>
<td>108</td>
<td>3,888</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX110T</td>
<td>160 x 54</td>
<td>17,280</td>
<td>1,120</td>
<td>64</td>
<td>296</td>
<td>148</td>
<td>5,328</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX155T</td>
<td>160 x 76</td>
<td>24,320</td>
<td>1,540</td>
<td>128</td>
<td>424</td>
<td>212</td>
<td>7,632</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX220T</td>
<td>160 x 108</td>
<td>34,560</td>
<td>2,280</td>
<td>128</td>
<td>424</td>
<td>212</td>
<td>7,632</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VLX330T</td>
<td>240 x 108</td>
<td>51,840</td>
<td>3,420</td>
<td>192</td>
<td>648</td>
<td>324</td>
<td>11,664</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX06T</td>
<td>80 x 34</td>
<td>5,440</td>
<td>520</td>
<td>192</td>
<td>168</td>
<td>84</td>
<td>3,024</td>
<td>2</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX50T</td>
<td>120 x 34</td>
<td>8,160</td>
<td>780</td>
<td>288</td>
<td>264</td>
<td>132</td>
<td>4,752</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX85T</td>
<td>160 x 44</td>
<td>14,720</td>
<td>1,520</td>
<td>640</td>
<td>488</td>
<td>244</td>
<td>8,784</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX120T</td>
<td>60 x 26</td>
<td>3,120</td>
<td>210</td>
<td>24</td>
<td>52</td>
<td>26</td>
<td>936</td>
<td>1</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX180T</td>
<td>80 x 30</td>
<td>4,800</td>
<td>320</td>
<td>32</td>
<td>72</td>
<td>36</td>
<td>1,296</td>
<td>2</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX220T</td>
<td>120 x 30</td>
<td>7,200</td>
<td>480</td>
<td>48</td>
<td>120</td>
<td>60</td>
<td>2,160</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX270T</td>
<td>160 x 30</td>
<td>11,200</td>
<td>620</td>
<td>48</td>
<td>296</td>
<td>148</td>
<td>5,328</td>
<td>6</td>
<td>N/A</td>
</tr>
<tr>
<td>XC5VFX320T</td>
<td>160 x 40</td>
<td>16,000</td>
<td>1,240</td>
<td>256</td>
<td>456</td>
<td>228</td>
<td>8,208</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td>XC5VFX370T</td>
<td>200 x 40</td>
<td>20,480</td>
<td>1,580</td>
<td>320</td>
<td>596</td>
<td>298</td>
<td>10,728</td>
<td>6</td>
<td>2</td>
</tr>
<tr>
<td>XC5VFX420T</td>
<td>240 x 40</td>
<td>30,720</td>
<td>2,280</td>
<td>384</td>
<td>912</td>
<td>456</td>
<td>16,416</td>
<td>6</td>
<td>2</td>
</tr>
</tbody>
</table>
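Several columns of the device table are tied together by simple arithmetic, which is handy for sanity-checking datasheet figures. The sketch uses relations stated in these notes (two slices per CLB, four 6-input LUTs and four flip-flops per slice, 36Kb per block RAM that can split into two 18Kb RAMs). It assumes a device without embedded PowerPC blocks, which displace CLBs, and the function name is hypothetical.

```python
def derived_resources(clb_rows: int, clb_cols: int, bram36: int) -> dict:
    """Derive slice, LUT, FF, and block RAM figures from the CLB array and 36Kb block count."""
    slices = 2 * clb_rows * clb_cols           # two slices per CLB
    return {
        "slices": slices,
        "luts": 4 * slices,                    # four 6-input LUTs per slice
        "flip_flops": 4 * slices,              # four flip-flops per slice
        "bram18": 2 * bram36,                  # each 36Kb block splits into two 18Kb RAMs
        "bram_kb": 36 * bram36,
    }

print(derived_resources(80, 30, 32))           # XC5VLX30 row of the table
```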
Virtex-5 Configurable Logic Blocks (CLBs)

- CLBs implement seq. and comb. functions
- CLB = two unconnected independent slices

Virtex-5 CLBs: Fast Carry Logic

- Each slice connected to the global routing paths
- Slice columns connected by fast carry logic
Two Types of Slices

- **SLICEL**: regular
  - Every CLB contains one or two SLICELs

- **SLICEM**: more functions
  - Every other CLB contains a SLICEM
Virtex-5: **SLICEL LUT**

- Four independent, 6-input LUTs
- Can also be used as ROMs
- Can be used as two 5-input LUTs (shared inputs)
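A 6-input LUT is a 64-entry truth table indexed by its inputs. The sketch models single- and dual-output use, assuming INIT-style semantics like those of Xilinx's LUT6_2 primitive: O6 reads bit [I5..I0] of a 64-bit INIT value, while O5 reads bit [I4..I0] of its lower 32 bits. Holding I5 at 1 therefore lets one LUT carry two 5-input functions of shared inputs. The function names here are hypothetical.

```python
def lut6_2(init: int, inputs: int) -> tuple[int, int]:
    """Return (O6, O5) for a 6-bit input index (bit 5 = I5, bit 0 = I0)."""
    o6 = (init >> (inputs & 0x3F)) & 1     # full 64-entry table
    o5 = (init >> (inputs & 0x1F)) & 1     # lower 32 entries only; I5 ignored
    return o6, o5

def table(func, n: int) -> int:
    """Pack func(index) -> 0/1 over 2**n input combinations into an INIT integer."""
    return sum((func(i) & 1) << i for i in range(1 << n))

and5 = table(lambda i: int(i == 0b11111), 5)   # lower half: read by O5
or5 = table(lambda i: int(i != 0), 5)          # upper half: read by O6 when I5 = 1
init = and5 | (or5 << 32)

I5 = 1 << 5                                    # hold I5 high for dual-output use
print(lut6_2(init, I5 | 0b00001))              # (1, 0): OR is 1, AND is 0
print(lut6_2(init, I5 | 0b11111))              # (1, 1): both are 1
```

Used as a ROM, the same structure simply returns the stored INIT bit for whatever address appears on the inputs.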

Virtex-5 CLBs: **Carry Logic**

- Dedicated carry logic
- Carry chain runs upward through multiple CLBs, **4 bits per slice**
- S is the “propagate” input and DI the “generate” input
- CYINIT may be used as the first carry bit
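The carry logic can be modeled with two rules per bit, assuming the usual mux-and-XOR carry structure: the carry out is the incoming carry when S (propagate) is 1 and DI (generate) otherwise, and the sum bit is S XOR carry-in. For an adder, the LUT computes S = A XOR B and DI = A. The sketch chains 4-bit slices the way the carry runs upward through CLBs; CYINIT is modeled simply as the initial carry, and all names are hypothetical.

```python
def carry4(s_bits, di_bits, carry_in):
    """One slice of carry logic: four propagate (S) and generate (DI) inputs."""
    outs = []
    carry = carry_in
    for s, di in zip(s_bits, di_bits):
        outs.append(s ^ carry)           # sum bit = S XOR carry-in
        carry = carry if s else di       # propagate carry-in, or take DI
    return outs, carry

def ripple_add(a: int, b: int, width: int = 8, cyinit: int = 0):
    """Add two unsigned numbers with width // 4 chained carry4 slices."""
    if width % 4:
        raise ValueError("width must be a multiple of 4")
    total, carry = 0, cyinit             # CYINIT seeds the first slice
    for base in range(0, width, 4):      # each slice handles 4 bits, moving upward
        a_bits = [(a >> (base + i)) & 1 for i in range(4)]
        b_bits = [(b >> (base + i)) & 1 for i in range(4)]
        s = [x ^ y for x, y in zip(a_bits, b_bits)]   # LUT computes propagate
        sums, carry = carry4(s, a_bits, carry)       # DI = A is the generate term
        total |= sum(bit << (base + i) for i, bit in enumerate(sums))
    return total, carry

print(ripple_add(200, 100))              # (44, 1): 300 = 256 + 44
```

When S is 0 the two operand bits are equal, so taking DI = A correctly either generates (both 1) or kills (both 0) the carry. Setting `cyinit=1` with an inverted operand turns the same chain into a subtractor.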
Virtex-5 CLBs: Storage Elements

- Edge triggered (+/-) FF or level sensitive (H/L) latch
- Sync. or async. set/reset (using SR and REV inputs)
- D inputs driven by the LUTs or directly by the AX, BX, CX, and DX inputs

![Virtex-5 CLBs Diagram](image)

**SLICEM Diagram**

![SLICEM Diagram](image)
### Virtex-5 SLICEM: Additional Configurations

- Single/dual-port 32x1bit RAM
- Quad-port 32x2bit RAM
- Simple dual-port 32x6bit RAM
- Single/dual-port 64x1bit RAM
- Quad-port 64x1bit RAM
- Simple dual-port 64x3bit RAM
- Single/dual-port 128x1bit RAM
- Single-port 256x1bit RAM
- 32-bit shift reg w/o using slice FFs
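The 32-bit shift register built from the LUT itself behaves like a delay line with a selectable tap. The sketch is a behavioral model assuming SRLC32E-like behavior (shift on a clock edge when clock enable is high, output selected by a 5-bit address, last stage available for cascading); the class and method names are hypothetical.

```python
class ShiftRegister32:
    """Behavioral model of a LUT-based 32-bit shift register with a dynamic tap."""

    def __init__(self):
        self.bits = 0                                # bit 0 holds the newest sample

    def clock(self, d: int, ce: int = 1) -> None:
        """Shift d in on a clock edge when clock enable is high."""
        if ce:
            self.bits = ((self.bits << 1) | (d & 1)) & 0xFFFF_FFFF

    def q(self, addr: int) -> int:
        """Tap output: addr 0..31 gives a delay of addr + 1 clocks."""
        return (self.bits >> (addr & 0x1F)) & 1

    def q31(self) -> int:
        """Last stage, usable to cascade into another shift register."""
        return self.q(31)

srl = ShiftRegister32()
for bit in (1, 0, 0, 0):
    srl.clock(bit)
print(srl.q(3))   # 1: the 1 entered four clocks ago
```

Because the storage lives in the LUT, a 32-stage delay line costs one LUT instead of 32 slice flip-flops.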

---

### Virtex-5 CLBs: Distributed RAM

- **RAM built from SLICEM LUTs** is called distributed RAM
- Distributed RAM has a synchronous write and an asynchronous read
- The output can be made synchronous by routing it through the SLICEM FFs
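The read/write timing difference is easiest to see in a behavioral model. This sketch assumes a 64x1 single-port distributed RAM: a write takes effect on the clock edge when write enable is high, the read path is combinational, and an optional register stands in for the SLICEM flip-flop that makes the output synchronous. Class and method names are hypothetical.

```python
class DistributedRAM64x1:
    """Synchronous write, asynchronous read, optional output register."""

    def __init__(self):
        self.mem = [0] * 64
        self.q_reg = 0                         # stands in for the SLICEM flip-flop

    def read_async(self, addr: int) -> int:
        """Combinational read: the output follows the address, no clock needed."""
        return self.mem[addr & 0x3F]

    def clock(self, addr: int, din: int, we: int) -> None:
        """Clock edge: register the current read value, then perform any write."""
        self.q_reg = self.read_async(addr)     # flip-flop samples the pre-write value
        if we:
            self.mem[addr & 0x3F] = din & 1

ram = DistributedRAM64x1()
ram.clock(addr=5, din=1, we=1)
print(ram.read_async(5), ram.q_reg)   # 1 0: async read sees the new data right away
ram.clock(addr=5, din=0, we=0)
print(ram.q_reg)                      # 1: the registered output follows one clock later
```

The registered path costs a cycle of latency but gives a clean timing boundary, which is the trade-off behind routing the output through the slice flip-flops.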
Virtex-5 I/O

- Each I/O Pad connected to an I/O Block, ILOGIC, OLOGIC, I/O Delay blocks
- The I/O block may be configured to a wide variety of I/O standards

Virtex-5 I/O: IOB Block

![IOB block diagram showing PAD, INBUF, OUTBUF, I, PADOUT, DIFFI_IN, DIFFO_IN, and DIFFO_OUT](image)
Virtex-5 I/O: ILOGIC Block

Virtex-5 I/O: OLOGIC Block
Virtex-5 I/O: DCI

- The I/O Blocks are equipped with Digitally Controlled Impedance (DCI)
  - Adjusts the output impedance or input termination to accurately match the characteristic impedance of the PCB transmission line
  - Continuously adjusts the impedance to compensate for changes due to process variation, temperature, and supply-voltage fluctuations
  - Provides parallel or series termination for transmitters and receivers
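Matching matters because of the reflection coefficient at an impedance discontinuity, Γ = (Z_L - Z0)/(Z_L + Z0): a termination (or driver impedance) equal to the line's characteristic impedance Z0 gives Γ = 0, so nothing is reflected. The sketch evaluates Γ exactly with fractions for purely resistive impedances; the 50-ohm line and the mismatched values are illustrative assumptions, not DCI settings.

```python
from fractions import Fraction

def reflection_coefficient(z_load: int, z0: int) -> Fraction:
    """Gamma = (Z_L - Z0) / (Z_L + Z0) for real (resistive) impedances."""
    return Fraction(z_load - z0, z_load + z0)

for z in (25, 50, 75):       # ohms, on an assumed 50-ohm PCB trace
    print(z, reflection_coefficient(z, 50))
```

Since the right termination value drifts with process, voltage, and temperature, continuously recalibrating it, as DCI does, keeps Γ near zero in operation.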

Virtex-5 Block RAM Features

- Each block RAM can store up to 36Kb
- Can be configured as FIFO or as two independent 18Kb RAMs
- Write and read are synchronous
- Read and write ports are independent
  - One clock cycle for Rd or Wr
Virtex-5: Global Clock

- Each Virtex-5 device has 32 global Clk lines
- They can clock all sequential resources (CLBs, Block RAMs, and I/Os)
- Global Clk lines are driven by global Clk buffers, which
  - Can be used as a clock enable
  - Can select between two clock sources
- A global clock buffer can be driven by a Clock Management Tile (CMT), which can adjust the clock delay relative to another clock

Virtex-5: Regional Clocks

- A Virtex-5 device is divided into regions (8 to 24)
- Each region has two regional clock buffers and four regional clock trees
- Each region is assigned an I/O bank that has four clock-capable clock inputs
- A regional clock buffer can divide the incoming clock rate by any integer number from 1 to 8
- A regional clock buffer can also drive the regional clock trees in the adjacent regions
Static Power Reduction

- **Triple Oxide Process Technology**
  - The right transistor for the right job

- The use of 6-input LUTs (for the first time in a Xilinx FPGA) increases the logic capacity of each LUT

- More logic happens locally

- Fewer drivers are needed, hence less leakage
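A rough way to see why bigger LUTs keep more logic local: a wide associative function (AND, OR, or XOR over N inputs) built as a tree of k-input LUTs needs ceil((N-1)/(k-1)) LUTs, since each LUT merges k signals into one, and at least ceil(log_k N) levels. The sketch compares 4- and 6-input LUTs under that simplified model, which ignores carry chains, dedicated muxes, and real packing; the function name is hypothetical.

```python
def lut_tree(n_inputs: int, k: int) -> tuple[int, int]:
    """(LUT count, logic levels) for an n-input associative function built from k-LUTs."""
    luts = -(-(n_inputs - 1) // (k - 1))   # ceiling of (N - 1) / (k - 1)
    levels, reach = 0, 1
    while reach < n_inputs:                # smallest L with k**L >= N
        reach *= k
        levels += 1
    return luts, levels

for n in (16, 32, 64):
    print(n, "inputs:", "4-LUT", lut_tree(n, 4), "6-LUT", lut_tree(n, 6))
```

Fewer LUTs and fewer levels mean fewer signals leave a LUT, so fewer interconnect drivers are needed, which is where the leakage saving comes from.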

Dynamic Power Reduction

- **Big LUTs localize the logic** → reduced $C_{load}$ from the programmable interconnect

- **Virtex-5 has a new, more uniform routing structure** that reduces the number of hops, i.e., reduces capacitance

- **The block RAMs are composed of smaller 9Kb RAMs.** Only the 9Kb RAM being accessed is enabled during a read or write, saving power in the others
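All of these techniques act on the switching-power relation P_dyn = α·C·V²·f. The sketch computes relative dynamic power when capacitance and supply voltage change, using the 1.2V to 1.0V core-voltage drop listed in the overview; the 20% capacitance reduction is a hypothetical figure chosen for illustration, not a Xilinx number, and the function name is hypothetical.

```python
def relative_dynamic_power(c_ratio: float, v_new: float, v_old: float,
                           activity_ratio: float = 1.0, f_ratio: float = 1.0) -> float:
    """P_new / P_old for P = alpha * C * V**2 * f."""
    return activity_ratio * c_ratio * (v_new / v_old) ** 2 * f_ratio

voltage_only = relative_dynamic_power(1.0, 1.0, 1.2)
with_less_c = relative_dynamic_power(0.8, 1.0, 1.2)   # hypothetical 20% less C_load
print(f"{voltage_only:.3f} {with_less_c:.3f}")
```

Because voltage enters squared, the core-voltage drop alone accounts for roughly a 30% cut, and any reduction in switched capacitance multiplies on top of it.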
FPGAs going **Multi-core...**

**BEE2**

13” x 17”

22-layer PCB

**Rack & Server: 21 BEE2, 1008 Cores**

- 21 BEE2s, each with 48 cores, for a total of 1008 MicroBlaze cores.
- Picture also includes the NFS server, monitoring and debugging machine.

*Courtesy: J. Wawrzynek (UCB)*

---

**Novo-G FPGA-based Machine**

1 head-node server:
- 1U rackmount chassis
- Dual Xeon E5520 quad-core CPUs @ 2.26 GHz, 4MB cache, 5.86 GT/s QPI
- 24GB ECC DDR3, 1333 MHz
- Integrated dual-GigE ports & video
- ICH10R controller for 6 SATA drives
- 3 x 1TB Enterprise SATA2 drives

24 compute servers, each with:
- 4U rackmount, with 645W P/S
- Intel Xeon E5520 quad-core CPU
- 6GB ECC DDR3, 1333 MHz
- Integrated dual-GigE ports & video
- 2 GiDEL ProcStar-III PCIe x8 cards
- Mellanox DDR InfiniBand PCIe card
- 250GB SATA2 drive

Not visible (IB & GigE switches, PDUs)

*KVM/LCD for head node*

*~300W*

*~8 kW total*

*Courtesy: A. George, et al. (U Florida)*
**Novo-G ProcStar-III Board**

- **GiDEL ProcStar-III**
  - fclk: 100-325MHz
  - DMA ch: 32
  - DDR2 slots: 8

- **Altera Stratix-III E260**
  - 254,400 Logic Elements
  - 768 Multipliers (18×18)
  - 14,688 Kbits embedded memory
  - 65nm Technology

- **~50W**

- **2×2GB = 4GB DDR2 RAM per FPGA**

**Novo-G Memory & Connectivity**

![Novo-G memory and connectivity diagram. Labels: head node, 24 GB DDR3; compute node, 6 GB DDR3; FPGA1 to FPGA4, each with memory (2×2GB DDR2 SODIMM, 256 MB DDR2 667MHz); memory bus; main bus; PCI-Express x8; GigE; InfiniBand](image)

**Courtesy:** A. George, et al. (U Florida)
Conclusion

- Limited technology improvements
  *Energy efficient* design
- Increasing functional diversity
  *Flexible* design
- Architecture of interconnect networks
  *Efficiency and flexibility*

References

- Virtex-5 FPGA User Guide (xilinx.com)
- Virtex-5 Family Overview (xilinx.com)
- [http://www.ecs.umass.edu/ece/tessier/courses/697ff/lect13-ece697f.ppt](http://www.ecs.umass.edu/ece/tessier/courses/697ff/lect13-ece697f.ppt)
- [http://www.eecg.toronto.edu/~vaughn/challenge/fpga_arch.html](http://www.eecg.toronto.edu/~vaughn/challenge/fpga_arch.html)
- [http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15828-s98/lectures/0119/index.htm](http://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15828-s98/lectures/0119/index.htm)
- [http://www.edacafe.com/books/ASIC/Book/CH05/CH05.1.php](http://www.edacafe.com/books/ASIC/Book/CH05/CH05.1.php)
- Stephen Brown and Jonathan Rose, “Architecture of FPGAs and CPLDs: A Tutorial”, (Univ. of Toronto)
- Derek Curd, “Power Consumption In 65nm FPGAs”, Xilinx WP246 (V1.2) February 1, 2007

* Available on classwiki