

COLLABORATIVE

RESEARCH

PIONEERS IN

# **Computation vis-à-vis Physics: A Framework**

VIA 2020 Forum, July 10 & 11, 2008 Ralph K. Cavin, III, Victor V. Zhirnov & Sadasivan Shankar (Intel)

# Outline



- Relations between maximum computational performance and device physics
  - Correlations between computer performance and technology capability
  - Von Neumann threshold
- Binary switch abstraction
  - Generic floorplan and energetics
  - Connected Binary Switches
  - Floorspace, Timing and Energy for Communication between Binary Switches

#### 'Minimal' Turing Machine

- System scaling limits
- Energetics and efficiency

## Moore's Law: Binary Information Throughput







Sources: The Intel Microprocessor Quick Reference Guide and TSCP Benchmark Scores



# THESIS



There appears to be a functional relationship between ultimate technology capability defined as the maximum number of binary transitions per unit time,  $\beta$ , and the millions of instructions executed per section,  $\mu$ , executed by a single processor: Number of binary







Werner Heisenberg

Ludwig Boltzmann

Can computational theory suggest new devices? Stan Williams @ Nanomorphic Forum



# Two-well bit – Universal Device Model



# How can we increase MIPS?





Sources: The Intel Microprocessor Quick Reference Guide and TSCP Benchmark Scores



Sources: The Intel Microprocessor Quick Reference Guide and TSCP Benchmark Scores



**Total Power Dissipation** - A Catastrophe!  $(@E_{bit} = kTln(2))$ 00

$$P_{chip} = \frac{n \cdot E_{bit}}{t} = 6 \cdot 10^{12} [cm^{-2}] \cdot \frac{10^{-20} [J]}{4 \cdot 10^{-14} [s]}$$

$$E_{bit} = 3k_B T \ln 2 \approx 10^{-20} J$$

$$E_{bit} = 3k_BT \ln 2 \approx 10^{-20} J$$

$$P_{chip} = 1.5 \times 10^6 \frac{W}{C_{rev}^2}$$
The circuit would vaporize when it is turned on!
Limits of Cooling?



SRO

#### **Energy Costs of Computation:** *Energy Consumed and Heat generated*







PIONEERS IN Collaborative

#### RESEARCH®

# **Biological Computation?**

#### **'Computers Are Like Brains? Don't They Wish'** *The Wall Street Journal, July 9 2008*

# Most complex information-management system in the universe...





A CMOS machine at the limits of scaling would use prodigious amounts of power

# When will computer hardware match the human brain?



# Computing Power: MIPS ( $\mu$ ) vs. BIT ( $\beta$ )







PIDNEERS IN COLLABORATIVE RESEARCH®

# **Chip Multiprocessors**

#### **Ralph K. Cavin III and Victor V. Zhirnov**

#### Semiconductor Research Corporation

IEEE/ACM International Symposium on Nanoscale Architectures San Jose, CA, October 21-22, 2007



Sources: The Intel Microprocessor Quick Reference Guide and TSCP Benchmark Scores





Sources: The Intel Microprocessor Quick Reference Guide and TSCP Benchmark Scores





Multi-Core Architectures: A number K of light-weight processors instead of one heavy-weight processor

A Multi-Core processor consists of a total of *N* binary switches organized in *K* supercells or cores. Each core in this organization is a lighter-weight general-purpose information processor, containing *M* binary switches: M=N/K

$$M < N \Rightarrow E_{b_{\min}}(M) > E_{b_{\min}}(N)$$
 for the same error probability

#### **Favorable Multi-Core Postulates**

1) The collective action of all *K* cores is equivalent to the action of the single-core

- 2) All processors are engaged in useful work
- 3) Each core contains an error-detecting mechanism
- 4) The other cores are able to wait until the failed microtask computation on a core repeats the microtask to generate correct answer

# **Extreme Multi-Core Analysis**



Power consumption by K cores:

$$P_{K} = K \cdot \frac{M}{t_{sw}(M)} \cdot E_{b\min}(M) = \frac{N}{t_{sw}(M)} \cdot E_{b\min}(N)$$



# "Coreness" / "Weight"- Dilemma



 The is a limit for a maximum number of transistors in 1cm<sup>2</sup> of chip area

$$N_{\rm max} \sim 10^{10} cm^{-2}$$
 (L<sub>g</sub>=5 nm)

- A Multi-Core Information processor consists of a total of N binary switches organized in K supercells or cores.
  - Each core in this organization is a lighter-weight generalpurpose information processor, containing *M* binary switches: *M*=*N*/*K*
  - In the limit:

it:  

$$M = \frac{10^{10}}{K}$$
What is smallest M?  
System scaling limits need to be understood

# **Different Facets of Scaling**





| Scaling of 8080 MPU |              |  |                  |                             | RC® |
|---------------------|--------------|--|------------------|-----------------------------|-----|
|                     |              |  |                  | 020                         |     |
| Technology:         | NMOS         |  | Technology:      | CMOS                        |     |
| Feature size:       | 6 μ <b>m</b> |  | Feature size:    | 6 nm                        |     |
| # of transistors    | : 4500       |  | # of transistors | : 4500                      |     |
| Die size:           | 5 mm x 4 mm  |  | Die size:        | 5 μ <b>m x 4</b> μ <b>m</b> |     |
| Voltage:            | 5V, 12 V     |  | Voltage:         | ~0.5 V                      |     |
| Frequency:          | 2 MHz        |  | Frequency:       | ~2 MHz-1GHz                 |     |
| Power:              | 1.5 W        |  | Power:           | ~10nW-10µW                  |     |

# **System scaling limits**



# Multi-core CPU

 What is the maximum possible number of cores in multi-core processors

# • 'Mobile supercomputers'

What is the smallest possible size of an intelligent 'piconode'?



# Von Neumann's Threshold



"If one constructs the automaton (A) correctly, then any additional requirements about the automaton can be handled by sufficiently elaborated instructions. This is only true if A is sufficiently complicated, if it has reached a certain minimum of complexity" (J. von Neumann)

n

Capability for general-purpose computing?: C>1 Yes C<1 No 1



Von Neumann threshold

'Minimal' Turing Machine

# Binary switch abstraction: Generic floorplan and energetics





# **Connected Binary Switches**





# **Connected Binary Switches**





# Interconnect abstraction: Extended Well Model





# Connecting Binary Switches via Wires: Extended Well Model

The problem is to 'place' the electron on the down stream gate – more than one electron is needed to 'charge' the line



# Connecting Binary Switches via Wires (L>4a, N electrons)



For logic operation, a binary switch needs to control at least two other binary switches



## Minimum number of electrons in interconnect line for communication and fan-out





# Minimum switching energy for connected binary switches





# **Energy per interconnect tile**



#### **Floorspace Expenses of Communication** between Binary Switches



Assumption: For each of 3 tiles of Binary Switch and for a fan-out **of three**, we need at least:

One contacting interconnect tile (3 total) and one connecting interconnect tile (3 total)



#### Digital circuit abstraction: Generic floorplan and energetics and speed

Switching energy of one binary switch in a circuit (FO3)



$$\begin{array}{c}
3 \text{ switch tiles} \\
E_{sw} = 3E_b + 6E_b = 9k_BT\ln 2 \\
6 \text{ wire tiles}
\end{array}$$
Operational energy of a circuit of binary switches:  
(50% activity)
$$\begin{array}{c}
E_{op} = \frac{9}{2}nk_BT\ln 2 \\
E_{op} = \frac{9}{2}nk_BT\ln 2
\end{array}$$

 $Area_{\min} = n \cdot 8a^2$  Joyner tiling

Switching delay of one binary switch in a circuit:

Speed: 
$$\tau_{\min}$$
/tile  
 $\tau_{\min} = \frac{\hbar}{kT \ln 2}$  ~40 fs

$$t_{sw} = 9 \tau_{min}$$







The minimal ALU does 2²=4 operations on two 1-bit X and Y:Operation 1: X AND YOperation 2: X OR YOperation 3: (X+Y)Operation 4: (X+(NOT Y))Jan Rabaey,Digital Integrated Circuits

### Minimal ALU abstraction: *Energetics*

$$E_{ALU} = \frac{9}{2} \cdot 98 \cdot k_B T \ln 2 \sim 300 k_B T$$

$$E_{ALU} = \frac{9}{2} \cdot 98 \cdot k_B T \ln 2 \sim 9k_B T$$

$$\eta_{AND} \sim 3\%$$

$$E_{ADD} = \frac{9}{2} \cdot 27 \cdot k_B T \ln 2 \sim 84 k_B T$$

$$\eta_{ADD} \sim 28\%$$

All 4 units execute even though only one output is used

#### Total: 98 devices

R

# Can we increase ALU efficiency?





#### Minimal ALU abstraction: Timing



SSC

# **Minimal CPU**





# **Minimal Turing Machine**





8 bit per cycle

# **Program Memory per operation**



# **Minimal Turing Machine**





#### **Turing Machine Implementation:** *Generic floorplan and energetics*





Per full CPU operation:

$$E_{op} = 3 \cdot 4 \cdot 10^{-18} \frac{J}{cycle} \approx 10^{-17} \frac{J}{operation}$$

47

# Minimal Turing Machine: A summary







## Computing Power: MIPS ( $\mu$ ) vs. BIT ( $\beta$ )





## Computing Power: MIPS ( $\mu$ ) vs. BIT ( $\beta$ )





## Computing Power: MIPS ( $\mu$ ) vs. BIT ( $\beta$ )





# Summary



- The Minimal Turing Machine lies on the different performance trajectories from conventional computers
   It has slope to meet brain performance
- More detailed physics based analysis is needed
  - System thermodynamics of computation
    - Carnot's equivalent for Computational Engine?
- Lessons from Biological Computation?
- Candidates for beyond-CMOS nano-electronics should be evaluated in the context of system scaling
   e.g. spintronic minimal Turing Machine?



PIONEERS IN

COLLABORATIVE

RESEARCH®

#### **Back-up**

#### **Extreme Multi-Core Analysis**



Power consumption by K cores:

$$P_K = K \cdot \frac{M}{t_{sw}(M)} \cdot E_{b\min}(M) = \frac{N}{t_{sw}(M)} \cdot E_{b\min}(N)$$

