

# Circuit and Signaling Co-Design for Ultra-Wideband Communications

Armin Tajalli, Laboratory of Circuits and Systems (LCAS), University of Utah. June 18, 2019, IEEE Swiss Solid-State Circuit Talk

## Outline

- Data Movement in High-Performance Computing
  - General roadmap of HPC
  - Impact of communication
- Energy-Efficient Wireline (Copper) Design
  - Effective signaling methods
  - Matched hardware topologies
- Summary

Introduction to LCAS, ECE, University of Utah

# Data Movement in High-Performance Computing



# **Future Applications & Data Flow**



### **Processing Power**



[M. Horowitz et al.] [karlrupp.net]

### **Processing Power Physical Limits: Power Density and Heat**



### **Processing Power Multi-Core Processors**

- IBM Power9
- 24 cores
- 14 nm FinFET, 17 metal layers, 8B transistors
- Includes eDRAM and SRAM
- 16 and 25 Gb/s interfaces
- 13 Tb/s off-chip bandwidth
- Power consumption (approximately):
  - Core: 57%
  - Clock: 10%
  - Cache: 5%
  - I/O: 15%
  - Leakage: 13%





[IBM, ISSCC'2017]

### Processing Power Physical Limits: Yield and Cost



## Processing Power Multi-Core MCM Processors

- AMD Zeppelin SoC targeting the server market
- 14 nm FinFET
- 4-die multi-chip module (MCM)
- 8 Zen cores per die
- 16 MB L3 cache
- 32 high-speed SerDes lanes
- A similar monolithic chip (32 cores) would cost 70% more than the four smaller dies
- The yield of the MCM is much better than that of one large chip (expected size 777 mm<sup>2</sup>)
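The yield argument can be illustrated with the textbook Poisson yield model, Y = exp(-A * D0). This is a sketch under loud assumptions: the defect density `D0_PER_CM2` is a hypothetical value, the model ignores packaging, test, and interconnect cost, and it is not meant to reproduce the 70% figure quoted above. It only shows why splitting 777 mm<sup>2</sup> of silicon into four known-good dies wastes less wafer area per good part.

```python
import math

# Hypothetical defect density in defects/cm^2 (assumption, not from the talk)
D0_PER_CM2 = 0.1


def poisson_yield(area_mm2: float, d0_per_cm2: float = D0_PER_CM2) -> float:
    """Fraction of defect-free dies under a simple Poisson model, Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


def silicon_cost_per_good_part(total_area_mm2: float, n_dies: int) -> float:
    """Wafer area consumed per good part when the total area is split into
    n_dies equal dies and only known-good dies are assembled."""
    die_area = total_area_mm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area)


MONOLITHIC_MM2 = 777.0  # expected monolithic die size quoted on the slide

y_mono = poisson_yield(MONOLITHIC_MM2)
y_chiplet = poisson_yield(MONOLITHIC_MM2 / 4)
cost_ratio = (silicon_cost_per_good_part(MONOLITHIC_MM2, 1)
              / silicon_cost_per_good_part(MONOLITHIC_MM2, 4))

print(f"monolithic die yield: {y_mono:.2f}")
print(f"per-die yield (4-die MCM): {y_chiplet:.2f}")
print(f"silicon cost, monolithic vs. MCM: {cost_ratio:.2f}x")
```

Under this model the ratio grows exponentially with die area, which is the qualitative point of the slide; real cost models add assembly and test overhead on the MCM side.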





[AMD, ISSCC'2018]

### **Observations Following Trends**



### **Observations Following Trends**



## **Processing Power Physical Limits: Communication**



### What is ISI?



### **Overcome ISI Barrier**



### **Observations**



### **Observations**



# **Energy Efficient Link Design**



### **Differential Transceiver**





#### **Differential signaling**

- Robust, but requires two wires to carry one bit

### **Differential Signaling**



Single-ended to differential conversion

### **Differential Signaling**



#### **Properties of differential signaling:**

- Robust against crosstalk, supply noise, and common-mode noise
- Produces no simultaneous switching output (SSO) supply noise
- ISI ratio = 1

# Differential Signaling Encoder



Walsh-Hadamard Matrix

#### **Properties of differential signaling:**

- Robust against crosstalk, supply noise, and common-mode noise
- Produces no simultaneous switching output (SSO) supply noise
- ISI ratio = 1
- Carries 1 bit over 2 wires

## Differential Signaling Constellation



**Comparison levels are orthogonal to the common mode (CM)**

### Differential Signaling Decoder

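#### Decoder

$$\begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix} \begin{bmatrix} V_{CM} + D \\ V_{CM} - D \end{bmatrix} = \begin{bmatrix} 2V_{CM} \\ 2D \end{bmatrix}$$

Walsh-Hadamard Matrix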
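A minimal numerical sketch of the 2x2 Walsh-Hadamard view of differential signaling: the encoder drives V_CM + D and V_CM - D, and the decoder (the same matrix) returns 2V_CM and 2D. Noise that couples equally onto both wires only shifts the common-mode output, so the slicer decision on the differential output is unaffected. The voltages, the noise amplitude, and the function names are illustrative assumptions.

```python
# Order-2 Walsh-Hadamard matrix: row 0 -> common mode, row 1 -> differential
H2 = [[+1, +1],
      [+1, -1]]


def encode(v_cm: float, d: float) -> list:
    """Wire voltages H2^T @ [V_CM, D] = [V_CM + D, V_CM - D]."""
    symbols = (v_cm, d)
    return [sum(H2[r][c] * symbols[r] for r in range(2)) for c in range(2)]


def decode(wires: list) -> list:
    """Decoder H2 @ wires = [2 * V_CM, 2 * D]."""
    return [sum(h * w for h, w in zip(row, wires)) for row in H2]


# Illustrative values in volts (assumptions, not from the talk)
V_CM, D = 0.6, 0.1
CM_NOISE = 0.05  # noise coupled equally onto both wires

wires = encode(V_CM, D)
cm_out, diff_out = decode([w + CM_NOISE for w in wires])
bit = 1 if diff_out > 0 else 0  # slicer threshold at zero, independent of V_CM

print(wires, cm_out, diff_out, bit)
```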

# Differential Signaling Reduce Redundancy



H: Walsh-Hadamard Matrix

#### **Properties of differential signaling:**

- Robust against crosstalk, supply noise, and common-mode noise
- Produces no simultaneous switching output (SSO) supply noise
- ISI ratio = 1
- Carries N bits over N+1 wires
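One way to put N bits on N+1 wires is to drop the all-ones (common-mode) row of an (N+1)x(N+1) Walsh-Hadamard matrix and use the remaining rows as data vectors. The sketch below assumes Sylvester's construction, so it only covers N+1 equal to a power of two, and uses N = 3 (three bits over four wires) as the example; the scaling by 1/N and all function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def sylvester_hadamard(order: int) -> list:
    """Walsh-Hadamard matrix by Sylvester's construction (order must be a power of 2)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def encode(bits, h):
    """Put N = len(h) - 1 bits (+1/-1) on N + 1 wires.
    Row 0 (all ones) is the common mode and carries no data;
    dividing by N keeps every wire within [-1, +1]."""
    n = len(h) - 1
    return [Fraction(sum(b * h[i + 1][j] for i, b in enumerate(bits)), n)
            for j in range(n + 1)]


def decode(wires, h):
    """Linear combiner: correlate with each data row, then slice at zero."""
    return [1 if sum(x * w for x, w in zip(row, wires)) > 0 else -1
            for row in h[1:]]


H4 = sylvester_hadamard(4)  # N = 3 bits over 4 wires
words = list(product((-1, 1), repeat=3))
all_decoded = all(decode(encode(b, H4), H4) == list(b) for b in words)
wire_levels = sorted({w for b in words for w in encode(b, H4)})
wire_sums = {sum(encode(b, H4)) for b in words}  # total wire value per codeword

print(all_decoded, [str(v) for v in wire_levels], wire_sums)
```

Every codeword decodes correctly, each wire takes four levels (-1, -1/3, +1/3, +1), and the sum over the wires is always zero, which is the property behind the constant-current, SSO-free behavior listed above.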

# Circuit Topology Linear Analog Encoder/Decoder



#### **Observations:**

- A linear (analog) decoder can convert a four-level signal back into a binary signal ahead of the slicer
- ISI sensitivity improves by a factor of 3x
- An analog linear combiner can be found for Walsh-Hadamard-based transformations, but not for PAM<sub>n</sub>, n > 2.
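The 3x figure can be made concrete under an assumed definition of ISI ratio: the largest divided by the smallest nonzero step the slicer input can take between consecutive symbols. The sketch compares plain PAM4, one wire of a 3-bits-over-4-wires Walsh-Hadamard code (a four-level signal), and the same code after the linear decoder. The definition, the example code, and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def isi_ratio(levels):
    """Assumed definition: largest / smallest nonzero step between any two
    levels the slicer input can take from one symbol to the next."""
    steps = {abs(a - b) for a in levels for b in levels if a != b}
    return max(steps) / min(steps)


# Order-4 Walsh-Hadamard matrix; row 0 is the common mode
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]

# Encode 3 bits (+1/-1) onto 4 wires, scaled so each wire stays within [-1, +1]
codewords = [[Fraction(sum(b * H4[i + 1][j] for i, b in enumerate(bits)), 3)
              for j in range(4)]
             for bits in product((-1, 1), repeat=3)]

wire0 = {cw[0] for cw in codewords}  # four-level signal on a single wire
sub1 = {sum(h * w for h, w in zip(H4[1], cw)) for cw in codewords}  # linear decoder output

print("PAM4 levels -3, -1, +1, +3:", isi_ratio([-3, -1, 1, 3]))
print("single wire, before decoding:", isi_ratio(wire0))
print("sub-channel 1, after decoding:", isi_ratio(sub1))
```

Both four-level cases give a ratio of 3, while the decoded sub-channel is binary and gives 1, matching the 3x improvement stated on the slide.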

### Example

#### **Example:**

- Take 5 bits and put them over 6 wires
- ISI sensitivity stays as good as that of differential signaling (ISI ratio = 1)
- Improves spectral efficiency relative to differential signaling by close to 2x (5/6 vs. 1/2 bit per wire, i.e. about 1.67x).

$$CNRZ = \begin{bmatrix} +3 & +3 & +3 & +3 & +3 & +3 \\ -3 & -3 & -3 & +3 & +3 & +3 \\ +2 & +2 & -4 & 0 & 0 & 0 \\ +3 & -3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -4 & +2 & +2 \\ 0 & 0 & 0 & 0 & -3 & +3 \end{bmatrix}$$
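The matrix above can be checked directly: all six rows are mutually orthogonal, so each of the five data rows can be separated by a linear combiner, and every data row sums to zero, so the data never moves the common mode. The sketch also computes the bits-per-wire gain over differential signaling. Row 0 is taken to be the common mode, which is an assumption based on its all-equal entries.

```python
from fractions import Fraction

# CNRZ matrix as written on the slide; row 0 is assumed to be the common mode
CNRZ = [
    [+3, +3, +3, +3, +3, +3],
    [-3, -3, -3, +3, +3, +3],
    [+2, +2, -4, 0, 0, 0],
    [+3, -3, 0, 0, 0, 0],
    [0, 0, 0, -4, +2, +2],
    [0, 0, 0, 0, -3, +3],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Mutual orthogonality lets a linear decoder isolate each sub-channel
orthogonal = all(dot(CNRZ[i], CNRZ[j]) == 0
                 for i in range(6) for j in range(i + 1, 6))
# Zero-sum data rows leave the common mode untouched
cm_free = all(sum(row) == 0 for row in CNRZ[1:])

bits_per_wire_cnrz = Fraction(5, 6)  # 5 bits over 6 wires
bits_per_wire_diff = Fraction(1, 2)  # 1 bit over 2 wires
gain = bits_per_wire_cnrz / bits_per_wire_diff

print(orthogonal, cm_free, gain, float(gain))
```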

### **Circuit Topology Linear Analog Encoder**

#### **Example:**

- Take 5 bits and put them over 6 wires
- Does not add any extra latency
- A simple voltage-mode or current-mode driver can be used.
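Because the encoder is linear, each wire is just a fixed weighted sum of the five bits (one column of the data rows), which is why a summing driver suffices. The sketch lists the per-bit weights and the resulting levels on each wire, using the unscaled matrix entries; any amplitude normalization in a real driver is omitted, and the function names are hypothetical.

```python
from itertools import product

# CNRZ matrix as written on the slide; rows 1-5 carry data
CNRZ = [
    [+3, +3, +3, +3, +3, +3],
    [-3, -3, -3, +3, +3, +3],
    [+2, +2, -4, 0, 0, 0],
    [+3, -3, 0, 0, 0, 0],
    [0, 0, 0, -4, +2, +2],
    [0, 0, 0, 0, -3, +3],
]


def wire_weights(j):
    """Per-bit weights driving wire j: column j of the five data rows."""
    return [CNRZ[row][j] for row in range(1, 6)]


def encode(bits):
    """Each wire is a fixed weighted sum of the five +/-1 bits, which a
    current-mode driver can realize by summing weighted current cells."""
    return [sum(b * w for b, w in zip(bits, wire_weights(j))) for j in range(6)]


codewords = [encode(bits) for bits in product((-1, 1), repeat=5)]
levels_per_wire = [sorted({cw[j] for cw in codewords}) for j in range(6)]
wire_sums = {sum(cw) for cw in codewords}

for j in range(6):
    print(f"wire {j}: weights {wire_weights(j)}, levels {levels_per_wire[j]}")
print("sum over all six wires:", wire_sums)
```

The encoder needs no memory or lookup, consistent with the "no extra latency" bullet, and the total over the six wires is zero for every codeword.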



### **Circuit Topology Linear Analog Decoder**



#### Example:

- Takes the 6 wires as input and produces five binary outputs
- Does not add any extra latency
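A sketch of the linear decoder: each sub-channel output is the dot product of the six wire values with one data row, followed by a slicer at zero. Running all 32 codewords through encode and decode shows zero decision errors and a single output magnitude per sub-channel, i.e. a binary eye with ISI ratio 1. The raw magnitudes differ between sub-channels because the rows have different norms; any scaling used in the actual decoder is not specified here. Function names are hypothetical.

```python
from itertools import product

# CNRZ matrix as written on the slide; rows 1-5 carry data
CNRZ = [
    [+3, +3, +3, +3, +3, +3],
    [-3, -3, -3, +3, +3, +3],
    [+2, +2, -4, 0, 0, 0],
    [+3, -3, 0, 0, 0, 0],
    [0, 0, 0, -4, +2, +2],
    [0, 0, 0, 0, -3, +3],
]


def encode(bits):
    return [sum(b * CNRZ[i + 1][j] for i, b in enumerate(bits)) for j in range(6)]


def decode(wires):
    """Correlate the six wire values with each data row (linear combiner),
    then slice each result at zero. Returns decisions and raw outputs."""
    outputs = [sum(c * w for c, w in zip(row, wires)) for row in CNRZ[1:]]
    return [1 if y > 0 else -1 for y in outputs], outputs


eye = [set() for _ in range(5)]  # distinct output magnitudes per sub-channel
errors = 0
for bits in product((-1, 1), repeat=5):
    decisions, outputs = decode(encode(bits))
    errors += decisions != list(bits)
    for k, y in enumerate(outputs):
        eye[k].add(abs(y))

print("decision errors:", errors)
print("output magnitude per sub-channel:", [sorted(s) for s in eye])
```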

### **Evolution of Signaling Schemes**

Single Wire PAM2 SES

- PAM2 (SES or DS) is part of a larger family of signaling methods that exhibit the same ISI sensitivity (ISI ratio = 1).



### **Evolution of Signaling Schemes**





- ENRZ: Ensemble-NRZ
- CNRZ: Correlated-NRZ



# **Correlated NRZ Lane**



### High Density Lane Architecture



# High Density Lane Clocking



- Forwarded clock to track jitter
- Rx PLL BW can be as high as 1.5 GHz
- Clock/data alignment algorithm is used to track the best sampling point.

# High Density Lane Rx PLL



- Type-II PLL is used
- Feedback delay is minimized using the PDXI element
- Jitter generation: 220 fs-rms
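To relate the quoted 1.5 GHz bandwidth to loop parameters, the sketch uses the standard second-order type-II charge-pump PLL model, H(s) = (2ζω<sub>n</sub>s + ω<sub>n</sub><sup>2</sup>) / (s<sup>2</sup> + 2ζω<sub>n</sub>s + ω<sub>n</sub><sup>2</sup>), whose -3 dB frequency has a closed form. This is a generic textbook model, not the actual loop of this chip, and the damping factor `ZETA` is an assumed value.

```python
import math


def closed_loop_mag(f_hz: float, fn_hz: float, zeta: float) -> float:
    """|H(j*2*pi*f)| for H(s) = (2*zeta*wn*s + wn^2) / (s^2 + 2*zeta*wn*s + wn^2)."""
    wn = 2 * math.pi * fn_hz
    s = 2j * math.pi * f_hz
    num = 2 * zeta * wn * s + wn ** 2
    den = s ** 2 + 2 * zeta * wn * s + wn ** 2
    return abs(num / den)


def bw_over_fn(zeta: float) -> float:
    """Closed-form ratio f_3dB / f_n for the model above."""
    a = 1 + 2 * zeta ** 2
    return math.sqrt(a + math.sqrt(a ** 2 + 1))


TARGET_BW_HZ = 1.5e9  # Rx PLL bandwidth quoted on the clocking slide
ZETA = 0.707          # assumed damping factor (not from the talk)

fn_hz = TARGET_BW_HZ / bw_over_fn(ZETA)
mag_at_bw = closed_loop_mag(TARGET_BW_HZ, fn_hz, ZETA)

print(f"required natural frequency: {fn_hz / 1e9:.3f} GHz")
print(f"|H| at 1.5 GHz: {mag_at_bw:.4f}")  # 1/sqrt(2) by construction
```

A wide loop bandwidth lets the forwarded-clock receiver track more of the transmitter jitter, which is the motivation given on the clocking slide.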

# High Density Lane Rx PLL



- PDXI: phase detector, phase interpolator, and charge-pump
- A decomposed XOR-based architecture is used to reduce cost.
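For reference, a plain XOR phase detector driven by two 50%-duty square waves has an average output that rises linearly from 0 to full scale as the phase difference goes from 0 to 180 degrees, then falls again (a triangular characteristic). The sketch samples one period to show this. It models a generic XOR detector only; the decomposed PDXI structure with its interpolator and charge pump is not modeled, and the sample count is an arbitrary choice.

```python
def square_wave(n: int, delay: int) -> list:
    """One period (n samples) of a 50%-duty 0/1 square wave delayed by `delay` samples."""
    return [1 if (k - delay) % n < n // 2 else 0 for k in range(n)]


def xor_pd_output(phase_deg: float, n: int = 3600) -> float:
    """Average (DC) output of an XOR gate driven by two square waves
    `phase_deg` apart, normalized to the logic-high level."""
    delay = round(phase_deg / 360 * n)
    ref = square_wave(n, 0)
    fb = square_wave(n, delay)
    return sum(a ^ b for a, b in zip(ref, fb)) / n


for phase in (0, 45, 90, 135, 180, 225, 270, 315):
    print(phase, xor_pd_output(phase))
```

Locking at the 90-degree point (half-scale output) keeps the detector in its linear region with equal margin on both sides.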

# High Density Lane Rx Front-End



## High Density Lane Chip Photo



- 500 Gb/s Transceiver
- FinFET 16 nm
- Each Tx/Rx includes 24 data wires and 2 clock wires
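The headline numbers are consistent with one another, as this arithmetic sketch shows: 24 data wires form four 6-wire chords, each carrying 5 bits per unit interval, so 500 Gb/s implies a 25 GBd symbol rate and 20.83 Gb/s per wire. It assumes the 500 Gb/s figure is per direction (matching the 1000 Gb/s Rx+Tx throughput in the summary table) and expresses the 220 fs rms PLL jitter generation in UI.

```python
from fractions import Fraction

DATA_WIRES_PER_DIRECTION = 24  # from the chip-photo slide
WIRES_PER_CHORD = 6            # one CNRZ chord: 6 wires ...
BITS_PER_CHORD = 5             # ... carrying 5 bits
THROUGHPUT_GBPS = 500          # assumed to be per direction
JITTER_FS_RMS = 220            # Rx PLL jitter generation

chords = DATA_WIRES_PER_DIRECTION // WIRES_PER_CHORD
bits_per_ui = chords * BITS_PER_CHORD
baud_gbd = Fraction(THROUGHPUT_GBPS, bits_per_ui)          # implied symbol rate
ui_ps = 1000 / baud_gbd                                     # unit interval in ps
per_wire_gbps = Fraction(THROUGHPUT_GBPS, DATA_WIRES_PER_DIRECTION)
jitter_ui = Fraction(JITTER_FS_RMS, 1000) / ui_ps           # rms jitter as a fraction of UI

print(chords, bits_per_ui, baud_gbd, ui_ps, float(per_wire_gbps), float(jitter_ui))
```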

### High Density Lane Channel

Figure: Tx and Rx placement.

- Link performance has been measured for different channel lengths (5 mm up to 30 mm)
- Channel loss can be as high as 6 dB

## **High Density Lane Experimental Data**



- Each Chord includes 6 wires, carrying 5 bits
- All 5 bits exhibit binary eye diagrams with the same height.



42 Mead Course — Advanced Signaling for Communication Over Copper— March 28, 2019 — Sana Oraz

## High Density Lane Performance Summary

| Reference                   | Unit     | [4]          | [3]   | [2]        | This Work     |
|-----------------------------|----------|--------------|-------|------------|---------------|
| Signaling                   |          | SES (\*1)    | SES   | CNRZ (\*2) | CNRZ-EE (\*3) |
| Data rate/pin               | Gb/s/w   | 20           | 25    | 20.83      | 20.83         |
| Channel loss                | dB       | 1            | 8.5   | 3          | 6             |
| CMOS technology             | nm       | 28           | 16    | 28         | 16            |
| BER                         | b/b      | 1E-12        | 1E-15 | 1E-15      | 1E-15         |
| Energy consumption          | pJ/bit   | 0.54         | 1.17  | 0.94       | 1.02          |
| Throughput Rx+Tx            | Gb/s     | 160 (\*4)    | 200   | 250        | 1000          |
| Throughput die-edge density | Gb/s/mm  | 233.2 (\*4)  | 291.5 | 166.3      | 416.7         |

(\*1) SES: Single-Ended Signaling

(\*2) CNRZ: Correlated NRZ

(\*3) CNRZ-EE: Equal-Eye CNRZ

(\*4) Assumed the same bump map as [3]

[2] A. Shokrollahi, et al., ISSCC'2016.

[3] J. M. Wilson, et al., ISSCC'2018.

[4] J. Poulton, JSSC'2013.
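Two derived quantities help read the table: total power (energy per bit times throughput, since pJ/b x Gb/s = mW) and the die-edge length implied by throughput divided by edge density. The sketch assumes the pJ/bit figure applies to the full Rx+Tx throughput of each design, which the table does not state explicitly; the [4] entries inherit the bump-map assumption of footnote (\*4).

```python
# ref: (energy pJ/bit, throughput Rx+Tx Gb/s, die-edge density Gb/s/mm), from the table
TABLE = {
    "[4]": (0.54, 160, 233.2),
    "[3]": (1.17, 200, 291.5),
    "[2]": (0.94, 250, 166.3),
    "This Work": (1.02, 1000, 416.7),
}

# pJ/b x Gb/s = 1e-12 J/b x 1e9 b/s = 1e-3 W, i.e. mW
power_mw = {ref: e * t for ref, (e, t, _) in TABLE.items()}
# Gb/s divided by Gb/s/mm gives the implied edge length in mm
edge_mm = {ref: t / d for ref, (_, t, d) in TABLE.items()}

for ref in TABLE:
    print(f"{ref:>9}: ~{power_mw[ref]:.0f} mW, ~{edge_mm[ref]:.2f} mm of die edge")
```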

### High Density Lane Comparison: Chord vs. PAM4



# Conclusion





- High-performance, high-speed serial links are key elements of modern distributed processing systems. In such systems, energy efficiency, pin efficiency, and, more precisely, data density are critically important.
- Virtually all multi-core and multi-chip systems are enabled by innovative short-distance data communication solutions with very high bandwidth (> 1 Tb/s) at low energy cost (< 1 pJ/b). Applications such as machine learning depend heavily on such links.
- There are many **opportunities** to improve data bandwidth, speed, and energy efficiency with innovative circuit- and system-level solutions.
- Unlike in wireless communication systems, the potential of different coding and signaling methods has not been thoroughly investigated in wireline systems.

• Acknowledgement: Thanks to Kandou Bus for supporting this work and providing information.

### **Summary**



**Circuit Design** 

| Solid-State Devices |            |     |
|---------------------|------------|-----|
| Lamp, BJT           | MOS/FinFET | GAA |



LCAS employs advanced signal processing, communications, and coding for extremely high-performance and energy-efficient circuit design

> Electrical & Computer Engineering Department, University of Utah, Laboratory of Integrated Circuits and Systems (LCAS)

