



### **Signal Integrity Issues and High-Speed Interconnects**

#### Prof. Ram Achar, Fellow IEEE, Fellow EIC

Carleton University, 5170ME, Dept. of Electronics, Ottawa, Ontario, Canada – K1S 5B6

Email: achar@doe.carleton.ca; Ph: 613-520-2600, Ext: 5651; Fax: 613-520-5708

IEEE Electronics Packaging Society Distinguished Lecture



## This Presentation is supported by the IEEE Electronics Packaging Society's Distinguished Lecturer Program

eps.ieee.org








### **Electronics Packaging Society**

A global society with:

- Chapters, members, and constituents spanning the world
- 38 chapters located in the US, Asia/Pacific, and Europe
- 12 technical committees
- 2200+ members worldwide
- 650k Transactions/conference downloads per year
- 4500 attendees at 25 conferences/workshops in the packaging field
- 6 EPS awards + PhD fellowship
- Peer-reviewed Transactions

### **EPS Local Chapters**

Bangalore, Beijing, Benelux, Bulgaria, Canada (4), France, Germany, Hong Kong, Hungary/Romania, Israel, Japan, Korea, Malaysia, Nordic (Sweden, Denmark, Finland, Norway, Estonia, Iceland), Poland, Russia, Shanghai, Singapore, Switzerland, Taipei, Ukraine (2), United Kingdom & Republic of Ireland, United States (12)

### **EPS Technical Committees**

- Materials & Processes
  - Chair: Bing Dang
- High Density Substrates & Boards
  - Chair: Yasumitsu Orii
- Electrical Design, Modeling & Simulation
  - Chair: Stefano Grivet-Talocia
- Thermal & Mechanical
  - Chair: Ankur Jain
- Emerging Technology
  - Chair: Benson Chan
- Nanotechnology
  - Chairs: Americas: Raj M. Pulugurtha, Europe: Attila Bonyar, Asia: Jian Cai

- Power & Energy
  - Chair: Patrick McCluskey
- RF & THz Technologies
  - Chair: Manos Tentzeris
- Photonics Communication, Sensing, Lighting
  - Chair: Gnyaneshwar Ramakrishna
- 3D/TSV
  - Chair: Peter Ramm
- Reliability
  - Chair: Przemek Gromala
- Test
  - Chair: Pooya Taday

### **Peer-Reviewed Technical Publication**



IEEE TRANSACTIONS ON COMPONENTS, PACKAGING, AND MANUFACTURING TECHNOLOGY



#### **Transactions on CPMT**

- 595 submissions (2020)
- 240 papers published
- Impact Factor: ~ 1.7 (2020)
- Xplore Usage: 40,000+

VP Publications: Dr. Ravi Mahajan, Intel. Publications include the CPMT Transactions, a monthly eNewsletter, and a biannual printed newsletter.



#### Technical Program (data as of last year)

| Metric                           | Average   | Electronics Packaging |
|----------------------------------|-----------|-----------------------|
| # of PDFs published              | 8,469     | 6,822                 |
| # of events                      | 76        | 74                    |
| # of articles viewed (downloads) | 1,666,555 | 1,395,158             |
| PDFs/conference                  | 112       | 92                    |
| Downloads/PDF                    | 197       | 204                   |
| Avg OU package net               | 1.5%      | 1.3%                  |

| Financial Sponsor                          | Co-Sponsor                                  | Technical Sponsor                                                                                                          |
|--------------------------------------------|---------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| ECTC, EDAPS, HOLM, ICSJ, ITherm, SPI, 3DIC | EPEPS, EPTC, ESTC, IEMT, EMAP, IWASI, IWIPP | ASMC, DTIP, EDPS, EMPC, EOS/ESD, EuroSimE, ICEPT, 3D-PEIM, IMPACT, ISSE, MID, NordPac, Pan Pacific, SEMI-THERM, THERMINIC |



### **EPS Awards & Recognition**

IEEE Electronics Packaging Award (IEEE Technical Field Award)



- Outstanding Sustained Technical Contribution Award
- Electronics Manufacturing Technology
- David Feldman Outstanding Contribution
- Exceptional Technical Achievement
- Outstanding Young Engineer
- Transactions Best Papers
- Regional Contributions
- PhD Fellowship





### Carleton University – Canada's Capital University



### **Research @ CAD Group in Carleton**

- High-Speed Interconnects
- Signal & Power Integrity
- Circuit Simulation
- Timing Analysis
- Model-Order Reduction Algorithms
- Variational Analysis
- Optimization
- Mixed Digital, Analog, EM, RF Analysis
- Parallel Algorithms
- Neural Networks

- ...



## Design Trends

- Faster Devices
- Compact Products
- Multi-Function
- Less Clutter
- Use Less Power

- High-Frequency
- High-Density
- Wireless
- Low-Power



### **Agenda**

- Emerging Product Trends
- Interconnect Scaling
- Signal Integrity Issues
- Interconnect Hierarchy
- What is a "High-Speed Interconnect"?
- Interconnect Models and Simulation Challenges
- Advanced/Recent Models
- Conclusions

## **High-Speed Design Issues**



### Impact of Signal Integrity Issues




### Moore's Law on Density/Speed




#### Source: Intel, www.intel.com

### **Evolution of Density and Frequency**

- In a Billion Transistor Design, there could be multiple billions of interconnects, many of which do not scale in performance
- High-Speed Signal Propagation Issues

### **Transistor, Interconnect – Scaling & SI Issues**



### **Technology Scaling:**

## Scale W, L & t<sub>ox</sub> by a factor of 'S'


| Parameter                        | Relation              | Scaling<br>Factor |
|----------------------------------|-----------------------|-------------------|
| Dimensions                       | W, L, t <sub>ox</sub> | 1/S               |
| Voltages                         | $V_{DD}, V_{T}$       | 1/S               |
| Currents                         | I <sub>DS</sub>       | 1/S               |
| <b>Power Dissipation/Gate</b>    | P = IV                | 1/S <sup>2</sup>  |
| Area Per Device                  | A = WxL               | 1/S <sup>2</sup>  |
| <b>Power Dissipation Density</b> | P/A                   | 1                 |

## **Transistor - Ideal Scaling**

### Impact on Transistor Related Delays

| Parameter                | Relation                                   | Scaling<br>Factor |
|--------------------------|--------------------------------------------|-------------------|
| Gate Capacitance         | $C_g = (\epsilon_{ox}/t_{ox})(W \times L)$ | 1/S               |
| Transistor On Resistance | $R_{tr} = V_{DD} / I_{DS}$                 | 1                 |
| Intrinsic Gate Delay     | $T_g = R_{tr} C_g$                         | 1/S               |

## A look at Interconnects.....



### **Scaling: Local Interconnections**

| Parameter                                   | Relation                                                                  | Scaling<br>Factor     |
|---------------------------------------------|---------------------------------------------------------------------------|-----------------------|
| Cross-sectional Dimensions                  | $W_{int}, H_{int}, W_{sp}, T_{int}$                | 1/S                   |
| Capacitance Per Unit Length                 | $C_{int} \propto \epsilon_{ox} W_{int} / t_{ox}$   | 1                     |
| Resistance Per Unit Length                  | $R_{int} = \rho_{int} / (W_{int} \times H_{int})$  | $S^2$                 |
| RC Time Constant per unit<br>Length         | $R_{int} \times C_{int}$                           | $S^2$                 |
| Local Interconnection Length                | $L_{int}$                                          | 1/S                   |
| <b>Total Local Interconnection RC Delay</b> | $T = (R_{int} C_{int}) L^{2}_{int}$                | 1                     |

### **Scaling: Global Interconnections**

| Parameter                                         | Relation                           | Scaling Factor                                           |
|---------------------------------------------------|------------------------------------|----------------------------------------------------------|
| Die Size                                          |                                    | S <sub>c</sub>                                           |
| Global Interconnection<br>Length                  | L <sub>glob</sub>                  | S <sub>c</sub>                                           |
| <b>Global Interconnections</b><br><b>RC Delay</b> | $T = (R_{int} C_{int}) L^2_{glob}$ | $S^2 S_c^2$ |
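As a quick illustration of the tables above, the ideal scaling factors can be evaluated for chosen values of $S$ and $S_c$. This is a minimal sketch; `scaling_factors` is a hypothetical helper, and $S = 2$, $S_c = 1.2$ are arbitrary example values, not data from the slides.

```python
def scaling_factors(S, Sc):
    """Ideal scaling factors from the tables above (hypothetical helper).

    S  -- device and wire cross-section scaling factor (dimensions shrink by 1/S)
    Sc -- die-size growth factor (global wire length grows by Sc)
    """
    return {
        "gate_delay": 1.0 / S,                  # T_g = R_tr * C_g -> 1/S
        "local_rc_delay": S**2 * (1.0 / S)**2,  # (R_int C_int) * L_int^2 -> 1
        "global_rc_delay": S**2 * Sc**2,        # (R_int C_int) * L_glob^2 -> S^2 Sc^2
    }


factors = scaling_factors(S=2.0, Sc=1.2)
# How much worse global wires get relative to gates after one scaling step
ratio = factors["global_rc_delay"] / factors["gate_delay"]
print(factors, ratio)
```

With these values the gate delay halves, the local RC delay is unchanged, and the global RC delay grows by about 5.8×, so global wires become roughly 11.5× slower relative to gates after one scaling step.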


## **Attenuation**



## Interconnect Effects: Delay



© R. Achar, "High-Speed Interconnects and Signal Integrity Challenges"

## **Reflection/Ringing**



## **Crosstalk**






## **Ubiquitous Interconnects**



## **Role of the Package**

- Distribute power and signals
- Dissipate the heat generated by the IC
- Provide mechanical support for the chip

Issues to tackle include parasitic elements such as:

- capacitive coupling between connections or leads,
- inductance of the connections or leads,
- resistance of the connections.

The values of these parasitic elements depend on the package layout and structure, and they have a significant impact on package performance.

## **Role of the Package**

- The design and construction of packages vary significantly, but most are fabricated from either plastic or ceramic materials.
- The chip-to-package interconnects can be divided into three main categories:
  - Wire-bond (WB)
  - Tape-automated-bond (TAB)
  - Flip-chip

## Package – Wire Bond

Wire-bond (WB): Although it is the oldest method, wire bonding is still the dominant packaging method in use today.





Because of the self-inductance and mutual inductance of the wires, noise on the power distribution network (PDN) and crosstalk between adjacent signals can occur.

## **Ground Bounce**

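Figure: on-board VCC bus and on-board ground bus.

$$V_{g} = L_{g} \times \frac{d\, i_{\text{discharge,total}}(t)}{dt}$$

where $L_g$ is the inductance of the ground path and $i_{\text{discharge,total}}(t)$ is the total discharge current flowing through it.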
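The ground-bounce relation can be turned into a back-of-the-envelope estimate. This sketch assumes several identical drivers switching simultaneously through a shared ground inductance, each with a linear current ramp; the helper name and all numbers are hypothetical.

```python
def ground_bounce(L_g, n_drivers, delta_i, t_r):
    """V_g = L_g * d(i_total)/dt for n identical drivers switching together.

    Assumes each driver's discharge current ramps linearly by delta_i over t_r,
    so d(i_total)/dt = n_drivers * delta_i / t_r.
    """
    return L_g * n_drivers * delta_i / t_r


# Hypothetical values: 1 nH ground path, 8 drivers, 20 mA each, 1 ns edges
v = ground_bounce(L_g=1e-9, n_drivers=8, delta_i=20e-3, t_r=1e-9)
print(f"ground bounce ~ {v:.2f} V")   # ~0.16 V; halving t_r doubles it
```

The estimate makes the design levers explicit: fewer simultaneously switching drivers, slower edges, or a lower-inductance ground path (as with flip-chip) all reduce $V_g$ proportionally.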

## Flip Chip Technology



- Uniform distribution of power
- Shorter contacts: reduced parasitics
- More widely used


## High-Speed!! What is it??

- Time taken to travel through the interconnect can no longer be neglected!!
- Interconnects can no longer be treated as electrically short ($d \le \lambda/10$)
- Need to start worrying about high-frequency effects when:

### $d \ge \lambda/10 \rightarrow$ Electrically Long Interconnects

### What is a High-Speed Interconnect?





At higher frequencies, the interconnect length becomes comparable to the wavelength.



Frequency = 1 GHz:

$$\lambda \approx \frac{v}{f} = \frac{1.5 \times 10^{10}\ \text{cm/s}}{1 \times 10^{9}\ \text{Hz}} = 15 \text{ cm}$$

$\rightarrow$ electrically long when $d > \lambda/10 = 1.5$ cm
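The $\lambda/10$ rule of thumb can be wrapped in a small check. This is a sketch assuming the slide's propagation velocity of $1.5 \times 10^{10}$ cm/s (half the speed of light, i.e., a relative permittivity of about 4); the function names are illustrative.

```python
def wavelength_cm(f_hz, v_cm_per_s=1.5e10):
    """lambda = v / f, with v in cm/s (1.5e10 cm/s as used on the slide)."""
    return v_cm_per_s / f_hz


def is_electrically_long(d_cm, f_hz, v_cm_per_s=1.5e10):
    """lambda/10 rule of thumb: the line is electrically long if d >= lambda / 10."""
    return d_cm >= wavelength_cm(f_hz, v_cm_per_s) / 10.0


print(wavelength_cm(1e9))               # 15.0 cm at 1 GHz
print(is_electrically_long(2.0, 1e9))   # True: 2 cm exceeds 1.5 cm
print(is_electrically_long(1.0, 1e9))   # False
```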

### **Time-Freq Relations**



### What is a High-Speed Interconnect?

Sharper pulses contain higher-frequency harmonics:

$$f_{\max} = \frac{0.35}{t_r}$$

### $t_r = 0.1\,\text{ns} \rightarrow f_{\max} = 3.5\,\text{GHz}$

$$\lambda \approx \frac{v}{f} = \frac{1.5 \times 10^{10}\ \text{cm/s}}{3.5 \times 10^{9}\ \text{Hz}} \approx 4 \text{ cm}$$

$\rightarrow$ electrically long when $d > \lambda/10 \approx 4$ mm (only lines shorter than about 4 mm can be treated as electrically short)
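The same reasoning, starting from a rise time instead of a frequency, gives the length above which a line must be treated as electrically long. A sketch assuming the $0.35/t_r$ rule and the slide's velocity of $1.5 \times 10^{10}$ cm/s; the helper names are hypothetical.

```python
def knee_frequency(t_r):
    """f_max ~ 0.35 / t_r (rise-time to bandwidth rule of thumb)."""
    return 0.35 / t_r


def critical_length_cm(t_r, v_cm_per_s=1.5e10):
    """Length above which d >= lambda(f_max)/10, i.e. the line is electrically long."""
    return v_cm_per_s / knee_frequency(t_r) / 10.0


f_max = knee_frequency(0.1e-9)
d_crit = critical_length_cm(0.1e-9)
print(f"f_max = {f_max / 1e9:.2f} GHz, critical length = {d_crit * 10:.1f} mm")
```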


### **Modeling of Interconnects**



### **Interconnect Models**



### **Distributed Transmission Lines**

#### Distributed Transmission Lines

- Lossless, Lossy
- Single, Multiconductor
- Frequency Independent/Dependent p.u.l. Parameters (skin/proximity/edge effects)
- Uniform/Non-uniform
- Current Distribution Related Effects:
  - Skin Effect
  - Edge Effect
  - Proximity Effect
- Surface Roughness Effects
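The frequency dependence of the p.u.l. resistance is driven largely by the skin effect: once the skin depth $\delta = 1/\sqrt{\pi f \mu \sigma}$ falls below the conductor thickness, current crowds toward the surface and the resistance grows roughly as $\sqrt{f}$. A sketch of the skin-depth calculation, assuming a copper conductivity of $5.8 \times 10^7$ S/m (not a value from the slides):

```python
import math

MU0 = 4e-7 * math.pi  # free-space permeability (H/m)


def skin_depth(f_hz, sigma=5.8e7, mu_r=1.0):
    """Skin depth delta = 1/sqrt(pi*f*mu*sigma); sigma defaults to copper (S/m)."""
    return 1.0 / math.sqrt(math.pi * f_hz * MU0 * mu_r * sigma)


for f in (1e6, 1e8, 1e9, 1e10):
    print(f"{f:8.0e} Hz: delta = {skin_depth(f) * 1e6:6.2f} um")
```

At 1 GHz the skin depth in copper is only about 2 µm, comparable to or smaller than many on-chip and package conductor thicknesses, which is why frequency-dependent p.u.l. parameters matter.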

### Frequency Dependence of R & L Parameters




### **Multi-Conductor Transmission Lines**



### **Transient Analysis**

### **Mixed Frequency/Time Simulation**



### **Lumped Segmentation - Large Circuit**



Large CPU Cost

### **Transient Simulation Issues**

Mixed Frequency/Time

- Complexity
- CPU time
- Memory

### Simulator Interface

### MACROMODELING

$$\frac{\partial}{\partial z} \mathbf{V}(z,t) = -\mathbf{R}\, \mathbf{I}(z,t) - \mathbf{L} \frac{\partial}{\partial t} \mathbf{I}(z,t)$$
$$\frac{\partial}{\partial z} \mathbf{I}(z,t) = -\mathbf{G}\, \mathbf{V}(z,t) - \mathbf{C} \frac{\partial}{\partial t} \mathbf{V}(z,t)$$

### Macromodeling

Circuit simulators:

$$\frac{d}{dt}\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}, \qquad \mathbf{y} = \mathbf{C}\mathbf{x}$$
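Once a macromodel is in this state-space form, a circuit simulator can step it in time with the same implicit integration it uses for other components. A minimal sketch, assuming backward-Euler integration and a generic $(A, B, C)$ triple; `simulate_state_space` and the first-order example are hypothetical:

```python
import numpy as np


def simulate_state_space(A, B, C, u, h, n_steps, x0=None):
    """Backward-Euler integration of dx/dt = A x + B u(t), y = C x.

    u is a callable returning the input vector at time t; h is the time step.
    """
    n = A.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    M = np.eye(n) - h * A                       # (I - hA) x_{k+1} = x_k + h B u(t_{k+1})
    outputs = []
    for k in range(1, n_steps + 1):
        x = np.linalg.solve(M, x + h * (B @ u(k * h)))
        outputs.append(C @ x)
    return np.array(outputs)


# Hypothetical first-order macromodel dx/dt = -x + u, y = x, driven by a unit step
A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
y = simulate_state_space(A, B, C, lambda t: np.array([1.0]), h=0.01, n_steps=1000)
print(y[-1])   # settles towards the DC value 1.0
```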

### Macromodeling

- Lumped Segmentation
- Method of Characteristics
- Least-Square Optimization
- Chebyshev, Wavelet Polynomials
- Compact Finite Differences
- Integrated Congruent Transformation
- Matrix Rational Approximation
- Model-Order Reduction Methods....

### **Discretization**



### Lumped Models - Pi

Lumped LC Lossless Line



Lumped RLGC Lossy Line



Lumped Cascaded Lossless Distributed Line









#### 50 segments v/s Distributed Model
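The gap between a segmented lumped model and the distributed line can also be seen in the frequency domain by cascading chain (ABCD) matrices. This sketch assumes a lossless line, identical $\pi$-sections (shunt $C/2n$, series $L/n$, shunt $C/2n$), an ideal source and a matched load; the line values and helper names are hypothetical.

```python
import numpy as np


def abcd_lumped_pi(L_tot, C_tot, n, w):
    """ABCD matrix of n cascaded pi-sections (shunt C/2n, series L/n, shunt C/2n)."""
    Ls, Cs = L_tot / n, C_tot / n
    series = np.array([[1, 1j * w * Ls], [0, 1]], dtype=complex)
    half_shunt = np.array([[1, 0], [1j * w * Cs / 2, 1]], dtype=complex)
    return np.linalg.matrix_power(half_shunt @ series @ half_shunt, n)


def abcd_distributed(L_tot, C_tot, w):
    """Exact ABCD matrix of a lossless distributed line with the same totals."""
    z0 = np.sqrt(L_tot / C_tot)
    theta = w * np.sqrt(L_tot * C_tot)  # electrical length in radians
    return np.array([[np.cos(theta), 1j * z0 * np.sin(theta)],
                     [1j * np.sin(theta) / z0, np.cos(theta)]])


def voltage_gain(abcd, z_load):
    """V_out / V_in with an ideal voltage source at the input and z_load at the output."""
    A, B = abcd[0, 0], abcd[0, 1]
    return z_load / (A * z_load + B)


# Hypothetical 50-ohm lossless line with 707 ps total delay, 35 pi-sections
tau, z0, n = 707e-12, 50.0, 35
L_tot, C_tot = z0 * tau, tau / z0
for f in (0.1e9, 1e9, 10e9, 30e9):
    w = 2 * np.pi * f
    g_lumped = voltage_gain(abcd_lumped_pi(L_tot, C_tot, n, w), z0)
    g_exact = voltage_gain(abcd_distributed(L_tot, C_tot, w), z0)
    print(f"{f / 1e9:5.1f} GHz  |lumped| = {abs(g_lumped):.3e}  |distributed| = {abs(g_exact):.3f}")
```

Below the cutoff of the $\pi$-ladder ($\omega_c = 2n/\sqrt{L_{tot} C_{tot}}$, about 15.8 GHz here) the lumped model tracks the distributed line; above it, the ladder behaves as a low-pass filter and the response collapses, which is why the number of sections must grow with signal bandwidth.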



### Transient Analysis with Distributed Interconnect



### **Frequency Response of Lumped Models**



### Frequency Response: Lumped Model with Input Pulse Spectrum



### **Transient Response**



A commonly used rule of thumb for the number of sections $N$ needed is:

$$N = \frac{10\,\tau d}{t_r}$$

where $t_r$ is the rise time, $\tau$ is the per-unit-length delay of the line, and $d$ is the line length.

Example: rise time = 0.2 ns; lossless line of length 10 cm with a per-unit-length delay of 70.7 ps/cm:

$$N = \frac{10 \times 70.7 \times 10^{-12} \times 10}{0.2 \times 10^{-9}} \approx 35$$
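The rule of thumb translates directly into code. A sketch with a hypothetical helper name; units only need to be consistent (here $\tau$ in s/cm and $d$ in cm):

```python
import math


def lumped_sections(tau_per_len, length, t_r):
    """Rule of thumb N = 10 * tau * d / t_r, with tau the per-unit-length delay."""
    return 10.0 * tau_per_len * length / t_r


n = lumped_sections(tau_per_len=70.7e-12, length=10.0, t_r=0.2e-9)   # tau in s/cm, d in cm
print(round(n), math.ceil(n))   # 35 as on the slide; 36 if rounding up
```

Rounding up (`math.ceil`) gives a slightly more conservative segment count than rounding to the nearest integer.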

### **Direct Lumped Segmentation**



### **Agenda**

- Emerging Product Trends
- Interconnect Scaling: Trend & Issues
- High-Speed Design Issues
- Interconnect Hierarchy
- What is a "High-Speed Interconnect"?
- Interconnect Models and Simulation Challenges
- Advanced/Recent Models
  - MRA, DEPACT
  - WR+TP, WR+TP+EMI
  - Tabulated Data, Parallelization
- Conclusions

### **Possible Efficient Macromodeling Approaches**

1. MoC (Method of Characteristics) Based Algorithms
2. Matrix Rational Approximations

### **MoC based Algorithms**

### → Delay Extraction + Rational Approximation

### Efficient for Long Low Loss Lines

### **Difficulties**

1. Coupled lines require curve fitting: $n$ lines → $2n^2 + n$ transfer functions (e.g., 10 lines → 210 transfer functions)
2. Does not guarantee passivity



- Limited Bandwidth of Approximation
- Individual numerical fitting of parameters of a matrix function
- Loss of physical properties such as passivity

### **Importance of Passivity**



- Error: Failure to Converge
- Error: Time Step Too Small – Abort
- Error: .....

### **Unstable response**



#### **Time response of stable but nonpassive reduced model**

### Passivity



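$Y(s)$ is passive iff it is a positive-real matrix, i.e.:

1. $Y(s^*) = Y^*(s)$
2. each element of $Y(s)$ is analytic for $\text{Re}(s) > 0$
3. $\mathbf{z}^{*t}\left[Y^t(s^*) + Y(s)\right]\mathbf{z} \ge 0$ for all complex vectors $\mathbf{z}$ and $\text{Re}(s) > 0$

**Ensuring passivity of the reduced macromodel is a challenging task!!**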
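In practice, a candidate macromodel is often screened by sampling the last condition on the $j\omega$ axis: for a stable pole-residue model $Y(s) = \sum_n R_n/(s - p_n) + D$, the Hermitian part $Y(j\omega) + Y^H(j\omega)$ must have no negative eigenvalues. The sketch below assumes that model form and illustrative one-port data (the function name is hypothetical); a sampled check like this can miss narrow violation bands, so it is a screening tool rather than a proof of passivity.

```python
import numpy as np


def min_hermitian_eig(poles, residues, D, omegas):
    """Smallest eigenvalue of Y(jw) + Y(jw)^H over the sampled frequencies.

    Y(s) = sum_n R_n / (s - p_n) + D is a pole-residue model (complex poles,
    if any, must appear in conjugate pairs). A negative value flags a
    passivity violation at one of the sampled frequencies.
    """
    worst = np.inf
    for w in omegas:
        s = 1j * w
        Y = np.array(D, dtype=complex)
        for p, R in zip(poles, residues):
            Y = Y + np.asarray(R) / (s - p)
        H = Y + Y.conj().T                     # Hermitian part (times 2)
        worst = min(worst, np.linalg.eigvalsh(H).min())
    return worst


omegas = np.logspace(-2, 3, 400)
one_pole, unit_residue = [-1.0], [np.array([[1.0]])]
passive = min_hermitian_eig(one_pole, unit_residue, np.array([[0.0]]), omegas)
nonpassive = min_hermitian_eig(one_pole, unit_residue, np.array([[-0.5]]), omegas)
print(passive >= 0, nonpassive < 0)   # True True
```

The first model, $1/(s+1)$, is the admittance of a series R-L branch and passes; adding a negative constant term $D = -0.5$ makes the real part negative at high frequency, which the check flags.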

### **Defining the objectives.....**

- Can we improve the bandwidth of approximation without facing ill-conditioning?
- Can we come up with a matrix-based approximation without resorting to individual entry approximations?
- Can we do the approximation analytically without resorting to numerical curve fitting?
- Can we ensure the physical properties (such as passivity) of the model?


$$\frac{\partial}{\partial z} \mathbf{V}(z,t) = -\mathbf{R} \mathbf{I}(z,t) - \mathbf{L} \frac{\partial}{\partial t} \mathbf{I}(z,t)$$
$$\frac{\partial}{\partial z} \mathbf{I}(z,t) = -\mathbf{G} \mathbf{V}(z,t) - \mathbf{C} \frac{\partial}{\partial t} \mathbf{V}(z,t)$$
$$\begin{bmatrix} \mathbf{V}(d,s) \\ \mathbf{I}(d,s) \end{bmatrix} = e^{\mathbf{Z}} \begin{bmatrix} \mathbf{V}(0,s) \\ \mathbf{I}(0,s) \end{bmatrix}$$
$$\mathbf{Z} = (\mathbf{D} + s\mathbf{E})d \qquad \mathbf{D} = \begin{bmatrix} \mathbf{0} & -\mathbf{R} \\ -\mathbf{G} & \mathbf{0} \end{bmatrix} \qquad \mathbf{E} = \begin{bmatrix} \mathbf{0} & -\mathbf{L} \\ -\mathbf{C} & \mathbf{0} \end{bmatrix}$$
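The exact quantity being approximated can be evaluated directly at a single frequency with a dense matrix exponential, which is useful as a reference when checking any rational approximation. A sketch using `scipy.linalg.expm`, with the state ordered as $[\mathbf{V};\, \mathbf{I}]$ (the ordering implied by the block structure of $\mathbf{D}$ and $\mathbf{E}$) and hypothetical single-line parameters:

```python
import numpy as np
from scipy.linalg import expm


def chain_matrix(R, L, G, C, d, f):
    """exp(Z) with Z = (D + sE) d, mapping [V(0,s); I(0,s)] to [V(d,s); I(d,s)].

    R, L, G, C are n x n per-unit-length matrices, d is the line length, f the frequency.
    """
    n = R.shape[0]
    s = 2j * np.pi * f
    zero = np.zeros((n, n))
    D = np.block([[zero, -R], [-G, zero]])
    E = np.block([[zero, -L], [-C, zero]])
    return expm((D + s * E) * d)


# Single lossy line (hypothetical values, SI units per metre), 10 cm long, at 1 GHz
R, L, G, C = (np.array([[x]]) for x in (10.0, 250e-9, 0.0, 100e-12))
T = chain_matrix(R, L, G, C, d=0.1, f=1e9)
print(T)
```

For one line, the entries reduce to the familiar $\cosh(\gamma d)$ and $-Z_c \sinh(\gamma d)$ terms, with $\gamma = \sqrt{(R + sL)(G + sC)}$.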

### This is what we are approximating.....

## **Matrix Rational Approximation**



- Closed-Form Relations
- Can achieve higher bandwidth

#### **CONCEPT - MRA**

Padé Approximation of Exponential Function



- Closed-form relation for coefficients
- High-order approximation possible
- No ill-conditioning
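The closed-form Padé coefficients can be sketched as follows, assuming the standard diagonal $[N/N]$ approximant $e^{x} \approx Q_N(x)/Q_N(-x)$ with $q_i = \frac{(2N-i)!\,N!}{(2N)!\,i!\,(N-i)!}$. The matrix version simply substitutes a matrix for $x$; the helper names and the test matrix are hypothetical, and `scipy.linalg.expm` is used only as a reference.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm


def pade_exp_coeffs(N):
    """Closed-form coefficients q_0..q_N of the diagonal [N/N] Pade approximant of e^x."""
    return [factorial(2 * N - i) * factorial(N)
            / (factorial(2 * N) * factorial(i) * factorial(N - i))
            for i in range(N + 1)]


def pade_expm(Z, N):
    """Approximate exp(Z) as Q_N(-Z)^{-1} Q_N(Z), the matrix form of the scalar ratio."""
    q = pade_exp_coeffs(N)
    num = sum(qi * np.linalg.matrix_power(Z, i) for i, qi in enumerate(q))
    den = sum(qi * np.linalg.matrix_power(-Z, i) for i, qi in enumerate(q))
    return np.linalg.solve(den, num)


print(pade_exp_coeffs(2))   # e^x ~ (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12)
Z = np.array([[0.0, -0.2], [-0.3, 0.0]])   # hypothetical small matrix
print(np.max(np.abs(pade_expm(Z, 6) - expm(Z))))
```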

### **Time-Domain Macromodel**



### **Passivity??**



#### Time response of stable but nonpassive reduced model

#### **Passivity Conditions**

<u>Theorem</u>: Let the rational function of *e*<sup>x</sup> be:

$$\mathbf{e}^{\mathbf{x}} \approx \frac{Q_{N}(\mathbf{x})}{Q_{N}(-\mathbf{x})} = \frac{\sum_{i=0}^{N} q_{i} \mathbf{x}^{i}}{\sum_{i=0}^{N} q_{i} (-\mathbf{x})^{i}}$$

IF the polynomial  $Q_N(x)$  is strictly Hurwitz<sup>\*</sup> THEN

the rational matrix obtained by replacing the scalar x with the matrix Z=(D+sE)d results in a passive transmission line macromodel

\*A polynomial with real positive coefficients whose roots are either negative real or occur in complex-conjugate pairs with negative real parts.
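The Hurwitz condition on $Q_N(x)$ can be checked numerically for the diagonal Padé coefficients. This is a sketch (a floating-point root check via `numpy.roots`, not a proof), reusing the closed-form coefficients $q_i = \frac{(2N-i)!\,N!}{(2N)!\,i!\,(N-i)!}$; the function names are hypothetical.

```python
import numpy as np
from math import factorial


def pade_exp_coeffs(N):
    """Diagonal Pade coefficients q_0..q_N of e^x (closed form)."""
    return [factorial(2 * N - i) * factorial(N)
            / (factorial(2 * N) * factorial(i) * factorial(N - i))
            for i in range(N + 1)]


def is_strictly_hurwitz(coeffs_low_to_high):
    """Positive coefficients and all roots strictly in the left half-plane."""
    c = np.asarray(coeffs_low_to_high, dtype=float)
    if np.any(c <= 0):
        return False
    roots = np.roots(c[::-1])          # np.roots wants the highest power first
    return bool(np.all(roots.real < 0))


for N in range(1, 9):
    print(N, is_strictly_hurwitz(pade_exp_coeffs(N)))
```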

#### **Example 1: Coupled Lossy TL**



#### **Input: step response, rise time = 0.2 ns**

#### **Example 1: Far End Active Line**




### **Performance Comparison**

| Simulation        | MRA (MNA size) | Lumped (MNA size) | MNA savings |
|-------------------|----------------|-------------------|-------------|
| Example 1         | 8,281          | 48,000            | 83%         |
| Example 2         | 355            | 2,482             | 86%         |
| Example 3 (5 cm)  | 914            | 6,002             | 85%         |
| Example 3 (20 cm) | 3,650          | 24,002            | 85%         |
| Example 3 (40 cm) | 7,298          | 80,002            | 91%         |

### **Example 3: Nonlinear Terminations**




#### **Transient Responses**




#### **CPU Comparison**

| Algorithm              | Total number<br>of lumped<br>sections | CPU time<br>(SPARC Ultra 5-10)<br>(seconds) |
|------------------------|---------------------------------------|---------------------------------------------|
| Conventional<br>Lumped | 300                                   | 3282                                        |
| Proposed               | 31                                    | 315                                         |

### How About Long Low-Loss Lines???



Why?

#### Without Delay Extraction



### Macromodel: Objectives

1. Closed Form (large number of coupled lines)
2. Guaranteed Passivity
3. Delay Extraction




# **Delay Extraction - DEPACT**

- Extracts the pure-delay components of the equivalent circuit matrix
- Pure-delay components can be simulated with the existing transmission-line (T) element of circuit simulators
- Highly compact/passive macromodels
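Why extracting the pure delay helps can be illustrated on a single line: $e^{-\gamma d}$ rotates rapidly in phase, while the remainder left after removing $e^{-s\tau}$ varies slowly and is therefore far easier to approximate with a low-order rational function. This sketch is only an illustration of that idea, not the DEPACT algorithm (which operates on the coupled-line matrix exponential); the line parameters and helper names are hypothetical.

```python
import numpy as np


def split_delay(R, L, G, C, d, f):
    """Split exp(-gamma d) of a single line into exp(-s tau) and a remainder.

    R, L, G, C are scalar per-unit-length values; f is an array of frequencies (Hz).
    """
    s = 2j * np.pi * f
    gamma = np.sqrt((R + s * L) * (G + s * C))   # principal branch, Re(gamma) >= 0
    tau = d * np.sqrt(L * C)                     # lossless time of flight
    full = np.exp(-gamma * d)                    # what a delay-free fit must capture
    remainder = np.exp(-(gamma * d - s * tau))   # left over after extracting exp(-s tau)
    return full, remainder, tau


def phase_span(h):
    """Total excursion of the unwrapped phase of a sampled response (rad)."""
    ph = np.unwrap(np.angle(h))
    return ph.max() - ph.min()


f = np.linspace(1e6, 10e9, 4000)
full, rem, tau = split_delay(R=10.0, L=250e-9, G=0.0, C=100e-12, d=0.1, f=f)
print(f"tau = {tau * 1e12:.0f} ps")
print(f"phase span: full {phase_span(full):.1f} rad, remainder {phase_span(rem):.4f} rad")
```

Both responses have identical magnitude; only the rapidly rotating phase is moved into the pure delay, which an existing delay element can represent exactly.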

#### **Error Estimates**



### **Error Estimates**



### Example: Lossy Coupled TL

#### 5cm, 20cm, 40cm



#### Input: step response, rise time = 0.035 ns

### IBM: Line 6 (40cm)




### IBM: Line 6

| Simulation | DEPACT time (s) | MRA time (s) | Lumped time (s) |
|------------|-----------------|--------------|-----------------|
| 5 cm       | 0.89            | 4.16         | 32.4            |
| 20 cm      | 2.47            | 25           | 292             |
| 40 cm      | 4.32            | 74           | 4641            |

#### Computer: SUN Blade-1000 workstation with 900MHz UltraSPARC-III CPU.

### Problems Addressed So Far

- Limited Bandwidth of Approximation
- Individual numerical fitting of parameters of a matrix function
- Loss of physical properties such as passivity
- Delay Extraction + Passivity

## How about large coupled lines?




#### Large # of Coupled Lines: Traditional approaches



### Large # of Coupled Lines



The average cost of simulating a circuit with $N$ coupled lines is proportional to $N^{\alpha}$, where $3 \le \alpha \le 4$.

# In the literature.....





#### Waveform Relaxation (WR) + Transverse Partitioning (TP) of Unexcited Lines: WR Sources



# **Computation of WR Source**

#### **Homogeneous Telegrapher's Equation**

$$\frac{\partial}{\partial z} \begin{bmatrix} \mathbf{V}(z,t) \\ \mathbf{I}(z,t) \end{bmatrix} = \begin{bmatrix} \mathbf{0} & -\mathbf{R} \\ -\mathbf{G} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{V}(z,t) \\ \mathbf{I}(z,t) \end{bmatrix} + \begin{bmatrix} \mathbf{0} & -\mathbf{L} \\ -\mathbf{C} & \mathbf{0} \end{bmatrix} \frac{\partial}{\partial t} \begin{bmatrix} \mathbf{V}(z,t) \\ \mathbf{I}(z,t) \end{bmatrix}$$
For the $k$th line:

$$\frac{\partial}{\partial z} v_k(z,t) = -R_{kk}\, i_k(z,t) - L_{kk} \frac{\partial}{\partial t} i_k(z,t) - e_k(z,t)$$
$$\frac{\partial}{\partial z} i_k(z,t) = -G_{kk}\, v_k(z,t) - C_{kk} \frac{\partial}{\partial t} v_k(z,t) - q_k(z,t)$$

with the coupling (WR) sources

$$e_k(z,t) = \sum_{\substack{j=1\\ j\neq k}}^{n} \left( R_{kj}\, i_j(z,t) + L_{kj} \frac{\partial}{\partial t} i_j(z,t) \right), \qquad q_k(z,t) = \sum_{\substack{j=1\\ j\neq k}}^{n} \left( G_{kj}\, v_j(z,t) + C_{kj} \frac{\partial}{\partial t} v_j(z,t) \right)$$

which are evaluated from the other lines' waveforms of the previous iteration.
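The per-line form lends itself to waveform relaxation: each line is solved on its own over the whole time window, with the coupling sources computed from the previous iteration. The sketch below shows that iteration on a small coupled linear ODE system standing in for discretized lines (backward Euler, Gauss-Jacobi updates); it is a generic WR illustration with hypothetical names and values, not the WR+TP transmission-line implementation.

```python
import numpy as np


def backward_euler(A, b, h, n_steps):
    """Reference: monolithic backward Euler for dx/dt = A x + b (constant b), x(0) = 0."""
    n = A.shape[0]
    x = np.zeros((n_steps + 1, n))
    M = np.eye(n) - h * A
    for k in range(n_steps):
        x[k + 1] = np.linalg.solve(M, x[k] + h * b)
    return x


def waveform_relaxation(A, b, h, n_steps, n_iter):
    """Gauss-Jacobi waveform relaxation over the whole time window.

    Each state is integrated on its own; couplings to the other states are
    treated as known sources taken from the previous iteration's waveforms.
    """
    n = A.shape[0]
    self_terms = np.diag(A)
    coupling = A - np.diag(self_terms)
    x = np.zeros((n_steps + 1, n))             # initial guess: all waveforms zero
    for _ in range(n_iter):
        src = x @ coupling.T                   # "WR sources" from the previous iterate
        new = np.zeros_like(x)
        for k in range(n_steps):
            new[k + 1] = (new[k] + h * (src[k + 1] + b)) / (1 - h * self_terms)
        x = new
    return x


A = np.array([[-2.0, 0.5], [0.5, -2.0]])       # two weakly coupled states (hypothetical)
b = np.array([1.0, 0.0])                       # only the first one is excited
ref = backward_euler(A, b, h=0.01, n_steps=500)
wr = waveform_relaxation(A, b, h=0.01, n_steps=500, n_iter=20)
print(np.max(np.abs(wr - ref)))
```

Each inner update touches only one state at a time, so the per-line solves are independent and cheap; the price is the outer iteration, whose convergence rate depends on how strong the coupling is relative to the self terms.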

### TRANSVERSE PARTITIONING + WR



# **Electromagnetic Interference**



### **EMI Analysis & High-Speed Interconnects**



#### **Conventional EMI Analysis: Large # of Coupled Lines**



# Simulation Time → $O((n^2)^2) = O(n^4)$ !!!

# WR+TP + EMI

#### **Description of EMI:**


$$\frac{\partial}{\partial z} \begin{bmatrix} \mathbf{V}(z,s) \\ \mathbf{I}(z,s) \end{bmatrix} = \begin{bmatrix} \mathbf{0} & -(\mathbf{R}+s\mathbf{L}) \\ -(\mathbf{G}+s\mathbf{C}) & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{V}(z,s) \\ \mathbf{I}(z,s) \end{bmatrix} + \begin{bmatrix} \mathbf{V}_F(z,s) \\ \mathbf{I}_F(z,s) \end{bmatrix}$$

### mutually compatible representation POSSIBLE!

$$\frac{\partial}{\partial z}v_{k}(z,t) = -R_{kk}i_{k}(z,t) - L_{kk}\frac{\partial}{\partial t}i_{k}(z,t) - \tilde{e}_{k}(z,t)$$
$$\frac{\partial}{\partial z}i_{k}(z,t) = -\hat{G}_{kk}v_{k}(z,t) - \hat{C}_{kk}\frac{\partial}{\partial t}v_{k}(z,t) - \tilde{q}_{k}(z,t)$$
## **Combined representation for each iteration**



© R. Achar, "VLSI Interconnects and Signal Integrity", IIT-Kharagpur, Dec. 2011

## Example: WR+TP + EMI

Example 2: 3 Lines, Active Terminations, Trapezoidal Incidence



# Example: WR+TP + EMI



# Example: WR+TP + EMI




# **Example: Complexity Analysis**







## Why Measured/Simulated Data?

At high frequencies, many complex electrical devices may have no analytical models.

#### Examples: 3D transmission lines, vias, packages, non-uniform transmission lines, on-chip passive devices

Such devices are characterized by tabulated data in terms of multiport terminal parameters, which are then used to identify a system for circuit simulation:

- Impedance parameters or Z-parameters
- Admittance parameters or Y-parameters
- Hybrid parameters or H-parameters
- Scattering parameters or S-parameters
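For a common real reference impedance $z_0$, these representations are related by standard matrix identities, e.g. $S = (Z - z_0 I)(Z + z_0 I)^{-1}$ and $Y = Z^{-1}$. A sketch of the conversions, assuming a 50 Ω reference and a hypothetical 2-port Z-matrix:

```python
import numpy as np


def z_to_s(Z, z0=50.0):
    """S = (Z - z0 I)(Z + z0 I)^-1 for a common real reference impedance z0."""
    I = np.eye(Z.shape[0])
    return (Z - z0 * I) @ np.linalg.inv(Z + z0 * I)


def s_to_z(S, z0=50.0):
    """Inverse mapping Z = z0 (I + S)(I - S)^-1."""
    I = np.eye(S.shape[0])
    return z0 * (I + S) @ np.linalg.inv(I - S)


def y_to_s(Y, z0=50.0):
    """Y- to S-parameters through Z = Y^-1 (Y must be non-singular)."""
    return z_to_s(np.linalg.inv(Y), z0)


Z = np.array([[60.0, 10.0], [10.0, 40.0]])   # hypothetical 2-port Z-matrix (ohms)
S = z_to_s(Z)
print(S)
print(np.allclose(s_to_z(S), Z))
```

Explicit inverses keep the sketch short; for large port counts, solving linear systems instead of forming inverses is usually preferable numerically.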

### **System identification via Direct Curve Fitting**




#### Ill-conditioned

Cannot achieve higher-order approximations
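One way to see the ill-conditioning: fitting numerator and denominator polynomials directly produces least-squares matrices whose columns are powers of $s$, and their condition number explodes as the order grows. A sketch assuming a normalized frequency band and a monomial basis (`monomial_basis_cond` is a hypothetical helper):

```python
import numpy as np


def monomial_basis_cond(order, n_freq=400):
    """Condition number of [1, s, ..., s^order] sampled at s = j*w, w in [1e-4, 1].

    The frequency axis is already normalised; unnormalised Hz values are far worse.
    """
    s = 1j * np.linspace(1e-4, 1.0, n_freq)
    V = np.vander(s, order + 1, increasing=True)
    return np.linalg.cond(V)


for order in (4, 8, 12, 16):
    print(order, f"{monomial_basis_cond(order):.2e}")
```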

# Solution → Vector Fitting Algorithm

*Original paper:* B. Gustavsen and A. Semlyen, "Rational Approximation of Frequency Domain Responses by Vector Fitting," *IEEE Transactions on Power Delivery*, vol. 14, no. 3, pp. 1052-1061, July 1999.

$$f(s) = \sum_{n=1}^{N} \frac{\tilde{r}_n}{s - p_n} + d + se$$

Continuously refined and evolved over the last 20+ years.

#### VF – an Iterative Algorithm

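Assume an initial set of poles $\bar{p}_n$ and a scaling function $\sigma(s)$ such that

$$\sigma(s) = \sum_{n=1}^{N} \frac{\tilde{c}_{n}}{s - \bar{p}_n} + 1, \qquad \sigma(s) f(s) \approx \sum_{n=1}^{N} \frac{c_{n}}{s - \bar{p}_n} + d + se$$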



#### **Step1:Computation of Poles**

Therefore, for *k* frequency points *AX*=*b* where,



### **Computation of zeros of** $\sigma(s)$ : **Real Case**

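With $X = [\tilde{c}_1 \dots \tilde{c}_N \; d \; e \; c_1 \dots c_N]^T$ obtained by solving $AX = b$, refined poles for the next iteration are obtained as the zeros of $\sigma(s)$.

The zeros $\bar{z}_n$ of $\sigma(s)$ are the eigenvalues of the matrix

$$H = A - B C^{T}$$

where, in this expression, $A$ is the diagonal matrix of the current poles $\bar{p}_n$, $B$ is a column vector of ones, and $C^{T}$ is the row vector of the residues of $\sigma(s)$.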
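The pole-relocation loop can be sketched for a single scalar response. This is a deliberately reduced version, assuming real poles only, no $se$ term, and no relaxation or weighting; it is not the reference implementation by Gustavsen and Semlyen, and all names and test data are hypothetical.

```python
import numpy as np


def vector_fit_real(s, f, poles, n_iter=5):
    """Minimal scalar vector fitting sketch with real poles only and no s*e term.

    Fits f(s) ~ sum_n c_n / (s - p_n) + d from samples f at complex frequencies s.
    """
    poles = np.asarray(poles, dtype=float)
    N = len(poles)

    def solve_real(A, b):
        # Stack real and imaginary parts so that all unknowns stay real
        x, *_ = np.linalg.lstsq(np.vstack([A.real, A.imag]),
                                np.concatenate([b.real, b.imag]), rcond=None)
        return x

    for _ in range(n_iter):
        Phi = 1.0 / (s[:, None] - poles[None, :])      # partial-fraction basis
        # sigma(s) f(s) = Phi c + d  and  sigma(s) = Phi c_tilde + 1
        A = np.hstack([Phi, np.ones((len(s), 1)), -f[:, None] * Phi])
        c_tilde = solve_real(A, f)[N + 1:]
        # Zeros of sigma(s): eigenvalues of H = diag(poles) - ones * c_tilde^T
        H = np.diag(poles) - np.outer(np.ones(N), c_tilde)
        poles = -np.abs(np.linalg.eigvals(H).real)     # real-pole sketch: flip unstable poles
    # Residue identification with the relocated poles
    Phi = 1.0 / (s[:, None] - poles[None, :])
    x = solve_real(np.hstack([Phi, np.ones((len(s), 1))]), f)
    return poles, x[:N], x[N]


# Hypothetical test data with known poles -1, -10, -100
s = 1j * np.logspace(-1, 3, 200)
f = 1 / (s + 1) + 5 / (s + 10) + 50 / (s + 100) + 0.5
poles, res, d = vector_fit_real(s, f, poles=[-0.5, -20.0, -500.0])
fit = (res / (s[:, None] - poles)).sum(axis=1) + d
print(np.sort(poles), np.max(np.abs(fit - f)))
```

Because the test data are exactly rational with three real poles, the first relocation already essentially lands on $-1$, $-10$ and $-100$; measured data generally need several iterations, complex-conjugate pole pairs and the $se$ term.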



#### **Computation of Poles: Multiport Case**

$$Y(s) = \begin{bmatrix} Y_{11} & Y_{12} & \dots & Y_{1P} \\ Y_{21} & Y_{22} & \dots & Y_{2P} \\ \dots & \dots & \dots & \dots \\ Y_{P1} & Y_{P2} & \dots & Y_{PP} \end{bmatrix};$$

$$F_{p1} = \begin{bmatrix} \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{1,1}}{s - \bar{p}_{n}} + d^{1,1} & \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{1,2}}{s - \bar{p}_{n}} + d^{1,2} & \dots & \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{1,P}}{s - \bar{p}_{n}} + d^{1,P} \\ \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{2,1}}{s - \bar{p}_{n}} + d^{2,1} & \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{2,2}}{s - \bar{p}_{n}} + d^{2,2} & \dots & \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{2,P}}{s - \bar{p}_{n}} + d^{2,P} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{P,1}}{s - \bar{p}_{n}} + d^{P,1} & \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{P,2}}{s - \bar{p}_{n}} + d^{P,2} & \dots & \sum_{n=1}^{N} \frac{\tilde{c}_{n}^{P,P}}{s - \bar{p}_{n}} + d^{P,P} \end{bmatrix}$$

→ A. Chinea and S. Grivet-Talocia, "On the Parallelization of Vector Fitting Algorithms", IEEE Trans. on Components, Packaging and Manufacturing Technology, vol. 1, no. 11, November 2011.

#### **Review of Parallel VF for Multi-CPU Environment**



# **Splitting Strategies**

| None Splitting                                                                                | All Splitting                                                                                                                                                               |
|-----------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Uses only one scaling function for the entire S-matrix                                        | Uses an individual scaling function for each S-element                                                                                                                      |
| Leads to a common pole set                                                                    | Leads to individual pole sets for each S-element                                                                                                                            |
| CPU 1: $A_1 = [Q_1, R_1]$<br>CPU 2: $A_2 = [Q_2, R_2]$<br>⋮<br>CPU T: $A_T = [Q_T, R_T]$      | CPU 1: $A_1 = [Q_1, R_1] \rightarrow \hat{c}_1$<br>CPU 2: $A_2 = [Q_2, R_2] \rightarrow \hat{c}_2$<br>⋮<br>CPU T: $A_T = [Q_T, R_T] \rightarrow \hat{c}_T$                 |
| Compute $\hat{c}$ in a single CPU                                                             |                                                                                                                                                                             |

## **GVF: GPU Based Vector Fitting**

- Emerging Computing Platform of GPUs
- Thousands of Cores
- Exploit massive parallelization potential

S. Ganeshan, N. Kumar, R. Achar, and W. Lee, "GVF: GPU based Vector Fitting for Modelling of Multiport Tabulated Data Networks," IEEE Trans. on Components, Packaging and Manufacturing Technology, pp. 1375-1387, Aug. 2020.

# GVF: GPU-Based Parallel Vector Fitting (None Splitting Strategy)

1. **Bulk transfer from CPU to GPU:** $S_l$, $g$, $\alpha_l$
2. **Formulate (on CPU) and transfer to GPU:** $\widehat{\Phi}$
3. **Formulate (on GPU):** $A_1 \cdots A_{P^2}$
4. **Perform QR (MAGMA):** $R_1 \cdots R_{P^2}$
5. **Extract** $R_{l}^{22}$ (size $N \times N$): $R_1^{22} \cdots R_{P^2}^{22}$
6. **Collect** $R_{l}^{22}$ from each $A_l$ to evaluate the residues of the scaling function
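The QR-based compression in these steps can be mimicked on a CPU: each element's least-squares matrix is factored independently, only the $R^{22}$ block (the rows tied to the shared scaling-function residues) is kept, and the stacked blocks are solved once. This sketch uses `numpy.linalg.qr` with a thread pool in place of the GPU/MAGMA kernels, assumes real poles and no $se$ term, and uses hypothetical names and data.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def reduced_block(Phi, f_l):
    """QR-compress one element's VF system, keeping only the rows that involve sigma.

    Unknowns per element: [c_l (N), d_l (1), c_tilde (N)]; c_tilde is shared.
    """
    K, N = Phi.shape
    A = np.hstack([Phi, np.ones((K, 1)), -f_l[:, None] * Phi])
    A_r = np.vstack([A.real, A.imag])
    b_r = np.concatenate([f_l.real, f_l.imag])
    Q, R = np.linalg.qr(A_r)                   # reduced QR of this element's matrix
    m = N + 1                                  # columns owned by c_l and d_l
    return R[m:, m:], (Q.T @ b_r)[m:]          # R22 block and matching right-hand side


def common_sigma_residues(Phi, F, workers=4):
    """Process the elements independently, then solve the stacked R22 system."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(lambda f_l: reduced_block(Phi, f_l), F))
    R22 = np.vstack([blk[0] for blk in blocks])
    rhs = np.concatenate([blk[1] for blk in blocks])
    c_tilde, *_ = np.linalg.lstsq(R22, rhs, rcond=None)
    return c_tilde


# Hypothetical 3x3-port data (9 elements) sharing the poles -1, -10, -100
rng = np.random.default_rng(0)
s = 1j * np.logspace(-1, 3, 200)
true_poles = np.array([-1.0, -10.0, -100.0])
F = [(rng.uniform(1, 5, 3) / (s[:, None] - true_poles)).sum(axis=1) + rng.uniform()
     for _ in range(9)]
start = np.array([-0.5, -20.0, -500.0])
Phi = 1.0 / (s[:, None] - start)
c_tilde = common_sigma_residues(Phi, F)
# Relocated common poles: zeros of sigma(s)
new_poles = np.sort(np.linalg.eigvals(np.diag(start) - np.outer(np.ones(3), c_tilde)).real)
print(new_poles)
```

Because each element's residues $c_l$, $d_l$ appear only in that element's block, discarding the upper rows of each QR factor loses nothing for the shared unknowns, which is what makes the per-element factorizations embarrassingly parallel.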



Example 1 – 60 ports, 70 poles, 1000 Frequency Samples

### Size of $\widehat{\Phi}$ = 1000 × 70; each $A_l$ ($P^2 = 3600$ of them) → 2000 × 143

| Cost of individual steps in GVF (using MAGMA library)      | Time (ms), Approach (a) | Time (ms), Approach (b) |
|------------------------------------------------------------|-------------------------|-------------------------|
| Transfer of $S, g, \alpha$ from CPU to GPU                 | 20.19                   | 20.19                   |
| Formulation of $\widehat{\Phi}$ in CPU and transfer to GPU | 1.97                    | 1.97                    |
| Formulation of $A$ in GPU (customized)                     | 0.04                    | 0.04                    |
| QR factorization in GPU (MAGMA)                            | 543.09                  | 543.09                  |
| $R_l^{22}$ extraction in GPU (customized)                  | 0.02                    | 0.02                    |
| Transfer back $R_l^{22}$ from GPU to CPU                   | 56.00                   | -                       |
| Residue $\hat{c}$ calculation in CPU (LAPACK)              | 460.02                  | -                       |
| Residue $\hat{c}$ calculation in GPU (MAGMA)               | -                       | 562.50                  |
| Transfer back $\hat{c}$ from GPU to CPU                    | -                       | 0.03                    |
| Zeros computation in CPU                                   | 3.60                    | 3.60                    |
| **Total time taken**                                       | 1084.9                  | 1131.44                 |

Example 1 – 60 ports, 70 poles, 1000 Frequency Samples: None Splitting Strategy



### **GVF: All Splitting Strategy**



#### Example 1 – 60 ports, 70 poles, 1000 Frequency Samples: All Splitting Strategy



#### **GVF: GPU based Vector fitting algorithm** Example case: 120 ports, 80 poles, 1000 Frequency Samples All Splitting Strategy

| Example                                                                                                              | # CPU cores used | PVF [34] (multi-CPU) time (s) | Proposed GVF time using MAGMA library (s) | Speedup |
|----------------------------------------------------------------------------------------------------------------------|------------------|-------------------------------|-------------------------------------------|---------|
| #ports = 120<br>#poles = 80<br>#fpoints = 1000<br>size of $\widehat{\Phi}$ = 1000 × 80<br>total # of $\Phi_l$ = 14400 | 1                | 193.90                        | 42.86                                     | 4.52    |
|                                                                                                                      | 2                | 95.45                         | 21.21                                     | 4.50    |
|                                                                                                                      | 4                | 48.40                         | 11.10                                     | 4.36    |
|                                                                                                                      | 6                | 36.80                         | 8.72                                      | 4.22    |
|                                                                                                                      | 8                | 25.03                         | 6.03                                      | 4.15    |
|                                                                                                                      | 12               | 17.71                         | 4.43                                      | 4.00    |
|                                                                                                                      | 16               | 14.42                         | 3.81                                      | 3.78    |
|                                                                                                                      | 20               | 12.23                         | 3.69                                      | 3.31    |

#### Example 2 – 120 ports, 80 poles, 1000 Frequency Samples: All Splitting Strategy



# Conclusions

Advanced Interconnect Modeling and Simulation methods still emerging:

- to meet the fundamental challenges such as passivity
- to be more efficient/accurate
- to meet the high-speed designer's dream of a seamless analysis platform for mixed circuit/EM/RF design.


