# THE MICROTCA FAST CONTROL BOARD FOR GENERIC CONTROL AND DATA ACQUISITION APPLICATIONS

Jie Zhang, Cong He (IHEP)

The 2nd MTCA/ATCA Workshop for Research and Industry Aug. 24th, 2021

### WHAT ABOUT FPGAS FOR HIGH ENERGY PHYSICS (HEP) & HIGH ENERGY PHOTON SOURCE (HEPS)?

Workloads for FPGA:

- Front-end Electronics Control
  - Fast Control
  - Slow Control
- Clock Synchronization
- Monitoring
- Data Acquisition
  - Signal processing, filtering

#### Trends:

- Machine Learning (Deep Learning)
- Data Analytics

Source: Deeper, Faster Learning with FPGA Co-Processors



### MOTIVATION

- The data volume of HEP/HEPS experiments needs to be reduced: from PB/year to ?
- Finding new physics requires a massive increase in processing power, much more flexible software algorithms and much faster interconnects



# MICROTCA.4 GROUPING

- Divided into three groups, in each group:
  - 3 AMCs with RTM
    - FPGA boards
  - 1 CPU board
    - 4-core Xeon Processor E3-1505M (3GHz)
      - ConCurrent Technologies AM G64/472-51


Vadatech AMC725



# AMC AND RTM BOARDS

#### u4FC&P hardware

#### AMC board

- FPGA (Kintex Ultrascale+)
- DDR4 SODIMM connector x2
- AMC backplane & RTM
  - Internal communication
- FPGA Mezzanine Card (FMC) connector x2
  - ADC cards
  - FPGA cards
  - SFP+/QSFP cards
- Rear Transition Module (RTM) board
  - Fast data storage
  - FPGA (Kintex-7)
  - External communication through FMCs



AMC



# BACKPLANE TOPOLOGY

#### N.A.T NATIVE-R9-WR Crate

- Port 0: 1GbE

- Port 1: Redundant 1GbE
- Port 2~3: Internal links
- Port 4~7: PCIe x4
- Port 8~11: Redundant PCIe x4
- Port 12~15: Internal links
- Port 17~20: Triggers, Clocks or Interlocks
- TCLKA, TCLKB: System clocks
- TCLKC, TCLKD: Redundant system clocks
- TCLKD: PCIe reference clock
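
The port assignment above can be captured as a small lookup table, which is handy when checking firmware pin-outs against the crate. This is a minimal sketch: the names `BACKPLANE_PORTS` and `port_role` are hypothetical, and port 16 is reported as unassigned only because it does not appear in the list.

```python
# Port ranges transcribed from the backplane list above.
# Port 16 is not mentioned there, so it falls through to "unassigned".
BACKPLANE_PORTS = [
    (range(0, 1), "1GbE"),
    (range(1, 2), "Redundant 1GbE"),
    (range(2, 4), "Internal links"),
    (range(4, 8), "PCIe x4"),
    (range(8, 12), "Redundant PCIe x4"),
    (range(12, 16), "Internal links"),
    (range(17, 21), "Triggers, Clocks or Interlocks"),
]


def port_role(port: int) -> str:
    """Return the function assigned to an AMC backplane port number."""
    for ports, role in BACKPLANE_PORTS:
        if port in ports:
            return role
    return "unassigned"


if __name__ == "__main__":
    for p in (0, 5, 16, 20):
        print(p, port_role(p))
```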



#### MICROTCA.4 CRATE WITH MCH, CPU AND FPGA BOARDS





**Back View** 

#### **Front View**

# UTCA FPGA COMPUTING SPECIFICATION



#### uFC v2

- Xilinx Kintex-7 28nm 7K325T
  - 0.32 Million System Logic Cells
  - 840 DSP
- PCIe2.0 x4
- 8GB DDR3 800MHz SDRAM ECC
- 8\*10G High-Speed Serial Links



#### u4FCV

- Xilinx Virtex-7 28nm 7VX690T
  - 0.69 Million System Logic Cells
  - 3600 DSP
- 3\*PCIe3.0 x4
- 2\*8GB DDR3 800MHz SDRAM ECC
- 16\*10G High-Speed Serial Links



#### u4FC&P v1

- Kintex Ultrascale+ 16nm KU11P
  - 0.65 Million System Logic Cells
  - 2928 DSP
- 4\*PCIe4.0 x4 + PCIe4.0 x8
- 16GB DDR4 1200MHz SDRAM ECC
- 2\*100G High-Speed Serial Links

| Name      | Status  | FPGA    | Memory   | NVMe  | PCIe BW | Network   |
|-----------|---------|---------|----------|-------|---------|-----------|
| uFC v2    | Ready   | 7K325T  | 8GB      | -     | 2 GB/s  | 10GbE     |
| u4FCV     | Planned | 7VX690T | 8GB * 2  | -     | 4 GB/s  | 40/100GbE |
| u4FC&P v1 | Ready   | KU11P   | 16GB * 2 | 4*1TB | 8 GB/s  | 40/100GbE |
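
The PCIe BW column can be cross-checked from the link parameters. The sketch below computes the raw per-direction bandwidth of a x4 link from the standard signalling rate and line code of each PCIe generation. It ignores packet (TLP) overhead, so real throughput is somewhat lower. The function name `pcie_bandwidth_gbytes` is hypothetical.

```python
# Per-lane signalling rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_GEN = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, before protocol overhead."""
    rate_gt, efficiency = PCIE_GEN[gen]
    return rate_gt * efficiency * lanes / 8  # bits -> bytes


if __name__ == "__main__":
    for gen in (2, 3, 4):
        print(f"PCIe{gen}.0 x4: {pcie_bandwidth_gbytes(gen, 4):.2f} GB/s")
```

The results (about 2, 4 and 8 GB/s for generations 2, 3 and 4) match the rounded values in the table.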

# **APPLICATIONS - PHYSICS EXPERIMENTS**

- Taishan Anti-neutrino Observatory (TAO), a satellite experiment of JUNO (Jiangmen Underground Neutrino Observatory)
  - Taishan Nuclear Power Plant, 30–35 m from one of the 4.6 GW reactor cores
- Measure reactor neutrino spectrum
  - Ton scale Gd-doped Liquid Scintillator (Gd-LS)
  - Full coverage of SiPM (Silicon photomultiplier)
    - Operate at -50 °C (reduce SiPM dark noise)
  - Water tanks and plastic scintillator for muon veto and shielding
- Under construction
  - Online at the end of 2022
- CDR was released in 2020 (arXiv:2005.08745)



Gd-LS → Acrylic vessel → SiPM/support → Cryogenic vessel (SS + insulation) → 1.2 m water or HDPE shielding → Muon veto

# ELECTRONIC READOUT FOR TAO

- Total 8028 channels
- ADC is on FEC, used to digitize analog signals from FEB
- FPGA & Power boards in MicroTCA.4 crate
  - Q/T information is extracted with FPGA (waveform analysis)
  - Trigger & DAQ
  - White Rabbit (WR) for system clock synchronization
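
To illustrate what Q/T extraction from a waveform involves, here is a minimal software sketch. It is not the TAO firmware algorithm: the baseline window, the threshold, the sampling period `dt_ns` and the function name `extract_qt` are all assumptions chosen for illustration. Charge is taken as the baseline-subtracted area, and time as the interpolated threshold crossing on the leading edge.

```python
def extract_qt(samples, n_baseline=4, threshold=10.0, dt_ns=1.0):
    """Return (charge, time_ns) for one digitized pulse.

    charge is the baseline-subtracted area in ADC counts * ns;
    time_ns is the interpolated leading-edge threshold crossing.
    Returns (0.0, None) when the pulse never crosses the threshold.
    """
    # Estimate the baseline from the first few (pre-pulse) samples.
    baseline = sum(samples[:n_baseline]) / n_baseline
    pulse = [s - baseline for s in samples]

    # T: first upward threshold crossing, linearly interpolated.
    for i in range(1, len(pulse)):
        if pulse[i - 1] < threshold <= pulse[i]:
            frac = (threshold - pulse[i - 1]) / (pulse[i] - pulse[i - 1])
            time_ns = (i - 1 + frac) * dt_ns
            break
    else:
        return 0.0, None

    # Q: integral of the baseline-subtracted waveform.
    charge = sum(pulse) * dt_ns
    return charge, time_ns


if __name__ == "__main__":
    waveform = [100, 100, 100, 100, 105, 120, 140, 120, 105, 100]
    print(extract_qt(waveform))
```

In an FPGA the same steps map naturally onto a streaming pipeline (running baseline, comparator, accumulator), which is why this kind of reduction is done before the data leave the crate.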




### **APPLICATIONS - SYNCHROTRON SOURCE EXPERIMENTS**



High Energy Photon Source (HEPS)

- Under construction at Huairou District, Beijing
  - User operation to start in 2026
- Key parameters

| Parameters         | Nominal                                    |
|--------------------|--------------------------------------------|
| Beam energy        | 6.0 GeV                                    |
| Emittance          | Better than 0.06 nm×rad                    |
| Brightness         | Higher than 1×10²² phs/s/mm²/mrad²/0.1%BW  |
| Spatial resolution | 10 nm                                      |
| Energy resolution  | 1 meV                                      |
| Photon energy      | Up to 300 keV                              |

- More than 90 beamlines and end-stations
- Ref: <u>http://english.ihep.cas.cn/heps/index.html</u>



Shanghai HIgh repetitioN rate xfel and Extreme light facility (SHINE)

- Under construction at Zhangjiang, Shanghai
  - User operation to start in 2026
- Key parameters

| Parameters    | Nominal      |
|---------------|--------------|
| Beam energy   | 8.0 GeV      |
| Bunch charge  | 100 pC       |
| Max rep-rate  | 1 MHz        |
| Beam power    | 0.8 MW       |
| Photon energy | 0.4 – 25 keV |
| Pulse length  | 20 – 50 fs   |

- 3 beamlines and 10 end-stations
- Ref: <u>https://indico.desy.de/event/21806/</u>

### ELECTRONIC READOUT FOR X-RAY DETECTION






### SUMMARY

- MicroTCA architecture
  - Suitable for small and medium-sized experiments
- uFC series boards
  - FPGA-based MicroTCA compatible AMC board
    - For generic system control and data acquisition in HEP/HEPS experiments
  - HPC FMC sockets
    - Provide additional clock signals, user-specific I/O and high-speed transceivers that can be used to extend the connectivity as well as the I/O bandwidth
  - Successfully demonstrated the feasibility of the uFC in HEP/HEPS experiments
- Outlook
  - High-level tools for software development productivity
    - Vivado HLS, OpenCL, etc.
    - Applications in HEP/HEPS
      - Requires cooperation with the PHY/SIM/DAQ/Online-tracking groups
  - Long-term experience with respect to reliability and availability



## NOUNS

- **IP**, Intellectual Property: in electronic design, a semiconductor intellectual property core, IP core, or IP block is a reusable unit of logic, cell, or integrated circuit (commonly called a "chip") layout design that is the intellectual property of one party.
- **DMA**, Direct Memory Access: a feature of computer systems that allows certain hardware subsystems to access main system memory (random-access memory) independently of the central processing unit (CPU).
- **XDMA**, DMA from Xilinx
- **RTL**, Register-Transfer Level: in digital circuit design, a design abstraction which models a synchronous digital circuit in terms of the flow of digital signals (data) between hardware registers, and the logical operations performed on those signals.
- **NVMe SSD**, Non-Volatile Memory Express Solid State Drive
- **Iperf**, a network testing utility for determining network performance.

# FPGA SHELL OPTIONS



#### Xilinx SDAccel Based Shell



- Scenario: Rapid development, block computing
  - User logic: OpenCL C, HLS C and RTL supported
  - Suited for quick evaluation/porting of existing customer code
- Shell feature:
  - Xilinx scatter-gather XDMA optimized for big block data transfer
  - Serial message notification
  - Offload acceleration

# KEY BENEFITS TO FPGA COMPUTING

- Balances programmability and high performance for key workloads
- Uses FPGA technology as a utility, giving faster access to the newest technology



#### COMPARISON BETWEEN DAMC-FMC25, FC7 AND UFC V1

- With the popularization and development of MicroTCA, various FPGA-based AMCs with dual-FMC have been used in HEP experiments (e.g. LHC, E-XFEL, J-PARC)
- DAMC-FMC25 is one such board, developed by DESY and turned into a commercial product by CAENels.
- FC7, built upon the success of an existing hardware development, the Gigabit Link Interface Board (GLIB), is a new-generation AMC for generic DAQ and control applications in CMS.

#### COMPARISON TABLES FOR DAMC-FMC25, FC7 AND UFC

|                   |                             | DAMC-FMC25                                  | FC7                      | uFC                        |
|-------------------|-----------------------------|---------------------------------------------|--------------------------|----------------------------|
| FPGA              |                             | XC5VFX70T/XC5VFX100T<br>2FFG1136            | XC7K420T<br>FFG1156      | XC7K325T<br>FFG900         |
| Memory            |                             | 256MB DDR2                                  | 0.5GB DDR3               | Up to 8GB DDR3 SODIMM      |
| FMC x2            | IO                          | 68                                          | 68                       | 116                        |
|                   | MGT                         | 2/4                                         | 12/8                     | 4/4                        |
| Communication     | SFP+                        | -                                           | -                        | 2                          |
|                   | AMC high-speed connectivity | Port 0, 1, 4~7, 12~15<br>Class D.1. for RTM | Port 0~11<br>Without RTM | Port 0, 1, 4~7<br>Without RTM |
|                   | LEMO/SMA                    | 1                                           | 2                        | 4                          |
| White Rabbit (WR) |                             | -                                           | -                        | Yes                        |

# WHAT IS XTCA?

The dimensions of an xTCA crate depend on:

- Numbers and sizes of slots
- Cooling concept
- Heat dissipation
- Request for redundancy





- Fully integrated into the ATCA IPMI management structure
- Hot Swap capability



#### ATCA Shelf



MTCA Shelf

Source: N.A.T.

# COMMUNICATION EVALUATION

- FPGA connects directly to the NVMe SSD, with a file system
  - Without CPU or external memory.
  - It is the best solution for applications that require huge capacity and ultra-high speed.
  - NVMe-IP: FAT32 or exFAT
    - From Design Gateway Co., Ltd
    - Tested via Xilinx KCU105 evaluation board



| NVMe SSD M.2                                                    | HP EX900      | Samsung 970 EVO | Samsung 970 PRO |
|-----------------------------------------------------------------|---------------|-----------------|-----------------|
| Writing Speed from datasheet                                    | 1300 MB/s     | Up to 2300 MB/s | Up to 2300 MB/s |
| NVMe-IP<br>Tested Average Writing Speed<br>@ Block size: 128 KB | 80~100 MB/s * | ~800 MB/s *     | ~2200 MB/s      |
| Cost(512GB)                                                     | ¥ 469         | ¥ 889           | ¥ 1349          |

\* Very fast at the beginning
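
For comparison with the NVMe-IP numbers, an average write speed at a fixed block size can also be measured from a host. The sketch below is a host-side analogue only. It writes through the operating system's file system rather than through the NVMe-IP core, and the function name `average_write_speed` and the file path are hypothetical.

```python
import os
import tempfile
import time


def average_write_speed(path, total_bytes, block_size=128 * 1024):
    """Write at least total_bytes in fixed-size blocks; return average MB/s."""
    block = bytes(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            written += f.write(block)
        os.fsync(f.fileno())  # make sure the data actually reached the device
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        speed = average_write_speed(os.path.join(tmp, "bench.bin"), 64 * 1024 * 1024)
        print(f"{speed:.0f} MB/s")
```

Because some drives are very fast only at the beginning, the total size should be large enough to run past the initial burst, otherwise the average overstates the sustained rate.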

## **COMMUNICATION EVALUATION**

#### CPU 10 GbE network performance

| Model         | Core                               | DDR      | NIC            |
|---------------|------------------------------------|----------|----------------|
| AMC725        | Intel Xeon E3-1125<br>@ 2.5GHz x8  | 8GB DDR  | Intel 82599ES  |
| AM G64/472-51 | Intel Xeon E3-1505M<br>@ 3.0GHz x4 | 32GB DDR | Intel X710-BM2 |

- Test setup
  - iperf3-3.1.3, 60 seconds
  - CPU board: AMC725 or AM G64/472-51, CentOS 7, Port 0 and Port 1
  - PC: Intel Core i5-3470 @ 3.2GHz x4, 8GB DDR, CentOS 7
  - PC NIC: Intel 82599ES 10-Gigabit SFI/SFP+ Network Connection
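
A 60-second iperf3 run like the one in the test setup can be scripted and its result parsed from iperf3's JSON report. This is a sketch under assumptions: the server address is a placeholder, the helper names are hypothetical, and the peer is assumed to be running `iperf3 -s`. The flags used (`-c`, `-p`, `-t`, `-J`) are standard iperf3 options.

```python
import json


def iperf3_client_cmd(server, seconds=60, port=5201):
    """Build the iperf3 client command line for a TCP test with JSON output."""
    return ["iperf3", "-c", server, "-p", str(port), "-t", str(seconds), "-J"]


def received_gbps(report_json):
    """Extract the receiver-side throughput in Gbit/s from an iperf3 JSON report."""
    report = json.loads(report_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    # "192.168.1.10" is a placeholder address, not taken from the test setup.
    print(" ".join(iperf3_client_cmd("192.168.1.10")))
    sample = '{"end": {"sum_received": {"bits_per_second": 9.43e9}}}'
    print(f"{received_gbps(sample):.2f} Gbps")
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and its standard output fed to `received_gbps`.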


|               | Core                               | DDR      | NIC            | Hard disk                 | Cost    |
|---------------|------------------------------------|----------|----------------|---------------------------|---------|
| AMC725        | Intel Xeon E3-1125<br>@ 2.5GHz x8  | 8GB DDR  | Intel 82599ES  | Dual 2.5 inch<br>SATA SSD | ¥ 32766 |
| AM G64/472-51 | Intel Xeon E3-1505M<br>@ 3.0GHz x4 | 32GB DDR | Intel X710-BM2 | Dual M.2 SSD              | ¥ 42813 |

#### Single Port

|               | Iperf Server | Iperf Client |
|---------------|--------------|--------------|
| AMC725        | 9.43 Gbps    | 9.42 Gbps    |
| AM G64/472-51 | 9.42 Gbps    | 9.41 Gbps    |

#### Dual Ports

|               |       | Iperf Server | Iperf Client |
|---------------|-------|--------------|--------------|
| AMC725        | Port0 | 6.2 Gbps     | 5.70 Gbps    |
|               | Port1 | 6.2 Gbps     | 5.68 Gbps    |
| AM G64/472-51 | Port0 | 5.58 Gbps    | 6.74 Gbps    |
|               | Port1 | 5.49 Gbps    | 6.74 Gbps    |

Due to the limitation of PCIe x4, the two ports share the bandwidth.
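
The sharing effect can be made explicit by summing the two ports and comparing the total with twice the single-port rate. The numbers below are transcribed from the single-port and dual-port results above. The helper names are hypothetical, and the sketch deliberately makes no assumption about the PCIe generation of the link.

```python
# Measured throughput in Gbit/s, transcribed from the iperf results above.
SINGLE_PORT = {
    "AMC725": {"server": 9.43, "client": 9.42},
    "AM G64/472-51": {"server": 9.42, "client": 9.41},
}
DUAL_PORT = {
    "AMC725": {"server": (6.2, 6.2), "client": (5.70, 5.68)},
    "AM G64/472-51": {"server": (5.58, 5.49), "client": (6.74, 6.74)},
}


def aggregate(board, role):
    """Sum of both ports when they run at the same time, in Gbit/s."""
    return round(sum(DUAL_PORT[board][role]), 2)


def scaling(board, role):
    """Dual-port aggregate as a fraction of twice the single-port rate."""
    return aggregate(board, role) / (2 * SINGLE_PORT[board][role])


if __name__ == "__main__":
    for board in DUAL_PORT:
        for role in ("server", "client"):
            print(f"{board} {role}: {aggregate(board, role)} Gbps total, "
                  f"{scaling(board, role):.0%} of 2x single port")
```

In every case the aggregate stays between roughly 11 and 13.5 Gbps, well short of the ~18.8 Gbps that two independent ports would give, which is consistent with a shared upstream link being the bottleneck.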

# APPLICATIONS - X-RAY DETECTOR





Fig. 7. uFC with Quad SFP/SFP+ transceiver FMC and Quad Molex Nano-Pitch I/O™ Interconnect FMC mounted.

#### X-ray image

#### Back-end hardware



Fig. 6. Block diagram of HEPS-BPIX detector. The assembled front-end modules plug and play into the uFC in daisy chain via Molex Nano-Pitch I/O™ Interconnect Cable. The uFC connects to DAQ with four 10G Ethernet cables.

#### FPGA firmware

Jie Zhang, et al. (2018). <u>"The MicroTCA fast control board for generic control and data acquisition applications in HEP experiments"</u>, IEEE Transactions on Nuclear Science (Volume: 66, Issue: 7, July 2019)

# FEASIBILITY

- AMC+RTM boards
  - With different FMC boards





QSFP28 x2

QSFP28 x2