## Readout System

Mengqing Wu, Radboud University Nijmegen/Nikhef

On behalf of the ATLAS TDAQ Collaboration

2024 CEPC International Workshop, 24 October 2024
















## The FELIX readout system

- A generic detector readout concept, proposed by the ATLAS collaboration
  - connecting front-end serial links to commodity network
- Collaboration of different institutes
- Open-source firmware and software
- Applications
  - At LHC: ATLAS, ATLAS Phase-II
  - <u>Beyond LHC</u>: NA62, protoDUNE, sPHENIX@RHIC, CBM@FAIR, LUXE



## The ATLAS TDAQ architecture

- ATLAS is a general-purpose particle detector at the Large Hadron Collider (LHC)
  - LHC collides protons/ions at a 40 MHz rate
- A first-level (L1) trigger, implemented in hardware, selects events at a maximum rate of 100 kHz
- A "high-level" trigger, implemented in software, selects events at a maximum rate of 1-3 kHz
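The rate reduction implied by the two trigger levels can be worked out directly from the numbers above. The following is only a back-of-the-envelope sketch; the variable names are illustrative and it uses the rounded 40 MHz collision rate quoted on the slide.

```python
# Rates quoted on the slide, in Hz.
bunch_crossing_rate = 40e6     # LHC collision rate (rounded)
l1_max_rate = 100e3            # maximum output rate of the hardware L1 trigger
hlt_max_rates = (1e3, 3e3)     # maximum output rate range of the software high-level trigger

# Reduction factor achieved by each stage and by the full chain.
l1_reduction = bunch_crossing_rate / l1_max_rate
hlt_reduction = [l1_max_rate / r for r in hlt_max_rates]
overall_reduction = [bunch_crossing_rate / r for r in hlt_max_rates]

print(f"L1 keeps at most 1 in {l1_reduction:.0f} bunch crossings")
print(f"HLT keeps at most 1 in {hlt_reduction[0]:.0f} to 1 in {hlt_reduction[1]:.0f} L1-accepted events")
print(f"Overall: 1 in {overall_reduction[0]:.0f} to 1 in {overall_reduction[1]:.0f}")
```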



## ATLAS Readout for 2022–26




### Run 3 - Now

### FELIX: Front-End Link EXchange

- Custom PCIe card hosted on a commercial computer
- Interaction with FE includes <u>readout</u>, <u>configuration</u>, <u>trigger & clock distribution</u>, <u>monitoring</u>, <u>BUSY</u>
- Scale: ~100 cards, 60 host PCs

### SW ROD: software Readout Driver

- Software running on a commercial computer
- <u>Builds and aggregates events</u>, <u>detector-specific data processing</u>
- Scale: ~30 servers

## Benefits of the FELIX system




- Fewer custom components
- Less hardware and firmware development effort
- Data transport decoupled from data processing
- Industry-standard data networks introduced earlier in the readout chain
- Aggregation of many links into a single high-speed network link
- As a result, less support effort thanks to the common hardware platform and its simplicity

## The choice of a custom FPGA card

### LHC Clock distribution

- During data taking, the 40.079 MHz clock signal is provided by the LHC
  - LHC clock is in sync with bunch crossing
  - All front-end and DAQ components need to be synchronised with the LHC clock
- FELIX needs an interface to the custom ATLAS TTC (Timing, Trigger and Control) system to distribute clock and L1 trigger signals to front-ends

### **Front-end radiation hardness**

- To support data protocols used by radiation-hard front-ends
  - GBT and lpGBT ASICs and the corresponding data protocols developed at CERN
  - TCP/IP over Ethernet not an option so far

### Availability and cost of commercial solutions

- The above constraints strongly limit the selection of commercial products

## The FLX-712 card

- **Interface to Timing, Trigger and Control (TTC) systems** (L1 triggers + LHC clock on a fibre)
- Busy output (on a LEMO connector)
- Communicates with detectors via fibres
- **FPGA: Xilinx Kintex UltraScale XCKU115**
- ~300 cards produced 2020-2022
- **Uses DMA (direct memory access)**, which lets data be sent directly from an attached device to the host server's memory
- 8× MiniPOD Tx/Rx transceivers, up to 14 Gb/s

## The current FELIX system


### Software

- Transfer data over the network using RDMA technology (for low overhead transfers)
- Custom network library based on libfabric
- Runs as a daemon on FELIX servers (each hosts up to two FLX-712 cards)

### Firmware


- Decodes incoming data, encodes outgoing data
- Transfers data to and from a buffer in the host computer
- Comes in different "flavours" corresponding to different link protocols
  - FULL: interface to other FPGA-based systems
  - GBT: interface to GBTx, a radiation-hard ASIC







## FELIX@HL-LHC: The current FELIX development





## From Phase-I to Phase-II






## **ATLAS Readout from 2030**



### Run 4

- Similar architecture as the current run
- New FELIX hardware, firmware and software under development
- ATLAS Phase-II Trigger: see Weiming's <u>talk</u> today

### Run 4 conditions

- Mean number of interactions per crossing up to 200 (~3x Run 3)
- 4.6 TB/s total data throughput (larger event size) (~20x Run 3)

### FELIX requirements

- 1 MHz level-0 trigger rate (~10x Run 3)
- Scale to readout of all sub-detectors
  - incl. all detector-specific functionalities
- Support for additional protocols
- ~14000 optical links with bandwidth up to 25 Gb/s
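As a rough cross-check of the Run 4 figures quoted on this slide, the total throughput divided by the level-0 trigger rate gives the implied mean event size. This sketch assumes decimal terabytes and that every level-0-accepted event is read out in full; neither assumption is stated in the source.

```python
# Run 4 figures from the slide.
total_throughput_bytes_per_s = 4.6e12   # 4.6 TB/s, assuming 1 TB = 1e12 bytes
level0_rate_hz = 1e6                    # 1 MHz level-0 trigger rate

# Implied mean event size if every accepted event is read out in full (assumption).
mean_event_size_bytes = total_throughput_bytes_per_s / level0_rate_hz
mean_event_size_mb = mean_event_size_bytes / 1e6

print(f"Implied mean event size: {mean_event_size_mb:.1f} MB")
```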



## The FLX-182 card - one prototype



- Electrical interface for testing/monitoring
- TTC-LTI links
- **FE links**
- **Access to the PetaLinux** running on the SoC
- **AMD Versal Prime VM1802**
- ~46 cards produced for detector integration
- interface – 240 Gb/s
- interface to **TTC-LTI**
- **bi-directional optical** links (25 Gb/s)




## The FLX-155 card – the final design candidate



- **Supports White Rabbit**
- **16-lane PCIe Gen5** interface: 480 Gb/s
- **AMD Versal Premium VP1552**
- (1+2 cards to arrive)
- 1× FireFly transceiver for TTC-LTI
- 1× FireFly transceiver for 100 GbE
- 8× FireFly transceivers supporting 48 **bi-directional optical** links (25 Gb/s)
- **Testing stage**
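The FLX-155 figures can be combined into a simple bandwidth budget. The sketch below compares the aggregate optical line rate with the quoted PCIe figure and estimates a lower bound on the number of cards for the ~14000 links quoted for Run 4. It is illustrative only: it uses line rates rather than payload rates, and it assumes that every link is served by a fully populated FLX-155 and runs at the full 25 Gb/s, which the source does not claim.

```python
import math

# Figures quoted on the slides.
links_per_card = 48          # bi-directional optical links per FLX-155
link_rate_gbps = 25          # maximum line rate per link, Gb/s
pcie_rate_gbps = 480         # 16-lane PCIe Gen5 interface, Gb/s
total_links = 14000          # approximate Run 4 link count

# Aggregate optical line rate per card if all links ran at full speed (assumption).
optical_rate_gbps = links_per_card * link_rate_gbps
oversubscription = optical_rate_gbps / pcie_rate_gbps

# Lower bound on the card count if all links used fully populated cards (assumption).
min_cards = math.ceil(total_links / links_per_card)

print(f"Aggregate optical line rate per card: {optical_rate_gbps} Gb/s")
print(f"Ratio to PCIe bandwidth: {oversubscription:.1f}")
print(f"Lower bound on number of cards: {min_cards}")
```

A ratio above 1 does not by itself indicate a bottleneck, since front-end links typically carry less payload than their line rate.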



## **Towards Phase-II**

### Hardware

- FLX-155 is the final design candidate, FLX-182 is the card for detector integration: same interface, many common components

### Firmware

- Supports various data protocols, incl. lpGBT and Interlaken for 25 Gb/s links

### Software

- Developments ongoing to scale up the current architecture to the Run 4 requirements
- Retaining RDMA technology to fully use the 400 Gb/s network bandwidth

## Time and clock distribution – one f/w challenge to highlight

- Detectors such as the High Granularity Timing Detector (*HGTD*, see <u>Mei's talk</u> this morning) are time-sensitive and impose stringent requirements on clock precision
- The clock is distributed in a long chain via several systems
- FELIX is the gatekeeper for the clock input to the detector
- TCLink (by the CERN High Precision Timing Distribution group) was developed for phase determinism and long-term phase stability
  - Together with *Knypaegje* (by the FELIX team), FLX-182 demonstrates a spread of less than 4 ps among startup phases

See M. Leguijt's poster at TWEPP 2024
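To put the 4 ps figure in perspective, it can be compared with the period of the 40.079 MHz LHC clock quoted earlier in the talk. This is a plain unit conversion, not a description of how the spread was measured; the variable names are illustrative.

```python
# Figures quoted in the talk.
lhc_clock_hz = 40.079e6     # LHC clock frequency
phase_spread_ps = 4.0       # upper bound on the spread among startup phases

# One clock period in picoseconds, and the spread as a fraction of it.
clock_period_ps = 1e12 / lhc_clock_hz
spread_fraction = phase_spread_ps / clock_period_ps
spread_degrees = 360.0 * spread_fraction

print(f"Clock period: {clock_period_ps / 1000:.2f} ns")
print(f"4 ps is {spread_fraction:.2e} of a period ({spread_degrees:.3f} degrees of phase)")
```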



## Fragment Building – one s/w challenge to highlight

- Data packets from O(100) E-links must be aggregated into one fragment
  - Total packet rate O(100) MHz
- Requires extensive and efficient use of multithreading to utilise the full power of modern CPUs
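The aggregation step can be illustrated with a minimal single-threaded sketch. The packet layout `(elink_id, event_id, payload)`, the function name `build_fragments` and the rule of matching packets by event ID are all assumptions made for illustration. The source does not describe the actual data format or algorithm, and a production implementation would be multithreaded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical packet layout: (elink_id, event_id, payload).
Packet = Tuple[int, int, bytes]


def build_fragments(
    packets: Iterable[Packet], elink_ids: Iterable[int]
) -> Iterator[Tuple[int, bytes]]:
    """Yield (event_id, fragment) once every expected E-link has delivered its packet."""
    expected = frozenset(elink_ids)
    order = sorted(expected)
    pending: Dict[int, Dict[int, bytes]] = defaultdict(dict)
    for elink_id, event_id, payload in packets:
        if elink_id not in expected:
            continue  # ignore links that do not belong to this fragment
        slot = pending[event_id]
        slot[elink_id] = payload
        if len(slot) == len(expected):
            # All E-links have contributed: emit payloads in E-link order.
            del pending[event_id]
            yield event_id, b"".join(slot[e] for e in order)


# Packets from two E-links arriving interleaved across two events.
packets = [(0, 7, b"a"), (1, 8, b"x"), (1, 7, b"b"), (0, 8, b"w")]
fragments = list(build_fragments(packets, elink_ids=[0, 1]))
print(fragments)
```

The sketch omits timeouts for incomplete events and any handling of duplicate packets, both of which a real builder would need.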



### SW/HW environment

- Network latency is O(1) µs
- Worst-case OS scheduler latency is O(1) ms
- Can only be measured and accepted

### **Code quality**

- CPU cache-friendly
- Efficient multi-threading
- Scalable to many CPU cores
- Under control of the SW developers

### **OS services**

- Memory management
- Must be taken under control



- Memory management was the main issue affecting performance and maximum latency
- With a custom Memory Pool implementation (a single <u>open-source header</u>), these issues have been successfully addressed
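The general idea behind a memory pool is to allocate all buffers once at startup and then recycle them, so that the hot path never calls the general-purpose allocator. The sketch below illustrates that idea only. It is not the FELIX implementation, whose design the source does not describe, and the class and method names are hypothetical.

```python
class BufferPool:
    """Fixed-size buffer pool: all memory is allocated once, up front."""

    def __init__(self, num_buffers: int, buffer_size: int) -> None:
        self._buffers = [bytearray(buffer_size) for _ in range(num_buffers)]
        self._free = list(range(num_buffers))  # indices of buffers not in use

    def acquire(self) -> int:
        """Return the index of a free buffer; raise if the pool is exhausted."""
        if not self._free:
            raise MemoryError("buffer pool exhausted")
        return self._free.pop()

    def view(self, index: int) -> memoryview:
        """Return a writable view of the buffer without copying it."""
        return memoryview(self._buffers[index])

    def release(self, index: int) -> None:
        """Hand a buffer back to the pool (no double-release check in this sketch)."""
        self._free.append(index)

    @property
    def available(self) -> int:
        return len(self._free)


pool = BufferPool(num_buffers=4, buffer_size=1024)
idx = pool.acquire()
pool.view(idx)[:5] = b"hello"      # write into the pre-allocated buffer
in_use = pool.available            # one buffer is currently checked out
pool.release(idx)
```

Because acquiring and releasing only move an index on and off a list, the cost per operation stays constant and no new memory is requested after construction.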





## Summary

- FELIX is a versatile data acquisition platform
  - Particularly useful for experiments with readout based on GBT, lpGBT or similar radiation-hard technology
- The first FELIX implementation is used in production
  - Successful data taking with protoDUNE-SP
  - Stable readout in ongoing ATLAS data taking
  - NA62 and sPHENIX take data with FELIX
  - EIC considering FELIX for all the systems
- An evolution of FELIX for the HL-LHC phase of ATLAS is under development
  - Final design review expected next spring



## FELIX Phase determinism

- The clock recovered by the GTYe5 transceivers on the Versal chip shows a phase difference w.r.t. the reference clock → double peak
- Knypaegje is a program that runs on the ARM processor and uses the correlation between eye opening and the startup phase





- Developed on FLX-182 for proof of concept
- Final implementation is under development